---
category: academic
type: academic
person: Yanxin Lu
date: 2016-01
source: codecomplete_spring2016.pptx
---

# COMP 600 Spring 2016: Data Driven Program Completion

Yanxin Lu, Swarat Chaudhuri, Christopher Jermaine, Drew Dehaas, Vineeth Kashyap, and David Melski. Presented by Yanxin Lu. 29 slides.

## Slide 2: Title

Data Driven Program Completion

## Slides 3–4: Programming is difficult

- Longest Common Subsequence example

## Slide 5: Program Synthesis

- Automatically generating programs
- Specification: logic formula, unit testing, natural language

## Slide 6: Related work

- Deductive and solver-aided synthesis
- Constraint-based synthesis: syntax-guided synthesis, Sketching, Template
- Inductive synthesis: input-output examples

## Slide 7: Big data

- GitHub, SourceForge, Google Code, StackOverflow

## Slides 8–9: Summary

- Data-driven program completion, corpus and Pliny database, synthesis algorithm, initial experiment and future work

## Slides 10–11: Program completion

- Sketch + programs in DB + test cases
- LCS example: LCS("123", "123") = "123", LCS("123", "234") = "23"

## Slides 12–13: Workflow

- Synthesis ↔ PDB
- Incomplete program → query → programs → completed program

## Slide 14: PDB

- Thousands of programs with features, similarity metrics
- Fast top-k query: 1-2 orders of magnitude faster than no-SQL systems

## Slide 15: Corpus

- 100,000+ projects, C/C++/Java
- 50GB source code, 480+ C projects

## Slide 16: Feature Extraction

- Names: X, s, n, j, Y, index, lcs
- TF/IDF: "charact": 0.158, "reduc": 0.158, "result": 0.316, "lc": 0.791, "index": 0.316

## Slides 18–21: Synthesis Algorithm

- Search PDB for similar programs
- Fill holes via enumerative search
- Merge undefined variables
- Test to filter incorrect programs

## Slides 22–24: Heuristics

- Types: ignore incompatible types
- Context: ignore expressions with no common parents
- Huge search space reduction

## Slides 25–26: Initial experiment and future work

- LCS: less than 10 seconds
- Future work: more benchmarks, closure, search PDB using types

## Slides 27–28: Program repair

- Use PDB to find most similar correct program
- Bug localization → holes → completion

## Slide 29: Conclusion

- Program Completion: no more copy and paste, focus on important tasks
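
## Illustrative sketch: enumerative hole filling with test filtering

The completion loop from Slides 18–21 (take a sketch with a hole, enumerate candidate expressions, keep the first program that passes the tests) can be illustrated with a small Python toy. This is only a sketch under stated assumptions, not the authors' implementation: the `??` hole marker, the `SKETCH` template, the hard-coded `CANDIDATES` list (standing in for expressions retrieved from similar programs in the PDB), and the helper names `passes` and `complete` are all hypothetical. The two test cases are the LCS examples from Slides 10–11. Variable merging and the type and context heuristics are not modeled.

```python
from typing import List, Optional, Tuple

# Hypothetical sketch: a recursive LCS with one hole (??) in the matching case.
SKETCH = '''
def lcs(x, y):
    if not x or not y:
        return ""
    if x[0] == y[0]:
        return ??
    a, b = lcs(x[1:], y), lcs(x, y[1:])
    return a if len(a) >= len(b) else b
'''

# Hypothetical candidates; a real system would harvest these from
# database programs similar to the sketch.
CANDIDATES = [
    "x[0]",
    "lcs(x[1:], y)",
    "x[0] + lcs(x[1:], y[1:])",
    "lcs(x[1:], y[1:])",
]

# Test cases taken from the LCS example on Slides 10-11.
TESTS: List[Tuple[Tuple[str, str], str]] = [
    (("123", "123"), "123"),
    (("123", "234"), "23"),
]


def passes(source: str, tests: List[Tuple[Tuple[str, str], str]]) -> bool:
    """Run a filled-in program and report whether every test passes."""
    env: dict = {}
    try:
        exec(source, env)  # acceptable for a toy; never exec untrusted code
        return all(env["lcs"](*args) == expected for args, expected in tests)
    except Exception:
        # A candidate that crashes is simply filtered out.
        return False


def complete(sketch: str, candidates: List[str],
             tests: List[Tuple[Tuple[str, str], str]]) -> Optional[str]:
    """Enumerate candidates for the hole; return the first passing program."""
    for expr in candidates:
        filled = sketch.replace("??", expr)
        if passes(filled, tests):
            return filled
    return None


completed = complete(SKETCH, CANDIDATES, TESTS)
print(completed)
```

With these candidates, the first two fillings fail the tests and the third, `x[0] + lcs(x[1:], y[1:])`, is accepted. With many holes and a large database the candidate space grows quickly, which is where the pruning heuristics of Slides 22–24 would apply.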