---
category: academic
type: academic
person: Yanxin Lu
date: 2015-08
source: comp600_fall2015.pptx
---

# COMP 600 Fall 2015: Data Driven Program Completion

Yanxin Lu, Swarat Chaudhuri, Christopher Jermaine, Vijayaraghavan Murali. Presented by Yanxin Lu. 26 slides.

## Slide 2: Title

Data Driven Program Completion

## Slide 3: Programming is difficult

## Slide 4: Program Synthesis

- Automatically generating programs
- Specification: logic formula, unit testing, natural language
- Hard problem!

## Slide 5: Related work

- Deductive and solver-aided synthesis (IEEE Trans. Software Eng. 18(8), PLDI 2014)
- Constraint-based synthesis: syntax-guided synthesis (FMCAD 2013), Sketching (ASPLOS 2006), Template (STTT 15(5-6))
- Inductive synthesis: input-output examples (POPL 2011, PLDI 2015)

## Slide 6: Big data

- GitHub, SourceForge, Google Code, StackOverflow

## Slide 7: Summary

- Data-driven program completion, demo, corpus and Pliny database, synthesis algorithm, initial experiment and future work, program repair

## Slide 8: Program Completion

- A subset of C

## Slide 9: Demo

## Slides 10–11: Architecture

- Synthesis ↔ PDB (Pliny Database)
- Incomplete program → query → top-k similar programs → completed program

## Slide 12: PDB

- Thousands of programs with features
- Similarity metrics, fast top-k query
- 1-2 orders of magnitude faster than NoSQL database systems (Chris Jermaine)

## Slide 13: Corpus

- More than 100,000 projects from GitHub, SourceForge, Google Code
- C, C++, Java
- Preprocessing: 50GB source code, 480+ projects, C

## Slide 14: Feature Extraction

- Lightweight program analysis capturing characteristics
- Abstract Structural Skeleton: `(seq (loop (seq (cond ()))))`
- Coupling: `('int', 'c:unary-')`, `('int', 'c:/')`, `('int*', 'c:+')`, etc.
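
To make the features on Slide 14 and the top-k query on Slide 12 more concrete, here is a minimal Python sketch. It is not the Pliny implementation: the `Node` class, the `skeleton` rendering rule, and the `top_k` helper are hypothetical, and Jaccard similarity is an assumed stand-in, since the slides do not say which similarity metrics PDB uses.

```python
import heapq
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Optional, Tuple

@dataclass
class Node:
    """Hypothetical statement tree: kind is "seq", "loop", "cond", or "stmt"."""
    kind: str
    children: List["Node"] = field(default_factory=list)

CONTROL_KINDS = {"seq", "loop", "cond"}

def skeleton(node: Node) -> Optional[str]:
    """Render only the control structure; plain statements are dropped."""
    if node.kind not in CONTROL_KINDS:
        return None
    rendered = (skeleton(child) for child in node.children)
    parts = [s for s in rendered if s is not None]
    inner = " ".join(parts) if parts else "()"
    return f"({node.kind} {inner})"

# A coupling feature pairs a type with an operator, as on the slide.
Coupling = Tuple[str, str]

def jaccard(a: FrozenSet[Coupling], b: FrozenSet[Coupling]) -> float:
    """Assumed similarity metric: size of intersection over size of union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def top_k(query: FrozenSet[Coupling],
          corpus: Dict[str, FrozenSet[Coupling]],
          k: int) -> List[str]:
    """Names of the k corpus programs most similar to the query."""
    return heapq.nlargest(k, corpus, key=lambda name: jaccard(query, corpus[name]))

# A body shaped like: stmt; loop { stmt; if (...) { stmt } }
program = Node("seq", [
    Node("stmt"),
    Node("loop", [Node("seq", [Node("stmt"), Node("cond", [Node("stmt")])])]),
])
print(skeleton(program))  # (seq (loop (seq (cond ()))))
```

A linear scan with `heapq.nlargest` is enough to illustrate the query; a database built for fast top-k retrieval would presumably index the features instead.
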
## Slides 16–19: Synthesis Algorithm

- Finding similar programs from PDB
- Filling in the holes via search
- Variable renaming for undefined variables
- Unit testing to filter incorrect programs

## Slides 20–22: Heuristics

- Types: ignore expressions with incompatible types
- Context: ignore expressions with no common parents
- Huge search space reduction

## Slide 23: Initial experiment and future work

- Binary search: less than 10 seconds
- Future work: more benchmark problems, performance increase

## Slides 24–25: Program Repair

- Use PDB to find most similar correct program
- Bug localization → program completion problem

## Slide 26: Conclusion

- Program completion + program repair using big data + programming languages
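
As a worked illustration of the search described in Slides 16–22, the sketch below fills a single hole (the midpoint expression of a binary search) from a list of candidate expressions. Everything in it is an assumption made for illustration: the `Candidate` record, the hand-written candidate list standing in for expressions mined from similar programs, and the Python lambdas standing in for C expressions. Only the type heuristic and the unit-test filter are modelled; the context heuristic and variable renaming are left out.

```python
from typing import Callable, List, NamedTuple, Optional

class Candidate(NamedTuple):
    text: str                      # expression as it might appear in a corpus program
    type: str                      # static type of that expression
    fn: Callable[[int, int], int]  # Python stand-in used to evaluate it

def binary_search(xs: List[int], target: int,
                  mid_expr: Callable[[int, int], int]) -> int:
    """Binary search whose midpoint computation is the hole to fill."""
    lo, hi = 0, len(xs) - 1
    while lo <= hi:
        mid = mid_expr(lo, hi)
        if not lo <= mid <= hi:    # out-of-range midpoint: reject the candidate
            raise IndexError(mid)
        if xs[mid] == target:
            return mid
        if xs[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1

# Unit tests as (input list, target, expected index).
TESTS = [([1, 3, 5, 7], 5, 2), ([1, 3, 5, 7], 4, -1), ([2], 2, 0), ([], 1, -1)]

def passes_tests(candidate: Candidate) -> bool:
    """True if the completed program passes every unit test."""
    try:
        return all(binary_search(xs, target, candidate.fn) == expected
                   for xs, target, expected in TESTS)
    except IndexError:
        return False

def fill_hole(candidates: List[Candidate], hole_type: str) -> Optional[Candidate]:
    """Return the first candidate that type-checks and passes the tests."""
    for candidate in candidates:
        if candidate.type != hole_type:  # type heuristic: skip without running
            continue
        if passes_tests(candidate):      # unit tests filter incorrect programs
            return candidate
    return None

CANDIDATES = [
    Candidate("xs + lo", "int*", lambda lo, hi: 0),  # never run: wrong type
    Candidate("hi + 1", "int", lambda lo, hi: hi + 1),
    Candidate("lo + (hi - lo) / 2", "int", lambda lo, hi: lo + (hi - lo) // 2),
]

result = fill_hole(CANDIDATES, "int")
print(result.text if result else "no completion found")
```

The type check is applied before any test is run, so ill-typed candidates cost nothing to discard; the unit tests are then only spent on expressions that could plausibly fit the hole.
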