---
category: academic
type: academic
person: Yanxin Lu
date: 2018-05
source: splicing_comp600_2018.pdf
---

# Program Splicing: COMP 600 Spring 2018 (PDF Export)

Yanxin Lu, Swarat Chaudhuri, Christopher Jermaine, David Melski. 31 slides.

PDF export of the Keynote presentation splicing_comp600_2018.key. Title: "Program Splicing: Data-driven Program Synthesis". This is a revised version of the earlier splicing_comp600_slides_2018.pdf. Key differences:

- Title slide has the subtitle "Data-driven Program Synthesis" (vs. just "Presented by Yanxin Lu")
- Adds "Efficient relevant code retrieval" and "KNN search" to the PDB slide
- Adds "Programming time" to the user study setup
- User study result slides are titled differently: "Deceptively simple", "No standard solutions", "Good documentations and tests were hard to write"
- Conclusion adds "Efficient algorithm", "Fast code reuse", "Easy to test", "Future work: synthesis algorithm improvement"

## Slide 2: Title

Program Splicing: Data-driven Program Synthesis

## Slides 3–7: Motivation and Approach

- Copying and pasting code is time-consuming and introduces bugs
- Program synthesis: automatically generate programs from specifications
- Problem: can program synthesis be used to improve copying and pasting?
- Related work: Sketching (PLDI 2005), Code Transplantation (ISSTA 2015)
- Program Splicing: automates the process, draws on a large corpus (3.5M programs), and ensures correctness

## Slide 8: Demo

- How does a programmer use program splicing?

## Slides 9–12: Architecture

- User → draft program → Synthesis ↔ PDB → completed program
- PDB:
  - Efficient relevant code retrieval
  - 3.5M Java programs
  - NL features and similarity metrics
  - KNN search for fast top-k queries

## Slides 13–18: Synthesis Algorithm

- Find relevant programs in the PDB
- Fill holes via enumerative search
- Rename variables to resolve undefined variables
- Test candidates to filter out incorrect programs

## Slides 19–20: Benchmark

Same benchmark table as the earlier version, with the efficiency of the synthesis algorithm highlighted.
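The retrieve-then-synthesize loop from Slides 9–18 can be sketched in a few dozen lines. This is a minimal, hypothetical sketch, not the authors' implementation: the real system splices Java code from a 3.5M-program PDB, whereas here Python expressions stand in for Java, and the `PDB` list, bag-of-words `features`, cosine-similarity `knn`, the `??` hole marker, and the `splice`/`free_vars`/`Rename`/`passes` helpers are all assumptions. It retrieves the top-k entries by NL similarity, enumerates their candidate expressions, renames each candidate's undefined (free) variables to variables in the draft's scope, and keeps the first completion that passes the tests (Python 3.9+ for `ast.unparse`).

```python
import ast
import builtins
import itertools
import math
import re
from collections import Counter

# Hypothetical toy PDB: each entry pairs an NL description with expressions
# mined from a corpus program. (The real PDB indexes 3.5M Java programs.)
PDB = [
    ("compute the sum of squares of a list", ["len(xs)", "sum(x * x for x in xs)"]),
    ("find the maximum element in a list", ["max(xs)"]),
    ("reverse a string", ["s[::-1]"]),
]


def features(text):
    """Bag-of-words NL features (an assumed, simplified feature set)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn(query, k):
    """Top-k PDB entries ranked by similarity to the query description."""
    q = features(query)
    return sorted(PDB, key=lambda e: cosine(q, features(e[0])), reverse=True)[:k]


def free_vars(expr):
    """Names the expression reads but neither binds itself nor gets from builtins."""
    names = [n for n in ast.walk(ast.parse(expr, mode="eval")) if isinstance(n, ast.Name)]
    loads = {n.id for n in names if isinstance(n.ctx, ast.Load)}
    stores = {n.id for n in names if isinstance(n.ctx, ast.Store)}
    return sorted(loads - stores - set(dir(builtins)))


class Rename(ast.NodeTransformer):
    """Rewrite variable names according to `mapping`."""

    def __init__(self, mapping):
        self.mapping = mapping

    def visit_Name(self, node):
        if node.id in self.mapping:
            return ast.copy_location(ast.Name(id=self.mapping[node.id], ctx=node.ctx), node)
        return node


def passes(source, tests):
    """Run the completed program and check every (args, expected) pair."""
    env = {}
    try:
        exec(source, env)
        return all(env["solve"](*args) == expected for args, expected in tests)
    except Exception:
        return False  # crashes count as test failures


def splice(draft, scope, query, tests, k=2):
    """Fill the `??` hole in `draft` with a retrieved expression that passes `tests`."""
    for _, exprs in knn(query, k):  # 1. retrieve relevant programs
        for expr in exprs:  # 2. enumerate candidate expressions
            free = free_vars(expr)
            # 3. map each undefined variable to a distinct variable in scope
            for targets in itertools.permutations(scope, len(free)):
                tree = Rename(dict(zip(free, targets))).visit(ast.parse(expr, mode="eval"))
                completed = draft.replace("??", ast.unparse(tree.body))
                if passes(completed, tests):  # 4. testing filters wrong candidates
                    return completed
    return None


if __name__ == "__main__":
    draft = "def solve(nums):\n    return ??"
    tests = [(([1, 2, 3],), 14), (([],), 0)]
    print(splice(draft, ["nums"], "sum of squares", tests))
```

On this toy input the `len(xs)` candidate is tried first and rejected by the tests; the sum-of-squares expression, with `xs` renamed to `nums`, is the completion returned. Using `exec` on mined code is acceptable only in a toy like this.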
## Slide 21: No need to write many tests

## Slides 22–26: User Study

- 18 participants, 4 problems, programming time measured
- Sieve: deceptively simple
- Files/CSV: no standard solutions; splicing was most helpful here
- HTML: good documentation and tests were hard to write

## Slide 27: Conclusion

- Program Splicing: large code corpus, enumerative search, efficient algorithm
- Fast code reuse: most useful when there is no standard solution and the task is easy to test
- Future work: synthesis algorithm improvement

## Slides 29–31: Appendix (Heuristics)

- Type-based pruning: ignore expressions with incompatible types
- Context-based pruning: ignore expressions with no common parents
- Together, these heuristics greatly reduce the search space
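The two appendix heuristics can be illustrated as filters applied to each hole's candidate list before enumeration. The sketch below is hypothetical throughout: `Candidate`, `Hole`, the toy `ASSIGNABLE` relation, and reading "no common parents" as "the candidate's enclosing AST node kinds share nothing with the hole's" are assumptions, as is measuring the search space as the product of per-hole candidate counts (joint enumeration over all holes).

```python
from dataclasses import dataclass
from math import prod

# Assumed assignability relation for a toy Java-like type system:
# identity, a few primitive widenings, and String -> Object.
ASSIGNABLE = {
    "int": {"int", "long", "double"},
    "long": {"long", "double"},
    "double": {"double"},
    "boolean": {"boolean"},
    "String": {"String", "Object"},
}


@dataclass(frozen=True)
class Candidate:
    """Hypothetical: an expression mined from a retrieved program."""
    code: str
    static_type: str
    parents: frozenset  # kinds of the AST nodes enclosing the expression


@dataclass(frozen=True)
class Hole:
    """Hypothetical: a hole in the draft program."""
    expected_type: str
    parents: frozenset


def type_compatible(cand, hole):
    """Type-based pruning: the candidate's type must be assignable to the hole's."""
    return hole.expected_type in ASSIGNABLE.get(cand.static_type, {cand.static_type})


def shares_context(cand, hole):
    """Context-based pruning: require at least one common enclosing node kind."""
    return bool(cand.parents & hole.parents)


def prune(cands, hole):
    """Keep only candidates that survive both heuristics for this hole."""
    return [c for c in cands if type_compatible(c, hole) and shares_context(c, hole)]


def search_space(holes, cands, pruned=True):
    """Joint assignments an enumerative search would try (product over holes)."""
    return prod(len(prune(cands, h)) if pruned else len(cands) for h in holes)


CANDS = [
    Candidate("i + 1", "int", frozenset({"For", "Assign"})),
    Candidate('line.split(",")', "String[]", frozenset({"Assign"})),
    Candidate("count", "int", frozenset({"Return"})),
    Candidate("name", "String", frozenset({"Return", "Call"})),
    Candidate("done", "boolean", frozenset({"If", "While"})),
]
HOLES = [
    Hole("int", frozenset({"Return"})),
    Hole("boolean", frozenset({"While"})),
]

if __name__ == "__main__":
    print(search_space(HOLES, CANDS, pruned=False), "->", search_space(HOLES, CANDS))
    # prints "25 -> 1"
```

On this made-up data each hole keeps a single candidate, so the joint space shrinks from 25 assignments to 1. The numbers only illustrate why the reduction compounds across holes; they are not results from the slides.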