obsidian-yanxin/documents/academic/presentations/splicing_comp600_2018_pdf.md

---
category: academic
type: academic
person: Yanxin Lu
date: 2018-05
source: splicing_comp600_2018.pdf
---

# Program Splicing — COMP 600 Spring 2018 (PDF Export)

Yanxin Lu, Swarat Chaudhuri, Christopher Jermaine, David Melski. 31 slides.

PDF export of the Keynote presentation splicing_comp600_2018.key. Title: "Program Splicing: Data-driven Program Synthesis".

This is a revised version of the earlier splicing_comp600_slides_2018.pdf. Key differences:
- Title slide has subtitle "Data-driven Program Synthesis" (vs just "Presented by Yanxin Lu")
- Adds "Efficient relevant code retrieval" and "KNN search" to PDB slide
- Adds "Programming time" to user study setup
- User study result slides titled differently: "Deceptively simple", "No standard solutions", "Good documentations and tests were hard to write"
- Conclusion adds "Efficient algorithm", "Fast code reuse", "Easy to test", "Future work: synthesis algorithm improvement"

## Slide 2: Title
Program Splicing: Data-driven Program Synthesis

## Slides 3–7: Motivation and Approach
- Copying and pasting is time-consuming and introduces bugs
- Program synthesis: automatically generate programs from specifications
- Problem: can we use program synthesis to improve copying and pasting?
- Related work: Sketching (PLDI 2005), Code Transplantation (ISSTA 2015)
- Program Splicing: automates the copy-and-paste process, draws on a large corpus (3.5M programs), and checks results for correctness

## Slide 8: Demo
- How does a programmer use program splicing?

## Slides 9–12: Architecture
- User → draft program → Synthesis ↔ PDB → completed program
- PDB: efficient relevant code retrieval, 3.5M Java programs, NL features, similarity metrics, KNN search, fast top-k query
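
The retrieval step can be pictured as a top-k nearest-neighbor query over natural-language features. The sketch below is a minimal illustration, not the PDB implementation: the slides do not specify the feature set or the similarity metric, so the bag-of-words tokens, the Jaccard similarity, the linear scan, and the file names in `corpus` are all assumptions made for the example.

```python
import heapq

def tokenize(text):
    """Lowercase a description and split it into a set of word tokens."""
    return set(text.lower().replace("_", " ").split())

def jaccard(a, b):
    """Jaccard similarity of two token sets (assumed metric, not from the slides)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def top_k(query, corpus, k=2):
    """Return the names of the k corpus entries most similar to the query."""
    q = tokenize(query)
    scored = ((jaccard(q, tokenize(desc)), name) for name, desc in corpus.items())
    return [name for _score, name in heapq.nlargest(k, scored)]

# Hypothetical corpus: program name -> natural-language description.
corpus = {
    "SieveA.java": "sieve of eratosthenes prime numbers",
    "CsvReader.java": "read csv file into matrix",
    "HtmlLinks.java": "extract links from html page",
}
result = top_k("read a csv file", corpus, k=2)
```

A linear scan like this would not scale to 3.5M programs; a real system would need an index to make the top-k query fast, which this sketch does not attempt.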

## Slides 13–18: Synthesis Algorithm
- Find relevant programs from PDB
- Fill holes via enumerative search
- Variable renaming for undefined variables
- Testing to filter incorrect programs
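
The four steps above can be sketched as a small loop. This is a toy under stated assumptions, not the algorithm from the slides: it works on Python expressions rather than Java, the `__HOLE__` marker and the function name `f` are hypothetical, and the candidate list stands in for code already retrieved from the PDB. It needs Python 3.9+ for `ast.unparse`.

```python
import ast
import builtins
import itertools

def undefined_vars(expr, scope):
    """Names used in expr that are neither in the draft's scope nor builtins."""
    names = {n.id for n in ast.walk(ast.parse(expr, mode="eval"))
             if isinstance(n, ast.Name)}
    return sorted(names - set(scope) - set(dir(builtins)))

def renamings(expr, scope):
    """Yield copies of expr with every undefined variable mapped to an in-scope one."""
    unknown = undefined_vars(expr, scope)
    for choice in itertools.product(scope, repeat=len(unknown)):
        mapping = dict(zip(unknown, choice))
        tree = ast.parse(expr, mode="eval")
        for node in ast.walk(tree):
            if isinstance(node, ast.Name) and node.id in mapping:
                node.id = mapping[node.id]
        yield ast.unparse(tree)

def synthesize(draft, scope, candidates, tests):
    """Return the first completed program that passes every test, or None."""
    for expr in candidates:                    # enumerative search over retrieved code
        for filled in renamings(expr, scope):  # variable renaming
            source = draft.replace("__HOLE__", filled)
            try:
                env = {}
                exec(source, env)
                # Testing: keep the candidate only if every test passes.
                if all(env["f"](*args) == want for args, want in tests):
                    return source
            except Exception:
                continue                       # a crashing candidate is discarded
    return None

draft = "def f(xs):\n    return __HOLE__\n"
candidates = ["len(items)", "sum(items) / len(items)", "max(items)"]
tests = [(([1, 2, 3],), 2.0), (([4],), 4.0)]
program = synthesize(draft, ["xs"], candidates, tests)
```

Here `len(items)` is renamed to `len(xs)` and rejected by the first test, while `sum(items) / len(items)` is renamed and accepted, so the tests act as the correctness filter.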

## Slides 19–20: Benchmark
Same benchmark table as the earlier version. The slides highlight the efficiency of the synthesis algorithm.

## Slide 21: No need to write many tests

## Slides 22–26: User study
- 18 participants, 4 problems, programming time measured
- Sieve: deceptively simple
- Files/CSV: no standard solutions — splicing most helpful
- HTML: good documentation and tests were hard to write

## Slide 27: Conclusion
- Program Splicing: large code corpus, enumerative search, efficient algorithm
- Fast code reuse: no standard solutions, easy to test
- Future work: synthesis algorithm improvement

## Slides 29–31: Appendix (Heuristics)
- Type-based pruning: ignore incompatible types
- Context-based pruning: ignore expressions with no common parents
- Result: a large reduction in the search space
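
The two pruning heuristics can be read as filters applied before enumeration. The sketch below is one possible interpretation, not the authors' implementation: the `Candidate` record, the use of exact type equality for "compatible", and the reading of "common parents" as shared enclosing AST node kinds are all assumptions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Candidate:
    expr: str           # expression text taken from a corpus program
    expr_type: str      # static type of the expression
    parents: frozenset  # kinds of AST nodes enclosing it in its source program

def prune(candidates, hole_type, hole_parents):
    """Keep only candidates that survive both heuristics."""
    kept = []
    for c in candidates:
        if c.expr_type != hole_type:        # type-based pruning (equality is a simplification)
            continue
        if not (c.parents & hole_parents):  # context-based pruning: no shared parent kind
            continue
        kept.append(c)
    return kept

# Hypothetical candidates for an int-typed hole inside a for loop.
candidates = [
    Candidate("i * i", "int", frozenset({"ForStatement", "MethodDeclaration"})),
    Candidate("name.trim()", "String", frozenset({"ForStatement"})),
    Candidate("count + 1", "int", frozenset({"CatchClause"})),
]
kept = prune(candidates, "int", frozenset({"ForStatement", "IfStatement"}))
```

In this example the `String` expression is dropped by the type check and `count + 1` by the context check, so only one of three candidates reaches the enumerative search.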