---
category: academic
type: academic
person: Yanxin Lu
date: 2018-04
source: splicing_comp600_slides_2018.pdf
---
# Program Splicing — COMP 600 Slides (PDF)
Yanxin Lu, Swarat Chaudhuri, Christopher Jermaine, David Melski. Presented by Yanxin Lu. 31 slides.
PDF export of COMP 600 presentation on Program Splicing. This is an earlier version of the presentation (title slide says "Presented by Yanxin Lu"). See also splicing_comp600_2018.pdf for a slightly revised version with subtitle "Data-driven Program Synthesis".
## Slide 2: Title
Program Splicing — Yanxin Lu, Swarat Chaudhuri, Christopher Jermaine, David Melski. Presented by Yanxin Lu.
## Slide 3: Copying and Pasting
- Problem: developers search online, copy code, and adapt it, which is time-consuming and introduces bugs
## Slide 4: Program Synthesis
- Automatically generating programs
- Specification: logical formulas, unit tests, natural language
- Correctness
## Slide 5: Problem
Can we use program synthesis to help the process of copying and pasting?
## Slide 6: Related work
- Sketch (FMCAD 2013) — cannot synthesize statements, does not use a code database
- Code Transplantation (ISSTA 2015) — not efficient, does not search for relevant code snippets
## Slide 7: Program Splicing
- Use a large corpus of over 3.5 million programs
- Automate the process of copying and pasting
- Ensure correctness
## Slide 8: Summary
- Architecture (corpus and Pliny database, synthesis algorithm)
- Experiment
- Conclusion
## Slides 9-10: Architecture
- User provides draft program → Synthesis queries PDB → Top-k relevant programs → Completed program
## Slide 11: PDB
- 3.5 million Java programs with features from GitHub, SourceForge
- Natural language terms: "read": 0.10976, "matrix": 0.65858, ...
- Similarity metrics, fast top-k query (1-2 orders of magnitude faster than no-SQL)
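
The top-k query over weighted natural-language terms can be illustrated with a minimal sketch. It assumes cosine similarity as the metric, which the slides do not confirm, and the corpus entries, the query, and the function names `cosine` and `top_k` are hypothetical. Only the two term weights for "read" and "matrix" come from the slide.

```python
import heapq
import math

def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def top_k(query, corpus, k):
    """Return the names of the k corpus programs most similar to the query."""
    return heapq.nlargest(k, corpus, key=lambda name: cosine(query, corpus[name]))

# Hypothetical corpus: program name -> term weights (assumed for illustration).
corpus = {
    "MatrixReader.java": {"read": 0.10976, "matrix": 0.65858},
    "HttpServer.java": {"socket": 0.7, "read": 0.2},
    "QuickSort.java": {"sort": 0.9, "pivot": 0.4},
}
query = {"matrix": 1.0, "read": 0.5}

print(top_k(query, corpus, k=2))  # ['MatrixReader.java', 'HttpServer.java']
```

This linear scan only shows the query semantics. It says nothing about how PDB achieves its reported speed.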
## Slide 13: Relevant programs
- Draft program with holes + COMMENT/REQ specification → PDB returns similar programs
## Slides 14-16: Filling in the holes
- Enumerative search: try candidate expressions from relevant programs
- Progressive selection of code fragments
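
A rough sketch of enumerative search over candidate fragments might look as follows. A toy Python draft stands in for the Java programs in the corpus, the hole syntax (a `str.format` field) is an assumption made for the example, and the names `enumerate_completions`, `first_accepted`, and `accept` are hypothetical. Candidates are tried in list order; the progressive selection strategy from the slides is not modeled.

```python
from itertools import product

def enumerate_completions(draft, candidates_per_hole):
    """Yield every way of filling the holes in the draft.

    draft: a str.format template with one named field per hole.
    candidates_per_hole: hole name -> list of code fragments harvested
    from the relevant programs.
    """
    holes = list(candidates_per_hole)
    for combo in product(*(candidates_per_hole[h] for h in holes)):
        yield draft.format(**dict(zip(holes, combo)))

def first_accepted(draft, candidates_per_hole, accept):
    """Return the first completion that the accept predicate approves."""
    for program in enumerate_completions(draft, candidates_per_hole):
        if accept(program):
            return program
    return None

# Toy draft with a single expression hole (assumed for illustration).
draft = "def area(w, h):\n    return {expr}\n"
candidates = {"expr": ["w + h", "w * h", "w - h"]}

def accept(source):
    scope = {}
    exec(source, scope)  # run the candidate definition
    return scope["area"](3, 4) == 12

print(first_accepted(draft, candidates, accept))
```

With several holes, `product` makes the search space grow multiplicatively, which is why the order in which fragments are tried matters in practice.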
## Slide 17: Variable Renaming
- Resolve undefined variables in a copied fragment by mapping the relevant program's variables to variables in the draft
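
One way to picture the renaming step is to enumerate injective mappings from a fragment's undefined variables onto the variables in scope in the draft. The sketch below does this for Python expressions using the standard `ast` module (Python 3.9+ for `ast.unparse`). The helper names `renamings` and `_Rename` and the example variables are hypothetical, and the slides do not say how the actual tool enumerates or prunes mappings.

```python
import ast
import builtins
from itertools import permutations

class _Rename(ast.NodeTransformer):
    """Rewrite variable names according to a mapping."""

    def __init__(self, mapping):
        self.mapping = mapping

    def visit_Name(self, node):
        node.id = self.mapping.get(node.id, node.id)
        return node

def renamings(fragment, in_scope):
    """Yield copies of an expression fragment with its undefined variables
    mapped, one-to-one, onto variables that exist in the draft."""
    tree = ast.parse(fragment, mode="eval")
    names = {n.id for n in ast.walk(tree) if isinstance(n, ast.Name)}
    undefined = sorted(names - set(in_scope) - set(dir(builtins)))
    for targets in permutations(sorted(in_scope), len(undefined)):
        mapping = dict(zip(undefined, targets))
        renamed = _Rename(mapping).visit(ast.parse(fragment, mode="eval"))
        yield ast.unparse(renamed.body)

# Fragment copied from a relevant program; the draft uses different names.
results = list(renamings("arr[mid] < key", in_scope=["a", "m", "target"]))
print(len(results))                 # 6 candidate renamings
print("a[m] < target" in results)   # True
```

Most of the six renamings are meaningless, so a later check is still needed to discard them.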
## Slide 18: Testing
- Filter out incorrect programs using unit tests
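
Filtering by unit tests can be sketched as running each candidate against input/output pairs and keeping only those that pass all of them. The functions `passes_tests` and `filter_candidates`, the toy candidates, and the test format are assumptions for illustration. The sketch uses `exec` without sandboxing or timeouts, so it is not suitable for untrusted code.

```python
def passes_tests(source, func_name, tests):
    """Return True if the candidate defines func_name and satisfies every
    (args, expected) pair; any exception counts as a failure."""
    scope = {}
    try:
        exec(source, scope)
        func = scope[func_name]
        return all(func(*args) == expected for args, expected in tests)
    except Exception:
        return False

def filter_candidates(candidates, func_name, tests):
    """Keep only the candidate programs that pass all unit tests."""
    return [c for c in candidates if passes_tests(c, func_name, tests)]

# Toy candidates (assumed): one wrong, one correct, one that crashes.
candidates = [
    "def largest(xs):\n    return xs[0]\n",
    "def largest(xs):\n    return max(xs)\n",
    "def largest(xs):\n    return xs[-1] + undefined_name\n",
]
tests = [(([3, 1, 2],), 3), (([1, 5, 2],), 5)]

survivors = filter_candidates(candidates, "largest", tests)
print(len(survivors))  # 1
```

Note that the first test alone would not reject the wrong candidate `xs[0]`; the second test is what eliminates it.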
## Slides 19-20: Benchmark
| Benchmark | Synthesis Time (s) | LOC | Variables | Holes (expr-stmt) | Tests | uScalpel Time (s) |
|---|---|---|---|---|---|---|
| Sieve Prime | 4.6 | 12-17 | 2 | 2-1 | 3 | 162.1 |
| Collision Detection | 4.2 | 10-15 | 2 | 2-1 | 4 | N/A |
| Collecting Files | 3.0 | 13-25 | 2 | 1-1 | 2 | timeout |
| Binary Search | 15.4 | 12-20 | 5 | 1-1 | 3 | timeout |
| HTTP Server | 41.1 | 24-45 | 6 | 1-2 | 2 | N/A |
| Prim's Distance Update | 61.1 | 53-58 | 11 | 1-1 | 4 | timeout |
| Quick Sort | 77.2 | 11-18 | 6 | 1-1 | 1 | timeout |
| CSV | 88.4 | 13-23 | 4 | 1-2 | 2 | timeout |
| Matrix Multiplication | 108.9 | 13-15 | 8 | 1-1 | 1 | timeout |
| Floyd Warshall | 110.4 | 9-12 | 7 | 1-1 | 7 | timeout |
| HTML Parsing | 140.4 | 20-34 | 5 | 1-2 | 2 | N/A |
| LCS | 161.5 | 29-36 | 10 | 0-1 | 1 | timeout |

The synthesis algorithm is efficient, and users do not need to write many tests.
## Slides 22-26: User study
- 12 graduate students and 6 professionals
- Web-based programming environment
- 4 programming problems (2 with splicing, 2 without)
- Internet search encouraged
- Results: splicing reduced time for algorithmic tasks (sieve, files)
- Sieve: deceptively simple; it looked easy but was not
- Files/CSV: no standard solutions — splicing helps most
- HTML: good documentation was available, and tests were hard to write
## Slide 27: Conclusion
- Data-driven program synthesis using large code corpus
- Enumerative search
- User study: good for tasks without standard solutions