Archive 10 academic presentations from ~/Downloads/slides/ (2014-2018)
- PhD defense slides (defense.key, Nov 2018) → phd_defense/
- Master's defense on MOOC peer evaluation (Dec 2014)
- ENGI 600 data-driven program repair (Apr 2015)
- COMP 600 data-driven program completion (Fall 2015, Spring 2016)
- COMP 600 Program Splicing presentation + feedback + response (Spring 2018)
- Program Splicing slides in .key and .pdf formats (Spring 2018)

Each file has a .md transcription with academic frontmatter.
Skipped www2015.pdf (duplicate of existing www15.zip) and syncthing conflict copy.
@@ -0,0 +1,95 @@
---
category: academic
type: academic
person: Yanxin Lu
date: 2018-04
source: splicing_comp600_slides_2018.pdf
---

# Program Splicing — COMP 600 Slides (PDF)

Yanxin Lu, Swarat Chaudhuri, Christopher Jermaine, David Melski. Presented by Yanxin Lu. 31 slides.

PDF export of COMP 600 presentation on Program Splicing. This is an earlier version of the presentation (title slide says "Presented by Yanxin Lu"). See also splicing_comp600_2018.pdf for a slightly revised version with subtitle "Data-driven Program Synthesis".

## Slide 2: Title
Program Splicing — Yanxin Lu, Swarat Chaudhuri, Christopher Jermaine, David Melski. Presented by Yanxin Lu.

## Slide 3: Copying and Pasting
- Problem: developers search online, copy code, and adapt it, which is time-consuming and introduces bugs

## Slide 4: Program Synthesis
- Automatically generating programs
- Specification: logic formulas, unit tests, natural language
- Correctness

## Slide 5: Problem
Can we use program synthesis to help the process of copying and pasting?

## Slide 6: Related work
- Sketch (FMCAD 2013) — cannot synthesize statements, does not use a code database
- Code Transplantation (ISSTA 2015) — not efficient, does not search for relevant code snippets

## Slide 7: Program Splicing
- Use a large corpus of over 3.5 million programs
- Automate the process of copying and pasting
- Ensure correctness

## Slide 8: Summary
- Architecture (corpus and Pliny database, synthesis algorithm)
- Experiment
- Conclusion

## Slides 9–10: Architecture
- User provides draft program → Synthesis queries PDB (Pliny database) → Top-k relevant programs → Completed program

## Slide 11: PDB
- 3.5 million Java programs from GitHub and SourceForge, with extracted features
- Natural language terms: "read": 0.10976, "matrix": 0.65858, ...
- Similarity metrics, fast top-k query (1-2 orders of magnitude faster than NoSQL)

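The top-k query over term features can be illustrated with a minimal sketch. It assumes the per-term numbers are weights in a sparse feature vector and that similarity is cosine similarity; the slides do not say which metrics PDB uses. The brute-force scan and the names `cosine`, `top_k`, and the toy corpus are hypothetical, and this is not how PDB achieves its speed.

```python
import heapq
import math


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query, corpus, k):
    """Return the names of the k corpus programs most similar to `query`.

    Brute-force scan for illustration only (assumption: no index).
    """
    return heapq.nlargest(k, corpus, key=lambda name: cosine(query, corpus[name]))


# Toy corpus (hypothetical): program name -> term weights.
corpus = {
    "matrix_reader": {"read": 0.10976, "matrix": 0.65858},
    "matrix_mul": {"matrix": 0.8, "multiply": 0.5},
    "http_server": {"socket": 0.9},
}
query = {"read": 0.10976, "matrix": 0.65858}
best = top_k(query, corpus, 2)
print(best)  # ['matrix_reader', 'matrix_mul']
```
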
## Slide 13: Relevant programs
- Draft program with holes + COMMENT/REQ specification → PDB returns similar programs

## Slides 14–16: Filling in the holes
- Enumerative search: try candidate expressions from relevant programs
- Progressive selection of code fragments

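As a rough illustration of the enumerative search, the sketch below fills named holes in a draft with every combination of candidate expressions. It uses Python source text and `string.Template` placeholders as a stand-in for the Java drafts described in the slides; the function name `enumerate_completions` and the example draft are hypothetical.

```python
from itertools import product
from string import Template


def enumerate_completions(draft, candidates):
    """Yield every way of filling the holes in `draft`.

    `draft` is source text with holes written as ${name};
    `candidates` maps each hole name to candidate expressions
    (assumed to come from the relevant programs).
    """
    holes = sorted(candidates)
    template = Template(draft)
    for choice in product(*(candidates[h] for h in holes)):
        yield template.substitute(dict(zip(holes, choice)))


# Hypothetical draft with two expression holes.
draft = "def is_even(n):\n    return ${h0} == ${h1}\n"
candidates = {"h0": ["n % 2", "n // 2"], "h1": ["0", "1"]}
completions = list(enumerate_completions(draft, candidates))
print(len(completions))  # 4
```

The number of completions is the product of the candidate counts per hole, which is why such a search needs a small, relevant candidate set.
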
## Slide 17: Variable Renaming
- Resolve undefined variables by mapping the relevant program's variables to variables in the draft program

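One way to picture variable renaming is to enumerate injective mappings from a fragment's undefined variables to variables already in scope in the draft. This is only a sketch under that assumption; the slides do not specify how mappings are chosen, and `rename_candidates` and the example fragment are hypothetical.

```python
import re
from itertools import permutations


def rename_candidates(fragment, undefined, in_scope):
    """Yield (mapping, rewritten fragment) for each injective mapping
    from `undefined` variable names to `in_scope` variable names."""
    if not undefined:
        yield {}, fragment
        return
    # Match whole identifiers only, and substitute in a single pass so
    # that one rename cannot be rewritten again by another.
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, undefined)) + r")\b")
    for targets in permutations(in_scope, len(undefined)):
        mapping = dict(zip(undefined, targets))
        yield mapping, pattern.sub(lambda m: mapping[m.group(1)], fragment)


# Fragment copied from a relevant program; `arr` and `j` do not exist
# in the draft, which has `data`, `i`, and `pivot` in scope.
results = list(rename_candidates("arr[j] < pivot", ["arr", "j"], ["data", "i", "pivot"]))
print(results[0])  # ({'arr': 'data', 'j': 'i'}, 'data[i] < pivot')
```
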
## Slide 18: Testing
- Filter out incorrect programs using unit tests
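The filtering step can be sketched as running each candidate program against a handful of input/output tests and keeping only those that pass. The sketch uses Python candidates and `exec` purely for illustration; the names `passes_tests` and `filter_candidates` are hypothetical, and a real system would run untrusted candidates in a sandbox.

```python
def passes_tests(source, func_name, tests):
    """Return True if `source` defines `func_name` and it passes every test.

    `tests` is a list of (args, expected) pairs. Any exception, including
    a syntax error in the candidate, counts as a failure.
    """
    namespace = {}
    try:
        exec(source, namespace)  # illustration only: not sandboxed
        func = namespace[func_name]
        return all(func(*args) == expected for args, expected in tests)
    except Exception:
        return False


def filter_candidates(candidates, func_name, tests):
    """Keep only the candidate programs that pass all unit tests."""
    return [src for src in candidates if passes_tests(src, func_name, tests)]


candidates = [
    "def is_even(n):\n    return n % 2 == 0\n",
    "def is_even(n):\n    return n % 2 == 1\n",
    "def is_even(n):\n    return n // 2 == 0\n",
    "def is_even(n) return\n",  # does not even parse
]
tests = [((4,), True), ((7,), False)]
survivors = filter_candidates(candidates, "is_even", tests)
print(len(survivors))  # 1
```
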
## Slides 19–20: Benchmark
| Benchmark | Synthesis Time (s) | LOC | Var | Holes (expr-stmt) | Test | uScalpel |
|---|---|---|---|---|---|---|
| Sieve Prime | 4.6 | 12-17 | 2 | 2-1 | 3 | 162.1 |
| Collision Detection | 4.2 | 10-15 | 2 | 2-1 | 4 | N/A |
| Collecting Files | 3.0 | 13-25 | 2 | 1-1 | 2 | timeout |
| Binary Search | 15.4 | 12-20 | 5 | 1-1 | 3 | timeout |
| HTTP Server | 41.1 | 24-45 | 6 | 1-2 | 2 | N/A |
| Prim's Distance Update | 61.1 | 53-58 | 11 | 1-1 | 4 | timeout |
| Quick Sort | 77.2 | 11-18 | 6 | 1-1 | 1 | timeout |
| CSV | 88.4 | 13-23 | 4 | 1-2 | 2 | timeout |
| Matrix Multiplication | 108.9 | 13-15 | 8 | 1-1 | 1 | timeout |
| Floyd Warshall | 110.4 | 9-12 | 7 | 1-1 | 7 | timeout |
| HTML Parsing | 140.4 | 20-34 | 5 | 1-2 | 2 | N/A |
| LCS | 161.5 | 29-36 | 10 | 0-1 | 1 | timeout |

The synthesis algorithm is efficient, and only a few tests are needed per benchmark.

## Slides 22–26: User study
- 12 graduate students and 6 professionals
- Web-based programming environment
- 4 programming problems (2 with splicing, 2 without)
- Internet search encouraged
- Results: splicing reduced time for algorithmic tasks (sieve, files)
- Sieve: deceptively simple; it looked easy but was not
- Files/CSV: no standard solutions — splicing helps most
- HTML: good documentation was available, and tests were hard to write

## Slide 27: Conclusion
- Data-driven program synthesis using large code corpus
- Enumerative search
- User study: good for tasks without standard solutions