
category	type	person	date	source
academic	academic	Yanxin Lu	2018-04	splicing_comp600_slides_2018.pdf

Program Splicing — COMP 600 Slides (PDF)

Yanxin Lu, Swarat Chaudhuri, Christopher Jermaine, David Melski. Presented by Yanxin Lu. 31 slides.

PDF export of the COMP 600 presentation on Program Splicing. This is an earlier version of the presentation (its title slide says "Presented by Yanxin Lu"). See also splicing_comp600_2018.pdf for a slightly revised version with the subtitle "Data-driven Program Synthesis".

Slide 2: Title

Program Splicing — Yanxin Lu, Swarat Chaudhuri, Christopher Jermaine, David Melski. Presented by Yanxin Lu.

Slide 3: Copying and Pasting

Problem: developers search online, copy code, and adapt it by hand, which is time-consuming and introduces bugs

Slide 4: Program Synthesis

Automatically generate programs from a specification
Specification: logic formula, unit tests, natural language
Correctness

Slide 5: Problem

Can we use program synthesis to help the process of copying and pasting?

Sketch (FMCAD 2013) — cannot synthesize statements, does not use a code database
Code Transplantation (ISSTA 2015) — not efficient, does not search for relevant code snippets

Slide 7: Program Splicing

Use a large corpus of over 3.5 million programs
Automate the process of copying and pasting
Ensure correctness

Slide 8: Summary

Architecture (corpus and Pliny database, synthesis algorithm)
Experiment
Conclusion

Slides 9–10: Architecture

User provides draft program → Synthesis queries PDB → Top-k relevant programs → Completed program
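The data flow above can be sketched as a small driver. This is a minimal Python sketch under stated assumptions. The `Draft` type, the stage names `query`, `fill`, and `passes`, and the `??` hole marker are hypothetical stand-ins, not the system's actual interfaces. The toy stages at the bottom only exercise the control flow.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class Draft:
    code: str     # draft program text; "??" marks a hole (hypothetical marker)
    comment: str  # natural-language COMMENT/REQ specification


# Hypothetical stage signatures; the slides do not specify the real interfaces.
QueryFn = Callable[[Draft, int], List[str]]           # (draft, k) -> top-k relevant programs
FillFn = Callable[[Draft, List[str]], Iterable[str]]  # (draft, relevant) -> candidate completions
TestFn = Callable[[str], bool]                        # candidate -> passes the unit tests?


def splice(draft: Draft, query: QueryFn, fill: FillFn, passes: TestFn, k: int = 5) -> Optional[str]:
    """Draft -> query the program database -> top-k programs -> first completion that passes."""
    relevant = query(draft, k)
    for candidate in fill(draft, relevant):
        if passes(candidate):
            return candidate
    return None  # no candidate satisfied the tests


# Toy stages, only to exercise the control flow.
corpus = ["x = a + b", "x = a * b"]
draft = Draft(code="x = ??", comment="multiply two numbers")
result = splice(
    draft,
    query=lambda d, k: corpus[:k],
    fill=lambda d, progs: (d.code.replace("??", p.split("= ", 1)[1]) for p in progs),
    passes=lambda c: "*" in c,  # stand-in for running real unit tests
)
print(result)
```

Returning the first passing candidate matches the single "completed program" output on the slide. A real system might instead rank all passing candidates.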

Slide 11: PDB

3.5 million Java programs from GitHub and SourceForge, stored with extracted features
Natural-language terms with weights, e.g. "read": 0.10976, "matrix": 0.65858, ...
Similarity metrics and fast top-k queries (1–2 orders of magnitude faster than NoSQL)
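A top-k query over such term-weight features can be sketched as follows. The slides do not say which similarity metric the PDB uses, so cosine similarity is an assumption here. The file names and most weights in the toy corpus are made up; only the "read" and "matrix" weights come from the slide. The linear scan with `heapq.nlargest` is shown for clarity. It is not the indexed query behind the reported speedup.

```python
import heapq
import math
from typing import Dict, List, Tuple

Features = Dict[str, float]  # term -> weight


def cosine(a: Features, b: Features) -> float:
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query: Features, corpus: Dict[str, Features], k: int) -> List[Tuple[str, float]]:
    """Return the k corpus programs most similar to the query, best first (linear scan)."""
    scored = ((name, cosine(query, feats)) for name, feats in corpus.items())
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


corpus = {  # toy corpus; names and most weights are made up
    "MatrixReader.java": {"read": 0.10976, "matrix": 0.65858},
    "FileUtil.java": {"read": 0.7, "file": 0.5},
    "Sieve.java": {"prime": 0.9, "sieve": 0.4},
}
print(top_k({"matrix": 1.0, "multiply": 0.5}, corpus, k=2))
```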

Slide 13: Relevant programs

Draft program with holes + COMMENT/REQ specification → PDB returns similar programs
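The sketch below shows one way to turn the draft's COMMENT text into term weights like those the PDB stores. Relative term frequency and the tiny stopword list are assumptions for illustration. They are not the PDB's actual feature extraction.

```python
import re
from collections import Counter
from typing import Dict

STOPWORDS = {"a", "an", "the", "of", "to", "and", "in"}  # hypothetical minimal list


def query_terms(comment: str) -> Dict[str, float]:
    """Map a natural-language COMMENT to term weights (relative term frequencies)."""
    words = [w for w in re.findall(r"[a-z]+", comment.lower()) if w not in STOPWORDS]
    counts = Counter(words)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}


print(query_terms("Read a matrix from a file and multiply the matrix"))
```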

Slides 14–16: Filling in the holes

Enumerative search: try candidate expressions from relevant programs
Progressive selection of code fragments
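The enumerative search can be sketched as trying every assignment of candidate fragments to holes. In this minimal sketch the `??` hole marker is hypothetical. Ordering candidates by total fragment length (smaller completions first) is an assumed cost function standing in for the slides' progressive selection.

```python
import itertools
from typing import Iterator, List

HOLE = "??"  # hypothetical hole marker


def fill_holes(draft: str, fragments: List[str]) -> Iterator[str]:
    """Yield every way to fill each HOLE in `draft` with a fragment, shortest completions first."""
    n_holes = draft.count(HOLE)
    combos = itertools.product(fragments, repeat=n_holes)
    # Assumed cost: total fragment length; sorted() is stable, so ties keep enumeration order.
    for combo in sorted(combos, key=lambda c: sum(len(f) for f in c)):
        program = draft
        for fragment in combo:
            program = program.replace(HOLE, fragment, 1)  # fill the leftmost remaining hole
        yield program


print(list(fill_holes("x = ?? + ??", ["a", "bb"])))
```

The number of completions grows as (fragments)^(holes). A practical search would prune or enumerate lazily instead of sorting every combination up front.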

Slide 17: Variable Renaming

Resolve undefined variables by mapping from relevant program's variables
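Variable renaming can be sketched as enumerating one-to-one, type-compatible maps from a fragment's undefined variables to variables that the draft declares. The type-matching rule, the regex-based rewrite, and the example names are assumptions for illustration. A real implementation would rename on the AST rather than on raw text.

```python
import itertools
import re
from typing import Dict, Iterator


def renamings(fragment_vars: Dict[str, str], scope_vars: Dict[str, str]) -> Iterator[Dict[str, str]]:
    """Yield injective maps from undefined fragment variables to in-scope variables of the same type."""
    names = list(fragment_vars)
    for targets in itertools.permutations(scope_vars, len(names)):
        if all(fragment_vars[n] == scope_vars[t] for n, t in zip(names, targets)):
            yield dict(zip(names, targets))


def rename(fragment: str, mapping: Dict[str, str]) -> str:
    """Rewrite whole identifiers in `fragment` according to `mapping`, leaving others alone."""
    return re.sub(r"\b[A-Za-z_]\w*\b", lambda m: mapping.get(m.group(0), m.group(0)), fragment)


fragment = "total += xs[k];"
fragment_vars = {"total": "int", "xs": "int[]", "k": "int"}  # undefined in the draft
scope_vars = {"sum": "int", "arr": "int[]", "j": "int"}       # declared in the draft
candidates = [rename(fragment, m) for m in renamings(fragment_vars, scope_vars)]
print(candidates)
```

Both renamings type-check, so a later step still has to decide which one is correct.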

Slide 18: Testing

Filter out incorrect programs using unit tests
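The filtering step can be sketched by running each candidate against input/output unit tests. Here, Python source and `exec` stand in for compiling and running Java candidates. The function name `biggest` and the tests are hypothetical. `exec` is only acceptable in a toy like this and should never be used on untrusted code.

```python
from typing import Any, Callable, Dict, List, Tuple

UnitTest = Tuple[Tuple[Any, ...], Any]  # (arguments, expected result)


def passes_tests(source: str, func_name: str, tests: List[UnitTest]) -> bool:
    """Load a candidate and run it on every unit test; any error counts as a failure."""
    namespace: Dict[str, Any] = {}
    try:
        exec(source, namespace)  # toy only: never exec untrusted code like this
        fn: Callable[..., Any] = namespace[func_name]
        return all(fn(*args) == expected for args, expected in tests)
    except Exception:
        return False


def filter_candidates(candidates: List[str], func_name: str, tests: List[UnitTest]) -> List[str]:
    """Keep only the candidates that pass all tests."""
    return [c for c in candidates if passes_tests(c, func_name, tests)]


candidates = [
    "def biggest(xs):\n    return xs[0]",
    "def biggest(xs):\n    return max(xs)",
    "def biggest(xs):\n    return sorted(xs)[0]",
    "def biggest(xs) return 1",  # does not compile
]
tests = [(([3, 1, 2],), 3), (([1, 9, 4],), 9)]
print(filter_candidates(candidates, "biggest", tests))
```

With only the first test, the incorrect `xs[0]` candidate would also survive. A few tests are enough only when they are well chosen.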

Slides 19–20: Benchmark

Benchmark	Synthesis Time (s)	LOC	Vars	Holes (expr-stmt)	Tests	μScalpel
Sieve Prime	4.6	12-17	2	2-1	3	162.1
Collision Detection	4.2	10-15	2	2-1	4	N/A
Collecting Files	3.0	13-25	2	1-1	2	timeout
Binary Search	15.4	12-20	5	1-1	3	timeout
HTTP Server	41.1	24-45	6	1-2	2	N/A
Prim's Distance Update	61.1	53-58	11	1-1	4	timeout
Quick Sort	77.2	11-18	6	1-1	1	timeout
CSV	88.4	13-23	4	1-2	2	timeout
Matrix Multiplication	108.9	13-15	8	1-1	1	timeout
Floyd Warshall	110.4	9-12	7	1-1	7	timeout
HTML Parsing	140.4	20-34	5	1-2	2	N/A
LCS	161.5	29-36	10	0-1	1	timeout

The synthesis algorithm is efficient, and users do not need to write many tests.
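The headline numbers in the benchmark table can be summarized with a few lines of arithmetic. The values below are transcribed from the table. Counting "N/A" and "timeout" as separate outcomes is our reading of the μScalpel column.

```python
from statistics import mean, median

# (synthesis time in s, μScalpel result) per benchmark, transcribed from the table.
RESULTS = {
    "Sieve Prime": (4.6, "162.1"),
    "Collision Detection": (4.2, "N/A"),
    "Collecting Files": (3.0, "timeout"),
    "Binary Search": (15.4, "timeout"),
    "HTTP Server": (41.1, "N/A"),
    "Prim's Distance Update": (61.1, "timeout"),
    "Quick Sort": (77.2, "timeout"),
    "CSV": (88.4, "timeout"),
    "Matrix Multiplication": (108.9, "timeout"),
    "Floyd Warshall": (110.4, "timeout"),
    "HTML Parsing": (140.4, "N/A"),
    "LCS": (161.5, "timeout"),
}

times = [t for t, _ in RESULTS.values()]
outcomes = [u for _, u in RESULTS.values()]
mean_time, median_time, max_time = mean(times), median(times), max(times)
n_timeout, n_na = outcomes.count("timeout"), outcomes.count("N/A")

print(f"mean {mean_time:.1f} s, median {median_time:.2f} s, max {max_time:.1f} s")
print(f"μScalpel: {n_timeout} timeouts, {n_na} N/A, {len(outcomes) - n_timeout - n_na} completed")
```

By this count, every benchmark finished in under three minutes (maximum 161.5 s, mean about 68 s). μScalpel timed out on 8 of the 12 benchmarks and reported a time only for Sieve Prime.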

Slides 22–26: User study

12 graduate students and 6 professionals
Web-based programming environment
4 programming problems (2 with splicing, 2 without)
Internet search encouraged
Results: splicing reduced time for algorithmic tasks (sieve, files)
Sieve: deceptively simple; it looked easy but was not
Files/CSV: no standard solutions, so splicing helps most
HTML: good documentation is available, and tests were hard to write

Slide 27: Conclusion

Data-driven program synthesis using a large code corpus
Enumerative search
User study: good for tasks without standard solutions
