- PhD defense slides (defense.key, Nov 2018) → phd_defense/
- Master's defense on MOOC peer evaluation (Dec 2014)
- ENGI 600 data-driven program repair (Apr 2015)
- COMP 600 data-driven program completion (Fall 2015, Spring 2016)
- COMP 600 Program Splicing presentation + feedback + response (Spring 2018)
- Program Splicing slides in .key and .pdf formats (Spring 2018)

Each file has a .md transcription with academic frontmatter. Skipped www2015.pdf (duplicate of existing www15.zip) and syncthing conflict copy.
| category | type | person | date | source |
|---|---|---|---|---|
| academic | academic | Yanxin Lu | 2016-01 | codecomplete_spring2016.pptx |
COMP 600 Spring 2016: Data Driven Program Completion
Yanxin Lu, Swarat Chaudhuri, Christopher Jermaine, Drew Dehaas, Vineeth Kashyap, and David Melski. Presented by Yanxin Lu. 29 slides.
Slide 2: Title
Data Driven Program Completion
Slides 3–4: Programming is difficult
- Longest Common Subsequence example
Slide 5: Program Synthesis
- Automatically generating programs
- Specification: logic formula, unit testing, natural language
Slide 6: Related work
- Deductive and solver-aided synthesis
- Constraint-based synthesis: syntax-guided synthesis, Sketching, Template
- Inductive synthesis: input-output examples
Slide 7: Big data
- GitHub, SourceForge, Google Code, StackOverflow
Slides 8–9: Summary
- Data-driven program completion, corpus and Pliny database, synthesis algorithm, initial experiment and future work
Slides 10–11: Program completion
- Sketch + programs in DB + test cases
- LCS example: LCS("123", "123") = "123", LCS("123", "234") = "23"
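
The two test cases above act as the specification for the completed program. As a point of reference, here is a minimal Python sketch of a textbook dynamic-programming LCS that satisfies both of them. The function name `lcs` and the choice of Python are assumptions made for illustration; the talk's corpus is C/C++/Java, and this is not the program the tool synthesized.

```python
def lcs(a: str, b: str) -> str:
    """Return one longest common subsequence of a and b."""
    m, n = len(a), len(b)
    # table[i][j] holds the LCS length of the suffixes a[i:] and b[j:].
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = 1 + table[i + 1][j + 1]
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])

    # Walk the table from the top-left corner to recover one subsequence.
    out = []
    i = j = 0
    while i < m and j < n:
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1
        else:
            j += 1
    return "".join(out)


# The test cases from the slide.
assert lcs("123", "123") == "123"
assert lcs("123", "234") == "23"
```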
Slides 12–13: Workflow
- Synthesis ↔ PDB
- Incomplete program → query → programs → completed program
Slide 14: PDB
- Thousands of programs with features, similarity metrics
- Fast top-k query: 1–2 orders of magnitude faster than NoSQL systems
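
To make the idea of a top-k similarity query concrete, the following sketch ranks stored programs against a query by Jaccard similarity over sets of identifier names. Everything here is hypothetical: the slides do not say which similarity metrics PDB uses, the file names and feature sets are made up, and this linear scan does not reflect whatever indexing gives PDB its speed.

```python
import heapq


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two feature sets (assumed metric)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def top_k(query: set, database: dict, k: int) -> list:
    """Return the names of the k stored programs most similar to the query."""
    return heapq.nlargest(
        k, database, key=lambda name: jaccard(query, database[name])
    )


# Hypothetical database: program name -> set of identifier names.
db = {
    "lcs.c": {"X", "Y", "index", "lcs", "n"},
    "sort.c": {"arr", "n", "i", "j"},
    "edit_distance.c": {"X", "Y", "n", "m", "dist"},
}

print(top_k({"X", "Y", "lcs"}, db, 2))  # ['lcs.c', 'edit_distance.c']
```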
Slide 15: Corpus
- 100,000+ projects, C/C++/Java
- 50GB source code, 480+ C projects
Slide 16: Feature Extraction
- Names: X, s, n, j, Y, index, lcs
- TF/IDF: "charact": 0.158, "reduc": 0.158, "result": 0.316, "lc": 0.791, "index": 0.316
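
As a rough illustration of how TF-IDF weights like those above could be computed, here is a small sketch over stemmed tokens. The smoothed IDF formula, the L2 normalization, and the toy corpus are assumptions; the slides do not give the exact weighting scheme, and this code does not reproduce the numbers shown on the slide.

```python
import math
from collections import Counter


def tf_idf(doc_tokens: list, corpus: list) -> dict:
    """Return L2-normalized TF-IDF weights for one tokenized document."""
    if not doc_tokens:
        return {}
    counts = Counter(doc_tokens)
    n_docs = len(corpus)
    weights = {}
    for term, count in counts.items():
        # Document frequency: number of corpus documents containing the term.
        df = sum(1 for doc in corpus if term in doc)
        # Smoothed IDF (an assumed variant); rarer terms get larger weights.
        idf = math.log((1 + n_docs) / (1 + df)) + 1
        weights[term] = (count / len(doc_tokens)) * idf
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {term: w / norm for term, w in weights.items()}


# Toy corpus of stemmed tokens (hypothetical).
corpus = [
    ["lc", "index", "result"],
    ["index", "result", "sort"],
    ["result", "count"],
]
weights = tf_idf(corpus[0], corpus)
# "lc" appears in one document only, so it outweighs "index" and "result".
assert weights["lc"] > weights["index"] > weights["result"]
```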
Slides 18–21: Synthesis Algorithm
- Search PDB for similar programs
- Fill holes via enumerative search
- Merge undefined variables
- Test to filter incorrect programs
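
The fill-and-test part of the steps above can be sketched as a brute-force enumeration. This is a toy under stated assumptions, not the authors' algorithm: the sketch is a Python lambda written as a format string, the candidate expressions stand in for code retrieved from PDB, the helper name `complete` is hypothetical, and the variable-merging step is omitted.

```python
from itertools import product


def complete(sketch: str, candidates: list, tests: list):
    """Try every way of filling the holes; return the first program passing all tests."""
    holes = sketch.count("{}")
    for filling in product(candidates, repeat=holes):
        source = sketch.format(*filling)
        try:
            program = eval(source)  # toy only: never eval untrusted code
            if all(program(*args) == expected for args, expected in tests):
                return source
        except Exception:
            # A candidate that crashes on a test is simply discarded.
            continue
    return None


# Sketch with three holes; candidates would come from similar programs.
sketch = "lambda x, y: {} if {} else {}"
candidates = ["x", "y", "x > y", "x + y"]
tests = [((3, 5), 5), ((7, 2), 7), ((4, 4), 4)]

print(complete(sketch, candidates, tests))  # lambda x, y: x if x > y else y
```

With 4 candidates and 3 holes the search space is already 4³ = 64 programs, which is why pruning matters as the sketch and the candidate pool grow.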
Slides 22–24: Heuristics
- Types: ignore incompatible types
- Context: ignore expressions with no common parents
- Huge search space reduction
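
The effect of the type heuristic can be illustrated by counting candidate fillings with and without it. The hole types, the candidate expressions, and the helper `search_space` are all hypothetical; the context heuristic is not modeled here because the slides do not spell out how common parents are determined.

```python
from math import prod


def search_space(hole_types: list, candidates: list, use_types: bool) -> int:
    """Count the candidate programs an enumerative search would have to try."""
    sizes = []
    for hole_type in hole_types:
        if use_types:
            # Keep only expressions whose type matches the hole.
            sizes.append(sum(1 for _, t in candidates if t == hole_type))
        else:
            sizes.append(len(candidates))
    return prod(sizes)


# Hypothetical holes and typed candidate expressions.
hole_types = ["int", "bool", "int"]
candidates = [
    ("x", "int"),
    ("y", "int"),
    ("x > y", "bool"),
    ("x + y", "int"),
    ("s", "str"),
    ("len(s)", "int"),
]

print(search_space(hole_types, candidates, use_types=False))  # 216
print(search_space(hole_types, candidates, use_types=True))   # 16
```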
Slides 25–26: Initial experiment and future work
- LCS: less than 10 seconds
- Future work: more benchmarks, closure, search PDB using types
Slides 27–28: Program repair
- Use PDB to find most similar correct program
- Bug localization → holes → completion
Slide 29: Conclusion
- Program Completion: no more copy and paste, focus on important tasks