- PhD defense slides (defense.key, Nov 2018) → phd_defense/
- Master's defense on MOOC peer evaluation (Dec 2014)
- ENGI 600 data-driven program repair (Apr 2015)
- COMP 600 data-driven program completion (Fall 2015, Spring 2016)
- COMP 600 Program Splicing presentation + feedback + response (Spring 2018)
- Program Splicing slides in .key and .pdf formats (Spring 2018)

Each file has a .md transcription with academic frontmatter. Skipped www2015.pdf (duplicate of existing www15.zip) and syncthing conflict copy.

---
category: academic
type: academic
person: Yanxin Lu
date: 2016-01
source: codecomplete_spring2016.pptx
---

# COMP 600 Spring 2016: Data Driven Program Completion

Yanxin Lu, Swarat Chaudhuri, Christopher Jermaine, Drew Dehaas, Vineeth Kashyap, and David Melski. Presented by Yanxin Lu. 29 slides.

## Slide 2: Title
Data Driven Program Completion

## Slides 3–4: Programming is difficult
- Longest Common Subsequence example

## Slide 5: Program Synthesis
- Automatically generating programs
- Specification: logic formula, unit testing, natural language

## Slide 6: Related work
- Deductive and solver-aided synthesis
- Constraint-based synthesis: syntax-guided synthesis, Sketching, Template
- Inductive synthesis: input-output examples

## Slide 7: Big data
- GitHub, SourceForge, Google Code, StackOverflow

## Slides 8–9: Summary
- Data-driven program completion, corpus and Pliny database, synthesis algorithm, initial experiment and future work

## Slides 10–11: Program completion
- Sketch + programs in DB + test cases
- LCS example: LCS("123", "123") = "123", LCS("123", "234") = "23"

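The two LCS test cases above can be checked against a reference implementation. The block below is a minimal sketch of the textbook dynamic-programming solution, not the program shown on the slides; the function name `lcs` is chosen here for illustration.

```python
def lcs(x: str, y: str) -> str:
    """Return one longest common subsequence of x and y."""
    m, n = len(x), len(y)
    # table[i][j] holds the LCS length of the prefixes x[:i] and y[:j].
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])

    # Walk back from the bottom-right corner to recover the characters.
    out = []
    i, j = m, n
    while i > 0 and j > 0:
        if x[i - 1] == y[j - 1]:
            out.append(x[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))
```

In the completion setting, test cases such as `lcs("123", "234") == "23"` play the role of the specification: a candidate completion is kept only if it passes all of them.
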
## Slides 12–13: Workflow
- Synthesis ↔ PDB
- Incomplete program → query → programs → completed program

## Slide 14: PDB
- Thousands of programs with features, similarity metrics
- Fast top-k query: 1–2 orders of magnitude faster than NoSQL systems

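To make the idea of a top-k similarity query concrete, here is a brute-force sketch in Python. It assumes each program is represented by a sparse feature vector (term to weight) and uses cosine similarity as the metric; both choices, and the names `cosine` and `top_k`, are assumptions for illustration. The slides do not describe how PDB indexes programs, and a linear scan like this one would not deliver the reported speedup.

```python
import heapq
import math


def cosine(a: dict, b: dict) -> float:
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: dict, corpus: dict, k: int) -> list:
    """Return the names of the k corpus programs most similar to query."""
    # corpus maps a program name to its feature vector.
    return heapq.nlargest(k, corpus, key=lambda name: cosine(query, corpus[name]))


# Hypothetical corpus with made-up weights.
corpus = {
    "lcs.c": {"lc": 0.8, "index": 0.3},
    "sort.c": {"sort": 0.9, "index": 0.2},
    "edit_distance.c": {"lc": 0.2, "distanc": 0.9},
}
```

Calling `top_k({"lc": 1.0}, corpus, 2)` ranks `lcs.c` first and `edit_distance.c` second, since `sort.c` shares no terms with the query.
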
## Slide 15: Corpus
- 100,000+ projects, C/C++/Java
- 50 GB source code, 480+ C projects

## Slide 16: Feature Extraction
- Names: X, s, n, j, Y, index, lcs
- TF/IDF: "charact": 0.158, "reduc": 0.158, "result": 0.316, "lc": 0.791, "index": 0.316

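A TF/IDF feature vector of the kind listed above could be computed roughly as follows. This is a sketch under assumptions: the tokenizer, the smoothed IDF formula, and the L2 normalisation are choices made here, not details taken from the slides. The truncated terms on the slide ("charact", "reduc", "lc") suggest that a stemmer was applied; this sketch omits stemming, so it will not reproduce the slide's numbers.

```python
import math
import re
from collections import Counter


def tokens(source: str) -> list:
    """Split source text into lowercase alphabetic terms."""
    return [t.lower() for t in re.findall(r"[A-Za-z]+", source)]


def tf_idf(docs: dict) -> dict:
    """Map each document name to an L2-normalised TF-IDF vector."""
    counts = {name: Counter(tokens(text)) for name, text in docs.items()}
    n_docs = len(docs)
    # Number of documents in which each term occurs at least once.
    doc_freq = Counter(term for c in counts.values() for term in c)

    vectors = {}
    for name, c in counts.items():
        # Smoothed IDF keeps terms that occur in every document above zero.
        raw = {
            term: tf * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, tf in c.items()
        }
        norm = math.sqrt(sum(w * w for w in raw.values()))
        vectors[name] = {t: w / norm for t, w in raw.items()} if norm else {}
    return vectors


# Hypothetical two-document corpus.
vectors = tf_idf({"lcs.c": "lcs index lcs result", "sort.c": "sort index swap"})
```

In this toy corpus, `lcs` gets the largest weight in `lcs.c` because it is both frequent in that file and absent from the other, which is the same effect that gives "lc" the highest weight on the slide.
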
## Slides 18–21: Synthesis Algorithm
- Search PDB for similar programs
- Fill holes via enumerative search
- Merge undefined variables
- Test to filter incorrect programs

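The hole-filling and test-filtering steps above can be illustrated with a toy enumerative search. Everything in this sketch is hypothetical: the three-hole sketch `return ?? if ?? else ??`, the candidate list (standing in for expressions harvested from similar database programs), and the names `sketch` and `complete`. The PDB search and the merging of undefined variables are not modelled.

```python
import itertools

# Candidate expressions as (source text, function of the sketch's variables).
# In the real system these would come from similar programs in the database.
CANDIDATES = [
    ("a", lambda a, b: a),
    ("b", lambda a, b: b),
    ("a + b", lambda a, b: a + b),
    ("a > b", lambda a, b: a > b),
    ("a < b", lambda a, b: a < b),
]


def sketch(cond, then, other):
    """Incomplete program with three holes: condition, then-value, else-value."""
    return lambda a, b: then(a, b) if cond(a, b) else other(a, b)


def complete(tests):
    """Enumerate every hole filling and keep those that pass all tests."""
    passing = []
    for filling in itertools.product(CANDIDATES, repeat=3):
        program = sketch(*(fn for _, fn in filling))
        if all(program(*args) == expected for args, expected in tests):
            passing.append(tuple(src for src, _ in filling))
    return passing


# Test cases specifying "maximum of two integers".
MAX_TESTS = [((1, 2), 2), ((5, 3), 5), ((4, 4), 4)]
solutions = complete(MAX_TESTS)
```

With five candidates and three holes the search tries 125 fillings; for these tests only `("a > b", "a", "b")` and `("a < b", "b", "a")` survive. The exponential growth in the number of fillings is what makes pruning necessary on realistic sketches.
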
## Slides 22–24: Heuristics
- Types: ignore incompatible types
- Context: ignore expressions with no common parents
- Huge search space reduction

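As a rough illustration of how the type heuristic shrinks the search space, the sketch below counts the hole fillings an enumerative search would try with and without type filtering. The candidate list, the type labels, and the hypothetical sketch `return ?? if ?? else ??` are assumptions made for this example; the context heuristic is not modelled.

```python
import math

# Hypothetical candidates: (expression text, result type).
CANDIDATES = [
    ("a", "int"),
    ("b", "int"),
    ("a + b", "int"),
    ("a > b", "bool"),
    ("a < b", "bool"),
]

# Expected type of each hole, in the order: condition, then-value, else-value.
HOLE_TYPES = ["bool", "int", "int"]


def search_space(candidates, hole_types, use_types):
    """Count the fillings an enumerative search would have to try."""
    sizes = []
    for hole_type in hole_types:
        if use_types:
            # Type heuristic: drop candidates whose type does not fit the hole.
            options = [expr for expr, typ in candidates if typ == hole_type]
        else:
            options = [expr for expr, _ in candidates]
        sizes.append(len(options))
    return math.prod(sizes)
```

Here `search_space(CANDIDATES, HOLE_TYPES, False)` is 125 and `search_space(CANDIDATES, HOLE_TYPES, True)` is 18. The gap widens quickly as the number of holes and candidates grows.
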
## Slides 25–26: Initial experiment and future work
- LCS: less than 10 seconds
- Future work: more benchmarks, closure, search PDB using types

## Slides 27–28: Program repair
- Use PDB to find most similar correct program
- Bug localization → holes → completion

## Slide 29: Conclusion
- Program Completion: no more copy and paste, focus on important tasks