---
category: academic
type: academic
person: Yanxin Lu
date: 2016-01
source: codecomplete_spring2016.pptx
---
# COMP 600 Spring 2016: Data Driven Program Completion
Yanxin Lu, Swarat Chaudhuri, Christopher Jermaine, Drew Dehaas, Vineeth Kashyap, and David Melski. Presented by Yanxin Lu. 29 slides.
## Slide 2: Title
Data Driven Program Completion
## Slides 3-4: Programming is difficult
- Longest Common Subsequence example
## Slide 5: Program Synthesis
- Automatically generating programs
- Specification: logic formula, unit testing, natural language
## Slide 6: Related work
- Deductive and solver-aided synthesis
- Constraint-based synthesis: syntax-guided synthesis, Sketching, Template
- Inductive synthesis: input-output examples
## Slide 7: Big data
- GitHub, SourceForge, Google Code, StackOverflow
## Slides 8-9: Summary
- Data-driven program completion, corpus and Pliny database, synthesis algorithm, initial experiment and future work
## Slides 10-11: Program completion
- Sketch + programs in DB + test cases
- LCS example: LCS("123", "123") = "123", LCS("123", "234") = "23"
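
For reference, the LCS function used as the running example can be written with the textbook dynamic-programming recurrence. This is a minimal sketch of the target behaviour, not the code shown on the slides; the function name `lcs` is chosen here for illustration.

```python
def lcs(a: str, b: str) -> str:
    """Return one longest common subsequence of a and b (classic DP)."""
    m, n = len(a), len(b)
    # table[i][j] = length of the LCS of a[:i] and b[:j]
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])

    # Walk back from the bottom-right corner to recover the subsequence.
    out = []
    i, j = m, n
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))


print(lcs("123", "123"))  # 123
print(lcs("123", "234"))  # 23
```

The two calls reproduce the test cases listed on the slide.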
## Slides 12-13: Workflow
- Synthesis ↔ PDB
- Incomplete program → query → programs → completed program
## Slide 14: PDB
- Thousands of programs with features, similarity metrics
- Fast top-k query: 1-2 orders of magnitude faster than NoSQL systems
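
A top-k similarity query can be illustrated with a linear scan over feature vectors. The sketch below assumes cosine similarity and a small in-memory dictionary; the slides do not say which metrics or index structures PDB uses, and the program names and weights are made up for illustration.

```python
import heapq
import math


def cosine(u: dict, v: dict) -> float:
    """Cosine similarity between two sparse {feature: weight} vectors."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def top_k(query: dict, db: dict, k: int) -> list:
    """Return the names of the k programs in db most similar to query."""
    return heapq.nlargest(k, db, key=lambda name: cosine(query, db[name]))


# Hypothetical database entries (not taken from the real corpus).
db = {
    "lcs.c": {"lc": 0.8, "index": 0.3, "result": 0.3},
    "sort.c": {"swap": 0.9, "index": 0.4},
    "edit_distance.c": {"dist": 0.7, "index": 0.3, "result": 0.2},
}
query = {"lc": 0.79, "index": 0.32}

best = top_k(query, db, 1)
```

A scan like this costs one similarity computation per stored program, which is the baseline a dedicated top-k engine is meant to beat.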
## Slide 15: Corpus
- 100,000+ projects, C/C++/Java
- 50GB source code, 480+ C projects
## Slide 16: Feature Extraction
- Names: X, s, n, j, Y, index, lcs
- TF/IDF: "charact": 0.158, "reduc": 0.158, "result": 0.316, "lc": 0.791, "index": 0.316
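
The TF-IDF weighting above can be sketched as follows. This is a minimal version under stated assumptions: smoothed IDF, unit-length normalization, and input terms that have already been split and stemmed (the stemming step that produces tokens such as "charact" is omitted). The corpus is invented for illustration.

```python
import math
from collections import Counter


def tf_idf(docs: dict) -> dict:
    """Map each document name to a unit-length {term: weight} vector."""
    n = len(docs)
    # Document frequency: number of documents in which each term appears.
    df = Counter(term for terms in docs.values() for term in set(terms))
    vectors = {}
    for name, terms in docs.items():
        tf = Counter(terms)
        # Smoothed IDF keeps terms that occur in every document above zero.
        raw = {t: c * (math.log((1 + n) / (1 + df[t])) + 1.0)
               for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in raw.values())) or 1.0
        vectors[name] = {t: w / norm for t, w in raw.items()}
    return vectors


# Hypothetical token lists extracted from two programs.
corpus = {
    "lcs": ["lc", "lc", "lc", "index", "result"],
    "sort": ["swap", "index", "result"],
}
vecs = tf_idf(corpus)
```

Terms that are frequent in one program but rare across the corpus, such as "lc" here, end up with the largest weights.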
## Slides 18-21: Synthesis Algorithm
- Search PDB for similar programs
- Fill holes via enumerative search
- Merge undefined variables
- Test to filter incorrect programs
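
The four steps above can be sketched on a toy problem. Everything here is an illustrative assumption rather than the authors' implementation: the sketch is a Python template with holes `{h0}` and `{h1}`, the candidate expressions stand in for code retrieved from PDB, "merging undefined variables" is modelled as renaming donor variables to variables in the sketch, and all function names are hypothetical.

```python
import itertools
import re

# Hypothetical sketch: {h0} and {h1} are the holes to fill.
SKETCH = """
def f(a, b):
    if {h0}:
        return {h1}
    return b
"""


def rename(expr, mapping):
    """Rename whole identifiers in expr according to mapping."""
    return re.sub(r"[A-Za-z_]\w*",
                  lambda m: mapping.get(m.group(0), m.group(0)), expr)


def complete(sketch, holes, candidates, donor_vars, sketch_vars, tests):
    """Enumerate variable merges and hole fillings; return the first
    completed source that passes every test, or None."""
    # Merge undefined variables: try each mapping of donor to sketch variables.
    for targets in itertools.product(sketch_vars, repeat=len(donor_vars)):
        mapping = dict(zip(donor_vars, targets))
        exprs = [rename(c, mapping) for c in candidates]
        # Fill holes via enumerative search over the candidate expressions.
        for filling in itertools.product(exprs, repeat=len(holes)):
            source = sketch.format(**dict(zip(holes, filling)))
            env = {}
            try:
                exec(source, env)
                # Test to filter incorrect programs.
                if all(env["f"](*args) == want for args, want in tests):
                    return source
            except Exception:
                continue
    return None


# Expressions assumed to come from similar programs in the database.
candidates = ["x > y", "x", "y", "x + y"]
tests = [((1, 2), 2), ((3, 2), 3), ((2, 2), 2)]  # f should behave like max
result = complete(SKETCH, ["h0", "h1"], candidates, ["x", "y"], ["a", "b"], tests)
```

With c candidate expressions and h holes, the inner loop alone visits up to c^h fillings per variable mapping, which is why pruning matters.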
## Slides 22-24: Heuristics
- Types: ignore incompatible types
- Context: ignore expressions with no common parents
- Huge search space reduction
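
One way to read the two heuristics is as a filter applied to the candidate list before enumeration. The sketch below assumes each candidate carries a static type and the set of syntactic constructs that enclosed it in its original program, and it interprets "common parents" as sharing at least one enclosing construct with the hole. The class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    expr: str
    type_name: str      # static type of the expression
    parents: frozenset  # kinds of constructs enclosing it in the donor program


@dataclass(frozen=True)
class Hole:
    type_name: str      # type expected at the hole
    parents: frozenset  # kinds of constructs enclosing the hole


def prune(hole, candidates):
    """Keep candidates whose type matches the hole and that share
    at least one enclosing construct with it."""
    return [c for c in candidates
            if c.type_name == hole.type_name and c.parents & hole.parents]


# Hypothetical candidates harvested from database programs.
candidates = [
    Candidate("x > y", "bool", frozenset({"if"})),
    Candidate("x + y", "int", frozenset({"return"})),
    Candidate("i < n", "bool", frozenset({"for"})),
    Candidate("x", "int", frozenset({"return", "if"})),
]
hole = Hole("bool", frozenset({"if"}))

kept = [c.expr for c in prune(hole, candidates)]
```

Because the number of fillings grows as the product of the per-hole candidate counts, cutting each hole's list from four candidates to one shrinks a two-hole search from 16 fillings to 1.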
## Slides 25-26: Initial experiment and future work
- LCS: less than 10 seconds
- Future work: more benchmarks, closure, search PDB using types
## Slides 27-28: Program repair
- Use PDB to find most similar correct program
- Bug localization → holes → completion
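
The reduction from repair to completion can be sketched as follows. This is a deliberately naive stand-in: bug localization is modelled by suspecting each expression slot in turn, and the donor expressions are assumed to come from the most similar correct program in PDB. The template, slot contents, and function names are all hypothetical.

```python
def repair(template, slots, donor_exprs, tests, entry="f"):
    """Try single-slot repairs; return the first source that passes all tests."""
    def passes(filling):
        env = {}
        try:
            exec(template.format(*filling), env)
            return all(env[entry](*args) == want for args, want in tests)
        except Exception:
            return False

    if passes(slots):
        return template.format(*slots)  # nothing to repair
    for i in range(len(slots)):         # localization: turn slot i into a hole
        for expr in donor_exprs:        # completion: fill the hole from the donor
            trial = slots[:i] + [expr] + slots[i + 1:]
            if passes(trial):
                return template.format(*trial)
    return None


# Buggy max: the comparison is inverted.
TEMPLATE = "def f(a, b):\n    if {0}:\n        return {1}\n    return b\n"
buggy_slots = ["a < b", "a"]
donor_exprs = ["a > b", "a", "b"]  # assumed to come from a similar correct program
tests = [((1, 2), 2), ((3, 2), 3), ((2, 2), 2)]

fixed = repair(TEMPLATE, buggy_slots, donor_exprs, tests)
```

A real localizer would rank suspicious locations instead of trying every slot, but the shape of the loop is the same: create a hole, then complete it.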
## Slide 29: Conclusion
- Program Completion: no more copy and paste, focus on important tasks