---
category: academic
type: academic
person: Yanxin Lu
date: 2016-01
source: codecomplete_spring2016.pptx
---

# COMP 600 Spring 2016: Data Driven Program Completion

Yanxin Lu, Swarat Chaudhuri, Christopher Jermaine, Drew Dehaas, Vineeth Kashyap, and David Melski. Presented by Yanxin Lu. 29 slides.

## Slide 2: Title
Data Driven Program Completion

## Slides 3–4: Programming is difficult
- Example: writing a Longest Common Subsequence (LCS) program

## Slide 5: Program Synthesis
- Automatically generating programs from a specification
- Specifications: logical formulas, unit tests, natural language

## Slide 6: Related work
- Deductive and solver-aided synthesis
- Constraint-based synthesis: syntax-guided synthesis, Sketching, template-based synthesis
- Inductive synthesis: input-output examples

## Slide 7: Big data
- GitHub, SourceForge, Google Code, StackOverflow

## Slides 8–9: Summary
- Data-driven program completion, the corpus and the Pliny database (PDB), the synthesis algorithm, an initial experiment, and future work

## Slides 10–11: Program completion
- Inputs: a sketch (an incomplete program with holes), programs in the PDB, and test cases
- LCS example: LCS("123", "123") = "123", LCS("123", "234") = "23"
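The completed program must reproduce the two test cases above. For reference, a minimal Python sketch of the textbook dynamic-programming LCS that satisfies them; the function name `lcs` is an assumption, and the program used in the talk may be written differently (the corpus is C/C++/Java).

```python
def lcs(x: str, y: str) -> str:
    """Return one longest common subsequence of x and y via dynamic programming."""
    n, m = len(x), len(y)
    # table[i][j] holds the LCS length of the suffixes x[i:] and y[j:]
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if x[i] == y[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    # Walk the table forward to reconstruct the subsequence itself
    result = []
    i, j = 0, 0
    while i < n and j < m:
        if x[i] == y[j]:
            result.append(x[i])
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1
        else:
            j += 1
    return "".join(result)


# The two test cases from the slide
assert lcs("123", "123") == "123"
assert lcs("123", "234") == "23"
```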

## Slides 12–13: Workflow
- Synthesis ↔ PDB
- Incomplete program → query → programs → completed program

## Slide 14: PDB
- Stores thousands of programs with extracted features and similarity metrics
- Fast top-k similarity queries: 1–2 orders of magnitude faster than NoSQL systems
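The slides do not say which similarity metric the PDB uses or how it answers top-k queries quickly. As a baseline only, here is a brute-force top-k query, assuming each program is a sparse term-weight vector compared by cosine similarity; `top_k`, `cosine`, and the toy database are hypothetical, and a linear scan like this says nothing about how the PDB achieves its speedup.

```python
import heapq
import math

# Hypothetical representation: each program is a sparse vector {term: weight}
Vector = dict[str, float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: Vector, db: dict[str, Vector], k: int) -> list[tuple[str, float]]:
    """Return the k programs most similar to the query, best first (linear scan)."""
    scored = ((name, cosine(query, vec)) for name, vec in db.items())
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


db = {
    "lcs_dp.c": {"lc": 0.8, "index": 0.3, "result": 0.3},
    "edit_distance.c": {"index": 0.5, "cost": 0.8},
    "quicksort.c": {"pivot": 0.9, "swap": 0.4},
}
query = {"lc": 0.7, "index": 0.4}
print(top_k(query, db, k=2))  # lcs_dp.c ranks first, then edit_distance.c
```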

## Slide 15: Corpus
- 100,000+ projects, C/C++/Java
- 50 GB of source code; 480+ C projects

## Slide 16: Feature Extraction
- Identifier names: X, s, n, j, Y, index, lcs
- TF-IDF term weights: "charact": 0.158, "reduc": 0.158, "result": 0.316, "lc": 0.791, "index": 0.316
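The terms above look like Porter-stemmed identifier pieces ("lcs" → "lc", "reduce" → "reduc"), but the exact tokenizer, stemmer, and TF-IDF variant are not given. A minimal sketch under assumed choices: split identifiers on underscores and camelCase, skip stemming, weight by raw term frequency times a smoothed IDF, and L2-normalize. It will not reproduce the numbers above; `split_identifier` and `tfidf` are hypothetical names.

```python
import math
import re
from collections import Counter


def split_identifier(name: str) -> list[str]:
    """Split an identifier such as 'lcsResult' or 'max_index' into lowercase words."""
    words = []
    for part in re.split(r"[_\W]+", name):
        # Break camelCase / PascalCase boundaries and separate digit runs
        words += re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", part)
    return [w.lower() for w in words]


def tfidf(docs: dict[str, list[str]]) -> dict[str, dict[str, float]]:
    """Map each program (a list of identifier names) to an L2-normalized TF-IDF vector."""
    counts = {doc: Counter(t for name in names for t in split_identifier(name))
              for doc, names in docs.items()}
    n_docs = len(docs)
    # Document frequency: the number of programs each term occurs in
    df = Counter(t for c in counts.values() for t in c)
    vectors = {}
    for doc, c in counts.items():
        # Smoothed IDF, 1 + log((1 + N) / (1 + df)), keeps ubiquitous terms above zero
        weights = {t: tf * (1.0 + math.log((1 + n_docs) / (1 + df[t]))) for t, tf in c.items()}
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        vectors[doc] = {t: w / norm for t, w in weights.items()}
    return vectors


programs = {
    "lcs.c": ["X", "Y", "n", "j", "index", "lcs", "lcsResult"],
    "sort.c": ["arr", "n", "j", "pivot_index"],
}
vectors = tfidf(programs)
print(max(vectors["lcs.c"], key=vectors["lcs.c"].get))  # lcs
```

Here "lcs" wins because it occurs twice in `lcs.c` and nowhere in `sort.c`, while "n", "j", and "index" are shared by both programs and get the minimum IDF.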

## Slides 18–21: Synthesis Algorithm
- Search the PDB for similar programs
- Fill holes by enumerative search
- Merge undefined variables
- Run the test cases to filter out incorrect programs
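A minimal sketch of the fill-and-test loop, under simplifying assumptions: the sketch is Python source with a single hole `??`, the candidate expressions are hard-coded (in the described approach they would come from similar programs returned by the PDB), and the variable-merging step is skipped. The hole sits in the match branch of an LCS-length recurrence, and the two slide test cases are restated as lengths. `complete`, `passes`, and `HOLE` are hypothetical.

```python
HOLE = "??"

# Hypothetical sketch: LCS length with one hole in the "characters match" branch
SKETCH = """
def lcs_len(x, y):
    n, m = len(x), len(y)
    t = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if x[i] == y[j]:
                t[i][j] = ??
            else:
                t[i][j] = max(t[i + 1][j], t[i][j + 1])
    return t[0][0]
"""

# Assumed candidate expressions (stand-ins for code harvested from the PDB)
CANDIDATES = [
    "0",
    "t[i][j + 1]",
    "t[i + 1][j + 1]",
    "max(t[i + 1][j], t[i][j + 1])",
    "t[i + 1][j + 1] + 1",
]

# Test cases from the program-completion slide, restated as LCS lengths
TESTS = [(("123", "123"), 3), (("123", "234"), 2)]


def passes(source: str, tests) -> bool:
    """Compile a completed program and check it against every test case."""
    namespace: dict = {}
    try:
        exec(source, namespace)
        return all(namespace["lcs_len"](*args) == want for args, want in tests)
    except Exception:
        return False  # candidates that fail to compile or crash are rejected


def complete(sketch: str, candidates, tests) -> list[str]:
    """Enumerate hole fillers and keep those whose completed program passes all tests."""
    return [c for c in candidates if passes(sketch.replace(HOLE, c), tests)]


print(complete(SKETCH, CANDIDATES, TESTS))  # ['t[i + 1][j + 1] + 1']
```

Tests filter candidates but do not prove correctness: the wrong filler `t[i + 1][j] + 1` also passes both slide test cases, and only an extra case such as `lcs_len("aa", "a") == 1` rules it out. Running `exec` on untrusted code is unsafe outside a sandbox.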

## Slides 22–24: Heuristics
- Types: ignore incompatible types
- Context: ignore expressions with no common parents
- Together, these give a huge reduction in the search space
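A minimal sketch of the two pruning rules, assuming each harvested expression is tagged with its static type and the kinds of AST nodes it appeared under, and reading "common parents" as a non-empty overlap with the hole's enclosing node kinds (the slides do not define it precisely). The `Expr`/`Hole` model and `prune` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Expr:
    """A candidate expression harvested from a similar program (hypothetical model)."""
    code: str
    static_type: str          # e.g. "int"
    parents: frozenset[str]   # kinds of AST nodes the expression appeared under


@dataclass(frozen=True)
class Hole:
    expected_type: str
    parents: frozenset[str]   # kinds of AST nodes enclosing the hole


def prune(hole: Hole, pool: list[Expr]) -> list[Expr]:
    """Drop candidates ruled out by the type and context heuristics."""
    return [
        e for e in pool
        if e.static_type == hole.expected_type   # type heuristic
        and e.parents & hole.parents             # context heuristic: some parent in common
    ]


pool = [
    Expr("t[i + 1][j + 1] + 1", "int", frozenset({"assign", "if"})),
    Expr("x[i] == y[j]", "bool", frozenset({"if"})),
    Expr("len(x)", "int", frozenset({"call"})),
    Expr('"".join(result)', "str", frozenset({"return"})),
]
hole = Hole("int", frozenset({"assign", "if", "for"}))
print([e.code for e in prune(hole, pool)])  # ['t[i + 1][j + 1] + 1']
```

Pruning runs before enumeration, so if holes are filled independently the savings multiply: the search space is the product of the per-hole pool sizes.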

## Slides 25–26: Initial experiment and future work
- LCS: completed in less than 10 seconds
- Future work: more benchmarks, closure, searching the PDB using types

## Slides 27–28: Program repair
- Use the PDB to find the most similar correct program
- Bug localization → holes → completion
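A minimal sketch of the repair loop, assuming bug localization has already flagged one line (the localization technique is not described) and that candidate replacement lines come from the most similar correct program with its variables already renamed to match (the merging step is not shown). The flagged line becomes a hole that is filled and tested as in completion. `repair`, `BUGGY`, and the candidate lines are hypothetical.

```python
# Hypothetical buggy program: line 3 (0-based) compares in the wrong direction
BUGGY = [
    "def array_max(xs):",
    "    best = xs[0]",
    "    for v in xs:",
    "        if v < best:",
    "            best = v",
    "    return best",
]

# Assumed candidate lines taken from a similar correct program
CANDIDATE_LINES = ["best = v", "if v <= best:", "if v > best:"]

TESTS = [(([3, 1, 2],), 3), (([-5, -2, -9],), -2)]


def repair(lines, suspect, candidate_lines, tests, fn_name):
    """Turn the suspicious line into a hole; return the first filled program passing all tests."""
    original = lines[suspect]
    indent = original[: len(original) - len(original.lstrip())]  # keep the hole's indentation
    for filler in candidate_lines:
        source = "\n".join(lines[:suspect] + [indent + filler] + lines[suspect + 1:])
        namespace: dict = {}
        try:
            exec(source, namespace)
            if all(namespace[fn_name](*args) == want for args, want in tests):
                return source
        except Exception:
            continue  # syntax/indentation errors or runtime crashes reject the filler
    return None


print(repair(BUGGY, 3, CANDIDATE_LINES, TESTS, "array_max"))
```

The first filler fails to compile (it leaves an unexpected indent), the second still returns the minimum, and the third yields a program that passes both tests.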

## Slide 29: Conclusion
- Program completion: no more copy and paste; programmers can focus on the important tasks