obsidian-yanxin/documents/academic/presentations/comp600_fall2015.md

---
category: academic
type: academic
person: Yanxin Lu
date: 2015-08
source: comp600_fall2015.pptx
---
# COMP 600 Fall 2015: Data Driven Program Completion
Yanxin Lu, Swarat Chaudhuri, Christopher Jermaine, Vijayaraghavan Murali. Presented by Yanxin Lu. 26 slides.
## Slide 2: Title
Data Driven Program Completion
## Slide 3: Programming is difficult
## Slide 4: Program Synthesis
- Automatically generating programs
- Specification: logical formulas, unit tests, natural language
- Hard problem!
## Slide 5: Related work
- Deductive and solver-aided synthesis (IEEE Trans. Software Eng. 18(8), PLDI 2014)
- Constraint-based synthesis: syntax-guided synthesis (FMCAD 2013), Sketching (ASPLOS 2006), Template (STTT 15(5-6))
- Inductive synthesis: input-output examples (POPL 2011, PLDI 2015)
## Slide 6: Big data
- GitHub, SourceForge, Google Code, StackOverflow
## Slide 7: Summary
- Data-driven program completion, demo, corpus and Pliny database, synthesis algorithm, initial experiment and future work, program repair
## Slide 8: Program Completion
- A subset of C
## Slide 9: Demo
## Slides 10-11: Architecture
- Synthesis ↔ PDB (Pliny Database)
- Incomplete program → query → top-k similar programs → completed program
## Slide 12: PDB
- Thousands of programs with features
- Similarity metrics, fast top-k query
- 1-2 orders of magnitude faster than NoSQL database systems (Chris Jermaine)
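
A minimal sketch of what a top-k similarity query over program features could look like. The slides do not say which similarity metric PDB uses, so Jaccard similarity over feature sets is an assumption here, and `jaccard`, `top_k`, and the toy corpus are hypothetical names used only for illustration.

```python
import heapq


def jaccard(a, b):
    """Jaccard similarity between two feature sets (1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def top_k(query_features, corpus, k):
    """Return the names of the k corpus programs most similar to the query."""
    scored = ((jaccard(query_features, feats), name) for name, feats in corpus.items())
    return [name for _, name in heapq.nlargest(k, scored)]


# Toy corpus: program name -> set of extracted features (made up for this sketch).
corpus = {
    "binary_search": {"loop", "cond", "int:/", "int:+"},
    "linear_search": {"loop", "cond", "int:+"},
    "hello_world": {"call"},
}
query = {"loop", "cond", "int:/"}
print(top_k(query, corpus, 2))
```

This linear scan only illustrates the query semantics; it says nothing about how PDB achieves its reported speed.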
## Slide 13: Corpus
- More than 100,000 projects from GitHub, SourceForge, Google Code
- C, C++, Java
- Preprocessing: 50 GB of source code, 480+ projects, C
## Slide 14: Feature Extraction
- Lightweight program analysis capturing characteristics
- Abstract Structural Skeleton: (seq (loop (seq (cond ()))))
- Coupling: ('int', 'c:unary-'), ('int', 'c:/'), ('int*', 'c:+'), etc.
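
One way to read the abstract structural skeleton is as the program's AST with everything except control structure erased. The sketch below assumes a toy AST of `(kind, children)` tuples and assumes that only `seq`, `loop`, and `cond` nodes are kept; both are guesses inferred from the single example above, not the actual extractor.

```python
# Node kinds that survive in the skeleton (an assumption based on the slide's example).
CONTROL = {"seq", "loop", "cond"}


def skeleton(node):
    """Render a (kind, children) AST as a skeleton string, dropping non-control nodes."""
    kind, children = node
    kept = [skeleton(child) for child in children if child[0] in CONTROL]
    body = " ".join(kept) if kept else "()"
    return f"({kind} {body})"


# Toy AST: an assignment, then a loop whose body has a conditional and an assignment.
program = (
    "seq",
    [
        ("assign", []),
        ("loop", [("seq", [("cond", [("assign", [])]), ("assign", [])])]),
    ],
)
print(skeleton(program))
```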
## Slides 16-19: Synthesis Algorithm
- Finding similar programs from PDB
- Filling in the holes via search
- Variable renaming for undefined variables
- Unit testing to filter incorrect programs
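
The last three steps can be sketched on a toy example: take expressions from similar programs, rename their variables onto the variables in scope at the hole, and keep the first completion that passes the unit tests. Everything here is hypothetical (the `{hole}` template, `DONOR_EXPRS`, and the brute-force enumeration of renamings), and Python stands in for the C subset targeted by the talk.

```python
import itertools
import re

IDENT = r"[A-Za-z_]\w*"

# Incomplete program with a single hole (illustrative only).
TEMPLATE = "def midpoint(lo, hi):\n    return {hole}\n"
IN_SCOPE = ["lo", "hi"]

# Expressions taken from hypothetical similar programs; their variable names differ.
DONOR_EXPRS = ["high - low", "(low + high) // 2"]

# Unit tests as (arguments, expected result) pairs.
TESTS = [((0, 4), 2), ((2, 6), 4)]


def renamings(expr, in_scope):
    """Yield expr with its identifiers mapped onto in-scope variables in every way."""
    names = sorted(set(re.findall(IDENT, expr)))
    for targets in itertools.permutations(in_scope, len(names)):
        mapping = dict(zip(names, targets))
        yield re.sub(IDENT, lambda m: mapping[m.group(0)], expr)


def passes_tests(func):
    """True when func returns the expected result on every unit test."""
    return all(func(*args) == expected for args, expected in TESTS)


def complete(template, donor_exprs, in_scope):
    """Return the first renamed donor expression whose completion passes all tests."""
    for expr in donor_exprs:
        for candidate in renamings(expr, in_scope):
            namespace = {}
            try:
                exec(template.format(hole=candidate), namespace)
                if passes_tests(namespace["midpoint"]):
                    return candidate
            except Exception:
                continue  # a candidate that fails to compile or crashes is rejected
    return None


print(complete(TEMPLATE, DONOR_EXPRS, IN_SCOPE))
```

Test-based filtering of this kind only works if candidates terminate; a real implementation would need a timeout around each test run.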
## Slides 20-22: Heuristics
- Types: ignore expressions with incompatible types
- Context: ignore expressions with no common parents
- Huge search space reduction
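
Both heuristics amount to a filter applied before the search. The sketch below assumes each candidate expression carries its type and the set of AST node kinds that enclosed it in its donor program; the `Candidate` record and `prune` function are hypothetical, and "common parents" is interpreted here as a non-empty intersection of those sets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    text: str
    type: str
    parents: frozenset  # kinds of AST nodes enclosing the expression in its donor


def prune(candidates, hole_type, hole_parents):
    """Keep candidates whose type matches the hole and that share a parent kind with it."""
    return [
        c for c in candidates
        if c.type == hole_type and c.parents & hole_parents
    ]


candidates = [
    Candidate("(lo + hi) / 2", "int", frozenset({"loop", "assign"})),
    Candidate("&buf[0]", "int*", frozenset({"loop", "assign"})),  # wrong type
    Candidate("argc - 1", "int", frozenset({"call"})),            # no common parent
]
survivors = prune(candidates, "int", frozenset({"loop", "assign"}))
print([c.text for c in survivors])
```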
## Slide 23: Initial experiment and future work
- Binary search: less than 10 seconds
- Future work: more benchmark problems, performance improvements
## Slides 24-25: Program Repair
- Use PDB to find most similar correct program
- Bug localization → program completion problem
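
The reduction from repair to completion can be illustrated as follows: once the suspicious statements are located, they are replaced by holes and the result is handed to the completion procedure. The slides do not describe the localization method, so the line-level diff against the most similar correct program used below is purely an assumption, as are the `localize` function and the `??` hole marker.

```python
import difflib

HOLE = "??"  # hypothetical hole marker


def localize(buggy_lines, reference_lines):
    """Replace lines that differ from the reference program with holes."""
    matcher = difflib.SequenceMatcher(a=buggy_lines, b=reference_lines)
    result = []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag == "equal":
            result.extend(buggy_lines[i1:i2])
        elif tag in ("replace", "delete"):
            result.extend([HOLE] * (i2 - i1))  # suspicious lines become holes
        else:  # "insert": the reference has code that the buggy program lacks
            result.append(HOLE)
    return result


buggy = ["lo = 0", "mid = lo + hi", "return mid"]
reference = ["lo = 0", "mid = (lo + hi) // 2", "return mid"]
print(localize(buggy, reference))
```

A textual diff only works when the two programs already share variable names and layout; comparing extracted features or ASTs would be more robust.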
## Slide 26: Conclusion
- Program completion + program repair using big data + programming languages