Archive 10 academic presentations from ~/Downloads/slides/ (2014-2018)

- PhD defense slides (defense.key, Nov 2018) → phd_defense/
- Master's defense on MOOC peer evaluation (Dec 2014)
- ENGI 600 data-driven program repair (Apr 2015)
- COMP 600 data-driven program completion (Fall 2015, Spring 2016)
- COMP 600 Program Splicing presentation + feedback + response (Spring 2018)
- Program Splicing slides in .key and .pdf formats (Spring 2018)

Each file has a .md transcription with academic frontmatter.
Skipped www2015.pdf (duplicate of existing www15.zip) and syncthing conflict copy.
Yanxin Lu
2026-04-06 12:00:27 -07:00
parent 180c615170
commit b85169f4e7
20 changed files with 602 additions and 0 deletions

@@ -0,0 +1,74 @@
---
category: academic
type: academic
person: Yanxin Lu
date: 2016-01
source: codecomplete_spring2016.pptx
---
# COMP 600 Spring 2016: Data Driven Program Completion
Yanxin Lu, Swarat Chaudhuri, Christopher Jermaine, Drew Dehaas, Vineeth Kashyap, and David Melski. Presented by Yanxin Lu. 29 slides.
## Slide 2: Title
Data Driven Program Completion
## Slides 3-4: Programming is difficult
- Longest Common Subsequence example
## Slide 5: Program Synthesis
- Automatically generating programs
- Specification: logic formula, unit testing, natural language
## Slide 6: Related work
- Deductive and solver-aided synthesis
- Constraint-based synthesis: syntax-guided synthesis, Sketching, Template
- Inductive synthesis: input-output examples
## Slide 7: Big data
- GitHub, SourceForge, Google Code, StackOverflow
## Slides 8-9: Summary
- Data-driven program completion, corpus and Pliny database, synthesis algorithm, initial experiment and future work
## Slides 10-11: Program completion
- Sketch + programs in DB + test cases
- LCS example: LCS("123", "123") = "123", LCS("123", "234") = "23"
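
The two LCS test cases above can be written as an executable check. The following is a minimal sketch, assuming a standard dynamic-programming LCS as the reference program; the names `lcs`, `TEST_CASES`, and `passes_tests` are illustrative and do not come from the slides.

```python
def lcs(a: str, b: str) -> str:
    """Return one longest common subsequence of a and b."""
    m, n = len(a), len(b)
    # table[i][j] holds the LCS length of a[:i] and b[:j].
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    # Walk back from the bottom-right corner to recover the subsequence.
    out = []
    i, j = m, n
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))


# The two input-output examples from the slide.
TEST_CASES = [(("123", "123"), "123"), (("123", "234"), "23")]


def passes_tests(candidate) -> bool:
    """True if the candidate program agrees with every test case."""
    return all(candidate(*args) == expected for args, expected in TEST_CASES)
```

A completed program would be accepted only if `passes_tests` returns `True` for it.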
## Slides 12-13: Workflow
- Synthesis ↔ PDB
- Incomplete program → query → programs → completed program
## Slide 14: PDB
- Thousands of programs with features, similarity metrics
- Fast top-k query: 1-2 orders of magnitude faster than NoSQL systems
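
To illustrate what a top-k similarity query returns, here is a minimal sketch, assuming programs are stored as sparse feature vectors and compared by cosine similarity. It is a plain linear scan, not the PDB's indexing scheme, and the file names and weights are made up for the example.

```python
import heapq
import math


def cosine(u: dict, v: dict) -> float:
    """Cosine similarity between two sparse feature vectors."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def top_k(query: dict, database: dict, k: int) -> list:
    """Return the names of the k programs most similar to the query."""
    return heapq.nlargest(k, database, key=lambda name: cosine(query, database[name]))


# Hypothetical database: program name -> feature vector.
database = {
    "lcs.c": {"lc": 0.8, "index": 0.3},
    "sort.c": {"sort": 0.9, "index": 0.2},
    "edit.c": {"lc": 0.2, "edit": 0.9},
}
nearest = top_k({"lc": 1.0}, database, 2)
```

Here `nearest` is `["lcs.c", "edit.c"]`, because those are the only two entries that share a feature with the query.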
## Slide 15: Corpus
- 100,000+ projects, C/C++/Java
- 50GB source code, 480+ C projects
## Slide 16: Feature Extraction
- Names: X, s, n, j, Y, index, lcs
- TF/IDF: "charact": 0.158, "reduc": 0.158, "result": 0.316, "lc": 0.791, "index": 0.316
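
The TF/IDF weighting of name tokens can be sketched as follows. This is an illustration under assumptions: it uses a smoothed IDF and L2 normalization, which the slides do not specify, so it is not expected to reproduce the exact weights listed above. The function name `tf_idf` and the toy corpus are hypothetical.

```python
import math
from collections import Counter


def tf_idf(doc_tokens: list, corpus: list) -> dict:
    """Weight each token by term frequency times inverse document frequency."""
    tf = Counter(doc_tokens)
    n_docs = len(corpus)
    weights = {}
    for token, count in tf.items():
        # Document frequency: how many corpus documents contain the token.
        df = sum(1 for doc in corpus if token in doc)
        idf = math.log((1 + n_docs) / (1 + df)) + 1  # smoothed IDF (assumption)
        weights[token] = count * idf
    # Scale to unit length so vectors of different programs are comparable.
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {t: w / norm for t, w in weights.items()} if norm else weights


# Hypothetical corpus of stemmed tokens, one list per program.
corpus = [["lc", "index"], ["index", "result"], ["result", "reduc"]]
weights = tf_idf(["lc", "lc", "index"], corpus)
```

Tokens that are frequent in one program but rare across the corpus, such as `"lc"` here, receive the largest weights.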
## Slides 18-21: Synthesis Algorithm
- Search PDB for similar programs
- Fill holes via enumerative search
- Merge undefined variables
- Test to filter incorrect programs
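
The fill-and-test loop in these steps can be sketched in a few lines. This is a toy illustration, not the presented algorithm: the sketch is a Python source string with `??1`-style hole markers, the candidate expressions stand in for code retrieved from the PDB, the variable-merging step is omitted, and every name (`complete`, `sketch`, `candidates`, `tests`) is hypothetical.

```python
import itertools


def complete(sketch: str, candidates: dict, tests: list):
    """Enumerate hole fillings and return the first completion passing all tests."""
    holes = sorted(candidates)
    for choice in itertools.product(*(candidates[h] for h in holes)):
        source = sketch
        for hole, expr in zip(holes, choice):
            source = source.replace(hole, expr)
        namespace = {}
        try:
            exec(source, namespace)  # define the candidate function f
            fn = namespace["f"]
            if all(fn(*args) == expected for args, expected in tests):
                return source
        except Exception:
            continue  # ill-formed or crashing candidates are discarded
    return None


# Incomplete program with two holes.
sketch = "def f(a, b):\n    return ??1 if ??2 else b\n"
# Candidate expressions per hole (stand-ins for expressions found in the database).
candidates = {"??1": ["b", "a"], "??2": ["a < b", "a > b"]}
# Test cases describing the intended behavior: the maximum of two numbers.
tests = [((1, 2), 2), ((5, 3), 5)]
completed = complete(sketch, candidates, tests)
```

Three of the four fillings fail a test; the search returns the one that computes the maximum, `return a if a > b else b`.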
## Slides 22-24: Heuristics
- Types: ignore incompatible types
- Context: ignore expressions with no common parents
- Huge search space reduction
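
The effect of the type heuristic on the search space can be shown with a small count. This is a sketch under assumptions: types are plain strings, a candidate fits a hole only when the types are equal, and the context heuristic is not modeled. `Expr`, `prune_by_type`, and `search_space` are illustrative names.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Expr:
    text: str  # source text of the candidate expression
    type: str  # its type, as a plain string (hypothetical representation)


def prune_by_type(hole_type: str, candidates: list) -> list:
    """Keep only candidates whose type matches the hole's expected type."""
    return [e for e in candidates if e.type == hole_type]


def search_space(holes: dict, candidates: list, prune: bool) -> int:
    """Number of complete fillings: the product of per-hole candidate counts."""
    return math.prod(
        len(prune_by_type(t, candidates)) if prune else len(candidates)
        for t in holes.values()
    )


candidates = [
    Expr("i", "int"), Expr("j", "int"), Expr("n", "int"),
    Expr("s", "char*"), Expr("X[i]", "char"), Expr("Y[j]", "char"),
]
holes = {"h1": "int", "h2": "char", "h3": "int"}  # hole name -> expected type
unpruned = search_space(holes, candidates, prune=False)
pruned = search_space(holes, candidates, prune=True)
```

With three holes and six candidates, the unpruned space has 216 fillings and the type-pruned space has 18.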
## Slides 25-26: Initial experiment and future work
- LCS: less than 10 seconds
- Future work: more benchmarks, closure, search PDB using types
## Slides 27-28: Program repair
- Use PDB to find most similar correct program
- Bug localization → holes → completion
## Slide 29: Conclusion
- Program Completion: no more copy and paste, focus on important tasks