
| category | type | person | date | source |
|---|---|---|---|---|
| academic | academic | Yanxin Lu | 2016-01 | codecomplete_spring2016.pptx |

COMP 600 Spring 2016: Data Driven Program Completion

Yanxin Lu, Swarat Chaudhuri, Christopher Jermaine, Drew Dehaas, Vineeth Kashyap, and David Melski. Presented by Yanxin Lu. 29 slides.

Slide 2: Title

Data Driven Program Completion

Slides 3-4: Programming is difficult

  • Longest Common Subsequence example

Slide 5: Program Synthesis

  • Automatically generating programs
  • Specification: logic formula, unit testing, natural language
  • Deductive and solver-aided synthesis
  • Constraint-based synthesis: syntax-guided synthesis, Sketching, Template
  • Inductive synthesis: input-output examples

Slide 7: Big data

  • GitHub, SourceForge, Google Code, StackOverflow

Slides 8-9: Summary

  • Data-driven program completion, corpus and Pliny database, synthesis algorithm, initial experiment and future work

Slides 10-11: Program completion

  • Sketch + programs in DB + test cases
  • LCS example: LCS("123", "123") = "123", LCS("123", "234") = "23"
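
For reference, the two test cases above can be checked against a standard dynamic-programming LCS. This is a minimal sketch, not code from the slides; the function name `lcs` and the tie-breaking rule during backtracking are assumptions made for illustration.

```python
def lcs(a: str, b: str) -> str:
    """Return one longest common subsequence of a and b."""
    m, n = len(a), len(b)
    # table[i][j] holds the LCS length of the prefixes a[:i] and b[:j].
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])

    # Walk back from the bottom-right corner to recover the characters.
    out = []
    i, j = m, n
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))


# The test cases listed on the slide.
TEST_CASES = [(("123", "123"), "123"), (("123", "234"), "23")]
```

In the completion setting, a list like `TEST_CASES` plays the role of the specification: a candidate completion is accepted only if it reproduces every expected output.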

Slides 12-13: Workflow

  • Synthesis ↔ PDB
  • Incomplete program → query → programs → completed program

Slide 14: PDB

  • Thousands of programs with features, similarity metrics
  • Fast top-k query: 1-2 orders of magnitude faster than NoSQL systems
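
To make the top-k query concrete, here is a minimal sketch that ranks stored programs by cosine similarity to a query feature vector. It is a plain linear scan, not PDB's actual index or query engine; the names `cosine` and `top_k`, the sparse-dict feature format, and the sample file names are all assumptions.

```python
import heapq
import math


def cosine(u: dict, v: dict) -> float:
    """Cosine similarity between two sparse feature vectors (term -> weight)."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def top_k(query: dict, database: dict, k: int) -> list:
    """Return the names of the k stored programs most similar to the query."""
    scored = ((cosine(query, features), name) for name, features in database.items())
    return [name for _, name in heapq.nlargest(k, scored)]


# Hypothetical database: program name -> feature vector.
DATABASE = {
    "lcs.c": {"lc": 0.8, "index": 0.3},
    "sort.c": {"sort": 0.9, "index": 0.3},
    "io.c": {"read": 1.0},
}
QUERY = {"lc": 1.0, "index": 0.5}
```

Calling `top_k(QUERY, DATABASE, 2)` ranks `lcs.c` first and `sort.c` second, because `io.c` shares no features with the query. A real system would replace the scan with an index to reach the speedups claimed above.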

Slide 15: Corpus

  • 100,000+ projects, C/C++/Java
  • 50 GB source code, 480+ C projects

Slide 16: Feature Extraction

  • Names: X, s, n, j, Y, index, lcs
  • TF/IDF: "charact": 0.158, "reduc": 0.158, "result": 0.316, "lc": 0.791, "index": 0.316
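
A TF/IDF vector of this kind could be computed roughly as follows. This is a sketch under assumptions: the exact weighting and normalization used for the slide's numbers are not stated, so the smoothed IDF and L2 normalization below are one common choice, and the tokens on the slide look stemmed ("charact", "reduc", "lc") while this sketch skips stemming.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list:
    """Lowercase the text and split it into alphabetic tokens (no stemming)."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs: list) -> list:
    """Return one L2-normalized TF/IDF vector (term -> weight) per document."""
    token_lists = [tokenize(doc) for doc in docs]
    n_docs = len(token_lists)

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in token_lists:
        doc_freq.update(set(tokens))

    vectors = []
    for tokens in token_lists:
        weights = {}
        for term, count in Counter(tokens).items():
            # Smoothed IDF (an assumption): rare terms get larger weights.
            idf = math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0
            weights[term] = count * idf
        norm = math.sqrt(sum(w * w for w in weights.values()))
        vectors.append({t: w / norm for t, w in weights.items()} if norm else {})
    return vectors
```

For example, `tfidf_vectors(["lcs index result lcs", "sort index swap"])` weights `lcs` above `index` in the first document, since `index` appears in both documents and `lcs` appears twice in only one.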

Slides 18-21: Synthesis Algorithm

  • Search PDB for similar programs
  • Fill holes via enumerative search
  • Merge undefined variables
  • Test to filter incorrect programs
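
The last three steps above can be sketched on a toy problem. This is an illustrative simplification, not the algorithm from the talk: the sketch is a Python lambda with `??` holes, the candidate expressions stand in for code retrieved from PDB, "merging" is modeled as renaming database variables to sketch variables, and every name here (`complete`, `rename`, `passes`, the sample inputs) is hypothetical.

```python
import itertools
import re

HOLE = "??"


def rename(expr: str, mapping: dict) -> str:
    """Rename whole identifiers in expr according to mapping."""
    return re.sub(r"[A-Za-z_]\w*", lambda m: mapping.get(m.group(0), m.group(0)), expr)


def passes(program: str, tests: list) -> bool:
    """Run a completed program against the test cases; any error counts as failure."""
    try:
        func = eval(program)  # acceptable for a toy example, unsafe for untrusted code
        return all(func(*args) == expected for args, expected in tests)
    except Exception:
        return False


def complete(sketch, db_exprs, db_vars, sketch_vars, tests):
    """Enumerate variable merges and hole fillings; return the first program that passes."""
    n_holes = sketch.count(HOLE)
    # Merge step: try every way of mapping database variables onto sketch variables.
    for targets in itertools.product(sketch_vars, repeat=len(db_vars)):
        mapping = dict(zip(db_vars, targets))
        candidates = [rename(expr, mapping) for expr in db_exprs]
        # Enumerative search: try every assignment of candidates to holes.
        for filling in itertools.product(candidates, repeat=n_holes):
            program = sketch
            for expr in filling:
                program = program.replace(HOLE, expr, 1)
            # Testing step: discard programs that fail any test case.
            if passes(program, tests):
                return program
    return None


SKETCH = "lambda x, y: ?? if ?? else ??"
DB_EXPRS = ["a", "b", "a > b"]  # expressions taken from a "similar" program
TESTS = [((1, 2), 2), ((5, 3), 5)]
```

Here `complete(SKETCH, DB_EXPRS, ["a", "b"], ["x", "y"], TESTS)` finds a two-argument maximum. Even this toy case enumerates dozens of programs, which is why pruning the candidate set matters at realistic scale.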

Slides 22-24: Heuristics

  • Types: ignore incompatible types
  • Context: ignore expressions with no common parents
  • Huge search space reduction
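
One way to read the two heuristics is as filters applied to candidate expressions before enumeration. The sketch below is an interpretation, not the talk's implementation: the `Candidate` record, the string type names, and the reading of "common parents" as overlapping sets of enclosing syntax-node kinds are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    expr: str           # candidate expression text
    type: str           # static type of the expression
    parents: frozenset  # kinds of syntax nodes that enclose the expression


def prune(candidates, hole_type, hole_parents):
    """Keep candidates whose type fits the hole and whose context overlaps it."""
    kept = []
    for cand in candidates:
        if cand.type != hole_type:
            continue  # type heuristic: incompatible type
        if not (cand.parents & hole_parents):
            continue  # context heuristic: no common parent
        kept.append(cand)
    return kept


CANDIDATES = [
    Candidate("i + 1", "int", frozenset({"for", "assign"})),
    Candidate("s[i]", "char", frozenset({"for", "if"})),
    Candidate("n > 0", "bool", frozenset({"while"})),
    Candidate("len(s)", "int", frozenset({"return"})),
]
```

For an `int` hole inside a `for` loop, `prune(CANDIDATES, "int", frozenset({"for"}))` keeps only `i + 1`. With three such holes, the number of fillings to enumerate drops from 4³ = 64 to 1, which illustrates why the reduction compounds with the number of holes.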

Slides 25-26: Initial experiment and future work

  • LCS: less than 10 seconds
  • Future work: more benchmarks, closure, search PDB using types

Slides 27-28: Program repair

  • Use PDB to find most similar correct program
  • Bug localization → holes → completion

Slide 29: Conclusion

  • Program Completion: no more copy and paste, focus on important tasks