obsidian-yanxin/documents/academic/presentations/comp600_fall2015.md


category: academic
type: academic
person: Yanxin Lu
date: 2015-08
source: comp600_fall2015.pptx

COMP 600 Fall 2015: Data Driven Program Completion

Yanxin Lu, Swarat Chaudhuri, Christopher Jermaine, Vijayaraghavan Murali. Presented by Yanxin Lu. 26 slides.

Slide 2: Title

Data Driven Program Completion

Slide 3: Programming is difficult

Slide 4: Program Synthesis

  • Automatically generating programs
  • Specifications: logic formulas, unit tests, natural language
  • Hard problem!
  • Deductive and solver-aided synthesis (IEEE Trans. Software Eng. 18(8), PLDI 2014)
  • Constraint-based synthesis: syntax-guided synthesis (FMCAD 2013), Sketching (ASPLOS 2006), Template (STTT 15(5-6))
  • Inductive synthesis: input-output examples (POPL 2011, PLDI 2015)

Slide 6: Big data

  • GitHub, SourceForge, Google Code, StackOverflow

Slide 7: Summary

  • Data-driven program completion, demo, corpus and Pliny database, synthesis algorithm, initial experiment and future work, program repair

Slide 8: Program Completion

  • A subset of C

Slide 9: Demo

Slides 10-11: Architecture

  • Synthesis ↔ PDB (Pliny Database)
  • Incomplete program → query → top-k similar programs → completed program

Slide 12: PDB

  • Thousands of programs with features
  • Similarity metrics, fast top-k query
  • 1-2 orders of magnitude faster than NoSQL database systems (Chris Jermaine)
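
The slides do not say which similarity metric PDB uses or how its top-k query is implemented. As a rough illustration only, the sketch below assumes Jaccard similarity over feature sets and a plain linear scan; `jaccard`, `top_k`, and the tiny corpus are hypothetical names and data, and the real PDB is reported to be far faster than a scan like this.

```python
import heapq

def jaccard(a, b):
    """Jaccard similarity of two feature sets (an assumed metric, not from the slides)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def top_k(query_features, corpus, k):
    """Return the k (name, score) pairs most similar to the query, best first."""
    scored = ((name, jaccard(query_features, feats)) for name, feats in corpus.items())
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])

# Hypothetical corpus: program name -> set of extracted features
corpus = {
    "binary_search.c": {"loop", "cond", ("int", "c:/"), ("int", "c:+")},
    "linear_search.c": {"loop", "cond", ("int", "c:+")},
    "hello.c": {"call"},
}
# Features of the incomplete program used as the query
query = {"loop", "cond", ("int", "c:/")}

print(top_k(query, corpus, 2))
```

Here the incomplete program shares three of four features with `binary_search.c`, so that program ranks first and would be handed to the synthesis step.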

Slide 13: Corpus

  • More than 100,000 projects from GitHub, SourceForge, Google Code
  • C, C++, Java
  • Preprocessing: 50 GB of source code, 480+ projects, C

Slide 14: Feature Extraction

  • Lightweight program analysis capturing characteristics
  • Abstract Structural Skeleton: (seq (loop (seq (cond ()))))
  • Coupling: ('int', 'c:unary-'), ('int', 'c:/'), ('int*', 'c:+'), etc.
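
The exact rules behind the abstract structural skeleton are not given on the slide. The sketch below is one possible reading that reproduces the slide's example string: keep only loops and conditionals, drop plain statements, and print a sequence with no control flow inside it as `()`. The tuple-based toy AST and the `skeleton` function are assumptions made for illustration, and each loop or conditional is simplified to a single body.

```python
CONTROL = {"loop", "cond"}

def skeleton(node):
    """Render the control structure of a toy AST node as an s-expression.

    Nodes are (kind, children) tuples. Plain statements are dropped, and a
    sequence with no control-flow children collapses to "()".
    """
    kind, children = node
    if kind == "seq":
        parts = [skeleton(c) for c in children if c[0] in CONTROL]
        return "(seq " + " ".join(parts) + ")" if parts else "()"
    if kind in CONTROL:
        # In this toy AST a loop or cond wraps exactly one body sequence.
        (body,) = children
        return f"({kind} {skeleton(body)})"
    raise ValueError(f"unexpected node kind: {kind}")

# Hypothetical program shaped like a binary search:
# setup; while (...) { update; if (...) { return; } } return;
program = ("seq", [
    ("assign", []),
    ("loop", [("seq", [
        ("assign", []),
        ("cond", [("seq", [("return", [])])]),
    ])]),
    ("return", []),
])

print(skeleton(program))  # (seq (loop (seq (cond ()))))
```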

Slides 16-19: Synthesis Algorithm

  • Finding similar programs from PDB
  • Filling in the holes via search
  • Variable renaming for undefined variables
  • Unit testing to filter incorrect programs
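
The last three steps above can be sketched on a single expression hole. This is a minimal illustration, not the algorithm from the talk: it uses Python expressions in place of the C subset, assumes candidates are expression strings taken from similar programs, tries every injective mapping of a candidate's variables onto the in-scope variables, and keeps the expressions that pass all unit tests. All function names and the sample data are hypothetical.

```python
import ast
import itertools

class _Rename(ast.NodeTransformer):
    """Rewrite variable names in an expression according to a mapping."""
    def __init__(self, mapping):
        self.mapping = mapping

    def visit_Name(self, node):
        return ast.Name(id=self.mapping.get(node.id, node.id), ctx=node.ctx)

def renamings(expr, scope):
    """Yield expr with its variables mapped onto in-scope names, one injective mapping at a time."""
    tree = ast.parse(expr, mode="eval")
    names = sorted({n.id for n in ast.walk(tree) if isinstance(n, ast.Name)})
    for targets in itertools.permutations(scope, len(names)):
        renamed = _Rename(dict(zip(names, targets))).visit(ast.parse(expr, mode="eval"))
        yield ast.unparse(renamed)  # requires Python 3.9+

def passes(expr, tests):
    """True if the expression yields the expected value on every unit test."""
    try:
        return all(eval(expr, {"__builtins__": {}}, dict(env)) == want
                   for env, want in tests)
    except Exception:
        return False

def complete(candidates, scope, tests):
    """Search renamed candidates and keep those that pass the tests."""
    found = []
    for cand in candidates:
        for expr in renamings(cand, scope):
            if expr not in found and passes(expr, tests):
                found.append(expr)
    return found

# Hole: the midpoint expression of a binary search whose in-scope variables are lo and hi.
candidates = ["(left + right) // 2", "right - left", "left + 1"]  # from similar programs
tests = [({"lo": 0, "hi": 10}, 5), ({"lo": 4, "hi": 7}, 5)]
print(complete(candidates, ("lo", "hi"), tests))
```

Only the two renamings of `(left + right) // 2` survive the tests; the other candidates are filtered out regardless of how their variables are renamed.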

Slides 20-22: Heuristics

  • Types: ignore expressions with incompatible types
  • Context: ignore expressions with no common parents
  • Huge search space reduction
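
A minimal sketch of how the two filters might prune candidates before the search. It rests on assumptions the slides do not confirm: type compatibility is reduced to exact equality, and "common parents" is read as sharing at least one kind of enclosing AST node with the hole. The `Candidate` record, `prune`, and the sample expressions are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Candidate:
    expr: str            # expression text taken from a similar program
    type: str            # static type of the expression, e.g. "int"
    parents: frozenset   # kinds of AST nodes that enclose the expression

def prune(candidates, hole_type, hole_parents):
    """Drop candidates that fail the type or the context heuristic."""
    kept = []
    for c in candidates:
        if c.type != hole_type:            # types: incompatible with the hole
            continue
        if not (c.parents & hole_parents): # context: no enclosing node kind in common
            continue
        kept.append(c)
    return kept

candidates = [
    Candidate("(lo + hi) / 2", "int", frozenset({"loop", "assign"})),
    Candidate("a + mid", "int*", frozenset({"loop", "assign"})),
    Candidate("argc - 1", "int", frozenset({"call"})),
]
hole_parents = frozenset({"loop", "assign"})

print([c.expr for c in prune(candidates, "int", hole_parents)])
```

In this example the pointer-typed expression is removed by the type filter and `argc - 1` by the context filter, leaving one candidate for the search to try.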

Slide 23: Initial experiment and future work

  • Binary search: less than 10 seconds
  • Future work: more benchmark problems, performance improvements

Slides 24-25: Program Repair

  • Use PDB to find most similar correct program
  • Bug localization → program completion problem

Slide 26: Conclusion

  • Program completion + program repair using big data + programming languages