Archive 10 academic presentations from ~/Downloads/slides/ (2014-2018)

- PhD defense slides (defense.key, Nov 2018) → phd_defense/
- Master's defense on MOOC peer evaluation (Dec 2014)
- ENGI 600 data-driven program repair (Apr 2015)
- COMP 600 data-driven program completion (Fall 2015, Spring 2016)
- COMP 600 Program Splicing presentation + feedback + response (Spring 2018)
- Program Splicing slides in .key and .pdf formats (Spring 2018)

Each file has a .md transcription with academic frontmatter.
Skipped www2015.pdf (duplicate of existing www15.zip) and syncthing conflict copy.
This commit is contained in:
Yanxin Lu
2026-04-06 12:00:27 -07:00
parent 180c615170
commit b85169f4e7
20 changed files with 602 additions and 0 deletions

Binary file not shown.


@@ -0,0 +1,17 @@
---
category: academic
type: academic
person: Yanxin Lu
date: 2018-11
source: defense_slides.key
---
# PhD Thesis Defense Slides
Keynote presentation for Yanxin Lu's PhD thesis defense at Rice University, November 2018.
Topic: Program Splicing — Data-driven Program Synthesis
The defense covers the same material as the PhD thesis: using a large corpus of programs (3.5 million from GitHub and SourceForge) to automatically synthesize code by splicing together relevant code fragments. The system uses the Pliny database (PDB) for efficient top-k retrieval of similar programs, enumerative search to fill in program holes, variable renaming to resolve undefined variables, and unit testing to filter out incorrect candidates. Benchmarks demonstrate efficient synthesis times (3-161 seconds) across problems like sieve prime, binary search, CSV parsing, matrix multiplication, and LCS. A user study with 12 graduate students and 6 professionals showed program splicing significantly reduced programming time, especially for algorithmic tasks and tasks without standard solutions.
Note: The preview image shows only the title slide (blank/white). The full Keynote file contains the complete presentation.


@@ -0,0 +1,74 @@
---
category: academic
type: academic
person: Yanxin Lu
date: 2016-01
source: codecomplete_spring2016.pptx
---
# COMP 600 Spring 2016: Data Driven Program Completion
Yanxin Lu, Swarat Chaudhuri, Christopher Jermaine, Drew Dehaas, Vineeth Kashyap, and David Melski. Presented by Yanxin Lu. 29 slides.
## Slide 2: Title
Data Driven Program Completion
## Slides 3-4: Programming is difficult
- Longest Common Subsequence example
## Slide 5: Program Synthesis
- Automatically generating programs
- Specification: logic formula, unit testing, natural language
## Slide 6: Related work
- Deductive and solver-aided synthesis
- Constraint-based synthesis: syntax-guided synthesis, Sketching, Template
- Inductive synthesis: input-output examples
## Slide 7: Big data
- GitHub, SourceForge, Google Code, StackOverflow
## Slides 8-9: Summary
- Data-driven program completion, corpus and Pliny database, synthesis algorithm, initial experiment and future work
## Slides 10-11: Program completion
- Sketch + programs in DB + test cases
- LCS example: LCS("123", "123") = "123", LCS("123", "234") = "23"
## Slides 12-13: Workflow
- Synthesis ↔ PDB
- Incomplete program → query → programs → completed program
## Slide 14: PDB
- Thousands of programs with features, similarity metrics
- Fast top-k query: 1-2 orders of magnitude faster than no-SQL systems
## Slide 15: Corpus
- 100,000+ projects, C/C++/Java
- 50GB source code, 480+ C projects
## Slide 16: Feature Extraction
- Names: X, s, n, j, Y, index, lcs
- TF/IDF: "charact": 0.158, "reduc": 0.158, "result": 0.316, "lc": 0.791, "index": 0.316
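The TF/IDF weights on this slide can be reproduced in spirit with a short sketch. This is a simplified stand-in (plain token lists, no stemming; the corpus and identifier names below are hypothetical):

```python
import math
from collections import Counter

def tf_idf(doc_tokens, corpus):
    """Weight each token of one document by term frequency in the
    document times log inverse document frequency over the corpus
    (a list of token lists), as in the feature extraction slide."""
    n = len(corpus)
    df = Counter()
    for doc in corpus:
        df.update(set(doc))          # document frequency per term
    tf = Counter(doc_tokens)
    total = len(doc_tokens)
    return {t: (tf[t] / total) * math.log(n / df[t]) for t in tf}

# Hypothetical identifier tokens from three tiny programs
corpus = [["lcs", "index", "result"], ["index", "sum"], ["matrix", "row"]]
weights = tf_idf(corpus[0], corpus)  # rare "lcs" outweighs the common "index"
```

As on the slide, terms unique to a program ("lc", "lcs") get much larger weights than terms shared across the corpus ("index").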
## Slides 18-21: Synthesis Algorithm
- Search PDB for similar programs
- Fill holes via enumerative search
- Merge undefined variables
- Test to filter incorrect programs
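The search-fill-rename-test pipeline above can be sketched as a toy enumerative loop. This is a Python stand-in for the C/Java setting; the hole marker, template syntax, and the idea of exec-ing candidates are all illustrative assumptions:

```python
from itertools import product

def enumerate_completions(template, holes, candidate_exprs, tests):
    """Sketch of the enumerative search from the slides: try every
    way to fill the template's holes with expressions harvested from
    relevant programs, and return the first completion that passes
    all unit tests (incorrect candidates are filtered out)."""
    for filling in product(candidate_exprs, repeat=len(holes)):
        src = template
        for hole, expr in zip(holes, filling):
            src = src.replace(hole, expr)
        env = {}
        exec(src, env)               # toy: compile the completed program
        if all(env["f"](x) == y for x, y in tests):
            return src
    return None

# Toy example: synthesize the body of a doubling function
template = "def f(x):\n    return ?H?"
found = enumerate_completions(template, ["?H?"],
                              ["x + 1", "x + x", "x * x"],
                              tests=[(2, 4), (5, 10)])
```

The real system enumerates ASTs rather than strings, but the structure (cartesian product over holes, then a unit-test filter) is the same.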
## Slides 22-24: Heuristics
- Types: ignore incompatible types
- Context: ignore expressions with no common parents
- Huge search space reduction
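A minimal sketch of how the two heuristics might prune a candidate pool before enumeration. The `(expr, type, parents)` records and the exact compatibility checks are assumptions; the real system works over program ASTs:

```python
def prune_candidates(candidates, hole_type, hole_parents):
    """Apply the slides' two heuristics: drop candidate expressions
    whose type is incompatible with the hole (type-based pruning),
    and drop those sharing no AST parent kinds with the hole's
    context (context-based pruning)."""
    kept = []
    for expr, typ, parents in candidates:
        if typ != hole_type:                 # type-based pruning
            continue
        if not (parents & hole_parents):     # context-based pruning
            continue
        kept.append(expr)
    return kept

# Hypothetical candidates harvested from relevant programs
cands = [("i + 1", "int", {"loop"}),
         ("s[i]", "char", {"loop"}),
         ("n - 1", "int", {"return"})]
kept = prune_candidates(cands, "int", {"loop", "cond"})
```

Only `i + 1` survives: `s[i]` fails the type check and `n - 1` never appears under a compatible parent, which is exactly the "huge search space reduction" the slide claims.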
## Slides 25-26: Initial experiment and future work
- LCS: less than 10 seconds
- Future work: more benchmarks, closure, search PDB using types
## Slides 27-28: Program repair
- Use PDB to find most similar correct program
- Bug localization → holes → completion
## Slide 29: Conclusion
- Program Completion: no more copy and paste, focus on important tasks


@@ -0,0 +1,78 @@
---
category: academic
type: academic
person: Yanxin Lu
date: 2015-08
source: comp600_fall2015.pptx
---
# COMP 600 Fall 2015: Data Driven Program Completion
Yanxin Lu, Swarat Chaudhuri, Christopher Jermaine, Vijayaraghavan Murali. Presented by Yanxin Lu. 26 slides.
## Slide 2: Title
Data Driven Program Completion
## Slide 3: Programming is difficult
## Slide 4: Program Synthesis
- Automatically generating programs
- Specification: logic formula, unit testing, natural language
- Hard problem!
## Slide 5: Related work
- Deductive and solver-aided synthesis (IEEE Trans. Software Eng. 18(8), PLDI 2014)
- Constraint-based synthesis: syntax-guided synthesis (FMCAD 2013), Sketching (ASPLOS 2006), Template (STTT 15(5-6))
- Inductive synthesis: input-output examples (POPL 2011, PLDI 2015)
## Slide 6: Big data
- GitHub, SourceForge, Google Code, StackOverflow
## Slide 7: Summary
- Data-driven program completion, demo, corpus and Pliny database, synthesis algorithm, initial experiment and future work, program repair
## Slide 8: Program Completion
- A subset of C
## Slide 9: Demo
## Slides 10-11: Architecture
- Synthesis ↔ PDB (Pliny Database)
- Incomplete program → query → top-k similar programs → completed program
## Slide 12: PDB
- Thousands of programs with features
- Similarity metrics, fast top-k query
- 1-2 orders of magnitude faster than no-SQL database systems (Chris Jermaine)
## Slide 13: Corpus
- More than 100,000 projects from GitHub, SourceForge, Google Code
- C, C++, Java
- Preprocessing: 50GB source code, 480+ projects, C
## Slide 14: Feature Extraction
- Lightweight program analysis capturing characteristics
- Abstract Structural Skeleton: (seq (loop (seq (cond ()))))
- Coupling: ('int', 'c:unary-'), ('int', 'c:/'), ('int*', 'c:+'), etc.
## Slides 16-19: Synthesis Algorithm
- Finding similar programs from PDB
- Filling in the holes via search
- Variable renaming for undefined variables
- Unit testing to filter incorrect programs
## Slides 20-22: Heuristics
- Types: ignore expressions with incompatible types
- Context: ignore expressions with no common parents
- Huge search space reduction
## Slide 23: Initial experiment and future work
- Binary search: less than 10 seconds
- Future work: more benchmark problems, performance increase
## Slides 24-25: Program Repair
- Use PDB to find most similar correct program
- Bug localization → program completion problem
## Slide 26: Conclusion
- Program completion + program repair using big data + programming languages

Binary file not shown.


@@ -0,0 +1,65 @@
---
category: academic
type: academic
person: Yanxin Lu
date: 2018-03
source: comp600_feedback_2018.pages
---
# COMP 600 Presentation Feedback
Peer and instructor feedback on Yanxin Lu's COMP 600 presentation (Program Splicing), March 2018.
Good pace for the motivation part, which explained the problem really well.
Stance was not good: kept moving and kept looking back at the screen.
Not very smooth at the beginning of the talk.
The related work section was a little long and took a lot of time.
Not enough energy when introducing program splicing.
Good pace for the demo, but I was moving the whole time; need to stand still.
Kept moving on the architecture slide, and kept looking back.
Good gesture for demonstrating the KNN search.
The PDB part went a little fast; it needs more detail.
The enumerative search could be faster; no need to show the enumeration process.
Kept moving the whole time during the benchmark problems.
Not very smooth on the benchmark problems.
Too much text on the user study slide.
Not smooth on the user study result slide, especially on the sieve problem.
Stance was not good throughout the talk.
Went a little fast at the end of the talk because of time.
**Best features**
1. PDB and related work could be shorter.
2. The motivation and example make the problem easy to understand.
3. Mention the limits of the related work.
4. Good demo.
5. Confident.
6. Good voice control and eye contact.
7. Tables were well explained.
8. Good handling of questions.
**Message/organization**
1. Need more technical details.
2. Source code licensing has to be covered.
3. The demo was not very useful; a few slides could do the work.
4. The programming problems might not reflect real-world improvements.
5. Explain more about the experiments.
6. Need statistical significance and power for the hypothesis test; too few samples.
7. Not clear what the contribution is.
8. Did not mention the limitations of the work.
9. The example in the demo might not be a good one.
**Delivery**
1. Keep his stance
2. Not showing enough passion
3. Louder
**Visuals**
1. Architecture flowchart could be made better


@@ -0,0 +1,21 @@
---
category: academic
type: academic
person: Yanxin Lu
date: 2018-04
source: comp600_response_2018.pages
---
# COMP 600 Response
Yanxin Lu
COMP 600 Response
Monday, April 2, 2018
The best moments in this presentation were from the motivation section through the demo section. I talked slowly and paused at important points, which created a sense of emphasis on the key points I wanted to make. By talking slowly, I let the audience understand the motivation and the work I have been doing, even people without any background knowledge. The demo also helped people understand what the tool does, and it drew a lot of attention.
After I viewed the video and my peers' reviews, I was surprised that my stance looked very awkward; I did not realize this at all during the presentation. I kept making unnecessary moves and did not look very serious. Another thing that surprised me is that people complained that I did not provide enough technical details. I thought I had given enough technical detail, but that did not seem to be the case.
One of my greatest strengths is the ability to motivate the talk and explain a complex problem very clearly. I used a very simple example throughout the talk, and the audience was able to follow the talk through that example easily. Another strength is the ability to explain the data: I highlighted the important points when walking through the results, because most of the time it is hard to understand what data imply without any guidance. The third strength is delivery: I showed confidence by talking slowly and making good eye contact.
The thing I need to improve is my stance. Moving too much not only looks awkward but also creates an impression of not being serious and lacking authority; standing still also makes me look more confident. The second area I need to improve is handling questions and the ability to control the room: sometimes I had a hard time understanding people, and I could have done better at controlling the situation when people were having discussions among themselves. To improve in these areas, I will attend talks, pay attention to how good presenters handle them, and try to learn from them.


@@ -0,0 +1,55 @@
---
category: academic
type: academic
person: Yanxin Lu
date: 2015-04
source: engi600_2015.pptx
---
# ENGI 600: Data-Driven Program Repair
Yanxin Lu, Swarat Chaudhuri, Christopher Jermaine, Joe Warren, and Scott Rixner. 12 slides.
## Slide 1: Title
Data-Driven Program Repair
## Slide 2: Debugging is difficult
## Slide 3: Related work
- Talus, tutoring system (Murray, 1986) — reference program, program analysis
- Mutation (Debroy and Wong, 2010) — predefined rules for mutating programs
## Slide 4: Data-driven program repair
- Code database for evaluatePoly
- Incorrect Program → Correct Program
## Slide 5: EvaluatePoly
- A program which evaluates polynomials
- Poly: a list of coefficients
- X: the x value in the polynomial
## Slide 6: Similar and correct implementations
- Distance between programs
- Incorrect Program → Code Database → Correct Programs → Template Generation
## Slide 7: Template Generation
- Find differences and replace them with holes
- Ignore variable names
## Slide 8: Filling in the Holes
- Search for ways to replace holes
- Variable Renaming
## Slide 9: Variable Renaming
- Rename variables in the good program
## Slide 10: Unit Testing
- Filter all incorrect fixes using unit testing
- If multiple correct fixes, choose the most similar one
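The filter-then-tie-break step described above can be sketched as follows. `char_diff` is only a toy stand-in for the program-distance metric from slide 6, and the fix representation (source string plus runnable candidate) is hypothetical:

```python
def char_diff(a, b):
    """Toy similarity stand-in: count differing character
    positions, plus any length gap."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))

def pick_fix(fixes, tests, broken_src, distance):
    """Filter candidate fixes with unit tests; if several survive,
    return the source of the one most similar to the broken program."""
    passing = {src: f for src, f in fixes.items()
               if all(t(f) for t in tests)}
    if not passing:
        return None
    return min(passing, key=lambda src: distance(src, broken_src))

# Toy usage: three candidate fixes for a broken "return x + 2"
fixes = {"return x + x": lambda x: x + x,
         "return 2 * x": lambda x: 2 * x,
         "return x - 2": lambda x: x - 2}
tests = [lambda f: f(3) == 6]
best = pick_fix(fixes, tests, "return x + 2", char_diff)
```

Both `x + x` and `2 * x` pass the test, so the tie-break picks `x + x`, the candidate textually closest to the student's broken program.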
## Slide 11: Experiment
## Slide 12: Conclusion
- Data-driven program repair
- Effective in fixing small incorrect programs
- Computer science education — same mistakes

Binary file not shown.


@@ -0,0 +1,102 @@
---
category: academic
type: academic
person: Yanxin Lu
date: 2014-12
source: master_defense_2014.pptx
---
# Master's Thesis Defense: Improving Peer Evaluation Quality in MOOCs
Yanxin Lu, December 2014. 40 slides.
## Slide 2: Title
Improving Peer Evaluation Quality in MOOCs — Yanxin Lu, December 2014
## Slides 3-4: Summary
- Motivations and Problems
- Experiment
- Statistical Analysis
- Results
- Conclusion
## Slide 5: What is MOOC?
## Slide 6: Intro to Interactive Programming in Python
- Coursera course, 120,000 enrolled, 7,500 completed
## Slides 7-8: Example Assignments
- Stopwatch
- Memory game
## Slide 9: Grading Rubric for Stopwatch
- 1 pt: Program successfully opens a frame with the stopwatch stopped
- 2 pts: Program correctly draws number of successful stops at whole second vs total stops
## Slide 10: Peer Grading
- Example scores: 1, 9, 9, 9, 10 → Score = 9
## Slide 11: Quality is Highly Variable
- Lack of effort
- Small bugs require more effort
## Slide 12: Solution
A web application where students can:
- Look at other peer evaluations
- Grade other peer evaluations
## Slide 13: Findings
- Grading evaluation has the strongest effect
- The knowledge that one's own peer evaluation will be examined does not
- Strong effect on peer evaluation quality simply because students know they are being studied
## Slide 15: Experiment Summary
- Sign up → Stopwatch → Memory
## Slide 16: Sign up
- Web consent form, three groups, prize
- Nothing about specific study goals or what was being measured
- 3,015 students
## Slide 17: Three Groups
- G1: Full treatment, grading + viewing
- G2: Only viewing
- G3: Control group
- Size ratio G1:G2:G3 = 8:1:1
## Slides 18-24: Experiment Phases
- Submission Phase: Submit programs before deadline
- Evaluation Phase: 1 self evaluation + 5 peer evaluations per rubric item (score + optional comment)
- Grading Evaluation Phase (G1): Web app, per evaluation × rubric item → Good/Neutral/Bad
- Viewing Phase (G1, G2): See number of good/neutral/bad ratings and their own evaluation
## Slide 25: Statistics
- Most evaluations are graded three times
## Slide 27: Goal
- Whether G1 does better grading compared to G2, G3 or both
- Measuring quality: correct scores, comment length
- Reject a set of null hypotheses
## Slide 28: Bootstrapping
- Simulation-based method using resampling with replacement
- Statistically significant: p-value <= 0.05
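The resampling test can be sketched as below. This is a generic bootstrap of a difference in means under pooled resampling with replacement, not necessarily the thesis's exact procedure; the sample data are hypothetical:

```python
import random

def bootstrap_p(group_a, group_b, n_boot=10000, seed=0):
    """Estimate how often a mean difference at least as large as the
    observed one arises by chance, by resampling both groups with
    replacement from the pooled data (simulation-based test)."""
    rng = random.Random(seed)
    observed = sum(group_a) / len(group_a) - sum(group_b) / len(group_b)
    pooled = group_a + group_b
    hits = 0
    for _ in range(n_boot):
        a = [rng.choice(pooled) for _ in group_a]
        b = [rng.choice(pooled) for _ in group_b]
        if abs(sum(a) / len(a) - sum(b) / len(b)) >= abs(observed):
            hits += 1
    return hits / n_boot

# Hypothetical comment-quality scores for two groups
p = bootstrap_p([5, 6, 7, 8], [1, 2, 2, 3])  # clearly separated groups
```

With clearly separated groups the estimated p-value falls below the 0.05 threshold; identical groups give p = 1.0, since the observed difference is zero.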
## Slide 30: Terms
- Good programs: correct (machine grader verified)
- Bad programs: incorrect
- Bad job: incorrect grade OR no comment
- Really bad job: incorrect grade AND no comment
## Slides 31-38: Results
Hypothesis tests on comment length, "bad job" fraction, and "really bad job" fraction across groups on good and bad programs.
## Slide 39: Findings
- Grading evaluation has the strongest positive effect
- The knowledge that one's own peer evaluation will be examined does not
- Strong Hawthorne effect: improvement simply from knowing they are being studied
## Slide 40: Conclusion
- A web application for peer evaluation assessment
- Study has positive effect on quality of peer evaluations
- Implications beyond peer evaluations


@@ -0,0 +1,31 @@
---
category: academic
type: academic
person: Yanxin Lu
date: 2018-05
source: splicing_comp600_2018.key
---
# Program Splicing — COMP 600 Spring 2018 (Keynote)
Yanxin Lu, Swarat Chaudhuri, Christopher Jermaine, David Melski. Keynote presentation, Spring 2018.
Source Keynote file for the Program Splicing COMP 600 presentation. The PDF export is available as splicing_comp600_2018.pdf.
The presentation covers the same content as splicing_comp600_slides_2018.pdf but is a slightly revised version with subtitle "Data-driven Program Synthesis" on the title slide:
1. Copying and Pasting problem — time consuming and introduces bugs
2. Program Synthesis — automatically generating programs from specifications
3. Problem — can we use program synthesis to improve copy-paste?
4. Related work — Sketching (PLDI 2005), Code Transplantation (ISSTA 2015)
5. Program Splicing approach — automate copying/pasting using 3.5M program corpus, ensure correctness
6. Architecture — draft program → Synthesis ↔ PDB → completed program
7. PDB — 3.5M Java programs, natural language features, similarity metrics, KNN search, fast top-k query
8. Relevant programs — query PDB with draft program to find similar implementations
9. Filling holes — enumerative search over candidate expressions from relevant programs
10. Variable renaming — resolve undefined variables
11. Testing — filter incorrect candidates via unit tests
12. Heuristics — type and context-based pruning for search space reduction
13. Benchmark — 12 programs, synthesis times 3-161 seconds, efficient algorithm
14. User study — 18 participants (12 grad students + 6 professionals), 4 problems, splicing most helpful for algorithmic tasks and tasks without standard solutions
15. Conclusion — data-driven synthesis with large corpus, enumerative search, efficient algorithm, fast code reuse


@@ -0,0 +1,64 @@
---
category: academic
type: academic
person: Yanxin Lu
date: 2018-05
source: splicing_comp600_2018.pdf
---
# Program Splicing — COMP 600 Spring 2018 (PDF Export)
Yanxin Lu, Swarat Chaudhuri, Christopher Jermaine, David Melski. 31 slides.
PDF export of the Keynote presentation splicing_comp600_2018.key. Title: "Program Splicing: Data-driven Program Synthesis".
This is a revised version of the earlier splicing_comp600_slides_2018.pdf. Key differences:
- Title slide has subtitle "Data-driven Program Synthesis" (vs just "Presented by Yanxin Lu")
- Adds "Efficient relevant code retrieval" and "KNN search" to PDB slide
- Adds "Programming time" to user study setup
- User study result slides titled differently: "Deceptively simple", "No standard solutions", "Good documentations and tests were hard to write"
- Conclusion adds "Efficient algorithm", "Fast code reuse", "Easy to test", "Future work: synthesis algorithm improvement"
## Slide 2: Title
Program Splicing: Data-driven Program Synthesis
## Slides 3-7: Motivation and Approach
- Copying and pasting is time consuming and introduces bugs
- Program synthesis: automatically generate programs from specifications
- Problem: can we use program synthesis to improve copying and pasting?
- Related work: Sketching (PLDI 2005), Code Transplantation (ISSTA 2015)
- Program Splicing: automate process, large corpus (3.5M programs), ensure correctness
## Slide 8: Demo
- How does a programmer use program splicing?
## Slides 9-12: Architecture
- User → draft program → Synthesis ↔ PDB → completed program
- PDB: efficient relevant code retrieval, 3.5M Java programs, NL features, similarity metrics, KNN search, fast top-k query
## Slides 13-18: Synthesis Algorithm
- Find relevant programs from PDB
- Fill holes via enumerative search
- Variable renaming for undefined variables
- Testing to filter incorrect programs
## Slides 19-20: Benchmark
Same benchmark table as the earlier version. Efficient synthesis algorithm highlighted.
## Slide 21: No need to write many tests
## Slides 22-26: User study
- 18 participants, 4 problems, programming time measured
- Sieve: deceptively simple
- Files/CSV: no standard solutions — splicing most helpful
- HTML: good documentation and tests were hard to write
## Slide 27: Conclusion
- Program Splicing: large code corpus, enumerative search, efficient algorithm
- Fast code reuse: no standard solutions, easy to test
- Future work: synthesis algorithm improvement
## Slides 29-31: Appendix (Heuristics)
- Type-based pruning: ignore incompatible types
- Context-based pruning: ignore expressions with no common parents
- Huge search space reduction


@@ -0,0 +1,95 @@
---
category: academic
type: academic
person: Yanxin Lu
date: 2018-04
source: splicing_comp600_slides_2018.pdf
---
# Program Splicing — COMP 600 Slides (PDF)
Yanxin Lu, Swarat Chaudhuri, Christopher Jermaine, David Melski. Presented by Yanxin Lu. 31 slides.
PDF export of COMP 600 presentation on Program Splicing. This is an earlier version of the presentation (title slide says "Presented by Yanxin Lu"). See also splicing_comp600_2018.pdf for a slightly revised version with subtitle "Data-driven Program Synthesis".
## Slide 2: Title
Program Splicing — Yanxin Lu, Swarat Chaudhuri, Christopher Jermaine, David Melski. Presented by Yanxin Lu.
## Slide 3: Copying and Pasting
- Problem: developers search online, copy code, adapt it — time consuming and bugs introduced
## Slide 4: Program Synthesis
- Automatically generating programs
- Specification: logic formula, unit testing, natural language
- Correctness
## Slide 5: Problem
Can we use program synthesis to help the process of copying and pasting?
## Slide 6: Related work
- Sketch (FMCAD 2013) — cannot synthesize statements, does not use a code database
- Code Transplantation (ISSTA 2015) — not efficient, does not search for relevant code snippets
## Slide 7: Program Splicing
- Use a large corpus of over 3.5 million programs
- Automate the process of copying and pasting
- Ensure correctness
## Slide 8: Summary
- Architecture (corpus and Pliny database, synthesis algorithm)
- Experiment
- Conclusion
## Slides 9-10: Architecture
- User provides draft program → Synthesis queries PDB → Top-k relevant programs → Completed program
## Slide 11: PDB
- 3.5 million Java programs with features from GitHub, SourceForge
- Natural language terms: "read": 0.10976, "matrix": 0.65858, ...
- Similarity metrics, fast top-k query (1-2 orders of magnitude faster than no-SQL)
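As an illustration of the fast top-k query, here is a toy in-memory stand-in. The real PDB is a distributed system, and the feature vectors, file names, and cosine metric below are hypothetical:

```python
import heapq
import math

def top_k(query_vec, db, k):
    """Rank stored programs by cosine similarity of their sparse
    feature vectors (term -> weight dicts) and return the k best
    program ids, mimicking PDB's top-k retrieval."""
    def cosine(u, v):
        dot = sum(w * v.get(t, 0.0) for t, w in u.items())
        nu = math.sqrt(sum(w * w for w in u.values()))
        nv = math.sqrt(sum(w * w for w in v.values()))
        return dot / (nu * nv) if nu and nv else 0.0
    return heapq.nlargest(k, db, key=lambda pid: cosine(query_vec, db[pid]))

# Hypothetical feature vectors in the style of the slide's weights
db = {"matmul.java": {"matrix": 0.66, "read": 0.11},
      "csv.java": {"read": 0.50, "parse": 0.40},
      "lcs.java": {"lc": 0.79, "index": 0.32}}
hits = top_k({"matrix": 1.0, "read": 0.2}, db, k=2)
```

A query dominated by "matrix" ranks the matrix-multiplication program first, which is the behavior the draft-program query on the next slide relies on.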
## Slide 13: Relevant programs
- Draft program with holes + COMMENT/REQ specification → PDB returns similar programs
## Slides 14-16: Filling in the holes
- Enumerative search: try candidate expressions from relevant programs
- Progressive selection of code fragments
## Slide 17: Variable Renaming
- Resolve undefined variables by mapping from relevant program's variables
## Slide 18: Testing
- Filter out incorrect programs using unit tests
## Slides 19-20: Benchmark
| Benchmark | Synthesis Time (s) | LOC | Var | Holes (expr-stmt) | Test | uScalpel |
|---|---|---|---|---|---|---|
| Sieve Prime | 4.6 | 12-17 | 2 | 2-1 | 3 | 162.1 |
| Collision Detection | 4.2 | 10-15 | 2 | 2-1 | 4 | N/A |
| Collecting Files | 3.0 | 13-25 | 2 | 1-1 | 2 | timeout |
| Binary Search | 15.4 | 12-20 | 5 | 1-1 | 3 | timeout |
| HTTP Server | 41.1 | 24-45 | 6 | 1-2 | 2 | N/A |
| Prim's Distance Update | 61.1 | 53-58 | 11 | 1-1 | 4 | timeout |
| Quick Sort | 77.2 | 11-18 | 6 | 1-1 | 1 | timeout |
| CSV | 88.4 | 13-23 | 4 | 1-2 | 2 | timeout |
| Matrix Multiplication | 108.9 | 13-15 | 8 | 1-1 | 1 | timeout |
| Floyd Warshall | 110.4 | 9-12 | 7 | 1-1 | 7 | timeout |
| HTML Parsing | 140.4 | 20-34 | 5 | 1-2 | 2 | N/A |
| LCS | 161.5 | 29-36 | 10 | 0-1 | 1 | timeout |
Synthesis algorithm is efficient. No need to write many tests.
## Slides 22-26: User study
- 12 graduate students and 6 professionals
- Web-based programming environment
- 4 programming problems (2 with splicing, 2 without)
- Internet search encouraged
- Results: splicing reduced time for algorithmic tasks (sieve, files)
- Sieve: appears simple but was not (deceptively simple)
- Files/CSV: no standard solutions — splicing helps most
- HTML: good documentation and tests were hard to write
## Slide 27: Conclusion
- Data-driven program synthesis using large code corpus
- Enumerative search
- User study: good for tasks without standard solutions