



type	category	person	date	source
academic	academic	Yanxin Lu	2018	lu_poster.pdf

Corpus-Driven API Refactoring

Yanxin Lu, Swarat Chaudhuri, Christopher Jermaine
Department of Computer Science, Rice University

Introduction

Program rewriting, or refactoring, improves software maintainability.
Application programming interfaces (APIs) play a key role in everyday programming.
Goal: automatically refactor an API call sequence
- Translate the input API calls
- Synthesize a complete API call sequence

Code Example (Before - HtmlCleaner)

HtmlCleaner cleaner = new HtmlCleaner();
TagNode node = cleaner.clean(content);
TagNode[] links = node.getElementsHavingAttribu...
TagNode link = links[0];
String href = link.getAttributeByName(attr);

Code Example (After - Jsoup)

Document doc = Jsoup.parse(content);
Elements links = doc.select(selector);
Element link = links.first();
String href = link.attr(attr);

Methods

Translate the input API calls
Synthesize a complete API call sequence

Algorithm Diagram

A() --> a() --> a()
B() --> b() --> b()
C() (API translation) --> c() (API synthesis) --> c()
D() --> d() --> d()
E() --> e() --> e()
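The two stages in the diagram can be sketched as a small pipeline. This is only an illustration under stated assumptions, not the poster's implementation: the `MAPPING` table and the `TEMPLATE` sequence are hypothetical stand-ins for whatever translation and synthesis models the method actually uses, and the call names are taken from the HtmlCleaner and Jsoup example above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class RefactorSketch {
    // Hypothetical call-to-call mapping (an assumption, not from the poster).
    static final Map<String, String> MAPPING = Map.of(
        "HtmlCleaner.clean", "Jsoup.parse",
        "TagNode.getAttributeByName", "Element.attr");

    // Hypothetical complete target sequence; stands in for a real synthesizer.
    static final List<String> TEMPLATE = List.of(
        "Jsoup.parse", "Document.select", "Elements.first", "Element.attr");

    // Stage 1: translate each input call that has a known counterpart,
    // dropping calls with no mapping.
    static List<String> translate(List<String> input) {
        List<String> out = new ArrayList<>();
        for (String call : input) {
            String target = MAPPING.get(call);
            if (target != null) {
                out.add(target);
            }
        }
        return out;
    }

    // Stage 2: if the translated calls appear in order inside the template,
    // return the full template; otherwise return the translated calls as-is.
    static List<String> synthesize(List<String> translated) {
        int matched = 0;
        for (String call : TEMPLATE) {
            if (matched < translated.size() && translated.get(matched).equals(call)) {
                matched++;
            }
        }
        return matched == translated.size() ? TEMPLATE : translated;
    }

    static List<String> refactor(List<String> input) {
        return synthesize(translate(input));
    }

    public static void main(String[] args) {
        List<String> input = List.of("HtmlCleaner.clean", "TagNode.getAttributeByName");
        // prints [Jsoup.parse, Document.select, Elements.first, Element.attr]
        System.out.println(refactor(input));
    }
}
```

Keeping the stages separate mirrors the diagram: translation only needs to get a few anchor calls right, and synthesis is responsible for filling in the calls in between.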

Main Results

Refactoring accuracy on various input API call sequences
Accuracy: percentage of generated API calls that are correct
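One way to compute the metric defined above is sketched here. The poster does not say how a generated call is judged correct, so this sketch assumes a position-by-position comparison against a reference sequence, and assumes that "w/o params" means comparing only the text before the opening parenthesis. The class and method names are hypothetical.

```java
import java.util.List;

class AccuracySketch {
    // Strip the argument list, e.g. "link.attr(attr)" becomes "link.attr".
    static String nameOnly(String call) {
        int paren = call.indexOf('(');
        return paren < 0 ? call : call.substring(0, paren);
    }

    // Percentage of generated calls that equal the reference call at the
    // same position (assumed definition of "correct").
    static double accuracy(List<String> generated, List<String> reference,
                           boolean ignoreParams) {
        if (generated.isEmpty()) {
            return 0.0;
        }
        int correct = 0;
        for (int i = 0; i < generated.size() && i < reference.size(); i++) {
            String g = ignoreParams ? nameOnly(generated.get(i)) : generated.get(i);
            String r = ignoreParams ? nameOnly(reference.get(i)) : reference.get(i);
            if (g.equals(r)) {
                correct++;
            }
        }
        return 100.0 * correct / generated.size();
    }

    public static void main(String[] args) {
        List<String> reference = List.of(
            "Jsoup.parse(content)", "doc.select(selector)",
            "links.first()", "link.attr(attr)");
        // Second call has the right method but the wrong argument.
        List<String> generated = List.of(
            "Jsoup.parse(content)", "doc.select(attr)",
            "links.first()", "link.attr(attr)");
        System.out.println(accuracy(generated, reference, true));  // prints 100.0
        System.out.println(accuracy(generated, reference, false)); // prints 75.0
    }
}
```

Under these assumptions, accuracy without parameters is always at least as high as accuracy with parameters, since a call with the wrong argument still counts when only the method name is compared.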

Accuracy Chart

Bar chart showing "Accuracy w/o params" and "Accuracy" for the following benchmark tasks:

CSV read, CSV write, CSV database, CSV delimiter, email login, email check, email send, email delete, FTP list, FTP login, FTP upload, FTP download, FTP delete, HTML scraping, HTML add node, HTML rm attr, HTML parse, HTML title, HTML write, HTTP get, HTTP post, HTTP server, NLP sentence, NLP token, NLP tag, NLP stem, ML classification, ML regression, ML cluster, ML neural network, graphics, gui, pdf read, pdf write, word read, word write

Limitations

Our refactoring method may fail in two cases:

Inaccurate API translation
- HTML Writing
- Word Reading/Writing
- GUI
Long input API sequence
- Sending Email
- PDF Writing

Limitation Diagram

A() --> x()
B() --> y()
C() (translation) --> z()
D()
E()
F()
G()

Conclusion

An effective method that automates API refactoring
Combines two techniques
- API call translation
- API call sequence synthesis
Does not work when
- The source and target APIs use different terminology
- The input sequence is too long

