| type | category | person | date | source |
|---|---|---|---|---|
| academic | academic | Yanxin Lu | 2018 | lu_poster.pdf |
Corpus-Driven API Refactoring
Yanxin Lu, Swarat Chaudhuri, Christopher Jermaine
Department of Computer Science, Rice University
Introduction
- Rewriting or refactoring a program improves software maintainability.
- Application programming interfaces (APIs) play a key role in everyday programming.
- Automatically refactor an API call sequence
- Translate the input API calls
- Synthesize complete API call sequence
Code Example (Before - HtmlCleaner)
HtmlCleaner cleaner = new HtmlCleaner();
TagNode node = cleaner.clean(content);
TagNode[] links = node.getElementsHavingAttribu...
TagNode link = links[0];
String href = link.getAttributeByName(attr);
Code Example (After - Jsoup)
Document doc = Jsoup.parse(content);
Elements links = doc.select(selector);
Element link = links.first();
String href = link.attr(attr);
Methods
- Translate the input API calls
- Synthesize complete API call sequence
Algorithm Diagram
- A() --> a() --> a()
- B() --> b() --> b()
- C() (API translation) --> c() (API synthesis) --> c()
- D() --> d() --> d()
- E() --> e() --> e()
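The two stages in the diagram can be illustrated with a toy sketch. The poster does not describe how translation or synthesis is implemented, so everything here is an assumption: translation is reduced to a lookup table, synthesis is replaced by picking the corpus sequence that covers the most translated calls, and the class name, mapping entries, and corpus are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Toy two-stage refactoring sketch. All names and data are hypothetical,
// and neither stage is the authors' actual technique.
class RefactorSketch {

    // Stage 1 (translation): map each source call to a target call
    // when a mapping is known; calls without a mapping are skipped.
    static List<String> translate(List<String> sourceCalls, Map<String, String> mapping) {
        List<String> translated = new ArrayList<>();
        for (String call : sourceCalls) {
            String target = mapping.get(call);
            if (target != null) {
                translated.add(target);
            }
        }
        return translated;
    }

    // Stage 2 (synthesis stand-in): return the corpus sequence that
    // contains the largest number of translated calls.
    static List<String> synthesize(List<String> translated, List<List<String>> corpus) {
        List<String> best = List.of();
        int bestScore = -1;
        for (List<String> candidate : corpus) {
            int score = 0;
            for (String call : translated) {
                if (candidate.contains(call)) {
                    score++;
                }
            }
            if (score > bestScore) {
                bestScore = score;
                best = candidate;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Hypothetical mapping and corpus, named after the calls in the code examples.
        Map<String, String> mapping = Map.of(
            "HtmlCleaner.clean", "Jsoup.parse",
            "TagNode.getAttributeByName", "Element.attr");
        List<List<String>> corpus = List.of(
            List.of("Jsoup.parse", "Document.select", "Elements.first", "Element.attr"),
            List.of("Jsoup.parse", "Document.title"));
        List<String> input = List.of("HtmlCleaner.clean", "TagNode.getAttributeByName");
        System.out.println(synthesize(translate(input, mapping), corpus));
    }
}
```

In this toy setup only two of the input calls have a mapping, and the synthesis step supplies the remaining calls of the target sequence from the corpus.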
Main Results
- Refactoring accuracy on various input API call sequences
- Accuracy: the percentage of generated API calls that are correct
Accuracy Chart
Bar chart showing "Accuracy w/o params" and "Accuracy" for the following benchmark tasks:
CSV read, CSV write, CSV database, CSV delimiter, email login, email check, email send, email delete, FTP list, FTP login, FTP upload, FTP download, FTP delete, HTML scraping, HTML add node, HTML rm attr, HTML parse, HTML title, HTML write, HTTP get, HTTP post, HTTP server, NLP sentence, NLP token, NLP tag, NLP stem, ML classification, ML regression, ML cluster, ML neural network, graphics, gui, pdf read, pdf write, word read, word write
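The two metrics in the chart can be sketched as follows. The poster does not say how a generated call is matched against the expected one, so this sketch assumes a position-by-position comparison against a reference sequence, with "Accuracy w/o params" comparing only the part of each call before the opening parenthesis. The class and method names are hypothetical.

```java
import java.util.List;

// Hypothetical accuracy computation; positional matching is an assumption.
class AccuracySketch {

    // Strip the argument list: "doc.select(selector)" -> "doc.select".
    static String nameOnly(String call) {
        int paren = call.indexOf('(');
        return paren < 0 ? call : call.substring(0, paren);
    }

    // Percentage of generated calls that equal the reference call
    // at the same position, optionally ignoring parameters.
    static double accuracy(List<String> generated, List<String> reference, boolean ignoreParams) {
        if (generated.isEmpty()) {
            return 0.0;
        }
        int correct = 0;
        for (int i = 0; i < generated.size() && i < reference.size(); i++) {
            String g = ignoreParams ? nameOnly(generated.get(i)) : generated.get(i);
            String r = ignoreParams ? nameOnly(reference.get(i)) : reference.get(i);
            if (g.equals(r)) {
                correct++;
            }
        }
        return 100.0 * correct / generated.size();
    }

    public static void main(String[] args) {
        List<String> reference = List.of(
            "Jsoup.parse(content)", "doc.select(selector)", "links.first()", "link.attr(attr)");
        // Same calls as the reference, but with one wrong argument.
        List<String> generated = List.of(
            "Jsoup.parse(content)", "doc.select(attr)", "links.first()", "link.attr(attr)");
        System.out.println(accuracy(generated, reference, true));  // prints 100.0
        System.out.println(accuracy(generated, reference, false)); // prints 75.0
    }
}
```

Under these assumptions, a sequence with the right methods but one wrong argument scores 100% without parameters and 75% with them, which is the kind of gap the two bars per task would show.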
Limitations
Our refactoring method might not work as expected:
- Inaccurate API translation
  - HTML Writing
  - Word Reading/Writing
  - GUI
- Long input API sequence
  - Sending Email
  - PDF Writing
Limitation Diagram
- A() --> x()
- B() --> y()
- C() (translation) --> z()
- D()
- E()
- F()
- G()
Conclusion
- Effective method that automates the process of API refactoring
- Combination of two techniques
  - API call translation
  - API call sequence synthesizer
- Does not work when
  - Terminologies are different
  - Input sequence is too long