type: academic
category: academic
person: Yanxin Lu
date: 2018
source: lu_poster.pdf

Corpus-Driven API Refactoring

Yanxin Lu, Swarat Chaudhuri, Christopher Jermaine
Department of Computer Science, Rice University

Introduction

  • Rewriting or refactoring programs improves software maintainability.
  • Application programming interfaces (APIs) play a key role in everyday programming.
  • Goal: automatically refactor an API call sequence in two steps
    • Translate the input API calls
    • Synthesize a complete API call sequence

Code Example (Before - HtmlCleaner)

HtmlCleaner cleaner = new HtmlCleaner();
TagNode node = cleaner.clean(content);
TagNode[] links = node.getElementsHavingAttribu...
TagNode link = links[0];
String href = link.getAttributeByName(attr);

Code Example (After - Jsoup)

Document doc = Jsoup.parse(content);
Elements links = doc.select(selector);
Element link = links.first();
String href = link.attr(attr);

Methods

  • Translate the input API calls
  • Synthesize a complete API call sequence

Algorithm Diagram

  • A() --> a() --> a()
  • B() --> b() --> b()
  • C() (API translation) --> c() (API synthesis) --> c()
  • D() --> d() --> d()
  • E() --> e() --> e()
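
The two stages in the diagram can be sketched as follows. This is a minimal illustration, not the authors' implementation: the lookup table `MAPPING` and the small `CORPUS` of target-API call sequences are hypothetical stand-ins for the translation and synthesis components, and every class and method name is invented for the example. Stage one translates each input call that has a mapping; stage two picks the corpus sequence that covers the most translated calls, which fills in calls the translation missed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: a lookup table and a tiny corpus stand in for
// the translation and synthesis components (both are assumptions).
class RefactorSketch {
    // Hypothetical mapping from HtmlCleaner-style calls to Jsoup-style calls.
    static final Map<String, String> MAPPING = Map.of(
        "HtmlCleaner.clean", "Jsoup.parse",
        "TagNode.getAttributeByName", "Element.attr");

    // Hypothetical corpus of complete call sequences in the target API.
    static final List<List<String>> CORPUS = List.of(
        List.of("Jsoup.parse", "Document.title"),
        List.of("Jsoup.parse", "Document.select", "Elements.first", "Element.attr"));

    // Stage 1: translate every input call that has a known mapping.
    static List<String> translate(List<String> input) {
        List<String> out = new ArrayList<>();
        for (String call : input) {
            String mapped = MAPPING.get(call);
            if (mapped != null) out.add(mapped);
        }
        return out;
    }

    // Stage 2: return the corpus sequence that contains the most translated calls.
    static List<String> synthesize(List<String> translated) {
        List<String> best = List.of();
        int bestScore = -1;
        for (List<String> candidate : CORPUS) {
            int score = 0;
            for (String call : translated) {
                if (candidate.contains(call)) score++;
            }
            if (score > bestScore) {
                bestScore = score;
                best = candidate;
            }
        }
        return best;
    }

    // Full pipeline: translation followed by synthesis.
    static List<String> refactor(List<String> input) {
        return synthesize(translate(input));
    }
}
```

Calling `RefactorSketch.refactor` on an input that contains `HtmlCleaner.clean` and `TagNode.getAttributeByName` returns the four-call Jsoup sequence from the corpus, even though only two of its calls came from translation.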

Main Results

  • Refactoring accuracy measured on a range of input API call sequences
  • Accuracy: percentage of generated API calls that are correct

Accuracy Chart

Bar chart showing "Accuracy w/o params" and "Accuracy" for the following benchmark tasks:

CSV read, CSV write, CSV database, CSV delimiter, email login, email check, email send, email delete, FTP list, FTP login, FTP upload, FTP download, FTP delete, HTML scraping, HTML add node, HTML rm attr, HTML parse, HTML title, HTML write, HTTP get, HTTP post, HTTP server, NLP sentence, NLP token, NLP tag, NLP stem, ML classification, ML regression, ML cluster, ML neural network, graphics, GUI, PDF read, PDF write, Word read, Word write
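
The two chart series can be illustrated with a small scoring sketch. The poster does not state the matching rule, so the code assumes a position-wise comparison against a reference sequence and divides by the number of generated calls; `AccuracySketch` and its methods are hypothetical names. "Accuracy w/o params" is modeled by stripping the argument list before comparing.

```java
import java.util.List;

// Illustrative sketch only: position-wise matching is an assumption,
// since the poster does not define how correctness is judged.
class AccuracySketch {
    // Drop the argument list: "Jsoup.parse(content)" becomes "Jsoup.parse".
    static String stripParams(String call) {
        int paren = call.indexOf('(');
        return paren < 0 ? call : call.substring(0, paren);
    }

    // Percentage of generated calls equal to the reference call at the same position.
    static double accuracy(List<String> generated, List<String> reference, boolean ignoreParams) {
        if (generated.isEmpty()) return 0.0;
        int correct = 0;
        for (int i = 0; i < generated.size() && i < reference.size(); i++) {
            String g = ignoreParams ? stripParams(generated.get(i)) : generated.get(i);
            String r = ignoreParams ? stripParams(reference.get(i)) : reference.get(i);
            if (g.equals(r)) correct++;
        }
        return 100.0 * correct / generated.size();
    }
}
```

Under these assumptions, a generated sequence whose method names all match but whose `select` call has the wrong argument scores 100 without parameters and 75 with them.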

Limitations

Our refactoring method can fail in two situations:

  • Inaccurate API translation
    • HTML Writing
    • Word Reading/Writing
    • GUI
  • Long input API sequence
    • Sending Email
    • PDF Writing

Limitation Diagram

  • A() --> x()
  • B() --> y()
  • C() (translation) --> z()
  • D()
  • E()
  • F()
  • G()
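
One way to read the diagram is that only part of a long input sequence receives a translation. The sketch below measures that as a coverage fraction, which could be used to flag inputs that are likely to refactor poorly. It is an illustration under that reading, not part of the poster's method; `CoverageSketch`, its methods, and the mapping passed in are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: reports how much of an input sequence
// a given (hypothetical) translation mapping can handle.
class CoverageSketch {
    // Input calls for which the mapping has no translation, in input order.
    static List<String> untranslated(List<String> input, Map<String, String> mapping) {
        List<String> missing = new ArrayList<>();
        for (String call : input) {
            if (!mapping.containsKey(call)) missing.add(call);
        }
        return missing;
    }

    // Fraction of input calls that have a translation, between 0.0 and 1.0.
    static double coverage(List<String> input, Map<String, String> mapping) {
        if (input.isEmpty()) return 1.0;
        return 1.0 - (double) untranslated(input, mapping).size() / input.size();
    }
}
```

For the seven calls in the diagram with translations only for `A()`, `B()`, and `C()`, `untranslated` returns `D()` through `G()` and `coverage` is 3/7.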

Conclusion

  • An effective method for automating API refactoring
  • Combines two techniques
    • API call translation
    • API call sequence synthesis
  • Does not work when
    • The two APIs use different terminology
    • The input sequence is too long
