Discovering Likely Mappings between APIs using Text Mining by Rahul Pandita, Raoul Praful Jetley, Sithu D Sudarsan, Laurie Williams.
Abstract:
Developers often release different versions of their applications to support various platform/programming-language application programming interfaces (APIs). To migrate an application written using one API (source) to another API (target), a developer must know how the methods in the source API map to the methods in the target API. Given a typical platform or language exposes a large number of API methods, manually writing API mappings is prohibitively resource-intensive and may be error prone. Recently, researchers proposed to automate the mapping process by mining API mappings from existing codebases. However, these approaches require as input a manually ported (or at least functionally similar) code across source and target APIs. To address the shortcoming, this paper proposes TMAP: Text Mining based approach to discover likely API mappings using the similarity in the textual description of the source and target API documents. To evaluate our approach, we used TMAP to discover API mappings for 15 classes across: 1) Java and C# API, and 2) Java ME and Android API. We compared the discovered mappings with state-of-the-art source code analysis based approaches: Rosetta and StaMiner. Our results indicate that TMAP on average found relevant mappings for 57% more methods compared to previous approaches. Furthermore, our results also indicate that TMAP on average found exact mappings for 6.5 more methods per class with a maximum of 21 additional exact mappings for a single class as compared to previous approaches.
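The core idea in the abstract is to rank target-API methods by how similar their documentation is to a source-API method's documentation. It can be sketched with plain TF-IDF weighting and cosine similarity. This is a minimal sketch, not TMAP's actual pipeline. The function names (`tokenize`, `tfidf_vectors`, `cosine`, `likely_mappings`) are hypothetical, and the sample descriptions are short texts in the style of the Java and .NET docs, written for illustration rather than quoted.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase a description and split it into alphanumeric tokens (no stemming)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs: dict) -> dict:
    """Return a sparse TF-IDF vector (term -> weight) for every description in `docs`."""
    tokenized = {key: tokenize(text) for key, text in docs.items()}
    n_docs = len(tokenized)
    # Document frequency: how many descriptions contain each term at least once.
    df = Counter()
    for tokens in tokenized.values():
        df.update(set(tokens))
    vectors = {}
    for key, tokens in tokenized.items():
        tf = Counter(tokens)
        # Smoothed IDF: terms found in every description (e.g. "the") get weight 1.
        vectors[key] = {
            term: count * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        }
    return vectors


def cosine(a: dict, b: dict) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def likely_mappings(source: dict, target: dict, top_k: int = 1) -> dict:
    """For each source method, return the top_k target methods by description similarity."""
    # Weight terms over the combined corpus so both APIs share one IDF table.
    corpus = {("src", name): text for name, text in source.items()}
    corpus.update({("tgt", name): text for name, text in target.items()})
    vectors = tfidf_vectors(corpus)
    result = {}
    for src_name in source:
        scores = [
            (tgt_name, cosine(vectors[("src", src_name)], vectors[("tgt", tgt_name)]))
            for tgt_name in target
        ]
        scores.sort(key=lambda pair: pair[1], reverse=True)
        result[src_name] = scores[:top_k]
    return result


# Illustrative descriptions (not verbatim documentation).
source = {
    "java.util.ArrayList.add": "Appends the specified element to the end of this list.",
    "java.util.ArrayList.size": "Returns the number of elements in this list.",
}
target = {
    "List<T>.Add": "Adds an object to the end of the list.",
    "List<T>.Count": "Gets the number of elements contained in the list.",
    "List<T>.Clear": "Removes all elements from the list.",
}

for src, candidates in likely_mappings(source, target).items():
    print(src, "->", candidates[0][0])
# prints:
# java.util.ArrayList.add -> List<T>.Add
# java.util.ArrayList.size -> List<T>.Count
```

Because nothing is stemmed, "element" and "elements" count as different words here, which is one reason a real system would normalize vocabulary first. Either way, the output is a list of *likely* candidates that a developer still has to confirm.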
From the introduction:
Our intuition is: since the API documents are targeted towards developers, there may be an overlap in the language used to describe similar concepts that can be leveraged.
There are a number of insights in this paper, but this statement of intuition alone is enough to justify reading it.
What if, instead of API documents, we were talking about topics that had been written for developers? Isn’t it fair to assume that the same concepts would be described with the same or similar vocabularies?
The evidence in this paper certainly suggests that is the case.
Of course, merging rules would have to allow for “likely” merging of topics, which could then be refined by readers.
Readers would, hopefully, contribute more information to make “likely” merging more “precise” (at least in their view).
That’s one of the problems with most semantic technologies, isn’t it?
“Precision” can only be defined from a point of view, which by definition varies from user to user.
What would it look like to allow users to determine their desired degree of semantic precision?
Suggestions?