Analyzing 2.3 Million Maven Dependencies to Reveal an Essential Core in APIs
This paper addresses the following question: does a small, essential, core set of API members emerges from the actual usage of the API by client applications? To investigate this question, we study the 99 most popular libraries available in Maven Central and the 865,560 client programs that declare dependencies towards them, summing up to 2.3M dependencies. Our key findings are as follows: 43.5 not used in the bytecode; all APIs contain a large part of rarely used types and a few frequently used types, and the ratio varies according to the nature of the API, its size and its design; we can systematically extract a reuse-core from APIs that is sufficient to provide for most clients, the median size of this subset is 17 novel both in its scale and its findings about unused dependencies and the reuse-core of APIs. Our results provide concrete insights to improve Maven's build process with a mechanism to detect unused dependencies. They also support the need to reduce the size of APIs to facilitate API learning and maintenance.
READ FULL TEXT