A Library for Representing Python Programs as Graphs for Machine Learning

08/15/2022
by David Bieber, et al.

Graph representations of programs are a central element of much research on machine learning for code. We introduce python_graphs, an open-source Python library that applies static analysis to construct graph representations of Python programs suitable for training machine learning models. The library supports the construction of control-flow graphs, data-flow graphs, and composite “program graphs” that combine control-flow, data-flow, syntactic, and lexical information about a program. We present the capabilities and limitations of the library, perform a case study applying the library to millions of competitive programming submissions, and showcase the library's utility for machine learning research.
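
To give a rough sense of what turning a Python program into a graph can look like, the following minimal sketch builds only the syntactic part of such a graph, using the standard-library `ast` module. It is not the python_graphs API, which this abstract does not show; the function name `build_syntax_graph` and the node/edge layout are assumptions made for illustration. Control-flow and data-flow edges, which the library also constructs, are omitted here.

```python
import ast


def build_syntax_graph(source):
    """Build a syntax-only graph from Python source (illustrative sketch).

    Hypothetical helper, not part of python_graphs.
    Returns (nodes, edges) where nodes[i] is the AST node type name of
    node i and edges is a list of (parent_id, child_id) pairs.
    """
    tree = ast.parse(source)
    nodes = []
    edges = []

    def visit(node, parent_id):
        # Assign a fresh integer id on every visit, so AST objects that
        # the parser may share (such as Load/Store contexts) still appear
        # as separate graph nodes.
        node_id = len(nodes)
        nodes.append(type(node).__name__)
        if parent_id is not None:
            edges.append((parent_id, node_id))
        for child in ast.iter_child_nodes(node):
            visit(child, node_id)

    visit(tree, None)
    return nodes, edges


if __name__ == "__main__":
    example = "x = 1\ny = x + 2\n"
    node_types, child_edges = build_syntax_graph(example)
    for parent, child in child_edges:
        print(f"{node_types[parent]} -> {node_types[child]}")
```

Integer node ids and a flat edge list are chosen here because that layout converts easily into the adjacency structures that graph neural network toolkits typically expect. A fuller program graph would add further edge types over the same node set.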
