Dataflow-Guided Neuro-Symbolic Language Models for Type Inference

Publication
The Forty-second International Conference on Machine Learning (ICML)

Abstract

Language Models (LMs) are increasingly used for type inference, aiding in error detection and software development. Some real-life deployments of LMs require the model to run on local machines to safeguard the software's intellectual property, a setting that often limits the size of the LMs that can be used. We present Nester, a neuro-symbolic approach that enhances LMs for type inference by integrating symbolic learning without increasing model size. Nester breaks type inference into sub-tasks based on the data and control flow of the input code, encoding them as a modular high-level program. This program executes multi-step actions, such as evaluating expressions and analyzing conditional branches of the target program, combining static typing with LMs to infer potential types. Evaluated on the Python ManyTypes4Py dataset, Nester outperforms two state-of-the-art type inference methods, HiTyper and TypeGen, achieving 70.7% Top-1 Exact Match, 18.3% and 3.6% higher than HiTyper and TypeGen, respectively. On complex type annotations built with constructs like typing.Optional and typing.Union, Nester achieves 53.2%, surpassing TypeGen by 17.9%.
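
To illustrate the kind of case the abstract describes, the sketch below (a hypothetical example constructed for this page, not taken from the paper) shows a Python function whose return type can only be recovered by reasoning over its control flow: the returned variable is None on one path and an int on another, so a dataflow-aware analysis should infer typing.Optional[int].

    from typing import List, Optional

    def find_index(items: List[str], target: str) -> Optional[int]:
        # result starts as None (the "not found" path) ...
        result = None
        for i, item in enumerate(items):
            if item == target:
                # ... and becomes an int on the "found" path.
                result = i
                break
        # Merging the two branches yields typing.Optional[int],
        # the sort of complex annotation the evaluation reports on.
        return result

A purely local predictor that looks only at the return statement sees just the name result; combining branch analysis with an LM's naming and usage cues is what the described decomposition into sub-tasks is meant to capture.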

Zheng Wang
Professor of Intelligent Software Technology