All notable changes to the LibSA4Py tool will be documented in this file. The format is based on Keep a Changelog and this project adheres to Semantic Versioning.
- Adds the `CountParametricTypeDepth` visitor to count the depth of parametric types.
- Fixed an issue where type annotations for local variables with the same name in methods were not applied (`TypeApplier`).
- Adding the `--tc` CLI arg for the `process` command to type-check source files in Python projects using mypy.
- Supporting qualified names for classes by adding the `q_name` field to the JSON output.
- Adding line and column no. info for the start and end of:
  - Function definitions
  - Class definitions
  - Module-level variables
  - Class variables
  - Functions' local variables
- Improved the performance of the NLP preprocessing quite significantly (~10x).
- Adds a CST visitor (`TypeAnnotationCounter`) for counting the total number of type annotations in source files.
- When applying types to functions' parameters, parameters whose default value is a lambda caused an exception when matching functions' signatures.
- Python 3.6 is no longer supported.
- Integrated Pyre into the pipeline to infer the types of variables in the source code files of given projects.
- Extracting the usage of module constants, class variables, and local variables across a source code file (contextual hints).
- Adding extensive unit tests for various features and components of LibSA4Py.
- Minor improvement to the `SpaceAdder` code transformer for handling some rare edge cases.
- The `--l` CLI arg for the `merge` option to process a specified number of projects.
- Adding a CST transformer to reduce the depth of parametric types.
- Extracting type annotations with a qualified name from source code files.
- Adding the `apply` command to apply inferred type annotations to source code files.
- Adding a qualified name for functions in the JSON field `q_name`.
- Adding line and column no. for the start and end of functions in the JSON field `fn_lc`.
- Malformed output sequences containing string literal types, i.e., `Literal['Blah \n blah']`.
- Malformed output sequence for the equality operator (i.e., `==`) in comparisons.
- Extracting self variables in multi-assignment expressions like `self.x,self.y=1,2`.
- Replacing imaginary numbers with the `[numeric]` token.
- Removing the unused `input_projects` argument from the `Pipeline` class.
- Parallel pipeline to speed up processing a Python dataset using all CPU cores
- Storing processed Python projects in JSON-formatted files.
- Excluding duplicate files of a dataset from processing.
- Adding the file set (train/test/validation) to a processed project, if given.
- Applying standard NLP operations on identifiers in a module.
- Excluding cached projects before running the pipeline if specified.
- Throwing `NullProjectException` for projects that have no source code files.
- Creating a normalized Seq2Seq representation of a source code file aligned with a sequence of identifiers' types.
- Extracting import names of a module.
- Extracting the names of global variables in a module with their type annotations (if present).
- Calculating type annotation coverage for the whole project and its source code files.
- Extracting the names of classes in a module.
- Extracting the names of class variables and their type annotations (if present).
- Extracting the names of functions in a module or in a class.
- Extracting the names of functions' parameters and their type annotations (if present).
- Extracting return expressions in functions.
- Extracting the occurrence of a function's parameters in the function's body.
- Extracting the return type of functions (if present).
- Extracting docstrings for functions' parameters and their return types.
- Extracting short and long descriptions of functions in their docstring.
- Adding space around source code tokens for better tokenization.
- Removing comments and docstrings from source code for its normalized Seq2Seq representation.
- Removing string literals from source code for its normalized Seq2Seq representation.
- Removing numeric literals from source code for its normalized Seq2Seq representation.
- Removing type annotations from source code for its normalized Seq2Seq representation.
- Propagating the type of functions' parameters in the function body and module-level constants.
- A special case where uninitialized variables with types caused exceptions.
- A case where variables in a tuple couldn't be extracted in multiple assignments.
- Handling nested tuples in multiple assignments for extracting var names.
- A case where a type-annotated but uninitialized class attribute was not handled when removing its type.
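
The sketch below illustrates what "depth of parametric types" can mean, as counted by the `CountParametricTypeDepth` visitor listed above. It uses Python's standard `ast` module rather than LibCST. Both the helper `parametric_type_depth` and its depth definition are hypothetical. Under that definition, a plain name has depth 0 and each level of subscription adds 1. This is not LibSA4Py's implementation, and the tool may count depth differently.

```python
import ast


def parametric_type_depth(annotation: str) -> int:
    """Return the nesting depth of subscripted (parametric) types in an annotation.

    Hypothetical definition, not necessarily LibSA4Py's: `int` has depth 0,
    `List[int]` depth 1, `List[Dict[str, int]]` depth 2.
    Assumes Python 3.9+, where `ast.Subscript.slice` holds the expression directly.
    """
    tree = ast.parse(annotation, mode="eval")
    return _depth(tree.body)


def _depth(node: ast.expr) -> int:
    if isinstance(node, ast.Subscript):
        # Type arguments are a Tuple for `Dict[str, int]`, a single expression otherwise.
        args = node.slice.elts if isinstance(node.slice, ast.Tuple) else [node.slice]
        return 1 + max((_depth(arg) for arg in args), default=0)
    if isinstance(node, (ast.Tuple, ast.List)):
        # e.g. the parameter list in `Callable[[int, str], bool]` adds no depth itself.
        return max((_depth(elt) for elt in node.elts), default=0)
    # Names, attributes (`typing.List`), and constants (`...`, string literals).
    return 0


if __name__ == "__main__":
    for ann in ["int", "List[Dict[str, int]]", "Optional[List[Tuple[int, ...]]]"]:
        print(ann, parametric_type_depth(ann))
```

A transformer that reduces depth could reuse the same recursion and replace any subscript beyond a chosen maximum with its base name. The cut-off policy here is an assumption, not the one the CST transformer above necessarily uses.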