Type4Py: Deep Similarity Learning-Based Type Inference for Python

Overview

Type4Py: Deep Similarity Learning-Based Type Inference for Python

GH Workflow

This repository contains the implementation of Type4Py and instructions for re-producing the results of the paper.

Dataset

For Type4Py, we use the ManyTypes4Py dataset. You can download the latest version of the dataset here. Also, note that the dataset is already de-duplicated.

Code De-deduplication

If you want to use your own dataset, it is essential to de-duplicate the dataset by using a tool like CD4Py.

Installation Guide

Requirements

  • Linux-based OS
  • Python 3.5 or newer
  • An NVIDIA GPU with CUDA support

Quick Install

git clone https://github.com/saltudelft/type4py.git && cd type4py
pip install .

Usage Guide

Follow the below steps to train and evaluate the Type4Py model.

1. Extraction

NOTE: Skip this step if you're using the ManyTypes4Py dataset.

$ type4py extract --c $DATA_PATH --o $OUTPUT_DIR --d $DUP_FILES --w $CORES

Description:

  • $DATA_PATH: The path to the Python corpus or dataset.
  • $OUTPUT_DIR: The path to store processed projects.
  • $DUP_FILES: The path to the duplicate files, i.e., the *.jsonl.gz file produced by CD4Py. [Optional]
  • $CORES: Number of CPU cores to use for processing projects.

2. Preprocessing

$ type4py preprocess --o $OUTPUT_DIR --l $LIMIT

Description:

  • $OUTPUT_DIR: The path that was used in the first step to store processed projects. For the MT4Py dataset, use the directory in which the dataset is extracted.
  • $LIMIT: The number of projects to be processed. [Optional]

3. Vectorizing

$ type4py vectorize --o $OUTPUT_DIR

Description:

  • $OUTPUT_DIR: The path that was used in the previous step to store processed projects.

4. Learning

$ type4py learn --o $OUTPUT_DIR --c --p $PARAM_FILE

Description:

  • $OUTPUT_DIR: The path that was used in the previous step to store processed projects.

  • --c: Trains the complete model. Use type4py learn -h to see other configurations.

  • --p $PARAM_FILE: The path to user-provided hyper-parameters for the model. See this file as an example. [Optional]

5. Testing

$ type4py predict --o $OUTPUT_DIR --c

Description:

  • $OUTPUT_DIR: The path that was used in the first step to store processed projects.
  • --c: Predicts using the complete model. Use type4py predict -h to see other configurations.

6. Evaluating

$ type4py eval --o $OUTPUT_DIR --t c --tp 10

Description:

  • $OUTPUT_DIR: The path that was used in the first step to store processed projects.
  • --t: Evaluates the model considering different prediction tasks. E.g., --t c considers all predictions tasks, i.e., parameters, return, and variables. [Default: c]
  • --tp 10: Considers Top-10 predictions for evaluation. For this argument, You can choose a positive integer between 1 and 10. [Default: 10]

Use type4py eval -h to see other options.

Converting Type4Py to ONNX

To convert the pre-trained Type4Py model to the ONNX format, use the following command:

$ type4py to_onnx --o $OUTPUT_DIR

Description:

  • $OUTPUT_DIR: The path that was used in the usage section to store processed projects and the model.

VSCode Extension

vsm-version

Type4Py can be used in VSCode, which provides ML-based type auto-completion for Python files. The Type4Py's VSCode extension can be installed from the VS Marketplace here.

Type4Py Server

GH Workflow

The Type4Py server is deployed on our server, which exposes a public API and powers the VSCode extension. However, if you would like to deploy the Type4Py server on your own machine, you can adapt the server code here. Also, please feel free to reach out to us for deployment, using the pre-trained Type4Py model and how to train your own model by creating an issue.

Citing Type4Py

@article{mir2021type4py,
  title={Type4Py: Deep Similarity Learning-Based Type Inference for Python},
  author={Mir, Amir M and Latoskinas, Evaldas and Proksch, Sebastian and Gousios, Georgios},
  journal={arXiv preprint arXiv:2101.04470},
  year={2021}
}
Owner
Software Analytics Lab
Software Analytics Lab @ TU Delft
Software Analytics Lab
This is the source code for our ICLR2021 paper: Adaptive Universal Generalized PageRank Graph Neural Network.

GPRGNN This is the source code for our ICLR2021 paper: Adaptive Universal Generalized PageRank Graph Neural Network. Hidden state feature extraction i

Jianhao 92 Jan 03, 2023
SeqTR: A Simple yet Universal Network for Visual Grounding

SeqTR This is the official implementation of SeqTR: A Simple yet Universal Network for Visual Grounding, which simplifies and unifies the modelling fo

seanZhuh 76 Dec 24, 2022
Official PyTorch implementation of "IntegralAction: Pose-driven Feature Integration for Robust Human Action Recognition in Videos", CVPRW 2021

IntegralAction: Pose-driven Feature Integration for Robust Human Action Recognition in Videos Introduction This repo is official PyTorch implementatio

Gyeongsik Moon 29 Sep 24, 2022
Machine Learning in Asset Management (by @firmai)

Machine Learning in Asset Management If you like this type of content then visit ML Quant site below: https://www.ml-quant.com/ Part One Follow this l

Derek Snow 1.5k Jan 02, 2023
LAnguage Model Analysis

LAMA: LAnguage Model Analysis LAMA is a probe for analyzing the factual and commonsense knowledge contained in pretrained language models. The dataset

Meta Research 960 Jan 08, 2023
A particular navigation route using satellite feed and can help in toll operations & traffic managemen

How about adding some info that can quanitfy the stress on a particular navigation route using satellite feed and can help in toll operations & traffic management The current analysis is on the satel

Ashish Pandey 1 Feb 14, 2022
Implementation for our ICCV 2021 paper: Dual-Camera Super-Resolution with Aligned Attention Modules

DCSR: Dual Camera Super-Resolution Implementation for our ICCV 2021 oral paper: Dual-Camera Super-Resolution with Aligned Attention Modules paper | pr

Tengfei Wang 110 Dec 20, 2022
[Official] Exploring Temporal Coherence for More General Video Face Forgery Detection(ICCV 2021)

Exploring Temporal Coherence for More General Video Face Forgery Detection(FTCN) Yinglin Zheng, Jianmin Bao, Dong Chen, Ming Zeng, Fang Wen Accepted b

57 Dec 28, 2022
Pytorch Implementation of rpautrat/SuperPoint

SuperPoint-Pytorch (A Pure Pytorch Implementation) SuperPoint: Self-Supervised Interest Point Detection and Description Thanks This work is based on:

76 Dec 27, 2022
This project uses reinforcement learning on stock market and agent tries to learn trading. The goal is to check if the agent can learn to read tape. The project is dedicated to hero in life great Jesse Livermore.

Reinforcement-trading This project uses Reinforcement learning on stock market and agent tries to learn trading. The goal is to check if the agent can

Deepender Singla 1.4k Dec 22, 2022
PyTorch implementation of "Supervised Contrastive Learning" (and SimCLR incidentally)

PyTorch implementation of "Supervised Contrastive Learning" (and SimCLR incidentally)

Yonglong Tian 2.2k Jan 08, 2023
Traffic4D: Single View Reconstruction of Repetitious Activity Using Longitudinal Self-Supervision

Traffic4D: Single View Reconstruction of Repetitious Activity Using Longitudinal Self-Supervision Project | PDF | Poster Fangyu Li, N. Dinesh Reddy, X

25 Dec 21, 2022
Group project for MFIN7036. Our goal is to predict firm profitability with text-based competition measures.

NLP_0-project Group project for MFIN7036. Our goal is to predict firm profitability with text-based competition measures1. We are a "democratic" and c

3 Mar 16, 2022
Official PyTorch Implementation of Hypercorrelation Squeeze for Few-Shot Segmentation, arXiv 2021

Hypercorrelation Squeeze for Few-Shot Segmentation This is the implementation of the paper "Hypercorrelation Squeeze for Few-Shot Segmentation" by Juh

Juhong Min 165 Dec 28, 2022
RL-driven agent playing tic-tac-toe on starknet against challengers.

tictactoe-on-starknet RL-driven agent playing tic-tac-toe on starknet against challengers. GUI reference: https://pythonguides.com/create-a-game-using

21 Jul 30, 2022
Privacy as Code for DSAR Orchestration: Privacy Request automation to fulfill GDPR, CCPA, and LGPD data subject requests.

Meet Fidesops: Privacy as Code for DSAR Orchestration A part of the greater Fides ecosystem. ⚡ Overview Fidesops (fee-dez-äps, combination of the Lati

Ethyca 44 Dec 06, 2022
Learning hierarchical attention for weakly-supervised chest X-ray abnormality localization and diagnosis

Hierarchical Attention Mining (HAM) for weakly-supervised abnormality localization This is the official PyTorch implementation for the HAM method. Pap

Xi Ouyang 22 Jan 02, 2023
PyAF is an Open Source Python library for Automatic Time Series Forecasting built on top of popular pydata modules.

PyAF (Python Automatic Forecasting) PyAF is an Open Source Python library for Automatic Forecasting built on top of popular data science python module

CARME Antoine 405 Jan 02, 2023
Disentangled Lifespan Face Synthesis

Disentangled Lifespan Face Synthesis Project Page | Paper Demo on Colab Preparation Please follow this github to prepare the environments and dataset.

何森 50 Sep 20, 2022
A PyTorch Implementation of Single Shot Scale-invariant Face Detector.

S³FD: Single Shot Scale-invariant Face Detector A PyTorch Implementation of Single Shot Scale-invariant Face Detector. Eval python wider_eval_pytorch.

carwin 235 Jan 07, 2023