A framework for Quantification written in Python

Last update: Dec 14, 2022

Overview

QuaPy

QuaPy is an open source framework for quantification (a.k.a. supervised prevalence estimation, or learning to quantify) written in Python.

QuaPy is based on the concept of "data sample", and provides implementations of the most important aspects of the quantification workflow, such as (baseline and advanced) quantification methods, quantification-oriented model selection mechanisms, evaluation measures, and the evaluation protocols used for evaluating quantification methods. QuaPy also makes available commonly used datasets, and offers visualization tools for facilitating the analysis and interpretation of the experimental results.

Installation

pip install quapy

A quick example:

The following script fetches a dataset of tweets, trains, applies, and evaluates a quantifier based on the Adjusted Classify & Count quantification method, using, as the evaluation measure, the Mean Absolute Error (MAE) between the predicted and the true class prevalence values of the test set.

import quapy as qp
from sklearn.linear_model import LogisticRegression

dataset = qp.datasets.fetch_twitter('semeval16')

# create an "Adjusted Classify & Count" quantifier
model = qp.method.aggregative.ACC(LogisticRegression())
model.fit(dataset.training)

estim_prevalence = model.quantify(dataset.test.instances)
true_prevalence  = dataset.test.prevalence()

error = qp.error.mae(true_prevalence, estim_prevalence)

print(f'Mean Absolute Error (MAE)={error:.3f}')
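
To make the idea behind "Adjusted Classify & Count" more concrete, the following is a minimal NumPy sketch of the adjustment step, not QuaPy's implementation. It assumes hard class predictions and a held-out validation set; the function names misclassification_matrix and adjusted_classify_and_count, as well as the toy counts, are hypothetical and chosen only for illustration.

```python
import numpy as np

def misclassification_matrix(true_labels, predicted_labels, n_classes):
    """Estimate P(predicted = i | true = j) on held-out data (column j = true class j)."""
    matrix = np.zeros((n_classes, n_classes))
    for true, pred in zip(true_labels, predicted_labels):
        matrix[pred, true] += 1
    return matrix / matrix.sum(axis=0, keepdims=True)

def adjusted_classify_and_count(matrix, test_predictions):
    """Correct the raw 'classify and count' estimate by solving matrix @ p = cc."""
    n_classes = matrix.shape[0]
    cc = np.bincount(test_predictions, minlength=n_classes) / len(test_predictions)
    try:
        adjusted = np.linalg.solve(matrix, cc)
    except np.linalg.LinAlgError:
        return cc  # singular matrix: fall back to the unadjusted estimate
    adjusted = np.clip(adjusted, 0, 1)
    return adjusted / adjusted.sum()

# Toy validation data: class 0 is recognised 80% of the time, class 1 70% of the time.
val_true = np.repeat([0, 1], [100, 100])
val_pred = np.concatenate([np.repeat([0, 1], [80, 20]), np.repeat([0, 1], [30, 70])])
matrix = misclassification_matrix(val_true, val_pred, n_classes=2)

# Toy test predictions produced by the same classifier on a sample whose
# true prevalence is [0.1, 0.9]: plain counting yields [0.35, 0.65].
test_pred = np.repeat([0, 1], [350, 650])
raw_estimate = np.bincount(test_pred, minlength=2) / len(test_pred)
acc_estimate = adjusted_classify_and_count(matrix, test_pred)
print(raw_estimate, acc_estimate)  # the adjusted estimate is approximately [0.1, 0.9]
```

In this toy setting the unadjusted counts are biased towards the training-like balance, while the adjusted estimate recovers the prevalence used to build the test predictions.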

Quantification is useful in scenarios characterized by prior probability shift. If the IID assumption held, there would be little interest in estimating the class prevalence values of the test set, since they would be roughly equal to those of the training set. For this reason, any quantification model should be tested across many samples, including ones whose class prevalence values differ, even substantially, from those found in the training set. QuaPy implements sampling procedures and evaluation protocols that automate this workflow. See the Wiki for detailed examples.
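
The idea of testing across samples with controlled prevalence can be sketched without QuaPy. The snippet below is an illustration only: it uses plain NumPy, a simulated binary classifier that is right 80% of the time, and a hypothetical helper sample_at_prevalence. QuaPy's own sampling protocols are documented in the Wiki and are not reproduced here.

```python
import numpy as np

def sample_at_prevalence(labels, prevalence, size, rng):
    """Return indices of a sample in which class 1 has the requested prevalence."""
    n_pos = int(round(prevalence * size))
    pos_idx = np.flatnonzero(labels == 1)
    neg_idx = np.flatnonzero(labels == 0)
    chosen_pos = rng.choice(pos_idx, size=n_pos, replace=True)
    chosen_neg = rng.choice(neg_idx, size=size - n_pos, replace=True)
    return np.concatenate([chosen_pos, chosen_neg])

rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=5000)           # roughly balanced pool of test items
flip = rng.random(labels.size) < 0.2             # simulated classifier errors (assumption)
predictions = np.where(flip, 1 - labels, labels)

prevalence_grid = np.linspace(0.0, 1.0, 11)      # 0%, 10%, ..., 100% positives
errors = []
for prevalence in prevalence_grid:
    idx = sample_at_prevalence(labels, prevalence, size=500, rng=rng)
    true_prev = np.mean(labels[idx] == 1)
    estimated_prev = np.mean(predictions[idx] == 1)   # plain classify and count
    errors.append(abs(estimated_prev - true_prev))

for prevalence, error in zip(prevalence_grid, errors):
    print(f'prevalence={prevalence:.1f}  absolute error={error:.3f}')
```

With this simulated classifier, the error of plain counting tends to be small near the balanced case and grows as the sample prevalence moves towards 0 or 1, which is why evaluating on a single test set can be misleading.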

Features

Implementation of many popular quantification methods (Classify-&-Count and its variants, Expectation Maximization, quantification methods based on structured output learning, HDy, QuaNet, and quantification ensembles).
Versatile functionality for performing evaluation based on artificial sampling protocols.
Implementation of the most commonly used evaluation metrics (e.g., AE, RAE, SE, KLD, NKLD).
Datasets frequently used in quantification (textual and numeric), including:
- 32 UCI Machine Learning datasets.
- 11 Twitter quantification-by-sentiment datasets.
- 3 product reviews quantification-by-sentiment datasets.
Native support for binary and single-label multiclass quantification scenarios.
Model selection functionality that minimizes quantification-oriented loss functions.
Visualization tools for analysing the experimental results.
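
For readers unfamiliar with the evaluation metrics listed above, here is a small NumPy sketch of how they are commonly defined in the quantification literature. It is not QuaPy's code: the function names, the additive smoothing, and the default eps value are assumptions made for this illustration, and QuaPy's own functions may differ in such details.

```python
import numpy as np

def smooth(prevalence, eps):
    """Additive smoothing so that no class has zero prevalence (assumed scheme)."""
    prevalence = np.asarray(prevalence, dtype=float)
    return (prevalence + eps) / (eps * len(prevalence) + 1)

def absolute_error(true_prev, estim_prev):
    return float(np.mean(np.abs(np.asarray(estim_prev) - np.asarray(true_prev))))

def squared_error(true_prev, estim_prev):
    return float(np.mean((np.asarray(estim_prev) - np.asarray(true_prev)) ** 2))

def relative_absolute_error(true_prev, estim_prev, eps=1e-3):
    t, e = smooth(true_prev, eps), smooth(estim_prev, eps)
    return float(np.mean(np.abs(e - t) / t))

def kld(true_prev, estim_prev, eps=1e-3):
    t, e = smooth(true_prev, eps), smooth(estim_prev, eps)
    return float(np.sum(t * np.log(t / e)))

def nkld(true_prev, estim_prev, eps=1e-3):
    """KLD squashed into [0, 1) with a logistic function."""
    divergence = kld(true_prev, estim_prev, eps)
    return float(2 * np.exp(divergence) / (1 + np.exp(divergence)) - 1)

true_prev = np.array([0.5, 0.5])
estim_prev = np.array([0.4, 0.6])
print(absolute_error(true_prev, estim_prev))   # about 0.1
print(squared_error(true_prev, estim_prev))    # about 0.01
print(relative_absolute_error(true_prev, estim_prev))
print(kld(true_prev, estim_prev), nkld(true_prev, estim_prev))
```

All five measures are zero when the estimated prevalence equals the true one; RAE and KLD penalize errors on rare classes more heavily than AE and SE do.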

Requirements

scikit-learn, numpy, scipy
pytorch (for QuaNet)
svmperf patched for quantification (see below)
joblib
tqdm
pandas, xlrd
matplotlib

SVM-perf with quantification-oriented losses

To run experiments involving SVM(Q), SVM(KLD), SVM(NKLD), SVM(AE), or SVM(RAE), you first have to download the svmperf package, apply the patch svm-perf-quantification-ext.patch, and compile the sources. The script prepare_svmperf.sh performs all of these steps. Simply run:

./prepare_svmperf.sh

The resulting directory svm_perf_quantification contains the patched version of svmperf with quantification-oriented losses.

The svm-perf-quantification-ext.patch extends the patch made available by Esuli et al. 2015, which allows SVMperf to optimize for the Q measure as proposed by Barranquero et al. 2015 and for the KLD and NKLD measures as proposed by Esuli et al. 2015. The extension additionally allows SVMperf to optimize for AE and RAE.

Wiki

Check out our Wiki, which provides many examples.

Comments

Couldn't train QuaNet on multiclass data
Hi, I am having trouble training a QuaNet quantifier on multiclass data (20 classes). Everything works fine when my dataset has only 2 classes. It looks like the ACC quantifier is not able to aggregate over more than 2 classes?

The classifier is built and trained with the code below:

classifier = LSTMnet(dataset.vocabulary_size, dataset.n_classes)
learner = NeuralClassifierTrainer(classifier)
learner.fit(*dataset.training.Xy)

where it uses the default configuration:

{'embedding_size': 100, 'hidden_size': 256, 'repr_size': 100, 'lstm_class_nlayers': 1, 'drop_p': 0.5}

Then I tried to train QuaNet with the following code:

model = QuaNetTrainer(learner, qp.environ['SAMPLE_SIZE'])
model.fit(dataset.training, fit_learner=False)

and it showed that QuaNet is built as

QuaNetModule(
  (lstm): LSTM(120, 64, batch_first=True, dropout=0.5, bidirectional=True)
  (dropout): Dropout(p=0.5, inplace=False)
  (ff_layers): ModuleList(
    (0): Linear(in_features=208, out_features=1024, bias=True)
    (1): Linear(in_features=1024, out_features=512, bias=True)
  )
  (output): Linear(in_features=512, out_features=20, bias=True)
)

And then the error occurred in model.fit().

Attached is the error I get.

Traceback (most recent call last):
  File "quanet-test.py", line 181, in <module>
    model.fit(dataset.training, fit_learner=False)
  File "/home/vickys/.local/lib/python3.6/site-packages/quapy/method/neural.py", line 126, in fit
    self.epoch(train_data_embed, train_posteriors, self.tr_iter, epoch_i, early_stop, train=True)
  File "/home/vickys/.local/lib/python3.6/site-packages/quapy/method/neural.py", line 182, in epoch
    quant_estims = self.get_aggregative_estims(sample_posteriors)
  File "/home/vickys/.local/lib/python3.6/site-packages/quapy/method/neural.py", line 145, in get_aggregative_estims
    prevs_estim.extend(quantifier.aggregate(predictions))
  File "/home/vickys/.local/lib/python3.6/site-packages/quapy/method/aggregative.py", line 238, in aggregate
    return ACC.solve_adjustment(self.Pte_cond_estim_, prevs_estim)
  File "/home/vickys/.local/lib/python3.6/site-packages/quapy/method/aggregative.py", line 246, in solve_adjustment
    adjusted_prevs = np.linalg.solve(A, B)
  File "<array_function internals>", line 6, in solve
  File "/usr/local/lib64/python3.6/site-packages/numpy/linalg/linalg.py", line 394, in solve
    r = gufunc(a, b, signature=signature, extobj=extobj)
ValueError: solve1: Input operand 1 has a mismatch in its core dimension 0, with gufunc signature (m,m),(m)->(m) (size 2 is different from 20)

Thank you!
opened by vickysvicky 4
Parameter fit_learner in QuaNetTrainer (fit method)

The parameter fit_learner is not used in the function:

def fit(self, data: LabelledCollection, fit_learner=True):

and the learner is fitted every time:

self.learner.fit(*classifier_data.Xy)
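
As an illustration of the behaviour the flag name suggests, here is a minimal, self-contained sketch. It is not the QuaPy code and not a proposed patch: CountingLearner and SketchTrainer are hypothetical stand-ins that only show a fit method honouring fit_learner.

```python
class CountingLearner:
    """Hypothetical stand-in for a classifier; it only counts calls to fit."""

    def __init__(self):
        self.fit_calls = 0

    def fit(self, X, y):
        self.fit_calls += 1
        return self


class SketchTrainer:
    """Hypothetical trainer whose fit method respects the fit_learner flag."""

    def __init__(self, learner):
        self.learner = learner

    def fit(self, X, y, fit_learner=True):
        if fit_learner:
            # only (re)train the underlying classifier when asked to
            self.learner.fit(X, y)
        # the training of the quantifier itself would follow here
        return self


pretrained = CountingLearner()
SketchTrainer(pretrained).fit([[0.0], [1.0]], [0, 1], fit_learner=False)
calls_when_disabled = pretrained.fit_calls

fresh = CountingLearner()
SketchTrainer(fresh).fit([[0.0], [1.0]], [0, 1])
calls_when_enabled = fresh.fit_calls

print(calls_when_disabled, calls_when_enabled)
```

With fit_learner=False the already trained learner is left untouched, which is the behaviour one would expect from the parameter name.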

opened by pglez82 1
Wiki correction

In the last part of the Methods wiki page, where it says:

from classification.neural import NeuralClassifierTrainer, CNNnet

I think it should say:

from quapy.classification.neural import NeuralClassifierTrainer, LSTMnet

opened by pglez82 1

Error in LSTMnet

I think there is an error in the function init_hidden:

def init_hidden(self, set_size):
    opt = self.hyperparams
    var_hidden = torch.zeros(opt['lstm_nlayers'], set_size, opt['lstm_hidden_size'])
    var_cell = torch.zeros(opt['lstm_nlayers'], set_size, opt['lstm_hidden_size'])
    if next(self.lstm.parameters()).is_cuda:
        var_hidden, var_cell = var_hidden.cuda(), var_cell.cuda()
    return var_hidden, var_cell

Where it says opt['lstm_hidden_size'], it should say opt['hidden_size'].

opened by pglez82 1

EMQ can be instantiated with a transformation function
This transformation function is applied to each intermediate estimate.

Why should someone want to transform the prior between two iterations? A transformation of the prior is a heuristic, yet effective way of promoting desired properties of the solution. For instance,

- small values could be enhanced if the data is extremely imbalanced
- small values could be reduced if the user is looking for a sparse solution
- neighboring values could be averaged if the user is looking for a smooth solution
- the function could also leave the prior unaltered and just be used as a callback for logging the progress of the method
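
The mechanism can be illustrated with a small, self-contained NumPy sketch of an expectation-maximization prior update with an optional transformation hook. This is not the code of this pull request nor QuaPy's EMQ; the names em_prevalence, sparsify and log_progress, the threshold, and the simulated posteriors are all assumptions made for the example.

```python
import numpy as np

def em_prevalence(posteriors, train_prior, transform=None, n_iter=20):
    """Re-estimate class priors with EM; `transform` is applied to each intermediate estimate."""
    posteriors = np.asarray(posteriors, dtype=float)
    train_prior = np.asarray(train_prior, dtype=float)
    prior = train_prior.copy()
    for _ in range(n_iter):
        # E-step: rescale the posteriors by the ratio of current to training prior
        weighted = posteriors * (prior / train_prior)
        weighted /= weighted.sum(axis=1, keepdims=True)
        # M-step: the new prior is the average of the rescaled posteriors
        prior = weighted.mean(axis=0)
        if transform is not None:
            prior = np.asarray(transform(prior), dtype=float)
            prior = prior / prior.sum()  # keep the estimate a valid distribution
    return prior

def sparsify(prior, threshold=0.05):
    """Example transformation: set small values to zero to promote sparsity."""
    return np.where(prior < threshold, 0.0, prior)

history = []

def log_progress(prior):
    """Example transformation that leaves the prior unaltered and only logs it."""
    history.append(prior.copy())
    return prior

rng = np.random.default_rng(0)
posteriors = rng.dirichlet([5.0, 3.0, 0.3], size=200)  # simulated classifier outputs
train_prior = np.full(3, 1 / 3)

plain_estimate = em_prevalence(posteriors, train_prior, transform=log_progress)
sparse_estimate = em_prevalence(posteriors, train_prior, transform=sparsify)
print(plain_estimate, sparse_estimate, len(history))
```

Renormalizing after the transformation is a design choice of this sketch: it lets the hook return any non-negative vector while the next iteration still starts from a proper distribution.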

I hope this feature is useful. Let me know what you think!
opened by mirkobunse 0
fixing two problems with parameters: hidden_size and lstm_nlayers

I found another problem with a parameter. When using LSTMnet with QuaNet two parameters overlap (lstm_nlayers). I have renamed the one in the LSTMnet to lstm_class_nlayers.

opened by pglez82 0
Using a different gpu than cuda:0

The code seems to be tied to using only 'cuda', which by default uses the first GPU in the system ('cuda:0'). It would be handy to be able to tell the library which CUDA GPU to train on (cuda:0, cuda:1, etc.).

opened by pglez82 0

Releases (0.1.6)

0.1.6 (Nov 2, 2021)


Owner

The Human Language Technologies group of ISTI-CNR

