PECOS - Predictions for Enormous and Correlated Output Spaces

Overview

PECOS - Predictions for Enormous and Correlated Output Spaces


PECOS is a versatile and modular machine learning (ML) framework for fast learning and inference on problems with large output spaces, such as extreme multi-label ranking (XMR) and large-scale retrieval. PECOS' design is intentionally agnostic to the specific nature of the inputs and outputs as it is envisioned to be a general-purpose framework for multiple distinct applications.

Given an input, PECOS identifies a small set (10-100) of relevant outputs from amongst an extremely large (~100MM) candidate set and ranks these outputs in terms of relevance.

Features

Extreme Multi-label Ranking and Classification

  • X-Linear (pecos.xmc.xlinear): recursive linear models that learn to traverse an input from the root of a hierarchical label tree to a few leaf-node clusters, returning the top-k relevant labels within those clusters as predictions. See more details in the PECOS paper (Yu et al., 2020).

    • fast real-time inference in C++
    • can handle output spaces with ~100MM labels
  • XR-Transformer (pecos.xmc.xtransformer): a Transformer-based XMC framework that fine-tunes pre-trained transformers recursively on multi-resolution objectives. It can be used to generate the top-k relevant labels for a given instance or simply as a fine-tuning engine for task-aware embeddings. See technical details in the XR-Transformer paper (Zhang et al., 2021).

    • easy to extend with many pre-trained Transformer models from huggingface transformers.
    • establishes the state of the art on public XMC benchmarks.
  • ANN Search with HNSW (pecos.ann.hnsw): a PECOS Approximate Nearest Neighbor (ANN) search module that implements the Hierarchical Navigable Small World Graphs (HNSW) algorithm (Malkov et al., TPAMI 2018). A usage sketch follows this feature list.

    • Supports both sparse and dense input features
    • SIMD optimization for both dense/sparse distance computation
    • Supports thread-safe graph construction in parallel on multi-core shared memory machines
    • Supports thread-safe Searchers to do inference in parallel, which reduces inference overhead
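
A rough usage sketch of the HNSW module is shown below. It is illustrative only: the class and parameter names (HNSW.TrainParams, HNSW.PredParams, M, efC, efS, topk, metric_type) are assumptions drawn from the pecos.ann.hnsw module documentation and should be checked against the installed version.

>>> import numpy as np
>>> from pecos.ann.hnsw import HNSW

# toy dense float32 data; sparse CSR inputs are also supported
>>> X_trn = np.random.rand(1000, 64).astype(np.float32)
>>> X_tst = np.random.rand(10, 64).astype(np.float32)

# build the HNSW graph (parameter names are assumptions; see the module docs)
>>> train_params = HNSW.TrainParams(M=32, efC=100, metric_type="ip", threads=8)
>>> model = HNSW.train(X_trn, train_params=train_params)

# thread-safe searchers allow parallel inference with low overhead
>>> searchers = model.searchers_create(num_searcher=4)
>>> pred_params = HNSW.PredParams(efS=100, topk=10)
>>> Yt_pred = model.predict(X_tst, pred_params=pred_params, searchers=searchers)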

Requirements and Installation

  • Python (>=3.6)
  • Pip (>=19.3)

See other dependencies in setup.py. You should install PECOS in a virtual environment. If you're unfamiliar with Python virtual environments, check out the user guide.

Supported Platforms

  • Ubuntu 18.04 and 20.04
  • Amazon Linux 2

Installation from Wheel

PECOS can be installed using pip as follows:

python3 -m pip install libpecos
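
To verify the wheel installed correctly, the top-level package and the modules used later in this README should import cleanly:

>>> import pecos
>>> from pecos.xmc.xlinear.model import XLinearModel
>>> from pecos.xmc import Indexer, LabelEmbeddingFactory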

Installation from Source

Prerequisite builder tools

  • For Ubuntu (18.04, 20.04):
sudo apt-get update && sudo apt-get install -y build-essential git python3 python3-distutils python3-venv
  • For Amazon Linux 2 Image:
sudo yum -y install python3 python3-devel python3-distutils python3-venv && sudo yum -y groupinstall 'Development Tools'

One needs to install at least one BLAS library to compile PECOS, e.g. OpenBLAS:

  • For Ubuntu (18.04, 20.04):
sudo apt-get install -y libopenblas-dev
  • For Amazon Linux 2 Image and AMI:
sudo amazon-linux-extras install epel -y
sudo yum install openblas-devel -y

Install and develop locally

git clone https://github.com/amzn/pecos
cd pecos
python3 -m pip install --editable ./

Quick Tour

For a glimpse of how PECOS works, here is a quick tour of the PECOS API for the XMR problem.

Toy Example

The eXtreme Multi-label Ranking (XMR) problem is defined by two matrices: an instance-to-feature matrix X (N x d) and an instance-to-label matrix Y (N x L).

Some toy data matrices are available in the tst-data folder.
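
For example, the matrices can be loaded with pecos.utils.smat_util (the paths below are illustrative; point them at the toy data in your checkout):

>>> from pecos.utils import smat_util

# training and test features/labels (paths are illustrative)
>>> X = smat_util.load_matrix("test/tst-data/xmc/xlinear/X.npz")
>>> Y = smat_util.load_matrix("test/tst-data/xmc/xlinear/Y.npz")
>>> Xt = smat_util.load_matrix("test/tst-data/xmc/xlinear/Xt.npz")
>>> Yt = smat_util.load_matrix("test/tst-data/xmc/xlinear/Yt.npz")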

PECOS constructs a hierarchical label tree and learns linear models recursively (e.g., XR-Linear):

>>> from pecos.xmc.xlinear.model import XLinearModel
>>> from pecos.xmc import Indexer, LabelEmbeddingFactory

# Build hierarchical label tree and train an XR-Linear model
>>> label_feat = LabelEmbeddingFactory.create(Y, X)
>>> cluster_chain = Indexer.gen(label_feat)
>>> model = XLinearModel.train(X, Y, C=cluster_chain)
>>> model.save("./save-models")

After training the model, we can run prediction and evaluation:

>>> from pecos.utils import smat_util
>>> Yt_pred = model.predict(Xt)
# print precision and recall at k=10
>>> print(smat_util.Metrics.generate(Yt, Yt_pred))

PECOS also offers an optimized C++ implementation for fast real-time inference:

>>> model = XLinearModel.load("./save-models", is_predict_only=True)
>>> for i in range(Xt.shape[0]):
>>>   Yt_pred_i = model.predict(Xt[i], threads=1)

Citation

If you find PECOS useful, please consider citing the PECOS paper (Yu et al., 2020).

Some papers from our group using PECOS:

License

Copyright (2021) Amazon.com, Inc.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Comments
  • text2text model evaluation not working


    Description

    Model evaluation does not output the precision and recall properly.

    How to Reproduce?

    I ran the following command:

    python3 -m pecos.apps.text2text.evaluate --pred-path ./test-prediction.txt --truth-path ./test.txt --text-item-path ./output-labels.txt
    

    where --pred-path is the path of the file produced during model prediction, and --truth-path is the path of the test file, e.g. Out1, Out2, Out3 \t cheap door, where Out1, Out2 and Out3 are line numbers in the output file given by --text-item-path ./output-labels.txt.

    What have you tried to solve it?

    Error message or code output

    Traceback (most recent call last):
      File "/home/khalid/anaconda3/lib/python3.7/runpy.py", line 193, in _run_module_as_main
        "__main__", mod_spec)
      File "/home/khalid/anaconda3/lib/python3.7/runpy.py", line 85, in _run_code
        exec(code, run_globals)
      File "/home/khalid/PECOS/pecos_venv/lib/python3.7/site-packages/pecos/apps/text2text/evaluate.py", line 130, in <module>
        do_evaluation(args)
      File "/home/khalid/PECOS/pecos_venv/lib/python3.7/site-packages/pecos/apps/text2text/evaluate.py", line 119, in do_evaluation
        Y_true = smat.csr_matrix((val_t, (row_id_t, col_id_t)), shape=(num_samples_t, len(item_dict)))
      File "/home/khalid/PECOS/pecos_venv/lib/python3.7/site-packages/scipy/sparse/compressed.py", line 55, in __init__
        dtype=dtype))
      File "/home/khalid/PECOS/pecos_venv/lib/python3.7/site-packages/scipy/sparse/coo.py", line 196, in __init__
        self._check()
      File "/home/khalid/PECOS/pecos_venv/lib/python3.7/site-packages/scipy/sparse/coo.py", line 285, in _check
        raise ValueError('column index exceeds matrix dimensions')
    ValueError: column index exceeds matrix dimensions
    

    Environment

    • Operating system:
    • Python version:
    • PECOS version:

    (Add as much information about your environment as possible, e.g. dependencies versions.)

    bug 
    opened by Khalid-Usman 13
  • Format of yt label


    Hello,

    Hope you are doing well. I have two questions about the format.

    Question 1: What is the optimal format for the label matrix Yt? Is it preferable to have Yt as:

    (A) one-hot encoded, with only one 1 per row, or (B) multi-hot encoded, with multiple 1s per row (as is the case for Xt)?

    When prediction is done, it seems to output only one 1 per row.
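
    For context, the label matrix Y in PECOS is typically a scipy.sparse CSR matrix, and a row may carry several nonzeros (i.e. multi-hot); a minimal sketch with made-up data:

    >>> import numpy as np
    >>> from scipy import sparse as smat

    # 3 instances, 5 labels; rows 0 and 2 have multiple positive labels (multi-hot)
    >>> rows = np.array([0, 0, 1, 2, 2, 2])
    >>> cols = np.array([0, 3, 1, 0, 2, 4])
    >>> vals = np.ones_like(rows, dtype=np.float32)
    >>> Y = smat.csr_matrix((vals, (rows, cols)), shape=(3, 5))
    >>> Y.toarray().astype(int).tolist()
    [[1, 0, 0, 1, 0], [0, 1, 0, 0, 0], [1, 0, 1, 0, 1]]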

    Question 2:

    Is there any constraint on Xt containing a mix of dense and sparse inputs instead of sparse inputs only?

    enhancement 
    opened by arita37 7
  • some formatting


    Hi, thanks for this.

    I would just like to confirm the format of the inputs:

    X: CSR format, x(i,k) = valx. Can valx be a float, or does it need to be a binary or [0,1] value?

    Y: CSR format, y(i,k) = valy. Does it need to be binary (0 or 1)?
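
    For context, instance features in X are commonly real-valued (e.g. TF-IDF weights), while Y typically holds binary relevance values; a small illustrative sketch:

    >>> import numpy as np
    >>> from sklearn.feature_extraction.text import TfidfVectorizer

    # real-valued CSR features are fine for X (e.g. TF-IDF weights)
    >>> corpus = ["cheap wooden door", "steel door hinge", "garden hose"]
    >>> X = TfidfVectorizer().fit_transform(corpus).astype(np.float32)
    >>> X.dtype, X.format
    (dtype('float32'), 'csr')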

    Thx

    opened by arita37 7
  • Online Inference Latency for XR-TRANSFORMER


    hi!

    When I use XR-Transformer for prediction (one input at a time), the online inference latency comes to about 400 ms. Why is this?

    The system I use is Ubuntu 18.04, and XR-Transformer is evaluated on an Nvidia Tesla V100 GPU.

    Thanks!

    opened by xiaokening 6
  • Issue with --label-embed-type pifa_lf_concat::Z=${Z_pifa_file}


    Description

    I am trying to use the --label-embed-type parameter in training and it produces this error: ValueError: Object arrays cannot be loaded when allow_pickle=False, coming from the np.load() function.

    I have tested loading the NPZ file for z_labels (both compressed and uncompressed); it produces this error if allow_pickle=False. I was able to load the data by setting allow_pickle=True in np.load().

    Can you please add a description of this file format, or can we send this parameter as an input?

    This is the data I have after loading the npz file with allow_pickle=True:

    [array(['Trump', 'Bus', 'Trolly '], dtype='<U23')
     array(['Show', 'Disp'], dtype='<U20')
     array(['Recap rew'], dtype='<U24')
     array(['Core, '], dtype='<U32')
     array(['Hoe'], dtype='<U10')
     array(['Plan'], dtype='<U21')]
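
    For comparison, np.load with allow_pickle=False reads a plain numeric matrix without trouble, so the Z matrix presumably needs to hold numeric label features (one row per label) rather than an object array of raw label strings; a hypothetical sketch:

    >>> import numpy as np

    # a dense numeric label-embedding matrix (one row per label); values are illustrative
    >>> Z = np.random.rand(6, 128).astype(np.float32)
    >>> np.save("Z_pifa.npy", Z)

    # loads fine without pickling, unlike object arrays of strings
    >>> np.load("Z_pifa.npy", allow_pickle=False).shape
    (6, 128)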
    

    How to Reproduce?

    Execute model training with numpy version 1.21.2

    python -m pecos.apps.text2text.train \
      --label-embed-type pifa_lf_concat::Z=${Z_pifa_file} \
      -i ${train_file} \ 
      -m ${model_folder}
    

    Error message or code output

    Traceback (most recent call last):
      File "/opt/conda/lib/python3.7/runpy.py", line 193, in _run_module_as_main
        "__main__", mod_spec)
      File "/opt/conda/lib/python3.7/runpy.py", line 85, in _run_code
        exec(code, run_globals)
      File "/home/jupyter/pecos_git/pecos/pecos/apps/text2text/train.py", line 311, in <module>
        train(args)
      File "/home/jupyter/pecos_git/pecos/pecos/apps/text2text/train.py", line 302, in train
        workspace_folder=args.workspace_folder,
      File "/home/jupyter/pecos_git/pecos/pecos/apps/text2text/model.py", line 325, in train
        Z = smat_util.load_matrix(val)
      File "/home/jupyter/pecos_git/pecos/pecos/utils/smat_util.py", line 117, in load_matrix
        mat = np.load(src)
      File "/opt/conda/lib/python3.7/site-packages/numpy/lib/npyio.py", line 441, in load
        pickle_kwargs=pickle_kwargs)
      File "/opt/conda/lib/python3.7/site-packages/numpy/lib/format.py", line 743, in read_array
        raise ValueError("Object arrays cannot be loaded when "
    ValueError: Object arrays cannot be loaded when allow_pickle=False
    

    Environment

    • Operating system: Unix Ubuntu (on GCP)
    • Python version: 3.8
    • PECOS version: 0.1.0
    • numpy version: 1.21.2
    bug 
    opened by zusmani 6
  • Pecos killed on ranker training step


    Description

    Training was killed at this step:

    Data: Amazon-670k Model: X-Transformer

    [2022-12-01 21:38:23,019][pecos.xmc.xtransformer.model][INFO] - Start training ranker...
    [2022-12-01 21:38:24,001][pecos.xmc.base][INFO] - Training Layer 0 of 4 Layers in HierarchicalMLModel, neg_mining=tfn..
    [2022-12-01 21:39:05,191][pecos.xmc.base][INFO] - Training Layer 1 of 4 Layers in HierarchicalMLModel, neg_mining=tfn..
    [2022-12-01 21:40:24,829][pecos.xmc.base][INFO] - Training Layer 2 of 4 Layers in HierarchicalMLModel, neg_mining=tfn..
    [2022-12-01 21:43:25,293][pecos.xmc.base][INFO] - Training Layer 3 of 4 Layers in HierarchicalMLModel, neg_mining=tfn+man..
    

    Environment

    Distributor ID:	Ubuntu
    Description:	Ubuntu 18.04.6 LTS
    Release:	18.04
    Codename:	bionic
    Python 3.8.15
    libpecos~=0.4.0
    1 RTX A4500, 32 vCPU, and 250 GB RAM
    

    What could it be? Is it possible to resume training from that stage?

    bug 
    opened by celsofranssa 4
  • How to Use XR-Transformer in Text2Text App


    Description

    I want to use XR-Transformer in the text2text app, following the parameters given here. But setting --params-path to this .json file raises the error:

    Traceback (most recent call last):
      File "/home/huziyuan/miniconda3/envs/huggingface/lib/python3.9/runpy.py", line 197, in _run_module_as_main
        return _run_code(code, main_globals, None,
      File "/home/huziyuan/miniconda3/envs/huggingface/lib/python3.9/runpy.py", line 87, in _run_code
        exec(code, run_globals)
      File "/home/huziyuan/miniconda3/envs/huggingface/lib/python3.9/site-packages/pecos/apps/text2text/train.py", line 345, in <module>
        train(args)
      File "/home/huziyuan/miniconda3/envs/huggingface/lib/python3.9/site-packages/pecos/apps/text2text/train.py", line 328, in train
        t2t_model = Text2Text.train(
      File "/home/huziyuan/miniconda3/envs/huggingface/lib/python3.9/site-packages/pecos/apps/text2text/model.py", line 317, in train
        pred_params = pred_params.override_with_kwargs(kwargs)
      File "/home/huziyuan/miniconda3/envs/huggingface/lib/python3.9/site-packages/pecos/apps/text2text/model.py", line 126, in override_with_kwargs
        self.xlinear_params.override_with_kwargs(pred_kwargs)
    AttributeError: 'NoneType' object has no attribute 'override_with_kwargs'
    

    References

    enhancement 
    opened by lyy1994 4
  • Examples with text


    Description

    The current examples of X and Y only have numeric values. Could you please provide one example where X and Y are both text? I think the paper/method is targeted at such problems.

    enhancement 
    opened by xyan326 4
  • Add memory-mapped utility module


    Issue #, if available: N/A

    Description of changes: Add memory-mapped utility module.

    Users can test with the code below: copy it into a file test_mmap_util.cpp placed at pecos/core/util/, and run:

    gcc -lm -ldl -lstdc++ -fopenmp -std=c++14 -lgcc -lgomp -O3  -I ./ test_mmap_util.cpp
    ./a.out
    

    Output:

    Generate a Bar with data:
    ---Bar---
    ---Foo---
    foo_1: 0 1 2 3 4 5 6 7 8 9 
    foo_2: 1
    ---------
    bar: 5 5 5 5 5 
    ---------
    Save Bar into mmap file: ./bar_test_mmap.txt
    Load a new Bar from saved mmap file...
    Loaded Bar data:
    ---Bar---
    ---Foo---
    foo_1: 0 1 2 3 4 5 6 7 8 9 
    foo_2: 1
    ---------
    bar: 5 5 5 5 5 
    ---------
    
    #include <iostream>
    #include "mmap_util.hpp"
    
    using namespace pecos::mmap_util;
    
    // Nested class mmap example
    // Bar contains a Foo instance
    class Foo {
        public:
            Foo() {}
            ~Foo() { foo_1.clear(); }
    
            void init_data() {
                foo_1.resize(10, 0);
                for (int i=0; i<foo_1.size(); ++i) { foo_1[i] = i; }
                foo_2 = 1.0;
            }
    
            void print() {
                std::cout << "---Foo---" << std::endl;
                std::cout << "foo_1: ";
                for (int i=0; i<foo_1.size(); ++i) { std::cout << foo_1[i] << " "; }
                std::cout << std::endl;
                std::cout << "foo_2: " << foo_2 << std::endl;
                std::cout << "---------" << std::endl;
            }
    
            void save_to_mmap_store(MmapStore& mmap_s) const {
                foo_1.save_to_mmap_store(mmap_s);
                mmap_s.fput_one<double>(foo_2);
            }
    
            void load_from_mmap_store(MmapStore& mmap_s) {
                foo_1.load_from_mmap_store(mmap_s);
                foo_2 = mmap_s.fget_one<double>();
            }
    
        private:
            MmapableVector<int> foo_1;
            double foo_2;
    };
    
    class Bar {
        public:
            Bar() { }
            ~Bar() { bar.clear(); mmap_store.close(); }
    
            void init_data() {
                foo.init_data();
                bar.resize(5, 0);
                for (int i=0; i<bar.size(); ++i) { bar[i] = 5.0; }
            }
    
            void print() {
                std::cout << "---Bar---" << std::endl;
                foo.print();
                std::cout << "bar: ";
                for (int i=0; i<bar.size(); ++i) { std::cout << bar[i] << " "; }
                std::cout << std::endl;
                std::cout << "---------" << std::endl;
            }
    
            void save(const std::string & file_name) const {
                // Create a mmapfile for dump at the most outer layer class
                // You cannot reuse (i.e, close and reopen) mmap_store, since it may hold the data storage
                MmapStore mmap_s = MmapStore();
                mmap_s.open(file_name, "w");
    
                save_to_mmap_store(mmap_s);
    
                // Metadata dump and fp closure is automatically done at MmapStore destructor when this function ends
                // You can make it happen earlier with explicitly calling close()
                mmap_s.close();
            }
            void load(const std::string & file_name, const bool pre_load=true) {
                mmap_store.open(file_name, pre_load?"r":"r_lazy");
                load_from_mmap_store(mmap_store);
            }
    
            void save_to_mmap_store(MmapStore& mmap_s) const {
                foo.save_to_mmap_store(mmap_s);
                bar.save_to_mmap_store(mmap_s);
            }
    
            void load_from_mmap_store(MmapStore& mmap_s) {
                foo.load_from_mmap_store(mmap_s);
                bar.load_from_mmap_store(mmap_s);
            }
    
        private:
            Foo foo;
            MmapableVector<double> bar;
            // Mmap Data storage at the most outer layer class
            MmapStore mmap_store;
    };
    
    
    int main() {
        std::string f_name = "./bar_test_mmap.txt";
    
        std::cout << "Generate a Bar with data:" << std::endl;
        Bar bar;
        bar.init_data();
        bar.print();
    
        std::cout << "Save Bar into mmap file: " << f_name << std::endl;
        bar.save(f_name);
    
        std::cout << "Load a new Bar from saved mmap file..." << std::endl;
        Bar new_bar;
        new_bar.load(f_name, true);  // pre_load=true (mode "r")
    
        std::cout << "Loaded Bar data:" << std::endl;
        new_bar.print();
    
        return 0;
    }
    

    By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

    opened by weiliw-amz 3
  • Is there at least one example showing how to use Pecos from a plain text dataset?


    It has been difficult to infer how to use PECOS properly. The usage instructions are split across several README.md files and through the issues.

    Then, could you provide a toy example of an end-to-end approach (using XR-Transformer for instance)?

    Consider the following scenario: we have the training and testing samples in plain text:

    #train samples:
        text: raw_text_1, labels: [L1, L7, ..., L3]
        text: raw_text_2, labels: [L8, L9]
        ...
        text: raw_text_N, labels: [L1, L7, ..., L4]
    
    #test samples:
        text: test_raw_text_1
        text: test_raw_text_2
        ...
        text: test_raw_text_M
    

    and someone has to:

    1. prepare the data to the accepted format;
    2. train the model;
    3. predict the top k labels.
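
    For step 1, a rough sketch of one possible preparation is shown below. It assumes the text2text input convention seen elsewhere on this page: each training line holds comma-separated label indices, a tab, then the raw text, with the label vocabulary in a separate file whose line numbers define those indices. File names and the in-memory sample format are illustrative.

    # rough sketch: write plain-text (text, labels) pairs in the text2text format
    train_samples = [
        ("raw_text_1", ["L1", "L7", "L3"]),
        ("raw_text_2", ["L8", "L9"]),
    ]

    # label vocabulary; the line number of each label becomes its index
    labels = sorted({lab for _, labs in train_samples for lab in labs})
    label_to_idx = {lab: i for i, lab in enumerate(labels)}
    with open("output-labels.txt", "w") as f:
        f.write("\n".join(labels) + "\n")

    # each training line: comma-separated label indices, a tab, then the raw text
    with open("train.txt", "w") as f:
        for text, labs in train_samples:
            idx = ",".join(str(label_to_idx[lab]) for lab in labs)
            f.write(f"{idx}\t{text}\n")

    Steps 2 and 3 would then go through the pecos.apps.text2text command-line tools shown elsewhere on this page (e.g. python3 -m pecos.apps.text2text.train).
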
    opened by celsofranssa 3
  • bug of installing from source


    Description

    There are some problems when installing PECOS from source according to the README.md.

    How to Reproduce?

    python3 -m pip install --editable ./ Obtaining file:///home/workspace/lishengchao/pecos Requirement already satisfied: scipy>=1.4.1 in /opt/conda/lib/python3.8/site-packages (from libpecos==0.3.0) (1.6.1) Requirement already satisfied: scikit-learn>=0.24.1 in /opt/conda/lib/python3.8/site-packages (from libpecos==0.3.0) (0.24.1) Requirement already satisfied: torch>=1.8.0 in /opt/conda/lib/python3.8/site-packages (from libpecos==0.3.0) (1.8.0) Collecting sentencepiece!=0.1.92,>=0.1.86 Using cached https://repo.huaweicloud.com/repository/pypi/packages/68/91/ded0f64f90abfc5413c620fc345a0aef1e7ff5addda8704cc6b3bf589c64/sentencepiece-0.1.96-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.2 MB) Requirement already satisfied: transformers>=4.1.1 in /opt/conda/lib/python3.8/site-packages (from libpecos==0.3.0) (4.8.2) Collecting numpy>=1.19.5 Using cached https://repo.huaweicloud.com/repository/pypi/packages/38/c0/c45c5eb0e25247d5fbb333fd0b56e570ba21cf0e3dca3abad174fb780e8c/numpy-1.22.3-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (16.8 MB) Requirement already satisfied: threadpoolctl>=2.0.0 in /opt/conda/lib/python3.8/site-packages (from scikit-learn>=0.24.1->libpecos==0.3.0) (2.1.0) Requirement already satisfied: joblib>=0.11 in /opt/conda/lib/python3.8/site-packages (from scikit-learn>=0.24.1->libpecos==0.3.0) (1.0.1) Requirement already satisfied: typing_extensions in /opt/conda/lib/python3.8/site-packages (from torch>=1.8.0->libpecos==0.3.0) (3.7.4.3) Collecting huggingface-hub==0.0.12 Downloading https://repo.huaweicloud.com/repository/pypi/packages/2f/ee/97e253668fda9b17e968b3f97b2f8e53aa0127e8807d24a547687423fe0b/huggingface_hub-0.0.12-py3-none-any.whl (37 kB) Requirement already satisfied: regex!=2019.12.17 in /opt/conda/lib/python3.8/site-packages (from transformers>=4.1.1->libpecos==0.3.0) (2021.4.4) Requirement already satisfied: sacremoses in /opt/conda/lib/python3.8/site-packages (from transformers>=4.1.1->libpecos==0.3.0) (0.0.45) Requirement already satisfied: requests in /opt/conda/lib/python3.8/site-packages (from transformers>=4.1.1->libpecos==0.3.0) (2.24.0) Requirement already satisfied: packaging in /opt/conda/lib/python3.8/site-packages (from transformers>=4.1.1->libpecos==0.3.0) (21.3) Requirement already satisfied: tokenizers<0.11,>=0.10.1 in /opt/conda/lib/python3.8/site-packages (from transformers>=4.1.1->libpecos==0.3.0) (0.10.3) Requirement already satisfied: tqdm>=4.27 in /opt/conda/lib/python3.8/site-packages (from transformers>=4.1.1->libpecos==0.3.0) (4.62.3) Requirement already satisfied: filelock in /opt/conda/lib/python3.8/site-packages (from transformers>=4.1.1->libpecos==0.3.0) (3.0.12) Requirement already satisfied: pyyaml in /opt/conda/lib/python3.8/site-packages (from transformers>=4.1.1->libpecos==0.3.0) (5.4.1) Requirement already satisfied: six in /opt/conda/lib/python3.8/site-packages (from sacremoses->transformers>=4.1.1->libpecos==0.3.0) (1.15.0) Requirement already satisfied: click in /opt/conda/lib/python3.8/site-packages (from sacremoses->transformers>=4.1.1->libpecos==0.3.0) (7.1.2) Requirement already satisfied: chardet<4,>=3.0.2 in /opt/conda/lib/python3.8/site-packages (from requests->transformers>=4.1.1->libpecos==0.3.0) (3.0.4) Requirement already satisfied: idna<3,>=2.5 in /opt/conda/lib/python3.8/site-packages (from requests->transformers>=4.1.1->libpecos==0.3.0) (2.10) Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /opt/conda/lib/python3.8/site-packages (from 
requests->transformers>=4.1.1->libpecos==0.3.0) (1.25.11) Requirement already satisfied: certifi>=2017.4.17 in /opt/conda/lib/python3.8/site-packages (from requests->transformers>=4.1.1->libpecos==0.3.0) (2020.12.5) Requirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in /opt/conda/lib/python3.8/site-packages (from packaging->transformers>=4.1.1->libpecos==0.3.0) (3.0.6) Installing collected packages: sentencepiece, numpy, libpecos, huggingface-hub Attempting uninstall: numpy Found existing installation: numpy 1.19.2 Uninstalling numpy-1.19.2: Successfully uninstalled numpy-1.19.2 Running setup.py develop for libpecos ERROR: Command errored out with exit status 1: command: /opt/conda/bin/python3 -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/home/workspace/lishengchao/pecos/setup.py'"'"'; file='"'"'/home/workspace/lishengchao/pecos/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(file);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' develop --no-deps cwd: /home/workspace/lishengchao/pecos/ Complete output (28 lines): Set version to 0.3.0 running develop running egg_info creating libpecos.egg-info writing libpecos.egg-info/PKG-INFO writing dependency_links to libpecos.egg-info/dependency_links.txt writing requirements to libpecos.egg-info/requires.txt writing top-level names to libpecos.egg-info/top_level.txt writing manifest file 'libpecos.egg-info/SOURCES.txt' reading manifest file 'libpecos.egg-info/SOURCES.txt' reading manifest template 'MANIFEST.in' warning: no files found matching '*.c' under directory 'pecos/core' writing manifest file 'libpecos.egg-info/SOURCES.txt' running build_ext building 'pecos.core.libpecos_float32' extension INFO: C compiler: gcc -pthread -B /opt/conda/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC

    creating build
    creating build/temp.linux-x86_64-3.8
    creating build/temp.linux-x86_64-3.8/pecos
    creating build/temp.linux-x86_64-3.8/pecos/core
    INFO: compile options: '-Ipecos/core -I/usr/include/ -I/usr/local/include -I/opt/conda/include/python3.8 -c'
    extra options: '-fopenmp -O3 -std=c++14'
    INFO: gcc: pecos/core/libpecos.cpp
    cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
    /tmp/ccNJQf5g.s: Assembler messages:
    /tmp/ccNJQf5g.s: Fatal error: can't close build/temp.linux-x86_64-3.8/pecos/core/libpecos.o: Input/output error
    error: Command "gcc -pthread -B /opt/conda/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -Ipecos/core -I/usr/include/ -I/usr/local/include -I/opt/conda/include/python3.8 -c pecos/core/libpecos.cpp -o build/temp.linux-x86_64-3.8/pecos/core/libpecos.o -fopenmp -O3 -std=c++14" failed with exit status 1
    ----------------------------------------
    

    ERROR: Command errored out with exit status 1: /opt/conda/bin/python3 -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/home/workspace/lishengchao/pecos/setup.py'"'"'; file='"'"'/home/workspace/lishengchao/pecos/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(file);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' develop --no-deps Check the logs for full command output.

    Environment

    • Ubuntu 18.04
    • Python 3.8
    • PECOS 0.3.0

    (Add as much information about your environment as possible, e.g. dependencies versions.)

    bug 
    opened by xiaokening 3
  • Memory-mapped XLinear Model


    Issue #, if available: N/A

    Description of changes:

    • Memory-mapped PECOS XLinear model
      • Greatly reduces loading time.
      • Ideal for large models when users want to quickly try a few inferences without waiting for the full model to load into memory.
      • Also capable of running inference for large models that cannot fit in memory.

    By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

    opened by weiliw-amz 0
Releases(v0.4.0)
  • v0.4.0(Aug 9, 2022)

    Highlights

    • Enable distributed XR-Transformer fine-tuning
    • Enable the capability of large-batch prediction for ANN HNSW
    • Release interactive hands-on tutorial materials

    Enhancements

    • Unit test for sorted_csc, sorted_csr by @chepingt in https://github.com/amzn/pecos/pull/139
    • Unit test for csr_row_softmax by @houyuhan98 in https://github.com/amzn/pecos/pull/141
    • Bump numpy from 1.21.0 to 1.22.0 by @dependabot in https://github.com/amzn/pecos/pull/145 https://github.com/amzn/pecos/pull/146
    • Release the materials for the PECOS hands-on tutorial in KDD 2022 by @hallogameboy in https://github.com/amzn/pecos/pull/153 https://github.com/amzn/pecos/pull/154 https://github.com/amzn/pecos/pull/161
    • Enable the capability of large-batch prediction for HNSW by @OctoberChang in https://github.com/amzn/pecos/pull/156
    • Distributed XR-Transformer fine-tuning by @jiong-zhang in https://github.com/amzn/pecos/pull/144 https://github.com/amzn/pecos/pull/162

    Bug Fixes

    • Fix argument-passing issue in smat_util.sorted_csc by @jiong-zhang in https://github.com/amzn/pecos/pull/134
    • Fix indptr overflow issue in block_diag_csr() by @OctoberChang in https://github.com/amzn/pecos/pull/136
    • Fix the yum group install command in README by @hallogameboy in https://github.com/amzn/pecos/pull/138
    • Change file names for windows compatibility by @YangyiLi001 in https://github.com/amzn/pecos/pull/143
    • Avoid triggering CodeQL on push for Dependabot branches by @weiliw-amz in https://github.com/amzn/pecos/pull/148
    • Fix Pypi release version error by @weiliw-amz in https://github.com/amzn/pecos/pull/163

    Deprecation

    • Deprecate imbalanced hierarchical K-means from clustering and semantic indexing by @hallogameboy in https://github.com/amzn/pecos/pull/151

    New Contributors

    • @chepingt made their first contribution in https://github.com/amzn/pecos/pull/139
    • @houyuhan98 made their first contribution in https://github.com/amzn/pecos/pull/141
    • @YangyiLi001 made their first contribution in https://github.com/amzn/pecos/pull/143
    • @xiusic made their first contribution in https://github.com/amzn/pecos/pull/147

    Full Changelog: https://github.com/amzn/pecos/compare/v0.3.0...v0.4.0

    Source code(tar.gz)
    Source code(zip)
  • v0.3.0(Apr 1, 2022)

    Highlights

    • Enable distributed training for XLinear
    • Enable PECOS for aarch64(arm64) CPU Architecture
    • Enhance pecos.ann.hnsw with Function Multi-Versioning (FMV) technique to automatically select the best supported SIMD instructions (SSE, AVX2, AVX512) at runtime
    • Reduce CPU memory usage in pecos.xmc.xtransformer training

    Enhancements

    • Add distilbert model. by @mo-fu in https://github.com/amzn/pecos/pull/97
    • add CNAME by @jiong-zhang in https://github.com/amzn/pecos/pull/104
    • Bump numpy from 1.20.3 to 1.21.0 in /examples/qp2q by @dependabot in https://github.com/amzn/pecos/pull/110
    • enable Function Multi-Versioning (FMV) to support AVX512 by @rofuyu in https://github.com/amzn/pecos/pull/111
    • Modify supported Python version by @weiliw-amz in https://github.com/amzn/pecos/pull/113
    • Enabling PECOS for aarch64(arm64) CPU Architecture by @weiliw-amz in https://github.com/amzn/pecos/pull/114
    • Update OpenBLAS Version for x86 Wheel Build by @weiliw-amz in https://github.com/amzn/pecos/pull/117
    • SIMD Functions for aarch64(ARM64) by @weiliw-amz in https://github.com/amzn/pecos/pull/115
    • Add profile_util module by @weiliw-amz in https://github.com/amzn/pecos/pull/121
    • Fix FMV setup link flag and add test wheel CI by @weiliw-amz in https://github.com/amzn/pecos/pull/119
    • Fix xlinear.reconstruct_model; Add PII embedding by @weiliw-amz in https://github.com/amzn/pecos/pull/120
    • Add Distributed PECOS XLinear Modules by @weiliw-amz in https://github.com/amzn/pecos/pull/123
    • Add distributed PECOS README by @weiliw-amz in https://github.com/amzn/pecos/pull/127
    • update HNSW README and save/load in Python API by @OctoberChang in https://github.com/amzn/pecos/pull/129
    • Improve XR-Transformer memory efficiency by @jiong-zhang in https://github.com/amzn/pecos/pull/128

    Bug Fixes

    • properly set Text2Text prediction argument by @OctoberChang in https://github.com/amzn/pecos/pull/101
    • Fix HiearchicalMLModel pred-params initialization and add bugs by @weiliw-amz in https://github.com/amzn/pecos/pull/103
    • minor bug fix in XR-Transformer exp script by @jiong-zhang in https://github.com/amzn/pecos/pull/106
    • fixed multithreading bugs in py hierarchical kmeans by @OctoberChang in https://github.com/amzn/pecos/pull/108
    • set pytest of hierarchical kmeans with single thread by @OctoberChang in https://github.com/amzn/pecos/pull/109
    • Fix relative path in distributed README by @weiliw-amz in https://github.com/amzn/pecos/pull/130

    Experiment Codes for Publications

    • add overlap-clustering (Liu et al.) in NeurIPS21 by @xuanqing94 in https://github.com/amzn/pecos/pull/98
    • add MACLR codes by @xyh97 in https://github.com/amzn/pecos/pull/100
    • update experiment code for pecos jmlr paper by @OctoberChang in https://github.com/amzn/pecos/pull/107
    • update Philip's experiment code into example folder by @OctoberChang in https://github.com/amzn/pecos/pull/118

    New Contributors

    • @mo-fu made their first contribution in https://github.com/amzn/pecos/pull/97
    • @xuanqing94 made their first contribution in https://github.com/amzn/pecos/pull/98
    • @xyh97 made their first contribution in https://github.com/amzn/pecos/pull/100
    • @dependabot made their first contribution in https://github.com/amzn/pecos/pull/110

    Full Changelog: https://github.com/amzn/pecos/compare/v0.2.3...v0.3.0

    Source code(tar.gz)
    Source code(zip)
  • v0.2.3(Nov 15, 2021)

  • v0.2.2(Nov 4, 2021)

  • v0.2.1(Oct 27, 2021)

    Highlights

    • Removed support for Ubuntu 16.04
    • Implemented XR-Transformer
    • Enabled HNSW functionality
    • Enabled cost-sensitive learning in PECOS

    Enhancements

    ANN HNSW

    • Initial implementation of HNSW in C++ with single-thread [#44] (@OctoberChang)
    • Refactor HNSW in C++ to support sparse/dense features and multi-threading [#49] (@rofuyu)
    • Initial implementation of HNSW Python interface [#53] (@OctoberChang)
    • Refactor HNSW python API and readme markdown [#63] (@OctoberChang)
    • Refactor HNSW C++ to reuse priority queue for different inference calls within the same Searcher [#65] (@rofuyu)
    • Enable HNSW save/load functionality [#71] (@OctoberChang)
    • Add serialization version in HNSW save/load [#77] (@rofuyu)
    • Enable HNSW python command line interface [#79] (@OctoberChang)

    Cost-sensitive Learning

    • Enable Cost-Sensitive Learning via XLinear API/CLI [#64] (@jiong-zhang)
    • Enable cost sensitive for text2text CLI [#75] (@jiong-zhang)

    XR-Transformer [#27, #64] (@jiong-zhang)

    • Refactor pecos.xmc.xtransformer and enable end2end XR-Transformer training
    • CLI tool for generating embeddings pecos.xmc.xtransformer.encode
    • Faster transformer text tokenizers using huggingface's C implementation
    • Allow training XR-Transformer without numerical features.

    Better control over parameters for XLinear, XTransformer and Text2text [#64, #78, #80] (@jiong-zhang)

    • Enable advanced control of parameters via JSON input file
    • Add utility tool to generate parameter skeleton for further modification

    Other new functionalities

    • Added support for predicting on select outputs [#37, #43, #47] (@bhl00)
    • Added new primal solver L2R_L2LOSS_SVC_PRIMAL for XLinear [#67] (@yuhchenlin)
    • Add Makefile for easy format, install, clean and unittest. [#12] (@weiliw-amz)

    Bug Fixes

    • (#17) Fixed issues with github information obtaining when installing from .zip. [#21, #29] (@weiliw-amz)
    • (#42) Fixed transformer training issue on single GPU [#14] (@jiong-zhang)
    • Removed PECOS source-installation dependency on NumPy BLAS library. [#81] (@weiliw-amz)
    Source code(tar.gz)
    Source code(zip)
  • v0.1.0(Apr 26, 2021)
