Non-Metric Space Library (NMSLIB): An efficient similarity search library and a toolkit for evaluation of k-NN methods for generic non-metric spaces.

Overview


Important Notes

  • NMSLIB is generic yet fast; see the results of the ANN benchmarks.
  • A standalone implementation of our fastest method, HNSW, also exists as a header-only library.
  • All the documentation (including using Python bindings and the query server, description of methods and spaces, building the library, etc.) can be found on this page.
  • For generic questions/inquiries, please use the Gitter chat (https://gitter.im/nmslib/Lobby); the GitHub issues page is for bugs and feature requests.

Objectives

Non-Metric Space Library (NMSLIB) is an efficient cross-platform similarity search library and a toolkit for evaluation of similarity search methods. The core library does not have any third-party dependencies. It has been gaining popularity recently. In particular, it has become a part of Amazon Elasticsearch Service.

The goal of the project is to create an effective and comprehensive toolkit for searching in generic and non-metric spaces. Even though the library contains a variety of metric-space access methods, our main focus is on generic and approximate search methods, in particular, on methods for non-metric spaces. NMSLIB is possibly the first library with a principled support for non-metric space searching.

NMSLIB is an extendible library, which means that it is possible to add new search methods and distance functions. NMSLIB can be used directly in C++ and Python (via Python bindings). In addition, it is also possible to build a query server, which can be used from Java (or other languages supported by Apache Thrift, version 0.12). Java has a native client, i.e., it works on many platforms without requiring a C++ library to be installed.
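
To give a flavor of the Python API, here is a minimal usage sketch (the toy data and parameter values below are illustrative, not prescriptive):

    import nmslib
    import numpy as np

    # Toy data: 1000 random 32-dimensional float vectors
    data = np.random.randn(1000, 32).astype(np.float32)

    index = nmslib.init(method='hnsw', space='cosinesimil')
    index.addDataPointBatch(data)
    index.createIndex({'M': 16, 'efConstruction': 100})

    # Retrieve the 10 approximate nearest neighbors of the first point
    ids, distances = index.knnQuery(data[0], k=10)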

Authors: Bilegsaikhan Naidan, Leonid Boytsov, Yury Malkov, David Novak. With contributions from Ben Frederickson, Lawrence Cayton, Wei Dong, Avrelin Nikita, Dmitry Yashunin, Bob Poekert, @orgoro, @gregfriedland, Scott Gigante, Maxim Andreev, Daniel Lemire, Nathan Kurz, Alexander Ponomarenko.

Brief History

NMSLIB started as a personal project of Bilegsaikhan Naidan, who created the initial code base, the Python bindings, and participated in earlier evaluations. The most successful class of methods--neighborhood/proximity graphs--is represented by the Hierarchical Navigable Small World graph (HNSW) due to Malkov and Yashunin (see the publications below). Other particularly useful methods include a modification of the VP-tree due to Boytsov and Naidan (2013), a Neighborhood APProximation index (NAPP) proposed by Tellez et al. (2013) and improved by David Novak, as well as a vanilla uncompressed inverted file.

Credits and Citing

If you find this library useful, feel free to cite our SISAP paper [BibTex] as well as other papers listed at the end. One crucial contribution to cite is the fast Hierarchical Navigable Small World graph (HNSW) method [BibTex]. Please also check out the stand-alone HNSW implementation by Yury Malkov, which is released as the header-only HNSWLib library.

License

The code is released under the Apache License Version 2.0 http://www.apache.org/licenses/. Older versions of the library included additional components, which have different licenses (but this does not apply to NMSLIB 2.x):

  • The LSHKIT, which is embedded in our library, is distributed under the GNU General Public License, see http://www.gnu.org/licenses/.
  • The k-NN graph construction algorithm NN-Descent due to Dong et al. 2011 (see the links below), which is also embedded in our library, seems to be covered by a free-to-use license, similar to Apache 2.
  • The FALCONN library is distributed under the MIT license.

Funding

Leonid Boytsov was supported by the Open Advancement of Question Answering Systems (OAQA) group and the following NSF grant #1618159: "Matching and Ranking via Proximity Graphs: Applications to Question Answering and Beyond". Bileg was supported by the iAd Center.

Related Publications

The most important related papers are listed below in chronological order:

Comments
  • Add support to build aarch64 wheels

    Travis-CI allows for the creation of aarch64 wheels.

    Build: https://travis-ci.com/github/janaknat/nmslib/builds/205780637

    There are 8-9 failures when testing hnsw. Any suggestions on how to fix these? A majority of the failures are due to expected=0.99 and calculated=~0.98.

    Tagging @jmazanec15 since he added ARM compatibility.

    opened by janaknat 33
  • Speed up pip install

    Currently pip installing is slow, since there is a compile step. Is there any way to speed it up? On my macbook:

    time pip install --no-cache nmslib
    Collecting nmslib
      Downloading https://files.pythonhosted.org/packages/e1/95/1f7c90d682b79398c5ee3f9296be8d2640fa41de24226bcf5473c801ada6/nmslib-1.7.3.6.tar.gz (255kB)
        100% |████████████████████████████████| 256kB 8.8MB/s 
    Requirement already satisfied: pybind11>=2.0 in .../virtualenv/python3.6/lib/python3.6/site-packages (from nmslib) (2.2.4)
    Requirement already satisfied: numpy in .../virtualenv/python3.6/lib/python3.6/site-packages (from nmslib) (1.15.4)
    Installing collected packages: nmslib
      Running setup.py install for nmslib ... -
    done
    Successfully installed nmslib-1.7.3.6
    
    real	3m11.091s
    

    Would it be a good idea to provide pre-compiled wheels over pip? That would also simplify the process of finding the pybind11 headers (I had to do something special to copy them in for pip when running with a --target dir).

    opened by matthen 33
  • Can't load index?

    Hi, this might be more of a question than a problem in the library. I have created an index with NAPP and saved it using saveIndex. However, when I load it with loadIndex I get the following error:

    Check failed: A previously saved index is apparently used with a different data set, a different data set split, and/or a different gold standard file! (detected an object index >= #of data points

    Am I doing something wrong?

    Thanks for the help.

    EDIT: The message doesn't make sense to me because I'm not "using the index with a data set", I'm just loading it.

    EDIT2: I'm using the Python interface.
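
    A workaround that appears to apply here (an assumption based on later versions of the Python bindings, where saveIndex/loadIndex accept save_data/load_data flags) is to persist the raw data points together with the index:

    import nmslib

    # 'data' stands in for the original data points (hypothetical)
    index = nmslib.init(method='napp', space='cosinesimil')
    index.addDataPointBatch(data)
    index.createIndex()
    # save_data=True also writes the raw data points next to the index file
    index.saveIndex('napp.index', save_data=True)

    new_index = nmslib.init(method='napp', space='cosinesimil')
    # load_data=True restores those data points, so the consistency check passes
    new_index.loadIndex('napp.index', load_data=True)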

    enhancement 
    opened by zommerfelds 31
  • Custom Metrics

    Hello,

    I wanted to perform NN search on a dataset of genomes. For this task, the distance between 2 data points is calculated by a custom script. Is there a way I can incorporate this without having to create the entire NN search algorithm myself, only modifying some parts of your code?

    opened by Chokerino 30
  • Python process crashes: 'pybind11::error_already_set'

    nmslib is the only lib in our project that relies on pybind11, and we could narrow it down to the Dask nodes that use nmslib. When we disable the nodes that use nmslib, it doesn't crash.

    terminate called after throwing an instance of 'pybind11::error_already_set'
      what():  TypeError: '>=' not supported between instances of 'int' and 'NoneType'
    
    At:
      /opt/conda/envs/jobnet-env/lib/python3.6/logging/__init__.py(1546): isEnabledFor
      /opt/conda/envs/jobnet-env/lib/python3.6/logging/__init__.py(1293): debug
    
    /usr/local/bin/entrypoint.sh: line 46:    21 Aborted                 (core dumped) python scripts/cli.py "${@:2}"
    

    Version:

    - nmslib~=1.7.2
    - pybind11=2.2
    
    opened by lukin0110 28
  • Make failed in linking Boost library

    Hello,

    I am facing an error in this step:

    [ 75%] Linking CXX executable ../release/experiment

    All of the errors looked like this:

    undefined reference to `boost::program_options:

    I installed the latest library versions and checked that libboost 1.58 is compatible with g++ 4.9. I think maybe it is related to C++11; however, it returns the error with both g++ 4.9 and 4.7.

    This is my system information:

    -- The C compiler identification is GNU 4.9.3
    -- The CXX compiler identification is GNU 4.9.3
    -- Check for working C compiler: /usr/bin/cc
    -- Check for working C compiler: /usr/bin/cc -- works
    -- Detecting C compiler ABI info
    -- Detecting C compiler ABI info - done
    -- Detecting C compile features
    -- Detecting C compile features - done
    -- Check for working CXX compiler: /usr/bin/c++
    -- Check for working CXX compiler: /usr/bin/c++ -- works
    -- Detecting CXX compiler ABI info
    -- Detecting CXX compiler ABI info - done
    -- Detecting CXX compile features
    -- Detecting CXX compile features - done
    -- Build type: Release
    -- GSL using gsl-config /usr/bin/gsl-config
    -- Using GSL from /usr
    -- Found GSL.
    -- Found Eigen3: /usr/include/eigen3 (Required is at least version "3")
    -- Found Eigen3.
    -- Boost version: 1.58.0
    -- Found the following Boost libraries:
    --   system
    --   filesystem
    --   program_options
    -- Found BOOST.

    I also installed Clang and LLDB 3.6. I tried searching for many possible solutions but cannot fix it :(.

    opened by nguyenv7 26
  • Python wrapper crashes while retrieving nearest neighbors when M>100

    Hi, I am working on a problem where I need to retrieve ~500 nearest neighbors out of a million points. I am using the Python wrapper for the HNSW method. The code works perfectly well if I set the parameter M <= 100, but with values greater than 100 the code crashes while retrieving nearest neighbors (there are no issues while building the model) with an "invalid next size" error. Any idea why this might be happening? Thanks, Himanshu

    bug 
    opened by hjain689 25
  • Incorrect distances returned for all-zero query

    An all-zero query vector will result in NMSLib incorrectly reporting a distance of zero for its nearest neighbours (see example below). Is this related to #187? Is there a suggested workaround?

    # Training set (CSR sparse matrix)
    X.todense()
    # Out:
    # matrix([[4., 2., 3., 1., 0., 0., 0., 0., 0.],
    #         [2., 1., 0., 0., 3., 0., 1., 2., 1.],
    #         [4., 2., 0., 0., 3., 1., 0., 0., 0.]], dtype=float32)
    
    # Query vector (CSR sparse matrix)
    r.todense()
    # Out:
    # matrix([[0., 0., 0., 0., 0., 0., 0., 0., 0.]], dtype=float32)
    
    # Train and query
    import nmslib
    index = nmslib.init(
        method='hnsw',
        space='cosinesimil_sparse_fast',
        data_type=nmslib.DataType.SPARSE_VECTOR,
        dtype=nmslib.DistType.FLOAT)
    index.addDataPointBatch(X)
    index.createIndex()
    index.knnQueryBatch(r, k=3)
    # Out:
    # [(array([2, 1, 0], dtype=int32), array([0., 0., 0.], dtype=float32))]
    
    # Note that distances are all 0, which is incorrect!
    # Same result for dense training & query vectors.
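
    Since cosine distance is undefined for an all-zero vector, a client-side guard (a hypothetical helper, not part of the nmslib API) is one way to fail fast:

    import numpy as np
    from scipy.sparse import csr_matrix

    def safe_knn_query_batch(index, queries: csr_matrix, k: int):
        # All-zero rows have no well-defined cosine distance, so reject them
        row_norms = np.asarray(queries.multiply(queries).sum(axis=1)).ravel()
        if np.any(row_norms == 0):
            raise ValueError("all-zero query vector: cosine distance is undefined")
        return index.knnQueryBatch(queries, k=k)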
    
    bug 
    opened by lsorber 24
  • Jaccard for method HNSW for sparse features

    Hi,

    I want to know if HNSW provides Jaccard (similarity or distance, does not matter), besides cosine, for sparse features. There are scenarios in which Jaccard outperforms.

    The Python notebooks provided show the following metrics: l2, l2sqr_sift, cosinesimil_sparse.

    According to space_sparse_scalar.h, the following metrics seem to be implemented, or in preparation, for sparse features:

    #define SPACE_SPARSE_COSINE_SIMILARITY "cosinesimil_sparse"
    #define SPACE_SPARSE_ANGULAR_DISTANCE "angulardist_sparse"
    #define SPACE_SPARSE_NEGATIVE_SCALAR "negdotprod_sparse"
    #define SPACE_SPARSE_QUERY_NORM_NEGATIVE_SCALAR "querynorm_negdotprod_sparse"

    What does each of these metrics mean? I also saw cosinesimil_sparse_fast in a few files. What is it, and how does it compare to cosinesimil_sparse? Is it ready for use?

    I can provide a Jaccard implementation for sparse vectors, given 2 vectors implemented as hash tables, but I haven't found out how to integrate it into the code. It would also be preferable to check which metrics are already available. The closest clue I got was to expand the following files: distcomp_scalar.cc, hnsw.cc and hnsw_distfunc_opt.cc, but I am not sure which steps to take. I saw some mentions of Jaccard in space_sparse_jaccard.cc and distcomp.h. But no examples are given.
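
    For reference, the distance in question can be sketched in a few lines of plain Python (an illustration of the definition, not NMSLIB's implementation):

    def jaccard_distance(a: set, b: set) -> float:
        # Jaccard distance = 1 - |A & B| / |A | B|; defined as 0 for two empty sets
        if not a and not b:
            return 0.0
        return 1.0 - len(a & b) / len(a | b)

    jaccard_distance({1, 2, 3}, {2, 3, 4})  # 0.5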

    Thanks in advance.

    opened by icarocd 24
  • pybind11.h not found when installing using pip

    I'm trying to install python bindings on Ubuntu 16.04 machine:

    $ pip3 install pybind11 nmslib
    Collecting nmslib
      Using cached https://files.pythonhosted.org/packages/de/eb/28b2060bb1750426c5618e3ad6ce830ac3cfd56cb3eccfb799e52d6064db/nmslib-1.7.2.tar.gz
    Requirement already satisfied: pybind11>=2.0 in /homes/alexandrov/.virtualenvs/pytorch/lib/python3.5/site-packages (from nmslib) (2.2.2)
    Requirement already satisfied: numpy in /homes/alexandrov/.virtualenvs/pytorch/lib/python3.5/site-packages (from nmslib) (1.14.2)
    Building wheels for collected packages: nmslib
      Running setup.py bdist_wheel for nmslib ... error
      Complete output from command /homes/alexandrov/.virtualenvs/pytorch/bin/python3 -u -c "import setuptools, tokenize;__file__='/tmp/pip-install-0y71oxa4/nmslib/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" bdist_wheel -d /tmp/pip-wheel-916r1rr9 --python-tag cp35:
      running bdist_wheel
      running build
      running build_ext
      creating tmp
      x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/include/python3.5m -I/homes/alexandrov/.virtualenvs/pytorch/include/python3.5m -c /tmp/tmpwekdswov.cpp -o tmp/tmpwekdswov.o -std=c++14
      cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
      x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/include/python3.5m -I/homes/alexandrov/.virtualenvs/pytorch/include/python3.5m -c /tmp/tmpyyphh022.cpp -o tmp/tmpyyphh022.o -fvisibility=hidden
      cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
      building 'nmslib' extension
      creating build
      creating build/temp.linux-x86_64-3.5
      creating build/temp.linux-x86_64-3.5/nmslib
      creating build/temp.linux-x86_64-3.5/nmslib/similarity_search
      creating build/temp.linux-x86_64-3.5/nmslib/similarity_search/src
      creating build/temp.linux-x86_64-3.5/nmslib/similarity_search/src/method
      creating build/temp.linux-x86_64-3.5/nmslib/similarity_search/src/space
      x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I./nmslib/similarity_search/include -Iinclude -Iinclude -I/homes/alexandrov/.virtualenvs/pytorch/lib/python3.5/site-packages/numpy/core/include -I/usr/include/python3.5m -I/homes/alexandrov/.virtualenvs/pytorch/include/python3.5m -c nmslib.cc -o build/temp.linux-x86_64-3.5/nmslib.o -O3 -march=native -fopenmp -DVERSION_INFO="1.7.2" -std=c++14 -fvisibility=hidden
      cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
      nmslib.cc:16:31: fatal error: pybind11/pybind11.h: No such file or directory
      compilation terminated.
      error: command 'x86_64-linux-gnu-gcc' failed with exit status 1
    

    Clearly, pybind11 headers were not installed on my machine. This library is not packaged for apt-get (at least not for Ubuntu 16.04), so I needed to manually install from source.

    It would be nice if the nmslib install script took care of this.

    opened by taketwo 23
  • Optimized index raises RuntimeError on load when saved with `negdotprod` space

    Basically, this is what I am trying to do

    import nmslib
    
    space = 'negdotprod'
    
    vectors = [[1, 2], [3, 4], [5, 6]]
    
    index = nmslib.init(space=space, method='hnsw')
    index.addDataPointBatch(vectors)
    index.createIndex(
        {'M': 15, 'efConstruction': 200, 'skip_optimized_index': 0, 'post': 0}
    )
    index.saveIndex('test.index')
    
    new_index = nmslib.init(space=space, method='hnsw')
    new_index.loadIndex('test.index')
    

    and it raises

    Check failed: totalElementsStored_ == this->data_.size() The number of stored elements 3 doesn't match the number of data points ! Did you forget to re-load data?
    Traceback (most recent call last):
      File "8.py", line 15, in <module>
        new_index.loadIndex('test.index')
    RuntimeError: Check failed: The number of stored elements 3 doesn't match the number of data points ! Did you forget to re-load data?
    

    If I change the space variable to cosinesimil, it works just fine. It seems that the data points are not stored, even though the hnsw method with skip_optimized_index=0 is used.

    opened by chomechome 22
  • Unable to pip install nmslib, including historic versions

    Hey sorry to bother you,

    I've been trying to download scispacy via pip on Windows 10 using Python 3.10.0 today, and it keeps failing due to errors about nmslib. I've tried pip installing nmslib versions: 1.7.3.6, 1.8, 2.1.1.

    None of them have worked though, curiously. I've had a long look around scispacy's GitHub and yours, but nothing I've read has given me any solutions.

    I've also flagged it with scispacy on their GitHub. Anyway, I have no idea what's going on but just thought I'd let you know. Cheers, kind regards, Chris

    opened by Cbezz 5
  • Strict typing is needed: Using wrong input can cause distances to be all one, e.g., with cosinesimil_sparse/HNSW when calling knnQueryBatch on a dense array

    Hey, I'm trying to use nmslib's HNSW with a csr_matrix containing sparse vectors.

    Creating the index works fine, adding the data and setting query time params too:

        items = ["foo is a kind of thing", "bar is another one", "this bar is a real one!", "I prefer to use a foo"] # etc, len=3000
        similar_items_index = nmslib.init(
            space="cosinesimil_sparse",
            method="hnsw",
            data_type=nmslib.DataType.SPARSE_VECTOR,
            dtype=nmslib.DistType.FLOAT,
        )
        vectorizer = TfidfVectorizer(dtype=np.float32, token_pattern=r"\S+")
        embeddings: csr_matrix = vectorizer.fit_transform(items)
        similar_items_index.addDataPointBatch(embeddings)
        similar_items_index.createIndex({"M": 128, "efConstruction": 32, "post": 2}, print_progress=False)
        similar_items_index.setQueryTimeParams({"ef": 512})
    

    But when I search with knnQueryBatch, all the returned distances are equal to 1:

    similar_items_index.knnQueryBatch([query_embedding], 5)[0]
    

    -> Knn results: ids, with distances all set to 1

    Am I missing something in the proper usage of HNSW with sparse vector data?
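
    As the issue title suggests, the likely fix (an assumption based on the report itself) is to keep the query sparse rather than densifying it:

    # vectorizer.transform(...) already yields a csr_matrix; do NOT call .toarray()
    query = vectorizer.transform(["foo is a kind of thing"])
    ids, distances = similar_items_index.knnQueryBatch(query, k=5)[0]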

    Setup for reproduction
    • This uses the text-similarity data from Kaggle, downloaded in /tmp/. Any other text dataset should be fine, as computing similarity scores is not required to see the problem with returned distances.
    
    import csv
    from typing import Dict
    
    import nmslib
    import numpy as np
    from scipy.sparse import csr_matrix
    from sklearn.feature_extraction.text import TfidfVectorizer
    
    CSV_PATH = "/tmp/data/"
    
    
    def main():
        similar_items_index = nmslib.init(
            space="cosinesimil_sparse",
            method="hnsw",
            data_type=nmslib.DataType.SPARSE_VECTOR,
            dtype=nmslib.DistType.FLOAT,
        )
        items = set()
        ids: Dict[str, int] = {}
        rids: Dict[int, str] = {}
        similarities = {}
        for file in [
            f"{CSV_PATH}/similarity-test.csv",
            f"{CSV_PATH}/similarity-train.csv",
        ]:
            with open(file) as f:
                reader = csv.reader(f, delimiter=",", quotechar="|")
                header = next(reader)
                for i, l in enumerate(reader):
                    desc_x = l[header.index("description_x")]
                    desc_y = l[header.index("description_y")]
                    similar = bool(l[header.index("same_security")])
                    id = len(items)
                    if desc_x not in items:
                        items.add(desc_x)
                        ids[desc_x] = id
                        rids[id] = desc_x
                        id_x = id
                        id += 1
                    else:
                        id_x = ids[desc_x]
                    if desc_y not in items:
                        items.add(desc_y)
                        ids[desc_y] = id
                        rids[id] = desc_y
                        id_y = id
                        id += 1
                    else:
                        id_y = ids[desc_y]
                    if similar:
                        similarities[id_x] = id_y
                        similarities[id_y] = id_x
        print(f"Loaded {len(items)}, total {len(similarities)/2} pairs of similar queries.")
        vectorizer = TfidfVectorizer(dtype=np.float32, token_pattern=r"\S+")
        embeddings: csr_matrix = vectorizer.fit_transform(items)
        print("Embedded items, adding datapoints..")
        similar_items_index.addDataPointBatch(embeddings)
        print("Creating index..")
        similar_items_index.createIndex({"M": 128, "efConstruction": 32, "post": 2}, print_progress=False)
        print("Setting index query params..")
        similar_items_index.setQueryTimeParams({"ef": 512})
        print("Searching...")
        score = 0
        total_similar = 0
        for item_id, item in enumerate(items):
            query_embedding = vectorizer.transform([item]).getrow(0).toarray()
            top_50, distances = similar_items_index.knnQueryBatch([query_embedding], 50)[0]
            top_50_texts = [rids[t] for t in top_50]
            try:
                expected = similarities[item_id]
                expected_text = rids[expected]
                if expected:
                    score += 1 if expected in top_50 else 0
            except KeyError:
                continue  # No similar noted on this item.
            total_similar += 1
        print(
            f"After querying {len(items)} of which {total_similar}, we found the similar item in the top50 {score} times."
        )
    
    
    if __name__ == "__main__":
        main()
    
    opened by PLNech 6
  • More encompassing approach for Mac M1 chips

    On a Mac architecture, platform.processor may return i386 even when on a Mac M1. The code below should be more accurate. See stack overflow comment, another stack overflow comment and stack overflow post for some more information / validation that the uname approach is more all-encompassing.

    I was personally running into this problem and the following fix solved it for me.

    This PR is a slightly edited solution to what is contained in https://github.com/nmslib/nmslib/pull/485 with many thanks to @netj for getting this started.
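
    A minimal sketch of the uname-based check (the helper name is hypothetical; the actual PR diff is not reproduced here):

    import platform

    def is_apple_silicon() -> bool:
        # platform.uname().machine reports "arm64" on Apple silicon, whereas
        # platform.processor() can misleadingly return "i386" under Rosetta
        uname = platform.uname()
        return uname.system == "Darwin" and uname.machine == "arm64"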

    opened by JewlsIOB 3
  • Calling setQueryTimeParams results in a SIGSEGV

    Hi there! Trying to perform knnQuery on an indexed csr_matrix, I got the issue reported in #480 from this code:

            model = TfidfVectorizer(dtype=np.float32, token_pattern=r"\S+")
            embeddings = model.fit_transform(corpus_tfidf)
            logger.info(f"Creating vector index from a {len(corpus_tfidf)} corpus embedded as {embeddings.shape}...")
            index = nmslib.init(method="hnsw", space="cosinesimil_sparse", data_type=nmslib.DataType.SPARSE_VECTOR, dtype=nmslib.DistType.FLOAT)
            logger.info("Adding datapoints to index...")
            index.addDataPointBatch(embeddings)
            logger.info("Creating final index...")
            index.createIndex()
    
        logger.info(f"Searching neighbors for first embedding {embeddings[0]}")
            index.knnQuery(embeddings[0])
    

    As described in #480, this results in an IndexError: tuple index out of range.

    When trying to apply the index.setQueryTimeParams({'efSearch': efS, 'algoType': 'old'}) workaround mentioned in another issue, it results in a segmentation fault.

    I can reproduce it with the following minimal example; it looks like the call errors even without arguments:

    index = nmslib.init(method="hnsw", space="cosinesimil_sparse", data_type=nmslib.DataType.SPARSE_VECTOR, dtype=nmslib.DistType.FLOAT)
    print("Setting index queryParams...")
    index.setQueryTimeParams()
    print("Adding datapoints to index...")
    

    ->

    Setting index queryParams...
    Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)
    

    Env info

    • python -V -> Python 3.7.11
    • pip freeze | grep nmslib -> nmslib==2.1.1
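
    One thing worth ruling out (an assumption based on the minimal example above, where setQueryTimeParams is called on an empty index): query-time parameters are normally set only after the index has been built:

    import nmslib

    index = nmslib.init(method="hnsw", space="cosinesimil_sparse",
                        data_type=nmslib.DataType.SPARSE_VECTOR,
                        dtype=nmslib.DistType.FLOAT)
    index.addDataPointBatch(embeddings)  # embeddings: a csr_matrix, as above
    index.createIndex()
    index.setQueryTimeParams({"efSearch": 100})  # only after createIndex()
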
    opened by PLNech 3
  • NMSLIB doesn't work on Windows 11

    Hello,

    We use nmslib as the default engine for TensorFlow Similarity due to its broad compatibility with various OSes. We got multiple reports, which I was able to confirm, that nmslib doesn't install on Windows 11, potentially related to issue #498.

    Do you have any idea if/when you will be able to take a look at this? With the increased adoption of Win11 it becomes problematic for us.

    Thanks :)

    opened by ebursztein 15
Releases (v2.1.1)
  • v2.1.1(Feb 3, 2021)

    Note: We unfortunately had deployment issues. As a result, we had to delete several versions between 2.0.6 and 2.1.1. If you installed one of these versions, please delete them and install a more recent version (>=2.1.1).

    The current build focuses on:

    1. Providing more efficient ("optimized") implementations for spaces: negdotprod, l1, linf.
    2. Binaries for ARM 64 (aarch64).
    Source code(tar.gz)
    Source code(zip)
  • v2.0.6(Apr 16, 2020)

  • v2.0.5(Nov 7, 2019)

    The main objective of this release is to provide binary wheels. For compatibility reasons, we need to stick to basic SSE2 instructions. However, when the Python library is imported, it prints a message suggesting that a more efficient version can be installed from sources (and tells how to do this).

    Furthermore, this release removes a lot of old code, which speeds up compilation by 70%:

    1. Non-performing methods
    2. Double-indices

    This is a step towards a more lightweight NMSLIB library.

    Source code(tar.gz)
    Source code(zip)
  • v1.8.1(Jun 23, 2019)

  • v1.8(Jun 6, 2019)

    This is a clean-up release focusing on several important issues:

    1. Fixing a bug with knnQuery #370
    2. Added a possibility to save/load data efficiently from the Python bindings (and the query server) #356; the Python notebooks are updated accordingly
    3. We now have a bit Jaccard space (many thanks @gregfriedland)
    4. Upgraded the query server to use a recent Apache Thrift
    5. Importantly, the documentation is reorganized quite a bit:
      1. There is now a single entry point for all the docs.
      2. Most of the docs are now online; only the fairly technical description of search spaces and methods remains in the PDF manual.
    Source code(tar.gz)
    Source code(zip)
  • v1.7.3.6(Oct 4, 2018)

  • v1.7.3.4(Aug 6, 2018)

  • v1.7.3.2(Jul 13, 2018)

  • v1.7.3.1(Jul 9, 2018)

  • v1.7.2(Feb 20, 2018)

    1. Improving concurrency in Python (preventing hanging in a certain situation https://github.com/searchivarius/nmslib/issues/291)
    2. Improving ParallelFor: passing the thread ID and not starting threads in single-thread mode.
    Source code(tar.gz)
    Source code(zip)
  • v1.7(Feb 4, 2018)

  • v1.6(Dec 15, 2016)

    Here is the list of changes for version 1.6 (the manual isn't updated yet):

    We especially thank the following people for the fixes:

    • Bileg Naidan (@bileg)
    • Bob Poekert (@bobpoekert)
    • @orgoro
    1. We simplified the build by excluding the code that requires third-party libraries from the core library. In other words, the core library does not have any third-party dependencies (not even Boost). To build the full version of the library, you have to run cmake as follows: cmake . -DWITH_EXTRAS=1
    2. It should now be possible to build on a Mac.
    3. We improve Python bindings (thanks to @bileg) and their installation process (thanks to @bobpoekert):
      1. We merged our generic and vector bindings into a single module. We upgraded to a more standard installation process via distutils. You can run: python setup.py build and then sudo python setup.py install.
      2. We improved our support for sparse spaces: you can pass data in the form of a numpy sparse array!
      3. There are now batch multi-threaded querying and addition of data.
      4. addDataPoint* functions return the position of an inserted entry. This can be useful if you use the function getDataPoint.
      5. For examples of using Python API, please, see *.py files in the folder python_bindings.
      6. Note that to execute unit tests you need: python-numpy, python-scipy, and python-pandas.
    4. Because we got rid of Boost, we, unfortunately, do not support command-line options WITHOUT arguments. Instead, you have to pass the values 0 or 1.
    5. However, the utility experiment (experiment.exe) now accepts the option recallOnly. If this option has the argument 1, then the only effectiveness metric computed is recall. This is useful for evaluating HNSW, because (for efficiency reasons) HNSW does not return proper distance values (e.g., for L2 it returns a squared distance, not the original one). This makes it impossible to compute effectiveness metrics other than recall (returning wrong distance values would also lead to the experiment terminating with an error message).
    6. Additional spaces:
      1. negdotprod_sparse: negative inner (dot) product. This is a sparse space.
      2. querynorm_negdotprod_sparse: query-normalized inner (dot) product, which is the dot product divided by the query norm.
      3. renyi_diverg: Rényi divergence. It has the parameter alpha (see the formula sketch after this list).
      4. ab_diverg: α-β-divergence. It has two parameters: alpha and beta.
    7. Additional search methods:
      1. simple_invindx: A classical inverted index with document-at-a-time processing (via a priority queue). It doesn't have parameters, but works only with the sparse space negdotprod_sparse.
      2. falconn: we ported (created a wrapper for) a June 2016's version of FALCONN library.
        1. Unlike the original implementation, our wrapper works directly with sparse vector spaces as well as with dense vector spaces.
        2. However, our wrapper has to duplicate data twice: so this method is useful mostly as a benchmark.
        3. Our wrapper directly supports a data centering trick, which can boost performance sometimes.
        4. Most parameters (hash_family, cross_polytope, hyperplane, storage_hash_table, num_hash_bits, num_hash_tables, num_probes, num_rotations, seed, feature_hashing_dimension) merely map to FALCONN parameters.
        5. Setting the additional parameters norm_data and center_data tells us to center and normalize the data. Our implementation of the centering (which is unfortunately done before the hashing trick is applied) for sparse data is horribly inefficient, so we wouldn't recommend using it. Besides, it doesn't seem to improve results. Just in case, the number of sparse dimensions used for centering is controlled by the parameter max_sparse_dim_to_center.
        6. Our FALCONN wrapper would normally use the distance provided by NMSLIB, but you can force using FALCONN's distance function implementation by setting: use_falconn_dist to 1.
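
    For reference, the Rényi divergence in item 6.3 has the standard textbook form, for discrete distributions P and Q and a parameter alpha > 0, alpha != 1 (quoted from the general definition, not from NMSLIB's code):

    D_{\alpha}(P \,\|\, Q) = \frac{1}{\alpha - 1} \log \sum_{i} p_i^{\alpha} \, q_i^{1 - \alpha}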
    Source code(tar.gz)
    Source code(zip)
  • v1.5.3(Jul 11, 2016)

  • v1.5.2(Jul 2, 2016)

  • v1.5.1(Jun 1, 2016)

  • v1.5(May 20, 2016)

    1. A new efficient method: a hierarchical (navigable) small-world graph (HNSW), contributed by Yury Malkov (@yurymalkov). Works with g++, Visual Studio, Intel Compiler, but doesn't work with Clang yet.
    2. A query server, which can have clients in C++, Java, Python, and other languages supported by Apache Thrift
    3. Python bindings for vector and non-vector spaces
    4. Improved performance of two core methods SW-graph and NAPP
    5. Better handling of the gold standard data in the benchmarking utility experiment
    6. Updated API that permits search methods to serialize indices
    7. Improved documentation (e.g., we added tuning guidelines for best methods)
    Source code(tar.gz)
    Source code(zip)
SberSwap Video Swap based on deep learning

Sber AI 431 Jan 03, 2023
Example scripts for the detection of lanes using the ultra fast lane detection model in ONNX.

Ibai Gorordo 35 Sep 07, 2022
Quickly and easily create / train a custom DeepDream model

Dream-Creator This project aims to simplify the process of creating a custom DeepDream model by using pretrained GoogleNet models and custom image dat

55 Dec 27, 2022
[TIP 2021] SADRNet: Self-Aligned Dual Face Regression Networks for Robust 3D Dense Face Alignment and Reconstruction

SADRNet Paper link: SADRNet: Self-Aligned Dual Face Regression Networks for Robust 3D Dense Face Alignment and Reconstruction Requirements python

Multimedia Computing Group, Nanjing University 99 Dec 30, 2022
PyTorch 1.5 implementation for paper DECOR-GAN: 3D Shape Detailization by Conditional Refinement.

DECOR-GAN PyTorch 1.5 implementation for paper DECOR-GAN: 3D Shape Detailization by Conditional Refinement, Zhiqin Chen, Vladimir G. Kim, Matthew Fish

Zhiqin Chen 72 Dec 31, 2022
[ICML 2020] Prediction-Guided Multi-Objective Reinforcement Learning for Continuous Robot Control

PG-MORL This repository contains the implementation for the paper Prediction-Guided Multi-Objective Reinforcement Learning for Continuous Robot Contro

MIT Graphics Group 65 Jan 07, 2023
YOLOv5 Series Multi-backbone, Pruning and quantization Compression Tool Box.

YOLOv5-Compression Update News Requirements Environment setup: pip install -r requirements.txt Evaluation metric Visdrone Model mAP

ZhangYuan 719 Jan 02, 2023

Similarity-based Gray-box Adversarial Attack Against Deep Face Recognition

Similarity-based Gray-box Adversarial Attack Against Deep Face Recognition Introduction Run attack: SGADV.py Objective function: foolbox/attacks/gradi

1 Jul 18, 2022
code for EMNLP 2019 paper Text Summarization with Pretrained Encoders

PreSumm This code is for EMNLP 2019 paper Text Summarization with Pretrained Encoders Updates Jan 22 2020: Now you can Summarize Raw Text Input!. Swit

Yang Liu 1.2k Dec 28, 2022
Api's bulid in Flask perfom to manage Todo Task.

Citymall-task Api's bulid in Flask perfom to manage Todo Task. Installation Requrements : Python: 3.10.0 MongoDB create .env file with variables DB_UR

Aisha Tayyaba 1 Dec 17, 2021
Sharpness-Aware Minimization for Efficiently Improving Generalization

Sharpness-Aware-Minimization-TensorFlow This repository provides a minimal implementation of sharpness-aware minimization (SAM) (Sharpness-Aware Minim

Sayak Paul 54 Dec 08, 2022
Flower classification model that classifies flowers in 10 classes made using transfer learning (~85% accuracy).

flower-classification-inceptionV3 Flower classification model that classifies flowers in 10 classes. Training and validation are done using a pre-anot

Ivan R. Mršulja 1 Dec 12, 2021
Building blocks for uncertainty-aware cycle consistency presented at NeurIPS'21.

UncertaintyAwareCycleConsistency This repository provides the building blocks and the API for the work presented in the NeurIPS'21 paper Robustness vi

EML Tübingen 19 Dec 12, 2022
[ICCV 2021] FaPN: Feature-aligned Pyramid Network for Dense Image Prediction

FaPN: Feature-aligned Pyramid Network for Dense Image Prediction [arXiv] [Project Page] @inproceedings{ huang2021fapn, title={{FaPN}: Feature-alig

Shihua Huang 23 Jul 22, 2022
Code for the paper Progressive Pose Attention for Person Image Generation in CVPR19 (Oral).

Pose-Transfer Code for the paper Progressive Pose Attention for Person Image Generation in CVPR19(Oral). The paper is available here. Video generation

Tengteng Huang 679 Jan 04, 2023
Code for ICCV2021 paper PARE: Part Attention Regressor for 3D Human Body Estimation

PARE: Part Attention Regressor for 3D Human Body Estimation [ICCV 2021] PARE: Part Attention Regressor for 3D Human Body Estimation, Muhammed Kocabas,

Muhammed Kocabas 277 Jan 03, 2023
Improving Compound Activity Classification via Deep Transfer and Representation Learning

Improving Compound Activity Classification via Deep Transfer and Representation Learning This repository is the official implementation of Improving C

NingLab 2 Nov 24, 2021
This is the official code for the paper "Ad2Attack: Adaptive Adversarial Attack for Real-Time UAV Tracking".

Ad^2Attack:Adaptive Adversarial Attack on Real-Time UAV Tracking Demo video 📹 Our video on bilibili demonstrates the test results of Ad^2Attack on se

Intelligent Vision for Robotics in Complex Environment 10 Nov 07, 2022
A graph adversarial learning toolbox based on PyTorch and DGL.

GraphWar: Arms Race in Graph Adversarial Learning NOTE: GraphWar is still in the early stages and the API will likely continue to change. 🚀 Installat

Jintang Li 54 Jan 05, 2023
From Fidelity to Perceptual Quality: A Semi-Supervised Approach for Low-Light Image Enhancement (CVPR'2020)

Under-exposure introduces a series of visual degradation, i.e. decreased visibility, intensive noise, and biased color, etc. To address these problems, we propose a novel semi-supervised learning app

Yang Wenhan 117 Jan 03, 2023