RapidFuzz is a fast string matching library for Python and C++

Overview

RapidFuzz

Rapid fuzzy string matching in Python and C++ using the Levenshtein Distance

Description

RapidFuzz is a fast string matching library for Python and C++ that uses the string similarity calculations from FuzzyWuzzy. However, there are two aspects that set RapidFuzz apart from FuzzyWuzzy:

  1. It is MIT licensed, so it can be used with whichever license you choose for your project, whereas using FuzzyWuzzy forces you to adopt the GPL license.
  2. It is mostly written in C++ and, on top of this, comes with many algorithmic improvements that make string matching even faster while still providing the same results. More details on these performance improvements, in the form of benchmarks, can be found here.

Requirements

Installation

There are several ways to install RapidFuzz. The recommended methods are to use either pip (the Python package manager) or conda (an open-source, cross-platform package manager).

with pip

RapidFuzz can be installed with pip the following way:

pip install rapidfuzz

There are pre-built binaries (wheels) of RapidFuzz for MacOS (10.9 and later), Linux x86_64 and Windows. Wheels for armv6l (Raspberry Pi Zero) and armv7l (Raspberry Pi) are available on piwheels.

✖️ Failure: "ImportError: DLL load failed"

If you run into this error on Windows, the most likely reason is that the Visual C++ 2019 redistributable is not installed. It is required to find the C++ libraries (the C++ 2019 version includes the 2015, 2017 and 2019 versions).

with conda

RapidFuzz can be installed with conda:

conda install -c conda-forge rapidfuzz

from git

RapidFuzz can be installed directly from the source distribution by cloning the repository. This requires a C++14 capable compiler.

git clone https://github.com/maxbachmann/rapidfuzz.git
cd rapidfuzz
pip install .

Usage

Some simple functions are shown below. Complete documentation of all functions can be found here.

Scorers

Scorers in RapidFuzz can be found in the modules fuzz and string_metric.

Simple Ratio

> from rapidfuzz import fuzz
> fuzz.ratio("this is a test", "this is a test!")
96.55171966552734
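The score above can be checked by hand if fuzz.ratio is read as a normalized Indel similarity: 100 * (1 - d / (len1 + len2)), where d counts only insertions and deletions and equals len1 + len2 - 2 * LCS (the longest common subsequence). The pure-Python sketch below assumes that definition; `lcs_length` and `ratio` are hypothetical helper names, and this is not RapidFuzz's own C++ implementation. For the example, the LCS has length 14, so the score is 100 * 28 / 29 ≈ 96.55, which agrees with the value shown above up to floating-point rounding.

```python
def lcs_length(s1: str, s2: str) -> int:
    """Length of the longest common subsequence (classic DP, kept one row at a time)."""
    prev = [0] * (len(s2) + 1)
    for ch1 in s1:
        curr = [0]
        for j, ch2 in enumerate(s2, start=1):
            if ch1 == ch2:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def ratio(s1: str, s2: str) -> float:
    """Hypothetical re-implementation: normalized Indel similarity on a 0..100 scale."""
    total = len(s1) + len(s2)
    if total == 0:
        return 100.0  # convention chosen for this sketch: two empty strings are identical
    # Indel distance = insertions + deletions = total length - 2 * LCS
    indel = total - 2 * lcs_length(s1, s2)
    return 100.0 * (1 - indel / total)


print(ratio("this is a test", "this is a test!"))  # ≈ 96.55 (= 100 * 28 / 29)
```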

Partial Ratio

> fuzz.partial_ratio("this is a test", "this is a test!")
100.0
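Partial Ratio scores the shorter string against the best-matching part of the longer one, which is why the trailing "!" no longer costs anything. The sketch below illustrates the idea with a simple sliding window of the shorter string's length. It is a simplification under stated assumptions: `base_ratio` and `partial_ratio` are hypothetical names, the base score comes from the standard library's difflib.SequenceMatcher instead of RapidFuzz's Indel-based ratio, and RapidFuzz's actual alignment search may be more elaborate, so only the 100.0 result is expected to match.

```python
from difflib import SequenceMatcher


def base_ratio(s1: str, s2: str) -> float:
    # Stand-in base score (difflib's Ratcliff/Obershelp ratio on a 0..100 scale),
    # NOT RapidFuzz's fuzz.ratio, so scores below 100 will differ.
    return 100.0 * SequenceMatcher(None, s1, s2).ratio()


def partial_ratio(s1: str, s2: str) -> float:
    """Best score of the shorter string against each equally long window of the longer one."""
    shorter, longer = (s1, s2) if len(s1) <= len(s2) else (s2, s1)
    if not shorter:
        return base_ratio(shorter, longer)
    width = len(shorter)
    best = 0.0
    for start in range(len(longer) - width + 1):
        best = max(best, base_ratio(shorter, longer[start:start + width]))
        if best == 100.0:
            break  # a perfect window cannot be beaten
    return best


print(partial_ratio("this is a test", "this is a test!"))  # 100.0
```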

Token Sort Ratio

> fuzz.ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear")
90.90908813476562
> fuzz.token_sort_ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear")
100.0

Token Set Ratio

> fuzz.token_sort_ratio("fuzzy was a bear", "fuzzy fuzzy was a bear")
83.8709716796875
> fuzz.token_set_ratio("fuzzy was a bear", "fuzzy fuzzy was a bear")
100.0
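Both token scorers split the strings on whitespace before comparing them. The sketch below follows the FuzzyWuzzy-style definitions: token_sort_ratio compares the alphabetically sorted tokens, while token_set_ratio compares the sorted shared tokens with each string's shared-plus-remaining tokens and keeps the best score, so a repeated word such as the second "fuzzy" is ignored. The function names mirror the fuzz scorers but are hypothetical re-implementations, and the base score again uses difflib.SequenceMatcher as a stand-in, so only the 100.0 results above are expected to be reproduced.

```python
from difflib import SequenceMatcher


def base_ratio(s1: str, s2: str) -> float:
    # Stand-in base score (difflib ratio on a 0..100 scale), not RapidFuzz's fuzz.ratio.
    return 100.0 * SequenceMatcher(None, s1, s2).ratio()


def token_sort_ratio(s1: str, s2: str) -> float:
    """Compare the strings after sorting their whitespace-separated tokens alphabetically."""
    return base_ratio(" ".join(sorted(s1.split())), " ".join(sorted(s2.split())))


def token_set_ratio(s1: str, s2: str) -> float:
    """FuzzyWuzzy-style token set comparison: duplicate tokens and token order are ignored."""
    tokens1, tokens2 = set(s1.split()), set(s2.split())
    common = " ".join(sorted(tokens1 & tokens2))
    combined1 = " ".join([common] + sorted(tokens1 - tokens2)).strip()
    combined2 = " ".join([common] + sorted(tokens2 - tokens1)).strip()
    if not common:
        # Nothing shared: just compare the two sorted token strings.
        return base_ratio(combined1, combined2)
    return max(
        base_ratio(common, combined1),
        base_ratio(common, combined2),
        base_ratio(combined1, combined2),
    )


print(token_sort_ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear"))  # 100.0
print(token_set_ratio("fuzzy was a bear", "fuzzy fuzzy was a bear"))  # 100.0
```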

Process

The process module makes it easy to compare a string to a list of strings. This is generally more performant than calling the scorers directly from Python. Here are some examples of using processors in RapidFuzz:

> from rapidfuzz import process, fuzz
> choices = ["Atlanta Falcons", "New York Jets", "New York Giants", "Dallas Cowboys"]
> process.extract("new york jets", choices, scorer=fuzz.WRatio, limit=2)
[('New York Jets', 100, 1), ('New York Giants', 78.57142639160156, 2)]
> process.extractOne("cowboys", choices, scorer=fuzz.WRatio)
("Dallas Cowboys", 90, 3)

The full documentation of processors can be found here.
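Conceptually, extract and extractOne score every choice against the query and keep the best results as (choice, score, index) tuples, the shape shown in the examples above. The sketch below spells out that loop in plain Python under stated assumptions: `extract`, `extract_one` and `default_scorer` are hypothetical names, and the scorer is a case-insensitive difflib ratio rather than fuzz.WRatio, so the scores differ from RapidFuzz's. For performance, the text above recommends the real process functions over a Python-level loop like this one.

```python
from difflib import SequenceMatcher
from typing import Callable, List, Optional, Sequence, Tuple

# (choice, score, index), the tuple shape shown in the examples above
Match = Tuple[str, float, int]


def default_scorer(query: str, choice: str) -> float:
    # Hypothetical stand-in: case-insensitive difflib ratio on a 0..100 scale, NOT fuzz.WRatio.
    return 100.0 * SequenceMatcher(None, query.lower(), choice.lower()).ratio()


def extract(query: str, choices: Sequence[str],
            scorer: Callable[[str, str], float] = default_scorer,
            limit: Optional[int] = 5) -> List[Match]:
    """Score every choice against the query and return the `limit` best matches (all if None)."""
    scored = [(choice, scorer(query, choice), index) for index, choice in enumerate(choices)]
    # Highest score first; equal scores keep the original order of `choices`.
    scored.sort(key=lambda match: (-match[1], match[2]))
    return scored if limit is None else scored[:limit]


def extract_one(query: str, choices: Sequence[str],
                scorer: Callable[[str, str], float] = default_scorer) -> Optional[Match]:
    """Return the single best match, or None when there are no choices."""
    best = extract(query, choices, scorer=scorer, limit=1)
    return best[0] if best else None


choices = ["Atlanta Falcons", "New York Jets", "New York Giants", "Dallas Cowboys"]
print(extract("new york jets", choices, limit=2))
print(extract_one("cowboys", choices))
```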

Benchmark

The following benchmark gives a quick performance comparison between RapidFuzz and FuzzyWuzzy. More detailed benchmarks for the string metrics can be found in the documentation. For this simple comparison I generated a list of 10,000 strings of length 10, which is compared to a sample of 100 elements from this list:

import random
import string

words = [
  ''.join(random.choice(string.ascii_letters + string.digits) for _ in range(10))
  for _ in range(10_000)
]
samples = words[::len(words) // 100]

The first benchmark compares the performance of the scorers in FuzzyWuzzy and RapidFuzz when they are used directly from Python in the following way:

for sample in samples:
  for word in words:
    scorer(sample, word)

The following graph shows how many elements are processed per second with each scorer. There are big performance differences between the scorers; however, each of them is faster in RapidFuzz.
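An "elements processed per second" figure like the one described above can be measured with a small timing harness around the nested loop. The sketch below assumes that an element means one (sample, word) comparison; `make_words`, `elements_per_second` and `difflib_ratio` are hypothetical helpers, the stand-in scorer is pure-Python difflib rather than a FuzzyWuzzy or RapidFuzz scorer, and the word list is shrunk to 500 entries so it finishes quickly. To compare libraries, the same function can be called with each library's scorer.

```python
import random
import string
import time
from difflib import SequenceMatcher
from typing import Callable, List


def make_words(count: int, length: int = 10, seed: int = 0) -> List[str]:
    """Random alphanumeric strings built like `words` in the text, seeded so runs are repeatable."""
    rng = random.Random(seed)
    alphabet = string.ascii_letters + string.digits
    return ["".join(rng.choice(alphabet) for _ in range(length)) for _ in range(count)]


def elements_per_second(scorer: Callable[[str, str], float],
                        samples: List[str], words: List[str]) -> float:
    """Run the nested loop from the text and return compared (sample, word) pairs per second."""
    start = time.perf_counter()
    for sample in samples:
        for word in words:
            scorer(sample, word)
    # Guard against a zero interval on very small inputs.
    elapsed = max(time.perf_counter() - start, 1e-9)
    return len(samples) * len(words) / elapsed


def difflib_ratio(s1: str, s2: str) -> float:
    # Hypothetical pure-Python stand-in scorer; not from FuzzyWuzzy or RapidFuzz.
    return 100.0 * SequenceMatcher(None, s1, s2).ratio()


words = make_words(500)  # far fewer than the 10,000 in the text, so the sketch finishes quickly
samples = words[::len(words) // 100]  # every 5th word -> 100 samples
print(f"{elements_per_second(difflib_ratio, samples, words):,.0f} elements/s")
```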

Figure: Benchmark Scorer

The second benchmark compares the performance when the scorers are used in combination with extractOne in the following way:

for sample in samples:
  extractOne(sample, words, scorer=scorer)

The following graph shows how many elements are processed per second with each scorer. In RapidFuzz, using scorers through processors like extractOne is a lot faster than calling them directly. That's why processors should be used whenever possible.

Figure: Benchmark extractOne

License

RapidFuzz is licensed under the MIT license since I believe that everyone should be able to use it without being forced to adopt the GPL license. That's why the library is based on an older version of FuzzyWuzzy that was MIT licensed as well. This old version of FuzzyWuzzy can be found here.

Comments
  • Read the Docs Setup

    Creating a pull request for the Read the Docs setup and starting the documentation. This PR addresses issue #17.

    For more extensive documentation of the functions, we should add detailed docstrings to each of them. The main file can then be changed to pull those docstrings from the code.

    opened by TrigonaMinima 18
  • 2.13.3: test suite is failing because it cannot find `tests.common`

    I'm packaging your module as an rpm package, so I'm using the typical PEP517-based build, install and test cycle used when building packages from a non-root account.

    • python3 -sBm build -w --no-isolation
    • because I'm calling build with --no-isolation, only locally installed modules are used during all steps
    • install .whl file in </install/prefix>
    • run pytest with PYTHONPATH pointing to sitearch and sitelib inside </install/prefix>

    Here is the pytest output:

    + PYTHONPATH=/home/tkloczko/rpmbuild/BUILDROOT/python-rapidfuzz-2.13.3-2.fc35.x86_64/usr/lib64/python3.8/site-packages:/home/tkloczko/rpmbuild/BUILDROOT/python-rapidfuzz-2.13.3-2.fc35.x86_64/usr/lib/python3.8/site-packages
    + /usr/bin/pytest -ra --import-mode=importlib
    =========================================================================== test session starts ============================================================================
    platform linux -- Python 3.8.15, pytest-7.2.0, pluggy-1.0.0
    rootdir: /home/tkloczko/rpmbuild/BUILD/rapidfuzz-2.13.3, configfile: pyproject.toml, testpaths: tests
    plugins: hypothesis-6.58.2
    collected 75 items / 11 errors
    
    ================================================================================== ERRORS ==================================================================================
    ___________________________________________________________________ ERROR collecting tests/test_fuzz.py ____________________________________________________________________
    ImportError while importing test module '/home/tkloczko/rpmbuild/BUILD/rapidfuzz-2.13.3/tests/test_fuzz.py'.
    Hint: make sure your test modules/packages have valid Python names.
    Traceback:
    tests/test_fuzz.py:7: in <module>
        from .common import symmetric_scorer_tester, scorer_tester
    E   ModuleNotFoundError: No module named 'tests.common'; 'tests' is not a package
    ________________________________________________________ ERROR collecting tests/distance/test_DamerauLevenshtein.py ________________________________________________________
    ImportError while importing test module '/home/tkloczko/rpmbuild/BUILD/rapidfuzz-2.13.3/tests/distance/test_DamerauLevenshtein.py'.
    Hint: make sure your test modules/packages have valid Python names.
    Traceback:
    tests/distance/test_DamerauLevenshtein.py:4: in <module>
        from ..common import GenericScorer
    E   ModuleNotFoundError: No module named 'tests.common'; 'tests' is not a package
    _____________________________________________________________ ERROR collecting tests/distance/test_Hamming.py ______________________________________________________________
    ImportError while importing test module '/home/tkloczko/rpmbuild/BUILD/rapidfuzz-2.13.3/tests/distance/test_Hamming.py'.
    Hint: make sure your test modules/packages have valid Python names.
    Traceback:
    tests/distance/test_Hamming.py:2: in <module>
        from ..common import GenericScorer
    E   ModuleNotFoundError: No module named 'tests.common'; 'tests' is not a package
    ______________________________________________________________ ERROR collecting tests/distance/test_Indel.py _______________________________________________________________
    ImportError while importing test module '/home/tkloczko/rpmbuild/BUILD/rapidfuzz-2.13.3/tests/distance/test_Indel.py'.
    Hint: make sure your test modules/packages have valid Python names.
    Traceback:
    tests/distance/test_Indel.py:2: in <module>
        from ..common import GenericScorer
    E   ModuleNotFoundError: No module named 'tests.common'; 'tests' is not a package
    _______________________________________________________________ ERROR collecting tests/distance/test_Jaro.py _______________________________________________________________
    ImportError while importing test module '/home/tkloczko/rpmbuild/BUILD/rapidfuzz-2.13.3/tests/distance/test_Jaro.py'.
    Hint: make sure your test modules/packages have valid Python names.
    Traceback:
    tests/distance/test_Jaro.py:4: in <module>
        from ..common import GenericScorer
    E   ModuleNotFoundError: No module named 'tests.common'; 'tests' is not a package
    ___________________________________________________________ ERROR collecting tests/distance/test_JaroWinkler.py ____________________________________________________________
    ImportError while importing test module '/home/tkloczko/rpmbuild/BUILD/rapidfuzz-2.13.3/tests/distance/test_JaroWinkler.py'.
    Hint: make sure your test modules/packages have valid Python names.
    Traceback:
    tests/distance/test_JaroWinkler.py:4: in <module>
        from ..common import GenericScorer
    E   ModuleNotFoundError: No module named 'tests.common'; 'tests' is not a package
    ______________________________________________________________ ERROR collecting tests/distance/test_LCSseq.py ______________________________________________________________
    ImportError while importing test module '/home/tkloczko/rpmbuild/BUILD/rapidfuzz-2.13.3/tests/distance/test_LCSseq.py'.
    Hint: make sure your test modules/packages have valid Python names.
    Traceback:
    tests/distance/test_LCSseq.py:2: in <module>
        from ..common import GenericScorer
    E   ModuleNotFoundError: No module named 'tests.common'; 'tests' is not a package
    ___________________________________________________________ ERROR collecting tests/distance/test_Levenshtein.py ____________________________________________________________
    ImportError while importing test module '/home/tkloczko/rpmbuild/BUILD/rapidfuzz-2.13.3/tests/distance/test_Levenshtein.py'.
    Hint: make sure your test modules/packages have valid Python names.
    Traceback:
    tests/distance/test_Levenshtein.py:3: in <module>
        from ..common import GenericScorer
    E   ModuleNotFoundError: No module named 'tests.common'; 'tests' is not a package
    _______________________________________________________________ ERROR collecting tests/distance/test_OSA.py ________________________________________________________________
    ImportError while importing test module '/home/tkloczko/rpmbuild/BUILD/rapidfuzz-2.13.3/tests/distance/test_OSA.py'.
    Hint: make sure your test modules/packages have valid Python names.
    Traceback:
    tests/distance/test_OSA.py:2: in <module>
        from ..common import GenericScorer
    E   ModuleNotFoundError: No module named 'tests.common'; 'tests' is not a package
    _____________________________________________________________ ERROR collecting tests/distance/test_Postfix.py ______________________________________________________________
    ImportError while importing test module '/home/tkloczko/rpmbuild/BUILD/rapidfuzz-2.13.3/tests/distance/test_Postfix.py'.
    Hint: make sure your test modules/packages have valid Python names.
    Traceback:
    tests/distance/test_Postfix.py:2: in <module>
        from ..common import GenericScorer
    E   ModuleNotFoundError: No module named 'tests.common'; 'tests' is not a package
    ______________________________________________________________ ERROR collecting tests/distance/test_Prefix.py ______________________________________________________________
    ImportError while importing test module '/home/tkloczko/rpmbuild/BUILD/rapidfuzz-2.13.3/tests/distance/test_Prefix.py'.
    Hint: make sure your test modules/packages have valid Python names.
    Traceback:
    tests/distance/test_Prefix.py:2: in <module>
        from ..common import GenericScorer
    E   ModuleNotFoundError: No module named 'tests.common'; 'tests' is not a package
    ========================================================================= short test summary info ==========================================================================
    ERROR tests/test_fuzz.py
    ERROR tests/distance/test_DamerauLevenshtein.py
    ERROR tests/distance/test_Hamming.py
    ERROR tests/distance/test_Indel.py
    ERROR tests/distance/test_Jaro.py
    ERROR tests/distance/test_JaroWinkler.py
    ERROR tests/distance/test_LCSseq.py
    ERROR tests/distance/test_Levenshtein.py
    ERROR tests/distance/test_OSA.py
    ERROR tests/distance/test_Postfix.py
    ERROR tests/distance/test_Prefix.py
    !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Interrupted: 11 errors during collection !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
    ============================================================================ 11 errors in 1.11s ============================================================================
    

    Here is the list of installed modules in the build env:

    Package                       Version
    ----------------------------- -----------------
    alabaster                     0.7.12
    appdirs                       1.4.4
    attrs                         22.1.0
    Babel                         2.11.0
    Brlapi                        0.8.3
    build                         0.9.0
    charset-normalizer            3.0.1
    contourpy                     1.0.6
    cssselect                     1.1.0
    cycler                        0.11.0
    Cython                        0.29.32
    distro                        1.8.0
    dnspython                     2.2.1
    docutils                      0.19
    exceptiongroup                1.0.0
    extras                        1.0.0
    fixtures                      4.0.0
    fonttools                     4.38.0
    gpg                           1.17.1-unknown
    hypothesis                    6.58.2
    idna                          3.4
    imagesize                     1.4.1
    importlib-metadata            5.1.0
    iniconfig                     1.1.1
    Jinja2                        3.1.2
    kiwisolver                    1.4.4
    latexcodec                    2.0.1
    libcomps                      0.1.19
    louis                         3.23.0
    lxml                          4.9.1
    MarkupSafe                    2.1.1
    matplotlib                    3.6.2
    numpy                         1.23.1
    olefile                       0.46
    packaging                     21.3
    pbr                           5.9.0
    pep517                        0.13.0
    Pillow                        9.3.0
    pip                           22.3.1
    pluggy                        1.0.0
    pybtex                        0.24.0
    pybtex-docutils               1.0.2
    Pygments                      2.13.0
    PyGObject                     3.42.2
    pyparsing                     3.0.9
    pytest                        7.2.0
    python-dateutil               2.8.2
    pytz                          2022.4
    PyYAML                        6.0
    requests                      2.28.1
    rpm                           4.17.0
    scikit-build                  0.16.2
    scour                         0.38.2
    setuptools                    65.6.3
    six                           1.16.0
    snowballstemmer               2.2.0
    sortedcontainers              2.4.0
    Sphinx                        5.3.0
    sphinxcontrib-applehelp       1.0.2.dev20221204
    sphinxcontrib-bibtex          2.5.0
    sphinxcontrib-devhelp         1.0.2.dev20221204
    sphinxcontrib-htmlhelp        2.0.0
    sphinxcontrib-jsmath          1.0.1.dev20221204
    sphinxcontrib-qthelp          1.0.3.dev20221204
    sphinxcontrib-serializinghtml 1.1.5
    testtools                     2.5.0
    tomli                         2.0.1
    urllib3                       1.26.12
    wheel                         0.38.4
    zipp                          3.11.0
    
    opened by kloczek 16
  • Fuzzy search by prefix?

    Good day, what would be the best approach/scorer for getting better results when matching in the same order (by prefix)?

    For example, if I search for the value "38050" within a list that contains "358", the result is ('358', 72.0, 8) because 3, 5 and 8 are present in 38050, but this is not of interest to me since 3, 5 and 8 appear in a different order. A match for me would be a choice like 380XX, which shares a prefix with 38050.

    The issue is related to this question I asked yesterday, where it was suggested that I use another scorer or try different ones.

    https://stackoverflow.com/questions/74093719/how-to-make-fuzzy-search-between-lists-showing-matches-and-not-found-elements

    Thanks in advance.

    question 
    opened by RasecMalkic 13
  • Could you please provide compatibility with Cython 0.29.x?

    I've been trying to package RapidFuzz for Gentoo. Unfortunately, we're nowhere near ready to switch to Cython 3.x, so the requirement on an alpha version of Cython makes it impossible for us to package it. Could you please consider providing compatibility with the current release versions of Cython?

    enhancement 
    opened by mgorny 13
  • Issues packaging with cx_freeze

    I'm currently trying to package an application that uses rapidfuzz using cx_freeze. The packaging is successful but when I try to run the application I get the following error.

    implementation requires numpy to be installed

    I'm using process.cdist, and I understand numpy is required for the matrix output. However, I also import numpy without issue at the beginning of the module that calls rapidfuzz, so the dependency must be there.

    Basically, what I would like to understand is whether there is a way to test which numpy submodules are required but failing to import.

    Below is my setup.py

    import sys
    from cx_Freeze import setup, Executable
    from setuptools import find_packages
    
    options = {
        "build_exe": {
            "zip_include_packages": ["*"],
            "zip_exclude_packages": [],
            "build_exe": "dist\\",
            "includes": [
                "numpy",
                "numpy.int16",
                "numpy.int64",
                "_pytest._argcomplete",
                "_pytest._code.code",
                "_pytest._code.source",
                "_pytest._io.saferepr",
                "_pytest._io.terminalwriter",
                "_pytest._io.wcwidth",
                "_pytest._version",
                "_pytest.assertion.rewrite",
                "_pytest.assertion.truncate",
                "_pytest.assertion.util",
                "_pytest.cacheprovider",
                "_pytest.capture",
                "_pytest.compat",
                "_pytest.config.argparsing",
                "_pytest.config.compat",
                "_pytest.config.exceptions",
                "_pytest.config.findpaths",
                "_pytest.debugging",
                "_pytest.deprecated",
                "_pytest.doctest",
                "_pytest.faulthandler",
                "_pytest.fixtures",
                "_pytest.freeze_support",
                "_pytest.helpconfig",
                "_pytest.hookspec",
                "_pytest.junitxml",
                "_pytest.legacypath",
                "_pytest.logging",
                "_pytest.main",
                "_pytest.mark.expression",
                "_pytest.mark.structures",
                "_pytest.monkeypatch",
                "_pytest.nodes",
                "_pytest.nose",
                "_pytest.outcomes",
                "_pytest.pastebin",
                "_pytest.pathlib",
                "_pytest.pytester",
                "_pytest.pytester_assertions",
                "_pytest.python",
                "_pytest.python_api",
                "_pytest.python_path",
                "_pytest.recwarn",
                "_pytest.reports",
                "_pytest.runner",
                "_pytest.scope",
                "_pytest.setuponly",
                "_pytest.setupplan",
                "_pytest.skipping",
                "_pytest.stash",
                "_pytest.stepwise",
                "_pytest.terminal",
                "_pytest.threadexception",
                "_pytest.timing",
                "_pytest.tmpdir",
                "_pytest.unittest",
                "_pytest.unraisableexception",
                "_pytest.warning_types",
                "_pytest.warnings",
                "py._builtin",
                "py._path.local",
                "py._io.capture",
                "py._io.saferepr",
                "py._io.terminalwriter",
                "py._xmlgen",
                "py._error",
                "py._std",
                # builtin files imported by pytest using py.std implicit mechanism
                "argparse",
                "shlex",
                "warnings",
                "types",
                "rapidfuzz.utils_cpp",
                "rapidfuzz.utils_py",
                "rapidfuzz.process_py",
                "rapidfuzz.fuzz_py",
                "rapidfuzz.distance.Hamming_py",
                "rapidfuzz.process_cpp",
                "rapidfuzz.fuzz_cpp",
                "rapidfuzz.distance.Levenshtein_cpp",
                "rapidfuzz.distance.Levenshtein_py",
                "rapidfuzz.string_metric_cpp",
                "rapidfuzz.string_metric_py",
                "jinja2.ext",
                "jinja2",
            ],
            "include_files": ["tests/"],
        }
    }
    
    
    f = open("README.md", "r")
    LONG_DESCRIPTION = f.read()
    f.close()
    
    setup(
        name="centralized_integrations",
        version="0.1",
        description="xxx",
        long_description=LONG_DESCRIPTION,
        long_description_content_type="text/markdown",
        author="xxx",
        author_email="xxx",
        url="xxx",
        license="BSD 3-Clause License",
        packages=find_packages(exclude=["ez_setup"]),
        options=options,
        include_package_data=True,
        executables=[Executable("cli/main.py", base=None)],
        entry_points="""
            [console_scripts]
            cli = cli.main:main
        """,
    )
    
    bug 
    opened by NaaN108 12
  • 2.0.1 fails to build: Expected ':', found 'class'

    Error compiling Cython file:
    ------------------------------------------------------------
    ...
    )
    
    from array import array
    
    cdef extern from "rapidfuzz/details/types.hpp" namespace "rapidfuzz" nogil:
        cpdef enum class EditType:
                  ^
    ------------------------------------------------------------
    
    /disk-samsung/freebsd-ports/devel/py-rapidfuzz/work-py38/rapidfuzz-2.0.1/src/cython/cpp_common.pxd:23:15: Expected ':', found 'class'
    
    Error compiling Cython file:
    ------------------------------------------------------------
    ...
    # distutils: language=c++
    # cython: language_level=3, binding=True, linetrace=True
    
    from rapidfuzz_capi cimport RF_String, PREPROCESSOR_STRUCT_VERSION, RF_Preprocessor
    from cpp_common cimport (
    ^
    ------------------------------------------------------------
    
    /disk-samsung/freebsd-ports/devel/py-rapidfuzz/work-py38/rapidfuzz-2.0.1/src/cython/cpp_utils.pyx:5:0: 'cpp_common/is_valid_string.pxd' not found
    
    Error compiling Cython file:
    ------------------------------------------------------------
    ...
    # distutils: language=c++
    # cython: language_level=3, binding=True, linetrace=True
    
    from rapidfuzz_capi cimport RF_String, PREPROCESSOR_STRUCT_VERSION, RF_Preprocessor
    from cpp_common cimport (
    ^
    ------------------------------------------------------------
    
    /disk-samsung/freebsd-ports/devel/py-rapidfuzz/work-py38/rapidfuzz-2.0.1/src/cython/cpp_utils.pyx:5:0: 'cpp_common/convert_string.pxd' not found
    
    Error compiling Cython file:
    ------------------------------------------------------------
    ...
    # distutils: language=c++
    # cython: language_level=3, binding=True, linetrace=True
    
    from rapidfuzz_capi cimport RF_String, PREPROCESSOR_STRUCT_VERSION, RF_Preprocessor
    from cpp_common cimport (
    ^
    ------------------------------------------------------------
    
    /disk-samsung/freebsd-ports/devel/py-rapidfuzz/work-py38/rapidfuzz-2.0.1/src/cython/cpp_utils.pyx:5:0: 'cpp_common/hash_array.pxd' not found
    
    Error compiling Cython file:
    ------------------------------------------------------------
    ...
    # distutils: language=c++
    # cython: language_level=3, binding=True, linetrace=True
    
    from rapidfuzz_capi cimport RF_String, PREPROCESSOR_STRUCT_VERSION, RF_Preprocessor
    from cpp_common cimport (
    ^
    ------------------------------------------------------------
    
    /disk-samsung/freebsd-ports/devel/py-rapidfuzz/work-py38/rapidfuzz-2.0.1/src/cython/cpp_utils.pyx:5:0: 'cpp_common/hash_sequence.pxd' not found
    
    Error compiling Cython file:
    ------------------------------------------------------------
    ...
    # distutils: language=c++
    # cython: language_level=3, binding=True, linetrace=True
    
    from rapidfuzz_capi cimport RF_String, PREPROCESSOR_STRUCT_VERSION, RF_Preprocessor
    from cpp_common cimport (
    ^
    ------------------------------------------------------------
    
    /disk-samsung/freebsd-ports/devel/py-rapidfuzz/work-py38/rapidfuzz-2.0.1/src/cython/cpp_utils.pyx:5:0: 'cpp_common/conv_sequence.pxd' not found
    
    Error compiling Cython file:
    ------------------------------------------------------------
    ...
        validate_string(sentence, "sentence must be a String")
        return default_process_impl(sentence)
    
    
    cdef bool default_process_capi(sentence, RF_String* str_) except False:
        proc_str = conv_sequence(sentence)
                  ^
    ------------------------------------------------------------
    
    /disk-samsung/freebsd-ports/devel/py-rapidfuzz/work-py38/rapidfuzz-2.0.1/src/cython/cpp_utils.pyx:44:15: 'conv_sequence' is not a constant, variable or function identifier
    
    Error compiling Cython file:
    ------------------------------------------------------------
    

    Python-3.8 cython-0.29.26 FreeBSD 13

    bug 
    opened by yurivict 12
  • Inconsistent behavior between process.extractOne and fuzz.ratio

    I am using rapidfuzz for similarity of devanagari words. Here's a reproducible example.

    from rapidfuzz import fuzz, process
    
    word = 'मस्सा'
    wordlist = {'शर्ट', 'वर्ट', 'वार्ट', 'वऑर्ट', 'वॉर्ट'}
    
    process.extractOne(word, wordlist)
    # gives: ('वार्ट', 100.0)
    
    for i in wordlist: print(fuzz.ratio(i, word))
    # gives
    # 22.22222222222222
    # 22.22222222222222
    # 19.999999999999996
    # 19.999999999999996
    # 19.999999999999996
    

    In the output of process.extractOne, the best score should be 22.22 instead of 100.0.

    Environment

    Python: 3.6.10
    RapidFuzz: 0.7.8

    opened by TrigonaMinima 12
  • Asian languages usage

    Hello,

    Not an issue per se, but I was curious about possible compatibility with Asian languages. Chinese, Korean, and Japanese all perform very poorly with Levenshtein distance matching because they're not alphabet-based.

    A solution would be to use a romanisation library to translate them to alphabet characters in the same way that all characters are put in lowercase.

    I’d love to give a hand but unfortunately my cpp is pretty rusty and I’d do more harm than good.

    enhancement 
    opened by T1-Tolki 12
  • pure Python mode fails to import

    This was reported here: https://github.com/python-poetry/poetry/issues/6078 It should be relatively simple to add these missing functions to the pure Python version.

    @pekkarr how does the AUR ship packages? If it ships binaries, those appear to be broken; otherwise it would not even attempt to use the pure Python fallback version. You might want to set the environment variable RAPIDFUZZ_BUILD_EXTENSION while packaging to make sure that, when a build error occurs, the pure Python version is not simply packaged instead.

    bug 
    opened by maxbachmann 10
  • Poetry update is failing for some reason with the latest

    [email protected]:~/Documents/Projects/libretrofuzz$ poetry update -vvv
    Using virtualenv: /home/i3/.cache/pypoetry/virtualenvs/libretrofuzz-VjwDmzHD-py3.8
    Updating dependencies
    Resolving dependencies...
       1: fact: libretrofuzz is 2.4.1
       1: derived: libretrofuzz
       1: fact: libretrofuzz depends on beautifulsoup4 (^4.10.0)
       1: fact: libretrofuzz depends on questionary (^1.10.0)
       1: fact: libretrofuzz depends on typer (^0.5.0)
       1: fact: libretrofuzz depends on rapidfuzz (^2.0.8)
       1: fact: libretrofuzz depends on httpx (^0.23.0)
       1: fact: libretrofuzz depends on tqdm (^4.64.0)
       1: fact: libretrofuzz depends on prompt_toolkit (^3.0.30)
       1: fact: libretrofuzz depends on pillow (^7.0.0)
       1: fact: libretrofuzz depends on pytest (^5.2)
       1: fact: libretrofuzz depends on pytest (^5.2)
       1: selecting libretrofuzz (2.4.1)
       1: derived: pytest (>=5.2,<6.0)
       1: derived: pillow (>=7.0.0,<8.0.0)
       1: derived: prompt_toolkit (>=3.0.30,<4.0.0)
       1: derived: tqdm (>=4.64.0,<5.0.0)
       1: derived: httpx (>=0.23.0,<0.24.0)
       1: derived: rapidfuzz (>=2.0.8,<3.0.0)
       1: derived: typer[all] (>=0.5.0,<0.6.0)
       1: derived: questionary (>=1.10.0,<2.0.0)
       1: derived: beautifulsoup4 (>=4.10.0,<5.0.0)
    PyPI: 15 packages found for pytest >=5.2,<6.0
       1: fact: pytest (5.4.3) depends on py (>=1.5.0)
       1: fact: pytest (5.4.3) depends on packaging (*)
       1: fact: pytest (5.4.3) depends on attrs (>=17.4.0)
       1: fact: pytest (5.4.3) depends on more-itertools (>=4.0.0)
       1: fact: pytest (5.4.3) depends on pluggy (>=0.12,<1.0)
       1: fact: pytest (5.4.3) depends on wcwidth (*)
       1: fact: pytest (5.4.3) depends on atomicwrites (>=1.0)
       1: fact: pytest (5.4.3) depends on colorama (*)
       1: selecting pytest (5.4.3)
       1: derived: colorama
       1: derived: atomicwrites (>=1.0)
       1: derived: wcwidth
       1: derived: pluggy (>=0.12,<1.0)
       1: derived: more-itertools (>=4.0.0)
       1: derived: attrs (>=17.4.0)
       1: derived: packaging
       1: derived: py (>=1.5.0)
    PyPI: 5 packages found for pillow >=7.0.0,<8.0.0
       1: selecting pillow (7.2.0)
    PyPI: 1 packages found for prompt-toolkit >=3.0.30,<4.0.0
       1: fact: prompt-toolkit (3.0.30) depends on wcwidth (*)
       1: selecting prompt-toolkit (3.0.30)
    PyPI: No release information found for tqdm-2.0.0.dev0, skipping
    PyPI: 1 packages found for tqdm >=4.64.0,<5.0.0
       1: fact: tqdm (4.64.0) depends on colorama (*)
       1: selecting tqdm (4.64.0)
    PyPI: No release information found for httpx-0.0.1, skipping
    PyPI: 1 packages found for httpx >=0.23.0,<0.24.0
       1: fact: httpx (0.23.0) depends on certifi (*)
       1: fact: httpx (0.23.0) depends on sniffio (*)
       1: fact: httpx (0.23.0) depends on rfc3986 (>=1.3,<2)
       1: fact: httpx (0.23.0) depends on httpcore (>=0.15.0,<0.16.0)
       1: selecting httpx (0.23.0)
       1: derived: httpcore (>=0.15.0,<0.16.0)
       1: derived: rfc3986[idna2008] (>=1.3,<2)
       1: derived: sniffio
       1: derived: certifi
    PyPI: 15 packages found for rapidfuzz >=2.0.8,<3.0.0
       1: fact: rapidfuzz (2.3.0) depends on jarowinkler (>=1.2.0,<2.0.0)
       1: selecting rapidfuzz (2.3.0)
       1: derived: jarowinkler (>=1.2.0,<2.0.0)
    PyPI: 1 packages found for typer >=0.5.0,<0.6.0
       1: fact: typer (0.5.0) depends on typer (0.5.0)
       1: fact: typer (0.5.0) depends on rich (>=10.11.0,<13.0.0)
       1: fact: typer (0.5.0) depends on shellingham (>=1.3.0,<2.0.0)
       1: fact: typer (0.5.0) depends on colorama (>=0.4.3,<0.5.0)
       1: fact: typer (0.5.0) depends on click (>=7.1.1,<9.0.0)
       1: selecting typer[all] (0.5.0)
       1: derived: click (>=7.1.1,<9.0.0)
       1: derived: colorama (>=0.4.3,<0.5.0)
       1: derived: shellingham (>=1.3.0,<2.0.0)
       1: derived: rich (>=10.11.0,<13.0.0)
       1: derived: typer (==0.5.0)
    PyPI: 1 packages found for questionary >=1.10.0,<2.0.0
       1: fact: questionary (1.10.0) depends on prompt_toolkit (>=2.0,<4.0)
       1: selecting questionary (1.10.0)
    PyPI: 3 packages found for beautifulsoup4 >=4.10.0,<5.0.0
       1: fact: beautifulsoup4 (4.11.1) depends on soupsieve (>1.2)
       1: selecting beautifulsoup4 (4.11.1)
       1: derived: soupsieve (>1.2)
    PyPI: 17 packages found for wcwidth *
       1: selecting wcwidth (0.2.5)
    PyPI: 3 packages found for pluggy >=0.12,<1.0
       1: selecting pluggy (0.13.1)
    PyPI: 26 packages found for more-itertools >=4.0.0
       1: selecting more-itertools (8.13.0)
    PyPI: 13 packages found for attrs >=17.4.0
       1: selecting attrs (21.4.0)
    PyPI: 39 packages found for packaging *
       1: fact: packaging (21.3) depends on pyparsing (>=2.0.2,<3.0.5 || >3.0.5)
       1: selecting packaging (21.3)
       1: derived: pyparsing (>=2.0.2,!=3.0.5)
    PyPI: No release information found for py-0.8.0-alpha2, skipping
    PyPI: No release information found for py-0.9.0, skipping
    PyPI: No release information found for py-1.4.32.dev1, skipping
    PyPI: 12 packages found for py >=1.5.0
       1: selecting py (1.11.0)
    PyPI: 1 packages found for httpcore >=0.15.0,<0.16.0
       1: fact: httpcore (0.15.0) depends on h11 (>=0.11,<0.13)
       1: fact: httpcore (0.15.0) depends on sniffio (>=1.0.0,<2.0.0)
       1: fact: httpcore (0.15.0) depends on anyio (>=3.0.0,<4.0.0)
       1: fact: httpcore (0.15.0) depends on certifi (*)
       1: selecting httpcore (0.15.0)
       1: derived: anyio (>=3.0.0,<4.0.0)
       1: derived: sniffio (>=1.0.0,<2.0.0)
       1: derived: h11 (>=0.11,<0.13)
    PyPI: No release information found for rfc3986-0.0.0, skipping
    PyPI: 5 packages found for rfc3986 >=1.3,<2
       1: fact: rfc3986 (1.5.0) depends on rfc3986 (1.5.0)
       1: fact: rfc3986 (1.5.0) depends on idna (*)
       1: selecting rfc3986[idna2008] (1.5.0)
       1: derived: idna
       1: derived: rfc3986 (==1.5.0)
    PyPI: 3 packages found for sniffio >=1.0.0,<2.0.0
       1: selecting sniffio (1.2.0)
    PyPI: No release information found for certifi-0, skipping
    PyPI: 48 packages found for certifi *
       1: selecting certifi (2022.6.15)
    PyPI: 1 packages found for jarowinkler >=1.2.0,<2.0.0
       1: selecting jarowinkler (1.2.0)
    PyPI: 11 packages found for click >=7.1.1,<9.0.0
       1: fact: click (8.1.3) depends on colorama (*)
       1: selecting click (8.1.3)
    PyPI: 29 packages found for soupsieve >1.2
       1: selecting soupsieve (2.3.2.post1)
    PyPI: No release information found for pyparsing-1.1.2, skipping
    PyPI: No release information found for pyparsing-1.2, skipping
    PyPI: No release information found for pyparsing-1.3.3, skipping
    PyPI: 39 packages found for pyparsing >=2.0.2,<3.0.5 || >3.0.5
       1: selecting pyparsing (3.0.9)
    PyPI: 14 packages found for anyio >=3.0.0,<4.0.0
       1: fact: anyio (3.6.1) depends on idna (>=2.8)
       1: fact: anyio (3.6.1) depends on sniffio (>=1.1)
       1: selecting anyio (3.6.1)
       1: derived: idna (>=2.8)
    PyPI: No release information found for h11-0.0.1, skipping
    PyPI: 2 packages found for h11 >=0.11,<0.13
       1: selecting h11 (0.12.0)
    PyPI: No release information found for rfc3986-0.0.0, skipping
    PyPI: 1 packages found for rfc3986 1.5.0
       1: selecting rfc3986 (1.5.0)
    PyPI: 3 packages found for colorama >=0.4.3,<0.5.0
       1: selecting colorama (0.4.5)
    PyPI: 8 packages found for atomicwrites >=1.0
       1: selecting atomicwrites (1.4.1)
    PyPI: 4 packages found for shellingham >=1.3.0,<2.0.0
       1: selecting shellingham (1.4.0)
    PyPI: 25 packages found for rich >=10.11.0,<13.0.0
       1: fact: rich (12.5.1) depends on typing-extensions (>=4.0.0,<5.0)
       1: fact: rich (12.5.1) depends on pygments (>=2.6.0,<3.0.0)
       1: fact: rich (12.5.1) depends on commonmark (>=0.9.0,<0.10.0)
       1: selecting rich (12.5.1)
       1: derived: commonmark (>=0.9.0,<0.10.0)
       1: derived: pygments (>=2.6.0,<3.0.0)
       1: derived: typing-extensions (>=4.0.0,<5.0)
    PyPI: 2 packages found for commonmark >=0.9.0,<0.10.0
       1: selecting commonmark (0.9.1)
    PyPI: 15 packages found for pygments >=2.6.0,<3.0.0
       1: selecting pygments (2.12.0)
    PyPI: 1 packages found for typer 0.5.0
       1: fact: typer (0.5.0) depends on click (>=7.1.1,<9.0.0)
       1: selecting typer (0.5.0)
    PyPI: No release information found for idna-0.1, skipping
    PyPI: 7 packages found for idna >=2.8
       1: selecting idna (3.3)
    PyPI: 6 packages found for typing-extensions >=4.0.0,<5.0
       1: selecting typing-extensions (4.3.0)
       1: Version solving took 0.249 seconds.
       1: Tried 1 solutions.
    
    Writing lock file
    
    Finding the necessary packages for the current system
    
    Package operations: 1 install, 0 updates, 0 removals
    
      • Installing rapidfuzz (2.3.0): Pending...
      • Installing rapidfuzz (2.3.0): Failed
    
      RuntimeError
    
      Unable to find installation candidates for rapidfuzz (2.3.0)
    
      at ~/.local/lib/python3.8/site-packages/poetry/installation/chooser.py:72 in choose_for
           68│ 
           69│             links.append(link)
           70│ 
           71│         if not links:
        →  72│             raise RuntimeError(
           73│                 "Unable to find installation candidates for {}".format(package)
           74│             )
           75│ 
           76│         # Get the best link
    
    ~/Documents/Projects/libretrofuzz$ 
    

    If I remove the ^ from rapidfuzz = "^2.0.8", it suddenly starts working, so it seems like one of the edits to the pyproject.toml (or something similar) broke the build on PyPI.

    opened by i30817 9
  • Can't pip install using PyPy on Windows 10

    Error output below:

    × Building wheel for rapidfuzz (pyproject.toml) did not run successfully.
      │ exit code: 1
      ╰─> [1852 lines of output]
          Not searching for unused variables given on the command line.
          -- The C compiler identification is MSVC 19.32.31332.0
          -- Detecting C compiler ABI info
          -- Detecting C compiler ABI info - done
          -- Check for working C compiler: C:/Program Files (x86)/Microsoft Visual Studio/2022/BuildTools/VC/Tools/MSVC/14.32.31326/bin/Hostx86/x64/cl.exe - skipped
          -- Detecting C compile features
          -- Detecting C compile features - done
          -- The CXX compiler identification is MSVC 19.32.31332.0
          CMake Warning (dev) at C:/Users/Kaman/AppData/Local/Temp/pip-build-env-dk8raibg/overlay/Lib/site-packages/cmake/data/share/cmake-3.22/Modules/CMakeDetermineCXXCompiler.cmake:162 (if):
            Policy CMP0054 is not set: Only interpret if() arguments as variables or
            keywords when unquoted.  Run "cmake --help-policy CMP0054" for policy
            details.  Use the cmake_policy command to set the policy and suppress this
            warning.
    
            Quoted variables like "MSVC" will no longer be dereferenced when the policy
            is set to NEW.  Since the policy is not set the OLD behavior will be used.
          Call Stack (most recent call first):
            CMakeLists.txt:4 (ENABLE_LANGUAGE)
          This warning is for project developers.  Use -Wno-dev to suppress it.
    
          CMake Warning (dev) at C:/Users/Kaman/AppData/Local/Temp/pip-build-env-dk8raibg/overlay/Lib/site-packages/cmake/data/share/cmake-3.22/Modules/CMakeDetermineCXXCompiler.cmake:183 (elseif):
            Policy CMP0054 is not set: Only interpret if() arguments as variables or
            keywords when unquoted.  Run "cmake --help-policy CMP0054" for policy
            details.  Use the cmake_policy command to set the policy and suppress this
            warning.
    
            Quoted variables like "MSVC" will no longer be dereferenced when the policy
            is set to NEW.  Since the policy is not set the OLD behavior will be used.
          Call Stack (most recent call first):
            CMakeLists.txt:4 (ENABLE_LANGUAGE)
          This warning is for project developers.  Use -Wno-dev to suppress it.
    
          -- Detecting CXX compiler ABI info
          -- Detecting CXX compiler ABI info - done
          -- Check for working CXX compiler: C:/Program Files (x86)/Microsoft Visual Studio/2022/BuildTools/VC/Tools/MSVC/14.32.31326/bin/Hostx86/x64/cl.exe - skipped
          -- Detecting CXX compile features
          -- Detecting CXX compile features - done
          -- Configuring done
          -- Generating done
          -- Build files have been written to: C:/Users/Kaman/AppData/Local/Temp/pip-install-97kw0l1e/rapidfuzz_1509194bbaaf44eeb188dd098187530c/_cmake_test_compile/build
          -- The C compiler identification is MSVC 19.32.31332.0
          -- The CXX compiler identification is MSVC 19.32.31332.0
          -- Detecting C compiler ABI info
          -- Detecting C compiler ABI info - done
          -- Check for working C compiler: C:/Program Files (x86)/Microsoft Visual Studio/2022/BuildTools/VC/Tools/MSVC/14.32.31326/bin/Hostx86/x64/cl.exe - skipped
          -- Detecting C compile features
          -- Detecting C compile features - done
          -- Detecting CXX compiler ABI info
          -- Detecting CXX compiler ABI info - done
          -- Check for working CXX compiler: C:/Program Files (x86)/Microsoft Visual Studio/2022/BuildTools/VC/Tools/MSVC/14.32.31326/bin/Hostx86/x64/cl.exe - skipped
          -- Detecting CXX compile features
          -- Detecting CXX compile features - done
          -- Found PythonInterp: C:/pypy/pypy.exe (found version "3.9.10")
          -- Could NOT find PythonLibs (missing: PYTHON_LIBRARIES) (found version "3.9.10")
          -- Found Python: C:/pypy/pypy.exe (found version "3.9.10") found components: Interpreter Development Development.Module Development.Embed
          Using packaged version of Taskflow
          -- CMAKE_ROOT: C:/Users/Kaman/AppData/Local/Temp/pip-build-env-dk8raibg/overlay/Lib/site-packages/cmake/data/share/cmake-3.22
          -- Looking for a CUDA compiler
          -- Looking for a CUDA compiler - NOTFOUND
          -- CMAKE_HOST_SYSTEM: Windows-10.0.19044
          -- CMAKE_BUILD_TYPE: Release
          -- CMAKE_CXX_COMPILER: C:/Program Files (x86)/Microsoft Visual Studio/2022/BuildTools/VC/Tools/MSVC/14.32.31326/bin/Hostx86/x64/cl.exe
          -- CMAKE_CXX_COMPILER_ID: MSVC
          -- CMAKE_CXX_COMPILER_VERSION: 19.32.31332.0
          -- CMAKE_CXX_FLAGS: /DWIN32 /D_WINDOWS /W3 /GR /EHsc
          -- CMAKE_CUDA_COMPILER: NOTFOUND
          -- CMAKE_CUDA_COMPILER_ID:
          -- CMAKE_CUDA_COMPILER_VERSION:
          -- CMAKE_CUDA_FLAGS:
          -- CMAKE_MODULE_PATH: C:/Users/Kaman/AppData/Local/Temp/pip-build-env-dk8raibg/overlay/Lib/site-packages/skbuild/resources/cmake
          -- CMAKE_CURRENT_SOURCE_DIR: C:/Users/Kaman/AppData/Local/Temp/pip-install-97kw0l1e/rapidfuzz_1509194bbaaf44eeb188dd098187530c/extern/taskflow
          -- CMAKE_CURRENT_BINARY_DIR: C:/Users/Kaman/AppData/Local/Temp/pip-install-97kw0l1e/rapidfuzz_1509194bbaaf44eeb188dd098187530c/_skbuild/win-amd64-3.9/cmake-build/extern/taskflow
          -- CMAKE_EXE_LINKER_FLAGS: /machine:x64
          -- CMAKE_INSTALL_PREFIX: C:/Users/Kaman/AppData/Local/Temp/pip-install-97kw0l1e/rapidfuzz_1509194bbaaf44eeb188dd098187530c/_skbuild/win-amd64-3.9/cmake-install
          -- CMAKE_MODULE_PATH: C:/Users/Kaman/AppData/Local/Temp/pip-build-env-dk8raibg/overlay/Lib/site-packages/skbuild/resources/cmake
          -- CMAKE_PREFIX_PATH:
          -- PROJECT_NAME: Taskflow
          -- TF_BUILD_BENCHMARKS: OFF
          -- TF_BUILD_CUDA: OFF
          -- TF_BUILD_TESTS: OFF
          -- TF_BUILD_EXAMPLES: OFF
          -- TF_INC_INSTALL_DIR: C:/Users/Kaman/AppData/Local/Temp/pip-install-97kw0l1e/rapidfuzz_1509194bbaaf44eeb188dd098187530c/_skbuild/win-amd64-3.9/cmake-install/include
          -- TF_LIB_INSTALL_DIR: C:/Users/Kaman/AppData/Local/Temp/pip-install-97kw0l1e/rapidfuzz_1509194bbaaf44eeb188dd098187530c/_skbuild/win-amd64-3.9/cmake-install/lib
          -- TF_UTEST_DIR: C:/Users/Kaman/AppData/Local/Temp/pip-install-97kw0l1e/rapidfuzz_1509194bbaaf44eeb188dd098187530c/extern/taskflow/unittests
          -- TF_EXAMPLE_DIR: C:/Users/Kaman/AppData/Local/Temp/pip-install-97kw0l1e/rapidfuzz_1509194bbaaf44eeb188dd098187530c/extern/taskflow/examples
          -- TF_BENCHMARK_DIR: C:/Users/Kaman/AppData/Local/Temp/pip-install-97kw0l1e/rapidfuzz_1509194bbaaf44eeb188dd098187530c/extern/taskflow/benchmarks
          -- TF_3RD_PARTY_DIR: C:/Users/Kaman/AppData/Local/Temp/pip-install-97kw0l1e/rapidfuzz_1509194bbaaf44eeb188dd098187530c/extern/taskflow/3rd-party
          -- Looking for pthread.h
          -- Looking for pthread.h - not found
          -- Found Threads: TRUE
          Using packaged version of rapidfuzz-cpp
          Using packaged version of jaro_winkler
          -- Performing Test Weak Link MODULE -> SHARED (gnu_ld_ignore) - Failed
          -- Performing Test Weak Link MODULE -> SHARED (osx_dynamic_lookup) - Failed
          -- Performing Test Weak Link MODULE -> SHARED (no_flag) - Failed
          _modinit_prefix:PyInit_
          _modinit_prefix:PyInit_
          _modinit_prefix:PyInit_
          _modinit_prefix:PyInit_
          _modinit_prefix:PyInit_
          _modinit_prefix:PyInit_
          _modinit_prefix:PyInit_
          _modinit_prefix:PyInit_
          _modinit_prefix:PyInit_
          _modinit_prefix:PyInit_
          -- Configuring done
          -- Generating done
          CMake Warning:
            Manually-specified variables were not used by the project:
    
              PYTHON_NumPy_INCLUDE_DIRS
              Python3_EXECUTABLE
              Python3_INCLUDE_DIR
              Python3_LIBRARY
              Python3_NumPy_INCLUDE_DIRS
              Python_NumPy_INCLUDE_DIRS
              SKBUILD
    
    
          -- Build files have been written to: C:/Users/Kaman/AppData/Local/Temp/pip-install-97kw0l1e/rapidfuzz_1509194bbaaf44eeb188dd098187530c/_skbuild/win-amd64-3.9/cmake-build
          [1/22] Building CXX object rapidfuzz\CMakeFiles\cpp_utils.dir\utils.cpp.obj
          cl : Command line warning D9025 : overriding '/W3' with '/W4'
          [2/22] Building CXX object rapidfuzz\CMakeFiles\cpp_utils.dir\cpp_utils.cxx.obj
          cl : Command line warning D9025 : overriding '/W3' with '/W4'
          C:\Users\Kaman\AppData\Local\Temp\pip-install-97kw0l1e\rapidfuzz_1509194bbaaf44eeb188dd098187530c\rapidfuzz\cpp_utils.cxx(2519): warning C4100: '__pyx_self': unreferenced formal parameter
          C:\Users\Kaman\AppData\Local\Temp\pip-install-97kw0l1e\rapidfuzz_1509194bbaaf44eeb188dd098187530c\rapidfuzz\cpp_utils.cxx(3387): warning C4127: conditional expression is constant
          C:\Users\Kaman\AppData\Local\Temp\pip-install-97kw0l1e\rapidfuzz_1509194bbaaf44eeb188dd098187530c\rapidfuzz\cpp_utils.cxx(3399): warning C4127: conditional expression is constant
          C:\Users\Kaman\AppData\Local\Temp\pip-install-97kw0l1e\rapidfuzz_1509194bbaaf44eeb188dd098187530c\rapidfuzz\cpp_utils.cxx(3767): warning C4127: conditional expression is constant
          C:\Users\Kaman\AppData\Local\Temp\pip-install-97kw0l1e\rapidfuzz_1509194bbaaf44eeb188dd098187530c\rapidfuzz\cpp_utils.cxx(3779): warning C4127: conditional expression is constant
          C:\Users\Kaman\AppData\Local\Temp\pip-install-97kw0l1e\rapidfuzz_1509194bbaaf44eeb188dd098187530c\rapidfuzz\cpp_utils.cxx(4869): warning C4127: conditional expression is constant
          C:\Users\Kaman\AppData\Local\Temp\pip-install-97kw0l1e\rapidfuzz_1509194bbaaf44eeb188dd098187530c\rapidfuzz\cpp_utils.cxx(4911): warning C4127: conditional expression is constant
          C:\Users\Kaman\AppData\Local\Temp\pip-install-97kw0l1e\rapidfuzz_1509194bbaaf44eeb188dd098187530c\rapidfuzz\cpp_utils.cxx(6373): warning C4100: 'boundscheck': unreferenced formal parameter
          C:\Users\Kaman\AppData\Local\Temp\pip-install-97kw0l1e\rapidfuzz_1509194bbaaf44eeb188dd098187530c\rapidfuzz\cpp_utils.cxx(6372): warning C4100: 'wraparound': unreferenced formal parameter
          C:\Users\Kaman\AppData\Local\Temp\pip-install-97kw0l1e\rapidfuzz_1509194bbaaf44eeb188dd098187530c\rapidfuzz\cpp_utils.cxx(6391): warning C4100: 'boundscheck': unreferenced formal parameter
          C:\Users\Kaman\AppData\Local\Temp\pip-install-97kw0l1e\rapidfuzz_1509194bbaaf44eeb188dd098187530c\rapidfuzz\cpp_utils.cxx(6390): warning C4100: 'wraparound': unreferenced formal parameter
          C:\Users\Kaman\AppData\Local\Temp\pip-install-97kw0l1e\rapidfuzz_1509194bbaaf44eeb188dd098187530c\rapidfuzz\cpp_utils.cxx(6409): warning C4100: 'boundscheck': unreferenced formal parameter
          C:\Users\Kaman\AppData\Local\Temp\pip-install-97kw0l1e\rapidfuzz_1509194bbaaf44eeb188dd098187530c\rapidfuzz\cpp_utils.cxx(6408): warning C4100: 'wraparound': unreferenced formal parameter
          C:\Users\Kaman\AppData\Local\Temp\pip-install-97kw0l1e\rapidfuzz_1509194bbaaf44eeb188dd098187530c\rapidfuzz\cpp_utils.cxx(8350): warning C4100: 'tstate': unreferenced formal parameter
          C:\pypy\Include\pypy_decl.h(24): warning C4505: 'PySlice_GetIndicesEx': unreferenced function with internal linkage has been removed
          C:\Users\Kaman\AppData\Local\Temp\pip-install-97kw0l1e\rapidfuzz_1509194bbaaf44eeb188dd098187530c\rapidfuzz\cpp_utils.cxx(3760) : warning C4702: unreachable code
          C:\Users\Kaman\AppData\Local\Temp\pip-install-97kw0l1e\rapidfuzz_1509194bbaaf44eeb188dd098187530c\rapidfuzz\cpp_utils.cxx(3380) : warning C4702: unreachable code
          [3/22] Linking CXX shared module rapidfuzz\cpp_utils.pypy39-pp73-win_amd64.pyd
          FAILED: rapidfuzz/cpp_utils.pypy39-pp73-win_amd64.pyd
          cmd.exe /C "cd . && C:\Users\Kaman\AppData\Local\Temp\pip-build-env-dk8raibg\overlay\Lib\site-packages\cmake\data\bin\cmake.exe -E vs_link_dll --intdir=rapidfuzz\CMakeFiles\cpp_utils.dir --rc=C:\PROGRA~2\WI3CF2~1\10\bin\100190~1.0\x86\rc.exe --mt=C:\PROGRA~2\WI3CF2~1\10\bin\100190~1.0\x86\mt.exe --manifests  -- C:\PROGRA~2\MICROS~2\2022\BUILDT~1\VC\Tools\MSVC\1432~1.313\bin\Hostx86\x64\link.exe /nologo rapidfuzz\CMakeFiles\cpp_utils.dir\cpp_utils.cxx.obj rapidfuzz\CMakeFiles\cpp_utils.dir\utils.cpp.obj  /out:rapidfuzz\cpp_utils.pypy39-pp73-win_amd64.pyd /implib:rapidfuzz\cpp_utils.lib /pdb:rapidfuzz\cpp_utils.pdb /dll /version:0.0 /machine:x64 /INCREMENTAL:NO /EXPORT:PyInit_cpp_utils  kernel32.lib user32.lib gdi32.lib winspool.lib shell32.lib ole32.lib oleaut32.lib uuid.lib comdlg32.lib advapi32.lib  && cd ."
          LINK: command "C:\PROGRA~2\MICROS~2\2022\BUILDT~1\VC\Tools\MSVC\1432~1.313\bin\Hostx86\x64\link.exe /nologo rapidfuzz\CMakeFiles\cpp_utils.dir\cpp_utils.cxx.obj rapidfuzz\CMakeFiles\cpp_utils.dir\utils.cpp.obj /out:rapidfuzz\cpp_utils.pypy39-pp73-win_amd64.pyd /implib:rapidfuzz\cpp_utils.lib /pdb:rapidfuzz\cpp_utils.pdb /dll /version:0.0 /machine:x64 /INCREMENTAL:NO /EXPORT:PyInit_cpp_utils kernel32.lib user32.lib gdi32.lib winspool.lib shell32.lib ole32.lib oleaut32.lib uuid.lib comdlg32.lib advapi32.lib /MANIFEST /MANIFESTFILE:rapidfuzz\cpp_utils.pypy39-pp73-win_amd64.pyd.manifest" failed (exit code 1104) with the following output:
          LINK : fatal error LNK1104: cannot open file 'python39.lib'
          [4/22] Building CXX object rapidfuzz\distance\CMakeFiles\_initialize.dir\_initialize.cxx.obj
          cl : Command line warning D9025 : overriding '/W3' with '/W4'
    
    bug 
    opened by TingTingin 9
  • Add BK Tree implementation

    It would make sense to add a BK tree implementation for scorers that fulfill the triangle inequality. This would provide massive performance improvements for similarity searches, since whole subtrees of candidates can be skipped without being scored.

    https://dl.acm.org/doi/10.1145/362003.362025
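    A minimal pure-Python sketch of the idea, assuming a metric distance (here a plain dynamic-programming Levenshtein; `rapidfuzz.distance.Levenshtein.distance` could presumably be swapped in). The `BKTree` class and its methods are hypothetical, not RapidFuzz API. Every node in the subtree under edge `e` is at distance `e` from its parent, so by the triangle inequality only edges in `[d - k, d + k]` can hold matches within distance `k` of the query:

```python
from typing import Callable, Dict, List, Optional, Tuple


def levenshtein(a: str, b: str) -> int:
    """Classic two-row dynamic-programming Levenshtein distance."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution
            ))
        previous = current
    return previous[-1]


class BKTree:
    """Burkhard-Keller tree; only valid for metrics (triangle inequality)."""

    def __init__(self, distance: Callable[[str, str], int] = levenshtein):
        self.distance = distance
        # A node is (word, {edge_distance: child_node})
        self.root: Optional[Tuple[str, Dict[int, tuple]]] = None

    def add(self, word: str) -> None:
        if self.root is None:
            self.root = (word, {})
            return
        node_word, children = self.root
        while True:
            d = self.distance(word, node_word)
            if d == 0:
                return  # word is already stored
            child = children.get(d)
            if child is None:
                children[d] = (word, {})
                return
            node_word, children = child

    def search(self, query: str, max_dist: int) -> List[Tuple[int, str]]:
        """Return sorted (distance, word) pairs with distance <= max_dist."""
        results = []
        stack = [self.root] if self.root is not None else []
        while stack:
            node_word, children = stack.pop()
            d = self.distance(query, node_word)
            if d <= max_dist:
                results.append((d, node_word))
            # Only subtrees whose edge lies in [d - max_dist, d + max_dist]
            # can contain matches; all others are pruned unscored.
            for edge, child in children.items():
                if d - max_dist <= edge <= d + max_dist:
                    stack.append(child)
        return sorted(results)
```

    With the words `book, books, cake, boo, cape, cart` inserted, `search("bool", 1)` only scores `book`, `books` and `boo`; the `cake` subtree is never visited.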

    enhancement performance 
    opened by maxbachmann 2
  • BUG: `None` can't work with `process.cdist`

    Passing None to process.cdist fails with a TypeError, while fuzz.ratio accepts None.

    >>> from rapidfuzz import process
    >>> process.cdist(
    ...     ["hello", "world"],
    ...     ["hi", None],
    ... )
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "C:\Software\miniforge3\envs\dtoolkit\lib\site-packages\rapidfuzz\process_cpp.py", line 73, in cdist
        _cdist(
      File "src/rapidfuzz/process_cpp_impl.pyx", line 1508, in rapidfuzz.process_cpp_impl.cdist
      File "src/rapidfuzz/process_cpp_impl.pyx", line 1393, in rapidfuzz.process_cpp_impl.cdist_two_lists
      File "src/rapidfuzz/process_cpp_impl.pyx", line 1321, in rapidfuzz.process_cpp_impl.preprocess
      File "./src/rapidfuzz/cpp_common.pxd", line 332, in cpp_common.conv_sequence
      File "./src/rapidfuzz/cpp_common.pxd", line 300, in cpp_common.hash_sequence
    TypeError: object of type 'NoneType' has no len()
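    Until None is supported, one possible workaround is a small wrapper that only scores valid pairs, assuming a pair with a missing value should simply score 0 (which is what fuzz.ratio returns when given None). The helper `cdist_skip_none` is hypothetical; it accepts any two-argument scorer such as `rapidfuzz.fuzz.ratio` and returns a nested list instead of the NumPy array that process.cdist returns. A difflib scorer stands in so the sketch runs without RapidFuzz:

```python
from difflib import SequenceMatcher
from typing import Callable, List, Optional, Sequence


def cdist_skip_none(
    queries: Sequence[Optional[str]],
    choices: Sequence[Optional[str]],
    scorer: Callable[[str, str], float],
) -> List[List[float]]:
    """Score every query/choice pair; any pair involving None scores 0."""
    matrix = []
    for query in queries:
        row = []
        for choice in choices:
            if query is None or choice is None:
                row.append(0.0)  # treat missing values as "no similarity"
            else:
                row.append(float(scorer(query, choice)))
        matrix.append(row)
    return matrix


def difflib_ratio(a: str, b: str) -> float:
    # Stand-in scorer (0-100 scale) so the sketch has no third-party deps.
    return SequenceMatcher(None, a, b).ratio() * 100
```

    This gives up cdist's native speed and its `workers` parallelism, so it only makes sense for small inputs.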
    
    enhancement 
    opened by Zeroto521 4
  • add SIMD support to more functions in the process module

    SIMD support is still missing for:

    • [ ] process.extractOne
    • [ ] process.extract
    • [ ] process.cdist when both sequences are similar.
    • [ ] process.extract_iter
    performance 
    opened by maxbachmann 0
  • add SIMD support for long sequences

    For sequences longer than 64 characters, it would still be possible to calculate the similarity for multiple sequences in parallel using SIMD. However, for very long sequences it might be faster to compare individual sequences, especially when a score_cutoff is specified.
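    For context on the 64-character threshold: bit-parallel Levenshtein algorithms (Myers, in the formulation by Hyyrö) store one bit per character of the first string, so strings of up to 64 characters fit into a single 64-bit word. The sketch below is a pure-Python illustration of that single-word case, assuming this is the kind of kernel the SIMD paths vectorize; Python integers are unbounded, so the word size is enforced explicitly and intermediate values are masked:

```python
def levenshtein_bitparallel(s1: str, s2: str) -> int:
    """Levenshtein distance via the bit-vector algorithm (one machine word)."""
    m = len(s1)
    if m == 0:
        return len(s2)
    if m > 64:
        raise ValueError("this single-word sketch only handles len(s1) <= 64")

    mask = (1 << m) - 1   # emulate an m-bit machine word
    last = 1 << (m - 1)   # bit that tracks the last row of the DP matrix

    # Pattern-match vectors: bit i is set where s1[i] == ch
    peq = {}
    for i, ch in enumerate(s1):
        peq[ch] = peq.get(ch, 0) | (1 << i)

    vp, vn = mask, 0      # vertical +1 / -1 deltas of the current column
    dist = m
    for ch in s2:
        x = peq.get(ch, 0)
        d0 = ((((x & vp) + vp) ^ vp) | x | vn) & mask
        hp = (vn | ~(d0 | vp)) & mask   # horizontal +1 deltas
        hn = vp & d0                    # horizontal -1 deltas
        if hp & last:
            dist += 1
        if hn & last:
            dist -= 1
        hp = ((hp << 1) | 1) & mask
        hn = (hn << 1) & mask
        vp = (hn | ~(d0 | hp)) & mask
        vn = hp & d0
    return dist
```

    Each character of `s2` costs a constant number of word operations, which is why packing several short strings into the lanes of one SIMD register is attractive, and why strings above the word size need multi-word handling.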

    performance 
    opened by maxbachmann 0
Releases(v2.13.7)
  • v2.13.7(Dec 20, 2022)

  • v2.13.6(Dec 11, 2022)

  • v2.13.5(Dec 10, 2022)

  • v2.13.4(Dec 8, 2022)

    Changed

    • handle float("nan") like None for query / choice, since NaN is commonly used for missing data in tools like NumPy
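    NaN compares unequal to everything, including itself, so it is not caught by an `is None` check or by comparing against `float("nan")`. A hedged sketch of the kind of check this implies, using `math.isnan`; the helper `is_missing` is hypothetical, not RapidFuzz API:

```python
import math
from typing import Any


def is_missing(value: Any) -> bool:
    """True for None and any float NaN (np.nan is a plain Python float)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


data = ["apple", None, float("nan"), "pear"]
present = [x for x in data if not is_missing(x)]
```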

    Fixed

    • fix handling of None/float("nan") in process.distance
    • use absolute imports inside tests
  • v2.13.3(Dec 3, 2022)

    Fixed

    • improve handling of functions wrapped using functools.wraps
    • fix broken fallback to the Python implementation when an ImportError occurs on import. This can occur, e.g., when the binary depends on libatomic but it is unavailable on the system
    • define CMAKE_C_COMPILER_AR/CMAKE_CXX_COMPILER_AR/CMAKE_C_COMPILER_RANLIB/CMAKE_CXX_COMPILER_RANLIB if they are not defined yet
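    The general pattern behind such a fallback is to try the compiled module first and catch ImportError, which covers both a missing extension and an extension whose shared-library dependencies (such as libatomic) fail to load. A minimal sketch with hypothetical module names; RapidFuzz's actual import logic may be structured differently:

```python
import importlib
from types import ModuleType


def import_with_fallback(fast_name: str, fallback_name: str) -> ModuleType:
    """Import the compiled module if possible, else the pure Python one."""
    try:
        return importlib.import_module(fast_name)
    except ImportError:
        # Raised when the extension is absent, and also when the dynamic
        # loader cannot resolve one of its shared-library dependencies.
        return importlib.import_module(fallback_name)
```

    For example, `import_with_fallback("hypothetical_fast_json", "json")` returns the standard `json` module when the first name cannot be imported.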
  • v2.13.2(Nov 5, 2022)

    Fixed

    • fix incorrect results in Hamming.normalized_similarity
    • fix incorrect score_cutoff handling in the pure Python implementation of Postfix.normalized_distance and Prefix.normalized_distance
    • fix Levenshtein.normalized_similarity and Levenshtein.normalized_distance when used in combination with the process module
    • fuzz.partial_ratio was not always symmetric when len(s1) == len(s2)
  • v2.13.1(Nov 3, 2022)

    Fixed

    • fix bug in normalized_similarity of most scorers, leading to incorrect results when used in combination with the process module
    • fix SSE2 support
    • fix bug in JaroWinkler and Jaro when used in the pure Python process module
    • forward kwargs in pure Python implementation of process.extract
  • v2.13.0(Oct 29, 2022)

    Fixed

    • fix bug in Levenshtein.editops leading to crashes when used with score_hint

    Changed

    • moved capi from rapidfuzz_capi into rapidfuzz, since installing rapidfuzz now always succeeds thanks to the pure Python mode
    • add score_hint argument to process module
    • add score_hint argument to Levenshtein module
  • v2.12.0(Oct 24, 2022)

  • v2.11.1(Oct 3, 2022)

  • v2.11.0(Oct 2, 2022)

    Changes

    • move jarowinkler dependency into rapidfuzz to simplify maintenance

    Performance

    • add SIMD implementation for fuzz.ratio/fuzz.QRatio/Levenshtein/Indel/LCSseq/OSA to improve performance for short strings in cdist
  • v2.10.3(Sep 30, 2022)

    Fixed

    • use scikit-build=0.14.1 on Linux, since scikit-build=0.15.0 fails to find the Python Interpreter
    • work around a gcc bug in template type deduction
  • v2.10.2(Sep 27, 2022)

  • v2.10.1(Sep 25, 2022)

  • v2.10.0(Sep 18, 2022)

  • v2.9.0(Sep 16, 2022)

  • v2.8.0(Sep 11, 2022)

    Fixed

    • fuzz.partial_ratio did not find the optimal alignment in some edge cases (#219)
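    To illustrate what "optimal alignment" means: fuzz.partial_ratio searches for the part of the longer string that best matches the shorter one. Below is a naive reference sketch, assuming fuzz.ratio is the normalized Indel similarity 2 * LCS / (len1 + len2) * 100. It only slides full-length windows, while RapidFuzz may also consider shorter windows at the string ends, so it is for illustration only; the function names are hypothetical:

```python
def indel_ratio(a: str, b: str) -> float:
    """2 * LCS(a, b) / (len(a) + len(b)) * 100 (normalized Indel similarity)."""
    if not a and not b:
        return 100.0  # assumption: two empty strings count as identical
    prev = [0] * (len(b) + 1)  # LCS lengths for the previous row
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if ca == cb else max(prev[j], cur[j - 1]))
        prev = cur
    return 200.0 * prev[-1] / (len(a) + len(b))


def naive_partial_ratio(s1: str, s2: str) -> float:
    """Best indel_ratio of the shorter string against equal-length windows."""
    shorter, longer = (s1, s2) if len(s1) <= len(s2) else (s2, s1)
    if not shorter:
        return 0.0  # assumption: nothing to align
    window = len(shorter)
    return max(
        indel_ratio(shorter, longer[start:start + window])
        for start in range(len(longer) - window + 1)
    )
```

    Because the function always aligns the shorter string against the longer one, swapping the arguments cannot change the result.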

    Performance

    • improve performance of fuzz.partial_ratio

    Changed

    • increased minimum C++ version to C++17 (see #255)
  • v2.7.0(Sep 11, 2022)

    Performance

    • improve performance of Levenshtein.distance/Levenshtein.editops for long sequences.

    Added

    • add score_hint parameter to Levenshtein.editops which allows the use of a faster implementation

    Changed

    • all functions in the string_metric module now raise a deprecation warning. They are now only wrappers around their replacement functions, which makes them slower when used with the process module
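    A deprecated wrapper of this kind typically emits a DeprecationWarning and forwards to its replacement; presumably the extra Python call layer is what keeps the process module from dispatching straight to the native scorer. A minimal sketch with hypothetical function names (`old_levenshtein`, `new_levenshtein_distance`), not RapidFuzz's actual code:

```python
import warnings


def new_levenshtein_distance(s1: str, s2: str) -> int:
    """Hypothetical replacement function (two-row dynamic programming)."""
    prev = list(range(len(s2) + 1))
    for i, c1 in enumerate(s1, start=1):
        cur = [i]
        for j, c2 in enumerate(s2, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (c1 != c2)))
        prev = cur
    return prev[-1]


def old_levenshtein(s1: str, s2: str) -> int:
    """Deprecated alias that only warns and forwards to the replacement."""
    warnings.warn(
        "old_levenshtein is deprecated, use new_levenshtein_distance instead",
        DeprecationWarning,
        stacklevel=2,  # point the warning at the caller, not this wrapper
    )
    return new_levenshtein_distance(s1, s2)
```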
  • v2.6.1(Sep 4, 2022)

  • v2.6.0(Aug 20, 2022)

    Fixed

    • fix hashing for custom classes

    Added

    • add support for slicing in Editops.__getitem__/Editops.__delitem__
    • add DamerauLevenshtein module
  • v2.5.0(Aug 14, 2022)

    Added

    • added support for KeyboardInterrupt in the process module. It might still take a moment until the KeyboardInterrupt is registered, but it no longer runs all text comparisons after pressing Ctrl + C

    Fixed

    • fix default scorer used by cdist to use C++ implementation if possible
  • v2.4.4(Aug 12, 2022)

  • v2.4.3(Aug 8, 2022)

    Fixed

    • fix value range of jaro_similarity/jaro_winkler_similarity in the pure Python mode for the string_metric module
    • fix missing atomic symbol on 32-bit ARM
  • v2.4.2(Jul 30, 2022)

  • v2.4.1(Jul 29, 2022)

  • v2.4.0(Jul 29, 2022)

    Fixed

    • fix banded Levenshtein implementation

    Performance

    • improve performance and memory usage of Levenshtein.editops
      • memory usage is reduced from O(NM) to O(N)
      • performance is improved for long sequences
  • v2.3.0(Jul 22, 2022)

    Added

    • add as_matching_blocks to Editops/Opcodes
    • add support for deletions from Editops
    • add Editops.apply/Opcodes.apply
    • add Editops.remove_subsequence
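    RapidFuzz's Opcodes appear to use the same five-field layout as Python's difflib (tag, source start, source end, destination start, destination end), so the idea behind Opcodes.apply and as_matching_blocks can be illustrated with the standard library alone. A hedged sketch with hypothetical helpers `apply_opcodes` and `matching_blocks`; difflib's Ratcliff/Obershelp matching can produce different blocks than a Levenshtein alignment, and this `matching_blocks` omits the zero-length sentinel block that difflib appends:

```python
from difflib import SequenceMatcher
from typing import List, Tuple

Opcode = Tuple[str, int, int, int, int]


def apply_opcodes(opcodes: List[Opcode], source: str, dest: str) -> str:
    """Rebuild the destination string from the source plus opcodes."""
    parts = []
    for tag, i1, i2, j1, j2 in opcodes:
        if tag == "equal":
            parts.append(source[i1:i2])  # unchanged run, copied from source
        elif tag in ("replace", "insert"):
            parts.append(dest[j1:j2])    # new text is taken from dest
        # "delete": source[i1:i2] is simply dropped
    return "".join(parts)


def matching_blocks(opcodes: List[Opcode]) -> List[Tuple[int, int, int]]:
    """(source_start, dest_start, length) for every 'equal' opcode."""
    return [(i1, j1, i2 - i1) for tag, i1, i2, j1, _ in opcodes if tag == "equal"]


src, dst = "qabxcd", "abycdf"
ops = SequenceMatcher(None, src, dst).get_opcodes()
```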

    Changed

    • merge adjacent similar blocks in Opcodes

    Fixed

    • fix usage of eval(repr(Editop)), eval(repr(Editops)), eval(repr(Opcode)) and eval(repr(Opcodes))
    • fix opcode conversion for empty source sequence
    • fix validation for empty Opcode list passed into Opcodes.__init__
  • v2.2.0(Jul 19, 2022)

  • v2.1.4(Jul 17, 2022)

    Changed

    • changed internal implementation of cdist to remove the build dependency on NumPy

    Added

    • added wheels for musllinux and manylinux ppc64le, s390x
  • v2.1.3(Jul 9, 2022)

Retrying library for Python

Tenacity Tenacity is an Apache 2.0 licensed general-purpose retrying library, written in Python, to simplify the task of adding retry behavior to just

Julien Danjou 4.3k Jan 05, 2023
Two fast AUC calculation implementations for python

fastauc Two fast AUC calculation implementations for python: python-based is approximately 5X faster than the default sklearn.metrics.roc_auc_score()

Vsevolod Kompantsev 26 Dec 11, 2022
This is Cool Utility tools that you can use in python.

This is Cool Utility tools that you can use in python. There are a few tools that you might find very useful, you can use this on pretty much any project and some utils might help you a lot and save

Senarc Studios 6 Apr 18, 2022
This project is a set of programs that I use to create a README.md file.

This project is a set of programs that I use to create a README.md file.

Tom Dörr 223 Dec 24, 2022
Python module and its web equivalent, to hide text within text by manipulating bits

cacherdutexte.github.io This project contains : Python modules (binary and decimal system 6) with a dedicated tkinter program to use it. A web version

2 Sep 04, 2022
Grank is a feature-rich script that automatically grinds Dank Memer for you

Grank Inspired by this repository. This is a WIP and there will be more functions added in the future. What is Grank? Grank is a feature-rich script t

42 Jul 20, 2022
Build capture utility for Linux

CX-BUILD Compilation Database alternative Build Prerequisite the CXBUILD uses linux system call trace utility called strace which was customized. So I

GLaDOS (G? L? Automatic Debug Operation System) 3 Nov 03, 2022
NFT-Generator is the best way to generate thousands of NFTs quick and easily with Python.

NFT-Generator is the best way to generate thousands of NFTs quick and easily with Python. Just add your files, set your configuration and run the scri

78 Dec 27, 2022
Helpful functions for use alongside the rich Python library.

🔧 Rich Tools A python package with helpful functions for use alongside with the rich python library. 󠀠󠀠 The current features are: Convert a Pandas

Avi Perl 14 Oct 14, 2022
Simple integer-valued time series bit packing

Smahat allows you to encode a sequence of integer values using a fixed (for all values) number of bits, but minimal with regard to the data range. For example: for a series of boolean values only one bit

Ghiles Meddour 7 Aug 27, 2021
Tool for generating Memory.scan() compatible instruction search patterns

scanpat Tool for generating Frida Memory.scan() compatible instruction search patterns. Powered by r2. Examples $ ./scanpat.py arm.ks:64 'sub sp, sp,

Ole André Vadla Ravnås 13 Sep 19, 2022
A simple and easy to use Spam Bot made in Python!

This is a simple spam bot made in python. You can use it to spam anyone with anything on any platform.

7 Sep 08, 2022
Create a Web Component (a Custom Element) from a python file

wyc Create a Web Component (a Custom Element) from a python file (transpile python code to javascript (es2015)). Features Use python to define your cu

7 Oct 09, 2022
Password generator

Password generator technologies used What is? It is Password generator How to Download? Download on releases Clone repo git clone https://github.com/m

Miek 1 Nov 02, 2021
Nmap script to guess* a GitLab version.

gitlab-version-nse Nmap script to guess* a GitLab version. Usage https://github.com/righel/gitlab-version-nse cd gitlab-version-nse nmap target --s

Luciano Righetti 120 Dec 05, 2022
An online streamlit development platform

streamlit-playground An online streamlit development platform Run, Experiment and Play with streamlit Components Develop full-fledged apps online All

Akshansh Kumar 3 Nov 06, 2021
Python script to get some stats on nodes in a Blender material nodetree

Python script to get some stats on nodes in a Blender material nodetree. It counts the nodes, the node types and the max deep level for group nodes.

Alek Mugnozzo 2 Sep 03, 2022
A Python library for generating random strings, digits, special characters, or a combination of them

A Python library for generating random strings, digits, special characters, or a combination of them

Torham 4 Nov 15, 2022
one_click_kag_server is a program which tries to fully automate the creation of a King Arthur's Gold server.

one_click_kag_server is a program which tries to fully automate the creation of a King Arthur's Gold server.

Benjamin Gorman 4 Jan 05, 2022
Tool to produce system call tables from Linux source code.

Syscalls Tool to generate system call tables from the linux source tree. Example The following will produce a markdown (.md) file containing the table

7 Jul 30, 2022