Fuzzy String Matching in Python

Overview

FuzzyWuzzy

Fuzzy string matching like a boss. It uses Levenshtein Distance to calculate the differences between sequences in a simple-to-use package.
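
For intuition, the "simple ratio" shown under Usage below is essentially difflib's SequenceMatcher similarity scaled to 0-100. A rough sketch of the idea (not the library's exact internals; with python-Levenshtein installed the underlying matcher differs slightly):

    # Rough sketch: ratio = proportion of matching characters, scaled to 0-100.
    from difflib import SequenceMatcher

    def simple_ratio(s1, s2):
        return int(round(100 * SequenceMatcher(None, s1, s2).ratio()))

    print(simple_ratio("this is a test", "this is a test!"))  # 97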

Requirements

For testing

  • pycodestyle
  • hypothesis
  • pytest

Installation

Using PIP via PyPI

pip install fuzzywuzzy

or the following to install python-Levenshtein too

pip install fuzzywuzzy[speedup]

Using PIP via Github

pip install git+git://github.com/seatgeek/[email protected]#egg=fuzzywuzzy

Adding to your requirements.txt file (run pip install -r requirements.txt afterwards)

git+ssh://[email protected]/seatgeek/[email protected]#egg=fuzzywuzzy

Manually via GIT

git clone git://github.com/seatgeek/fuzzywuzzy.git fuzzywuzzy
cd fuzzywuzzy
python setup.py install

Usage

>>> from fuzzywuzzy import fuzz
>>> from fuzzywuzzy import process

Simple Ratio

>>> fuzz.ratio("this is a test", "this is a test!")
    97

Partial Ratio

>>> fuzz.partial_ratio("this is a test", "this is a test!")
    100

Token Sort Ratio

>>> fuzz.ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear")
    91
>>> fuzz.token_sort_ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear")
    100

Token Set Ratio

>>> fuzz.token_sort_ratio("fuzzy was a bear", "fuzzy fuzzy was a bear")
    84
>>> fuzz.token_set_ratio("fuzzy was a bear", "fuzzy fuzzy was a bear")
    100

Process

>>> choices = ["Atlanta Falcons", "New York Jets", "New York Giants", "Dallas Cowboys"]
>>> process.extract("new york jets", choices, limit=2)
    [('New York Jets', 100), ('New York Giants', 78)]
>>> process.extractOne("cowboys", choices)
    ("Dallas Cowboys", 90)

You can also pass additional parameters to the extractOne method to make it use a specific scorer. A typical use case is matching file paths:

>>> process.extractOne("System of a down - Hypnotize - Heroin", songs)
    ('/music/library/good/System of a Down/2005 - Hypnotize/01 - Attack.mp3', 86)
>>> process.extractOne("System of a down - Hypnotize - Heroin", songs, scorer=fuzz.token_sort_ratio)
    ("/music/library/good/System of a Down/2005 - Hypnotize/10 - She's Like Heroin.mp3", 61)

Known Ports

FuzzyWuzzy is being ported to other languages too! Here are a few ports we know about:

Comments
  • Incompatible License

    I see you've got issue #113 closed; however, a simple reading of the GPL linked in the StringMatcher file doesn't just imply, but in fact definitively states, that your project must be licensed as GPL.

    1. You may copy and distribute verbatim copies of the Program's source code as you receive it, in any medium, provided that you conspicuously and appropriately publish on each copy an appropriate copyright notice and disclaimer of warranty; keep intact all the notices that refer to this License and to the absence of any warranty; and give any other recipients of the Program a copy of this License along with the Program.

    You have not copied the entire source verbatim, you've copied a select portion and incorporated it into a further derived work.

    2. You may modify your copy or copies of the Program or any portion of it, thus forming a work based on the Program, and copy and distribute such modifications or work under the terms of Section 1 above, provided that you also meet all of these conditions:

    a) You must cause the modified files to carry prominent notices stating that you changed the files and the date of any change.

    b) You must cause any work that you distribute or publish, that in whole or in part contains or is derived from the Program or any part thereof, to be licensed as a whole at no charge to all third parties under the terms of this License.

    Emphasis added. When you decided to copy in a portion of the code from python-Levenshtein you incidentally selected the GPL license for your project; it has apparently been licensed as MIT / X11, but effectively and legally licensed as GPL, ever since.

    I say this as someone who has a company project that's now been copy-lefted by your code, albeit accidentally and inadvertently.

    opened by yawpitch 37
  • List supported python versions, bump minor

    Now that #33 has been merged in, fuzzywuzzy looks good to go for python 3 compatibility. Might as well advertise this in the setup.py / on pypi. Version bumped so a new version can be pushed to pypi.

    Cheers!

    opened by JeffPaine 14
  • Remove query processing and default processor

    Added test case to check that using a processor of the form lambda x: x['key'] doesn't fail.

    Set default_processor to None and do not run processor on query.

    Remove test case that checked if string reduced to 0 by processor, as string will no longer be processed.

    Adjusted test cases in test_fuzzywuzzy_hypothesis to not use a processor, but not fail if scorer reduces query to empty string. If the user supplies a processor that modifies the choice so that it is no longer an exact match for the query, not finding an exact match would be the expected behavior.

    Saw some relevant discussion around issues #77 and #141, etc. If the processor doesn't run on the query, which I feel it CANNOT, then the processor must also default to None to avoid unexpected behaviors.

    (also have a separate branch that only adds the new test case fwiw)
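
    (A minimal sketch of the dict-record case discussed above, using hypothetical data; on releases that still run the processor on the query, the plain-string form is exactly what breaks:)

    from fuzzywuzzy import fuzz, process

    records = [{'key': 'New York Jets'}, {'key': 'Dallas Cowboys'}]

    # Fine: the query has the same shape as the choices, so the processor
    # applies cleanly to both.
    process.extractOne({'key': 'cowboys'}, records,
                       processor=lambda x: x['key'], scorer=fuzz.ratio)

    # Breaks (e.g. TypeError) while the processor is also applied to the query:
    # process.extractOne('cowboys', records,
    #                    processor=lambda x: x['key'], scorer=fuzz.ratio)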

    opened by nol13 13
  • UserWarning: Using slow pure-python SequenceMatcher

    C:\Python27\lib\site-packages\fuzzywuzzy\fuzz.py:33: UserWarning: Using slow pure-python SequenceMatcher
      warnings.warn('Using slow pure-python SequenceMatcher')
    

    The line of code I have for importing is from fuzzywuzzy import fuzz.

    I'm running Python 2.7.8 on Windows 8, pip 1.5.6, and fuzzywuzzy 0.4.0.

    opened by sylvia43 13
  • Upload latest version to pypi

    Could we kindly upload the latest version to pypi (directions)?

    I'd like to use this library on a python 3 project where using pip install would be greatly appreciated :smile: Cheers!

    opened by JeffPaine 13
  • ImportError: cannot import name fuzz

    Below is more information.

    $> sudo pip install fuzzywuzzy   #works, no error
    
    $> vi mytest.py
    from fuzzywuzzy import fuzz
    from fuzzywuzzy import process
    
    $>python mytest.py
    ...
    ImportError: cannot import name fuzz
    
    opened by harishvc 12
  • Fuzzy install via pip in Conda environment,  python-Levenshtein warning

    Hi...

    I'm new to Conda, and just re-installed fuzzywuzzy using Conda's version of pip. python-Levenshtein is also installed, according to conda.

    I'm using Python 3.3.

    I'm not sure what to do about this...

    Cheers !

    opened by dpcuneowcg 12
  • Fuzzywuzzy 4x slower *with* python-Levenshtein installed

    According to the intro page: "python-Levenshtein (optional, provides a 4-10x speedup in String Matching)"

    I am fuzzy matching paragraphs with their best match from a significant list of paragraphs. When I use fuzzywuzzy without python-Levenshtein on Ubuntu, I receive the error that I should install it for a speed-up, and it took about 45 mins on a test set of paragraphs:

    real 46m56.813s user 125m45.952s sys 0m0.432s

    When I then installed python-Levenshtein, the error went away as expected, but running the same example set of paragraphs took over 200 minutes:

    real 204m51.384s user 522m49.436s sys 0m21.376s

    The results are identical except for one paragraph, which does match better with the slower, python-Levenshtein version. While I haven't timed an example, doing the same on the mac also seems significantly slower with python-Levenshtein installed.

    The error message when python-Levenshtein is not installed says that the reason to install the package is for speed, so to me something isn't right: either it shouldn't be slower, or the error message should change to say that it should be installed for increased accuracy (if that is the case).

    needs more info 
    opened by Hooloovoo 11
  • Not python3 compliant

    I get the following error:

    File "/usr/lib/python3.4/site-packages/fuzzywuzzy/fuzz.py", line 49, in ratio
        s1, s2 = utils.make_type_consistent(s1, s2)
      File "/usr/lib/python3.4/site-packages/fuzzywuzzy/utils.py", line 43, in make_type_consistent
        elif isinstance(s1, unicode) and isinstance(s2, unicode):
    NameError: name 'unicode' is not defined
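
    (A common Python 2/3 compatibility guard for this class of error, shown purely as a general sketch rather than the project's actual fix:)

    # Define 'unicode' on Python 3 so isinstance checks like the one in the
    # traceback keep working on both major versions.
    try:
        unicode
    except NameError:
        unicode = str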
    
    opened by ashneo76 11
  • Support for other languages

    Hi, first of all, thanks for maintaining this. I just noticed that both token_sort_ratio and token_set_ratio don't support Arabic characters. I don't know about other non-English ones, but at least they don't support Arabic. Comparing anything with an Arabic string returns 0, even if both strings are Arabic.

    >>> print fuzz.token_sort_ratio("مرحبا جميعا", "مرحبا جميعا وشكرا لكم")
    0
    >>> print fuzz.partial_ratio("مرحبا جميعا", "مرحبا جميعا وشكرا لكم")
    100
    
    

    So I'm just wondering if this is a bug, or if it simply doesn't support non-English characters? Thanks
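
    (A likely workaround, sketched on the assumption that the token scorers accept force_ascii: by default they ASCII-fold input during full_process, which strips non-Latin characters, so passing force_ascii=False keeps them:)

    from fuzzywuzzy import fuzz

    s1 = "مرحبا جميعا"
    s2 = "مرحبا جميعا وشكرا لكم"
    print(fuzz.token_sort_ratio(s1, s2, force_ascii=False))  # non-zero
    print(fuzz.token_set_ratio(s1, s2, force_ascii=False))   # non-zero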

    opened by tester88 10
  • Fix for Python 3.7

    Fixes #233

    According to PEP 479, if a raise StopIteration occurs directly in a generator, it should simply be replaced with return.

    This is both backwards and forwards compatible code.
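
    (A generic illustration of the PEP 479 pattern, not the project's actual code:)

    # Before: on Python 3.7+, a StopIteration raised inside a generator
    # surfaces as a RuntimeError.
    def take_first_broken(items):
        for item in items:
            yield item
            raise StopIteration

    # After: returning ends the generator cleanly on all versions.
    def take_first_fixed(items):
        for item in items:
            yield item
            return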

    opened by hb-alexbotello 9
  • NameError: name 'ratio' is not defined

    While running the FuzzyWuzzy process.extract() method, the following error is thrown:

        matches = fw_process.extract(
    ../../.local/share/virtualenvs/proj-NKfiPrkj/lib/python3.8/site-packages/fuzzywuzzy/process.py:168: in extract
        return heapq.nlargest(limit, sl, key=lambda i: i[1]) if limit is not None else \
    /usr/lib/python3.8/heapq.py:563: in nlargest
        result = [(key(elem), i, elem) for i, elem in zip(range(0, -n, -1), it)]
    /usr/lib/python3.8/heapq.py:563: in <listcomp>
        result = [(key(elem), i, elem) for i, elem in zip(range(0, -n, -1), it)]
    ../../.local/share/virtualenvs/proj-NKfiPrkj/lib/python3.8/site-packages/fuzzywuzzy/process.py:117: in extractWithoutOrder
        score = scorer(processed_query, processed)
    ../../.local/share/virtualenvs/proj-NKfiPrkj/lib/python3.8/site-packages/fuzzywuzzy/fuzz.py:276: in WRatio
        base = ratio(p1, p2)
    ../../.local/share/virtualenvs/proj-NKfiPrkj/lib/python3.8/site-packages/fuzzywuzzy/utils.py:38: in decorator
        return func(*args, **kwargs)
    ../../.local/share/virtualenvs/proj-NKfiPrkj/lib/python3.8/site-packages/fuzzywuzzy/utils.py:29: in decorator
        return func(*args, **kwargs)
    ../../.local/share/virtualenvs/proj-NKfiPrkj/lib/python3.8/site-packages/fuzzywuzzy/utils.py:47: in decorator
        return func(*args, **kwargs)
    ../../.local/share/virtualenvs/proj-NKfiPrkj/lib/python3.8/site-packages/fuzzywuzzy/fuzz.py:28: in ratio
        return utils.intr(100 * m.ratio())
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
    
    self = <fuzzywuzzy.StringMatcher.StringMatcher object at 0x7f115ab32160>
    
        def ratio(self):
            if not self._ratio:
    >           self._ratio = ratio(self._str1, self._str2)
    E           NameError: name 'ratio' is not defined
    
    
    

    I found that pinning the python-Levenshtein version to 0.12.2 solves the issue.

    opened by DivanshuTak 1
  • How to decrease False positive matches? (process.extract / WRatio)

    I am using the process.extract method, and I know it uses WRatio under the hood to calculate the score. Following is a case in which I get a very high score of 90 even though the strings are hardly equal. Is there any way to fix this in WRatio?

    inp_name="america"
    
    name_list=["american Futures and Options Exchange"]
            
    process.extractOne(inp_name,name_list)
    
    

    Output--> ('american Futures and Options Exchange', 90.0, 0)

    PS: I know about other alternatives like fuzz.ratio, partial_ratio, and token_sort_ratio, but WRatio works pretty well for my use case, so any workaround would be appreciated... Thanks!
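
    (One common mitigation, sketched with the fuzzywuzzy API shown in this README: keep process.extractOne but pass a stricter scorer and/or a score_cutoff, since WRatio deliberately rewards partial matches like this one:)

    from fuzzywuzzy import fuzz, process

    inp_name = "america"
    name_list = ["american Futures and Options Exchange"]

    # A stricter scorer penalizes the large length difference.
    print(process.extractOne(inp_name, name_list, scorer=fuzz.token_sort_ratio))

    # Or keep WRatio but require a minimum score; returns None below the cutoff.
    print(process.extractOne(inp_name, name_list, score_cutoff=95))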

    opened by Pranav082001 3
  • 'list' object has no attribute 'items'

    When processing a string such as 'hello 𝙎𝙈𝙈 world' with token_set_ratio(), no problems arise, but there is an error when calling process.extract().

    If you remove the odd "𝙎𝙈𝙈" characters from the string, there is no error.

    Example:

    from fuzzywuzzy import fuzz, process

    strtest = 'hello 𝙎𝙈𝙈 world'
    stroka = "word"
    print(str(fuzz.token_set_ratio(stroka, strtest)))  # OK
    for message in process.extract(stroka, [strtest, 'sss'], limit=1):  # ERROR
        pass
    

    Error:

    Traceback (most recent call last):
      File "C:\Users\Alexey\AppData\Local\Programs\Python\Python39\lib\site-packages\fuzzywuzzy\process.py", line 108, in extractWithoutOrder
        for key, choice in choices.items():
    AttributeError: 'list' object has no attribute 'items'
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "C:\Users\Alexey\Documents\fuzzywuzzy\index.py", line 191, in <module>
        for message in process.extract(stroka, [strtest, 'sss'], limit=1): # ERROR
      File "C:\Users\Alexey\AppData\Local\Programs\Python\Python39\lib\site-packages\fuzzywuzzy\process.py", line 168, in extract
        return heapq.nlargest(limit, sl, key=lambda i: i[1]) if limit is not None else \
      File "C:\Users\Alexey\AppData\Local\Programs\Python\Python39\lib\heapq.py", line 531, in nlargest
        result = max(it, default=sentinel, key=key)
      File "C:\Users\Alexey\AppData\Local\Programs\Python\Python39\lib\site-packages\fuzzywuzzy\process.py", line 117, in extractWithoutOrder
        score = scorer(processed_query, processed)
      File "C:\Users\Alexey\AppData\Local\Programs\Python\Python39\lib\site-packages\fuzzywuzzy\fuzz.py", line 288, in WRatio
        partial = partial_ratio(p1, p2) * partial_scale
      File "C:\Users\Alexey\AppData\Local\Programs\Python\Python39\lib\site-packages\fuzzywuzzy\utils.py", line 38, in decorator
        return func(*args, **kwargs)
      File "C:\Users\Alexey\AppData\Local\Programs\Python\Python39\lib\site-packages\fuzzywuzzy\utils.py", line 29, in decorator
        return func(*args, **kwargs)
      File "C:\Users\Alexey\AppData\Local\Programs\Python\Python39\lib\site-packages\fuzzywuzzy\utils.py", line 47, in decorator
        return func(*args, **kwargs)
      File "C:\Users\Alexey\AppData\Local\Programs\Python\Python39\lib\site-packages\fuzzywuzzy\fuzz.py", line 47, in partial_ratio
        blocks = m.get_matching_blocks()
      File "C:\Users\Alexey\AppData\Local\Programs\Python\Python39\lib\site-packages\fuzzywuzzy\StringMatcher.py", line 58, in get_matching_blocks
        self._matching_blocks = matching_blocks(self.get_opcodes(),
    ValueError: apply_edit edit operations are invalid or inapplicable
    
    opened by syfulin 0
  • Removed a manual file handler pitfall

    The problem: there was a case where the code used the manual file handler pitfall, opening and closing a file stream by hand. Since Python supports automatic stream closing with the 'with' block, it's better to use that instead of a manual close in order to remove a bug vector.

    Solution: refactored the code to remove the manual file handler.
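
    (The general pattern, for reference, using a hypothetical example.txt:)

    # Create a small file so the sketch is self-contained.
    with open("example.txt", "w") as f:
        f.write("hello\n")

    # The pitfall: the file can stay open if an exception occurs before close().
    f = open("example.txt")
    data = f.read()
    f.close()

    # Preferred: the 'with' block closes the file even when an error occurs.
    with open("example.txt") as f:
        data = f.read()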

    opened by NaelsonDouglas 0
  • token_set_ratio Degenerate Case

    Referring to the description of token_set_ratio in the original blog post: if the SORTED_INTERSECTION is a strict subset of STRING2, the result ratio will be 100. E.g.,

    fuzz.token_set_ratio("Deep Learning", "Python Machine Learning: Machine Learning and Deep Learning with Python, scikit-learn, and TensorFlow 2")
    

    yields 100. This is patently incorrect, and does not uphold the purported intuition ("because the SORTED_INTERSECTION component is always exactly the same, the scores increase when (a) that makes up a larger percentage of the full string, and (b) the string remainders are more similar").

    Looking at fuzz._token_set, we see that it returns

    max(
        [
            ratio_func(sorted_sect, combined_1to2),
            ratio_func(sorted_sect, combined_2to1),
            ratio_func(combined_1to2, combined_2to1)
        ]
    )
    

    It appears the assumption is that the string remainder will never be empty. Perhaps something like this is more appropriate:

    max(
        [
            0 if sorted_sect == combined_1to2 else ratio_func(sorted_sect, combined_1to2),
            0 if sorted_sect == combined_2to1 else ratio_func(sorted_sect, combined_2to1),
            ratio_func(combined_1to2, combined_2to1)
        ]
    )
    
    opened by rogerrohrbach 0
  • Mark repository as archived

    Since this repo seems to have been deprecated, I suggest marking it as read-only in the settings. This also displays a banner at the top of the page, which may be even easier to notice than the readme …

    opened by Bibo-Joshi 0
Releases (0.18.0)
  • 0.16.0(Dec 18, 2017)

    • Add punctuation characters back in so process does something. [davidcellis]

    • Simpler alphabet and even fewer examples. [davidcellis]

    • Fewer examples and larger deadlines for Hypothesis. [davidcellis]

    • Slightly more examples. [davidcellis]

    • Attempt to fix the failing 2.7 and 3.6 python tests. [davidcellis]

    • Readme: add link to C++ port. [Lizard]

    • Fix tests on Python 3.3. [Jon Banafato]

      Modify tox.ini and .travis.yml to install enum34 when running with Python 3.3 to allow hypothesis tests to pass.

    • Normalize Python versions. [Jon Banafato]

      • Enable Travis-CI tests for Python 3.6
      • Enable tests for all supported Python versions in tox.ini
      • Add Trove classifiers for Python 3.4 - 3.6 to setup.py

      Note: Python 2.6 and 3.3 are no longer supported by the Python core team. Support for these can likely be dropped, but that's out of scope for this change set.

    • Fix typos. [Sven-Hendrik Haase]

    Source code(tar.gz)
    Source code(zip)
  • 0.15.1(Sep 21, 2017)

    • Fix setup.py (addresses #155) [Paul O'Leary McCann]

    • Merge remote-tracking branch 'upstream/master' into extract_optimizations. [nolan]

    • Seed random before generating benchmark strings. [nolan]

    • Cleaner implementation of same idea without new param, but adding existing full_process param to Q,W,UQ,UW. [nolan]

    • Fix benchmark only generate list once. [nolan]

    • Only run util.full_process once on query when using extract functions, add new benchmarks. [nolan]

    Source code(tar.gz)
    Source code(zip)
  • 0.15.0(Sep 21, 2017)

    • Add extras require to install python-levenshtein optionally. [Rolando Espinoza]

      This allows installing python-levenshtein as a dependency.

    • Fix link formatting in the README. [Alex Chan]

    • Add fuzzball.js JavaScript port link. [nolan]

    • Added Rust Port link. [Logan Collins]

    • Validate_string docstring. [davidcellis]

    • For full comparisons test that ONLY exact matches (after processing) are added. [davidcellis]

    • Add detailed docstrings to WRatio and QRatio comparisons. [davidcellis]

    Source code(tar.gz)
    Source code(zip)
  • 0.14.0(Feb 20, 2017)

    • Possible PEP-8 fix + make pep-8 warnings appear in test. [davidcellis]
    • Possible PEP-8 fix. [davidcellis]
    • Possible PEP-8 fix. [davidcellis]
    • Test for stderr log instead of warning. [davidcellis]
    • Convert warning.warn to logging.warning. [davidcellis]
    • Additional details for empty string warning from process. [davidcellis]
    • Enclose warnings.simplefilter() inside a with statement. [samkennerly]
    Source code(tar.gz)
    Source code(zip)
  • 0.13.0(Feb 20, 2017)

    • Support alternate git status output. [Jose Diaz-Gonzalez]
    • Split warning test into new test file, added to travis execution on 2.6 / pypy3. [davidcellis]
    • Remove hypothesis examples database from gitignore. [davidcellis]
    • Add check for warning to tests. [davidcellis]
    • Check processor and warn before scorer may remove processor. [davidcellis]
    • Renamed test - tidied docstring. [davidcellis]
    • Add token ratios to the list of scorers that skip running full_process as a processor. [davidcellis]
    • Added token_sort, token_set to test. [davidcellis]
    • Test docstrings/comments. [davidcellis]
    • Added py.test .cache/ removed duplicated build from gitignore. [davidcellis]
    • Added default_scorer, default_processor parameters to make it easier to change in the future. [davidcellis]
    • Rewrote extracts to explicitly use default values for processor and scorer. [davidcellis]
    • Changed Hypothesis tests to use pytest parameters. [davidcellis]
    • Added Hypothesis based tests for identical strings. [Ducksual]
    • Added test for simple 'a, b' string on process.extractOne. [Ducksual]
    • Process the query in process.extractWithoutOrder when using a scorer which does not do so. [Ducksual]
    • Mention that difflib and levenshtein results may differ. [Jose Diaz-Gonzalez]
    Source code(tar.gz)
    Source code(zip)
  • 0.12.0(Sep 14, 2016)

  • 0.11.1(Sep 14, 2016)

    • Add editorconfig. [Jose Diaz-Gonzalez]
    • Added tox.ini config file for easy local multi-environment testing; changed travis config to use py.test like tox; updated use of pep8 module to pycodestyle. [Pedro Rodrigues]
    Source code(tar.gz)
    Source code(zip)
  • 0.11.0(Jun 30, 2016)

    • Clean-up. [desmaisons_david]

    • Improving performance. [desmaisons_david]

    • Performance Improvement. [desmaisons_david]

    • Fix link to Levenshtein. [Brian J. McGuirk]

    • Fix readme links. [Brian J. McGuirk]

    • Add license to StringMatcher.py. [Jose Diaz-Gonzalez]

      Closes #113

    Source code(tar.gz)
    Source code(zip)
  • 0.10.0(Jun 30, 2016)

  • 0.9.0(Jun 30, 2016)

  • 0.8.2(Jun 30, 2016)

  • 0.8.1(Jun 30, 2016)

  • 0.8.0(Nov 16, 2015)

    • Refer to Levenshtein distance in readme. Closes #88. [Jose Diaz-Gonzalez]

    • Added install step for travis to have pep8 available. [Pedro Rodrigues]

    • Added a pep8 test. The way I add the error 501 to the ignore tuple is probably wrong but from the docs and source code of pep8 I could not find any other way. [Pedro Rodrigues]

      I also went ahead and removed the pep8 call from the release file.

    • Added python 3.5, pypy, and pypy3 to the travis config file. [Pedro Rodrigues]

    • Added another step to the release file to run the tests before releasing. [Pedro Rodrigues]

    • Fixed a few pep8 errors. Added a verification step in the release automation file. This step should probably be somewhere at git level. [Pedro Rodrigues]

    • Pep8. [Pedro Rodrigues]

    • Leaving TODOs in the code was never a good idea. [Pedro Rodrigues]

    • Changed return values to be rounded integers. [Pedro Rodrigues]

    • Added a test with the recovered data file. [Pedro Rodrigues]

    • Recovered titledata.csv. [Pedro Rodrigues]

    • Move extract test methods into the process test. [Shale Craig]

      Somehow, they ended up in the RatioTest, despite asserting that the ProcessTest works.

    Source code(tar.gz)
    Source code(zip)
  • 0.7.0(Oct 2, 2015)

    • Use portable syntax for catching exception on tests. [Luis Madrigal]

    • [Fix] test against correct variable. [Luis Madrigal]

    • Add unit tests for validator decorators. [Luis Madrigal]

    • Move validators to decorator functions. [Luis Madrigal]

      This allows easier composition and IMO makes the functions more readable

    • Fix typo: dictionery -> dictionary. [shale]

    • FizzyWuzzy -> FuzzyWuzzy typo correction. [shale]

    • Add check for gitchangelog. [Jose Diaz-Gonzalez]

    Source code(tar.gz)
    Source code(zip)
  • 0.6.2(Sep 3, 2015)

  • 0.6.1(Sep 3, 2015)

  • 0.6.0(Jul 20, 2015)

    • Added link to a java port. [Andriy Burkov]

    • Patched "name 'unicode' is not defined" python3. [Carlos Garay]

      https://github.com/seatgeek/fuzzywuzzy/issues/80

    • Make process.extract accept {dict, list}-like choices. [Nathan Typanski]

      Previously, process.extract expected lists or dictionaries, and tested this with isinstance() calls. In keeping with the spirit of Python (duck typing and all that), this change enables one to use extract() on any dict-like object for dict-like results, or any list-like object for list-like results.

      So now we can (and, indeed, I've added tests for these uses) call extract() on things like:

      • a generator of strings ("any iterable")
      • a UserDict
      • custom user-made classes that "look like" dicts (or, really, anything with a .items() method that behaves like a dict)
      • plain old lists and dicts

      The behavior is exactly the same for previous use cases of lists-and-dicts.

      This change goes along nicely with PR #68, since those docs suggest dict-like behavior is valid, and this change makes that true.
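
      (A sketch of the dict-like usage this change enables; with a mapping, results include the key, and the score shown is indicative:)

      >>> from fuzzywuzzy import process
      >>> choices = {"nyj": "New York Jets", "dal": "Dallas Cowboys"}
      >>> process.extract("new york jets", choices, limit=1)
      [('New York Jets', 100, 'nyj')]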

    • Merge conflict. [Adam Cohen]

    • Improve docs for fuzzywuzzy.process. [Nathan Typanski]

      The documentation for this module was dated and sometimes inaccurate. This overhauls the docs to accurately describe the current module, including detailing optional arguments that were not previously explained - e.g., limit argument to extract().

      This change follows the Google Python Style Guide, which may be found at:

      https://google-styleguide.googlecode.com/svn/trunk/pyguide.html?showone=Comments#Comments

    Source code(tar.gz)
    Source code(zip)
  • 0.5.0(Jul 20, 2015)

    • FIX: 0.4.0 is released, no need to specify 0.3.1 in README. [Josh Warner (Mac)]

    • Fixed a small typo. [Rostislav Semenov]

    • Reset processor and scorer defaults to None with argument checking. [foxxyz]

    • Catch generators without lengths. [Jeremiah Lowin]

    • Fixed python3 issue and deprecated assertion method. [foxxyz]

    • Fixed some docstrings, typos, python3 string method compatibility, some errors that crept in during rebase. [foxxyz]

    • [mod] The lambda in extract is not needed. [Olivier Le Thanh Duong]

      [mod] Pass directly the defaults functions in the args

      [mod] itertools.takewhile() can handle empty list just fine no need to test for it

      [mod] Shorten extractOne by removing double if

      [mod] Use a list comprehension in extract()

      [mod] Autopep8 on process.py

      [doc] Document make_type_consistent

      [mod] bad_chars shortened

      [enh] Move regex compilation outside the method, otherwise we don't get the benefit from it

      [mod] Don't need all the blah just to redefine method from string module

      [mod] Remove unused import

      [mod] Autopep8 on string_processing.py

      [mod] Rewrote asciidammit without recursion to make it more readable

      [mod] Autopep8 on utils.py

      [mod] Remove unused import

      [doc] Add some doc to fuzz.py

      [mod] Move the code to sort string in a separate function

      [doc] Docstrings for WRatio, UWRatio

    • Add note on which package to install. Closes #67. [Jose Diaz-Gonzalez]

    Source code(tar.gz)
    Source code(zip)
  • 0.4.0(Oct 31, 2014)

    • Merge pull request #64 from ojomio/master. [Jose Diaz-Gonzalez]

      In extractBests() and extractOne() use '>=' instead of '>'

    • Merge pull request #62 from ojomio/master. [Jose Diaz-Gonzalez]

      Fixed python3 issue with SequenceMatcher import

    Source code(tar.gz)
    Source code(zip)
  • 0.3.3(Oct 22, 2014)

    • Update release script to make it more generic. [Jose Diaz-Gonzalez]
    • Merge pull request #60 from ojomio/master. [Jose Diaz-Gonzalez] Fixed issue #59 - "partial" parameter for _token_set() is now honored
    • Merge pull request #54 from jlowin/patch-1. [Jose Diaz-Gonzalez] Remove explicit check for lists
    Source code(tar.gz)
    Source code(zip)
  • 0.3.2(Sep 12, 2014)

    • Make release command an executable. [Jose Diaz-Gonzalez]
    • Simplify MANIFEST.in. [Jose Diaz-Gonzalez]
    • Add a release script. [Jose Diaz-Gonzalez]
    • Fix readme codeblock. [Jose Diaz-Gonzalez]
    • Minor formatting. [Jose Diaz-Gonzalez]
    • Update readme with proper installation notes. [Jose Diaz-Gonzalez]
    • Use version from fuzzywuzzy package. [Jose Diaz-Gonzalez]
    • Set version constant in init.py. [Jose Diaz-Gonzalez]
    • Update setup.py. [Jose Diaz-Gonzalez]
    • Rename LICENSE to LICENSE.txt. [Jose Diaz-Gonzalez]
    • Update packaging a bit. [Jose Diaz-Gonzalez]
    Source code(tar.gz)
    Source code(zip)
  • 0.3.0(Aug 24, 2014)

    • Allow choices to be a list or dict
    • Add testing for 3.4
    • Typo updates
    • Update readme, change formatting to RST
    • Fix package requirements
    • PEP8!
    Source code(tar.gz)
    Source code(zip)