Fuzzy String Matching in Python

Overview

FuzzyWuzzy

Fuzzy string matching like a boss. It uses Levenshtein Distance to calculate the differences between sequences in a simple-to-use package.
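
For readers unfamiliar with the metric, here is a minimal pure-Python sketch of Levenshtein distance. It is not the code FuzzyWuzzy ships; the function name levenshtein is an assumption made for illustration, and the library's 0-100 scores are a normalized similarity rather than this raw edit count.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character edits turning a into b.

    Illustrative sketch only; the name and signature are hypothetical.
    """
    if len(a) < len(b):
        a, b = b, a  # keep the row we store as short as possible
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or keep on a match
            ))
        previous = current
    return previous[-1]


distance = levenshtein("this is a test", "this is a test!")  # one insertion
```

Only two rows of the edit table are kept at a time, so memory grows with the shorter string rather than with the product of both lengths.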

Requirements

For testing

  • pycodestyle
  • hypothesis
  • pytest

Installation

Using PIP via PyPI

pip install fuzzywuzzy

or the following to install python-Levenshtein too

pip install fuzzywuzzy[speedup]

Using PIP via Github

pip install git+git://github.com/seatgeek/fuzzywuzzy.git@0.18.0#egg=fuzzywuzzy

Adding to your requirements.txt file (run pip install -r requirements.txt afterwards)

git+ssh://git@github.com/seatgeek/fuzzywuzzy.git@0.18.0#egg=fuzzywuzzy

Manually via GIT

git clone git://github.com/seatgeek/fuzzywuzzy.git fuzzywuzzy
cd fuzzywuzzy
python setup.py install

Usage

>>> from fuzzywuzzy import fuzz
>>> from fuzzywuzzy import process

Simple Ratio

>>> fuzz.ratio("this is a test", "this is a test!")
    97

Partial Ratio

>>> fuzz.partial_ratio("this is a test", "this is a test!")
    100

Token Sort Ratio

>>> fuzz.ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear")
    91
>>> fuzz.token_sort_ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear")
    100
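
The idea behind the token sort score can be sketched with the standard library alone. This is a rough approximation, assuming difflib as the comparison backend and plain whitespace tokenization; simple_ratio and token_sort_sketch are hypothetical names, not FuzzyWuzzy functions, and the exact scores may differ from the library's.

```python
from difflib import SequenceMatcher


def simple_ratio(s1, s2):
    """Similarity of two strings on a 0-100 scale (hypothetical helper)."""
    return round(100 * SequenceMatcher(None, s1, s2).ratio())


def token_sort_sketch(s1, s2):
    """Sort each string's tokens alphabetically, then compare the results."""
    sorted1 = " ".join(sorted(s1.lower().split()))
    sorted2 = " ".join(sorted(s2.lower().split()))
    return simple_ratio(sorted1, sorted2)


plain = simple_ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear")
by_tokens = token_sort_sketch("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear")
```

Because both strings contain the same words, sorting makes them identical and by_tokens is 100, while plain stays below 100 because word order still counts there.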

Token Set Ratio

>>> fuzz.token_sort_ratio("fuzzy was a bear", "fuzzy fuzzy was a bear")
    84
>>> fuzz.token_set_ratio("fuzzy was a bear", "fuzzy fuzzy was a bear")
    100

Process

>>> choices = ["Atlanta Falcons", "New York Jets", "New York Giants", "Dallas Cowboys"]
>>> process.extract("new york jets", choices, limit=2)
    [('New York Jets', 100), ('New York Giants', 78)]
>>> process.extractOne("cowboys", choices)
    ("Dallas Cowboys", 90)

You can also pass additional parameters to the extractOne method to make it use a specific scorer. A typical use case is to match file paths (songs is assumed to be a list of file paths):

>>> process.extractOne("System of a down - Hypnotize - Heroin", songs)
    ('/music/library/good/System of a Down/2005 - Hypnotize/01 - Attack.mp3', 86)
>>> process.extractOne("System of a down - Hypnotize - Heroin", songs, scorer=fuzz.token_sort_ratio)
    ("/music/library/good/System of a Down/2005 - Hypnotize/10 - She's Like Heroin.mp3", 61)
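
Conceptually, picking the best match with a pluggable scorer looks roughly like the sketch below. It is a simplified stand-in, not FuzzyWuzzy's implementation: extract_one_sketch, simple_ratio, and contains_scorer are hypothetical names, and difflib is assumed as the default comparison.

```python
from difflib import SequenceMatcher


def simple_ratio(query, choice):
    """Case-insensitive similarity on a 0-100 scale (hypothetical helper)."""
    matcher = SequenceMatcher(None, query.lower(), choice.lower())
    return round(100 * matcher.ratio())


def extract_one_sketch(query, choices, scorer=simple_ratio):
    """Score every choice and return the best (choice, score), or None."""
    scored = ((choice, scorer(query, choice)) for choice in choices)
    return max(scored, key=lambda pair: pair[1], default=None)


def contains_scorer(query, choice):
    """Toy scorer: 100 if the query appears inside the choice, else 0."""
    return 100 if query.lower() in choice.lower() else 0


choices = ["Atlanta Falcons", "New York Jets", "New York Giants", "Dallas Cowboys"]
best_default = extract_one_sketch("new york jets", choices)
best_custom = extract_one_sketch("cowboys", choices, scorer=contains_scorer)
```

Swapping the scorer changes which choice wins without touching the selection logic, which is the same reason the scorer argument exists on extractOne.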

Known Ports

FuzzyWuzzy is being ported to other languages too! Here are a few ports we know about:

Comments
  • Incompatible License

    I see you've got issue #113 closed, however a simple reading of the GPL linked in the StringMatcher file doesn't just imply, but in fact definitively states, that your project must be licensed as GPL.

    1. You may copy and distribute verbatim copies of the Program's source code as you receive it, in any medium, provided that you conspicuously and appropriately publish on each copy an appropriate copyright notice and disclaimer of warranty; keep intact all the notices that refer to this License and to the absence of any warranty; and give any other recipients of the Program a copy of this License along with the Program.

    You have not copied the entire source verbatim, you've copied a select portion and incorporated it into a further derived work.

    2. You may modify your copy or copies of the Program or any portion of it, thus forming a work based on the Program, and copy and distribute such modifications or work under the terms of Section 1 above, provided that you also meet all of these conditions:

    a) You must cause the modified files to carry prominent notices stating that you changed the files and the date of any change.

    b) You must cause any work that you distribute or publish, that in whole or in part contains or is derived from the Program or any part thereof, to be licensed as a whole at no charge to all third parties under the terms of this License.

    Emphasis added. When you decided to copy in a portion of the code from python-levenshtein you incidentally selected the GPL license for your project, and it's been apparently licensed as MIT / X11, but effectively and legally licensed as GPL, ever since.

    I say this as someone who has a company project that's now been copy-lefted by your code, albeit accidentally and inadvertently.

    opened by yawpitch 37
  • List supported python versions, bump minor

    Now that #33 has been merged in, fuzzywuzzy looks good to go for python 3 compatibility. Might as well advertise this in the setup.py / on pypi. Version bumped so a new version can be pushed to pypi.

    Cheers!

    opened by JeffPaine 14
  • Remove query processing and default processor

    Added test case to check that using a processor of the form lambda x: x['key'] doesn't fail.

    Set default_processor to None and do not run processor on query.

    Remove test case that checked if string reduced to 0 by processor, as string will no longer be processed.

    Adjusted test cases in test_fuzzywuzzy_hypothesis to not use a processor, but not fail if scorer reduces query to empty string. If the user supplies a processor that modifies the choice so that it is no longer an exact match for the query, not finding an exact match would be the expected behavior.

    Saw some relevant discussion around issues #77 and #141 etc. If processor doesn't run on the query, which I feel it CAN NOT, then processor must also default to None to avoid unexpected behaviors.

    (also have a separate branch that only adds the new test case fwiw)

    opened by nol13 13
  • UserWarning: Using slow pure-python SequenceMatcher

    C:\Python27\lib\site-packages\fuzzywuzzy\fuzz.py:33: UserWarning: Using slow pure-python SequenceMatcher
      warnings.warn('Using slow pure-python SequenceMatcher')
    

    The line of code I have for importing is from fuzzywuzzy import fuzz.

    I'm running Python 2.7.8 on Windows 8, pip 1.5.6, and fuzzywuzzy 0.4.0.

    opened by sylvia43 13
  • Upload latest version to pypi

    Could we kindly upload the latest version to pypi (directions)?

    I'd like to use this library on a python 3 project where using pip install would be greatly appreciated :smile: Cheers!

    opened by JeffPaine 13
  • ImportError: cannot import name fuzz

    Below is more information.

    $> sudo pip install fuzzywuzzy   #works, no error
    
    $> vi mytest.py
    from fuzzywuzzy import fuzz
    from fuzzywuzzy import process
    
    $>python mytest.py
    ...
    ImportError: cannot import name fuzz
    
    opened by harishvc 12
  • Fuzzy install via pip in Conda environment,  python-Levenshtein warning

    Hi...

    I'm new to Conda, and just re-installed fuzzywuzzy using the conda version of pip. python-Levenshtein is also installed according to conda.

    I'm using Python 3.3.

    I'm not sure what to do about this...

    Cheers !

    opened by dpcuneowcg 12
  • Fuzzywuzzy 4x slower *with* python-Levenshtein installed

    According to the intro page: "python-Levenshtein (optional, provides a 4-10x speedup in String Matching)"

    I am fuzzy matching paragraphs with their best match from a significant list of paragraphs. When I use fuzzywuzzy without python-Levenshtein on Ubuntu, I receive the error that I should install it for a speed-up, and it took about 45 mins on a test set of paragraphs:

    real 46m56.813s
    user 125m45.952s
    sys 0m0.432s

    When I then installed python-Levenshtein, the error went away as expected, but running the same example set of paragraphs took over 200 minutes:

    real 204m51.384s
    user 522m49.436s
    sys 0m21.376s

    The results are identical except for one paragraph, which does match better with the slower, python-Levenshtein version. While I haven't timed an example, doing the same on the mac also seems significantly slower with python-Levenshtein installed.

    The error message when python-Levenshtein is not installed says that the reason to install the package is for speed, so to me something isn't right: either it shouldn't be slower, or the error message should change to say that it should be installed for increased accuracy (if that is the case).

    needs more info 
    opened by Hooloovoo 11
  • Not python3 compliant

    I get the following error:

    File "/usr/lib/python3.4/site-packages/fuzzywuzzy/fuzz.py", line 49, in ratio
        s1, s2 = utils.make_type_consistent(s1, s2)
      File "/usr/lib/python3.4/site-packages/fuzzywuzzy/utils.py", line 43, in make_type_consistent
        elif isinstance(s1, unicode) and isinstance(s2, unicode):
    NameError: name 'unicode' is not defined
    
    opened by ashneo76 11
  • Support for other languages

    Hi, first of all, thanks for maintaining this. I just noticed that both token_sort_ratio and token_set_ratio don't support Arabic characters. I don't know about other non-English ones, but at least they don't support Arabic. They return 0 when comparing anything with an Arabic string, even when both strings are Arabic.

    >>> print fuzz.token_sort_ratio("مرحبا جميعا", "مرحبا جميعا وشكرا لكم")
    0
    >>> print fuzz.partial_ratio("مرحبا جميعا", "مرحبا جميعا وشكرا لكم")
    100
    
    

    So I'm just wondering whether this is a bug or whether it simply doesn't support non-English characters. Thanks

    opened by tester88 10
  • Fix for Python 3.7

    Fixes #233

    According to PEP 479, if raise StopIteration occurs directly in a generator, simply replace it with return.

    This is both backwards and forwards compatible code.
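
A small illustration of the PEP 479 change described in this pull request. The generator names are hypothetical and the patched FuzzyWuzzy code is not reproduced here; the failure shown assumes Python 3.7 or newer.

```python
def take_until_blank(lines):
    """Yield lines up to, but not including, the first blank one."""
    for line in lines:
        if not line.strip():
            return  # finishes the generator cleanly on old and new Pythons
        yield line


def take_until_blank_old(lines):
    """Pre-PEP 479 style, kept only to show the failure mode."""
    for line in lines:
        if not line.strip():
            raise StopIteration  # converted to RuntimeError on Python 3.7+
        yield line


kept = list(take_until_blank(["a", "b", "", "c"]))

try:
    list(take_until_blank_old(["a", "", "c"]))
    old_style_failed = False
except RuntimeError:
    old_style_failed = True
```

On Python 3.7 or newer old_style_failed ends up True, while the return-based version behaves the same on every interpreter.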

    opened by hb-alexbotello 9
  • NameError: name 'ratio' is not defined

    While running the Fuzzy Wuzzy process.extract() method the following error is thrown :-

        matches = fw_process.extract(
    ../../.local/share/virtualenvs/proj-NKfiPrkj/lib/python3.8/site-packages/fuzzywuzzy/process.py:168: in extract
        return heapq.nlargest(limit, sl, key=lambda i: i[1]) if limit is not None else \
    /usr/lib/python3.8/heapq.py:563: in nlargest
        result = [(key(elem), i, elem) for i, elem in zip(range(0, -n, -1), it)]
    /usr/lib/python3.8/heapq.py:563: in <listcomp>
        result = [(key(elem), i, elem) for i, elem in zip(range(0, -n, -1), it)]
    ../../.local/share/virtualenvs/proj-NKfiPrkj/lib/python3.8/site-packages/fuzzywuzzy/process.py:117: in extractWithoutOrder
        score = scorer(processed_query, processed)
    ../../.local/share/virtualenvs/proj-NKfiPrkj/lib/python3.8/site-packages/fuzzywuzzy/fuzz.py:276: in WRatio
        base = ratio(p1, p2)
    ../../.local/share/virtualenvs/proj-NKfiPrkj/lib/python3.8/site-packages/fuzzywuzzy/utils.py:38: in decorator
        return func(*args, **kwargs)
    ../../.local/share/virtualenvs/proj-NKfiPrkj/lib/python3.8/site-packages/fuzzywuzzy/utils.py:29: in decorator
        return func(*args, **kwargs)
    ../../.local/share/virtualenvs/proj-NKfiPrkj/lib/python3.8/site-packages/fuzzywuzzy/utils.py:47: in decorator
        return func(*args, **kwargs)
    ../../.local/share/virtualenvs/proj-NKfiPrkj/lib/python3.8/site-packages/fuzzywuzzy/fuzz.py:28: in ratio
        return utils.intr(100 * m.ratio())
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
    
    self = <fuzzywuzzy.StringMatcher.StringMatcher object at 0x7f115ab32160>
    
        def ratio(self):
            if not self._ratio:
    >           self._ratio = ratio(self._str1, self._str2)
    E           NameError: name 'ratio' is not defined
    
    
    

    I found that pinning the python-Levenshtein version to 0.12.2 solves the issue.

    opened by DivanshuTak 1
  • How to decrease False positive matches? (process.extract / WRatio)

    I am using the process.extract method, and I know it uses WRatio under the hood to calculate the score. In the following case I get a very high score of 90 even though the strings are hardly equal. Is there any way to fix this in WRatio?

    inp_name="america"
    
    name_list=["american Futures and Options Exchange"]
            
    process.extractOne(inp_name,name_list)
    
    

    Output--> ('american Futures and Options Exchange', 90.0, 0)

    PS: I know about alternatives like fuzz.ratio, partial_ratio, and token_sort_ratio, but WRatio works pretty well for my use case, so any workaround would be appreciated. Thanks!

    opened by Pranav082001 3
  • 'list' object has no attribute 'items'

    When trying to get the data of such a string 'hello 𝙎𝙈𝙈 world' using token_set_ratio(), no problems arise, but there is an error when calling process.extract().

    If you remove the incomprehensible characters "SMM" from the line, then there is no error

    Example:

    strtest = 'hello 𝙎𝙈𝙈 world'
    stroka = "word"
    print(str(fuzz.token_set_ratio(stroka, strtest))) # OK
    for message in process.extract(stroka, [strtest, 'sss'], limit=1): # ERROR
        pass
    

    Error:

    Traceback (most recent call last):
      File "C:\Users\Alexey\AppData\Local\Programs\Python\Python39\lib\site-packages\fuzzywuzzy\process.py", line 108, in extractWithoutOrder
        for key, choice in choices.items():
    AttributeError: 'list' object has no attribute 'items'
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "C:\Users\Alexey\Documents\fuzzywuzzy\index.py", line 191, in <module>
        for message in process.extract(stroka, [strtest, 'sss'], limit=1): # ERROR
      File "C:\Users\Alexey\AppData\Local\Programs\Python\Python39\lib\site-packages\fuzzywuzzy\process.py", line 168, in extract
        return heapq.nlargest(limit, sl, key=lambda i: i[1]) if limit is not None else \
      File "C:\Users\Alexey\AppData\Local\Programs\Python\Python39\lib\heapq.py", line 531, in nlargest
        result = max(it, default=sentinel, key=key)
      File "C:\Users\Alexey\AppData\Local\Programs\Python\Python39\lib\site-packages\fuzzywuzzy\process.py", line 117, in extractWithoutOrder
        score = scorer(processed_query, processed)
      File "C:\Users\Alexey\AppData\Local\Programs\Python\Python39\lib\site-packages\fuzzywuzzy\fuzz.py", line 288, in WRatio
        partial = partial_ratio(p1, p2) * partial_scale
      File "C:\Users\Alexey\AppData\Local\Programs\Python\Python39\lib\site-packages\fuzzywuzzy\utils.py", line 38, in decorator
        return func(*args, **kwargs)
      File "C:\Users\Alexey\AppData\Local\Programs\Python\Python39\lib\site-packages\fuzzywuzzy\utils.py", line 29, in decorator
        return func(*args, **kwargs)
      File "C:\Users\Alexey\AppData\Local\Programs\Python\Python39\lib\site-packages\fuzzywuzzy\utils.py", line 47, in decorator
        return func(*args, **kwargs)
      File "C:\Users\Alexey\AppData\Local\Programs\Python\Python39\lib\site-packages\fuzzywuzzy\fuzz.py", line 47, in partial_ratio
        blocks = m.get_matching_blocks()
      File "C:\Users\Alexey\AppData\Local\Programs\Python\Python39\lib\site-packages\fuzzywuzzy\StringMatcher.py", line 58, in get_matching_blocks
        self._matching_blocks = matching_blocks(self.get_opcodes(),
    ValueError: apply_edit edit operations are invalid or inapplicable
    
    opened by syfulin 0
  • Removed a manual file handler pitfall

    The problem: in one place the code opened and closed a file stream manually. Since Python closes the stream automatically when a 'with' block is used, it's better to use that instead of a manual close, which removes a bug vector.

    Solution: refactored the code to remove the manual file handling.
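
The difference between the two styles can be sketched as follows. The function names and the temporary file are hypothetical; the file actually touched by this pull request is not shown in the description.

```python
import os
import tempfile


def read_manual(path):
    """Manual handling: if read() raises, close() is never reached."""
    handle = open(path)
    data = handle.read()
    handle.close()
    return data


def read_with(path):
    """Context manager: the file is closed even if read() raises."""
    with open(path) as handle:
        return handle.read()


# Demonstrate both helpers on a throwaway file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.txt")
    with open(path, "w") as handle:
        handle.write("fuzzy wuzzy")
    manual_contents = read_manual(path)
    with_contents = read_with(path)
```

Both helpers return the same text on the happy path; only the 'with' version guarantees the handle is released when an exception interrupts the read.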

    opened by NaelsonDouglas 0
  • token_set_ratio Degenerate Case

    Referring to the description of token_set_ratio in the original blog post: if the SORTED_INTERSECTION is a strict subset of STRING2, the result ratio will be 100. E.g.,

    fuzz.token_set_ratio("Deep Learning", "Python Machine Learning: Machine Learning and Deep Learning with Python, scikit-learn, and TensorFlow 2")
    

    yields 100. This is patently incorrect, and does not uphold the purported intuition ("because the SORTED_INTERSECTION component is always exactly the same, the scores increase when (a) that makes up a larger percentage of the full string, and (b) the string remainders are more similar").

    Looking at fuzz._token_set, we see that it returns

    max(
        [
            ratio_func(sorted_sect, combined_1to2),
            ratio_func(sorted_sect, combined_2to1),
            ratio_func(combined_1to2, combined_2to1)
        ]
    )
    

    It appears the assumption is that the string remainder will never be empty. Perhaps something like this is more appropriate:

    max(
        [
            0 if sorted_sect == combined_1to2 else ratio_func(sorted_sect, combined_1to2),
            0 if sorted_sect == combined_2to1 else ratio_func(sorted_sect, combined_2to1),
            ratio_func(combined_1to2, combined_2to1)
        ]
    )
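
To make the degenerate case reproducible without the library, here is a self-contained sketch of the token set comparison with the proposed guard as an option. It assumes difflib in place of python-Levenshtein and a simplified tokenizer; token_set_sketch, tokens, ratio, and the guard parameter are hypothetical names, and scores may differ from FuzzyWuzzy's.

```python
import re
from difflib import SequenceMatcher


def ratio(s1, s2):
    """Similarity on a 0-100 scale (stand-in for ratio_func)."""
    return round(100 * SequenceMatcher(None, s1, s2).ratio())


def tokens(text):
    """Lowercase, replace non-alphanumerics with spaces, split into a set."""
    return set(re.sub(r"[^0-9a-z]+", " ", text.lower()).split())


def token_set_sketch(s1, s2, guard=False):
    t1, t2 = tokens(s1), tokens(s2)
    sorted_sect = " ".join(sorted(t1 & t2))
    combined_1to2 = (sorted_sect + " " + " ".join(sorted(t1 - t2))).strip()
    combined_2to1 = (sorted_sect + " " + " ".join(sorted(t2 - t1))).strip()

    def compare(a, b):
        # With guard=True, an empty remainder no longer counts as a match.
        if guard and a == b:
            return 0
        return ratio(a, b)

    return max(
        compare(sorted_sect, combined_1to2),
        compare(sorted_sect, combined_2to1),
        ratio(combined_1to2, combined_2to1),
    )


query = "Deep Learning"
title = ("Python Machine Learning: Machine Learning and Deep Learning "
         "with Python, scikit-learn, and TensorFlow 2")
unguarded = token_set_sketch(query, title)
guarded = token_set_sketch(query, title, guard=True)
```

Here unguarded is 100 because every token of the query appears in the title, while guarded falls below 100 because the score then depends on how much of the title is left over.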
    
    opened by rogerrohrbach 0
  • Mark repository as archived

    Since this repo seems to have been deprecated, I suggest marking it as read-only in the settings. This also displays a banner at the top of the page, which may be even easier to catch than the readme …

    opened by Bibo-Joshi 0
Releases(0.18.0)
  • 0.16.0(Dec 18, 2017)

    • Add punctuation characters back in so process does something. [davidcellis]

    • Simpler alphabet and even fewer examples. [davidcellis]

    • Fewer examples and larger deadlines for Hypothesis. [davidcellis]

    • Slightly more examples. [davidcellis]

    • Attempt to fix the failing 2.7 and 3.6 python tests. [davidcellis]

    • Readme: add link to C++ port. [Lizard]

    • Fix tests on Python 3.3. [Jon Banafato]

      Modify tox.ini and .travis.yml to install enum34 when running with Python 3.3 to allow hypothesis tests to pass.

    • Normalize Python versions. [Jon Banafato]

      • Enable Travis-CI tests for Python 3.6
      • Enable tests for all supported Python versions in tox.ini
      • Add Trove classifiers for Python 3.4 - 3.6 to setup.py

      Note: Python 2.6 and 3.3 are no longer supported by the Python core team. Support for these can likely be dropped, but that's out of scope for this change set.

    • Fix typos. [Sven-Hendrik Haase]

  • 0.15.1(Sep 21, 2017)

    • Fix setup.py (addresses #155) [Paul O'Leary McCann]

    • Merge remote-tracking branch 'upstream/master' into extract_optimizations. [nolan]

    • Seed random before generating benchmark strings. [nolan]

    • Cleaner implementation of same idea without new param, but adding existing full_process param to Q,W,UQ,UW. [nolan]

    • Fix benchmark only generate list once. [nolan]

    • Only run util.full_process once on query when using extract functions, add new benchmarks. [nolan]

  • 0.15.0(Sep 21, 2017)

    • Add extras require to install python-levenshtein optionally. [Rolando Espinoza]

      This allows python-levenshtein to be installed as a dependency.

    • Fix link formatting in the README. [Alex Chan]

    • Add fuzzball.js JavaScript port link. [nolan]

    • Added Rust Port link. [Logan Collins]

    • Validate_string docstring. [davidcellis]

    • For full comparisons test that ONLY exact matches (after processing) are added. [davidcellis]

    • Add detailed docstrings to WRatio and QRatio comparisons. [davidcellis]

  • 0.14.0(Feb 20, 2017)

    • Possible PEP-8 fix + make pep-8 warnings appear in test. [davidcellis]
    • Possible PEP-8 fix. [davidcellis]
    • Possible PEP-8 fix. [davidcellis]
    • Test for stderr log instead of warning. [davidcellis]
    • Convert warning.warn to logging.warning. [davidcellis]
    • Additional details for empty string warning from process. [davidcellis]
    • Enclose warnings.simplefilter() inside a with statement. [samkennerly]
  • 0.13.0(Feb 20, 2017)

    • Support alternate git status output. [Jose Diaz-Gonzalez]
    • Split warning test into new test file, added to travis execution on 2.6 / pypy3. [davidcellis]
    • Remove hypothesis examples database from gitignore. [davidcellis]
    • Add check for warning to tests. [davidcellis]
    • Check processor and warn before scorer may remove processor. [davidcellis]
    • Renamed test - tidied docstring. [davidcellis]
    • Add token ratios to the list of scorers that skip running full_process as a processor. [davidcellis]
    • Added token_sort, token_set to test. [davidcellis]
    • Test docstrings/comments. [davidcellis]
    • Added py.test .cache/ removed duplicated build from gitignore. [davidcellis]
    • Added default_scorer, default_processor parameters to make it easier to change in the future. [davidcellis]
    • Rewrote extracts to explicitly use default values for processor and scorer. [davidcellis]
    • Changed Hypothesis tests to use pytest parameters. [davidcellis]
    • Added Hypothesis based tests for identical strings. [Ducksual]
    • Added test for simple 'a, b' string on process.extractOne. [Ducksual]
    • Process the query in process.extractWithoutOrder when using a scorer which does not do so. [Ducksual]
    • Mention that difflib and levenshtein results may differ. [Jose Diaz-Gonzalez]
  • 0.12.0(Sep 14, 2016)

  • 0.11.1(Sep 14, 2016)

    • Add editorconfig. [Jose Diaz-Gonzalez]
    • Added tox.ini config file for easy local multi-environment testing; changed travis config to use py.test like tox; updated use of pep8 module to pycodestyle. [Pedro Rodrigues]
  • 0.11.0(Jun 30, 2016)

    • Clean-up. [desmaisons_david]

    • Improving performance. [desmaisons_david]

    • Performance Improvement. [desmaisons_david]

    • Fix link to Levenshtein. [Brian J. McGuirk]

    • Fix readme links. [Brian J. McGuirk]

    • Add license to StringMatcher.py. [Jose Diaz-Gonzalez]

      Closes #113

  • 0.10.0(Jun 30, 2016)

  • 0.9.0(Jun 30, 2016)

  • 0.8.2(Jun 30, 2016)

  • 0.8.1(Jun 30, 2016)

  • 0.8.0(Nov 16, 2015)

    • Refer to Levenshtein distance in readme. Closes #88. [Jose Diaz-Gonzalez]

    • Added install step for travis to have pep8 available. [Pedro Rodrigues]

    • Added a pep8 test. The way I add the error 501 to the ignore tuple is probably wrong but from the docs and source code of pep8 I could not find any other way. [Pedro Rodrigues]

      I also went ahead and removed the pep8 call from the release file.

    • Added python 3.5, pypy, and pypy3 to the travis config file. [Pedro Rodrigues]

    • Added another step to the release file to run the tests before releasing. [Pedro Rodrigues]

    • Fixed a few pep8 errors. Added a verification step in the release automation file. This step should probably be somewhere at git level. [Pedro Rodrigues]

    • Pep8. [Pedro Rodrigues]

    • Leaving TODOs in the code was never a good idea. [Pedro Rodrigues]

    • Changed return values to be rounded integers. [Pedro Rodrigues]

    • Added a test with the recovered data file. [Pedro Rodrigues]

    • Recovered titledata.csv. [Pedro Rodrigues]

    • Move extract test methods into the process test. [Shale Craig]

      Somehow, they ended up in the RatioTest, despite asserting that the ProcessTest works.

  • 0.7.0(Oct 2, 2015)

    • Use portable syntax for catching exception on tests. [Luis Madrigal]

    • [Fix] test against correct variable. [Luis Madrigal]

    • Add unit tests for validator decorators. [Luis Madrigal]

    • Move validators to decorator functions. [Luis Madrigal]

      This allows easier composition and IMO makes the functions more readable

    • Fix typo: dictionery -> dictionary. [shale]

    • FizzyWuzzy -> FuzzyWuzzy typo correction. [shale]

    • Add check for gitchangelog. [Jose Diaz-Gonzalez]

  • 0.6.2(Sep 3, 2015)

  • 0.6.1(Sep 3, 2015)

  • 0.6.0(Jul 20, 2015)

    • Added link to a java port. [Andriy Burkov]

    • Patched "name 'unicode' is not defined" python3. [Carlos Garay]

      https://github.com/seatgeek/fuzzywuzzy/issues/80

    • Make process.extract accept {dict, list}-like choices. [Nathan Typanski]

      Previously, process.extract expected lists or dictionaries, and tested this with isinstance() calls. In keeping with the spirit of Python (duck typing and all that), this change enables one to use extract() on any dict-like object for dict-like results, or any list-like object for list-like results.

      So now we can (and, indeed, I've added tests for these uses) call extract() on things like:

      • a generator of strings ("any iterable")
      • a UserDict
      • custom user-made classes that "look like" dicts (or, really, anything with a .items() method that behaves like a dict)
      • plain old lists and dicts

      The behavior is exactly the same for previous use cases of lists-and-dicts.

      This change goes along nicely with PR #68, since those docs suggest dict-like behavior is valid, and this change makes that true.

    • Merge conflict. [Adam Cohen]

    • Improve docs for fuzzywuzzy.process. [Nathan Typanski]

      The documentation for this module was dated and sometimes inaccurate. This overhauls the docs to accurately describe the current module, including detailing optional arguments that were not previously explained - e.g., limit argument to extract().

      This change follows the Google Python Style Guide, which may be found at:

      https://google-styleguide.googlecode.com/svn/trunk/pyguide.html?showone=Comments#Comments
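
The 0.6.0 note about accepting dict-like and list-like choices describes a duck-typing pattern that can be sketched like this. iter_choices is a hypothetical helper written for illustration, not the function in fuzzywuzzy.process, and pairing list items with their index is an assumption of this sketch.

```python
from collections import UserDict


def iter_choices(choices):
    """Yield (key, choice) pairs from dict-like or list-like input."""
    try:
        pairs = choices.items()      # anything that behaves like a dict
    except AttributeError:
        pairs = enumerate(choices)   # any other iterable, generators included
    yield from pairs


from_dict = list(iter_choices({"nyj": "New York Jets"}))
from_user_dict = list(iter_choices(UserDict({"dal": "Dallas Cowboys"})))
from_generator = list(iter_choices(name for name in ["Jets", "Giants"]))
```

Asking for .items() and falling back on AttributeError avoids isinstance() checks, so custom classes work as long as they look like a dict or can be iterated.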

  • 0.5.0(Jul 20, 2015)

    • FIX: 0.4.0 is released, no need to specify 0.3.1 in README. [Josh Warner (Mac)]

    • Fixed a small typo. [Rostislav Semenov]

    • Reset processor and scorer defaults to None with argument checking. [foxxyz]

    • Catch generators without lengths. [Jeremiah Lowin]

    • Fixed python3 issue and deprecated assertion method. [foxxyz]

    • Fixed some docstrings, typos, python3 string method compatibility, some errors that crept in during rebase. [foxxyz]

    • [mod] The lambda in extract is not needed. [Olivier Le Thanh Duong]

      [mod] Pass directly the defaults functions in the args

      [mod] itertools.takewhile() can handle empty list just fine no need to test for it

      [mod] Shorten extractOne by removing double if

      [mod] Use a list comprehension in extract()

      [mod] Autopep8 on process.py

      [doc] Document make_type_consistent

      [mod] bad_chars shortened

      [enh] Move regex compilation outside the method, otherwise we don't get the benefit from it

      [mod] Don't need all the blah just to redefine method from string module

      [mod] Remove unused import

      [mod] Autopep8 on string_processing.py

      [mod] Rewrote asciidammit without recursion to make it more readable

      [mod] Autopep8 on utils.py

      [mod] Remove unused import

      [doc] Add some doc to fuzz.py

      [mod] Move the code to sort string in a separate function

      [doc] Docstrings for WRatio, UWRatio

    • Add note on which package to install. Closes #67. [Jose Diaz-Gonzalez]

  • 0.4.0(Oct 31, 2014)

    • Merge pull request #64 from ojomio/master. [Jose Diaz-Gonzalez]

      In extractBests() and extractOne() use '>=' instead of '>'

    • Merge pull request #62 from ojomio/master. [Jose Diaz-Gonzalez]

      Fixed python3 issue with SequenceMatcher import

  • 0.3.3(Oct 22, 2014)

    • Update release script to make it more generic. [Jose Diaz-Gonzalez]
    • Merge pull request #60 from ojomio/master. [Jose Diaz-Gonzalez] Fixed issue #59 - "partial" parameter for _token_set() is now honored
    • Merge pull request #54 from jlowin/patch-1. [Jose Diaz-Gonzalez] Remove explicit check for lists
  • 0.3.2(Sep 12, 2014)

    • Make release command an executable. [Jose Diaz-Gonzalez]
    • Simplify MANIFEST.in. [Jose Diaz-Gonzalez]
    • Add a release script. [Jose Diaz-Gonzalez]
    • Fix readme codeblock. [Jose Diaz-Gonzalez]
    • Minor formatting. [Jose Diaz-Gonzalez]
    • Update readme with proper installation notes. [Jose Diaz-Gonzalez]
    • Use version from fuzzywuzzy package. [Jose Diaz-Gonzalez]
    • Set version constant in __init__.py. [Jose Diaz-Gonzalez]
    • Update setup.py. [Jose Diaz-Gonzalez]
    • Rename LICENSE to LICENSE.txt. [Jose Diaz-Gonzalez]
    • Update packaging a bit. [Jose Diaz-Gonzalez]
  • 0.3.0(Aug 24, 2014)

    • Allow choices to be a list or dict
    • Add testing for 3.4
    • Typo updates
    • Update readme, change formatting to RST
    • Fix package requirements
    • PEP8!