Augmenty is an augmentation library based on spaCy for augmenting texts.

Overview

Augmenty: The cherry on top of your NLP pipeline


Augmenty is an augmentation library based on spaCy for augmenting texts. Besides a wide array of highly flexible augmenters, Augmenty provides a series of tools for working with augmenters, including combining and moderating augmenters. Augmenty differs from other augmentation libraries in that it corrects (as far as possible) the assigned labels under the augmentation, which makes many of the augmenters valid for training on more than just sentence classification.

🔧 Installation

To get started, install augmenty with pip by running the following line in your terminal:

pip install augmenty

Note that this is a minimal installation. Some augmenters require additional packages; to install all dependencies, run the following line instead:

pip install augmenty[all]

For more detailed instructions on installing augmenty, including specific language support, see the installation instructions.

🍒 Simple Example

The following shows a simple example of how you can quickly augment text using Augmenty. For more on using augmenty see the usage guides.

import spacy
import augmenty

nlp = spacy.load("en_core_web_sm")

docs = nlp.pipe(["Augmenty is a great tool for text augmentation"])

# map each entity label to candidate replacements, each given as a list of tokens
entity_augmenter = augmenty.load("ents_replace.v1",
                                 ent_dict={"ORG": [["spaCy"], ["spaCy", "Universe"]]})

for doc in augmenty.docs(docs, augmenter=entity_augmenter):
    print(doc)

Example output:

spaCy Universe is a great tool for text augmentation.
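
To make the idea of correcting labels under augmentation concrete, here is a minimal sketch in plain Python. It is not how augmenty is implemented: the function name `replace_entities` and the `(start, end, label)` character-span format are assumptions made for this illustration, and the sketch assumes that spans do not overlap. It shows that replacing an entity changes the text length, so the offsets of every labelled span have to be recomputed.

```python
import random
from typing import Dict, List, Tuple

Span = Tuple[int, int, str]  # (start_char, end_char, label), hypothetical format


def replace_entities(
    text: str,
    spans: List[Span],
    ent_dict: Dict[str, List[str]],
    rng: random.Random,
) -> Tuple[str, List[Span]]:
    """Replace labelled spans and recompute the offsets of all spans."""
    pieces: List[str] = []
    new_spans: List[Span] = []
    cursor = 0   # position in the original text
    new_len = 0  # length of the augmented text built so far
    for start, end, label in sorted(spans):
        # copy the untouched text in front of the entity
        pieces.append(text[cursor:start])
        new_len += start - cursor
        # pick a replacement if the label is known, otherwise keep the original
        original = text[start:end]
        replacement = rng.choice(ent_dict[label]) if label in ent_dict else original
        pieces.append(replacement)
        # the label now points at the replacement, not at the old offsets
        new_spans.append((new_len, new_len + len(replacement), label))
        new_len += len(replacement)
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces), new_spans


text = "Augmenty is a great tool for text augmentation"
spans = [(0, 8, "ORG")]
new_text, new_spans = replace_entities(
    text, spans, {"ORG": ["spaCy Universe"]}, random.Random(0)
)
print(new_text)   # spaCy Universe is a great tool for text augmentation
print(new_spans)  # [(0, 14, 'ORG')]
```

Without the offset update, the ORG label would still cover characters 0 to 8 and would no longer match the replaced entity.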

📖 Documentation

📚 Usage Guides: Guides and instructions on how to use augmenty and its features.
📰 News and changelog: New additions, changes and version history.
🎛 API References: The detailed reference for augmenty's API, including function documentation.
🍒 Augmenters: A full list of the current augmenters in augmenty.
😎 Demo: A simple Streamlit demo to try out the augmenters.

💬 Where to ask questions

🚨 Bug Reports: GitHub Issue Tracker
🎁 Feature Requests & Ideas: GitHub Issue Tracker
👩‍💻 Usage Questions: GitHub Discussions
🗯 General Discussion: GitHub Discussions
🍒 Adding an Augmenter: Adding an augmenter

🤔 FAQ

How do I test the code and run the test suite?

augmenty comes with an extensive test suite. In order to run the tests, you'll usually want to clone the repository and build augmenty from the source. This will also install the required development dependencies and test utilities defined in the requirements.txt.

pip install -r requirements.txt
pip install pytest

python -m pytest

This runs all the tests in the augmenty/tests folder.

Specific tests can be run using:

python -m pytest augmenty/tests/test_docs.py

Code Coverage

If you want to check code coverage, you can run the following:

pip install pytest-cov

python -m pytest --cov=.

Does augmenty run on X?

augmenty is intended to run on all major operating systems: Windows (latest version), macOS (Catalina) and the latest version of Linux (Ubuntu). Below you can see whether augmenty passes its test suite on the system of interest. Please note that these are only the systems augmenty is actively tested on. If you run a similar system (e.g. an earlier version of Linux), augmenty will likely run there as well; if not, please create an issue.

Operating System Status
Ubuntu/Linux (Latest) github actions pytest ubuntu
MacOS (Catalina) github actions pytest catalina
Windows (Latest) github actions pytest windows

How is the documentation generated?

augmenty uses Sphinx to generate documentation, with the Furo theme and custom styling.

To build the documentation, you can run:

# install sphinx, themes and extensions
pip install sphinx furo sphinx-copybutton sphinxext-opengraph

# generate html from the documentation
make -C docs html

Many of these augmenters are completely useless for training?

That is true: some of the augmenters, for instance randomly adding or removing spacing, are rarely something you would augment with during training. However, augmentation can just as well be used to test whether a model is robust to certain variations.
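
As a rough illustration of the robustness use case, the sketch below perturbs texts by removing spaces and measures how often a model's prediction stays the same. Everything in it is an assumption for illustration: `remove_spacing`, `robustness` and the toy keyword classifier are hypothetical stand-ins written in plain Python, not augmenty functions or a real model.

```python
import random
from typing import Callable, List


def remove_spacing(text: str, level: float, rng: random.Random) -> str:
    """Drop each space character with probability `level`."""
    return "".join(
        ch for ch in text if not (ch == " " and rng.random() < level)
    )


def robustness(
    predict: Callable[[str], str],
    texts: List[str],
    augment: Callable[[str], str],
) -> float:
    """Share of texts whose prediction is unchanged after augmentation."""
    unchanged = sum(predict(t) == predict(augment(t)) for t in texts)
    return unchanged / len(texts)


def toy_predict(text: str) -> str:
    """Toy stand-in for a real model: a whitespace-tokenised keyword classifier."""
    return "positive" if "great" in text.lower().split() else "negative"


rng = random.Random(42)
texts = ["Augmenty is a great tool", "This is not good"]
score = robustness(toy_predict, texts, lambda t: remove_spacing(t, 0.5, rng))
print(f"share of stable predictions: {score:.2f}")
```

A low share of stable predictions suggests that the model depends heavily on the perturbed feature, here whitespace tokenisation.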


Can I use augmenty without using spaCy?

Yes, augmenty contains convenience functions for applying augmentation directly to raw texts. Check out the getting started guide to learn how.


🎓 Citing this work

If you use this library in your research, please cite:

@inproceedings{augmenty2021,
    title={Augmenty, the cherry on top of your NLP pipeline},
    author={Enevoldsen, Kenneth and Hansen, Lasse},
    year={2021}
}
Comments
  • Use of augmenty with spacy config files for training

    I didn't see any documentation on how to import these augmenters when using spaCy 3.0's config and command-line system for training. Is it possible to use them this way? If so, how?

    Upon further review: for the command line to register new augmenters, the flag --code <code.py> needs to be set when calling the training command. I tried pointing it to the specific file that contains the keystroke augmenter I wanted, but it complains about not knowing a parent package for relative imports. I also tried the various __init__.py files, but those failed as well. It seems to work when you copy the code into a new file without relative imports and point to that.

    Which page or section is this issue related to?

    https://spacy.io/usage/training#data-augmentation-custom

    https://kennethenevoldsen.github.io/augmenty/tutorials/introduction.html#Applying-the-augmentation

    documentation 
    opened by Giles-Billenness 3
  • Added sententence_subset.v1 augmenter following #48

    Following #48, added the sententence_subset.v1 augmenter, which subsamples sentences from a document:

    import augmenty
    import spacy
    nlp = spacy.load("en_core_web_sm")
    
    # four sentences
    text = """Augmenty is a wonderful tool for augmentation. Augmentation is a wonderful tool
    for obtaining higher performance on limited data. You can also use it to see how
    robust your model is to changes. It will sample subset of the paragraf."""
    texts = [text]  # augmenty.texts is applied to an iterable of strings
    
    augmenter = augmenty.load("sententence_subset.v1", respect_sentences=True)
    
    list(augmenty.texts(texts, augmenter, nlp))
    

    Missing:

    • [ ] Add tests
    • [ ] Add documentation
    opened by KennethEnevoldsen 3
  • Paragraf subset augmenter

    A paragraph subset augmentation which can work on the token and the sentence level. It samples a random percentage of contiguous tokens/sentences to include and a random token/sentence start position such that the sampled span still fits within the text. The augmenter needs to handle annotated entities and avoid breaking them.

    Input arguments:

    • level: how often to apply the augmenter
    • min_paragraf: minimum percentage of tokens or sentences to include, i.e. 4 sentences with min_paragraf=0.5 means that at least 2 sentences are included
    • sentence_level: Boolean that defines whether to sample at the token or the sentence level

    Example - sentence level

    import augmenty
    import spacy
    nlp = spacy.load("en_core_web_sm")
    
    # four sentences
    texts = [
        "Augmenty is a wonderful tool for augmentation. Augmentation is a wonderful tool "
        "for obtaining higher performance on limited data. You can also use it to see how "
        "robust your model is to changes. It will sample subset of the paragraf.",
    ]
    
    augmenter = augmenty.load("paragraf_subset.v1", level=1.0, min_paragraf=0.5, sentence_level=True)
    
    list(augmenty.texts(texts, augmenter, nlp))
    

    Example outputs:

    The first section:

    Augmenty is a wonderful tool for augmentation. Augmentation is a wonderful tool 
    for obtaining higher performance on limited data.
    

    The middle section:

    Augmentation is a wonderful tool for obtaining higher performance on limited data. 
    You can also use it to see how robust your model is to changes.
    

    The last section:

    You can also use it to see how robust your model is to changes. It will sample subset 
    of the paragraf.
    

    Additional thoughts:

    Possibly add a reverse augmenter, e.g. one that removes a contiguous section of tokens/sentences.
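
The sampling step proposed in this issue could be sketched roughly as follows, in plain Python and at the sentence level only. This is an assumption-laden illustration rather than the augmenter itself: `sentence_subset` and its `min_fraction` parameter (standing in for the proposed min_paragraf) are hypothetical names, sentences are passed in as a ready-made list, and entity handling is left out entirely.

```python
import math
import random
from typing import List


def sentence_subset(
    sentences: List[str], min_fraction: float, rng: random.Random
) -> List[str]:
    """Return a random contiguous run of sentences.

    The run keeps at least ceil(min_fraction * len(sentences)) sentences,
    and never fewer than one.
    """
    n = len(sentences)
    if n == 0:
        return []
    # smallest allowed run length, clamped to the range [1, n]
    min_len = min(n, max(1, math.ceil(min_fraction * n)))
    length = rng.randint(min_len, n)    # how many sentences to keep
    start = rng.randint(0, n - length)  # start position so that the run fits
    return sentences[start : start + length]


sentences = [
    "Augmenty is a wonderful tool for augmentation.",
    "Augmentation is a wonderful tool for obtaining higher performance on limited data.",
    "You can also use it to see how robust your model is to changes.",
    "It will sample subset of the paragraf.",
]
print(" ".join(sentence_subset(sentences, 0.5, random.Random(0))))
```

Sampling the length first and the start position second guarantees that the run never extends past the end of the document.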

    additional augmenter 
    opened by martincjespersen 3
  • :arrow_up: Bump MishaKav/pytest-coverage-comment from 1.1.25 to 1.1.26

    Bumps MishaKav/pytest-coverage-comment from 1.1.25 to 1.1.26.
    dependencies github_actions 
    opened by dependabot[bot] 2
  • :arrow_up: Update pydantic requirement from <1.9.0,>=1.8.2 to >=1.8.2,<1.10.0

    Updates the requirements on pydantic to permit the latest version.
    dependencies python 
    opened by dependabot[bot] 2
  • :arrow_up: Bump MishaKav/pytest-coverage-comment from 1.1.39 to 1.1.40

    Bumps MishaKav/pytest-coverage-comment from 1.1.39 to 1.1.40.
    dependencies github_actions 
    opened by dependabot[bot] 1
  • :arrow_up: Bump MishaKav/pytest-coverage-comment from 1.1.30 to 1.1.31

    Bumps MishaKav/pytest-coverage-comment from 1.1.30 to 1.1.31.
    dependencies github_actions 
    opened by dependabot[bot] 1
  • :arrow_up: Update streamlit requirement from <1.11.0,>=1.5.0 to >=1.5.0,<1.12.0

    Updates the requirements on streamlit to permit the latest version.
    dependencies python 
    opened by dependabot[bot] 1
  • :arrow_up: Bump actions/setup-python from 3 to 4.1.0

    Bumps actions/setup-python from 3 to 4.1.0.
    dependencies github_actions 
    opened by dependabot[bot] 1
  • :arrow_up: Bump actions/setup-python from 3 to 4

    Bumps actions/setup-python from 3 to 4.
    dependencies github_actions 
    opened by dependabot[bot] 1
  • :arrow_up: Update streamlit requirement from <1.9.0,>=1.5.0 to >=1.5.0,<1.10.0

    Updates the requirements on streamlit to permit the latest version.
    dependencies python 
    opened by dependabot[bot] 1
  • Sample fake entities for entity augmenter using Faker package

    Add sampling of entities (such as names or addresses) using Faker, e.g. for the Danish locale: https://faker.readthedocs.io/en/master/locales/da_DK.html. Faker supports random sampling of entities for numerous languages.

    enhancement help wanted 
    opened by martincjespersen 1
  • implement an oversampling function

    Augmentation can be used to oversample a category.

    Imagined usage would look something like this:

    aug = augmenty.load(...)
    
    def is_positive(example):
        """Return True if the example is labeled as positive."""
        return example.y.cats["positive"] == 1
    
    upsampled_corpus = augmenty.oversample(corpus, augmenter=aug, conditional=is_positive, n=1000)
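One way the proposed function could behave, sketched in plain Python. This is not an existing augmenty API: the `oversample` signature follows the imagined usage above, while the `seed` parameter, the dict-based examples, and the one-in-one-out augmenter are simplifying assumptions (real augmenters operate on spaCy `Example` objects).

```python
import random
from typing import Callable, Iterable, List, Optional, TypeVar

T = TypeVar("T")


def oversample(
    corpus: Iterable[T],
    augmenter: Callable[[T], T],
    conditional: Callable[[T], bool],
    n: int,
    seed: Optional[int] = None,
) -> List[T]:
    """Return the corpus plus n augmented copies of examples matching `conditional`."""
    rng = random.Random(seed)
    examples = list(corpus)
    candidates = [ex for ex in examples if conditional(ex)]
    if not candidates:
        return examples  # nothing to oversample from
    augmented = [augmenter(rng.choice(candidates)) for _ in range(n)]
    return examples + augmented


# Toy stand-ins: dict examples and an augmenter that upper-cases the text.
corpus = [
    {"text": "great library", "cats": {"positive": 1}},
    {"text": "hard to install", "cats": {"positive": 0}},
]


def is_positive(example: dict) -> bool:
    return example["cats"]["positive"] == 1


def upper_case(example: dict) -> dict:
    return {**example, "text": example["text"].upper()}


upsampled_corpus = oversample(
    corpus, augmenter=upper_case, conditional=is_positive, n=3, seed=0
)
print(len(upsampled_corpus))  # 5: the 2 originals plus 3 augmented positives
```

Sampling candidates with replacement keeps the sketch short; a real implementation might instead cycle through the matching examples so each is augmented about equally often.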
    
    enhancement 
    opened by KennethEnevoldsen 0
  • Back translation augmentation

    Augment a document by back translation through various languages, e.g., using Hugging Face translation models: https://huggingface.co/models?pipeline_tag=translation.

    Example blog: https://dzlab.github.io/dltips/en/pytorch/text-augmentation/

    Example sentence: Augmenty is an augmentation library based on spaCy for augmenting texts. Augmenty differs from other augmentation libraries in that it corrects (as far as possible) the token, sentence and document labels under the augmentation.

    English -> Danish (Google): Augmenty er et udvidelsesbibliotek baseret på spaCy til forstørrelse af tekster. Augmenty adskiller sig fra andre augmentationsbiblioteker ved, at den korrigerer (så vidt muligt) token-, sætnings- og dokumentetiketterne under augmentationen.

    Danish -> English (Google): Augmenty is an extension library based on spaCy for enlarging texts. Augmenty differs from other augmentation libraries in that it corrects (as far as possible) the token, sentence, and document labels during augmentation.
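The round trip above can be sketched as a composition of two translation functions. The word-lookup "translators" below are toy stand-ins so that the example runs without downloading a model; `back_translate` and `make_lookup_translator` are hypothetical names, and in practice each callable would wrap a real translation model.

```python
from typing import Callable, Dict

Translator = Callable[[str], str]


def back_translate(text: str, forward: Translator, backward: Translator) -> str:
    """Translate text into a pivot language and back again."""
    return backward(forward(text))


def make_lookup_translator(table: Dict[str, str]) -> Translator:
    """Build a toy word-by-word translator from a lookup table."""

    def translate(text: str) -> str:
        # Words missing from the table are passed through unchanged.
        return " ".join(table.get(word, word) for word in text.split())

    return translate


# Toy English -> Danish -> English tables, echoing the example sentence above.
en_to_da = make_lookup_translator({"augmenting": "forstørrelse", "texts": "tekster"})
da_to_en = make_lookup_translator({"forstørrelse": "enlarging", "tekster": "texts"})

print(back_translate("augmenting texts", en_to_da, da_to_en))  # enlarging texts
```

Because back translation can change the number and order of tokens, token-level labels would need to be realigned or dropped; document-level labels carry over unchanged.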

    additional augmenter 
    opened by martincjespersen 1
  • List of potentially new augmenters

    The following is a list of potential new augmenters. If you would like a specific augmenter to be added before others, please update the issue corresponding to that augmenter (if it doesn't have one, feel free to create one).

    A variation of existing augmenters:

    New augmenters

    Batch augmenters

    A combination of existing augmenters

    • [ ] EDA augmenter following the EDA paper
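For orientation, EDA (Easy Data Augmentation) combines four token-level operations: synonym replacement, random insertion, random swap, and random deletion. The sketch below covers only the two that need no thesaurus, random swap and random deletion. The function names and parameters are illustrative assumptions and do not reflect how augmenty would implement such an augmenter.

```python
import random
from typing import List


def random_swap(tokens: List[str], n_swaps: int, rng: random.Random) -> List[str]:
    """Swap two randomly chosen positions, repeated n_swaps times."""
    tokens = list(tokens)
    if len(tokens) < 2:
        return tokens
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(tokens)), 2)
        tokens[i], tokens[j] = tokens[j], tokens[i]
    return tokens


def random_deletion(tokens: List[str], p: float, rng: random.Random) -> List[str]:
    """Drop each token with probability p, always keeping at least one token."""
    if len(tokens) <= 1:
        return list(tokens)
    kept = [tok for tok in tokens if rng.random() >= p]
    return kept if kept else [rng.choice(tokens)]


rng = random.Random(0)
tokens = "Augmenty is a great tool for text augmentation".split()
swapped = random_swap(tokens, n_swaps=2, rng=rng)
deleted = random_deletion(tokens, p=0.2, rng=rng)
print(swapped)
print(deleted)
```

Both operations disturb token positions, so token-level annotations (entities, tags) would have to be moved or removed along with the affected tokens.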
    additional augmenter 
    opened by KennethEnevoldsen 0
Releases(v1.0.1)
  • v1.0.1(Jun 21, 2022)

    Version

    What's Changed

    • Version 1.0.0 by @KennethEnevoldsen in https://github.com/KennethEnevoldsen/augmenty/pull/50
    • Update replace.py by @koaning in https://github.com/KennethEnevoldsen/augmenty/pull/51

    Documentation updates

    • added faker based on PR by @martincjespersen by @KennethEnevoldsen in https://github.com/KennethEnevoldsen/augmenty/pull/85
    • Added pre-config workflows by @KennethEnevoldsen in https://github.com/KennethEnevoldsen/augmenty/pull/86

    New Contributors

    • @dependabot made their first contribution in https://github.com/KennethEnevoldsen/augmenty/pull/46
    • @koaning made their first contribution in https://github.com/KennethEnevoldsen/augmenty/pull/51
    • @martincjespersen

    Full Changelog: https://github.com/KennethEnevoldsen/augmenty/compare/v.0.0.12...v1.0.1

  • v.0.0.12(Feb 7, 2022)

    0.0.12 (03/08/21)

    • Many bugfixes
    • Added a few more augmenters
    • Notable updates to the documentation of the package

    0.0.1 (03/08/21)

    • First version of augmenty launches 🎉
      • with more than 15 highly customizable augmenters,
      • a high-quality codebase (coverage of 96% and a CodeFactor grade of A),
      • and utilities for easy application of augmenters to strings and spaCy Docs.
      • It also includes a series of convenience functions for combining and moderating augmentations.

    Full Changelog: https://github.com/KennethEnevoldsen/augmenty/commits/v.0.0.12

Owner
Kenneth Enevoldsen
Interdisciplinary PhD Student on representation learning in Clinical NLP and Genetics at Aarhus University and Interacting Minds Centre