Bidirectionally transformed strings

Overview

bistring

Build status Documentation status

The bistring library provides non-destructive versions of common string processing operations like normalization, case folding, and find/replace. Each bistring remembers the original string, and how its substrings map to substrings of the modified version.

For example:

>>> from bistring import bistr
>>> s = bistr('π•Ώπ–π–Š π––π–šπ–Žπ–ˆπ–, π–‡π–—π–”π–œπ–“ 🦊 π–π–šπ–’π–•π–˜ π–”π–›π–Šπ–— π–™π–π–Š π–‘π–†π–Ÿπ–ž 🐢')
>>> s = s.normalize('NFKD')     # Unicode normalization
>>> s = s.casefold()            # Case-insensitivity
>>> s = s.replace('🦊', 'fox')  # Replace emoji with text
>>> s = s.replace('🐢', 'dog')
>>> s = s.sub(r'[^\w\s]+', '')  # Strip everything but letters and spaces
>>> s = s[:19]                  # Extract a substring
>>> s.modified                  # The modified substring, after changes
'the quick brown fox'
>>> s.original                  # The original substring, before changes
'π•Ώπ–π–Š π––π–šπ–Žπ–ˆπ–, π–‡π–—π–”π–œπ–“ 🦊'

Languages

PyPI version npm version

bistring is available in multiple languages, currently Python and JavaScript/TypeScript. Ports to other languages are planned for the near future.

The code is structured similarly in each language to make it easy to share algorithms, tests, and fixes between them. The main differences come from trying to mirror the language's built-in string API. If you want to contribute a bug fix or a new feature, feel free to implement it in any one of the supported languages, and we'll try to port it to the rest of them.

Demo

Click here for a live demo of the bistring library in your browser.

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.microsoft.com.

When you submit a pull request, a CLA-bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.

Comments
  • Whitespace at start of string mishandled by SentenceTokenizer?

    Whitespace at start of string mishandled by SentenceTokenizer?

    I'm not sure if this is expected behaviour or a bug but the following code illustrates my uncertainty:

    pre_split = bistring.bistr(" \tFoo. \t\n \tBar. \t") \
        .sub(r"^\s+", "") \
        .sub(r"\s*\n\s*", "\n") \
        .sub(r"\s+$", "\n")
    
    post_split = bistring.bistr.join(
        [s.text for s in bistring.SentenceTokenizer("en_GB").tokenize(pre_split)]
    )
    
    # These should print True but actually print False
    print(pre_split == post_split)
    print(pre_split.original == post_split.original)
    
    # This should print True and does print True
    print(pre_split.modified == post_split.modified)
    
    # This should print False but actually prints True
    print(pre_split.original[2:] == post_split.original)
    

    In summary, I was expecting the result of re-joining the tokens produced by SentenceTokenizer to yield an identical bistr to the one that existed prior to the splitting. This appears to be true with the only (known) exception being whitespace at the start of the first sentence is being lost. Whitespace at the end of the string, and whitespace between sentences within the string, are retained as expected.

    Is this expected behaviour?

    Produces behaviour using Python 3.7, bistring 0.4.0, pyicu 2.6, and icu 68.1 (all installed via conda-forge).

    question 
    opened by qtdaniel 10
  • Composition of no-op replacements produces incorrect (or confusing?) alignment

    Composition of no-op replacements produces incorrect (or confusing?) alignment

    >>> from bistring import bistr
    >>> b = bistr("abc")
    >>> b1 = b.replace("bc", "bc")
    >>> b2 = b1.replace("ab", "ab")
    >>> b2
    bistr('abc', 'abc', Alignment([(0, 0), (1, 2), (3, 3)]))
    >>> b2[:2].original
    'a'
    

    Both individual replacements effectively don't change the contents of the original string. I think b2[:2].original "should" return 'abc' here. This would probably be achieved with a coarser composed alignment Alignment([(0, 0), (3, 3)]).

    opened by maxhgerlach 3
  • PyPI Release?

    PyPI Release?

    Thank you very much for this library! It's really useful in many text processing use cases for NLP.

    The last release 0.4 on PyPI is from September 2019 and on master there's been at least a fix for bistr.join() since then: #20 Would you consider cutting a new release?

    opened by maxhgerlach 3
  • Arithmetic progressions

    Arithmetic progressions

    This optimizes the representation of Alignments by detecting and compressing any arithmetic sub-sequences that are found. This implicitly compresses any runs with the identity mapping, as well as other potentially common ones like 2β†’1 char mappings from UTF-16 to UTF-8 or non-BMP to BMP.

    • [x] Python
    • [ ] JS
    opened by tavianator 3
  • Fix readthedocs build

    Fix readthedocs build

    • js: Update dependencies
    • js: Work around https://github.com/rollup/plugins/issues/934
    • docs: Update npm dependencies
    • docs: Work around https://github.com/readthedocs/readthedocs-docker-images/issues/107
    opened by tavianator 2
  • build(deps): bump urllib3 from 1.26.4 to 1.26.5 in /docs

    build(deps): bump urllib3 from 1.26.4 to 1.26.5 in /docs

    Bumps urllib3 from 1.26.4 to 1.26.5.

    Release notes

    Sourced from urllib3's releases.

    1.26.5

    :warning: IMPORTANT: urllib3 v2.0 will drop support for Python 2: Read more in the v2.0 Roadmap

    • Fixed deprecation warnings emitted in Python 3.10.
    • Updated vendored six library to 1.16.0.
    • Improved performance of URL parser when splitting the authority component.

    If you or your organization rely on urllib3 consider supporting us via GitHub Sponsors

    Changelog

    Sourced from urllib3's changelog.

    1.26.5 (2021-05-26)

    • Fixed deprecation warnings emitted in Python 3.10.
    • Updated vendored six library to 1.16.0.
    • Improved performance of URL parser when splitting the authority component.
    Commits
    • d161647 Release 1.26.5
    • 2d4a3fe Improve performance of sub-authority splitting in URL
    • 2698537 Update vendored six to 1.16.0
    • 07bed79 Fix deprecation warnings for Python 3.10 ssl module
    • d725a9b Add Python 3.10 to GitHub Actions
    • 339ad34 Use pytest==6.2.4 on Python 3.10+
    • f271c9c Apply latest Black formatting
    • 1884878 [1.26] Properly proxy EOF on the SSLTransport test suite
    • See full diff in compare view

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
    • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
    • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
    • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

    You can disable automated security fix PRs for this repo from the Security Alerts page.

    dependencies python 
    opened by dependabot[bot] 2
  • build(deps): bump lodash from 4.17.15 to 4.17.19 in /docs

    build(deps): bump lodash from 4.17.15 to 4.17.19 in /docs

    ⚠️ Dependabot is rebasing this PR ⚠️

    If you make any changes to it yourself then they will take precedence over the rebase.


    Bumps lodash from 4.17.15 to 4.17.19.

    Release notes

    Sourced from lodash's releases.

    4.17.16

    Commits
    Maintainer changes

    This version was pushed to npm by mathias, a new releaser for lodash since your current version.


    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
    • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
    • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
    • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

    You can disable automated security fix PRs for this repo from the Security Alerts page.

    dependencies 
    opened by dependabot[bot] 2
  • build(deps): bump minimist from 1.2.5 to 1.2.6 in /js

    build(deps): bump minimist from 1.2.5 to 1.2.6 in /js

    Bumps minimist from 1.2.5 to 1.2.6.

    Commits

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
    • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
    • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
    • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

    You can disable automated security fix PRs for this repo from the Security Alerts page.

    dependencies javascript 
    opened by dependabot[bot] 1
  • build(deps): bump node-notifier from 8.0.0 to 8.0.1 in /js

    build(deps): bump node-notifier from 8.0.0 to 8.0.1 in /js

    Bumps node-notifier from 8.0.0 to 8.0.1.

    Changelog

    Sourced from node-notifier's changelog.

    v8.0.1

    • fixes possible injection issue for notify-send
    Commits

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
    • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
    • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
    • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

    You can disable automated security fix PRs for this repo from the Security Alerts page.

    dependencies 
    opened by dependabot[bot] 1
  • build(deps): bump highlight.js from 10.1.2 to 10.4.1 in /docs

    build(deps): bump highlight.js from 10.1.2 to 10.4.1 in /docs

    Bumps highlight.js from 10.1.2 to 10.4.1.

    Release notes

    Sourced from highlight.js's releases.

    10.4.1

    Security fixes:

    • (fix) Exponential backtracking fixes for: Josh Goebel
      • cpp
      • handlebars
      • gams
      • perl
      • jboss-cli
      • r
      • erlang-repl
      • powershell
      • routeros
    • (fix) Polynomial backtracking fixes for: Josh Goebel
      • asciidoc
      • reasonml
      • latex
      • kotlin
      • gcode
      • d
      • aspectj
      • moonscript
      • coffeescript/livescript
      • csharp
      • scilab
      • crystal
      • elixir
      • basic
      • ebnf
      • ruby
      • fortran/irpf90
      • livecodeserver
      • yaml
      • x86asm
      • dsconfig
      • markdown
      • ruleslanguage
      • xquery
      • sqf

    Very grateful to Michael Schmidt for all the help.

    10.4.0 - November 2020

    A largish release with many improvements and fixes from quite a few different contributors. Enjoy!

    Deprecations:

    ... (truncated)

    Changelog

    Sourced from highlight.js's changelog.

    Version 10.4.1 (tentative)

    Security

    • (fix) Exponential backtracking fixes for: Josh Goebel
      • cpp
      • handlebars
      • gams
      • perl
      • jboss-cli
      • r
      • erlang-repl
      • powershell
      • routeros
    • (fix) Polynomial backtracking fixes for: Josh Goebel
      • asciidoc
      • reasonml
      • latex
      • kotlin
      • gcode
      • d
      • aspectj
      • moonscript
      • coffeescript/livescript
      • csharp
      • scilab
      • crystal
      • elixir
      • basic
      • ebnf
      • ruby
      • fortran/irpf90
      • livecodeserver
      • yaml
      • x86asm
      • dsconfig
      • markdown
      • ruleslanguage
      • xquery
      • sqf

    Very grateful to Michael Schmidt for all the help.

    Version 10.4.0

    A largish release with many improvements and fixes from quite a few different contributors. Enjoy!

    ... (truncated)

    Commits
    • e96b915 bump 10.4.1
    • 065f65f chore(release) allow release script to handle production releases
    • 68509fc chore(docs) bump SECURITY mention to 9.18.5
    • aa0fb85 chore(docs) Version 9 has reached EOL.
    • fb0a626 enh(ci): Add tests for polynomial regex issues
    • fa46dd1 fix(reasonml) fix poly backtracking issue
    • d496052 fix(latex) fix poly backtracking issue
    • d9f1cdb fix(javascript/typescript) fix poly backtracking issue
    • fdec037 fix(asciidoc) fix poly backtracking issue
    • 02ca487 fix(kotlin) fix poly backtracking issue
    • Additional commits viewable in compare view
    Maintainer changes

    This version was pushed to npm by joshgoebel, a new releaser for highlight.js since your current version.


    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
    • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
    • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
    • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

    You can disable automated security fix PRs for this repo from the Security Alerts page.

    dependencies 
    opened by dependabot[bot] 1
  • build(deps): bump lodash from 4.17.15 to 4.17.19 in /js

    build(deps): bump lodash from 4.17.15 to 4.17.19 in /js

    Bumps lodash from 4.17.15 to 4.17.19.

    Release notes

    Sourced from lodash's releases.

    4.17.16

    Commits
    Maintainer changes

    This version was pushed to npm by mathias, a new releaser for lodash since your current version.


    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
    • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
    • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
    • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

    You can disable automated security fix PRs for this repo from the Security Alerts page.

    dependencies 
    opened by dependabot[bot] 1
  • build(deps): bump json5 from 2.2.1 to 2.2.3 in /js

    build(deps): bump json5 from 2.2.1 to 2.2.3 in /js

    Bumps json5 from 2.2.1 to 2.2.3.

    Release notes

    Sourced from json5's releases.

    v2.2.3

    v2.2.2

    • Fix: Properties with the name __proto__ are added to objects and arrays. (#199) This also fixes a prototype pollution vulnerability reported by Jonathan Gregson! (#295).
    Changelog

    Sourced from json5's changelog.

    v2.2.3 [code, diff]

    v2.2.2 [code, diff]

    • Fix: Properties with the name __proto__ are added to objects and arrays. (#199) This also fixes a prototype pollution vulnerability reported by Jonathan Gregson! (#295).
    Commits
    • c3a7524 2.2.3
    • 94fd06d docs: update CHANGELOG for v2.2.3
    • 3b8cebf docs(security): use GitHub security advisories
    • f0fd9e1 docs: publish a security policy
    • 6a91a05 docs(template): bug -> bug report
    • 14f8cb1 2.2.2
    • 10cc7ca docs: update CHANGELOG for v2.2.2
    • 7774c10 fix: add proto to objects and arrays
    • edde30a Readme: slight tweak to intro
    • 97286f8 Improve example in readme
    • Additional commits viewable in compare view

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
    • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
    • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
    • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

    You can disable automated security fix PRs for this repo from the Security Alerts page.

    dependencies javascript 
    opened by dependabot[bot] 0
  • build(deps): bump certifi from 2021.10.8 to 2022.12.7 in /docs

    build(deps): bump certifi from 2021.10.8 to 2022.12.7 in /docs

    Bumps certifi from 2021.10.8 to 2022.12.7.

    Commits

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
    • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
    • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
    • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

    You can disable automated security fix PRs for this repo from the Security Alerts page.

    dependencies python 
    opened by dependabot[bot] 0
  • Add CodeQL workflow for GitHub code scanning

    Add CodeQL workflow for GitHub code scanning

    Hi microsoft/bistring!

    This is a one-off automatically generated pull request from LGTM.com :robot:. You might have heard that we’ve integrated LGTM’s underlying CodeQL analysis engine natively into GitHub. The result is GitHub code scanning!

    With LGTM fully integrated into code scanning, we are focused on improving CodeQL within the native GitHub code scanning experience. In order to take advantage of current and future improvements to our analysis capabilities, we suggest you enable code scanning on your repository. Please take a look at our blog post for more information.

    This pull request enables code scanning by adding an auto-generated codeql.yml workflow file for GitHub Actions to your repository β€” take a look! We tested it before opening this pull request, so all should be working :heavy_check_mark:. In fact, you might already have seen some alerts appear on this pull request!

    Where needed and if possible, we’ve adjusted the configuration to the needs of your particular repository. But of course, you should feel free to tweak it further! Check this page for detailed documentation.

    Questions? Check out the FAQ below!

    FAQ

    Click here to expand the FAQ section

    How often will the code scanning analysis run?

    By default, code scanning will trigger a scan with the CodeQL engine on the following events:

    • On every pull request β€” to flag up potential security problems for you to investigate before merging a PR.
    • On every push to your default branch and other protected branches β€” this keeps the analysis results on your repository’s Security tab up to date.
    • Once a week at a fixed time β€” to make sure you benefit from the latest updated security analysis even when no code was committed or PRs were opened.

    What will this cost?

    Nothing! The CodeQL engine will run inside GitHub Actions, making use of your unlimited free compute minutes for public repositories.

    What types of problems does CodeQL find?

    The CodeQL engine that powers GitHub code scanning is the exact same engine that powers LGTM.com. The exact set of rules has been tweaked slightly, but you should see almost exactly the same types of alerts as you were used to on LGTM.com: we’ve enabled the security-and-quality query suite for you.

    How do I upgrade my CodeQL engine?

    No need! New versions of the CodeQL analysis are constantly deployed on GitHub.com; your repository will automatically benefit from the most recently released version.

    The analysis doesn’t seem to be working

    If you get an error in GitHub Actions that indicates that CodeQL wasn’t able to analyze your code, please follow the instructions here to debug the analysis.

    How do I disable LGTM.com?

    If you have LGTM’s automatic pull request analysis enabled, then you can follow these steps to disable the LGTM pull request analysis. You don’t actually need to remove your repository from LGTM.com; it will automatically be removed in the next few months as part of the deprecation of LGTM.com (more info here).

    Which source code hosting platforms does code scanning support?

    GitHub code scanning is deeply integrated within GitHub itself. If you’d like to scan source code that is hosted elsewhere, we suggest that you create a mirror of that code on GitHub.

    How do I know this PR is legitimate?

    This PR is filed by the official LGTM.com GitHub App, in line with the deprecation timeline that was announced on the official GitHub Blog. The proposed GitHub Action workflow uses the official open source GitHub CodeQL Action. If you have any other questions or concerns, please join the discussion here in the official GitHub community!

    I have another question / how do I get in touch?

    Please join the discussion here to ask further questions and send us suggestions!

    opened by lgtm-com[bot] 1
  • Problems with PyICU when installing bistring 0.4.0 with pip

    Problems with PyICU when installing bistring 0.4.0 with pip

    Summary

    I ran into what's apparently a known issue with installing PyICU over pip while trying to pip install bistring==0.4.0. Contrary to the error message and recommendations on that thread, installing pkg-config and libicu-dev didn't fix the issue for me. Only installing python3-icu (as recommended in the official PyICU docs) finally fixed it.

    This is obviously not an issue with bistring itself, but it makes it difficult to install bistring because the ICU dependency can't be automatically installed by pip. If there is nothing else that can be done about it, maybe a note about this could at least be added to the Readme file, so people can avoid the frustration of running into the pip error?

    More Details

    Here is the output I got from pip install bistring==0.4.0:

    Collecting bistring==0.4.0
      Downloading bistring-0.4.0-py3-none-any.whl (22 kB)
    Collecting pyicu
      Downloading PyICU-2.8.tar.gz (299 kB)
         |β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 299 kB 2.1 MB/s
      Installing build dependencies ... done
      Getting requirements to build wheel ... error
      ERROR: Command errored out with exit status 1:
       command: /usr/bin/python3 /tmp/tmprkr9c6uy get_requires_for_build_wheel /tmp/tmp5tnhtmha
           cwd: /tmp/pip-install-x_54yndb/pyicu
      Complete output (64 lines):
      (running 'icu-config --version')
      (running 'pkg-config --modversion icu-i18n')
      Traceback (most recent call last):
        File "setup.py", line 63, in <module>
          ICU_VERSION = os.environ['ICU_VERSION']
        File "/usr/lib/python3.8/os.py", line 675, in __getitem__
          raise KeyError(key) from None
      KeyError: 'ICU_VERSION'
    
      During handling of the above exception, another exception occurred:
    
      Traceback (most recent call last):
        File "setup.py", line 66, in <module>
          ICU_VERSION = check_output(('icu-config', '--version')).strip()
        File "setup.py", line 19, in check_output
          return subprocess_check_output(popenargs)
        File "/usr/lib/python3.8/subprocess.py", line 411, in check_output
          return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
        File "/usr/lib/python3.8/subprocess.py", line 489, in run
          with Popen(*popenargs, **kwargs) as process:
        File "/usr/lib/python3.8/subprocess.py", line 854, in __init__
          self._execute_child(args, executable, preexec_fn, close_fds,
        File "/usr/lib/python3.8/subprocess.py", line 1702, in _execute_child
          raise child_exception_type(errno_num, err_msg, err_filename)
      FileNotFoundError: [Errno 2] No such file or directory: 'icu-config'
    
      During handling of the above exception, another exception occurred:
    
      Traceback (most recent call last):
        File "setup.py", line 69, in <module>
          ICU_VERSION = check_output(('pkg-config', '--modversion', 'icu-i18n')).strip()
        File "setup.py", line 19, in check_output
          return subprocess_check_output(popenargs)
        File "/usr/lib/python3.8/subprocess.py", line 411, in check_output
          return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
        File "/usr/lib/python3.8/subprocess.py", line 489, in run
          with Popen(*popenargs, **kwargs) as process:
        File "/usr/lib/python3.8/subprocess.py", line 854, in __init__
          self._execute_child(args, executable, preexec_fn, close_fds,
        File "/usr/lib/python3.8/subprocess.py", line 1702, in _execute_child
          raise child_exception_type(errno_num, err_msg, err_filename)
      FileNotFoundError: [Errno 2] No such file or directory: 'pkg-config'
    
      During handling of the above exception, another exception occurred:
    
      Traceback (most recent call last):
        File "/tmp/tmprkr9c6uy", line 280, in <module>
          main()
        File "/tmp/tmprkr9c6uy", line 263, in main
          json_out['return_val'] = hook(**hook_input['kwargs'])
        File "/tmp/tmprkr9c6uy", line 114, in get_requires_for_build_wheel
          return hook(config_settings)
        File "/tmp/pip-build-env-p3nxngyx/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 162, in get_requires_for_build_wheel
          return self._get_build_requires(
        File "/tmp/pip-build-env-p3nxngyx/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 143, in _get_build_requires
          self.run_setup()
        File "/tmp/pip-build-env-p3nxngyx/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 158, in run_setup
          exec(compile(code, __file__, 'exec'), locals())
        File "setup.py", line 71, in <module>
          raise RuntimeError('''
      RuntimeError:
      Please install pkg-config on your system or set the ICU_VERSION environment
      variable to the version of ICU you have installed.
    
      ----------------------------------------
    ERROR: Command errored out with exit status 1: /usr/bin/python3 /tmp/tmprkr9c6uy get_requires_for_build_wheel /tmp/tmp5tnhtmha Check the logs for full command output
    
    opened by outofthecave 4
  • Rust

    Rust

    • [x] Add a Rust project
    • [x] Implement Alignment
    • [ ] Implement BiString
    • [ ] Implement slices
    • [x] Add a README
    • [ ] Benchmarks
    • [ ] Try arithmetic progression compression
    • [ ] Unify unbounded slice behaviour with Python and JS
    • [x] CI
    opened by tavianator 0
  • Transliterate

    Transliterate

    I was hoping you might advise me on how to incorporate transliteration into a text transformation pipeline.

    Let's say I want to use a 3rd party library like from unidecode import unidecode. I could create a bistring with new_bistr = bistr(text.modified, unidecode(text.modified)) but I would loose all the previous operations.

    Is there a way to fold in a modified string that is calculated outside bistring's capabilities?

    enhancement question 
    opened by christian-storm 4
Owner
Microsoft
Open source projects and samples from Microsoft
Microsoft
An implementation of figlet written in Python

All of the documentation and the majority of the work done was by Christopher Jones ([emai

Peter Waller 1.1k Jan 02, 2023
A minimal code sceleton for a textadveture parser written in python.

Textadventure sceleton written in python Use with a map file generated on https://www.trizbort.io Use the following Sockets for walking directions: n

1 Jan 06, 2022
Vastasanuli - Vastasanuli pelaa Sanuli-peliΓ€.

Vastasanuli Vastasanuli pelaa SANULI -peliΓ€. Se ei aina voita. KΓ€yttΓΆ Tarttet Pythonin (3.6+). Aja make (tai lataa words.txt muualta) Asentele vaaditt

Aarni Koskela 1 Jan 06, 2022
A production-ready pipeline for text mining and subject indexing

A production-ready pipeline for text mining and subject indexing

UF Open Source Club 12 Nov 06, 2022
Python tool to make adding to your armory spreadsheet armory less of a pain.

Python tool to make adding to your armory spreadsheet armory slightly less of a pain by creating a CSV to simply copy and paste.

1 Oct 20, 2021
Goblin-sim - Procedural fantasy world generator

goblin-sim This project is an attempt to create a procedural goblin fantasy worl

3 May 18, 2022
Extract price amount and currency symbol from a raw text string

price-parser is a small library for extracting price and currency from raw text strings.

Scrapinghub 252 Dec 31, 2022
A Python package to facilitate research on building and evaluating automated scoring models.

Rater Scoring Modeling Tool Introduction Automated scoring of written and spoken test responses is a growing field in educational natural language pro

ETS 59 Oct 10, 2022
Find a Doc is a free online resource aimed at helping connect the foreign community in Japan with health services in their native language.

Find a Doc - Localization Find a Doc is a free online resource aimed at helping connect the foreign community in Japan with health services in their n

Our Japan Life 18 Dec 19, 2022
A minimal python script for generating multiple onetime use bip39 seed phrases

seed_signer_ontimes WARNING This project has mainly been used for local development, and creation should be ran on a air-gapped machine. A minimal pyt

CypherToad 4 Sep 12, 2022
🐸 Identify anything. pyWhat easily lets you identify emails, IP addresses, and more. Feed it a .pcap file or some text and it'll tell you what it is! πŸ§™β€β™€οΈ

🐸 Identify anything. pyWhat easily lets you identify emails, IP addresses, and more. Feed it a .pcap file or some text and it'll tell you what it is! πŸ§™β€β™€οΈ

Brandon 5.6k Jan 03, 2023
RSS Reader application for the Emacs Application Framework.

EAF RSS Reader RSS Reader application for the Emacs Application Framework. Load application (add-to-list 'load-path "~/.emacs.d/site-lisp/eaf-rss-read

EAF 15 Dec 07, 2022
A non-validating SQL parser module for Python

python-sqlparse - Parse SQL statements sqlparse is a non-validating SQL parser for Python. It provides support for parsing, splitting and formatting S

Andi Albrecht 3.1k Jan 04, 2023
Free & simple way to encipher text

VenSipher VenSipher is a free medium through which text can be enciphered. It can convert any text into an unrecognizable secret text that can only be

3 Jan 28, 2022
Convert English text to IPA using the toPhonetic

Installation: Windows python -m pip install text2ipa macOS sudo pip3 install text2ipa Linux pip install text2ipa Features Convert English text to I

Joseph Quang 3 Jun 14, 2022
Fuzzy string matching like a boss. It uses Levenshtein Distance to calculate the differences between sequences in a simple-to-use package.

Fuzzy string matching like a boss. It uses Levenshtein Distance to calculate the differences between sequences in a simple-to-use package.

SeatGeek 1.2k Jan 01, 2023
Code Jam for creating a text-based adventure game engine and custom worlds

Text Based Adventure Jam Author: Devin McIntyre Our goal is two-fold: Create a text based adventure game engine that can parse a standard file format

HTTPChat 4 Dec 26, 2021
Skype export archive to text converter for python

Skype export archive to text converter This software utility extracts chat logs

Roland Pihlakas open source projects 2 Jun 30, 2022
Add your new words to a text file and get them randomly.

Memorize-New-Words In this very very very little project, I've wrote a code to memorize new english words. Therefore you can add the words and their m

Mostafa 2 Jul 04, 2022
WorldCloud Orçamento de Estado 2022

World Cloud Orçamento de Estado 2022 What it does This script creates a worldcloud, masked on a image, from a txt file How to run it? Install all libr

Jorge Gomes 2 Oct 12, 2021