Lark is a parsing toolkit for Python, built with a focus on ergonomics, performance and modularity.

Overview

Lark - a parsing toolkit for Python

Lark can parse all context-free languages. To put it simply, it means that it is capable of parsing almost any programming language out there, and to some degree most natural languages too.

Who is it for?

  • Beginners: Lark is very friendly for experimentation. It can parse any grammar you throw at it, no matter how complicated or ambiguous, and do so efficiently. It also constructs an annotated parse-tree for you, using only the grammar and an input, and it gives you convenient and flexible tools to process that parse-tree.

  • Experts: Lark implements both Earley (SPPF) and LALR(1), and several different lexers, so you can trade off power and speed according to your requirements. It also provides a variety of sophisticated features and utilities.

What can it do?

  • Parse all context-free grammars, and handle any ambiguity gracefully
  • Build an annotated parse-tree automagically, no construction code required.
  • Provide first-rate performance in terms of both Big-O complexity and measured run-time (considering that this is Python ;)
  • Run on every Python interpreter (it's pure-python)
  • Generate a stand-alone parser (for LALR(1) grammars)

And many more features. Read ahead and find out!

Most importantly, Lark will save you time and prevent you from getting parsing headaches.

Install Lark

$ pip install lark --upgrade

Lark has no dependencies.

Syntax Highlighting

Lark provides syntax highlighting for its grammar files (*.lark).

Clones

These are implementations of Lark in other languages. They accept Lark grammars, and provide similar utilities.

Hello World

Here is a little program to parse "Hello, World!" (Or any other similar phrase):

from lark import Lark

l = Lark('''start: WORD "," WORD "!"

            %import common.WORD   // imports from terminal library
            %ignore " "           // Disregard spaces in text
         ''')

print( l.parse("Hello, World!") )

And the output is:

Tree(start, [Token(WORD, 'Hello'), Token(WORD, 'World')])

Notice punctuation doesn't appear in the resulting tree. It's automatically filtered away by Lark.

Fruit flies like bananas

Lark is great at handling ambiguity. Here is the result of parsing the phrase "fruit flies like bananas":

[Figure: fruitflies.png, the parse of "fruit flies like bananas"]

Read the code here, and see more examples here.

List of main features

  • Builds a parse-tree (AST) automagically, based on the structure of the grammar
  • Earley parser
    • Can parse all context-free grammars
    • Full support for ambiguous grammars
  • LALR(1) parser
    • Fast and light, competitive with PLY
    • Can generate a stand-alone parser (read more)
  • CYK parser, for highly ambiguous grammars
  • EBNF grammar
  • Unicode fully supported
  • Python 2 & 3 compatible
  • Automatic line & column tracking
  • Standard library of terminals (strings, numbers, names, etc.)
  • Import grammars from Nearley.js (read more)
  • Extensive test suite
  • MyPy support using type stubs
  • And much more!

See the full list of features here

Comparison to other libraries

Performance comparison

Lark is the fastest and lightest (lower is better)

[Chart: Run-time Comparison]

[Chart: Memory Usage Comparison]

Check out the JSON tutorial for more details on how the comparison was made.

Note: I really wanted to add PLY to the benchmark, but I couldn't find a working JSON parser anywhere written in PLY. If anyone can point me to one that actually works, I would be happy to add it!

Note 2: The parsimonious code has been optimized for this specific test, unlike the other benchmarks (Lark included). Its "real-world" performance may not be as good.

Feature comparison

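Library      | Algorithm      | Grammar     | Builds tree? | Supports ambiguity? | Can handle every CFG? | Line/Column tracking | Generates Stand-alone
-------------|----------------|-------------|--------------|---------------------|-----------------------|----------------------|----------------------
Lark         | Earley/LALR(1) | EBNF        | Yes!         | Yes!                | Yes!                  | Yes!                 | Yes! (LALR only)
PLY          | LALR(1)        | BNF         | No           | No                  | No                    | No                   | No
PyParsing    | PEG            | Combinators | No           | No                  | No*                   | No                   | No
Parsley      | PEG            | EBNF        | No           | No                  | No*                   | No                   | No
Parsimonious | PEG            | EBNF        | Yes          | No                  | No*                   | No                   | No
ANTLR        | LL(*)          | EBNF        | Yes          | No                  | Yes?                  | Yes                  | No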

(* PEGs cannot handle non-deterministic grammars. Also, according to Wikipedia, it remains unanswered whether PEGs can really parse all deterministic CFGs)

Projects using Lark

  • Poetry - A utility for dependency management and packaging
  • tartiflette - a GraphQL server by Dailymotion
  • PyQuil - Python library for quantum programming using Quil
  • Preql - An interpreted relational query language that compiles to SQL
  • Hypothesis - Library for property-based testing
  • mappyfile - a MapFile parser for working with MapServer configuration
  • synapse - an intelligence analysis platform
  • Datacube-core - Open Data Cube analyses continental scale Earth Observation data through time
  • SPFlow - Library for Sum-Product Networks
  • Torchani - Accurate Neural Network Potential on PyTorch
  • Command-Block-Assembly - An assembly language, and C compiler, for Minecraft commands
  • EQL - Event Query Language
  • Fabric-SDK-Py - Hyperledger fabric SDK with Python 3.x
  • required - multi-field validation using docstrings
  • miniwdl - A static analysis toolkit for the Workflow Description Language
  • pytreeview - a lightweight tree-based grammar explorer
  • harmalysis - A language for harmonic analysis and music theory
  • gersemi - A CMake code formatter

Using Lark? Send me a message and I'll add your project!

License

Lark uses the MIT license.

(The standalone tool is under MPL2)

Contribute

Lark is currently accepting pull-requests. See How to develop Lark

Sponsor

If you like Lark, and want to see it grow, please consider sponsoring us!

Contact the author

Questions about code are best asked on gitter or in the issues.

For anything else, I can be reached by email at erezshin at gmail com.

-- Erez

Comments
  • Bug in handling ambiguity?

    When running this code:

    from lark import Lark

    grammar = """
    expression: "c" | "d" | "c" "d"
    unit: expression "a"
        | "a" expression
        | "b" unit
        | "b" expression
    start: unit*
    
    %import common.WS
    %ignore WS
    """
    
    l = Lark(grammar, parser='earley', ambiguity='explicit')
    print(l.parse('b c d a a c').pretty())
    

    It is expected to have an ambiguous parse, but there is no '_ambig' node.

    At least these options are valid:

    unit(
        b
        unit(
            expression(
                c
                d
            )
            a
        )
    )
    unit(
        a
        expression(
            c
        )
    )
    

    and this parse:

    unit(
        b
        expression(
            c
        )
    )
    unit(
        expression(
            d
        )
        a
    )
    unit(
        a
        expression(
            c
        )
    )
    

    The only parse that comes back is the second one. When one removes the "b" expression option, you get the first one.

    bug 
    opened by uriva 67
  • Lark runs on Pyodide! (Online IDE)

    Lark runs out-of-the-box inside the browser using Pyodide:

    [Screenshot: Lark running in the Pyodide console]

    Pyodide is a CPython 3.7 interpreter compiled to WebAssembly (wasm). Here's the Python console from above: https://pyodide.cdn.iodide.io/console.html

    Maybe this could serve as a quick start for anyone who wants to try Lark right away?

    discussion 
    opened by phorward 36
  • 0.11.2: pytest is failing

    I'm trying to package your module as an RPM package, so I'm using the typical build, install and test cycle for building a package from a non-root account:

    • "setup.py build"
    • "setup.py install --root </install/prefix>"
    • "pytest" with PYTHONPATH pointing to sitearch and sitelib inside </install/prefix>

    May I ask for help? A few tests are failing:

    + PYTHONPATH=/home/tkloczko/rpmbuild/BUILDROOT/python-lark-parser-0.11.3-2.fc35.x86_64/usr/lib64/python3.8/site-packages:/home/tkloczko/rpmbuild/BUILDROOT/python-lark-parser-0.11.3-2.fc35.x86_64/usr/lib/python3.8/site-packages
    + /usr/bin/pytest -ra
    =========================================================================== test session starts ============================================================================
    platform linux -- Python 3.8.11, pytest-6.2.4, py-1.10.0, pluggy-0.13.1
    benchmark: 3.4.1 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
    Using --randomly-seed=2126451817
    rootdir: /home/tkloczko/rpmbuild/BUILD/lark-0.11.3
    plugins: forked-1.3.0, shutil-1.7.0, virtualenv-1.7.0, expect-1.1.0, flake8-1.0.7, timeout-1.4.2, betamax-0.8.1, freezegun-0.4.2, aspectlib-1.5.2, toolbox-0.5, rerunfailures-9.1.1, requests-mock-1.9.3, cov-2.12.1, pyfakefs-4.5.0, flaky-3.7.0, benchmark-3.4.1, xdist-2.3.0, pylama-7.7.1, datadir-1.3.1, regressions-2.2.0, cases-3.6.3, xprocess-0.18.1, black-0.3.12, checkdocs-2.7.1, anyio-3.3.0, Faker-8.11.0, asyncio-0.15.1, trio-0.7.0, httpbin-1.0.0, subtests-0.5.0, isort-2.0.0, hypothesis-6.14.6, mock-3.6.1, profiling-1.7.0, randomly-3.8.0
    collected 998 items
    
    tests/test_tools.py ....                                                                                                                                             [  0%]
    tests/test_logger.py ...                                                                                                                                             [  0%]
    tests/test_reconstructor.py .......                                                                                                                                  [  1%]
    tests/test_trees.py ..............                                                                                                                                   [  2%]
    tests/test_parser.py ...............s.....ss.s.s......ss.....ss..s..s.......s......s.s.s...s.....s...s.s......s...s.....s......s................s...s...s........... [ 17%]
    ..s...................s.........s.....s...................s...s..s...s........s...................s.........s....................................................... [ 33%]
    ...........s....s.......s........................s.................s.............s..s...................s....s.s...ss......................s...............s..s.s... [ 50%]
    .........s..s...s....................s..................s..........s...s................s.........s..s..s.....s........s.....s.s.......s......s......s....s......... [ 66%]
    ...........s............s.....s....................s.s............................s.......s....ss..ss..s...........s.ss......s...............s.s........s.s.s...s.s. [ 82%]
    ....ss...............s.......s.........................s....s............s..........s..........................................                                      [ 95%]
    tests/test_lexer.py .                                                                                                                                                [ 95%]
    tests/test_nearley/test_nearley.py ..FF...F                                                                                                                          [ 96%]
    tests/test_cache.py ....                                                                                                                                             [ 96%]
    . .                                                                                                                                                                  [ 97%]
    tests/test_cache.py F.                                                                                                                                               [ 97%]
    tests/test_grammar.py .......F.......                                                                                                                                [ 98%]
    tests/test_tree_forest_transformer.py ............                                                                                                                   [100%]
    
    ================================================================================= FAILURES =================================================================================
    _________________________________________________________________________ TestNearley.test_include _________________________________________________________________________
    
    self = <tests.test_nearley.test_nearley.TestNearley testMethod=test_include>
    
        def test_include(self):
            fn = os.path.join(NEARLEY_PATH, 'test/grammars/folder-test.ne')
    >       with open(fn) as f:
    E       FileNotFoundError: [Errno 2] No such file or directory: '/home/tkloczko/rpmbuild/BUILD/lark-0.11.3/tests/test_nearley/nearley/test/grammars/folder-test.ne'
    
    tests/test_nearley/test_nearley.py:48: FileNotFoundError
    ______________________________________________________________________ TestNearley.test_multi_include ______________________________________________________________________
    
    self = <tests.test_nearley.test_nearley.TestNearley testMethod=test_multi_include>
    
        def test_multi_include(self):
            fn = os.path.join(NEARLEY_PATH, 'test/grammars/multi-include-test.ne')
    >       with open(fn) as f:
    E       FileNotFoundError: [Errno 2] No such file or directory: '/home/tkloczko/rpmbuild/BUILD/lark-0.11.3/tests/test_nearley/nearley/test/grammars/multi-include-test.ne'
    
    tests/test_nearley/test_nearley.py:61: FileNotFoundError
    ___________________________________________________________________________ TestNearley.test_css ___________________________________________________________________________
    
    self = <tests.test_nearley.test_nearley.TestNearley testMethod=test_css>
    
        def test_css(self):
            fn = os.path.join(NEARLEY_PATH, 'examples/csscolor.ne')
    >       with open(fn) as f:
    E       FileNotFoundError: [Errno 2] No such file or directory: '/home/tkloczko/rpmbuild/BUILD/lark-0.11.3/tests/test_nearley/nearley/examples/csscolor.ne'
    
    tests/test_nearley/test_nearley.py:28: FileNotFoundError
    __________________________________________________________________________ TestCache.test_imports __________________________________________________________________________
    
    self = <tests.test_cache.TestCache testMethod=test_imports>
    
        def test_imports(self):
            g = """
            %import .grammars.ab (startab, expr)
            """
    >       parser = Lark(g, parser='lalr', start='startab', cache=True)
    
    tests/test_cache.py:131:
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
    lark/lark.py:299: in __init__
        self.grammar, used_files = load_grammar(grammar, self.source_path, self.options.import_paths, self.options.keep_all_tokens)
    lark/load_grammar.py:1229: in load_grammar
        builder.load_grammar(grammar, source)
    lark/load_grammar.py:1082: in load_grammar
        self.do_import(dotted_path, base_path, aliases, mangle)
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
    
    self = <lark.load_grammar.GrammarBuilder object at 0x7f9944b73640>, dotted_path = (Token('RULE', 'grammars'), Token('RULE', 'ab')), base_path = '/usr/bin'
    aliases = {Token('RULE', 'expr'): Token('RULE', 'expr'), Token('RULE', 'startab'): Token('RULE', 'startab')}, base_mangle = None
    
        def do_import(self, dotted_path, base_path, aliases, base_mangle=None):
            assert dotted_path
            mangle = _get_mangle('__'.join(dotted_path), aliases, base_mangle)
            grammar_path = os.path.join(*dotted_path) + EXT
            to_try = self.import_paths + ([base_path] if base_path is not None else []) + [stdlib_loader]
            for source in to_try:
                try:
                    if callable(source):
                        joined_path, text = source(base_path, grammar_path)
                    else:
                        joined_path = os.path.join(source, grammar_path)
                        with open(joined_path, encoding='utf8') as f:
                            text = f.read()
                except IOError:
                    continue
                else:
                    h = hashlib.md5(text.encode('utf8')).hexdigest()
                    if self.used_files.get(joined_path, h) != h:
                        raise RuntimeError("Grammar file was changed during importing")
                    self.used_files[joined_path] = h
    
                    gb = GrammarBuilder(self.global_keep_all_tokens, self.import_paths, self.used_files)
                    gb.load_grammar(text, joined_path, mangle)
                    gb._remove_unused(map(mangle, aliases))
                    for name in gb._definitions:
                        if name in self._definitions:
                            raise GrammarError("Cannot import '%s' from '%s': Symbol already defined." % (name, grammar_path))
    
                    self._definitions.update(**gb._definitions)
                    break
            else:
                # Search failed. Make Python throw a nice error.
    >           open(grammar_path, encoding='utf8')
    E           FileNotFoundError: [Errno 2] No such file or directory: 'grammars/ab.lark'
    
    lark/load_grammar.py:1162: FileNotFoundError
    ______________________________________________________________________ TestGrammar.test_override_rule ______________________________________________________________________
    
    self = <tests.test_grammar.TestGrammar testMethod=test_override_rule>
    
        def test_override_rule(self):
            # Overrides the 'sep' template in existing grammar to add an optional terminating delimiter
            # Thus extending it beyond its original capacity
            p = Lark("""
                %import .test_templates_import (start, sep)
    
                %override sep{item, delim}: item (delim item)* delim?
                %ignore " "
            """, source_path=__file__)
    
            a = p.parse('[1, 2, 3]')
            b = p.parse('[1, 2, 3, ]')
            assert a == b
    
    >       self.assertRaises(GrammarError, Lark, """
                %import .test_templates_import (start, sep)
    
                %override sep{item}: item (delim item)* delim?
            """)
    
    tests/test_grammar.py:39:
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
    lark/lark.py:299: in __init__
        self.grammar, used_files = load_grammar(grammar, self.source_path, self.options.import_paths, self.options.keep_all_tokens)
    lark/load_grammar.py:1229: in load_grammar
        builder.load_grammar(grammar, source)
    lark/load_grammar.py:1082: in load_grammar
        self.do_import(dotted_path, base_path, aliases, mangle)
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
    
        def do_import(self, dotted_path, base_path, aliases, base_mangle=None):
            assert dotted_path
            mangle = _get_mangle('__'.join(dotted_path), aliases, base_mangle)
            grammar_path = os.path.join(*dotted_path) + EXT
            to_try = self.import_paths + ([base_path] if base_path is not None else []) + [stdlib_loader]
            for source in to_try:
                try:
                    if callable(source):
                        joined_path, text = source(base_path, grammar_path)
                    else:
                        joined_path = os.path.join(source, grammar_path)
                        with open(joined_path, encoding='utf8') as f:
                            text = f.read()
                except IOError:
                    continue
                else:
                    h = hashlib.md5(text.encode('utf8')).hexdigest()
                    if self.used_files.get(joined_path, h) != h:
                        raise RuntimeError("Grammar file was changed during importing")
                    self.used_files[joined_path] = h
    
                    gb = GrammarBuilder(self.global_keep_all_tokens, self.import_paths, self.used_files)
                    gb.load_grammar(text, joined_path, mangle)
                    gb._remove_unused(map(mangle, aliases))
                    for name in gb._definitions:
                        if name in self._definitions:
                            raise GrammarError("Cannot import '%s' from '%s': Symbol already defined." % (name, grammar_path))
    
                    self._definitions.update(**gb._definitions)
                    break
            else:
                # Search failed. Make Python throw a nice error.
    >           open(grammar_path, encoding='utf8')
    E           FileNotFoundError: [Errno 2] No such file or directory: 'test_templates_import.lark'
    
    lark/load_grammar.py:1162: FileNotFoundError
    ============================================================================= warnings summary =============================================================================
    tests/test_cache.py:110
      /home/tkloczko/rpmbuild/BUILD/lark-0.11.3/tests/test_cache.py:110: DeprecationWarning: invalid escape sequence \d
        g = """
    
    tests/test_cache.py:48
      /home/tkloczko/rpmbuild/BUILD/lark-0.11.3/tests/test_cache.py:48: PytestCollectionWarning: cannot collect test class 'TestT' because it has a __init__ constructor (from: tests/test_cache.py)
        class TestT(Transformer):
    
    tests/test_parser.py:166
      /home/tkloczko/rpmbuild/BUILD/lark-0.11.3/tests/test_parser.py:166: DeprecationWarning: invalid escape sequence \d
        g = """
    
    tests/test_reconstructor.py:75
      /home/tkloczko/rpmbuild/BUILD/lark-0.11.3/tests/test_reconstructor.py:75: DeprecationWarning: invalid escape sequence \s
        g = """
    
    tests/test_reconstructor.py:90
      /home/tkloczko/rpmbuild/BUILD/lark-0.11.3/tests/test_reconstructor.py:90: DeprecationWarning: invalid escape sequence \s
        g = """
    
    tests/test_reconstructor.py:154
      /home/tkloczko/rpmbuild/BUILD/lark-0.11.3/tests/test_reconstructor.py:154: DeprecationWarning: invalid escape sequence \s
        g1 = """
    
    tests/test_reconstructor.py:162
      /home/tkloczko/rpmbuild/BUILD/lark-0.11.3/tests/test_reconstructor.py:162: DeprecationWarning: invalid escape sequence \s
        g2 = """
    
    -- Docs: https://docs.pytest.org/en/stable/warnings.html
    ========================================================================= short test summary info ==========================================================================
    SKIPPED [7] tests/test_parser.py:2005: Currently only Earley supports priority sum in rules
    SKIPPED [2] tests/test_parser.py:2077: No empty rules
    SKIPPED [7] tests/test_parser.py:2309: Serialize currently only works for LALR parsers without custom lexers (though it should be easy to extend)
    SKIPPED [9] tests/test_parser.py:1045: cStringIO not available
    SKIPPED [3] tests/test_parser.py:2355: match_examples() not supported for CYK/old custom lexer
    SKIPPED [9] tests/test_parser.py:1249: Flattening list isn't implemented (and may never be)
    SKIPPED [2] tests/test_parser.py:1961: Doesn't work for CYK
    SKIPPED [2] tests/test_parser.py:2231: Empty rules
    SKIPPED [2] tests/test_parser.py:2220: Empty rules
    SKIPPED [2] tests/test_parser.py:1120: Takes forever
    SKIPPED [9] tests/test_parser.py:1265: Flattening list isn't implemented (and may never be)
    SKIPPED [6] tests/test_parser.py:1705: Only standard lexers care about token priority
    SKIPPED [2] tests/test_parser.py:1512: No empty rules
    SKIPPED [2] tests/test_parser.py:1194: No empty rules
    SKIPPED [2] tests/test_parser.py:1650: No empty rules
    SKIPPED [6] tests/test_parser.py:2435: interactive_parser error handling only works with LALR for now
    SKIPPED [6] tests/test_parser.py:2398: interactive_parser is only implemented for LALR at the moment
    SKIPPED [2] tests/test_parser.py:1451: No empty rules
    SKIPPED [9] tests/test_parser.py:1281: Flattening list isn't implemented (and may never be)
    SKIPPED [2] tests/test_parser.py:1213: No empty rules
    SKIPPED [2] tests/test_parser.py:1233: No empty rules
    SKIPPED [4] tests/test_parser.py:2194: Priority not handled correctly right now
    SKIPPED [2] tests/test_parser.py:1915: %declare/postlex doesn't work with dynamic
    SKIPPED [2] tests/test_parser.py:1938: %declare/postlex doesn't work with dynamic
    SKIPPED [1] tests/test_parser.py:754: Only relevant for the dynamic_complete parser
    SKIPPED [1] tests/test_parser.py:402: Only relevant for the dynamic_complete parser
    FAILED tests/test_nearley/test_nearley.py::TestNearley::test_include - FileNotFoundError: [Errno 2] No such file or directory: '/home/tkloczko/rpmbuild/BUILD/lark-0.11.3...
    FAILED tests/test_nearley/test_nearley.py::TestNearley::test_multi_include - FileNotFoundError: [Errno 2] No such file or directory: '/home/tkloczko/rpmbuild/BUILD/lark-...
    FAILED tests/test_nearley/test_nearley.py::TestNearley::test_css - FileNotFoundError: [Errno 2] No such file or directory: '/home/tkloczko/rpmbuild/BUILD/lark-0.11.3/tes...
    FAILED tests/test_cache.py::TestCache::test_imports - FileNotFoundError: [Errno 2] No such file or directory: 'grammars/ab.lark'
    FAILED tests/test_grammar.py::TestGrammar::test_override_rule - FileNotFoundError: [Errno 2] No such file or directory: 'test_templates_import.lark'
    ========================================================= 5 failed, 889 passed, 103 skipped, 7 warnings in 44.97s ==========================================================
    pytest-xprocess reminder::Be sure to terminate the started process by running 'pytest --xkill' if you have not explicitly done so in your fixture with 'xprocess.getinfo(<process_name>).terminate()'.
    
    opened by kloczek 35
  • Fix #696 now providing the correct amount of placeholders

    from lark import Lark

    p = Lark("""!start: ["a" "b" "c"] """, maybe_placeholders=True)
    p.parse("").children
    

    This now returns [None, None, None] instead of [None].

    The same applies to !start: ["a" ["b" "c"]].

    opened by ornariece 32
  • Changing file-extension for standalone grammar definitions from .g?

    Currently standalone files like common and the example json use the file extension .g. However, it looks like g is already associated with the ANTLR parser. While I suppose it's possible to make Lark compatible with ANTLR, in the meantime it's probably best to use a different file extension. I would propose the extension lrk as it doesn't seem to be used by anything currently.

    I'm about to submit a pull request to change the file extensions to .lrk in the relevant file names and in the code referencing the .g extension. It's on a separate branch so if you want to use a different extension that should be easy enough to change.

    discussion 
    opened by RobRoseKnows 31
  • Fix `python.number` pattern

    Python doesn't accept numbers with a _ at the beginning or end, or numbers with more than one consecutive _ between digits:

    >>> 69420
    69420
    >>> 69_420
    69420
    >>> 69__420
      File "<stdin>", line 1
        69__420
          ^
    SyntaxError: invalid decimal literal
    >>> 69_420_
      File "<stdin>", line 1
        69_420_
              ^
    SyntaxError: invalid decimal literal
    >>> _69_420
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    NameError: name '_69_420' is not defined
    
    >>> 03.1415
    3.1415
    >>> 0_3.14_15
    3.1415
    >>> 0__3.14_15
      File "<stdin>", line 1
        0__3.14_15
         ^
    SyntaxError: invalid decimal literal
    >>> 0_3.14__15
      File "<stdin>", line 1
        0_3.14__15
              ^
    SyntaxError: invalid decimal literal
    >>> 0_3.14_15_
      File "<stdin>", line 1
        0_3.14_15_
                 ^
    SyntaxError: invalid decimal literal
    >>> 0_3._14_15
      File "<stdin>", line 1
        0_3._14_15
           ^
    SyntaxError: invalid decimal literal
    >>> 0_3_.14_15
      File "<stdin>", line 1
        0_3_.14_15
           ^
    SyntaxError: invalid decimal literal
    >>> _0_3.14_15
      File "<stdin>", line 1
        _0_3.14_15
        ^^^^^^^^^^
    SyntaxError: invalid syntax. Perhaps you forgot a comma?
    

    The same goes for complex numbers. And yes, Python recognizes _xxx as a name even when each x is a digit, but it's still not a number, so this doesn't affect us.

    The current implementation only filters out numbers with _ at the beginning, so here's the fix for the other cases.

    Hopefully we can still make backward-incompatible changes, so it should be fine to rename IMAG_NUMBER to COMPLEX_NUMBER.

    I also tested \d(?:_?\d+)* for DEC_NUMBER but didn't see any significant performance change (everything was within the normal range, considering that I wasn't using a stable environment).
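To illustrate the underscore rules listed above, here is a small check using Python's re module with the \d(?:_?\d+)* pattern mentioned in this text. It is only a sketch of the idea; the is_dec_number helper is hypothetical and this is not necessarily the pattern that ends up in Lark's python grammar.

```python
import re

# Candidate pattern from the text: a digit, then digit groups that may each
# be preceded by at most one underscore.
DEC_NUMBER = re.compile(r"\d(?:_?\d+)*")

def is_dec_number(text):
    """Return True if the whole string matches the candidate pattern."""
    return DEC_NUMBER.fullmatch(text) is not None

for sample in ["69420", "69_420", "69__420", "69_420_", "_69_420"]:
    print(sample, is_dec_number(sample))
```

Only the first two samples are accepted, which agrees with the interpreter session shown earlier.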

    opened by 0dminnimda 30
  • Newbie questions

    Consider the below snippet:

    from lark import Lark
    
    grammars = [
        """
            ?start: sum | NAME "=" sum
            ?sum: product | sum "+" product | sum "-" product
            ?product: atom | product "*" atom | product "/" atom
            ?atom: NUMBER | "-" atom | NAME | "(" sum ")"
            %import common.CNAME -> NAME
            %import common.NUMBER
            %import common.WS_INLINE
            %ignore WS_INLINE
        """,
        """
            ?start: sum | NAME "=" sum
            ?sum: product | sum "+" product | sum "-" product
            ?product: atom | product "*" atom | product "/" atom
            ?atom: NUMBER | "-" atom | NAME | "(" sum ")"
            EQUAL: "="
            LPAR: "("
            RPAR: ")"
            SLASH: "/"
            STAR: "*"
            MINUS: "-"
            PLUS: "+"
            %import common.CNAME -> NAME
            %import common.NUMBER
            %import common.WS_INLINE
            %ignore WS_INLINE
        """,
        """
            ?start: sum | NAME "=" sum
            ?sum: product | sum "+" product | sum "-" product
            ?product: atom | product "*" atom | product "/" atom
            ?atom: NUMBER | "-" atom | NAME | "(" sum ")"
            OPERATOR : "=" | "(" | ")" | "/" | "*" | "-" | "+"
            %import common.CNAME -> NAME
            %import common.NUMBER
            %import common.WS_INLINE
            %ignore WS_INLINE
        """
    ]
    
    
    def test(grammar, text):
        parser = Lark(grammar, start='start')
        # print(parser.parse(text).pretty())
        print(sorted(list(set([t.type for t in parser.lex(text)]))))
        # print([t.name for t in parser.lexer.tokens])
    
    
    text = "x = 1+2 - 3-4 - 5*6 - 7/8 - (9+10-11*12/13)"
    for i, grammar in enumerate(grammars):
        print('grammar {}'.format(i).center(80, '*'))
        test(grammar, text)
    

    whose output is:

    ***********************************grammar 0************************************
    ['NAME', 'NUMBER', '__EQUAL', '__LPAR', '__MINUS', '__PLUS', '__RPAR', '__SLASH', '__STAR']
    ***********************************grammar 1************************************
    ['EQUAL', 'LPAR', 'MINUS', 'NAME', 'NUMBER', 'PLUS', 'RPAR', 'SLASH', 'STAR']
    ***********************************grammar 2************************************
    ['NAME', 'NUMBER', '__EQUAL', '__LPAR', '__MINUS', '__PLUS', '__RPAR', '__SLASH', '__STAR']
    

    I have some questions:

    1. About grammar 0: the token types '__EQUAL', '__LPAR', '__MINUS', '__PLUS', '__RPAR', '__SLASH', '__STAR' are generated automatically. How does this work internally?

    2. About grammar 1: with this method I can easily identify the token types, so I can use them for syntax highlighting with QScintilla. Is there any problem with this approach?

    3. About grammar 2: if I want to syntax highlight a group of similar tokens, how can I do that? In this case the token types are still generated automatically instead of becoming OPERATOR. I'd like to be able to apply one QScintilla style to a group of related tokens (i.e. OPERATOR: "=" | "(" | ")" | "/" | "*" | "-" | "+").

    opened by brupelo 30
  • Fail to create parser using a big grammar (memory increase infinitely)

    Hello,

    I'm trying to parse a text in Python 3.5, using the 0.5.6 release of Lark. I have a very long grammar in this format:

    start: title  field+
    
    field: rule1 -> alias1
    	| rule2 -> alias2
    	[…]
    	| rule386 -> alias386
    
    //AUXILIARY TERMS
    title: ...
    term1: ...
    [...]
    term90:
    
    //RULES GROUP 1
    rule1: ...
    [...]
    rule276: ...
    
    //RULES GROUP2
    rule277: ...
    [...]
    rule386: ...
    
    //TERMINALS
    [...]
    

    Here are some examples of the syntax of rules and terms:

    //AUXILIARY TERMS
    adexpmsg: CHARACTER*
    aidequipment: (("N"|"S") [equipmentcode])|equipmentcode
    aircraftid: ALPHANUM~2..7
    
    //PRIMARY FIELDS
    aatot: _HYPHEN _sep "AATOT" _sep timehhmm
    ad: _HYPHEN _sep "AD" _sep adid [_sep (fl|flblock)] [_sep eto] [_sep to] [_sep cto] [_sep sto] [_sep ptstay] [_sep ptrfl] [_sep ptrulchg] [_sep (ptspeed|ptmach)]
    ada: _HYPHEN _sep "ADA" _sep date
    
    //SUBFIELDS
    addrinfo: _HYPHEN _sep "ADDRINFO" _sep networktype _sep fac
    adid: _HYPHEN _sep "ADID" _sep (icaoaerodrome | "ZZZZ")
    adname: _HYPHEN _sep "ADNAME" _sep (LIM_CHAR)~1..50
    
    //TERMINALS
    _sep: SEP*
    ALPHA: /[A-Z]{1}/
    DIGIT: /[0-9]{1}/
    ALPHANUM: ALPHA|DIGIT
    SPACE: " "
    _HYPHEN: "-"
    FEF: "\n"|"\r"
    SEP: (SPACE|FEF)
    SPECIAL: SPACE
    	|"("
    	|")"
    	|"?"
    	|":"
    	|"."
    	|","
    	|"'"
    	|"="
    	|"+"
    	|"/"
    CHARACTER: ALPHA|DIGIT|SPECIAL|FEF|_HYPHEN
    LIM_CHAR: ALPHA|DIGIT|SPECIAL|FEF
    START_OF_FIELD: _HYPHEN
    %import common.WS
    

    The text format I'm trying to parse is the ADEXP format, which is a succession of fields. Each field begins with a "-", followed by the field name and one or more values, each of which can itself be a new field. The first field is "-TITLE". Here is an example of an ADEXP message:

    -TITLE BFD -REFDATA -SENDER -FAC BORD -RECVR -FAC A -SEQNUM 001 -ARCID RYR743D -SSRCODE A1122 -NBARC 1 
    -ARCTYP B738 -ADEP EDDH -ROUTE N0441F370 OLRAK DEGOL LAPRO PPG ALBER  -BEGIN RTEPTS -PT -PTID OLRAK -PT 
    -PTID DEGOL -PT -PTID LAPRO -PT -PTID PPG -PT -PTID ALBER -END RTEPTS -ADES LEBL -BEGIN EQCST -EQPT Y/EQ 
    -EQPT W/EQ -EQPT R/EQ -END EQCST -RFL F370 -SPEED N0441 -EOBT 2359 -WKTRC M
    

    This format allows separators between almost all fields, but some fields have to directly follow one another, without any separator, so I had to make the separators explicit in every rule. You can find the complete syntax of all fields in detail here.

    I tested some rules individually, and I'm able to parse text with these rules.

    But when I try to use the entire grammar above (~900 lines), Lark can't create the parser.

    When I execute this code:

    [...]
    print("Creating parser")
    parser = Lark(grammar, parser='earley')
    print("Parser created")
    tree_res = parser.parse(text)
    print("Text parsed")
    

    The program displays "Creating parser" and proceeds to create it. RAM usage increases progressively until it reaches 12-13 GB and the process gets killed by the system (after 15-20 minutes).

    I also tried different parsers with different lexers, but it doesn't change anything. I got the same result with a different Python interpreter (PyPy).

    Do you have any idea why it takes so long before finally getting killed? Is my grammar too big? Or maybe too ambiguous? Tell me if you need more details about the grammar or anything else.

    Thank you in advance for your answer.

    opened by dryslope 29
  • Improvement: Use Cython for Speed

    Having written a LALR parser for my language (https://github.com/eddieschoute/quippy), it still takes many seconds to parse a file of 100k LOC. As one benchmark, it takes very roughly 2m20s to parse a 600 kLOC input file that I have, which is slow in my opinion. One straightforward improvement that I can think of is to use Cython to generate a C implementation of the LALR parser. Most of the time seems to be spent in the main LALR parser loop, which can be significantly sped up by Cython. I would also be open to other suggestions to improve the parsing speed.

    Since specifically the LALR parser is meant to compete in speed, I think it would be worth exploring the possibility of pushing this parser to its limit. Hopefully, converting the code to Cython code will be fairly painless and from there it just remains to optimize the functionality.

    I do not know how the standalone parser will be affected by this, but I can imagine that instead of generating py files it would generate a pyx file that can be cythonized.

    enhancement discussion 
    opened by eddieschoute 29
  • Make the Earley parser closer to the spec and add a complete SPPF forest implementation.

    Key changes: Add Items to the current Column and ensure unique before adding derivations

    • Ensures all derivations get added to the same unique items.

    Add rudimentary SPPF type implementation to derivations, indexed on start and end, end as per:

    • https://www.sciencedirect.com/science/article/pii/S1571066108001497
    • This was required after with fixed _ambig detection.

    Remove earley__predict_all property.

    • No longer needed after the above two changes.
    opened by night199uk 28
  • Bytes support

    This is the start of an implementation of support for byte strings, as suggested in #626. This is still WIP.

    My idea is that the grammar is still a string, but when you pass the use_bytes=True flag, the patterns are compiled as bytes. If you need to match bytes that are not compatible with whatever encoding is used, you can just escape them. They will be unescaped later.

    TODO:

    • [X] Add use_bytes to make regex compile as bytes
    • [x] Add tests (essentially, everything needs to be tested again, but with bytes)
    • [x] Find and check edge cases

    @ctrlcctrlv, does this fix your use case?

    opened by MegaIng 27
  • Can I display progress status of Lark().parse()?

    I have implemented a JSON converter for a unique format of text with a CLI using Lark. When I run Lark().parse() on a large file, I have to wait for several tens of seconds.

    Is there a way to get a progress status -- for example, by returning a generator that can be passed to tqdm?

    I am not having any issues with the speed of Lark. I just want to be able to inform the user that the program is running 👍.

    enhancement 
    opened by quag-cactus 2
  • How to keep track of tree while transforming it?

    I have a lark.visitors.Transformer which converts an AST into some other AST. While doing so, I exit if there is an error, and I use the tokens to show where the error occurred. Now it is not possible to get the token, because it has already been transformed.

    question 
    opened by aspizu 0
  • Generate Type-annotated Visitor definition from lark grammar

    This feature will generate a Python file containing a Visitor class definition, with a correctly type-annotated method for every rule defined in the Lark grammar file.

    Example: grammar.lark

    start: "FOO" bar biz
    bar: (/[a-z]/)*
    biz: [/no/]
    

    Result: lark --generate-visitor grammar.lark

    from typing import Optional, Literal
    from lark import Visitor, Token
    
    class MyVisitor(Visitor[Token]):
        def start(self, args: tuple[tuple[Token, ...], Optional[Literal["no"]]]):
            ...
        
        def bar(self, args: tuple[Token, ...]):
            ...
        
        def biz(self, args: tuple[Optional[Literal["no"]]]):
            ...
    
    enhancement 
    opened by aspizu 4
  • Macroses support. Dynamic grammar

    I want to implement macro support for my MASM parser. Is it possible to have some kind of dynamic grammar, so I could add tokens at runtime? Or to check whether a token matches using my own handler (if the macro was defined several lines before)? This might also be called a custom matcher.

    opened by xor2003 5
  • Support for Python-style comments in Lark grammar

    Given that

    • most (all?) editors are unaware of Lark's syntax
    • most lark grammars live in Python strings
    • most editors will use # when asked to comment lines or blocks within Lark strings (e.g. PyCharm's CTRL+D)
    • commenting lines and blocks is frequently done while developing a grammar (debugging...)
    • adding this style of comments should not break existing grammars

    I propose in this small PR to enable Python-style comments in Lark grammars. If accepted, I'll do another PR to reflect that in documentation.

    opened by vincent-hugot 12
Releases(1.1.5)
  • 1.1.5(Dec 6, 2022)

    What's Changed

    • setup.cfg: Replace deprecated license_file with license_files by @mgorny in https://github.com/lark-parser/lark/pull/1209
    • Fix Github shenanigans by @erezsh in https://github.com/lark-parser/lark/pull/1220
    • Fix AmbiguousExpander (Issue #1214) by @chanicpanic in https://github.com/lark-parser/lark/pull/1216
    • Fix EOF line information in InteractiveParser.resume_parse() by @erezsh in https://github.com/lark-parser/lark/pull/1224
    • Use generator instead of list expand or add method by @jmishra01 in https://github.com/lark-parser/lark/pull/1225

    New Contributors

    • @mgorny made their first contribution in https://github.com/lark-parser/lark/pull/1209
    • @jmishra01 made their first contribution in https://github.com/lark-parser/lark/pull/1225

    Full Changelog: https://github.com/lark-parser/lark/compare/1.1.4...1.1.5

  • 1.1.4(Nov 2, 2022)

    What's Changed

    • ci: Python 3.11 final by @henryiii in https://github.com/lark-parser/lark/pull/1204
    • Add __all__ to __init__ by @aspizu in https://github.com/lark-parser/lark/pull/1200
    • PropagatePositions: Allow any object to carry the metadata, by returning it in __lark_meta__() by @erezsh in https://github.com/lark-parser/lark/pull/1203
    • fix: Token now pattern matches correctly by @marcinplatek in https://github.com/lark-parser/lark/pull/1181
    • Updates to merge PR #1151 by @erezsh in https://github.com/lark-parser/lark/pull/1205
    • style: pre-commit basic config by @henryiii in https://github.com/lark-parser/lark/pull/1151
    • PR for v1.1.4 by @erezsh in https://github.com/lark-parser/lark/pull/1208

    New Contributors

    • @aspizu made their first contribution in https://github.com/lark-parser/lark/pull/1200
    • @marcinplatek made their first contribution in https://github.com/lark-parser/lark/pull/1181

    Full Changelog: https://github.com/lark-parser/lark/compare/1.1.3...1.1.4

  • 1.1.3(Oct 11, 2022)

    What's Changed

    • Add user to cache filename; better handle cache load/save failures by @klauer in https://github.com/lark-parser/lark/pull/1179

    • refactor: add 'usedforsecurity=False' arg to hashlib.md5 usage by @cquick01 in https://github.com/lark-parser/lark/pull/1190

    • Create lark/grammars/__init__.py by @chanicpanic in https://github.com/lark-parser/lark/pull/1171

    • Adjust imports for Python 3.11 by @The-Compiler in https://github.com/lark-parser/lark/pull/1140

    • Fix for issue #1173 by @erezsh in https://github.com/lark-parser/lark/pull/1198

    • Add match stmt support to python.lark by @joseph-e-k in https://github.com/lark-parser/lark/pull/1123

    • Added match stmt support to python.lark by @MegaIng in https://github.com/lark-parser/lark/pull/1016

    • Linting to fix minor issues by @Erotemic in https://github.com/lark-parser/lark/pull/1128

    • Simplify lexer: Use Match.lastgroup instead of lastindex by @erezsh in https://github.com/lark-parser/lark/pull/1129

    • Fix confusing import in examples by @JonasLoos in https://github.com/lark-parser/lark/pull/1138

    • Move iter_subtrees_topdown into standalone by @camgunz in https://github.com/lark-parser/lark/pull/1137

    • Fix 1146: use the class's get instead of the instance's get by @MegaIng in https://github.com/lark-parser/lark/pull/1147

    • fix: remove Python 2 legacy packaging code by @henryiii in https://github.com/lark-parser/lark/pull/1148

    • Fix for PR #1149 by @erezsh in https://github.com/lark-parser/lark/pull/1150

    • Old link for sppf is no longer valid. Point to web archive instead. by @patrickhuber in https://github.com/lark-parser/lark/pull/1159

    • Fix ForestToPyDotVisitor by @chanicpanic in https://github.com/lark-parser/lark/pull/1167

    • Close file-like objects to address ResourceWarning. by @shawnbrown in https://github.com/lark-parser/lark/pull/1183

    • Minor adjustments to PR #1179 by @erezsh in https://github.com/lark-parser/lark/pull/1189

    • Adjustments for PR #1152 by @erezsh in https://github.com/lark-parser/lark/pull/1191

    • Remove trailing whitespace by @bcr in https://github.com/lark-parser/lark/pull/1196

    New Contributors

    • @joseph-e-k made their first contribution in https://github.com/lark-parser/lark/pull/1123
    • @Erotemic made their first contribution in https://github.com/lark-parser/lark/pull/1128
    • @JonasLoos made their first contribution in https://github.com/lark-parser/lark/pull/1138
    • @camgunz made their first contribution in https://github.com/lark-parser/lark/pull/1137
    • @The-Compiler made their first contribution in https://github.com/lark-parser/lark/pull/1140
    • @henryiii made their first contribution in https://github.com/lark-parser/lark/pull/1148
    • @patrickhuber made their first contribution in https://github.com/lark-parser/lark/pull/1159
    • @shawnbrown made their first contribution in https://github.com/lark-parser/lark/pull/1183
    • @klauer made their first contribution in https://github.com/lark-parser/lark/pull/1179
    • @cquick01 made their first contribution in https://github.com/lark-parser/lark/pull/1190
    • @bcr made their first contribution in https://github.com/lark-parser/lark/pull/1196

    Full Changelog: https://github.com/lark-parser/lark/compare/1.1.2...1.1.3

  • 1.1.2(Mar 1, 2022)

    Highlights

    • Tree instances now have a pretty print with the "rich" library, when doing rich.print(tree)
    • Bugfix for recursive regexes (with the "regex" library)
    • Refactors, cleanups, and better mypy support

    What's Changed

    • Clean up tree templates implementation to reduce mypy errors by @plannigan in https://github.com/lark-parser/lark/pull/1091
    • Remove redefinitions related to standalone parser by @plannigan in https://github.com/lark-parser/lark/pull/1115
    • Added Tree.rich() method to make Tree a Rich renderable by @erezsh in https://github.com/lark-parser/lark/pull/1117
    • Rename lexer_state->lexer_thread, and make a few adjustments for the benefit of Lark-Cython by @erezsh in https://github.com/lark-parser/lark/pull/1118
    • Use isinstance() checks in exceptions match_examples() by @plannigan in https://github.com/lark-parser/lark/pull/1065
    • change MAXREPEAT to int by @gruebel in https://github.com/lark-parser/lark/pull/1120
    • Tests: Small fixes by @erezsh in https://github.com/lark-parser/lark/pull/1122

    New Contributors

    • @gruebel made their first contribution in https://github.com/lark-parser/lark/pull/1120

    Full Changelog: https://github.com/lark-parser/lark/compare/1.1.1...1.1.2

  • 1.1.1(Feb 8, 2022)

    What's Changed

    • Add test cases for tree templates by @plannigan in https://github.com/lark-parser/lark/pull/1096
    • 🖊 Fix Typo: plural "options" instead of singular "option" by @hf-kklein in https://github.com/lark-parser/lark/pull/1101
    • PEP 8: Minor Code Style Improvements by @hf-kklein in https://github.com/lark-parser/lark/pull/1102
    • Add Code Style Section to Contribution Guide by @hf-kklein in https://github.com/lark-parser/lark/pull/1107
    • Fix MyPy Warnings in lark/tools/__init__.py by @hf-kklein in https://github.com/lark-parser/lark/pull/1100
    • rename n to child when iterating over children by @hf-kklein in https://github.com/lark-parser/lark/pull/1110
    • specify ignored mypy error by using type: ignore[error] in lark/tree.py and lark/utils.py by @hf-kklein in https://github.com/lark-parser/lark/pull/1099
    • Add py.typed to package_data of lark package by @hf-kklein in https://github.com/lark-parser/lark/pull/1109
    • InteractiveParser: Added iter_parse() method, for easier instrumentation by @erezsh in https://github.com/lark-parser/lark/pull/1111

    New Contributors

    • @hf-kklein made their first contribution in https://github.com/lark-parser/lark/pull/1101

    Full Changelog: https://github.com/lark-parser/lark/compare/1.1.0...1.1.1

  • 1.1.0(Jan 31, 2022)

    • Better support for typing and mypy. Includes generic tree typing (Thanks @plannigan!)

    • Improvements to python.lark (walrus operator, slashes in function params, and more). Now parses the entire Python 3.10 lib successfully

    • Bugfixes:

      • Transformer.__default__ not called in tree-less LALR mode (Issue #1029)
      • v_args failed to apply to class under standalone parser (Issue #1059)
      • maybe_placeholders incorrectly accumulated params when it encountered the | operator (Issue #1078)
  • 1.0.0(Nov 15, 2021)

    Over the last few years, Lark has grown to become a comprehensive toolkit for parsing structured text.

    Today, I'm happy to announce the long anticipated version 1.0 of Lark, marking the API as stable.

    We've made quite a few breaking changes, in order to achieve a congruous API with as few "gotchas" as possible. Upgrading to version 1.0 might require a few changes to your project.

    Breaking changes

    • Dropped Python 2 support! Lark now only supports Python 3.6 and up.

    • Install lark using pip install lark (instead of lark-parser ).

    • maybe_placeholders is now True by default.

    • Renamed TraditionalLexer to BasicLexer, and 'standard' lexer option to 'basic'.

    • Default priority is now 0, for both terminals and rules (used to be 1 for terminals).

    • Discard mechanism is now done by returning Discard, instead of raising it as an exception.

    • use_accepts in UnexpectedInput.match_examples() is now True by default.

    • v_args(meta=True) now gives meta as the first argument. i.e. (meta, children).

    Improvements

    • Better type annotations
    • Support for terminal priorities for dynamic Earley
    • Python3 grammar is now officially supported, and can be used via %import python (...)
    • New experimental feature: Tree Templates
    • Various bugfixes

    Acknowledgements

    Many thanks to all our contributors and donors, who made this release possible. Special thanks go to:

    • @MegaIng, for innumerable features, bugfixes, and code-reviews.
    • @chanicpanic, for his immense and continual contributions to the Earley parser, and for helping with the v1.0 effort.
    • @erezsh, for being myself.
  • 0.12.0(Aug 30, 2021)

    Announcements

    • This is likely to be the last major release that supports Python 2 !

    We are now working on a Python3.6+ only v1.0 branch, which will soon become the default. See the work in progress: https://github.com/lark-parser/lark/pull/925

    • We also have a new online IDE! Check it out here: https://lark-parser.github.io/ide

    • Lark can now generate standalone Javascript parsers! Check it out here: https://github.com/lark-parser/Lark.js (still in beta)

    Changes

    • Using rule repeat (~ syntax) is now much much faster for large numbers, thanks to @MegaIng

    • Bugfix for the propagate_positions option. Added option value propagate_positions='ignore_ws'.

    • Fixed reconstructor for when keep_all_tokens=True

    • Added merge_transformers (Thanks Robin!)

    • Many minor bugfixes, and improvements to code and docs

  • 0.11.3(May 3, 2021)

    Cache

    • Lark now tracks changes in imported grammars (%import), and updates the cache if necessary
    • Added support for atomicwrites, for multiprocess caching and crash recovery

    InteractiveParser

    • Now an official interface (renamed from Puppet)
    • Added Lark.parse_interactive() for starting the parser in interactive mode

    Other

    • Added ast_utils, to assist in transforming lark.Tree into a customized AST.

    • Better docs

    • Bugfixes

    Notification: Support for Python 2 is ending

    In the near future, Lark will drop support for Python 2. We will continue to develop for Python 3.6+ only, which will simplify the code and ease development.

    Old releases (including this one) will still work, of course, and should be stable enough to accompany the remaining Python 2 users into the sunset.

    If you have any objections, feel free to voice them here: https://github.com/lark-parser/lark/discussions/874

    Thanks to everyone who helped make Lark better!

  • 0.11.2(Feb 16, 2021)

    New Features:

    • Better grammar re-use with the %override and %extend statements, which allow rewriting and extending imported rules and tokens, similarly to class inheritance. (See this example: https://github.com/lark-parser/lark/blob/master/examples/advanced/extend_python.py)

    Improvements

    • Indenter now throws DedentError instead of AssertionError

    • Improved the Python3 grammar, now works with reconstructor. (See this example: https://github.com/lark-parser/lark/blob/master/examples/advanced/reconstruct_python.py)

    • Lots of refactoring for a better tomorrow.

    • Rule/terminal names can now be in Unicode. (thanks @julienmalard)

    • Better errors.

    • Better type hints.

    • lark.lark is now part of Lark's standard grammar library.

    • Earley:

      • Now works with match_examples()
      • Now supports a custom lexer
      • Better handling of ignored terminals
      • Faster forest visiting, and a few edge-case bugfixes (thanks @chanicpanic)

    Other

    • Lark now accepts funding as a member of Github Sponsors! See here: https://github.com/sponsors/lark-parser
  • 0.11.0(Nov 16, 2020)

    • LALR parser

      • The LALR parser now supports priority in rules, as a way to resolve collision errors

      • Improvements to the standalone tool, including more command-line options, like optional compression for the json data.

      • Improvements to the puppet error handling interface

      • Better error reporting on LALR collisions

    • Bugfixes in Earley

    Misc

    • Added support for syntax highlighting in Atom

    • Fixes and improvements for the cache option. cache=True now uses a temporary directory instead of working directory.

    • Lark can now be imported directly from a zip (See: ed5c8ec51c4c6e8bd0ac80caff6afcb90a97d218)

    • Added more terminals to the grammar library (available for %import).

    • Nearley tools now supports case insensitive strings

    • Deprecated some interfaces

    • Improvements to docs, stubs, and various bugfixes

    Thanks to @MegaIng for helping with Lark's maintenance, and to @ldbo, @chanicpanic, @michael-k, @ThatXliner and everyone else for their help and contributions.

  • 0.10.0(Sep 21, 2020)

    • Complete overhaul of documentation. Now using sphinx to generate API docs from docstrings. (commit 0664cbd3d3c19e321cae8df044839e7baf7135af. Thank you @chsasank !)

      • Many improvements and additions to documentation
    • New and friendlier Earley SPPF interface! (commit 555b268eb26bcbfce64991ea7517338dee85a840. Thank you @chanicpanic !)

      • Added the ambiguity='forest' option. Added ForestTransformer and TreeForestTransformer.

      • Various Bugfixes to improve the handling of ambiguous results.

      • Read the docs here: https://lark-parser.readthedocs.io/en/latest/forest.html

    • New Vim syntax highlighting for Lark (https://github.com/lark-parser/vim-lark-syntax Thank you @omega16 !)

    • Lark now loads faster from cache (commit 7dc00179e63efa6e98d688bfba3265d382db79c4)

    • Terminals can now be composed of regexps and strings with different flags, if using Python 3.6+ (commit e6fc3c9b00306e3a8661210fcc93bf50479ee229)

    • Added support for parsing byte-strings, with the use_bytes flag (commit 9ee8428f3f6ad285ad93e2b62ec47d33fff54768).

    • UnexpectedToken exception now has the accepts attribute, which contains a list of terminals that would be accepted by the parser instead (in addition to the expects attribute, which is guided by the lexer and may include terminals that won't be accepted by the parser) (commit a7bcd0bc2d3cb96030d9e77523c0007e8034ce49)

    • Allow multiline regexes with the x flag (commit 9923987e94547ded8a17d7a03840c4cebce39188)

    • Lark no longer uses the default logger. Instead uses lark.LOGGER. (commit 7010f96825b5fbac79522d1b30689065df53dc8c)

    • Lark now notifies on unused terminals/rules through logging.debug.

    • Standalone generator now creates smaller files (without comments and docstrings). It has also undergone various fixes. (commit bf2d9bf7b16cddb39f2e0ea3cefecc8de5269e2c)

    • Wheel distribution due to (somewhat) popular demand.

    • Lots of small bugfixes and improvements!

    Many thanks to @MegaIng for his continued work on many of these new features and fixes, and to everyone else who contributed to Lark and helped make it even better.

  • 0.9.0(Jul 1, 2020)

    • Added error handling to LALR!

      • on_error option to Lark.parse(). Read here: https://lark-parser.readthedocs.io/en/latest/classes/#larkparse
      • Parser now comes with a puppet for advanced error handling. Read here: https://lark-parser.readthedocs.io/en/latest/classes/#parserpuppet
    • Support for better regexps with the regex module, when using Lark(..., regex=True) Read here: https://lark-parser.readthedocs.io/en/latest/classes/#using-unicode-character-classes-with-regex

  • 0.8.9(Jun 16, 2020)

    The last two releases were wrong. I apologize.

    Hopefully that's the last of it, and we'll be back on track with periodic and accurate releases.

  • 0.8.6(Jun 10, 2020)

    The main features for this release:

    • Grammar caching: It's now possible to cache the results of the LALR grammar analysis, for 2x to 3x faster loading. Use Lark(..., cache=True) or specify a file name. See here: https://lark-parser.readthedocs.io/en/latest/classes/

    • Grammar templates: Added support for grammar "functions" that expand in preprocessing. No docs yet, but see here for examples: https://github.com/lark-parser/lark/blob/master/tests/test_parser.py#L845

    • Lark online IDE: Technically not a feature, but it's possible to run Lark in the browser. Now we also have a simple IDE on github pages: https://lark-parser.github.io/lark/ide/app.html

    • Other changes:

      • Improved performance for large grammars

      • More debug prints when in debug mode

      • Better support for PyInstaller

      • Lots of bugfixes: mypy stubs, v_args, docs, and more.

  • 0.8.3(Mar 28, 2020)

    • Added the g_regex_flags option, to allow applying flags to all terminals.
    • Fixed end_pos for Earley, when using propagate_positions
    • Fixes for mypy
    • Better docs
  • 0.8.2(Mar 7, 2020)

    Changes in this version are:

    • Added type stubs for all public APIs, in order to support type checking and completion using MyPy (or others)

    • Added two new methods to the Lark class: Lark.save() and Lark.load(). Both methods pickle and unpickle (respectively) the class instance into/from file objects. These can be used to allow faster loading times. (future versions will implement an automatic caching feature)

    • The standalone parser is now MPL2, instead of GPL. The Mozilla Public License is much less restrictive, so this shouldn't affect anyone who's already using the standalone parser. But it should make it easier for other users to adopt it.

  • 0.8.1(Jan 22, 2020)

  • 0.8.0(Jan 22, 2020)

    - Better LALR

    The biggest change to this release is a new LALR engine, that is capable of dealing with a few edge cases that the previous parser couldn't.

    This parser is supposed to be fully backwards-compatible with the previous one, but that is hard to verify!

    Thank you, @Raekye, for this great contribution to Lark!

    For more details, see issue #418

    - Transformers now visit tokens, as well as rules (an alternative to lexer_callbacks)

    Transformers now visit tokens, in addition to rules.

    Simply define a method with the correct name (uppercase, of course), and the transformer will visit your tokens before the rules that contain them.

    It's possible to disable this, for backwards compatibility, or for the slight performance gain.

    - Other Changes

    • Added visit_topdown methods to Visitor classes

    • Lark now allows line comments in its rule definitions

    • Better error messages

    • Improvements to documentation

    • Bugfixes

    • maybe_placeholders is now the default (backwards-incompatible) (REVERTED in 0.8.1)

  • 0.7.8(Nov 1, 2019)

    • Improved error messages for EOF in Earley, recursive terminals, UnexpectedToken

    • Bugfixes for declared terminals, UnexpectedToken, and unicode support in Python 2.

  • 0.7.7(Oct 3, 2019)

    • Fixed a bug in Earley where running it from different threads produced bad results

    • Improved error reporting when using LALR

    • Added 'edit_terminals' option, to allow programmatic manipulation of terminals, for example to support keywords in different languages.

    Note: This release skips 0.7.6, due to a simple oversight on my part. Hopefully that won't be a problem.

  • 0.7.5(Sep 6, 2019)

    Lark transformers can now visit tokens as well. Use like this:

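    from lark import Transformer

    class MyTransformer(Transformer):
        # Token callbacks are named after the terminal (uppercase)
        def TOKEN1(self, tok):
            return tok.upper()

        # Rule callbacks work as usual
        def rule_as_usual(self, children):
            return children

    # `tree` is a parse tree returned by Lark.parse()
    MyTransformer(visit_tokens=True).transform(tree)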

    Fixed a few regressions that I accidentally added to 0.7.4

  • 0.7.4(Aug 29, 2019)

    • Fixed long-standing non-determinism and prioritization bugs in Earley.

    • Serialize tool now supports multiple start symbols

    • iter_subtrees, find_data and find_pred methods are now included in standalone parser

    • Bugfixes for the transformer interface, for the custom lexer, for grammar imports, and many more

  • 0.7.3(Aug 14, 2019)

    • Added a new tool called Serialize, that stores Lark's internal state as JSON. That will allow for integration with other languages. I have already started such a project for Julia: https://github.com/erezsh/Lark_Julia (It's working, but still in early stages)

    • Minor bugfix regarding line-counting and the \s regex

  • 0.7.2(Jul 30, 2019)

    New features:

    • Lark now allows you to specify the start symbol when calling Lark.parse() (requires pre-declaration of all possible start states, see the start option)

    • Negative priorities are now allowed in rules and terminals (the default value is still 1, but may change in 0.8)

    Also includes many minor bugfixes, optimizations, and improvements to documentation

  • 0.7.1(May 4, 2019)

    • Lark can now serialize its parsers, resulting in simplified stand-alone code.

    • Bugfix for v_args (Issue #350)

    • Improvements and bugfixes for importing rules from grammar files

    • Performance improvement for the reconstructor feature
