A fast streaming JSON parser for Python that generates SAX-like events using yajl

Overview

json-streamer

jsonstreamer provides a SAX-like push parser via the JSONStreamer class and an 'object' parser via the ObjectStreamer class, which emits the top-level entities of a JSON object or array. It is based on the fast C library 'yajl'. It is well suited to parsing JSON as it streams in over a network, or JSON documents that are too large to hold in memory all at once.
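As a rough sketch of the streaming use case, the helper below reads a file-like object in fixed-size chunks and hands each chunk to a streamer. It relies only on the consume and close methods used in the examples later in this README. The names feed_in_chunks, chunk_size and RecordingStreamer are hypothetical; RecordingStreamer is a stand-in so the snippet runs without yajl installed.

```python
import io


def feed_in_chunks(streamer, fileobj, chunk_size=4096):
    """Read fileobj piece by piece and pass each piece to streamer.consume()."""
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:  # an empty read means end of input
            break
        streamer.consume(chunk)
    streamer.close()


class RecordingStreamer:
    """Hypothetical stand-in for JSONStreamer that only records what it is fed."""

    def __init__(self):
        self.chunks = []
        self.closed = False

    def consume(self, data):
        self.chunks.append(data)

    def close(self):
        self.closed = True


recorder = RecordingStreamer()
source = io.StringIO('{"fruits": ["apple", "banana"]}')
feed_in_chunks(recorder, source, chunk_size=8)
print(recorder.chunks)
```

In real use you would presumably pass a JSONStreamer (with listeners already attached) instead of the recorder, and an open file or socket wrapper instead of the StringIO.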

Dependencies

git clone git@github.com:lloyd/yajl.git
cd yajl
./configure && make install

Setup

pip3 install jsonstreamer

Also available on PyPI: https://pypi.python.org/pypi/jsonstreamer

Example

Shell

python -m jsonstreamer.jsonstreamer < some_file.json

Code

variables which contain the input we want to parse

json_object = """
    {
        "fruits":["apple","banana", "cherry"],
        "calories":[100,200,50]
    }
"""
json_array = """[1,2,true,[4,5],"a"]"""

a catch-all event listener function which prints the events

def _catch_all(event_name, *args):
    print('\t{} : {}'.format(event_name, args))

JSONStreamer Example

Event listeners receive event data through their parameters, so each listener's signature must match the event it is registered for.

JSONStreamer provides the following events:

  • doc_start
  • doc_end
  • object_start
  • object_end
  • array_start
  • array_end
  • key - this also carries the name of the key as a string param
  • value - this also carries the value as a string|int|float|boolean|None param
  • element - this also carries the value as a string|int|float|boolean|None param

Listener signatures must match the event they handle.

For the events doc_start, doc_end, object_start, object_end, array_start and array_end, the listener takes no parameters:

def listener():
    pass

Or, if your listener is a method of a class, it takes only the usual 'self' parameter:

def listener(self):
    pass

For the events key, value and element, the listener must also accept the payload as a parameter:

def key_listener(key_string):
    pass
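If you want to catch a mismatched signature before parsing starts, a check along the following lines may help. It is only a sketch built on the standard library's inspect module and the event list above; accepts_event, NO_PAYLOAD and WITH_PAYLOAD are hypothetical names, not part of jsonstreamer.

```python
import inspect

# Events listed above, grouped by whether they carry a payload.
NO_PAYLOAD = {'doc_start', 'doc_end', 'object_start', 'object_end',
              'array_start', 'array_end'}
WITH_PAYLOAD = {'key', 'value', 'element'}


def accepts_event(event_name, listener):
    """Return True if listener can be called with the arguments of event_name."""
    if event_name in NO_PAYLOAD:
        payload = ()
    elif event_name in WITH_PAYLOAD:
        payload = (None,)  # placeholder standing in for the real payload
    else:
        raise ValueError('unknown event: {}'.format(event_name))
    try:
        # bind() raises TypeError when the arguments do not fit the signature
        inspect.signature(listener).bind(*payload)
    except TypeError:
        return False
    return True


def listener():
    pass


def key_listener(key_string):
    pass


print(accepts_event('doc_start', listener))   # True
print(accepts_event('key', key_listener))     # True
print(accepts_event('key', listener))         # False
```

Because inspect.signature leaves out 'self' for bound methods, the same check works for listeners that are methods of a class.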

import and run jsonstreamer on 'json_object'

from jsonstreamer import JSONStreamer 

print("\nParsing the json object:")
streamer = JSONStreamer() 
streamer.add_catch_all_listener(_catch_all)
streamer.consume(json_object[0:10]) #note that partial input is possible
streamer.consume(json_object[10:])
streamer.close()

output

Parsing the json object:
    doc_start : ()
    object_start : ()
    key : ('fruits',)
    array_start : ()
    element : ('apple',)
    element : ('banana',)
    element : ('cherry',)
    array_end : ()
    key : ('calories',)
    array_start : ()
    element : (100,)
    element : (200,)
    element : (50,)
    array_end : ()
    object_end : ()
    doc_end : ()

run jsonstreamer on 'json_array'

print("\nParsing the json array:")
streamer = JSONStreamer() #can't reuse old object, make a fresh one
streamer.add_catch_all_listener(_catch_all)
streamer.consume(json_array[0:5])
streamer.consume(json_array[5:])
streamer.close()

output

Parsing the json array:
    doc_start : ()
    array_start : ()
    element : (1,)
    element : (2,)
    element : (True,)
    array_start : ()
    element : (4,)
    element : (5,)
    array_end : ()
    element : ('a',)
    array_end : ()
    doc_end : ()
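To see how the event sequence above maps back onto a document, here is a minimal sketch of a catch-all listener that rebuilds the Python value from the events. It assumes events arrive exactly as printed above (key/value inside objects, element inside arrays). ValueBuilder is a hypothetical name, and the events are replayed by hand so the snippet runs without yajl.

```python
class ValueBuilder:
    """Rebuilds a Python value from JSONStreamer-style events (illustrative only)."""

    def __init__(self):
        self.result = None
        self._stack = []           # containers that are currently open
        self._pending_key = None   # last key seen inside the current object

    def __call__(self, event_name, *args):
        if event_name == 'object_start':
            self._open({})
        elif event_name == 'array_start':
            self._open([])
        elif event_name in ('object_end', 'array_end'):
            self._stack.pop()
        elif event_name == 'key':
            self._pending_key = args[0]
        elif event_name in ('value', 'element'):
            self._attach(args[0])
        # doc_start and doc_end need no action here

    def _open(self, container):
        self._attach(container)
        self._stack.append(container)

    def _attach(self, item):
        if not self._stack:
            self.result = item  # top-level value
        elif isinstance(self._stack[-1], dict):
            self._stack[-1][self._pending_key] = item
        else:
            self._stack[-1].append(item)


# Replay of the events printed for 'json_array' above.
array_events = [
    ('doc_start',), ('array_start',), ('element', 1), ('element', 2),
    ('element', True), ('array_start',), ('element', 4), ('element', 5),
    ('array_end',), ('element', 'a'), ('array_end',), ('doc_end',),
]
builder = ValueBuilder()
for event in array_events:
    builder(*event)
print(builder.result)  # [1, 2, True, [4, 5], 'a']
```

With the real parser, an instance would presumably be registered through add_catch_all_listener, just like _catch_all. Note that rebuilding the whole value defeats the memory benefit of streaming; the sketch is meant only to explain the events.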

ObjectStreamer Example

ObjectStreamer provides the following events:

  • object_stream_start
  • object_stream_end
  • array_stream_start
  • array_stream_end
  • pair
  • element

import and run ObjectStreamer on 'json_object'

from jsonstreamer import ObjectStreamer

print("\nParsing the json object:")
object_streamer = ObjectStreamer()
object_streamer.add_catch_all_listener(_catch_all)
object_streamer.consume(json_object[0:9])
object_streamer.consume(json_object[9:])
object_streamer.close()

output

Parsing the json object:
    object_stream_start : ()
    pair : (('fruits', ['apple', 'banana', 'cherry']),)
    pair : (('calories', [100, 200, 50]),)
    object_stream_end : ()

run the ObjectStreamer on the 'json_array'

print("\nParsing the json array:")
object_streamer = ObjectStreamer()
object_streamer.add_catch_all_listener(_catch_all)
object_streamer.consume(json_array[0:4])
object_streamer.consume(json_array[4:])
object_streamer.close()

output - note that the events are different for an array

Parsing the json array:
    array_stream_start : ()
    element : (1,)
    element : (2,)
    element : (True,)
    element : ([4, 5],)
    element : ('a',)
    array_stream_end : ()
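Since each 'pair' payload is a (key, value) tuple, a listener can keep only the top-level keys it cares about and let everything else pass by. The sketch below shows that idea; make_pair_collector is a hypothetical helper, and the payloads are replayed by hand from the object output above so the snippet runs without yajl.

```python
def make_pair_collector(wanted_keys):
    """Return (listener, collected); the listener keeps only the wanted pairs."""
    collected = {}

    def on_pair(pair):
        key, value = pair
        if key in wanted_keys:
            collected[key] = value

    return on_pair, collected


on_pair, collected = make_pair_collector({'calories'})

# Replay the two 'pair' payloads printed for 'json_object' above.
on_pair(('fruits', ['apple', 'banana', 'cherry']))
on_pair(('calories', [100, 200, 50]))
print(collected)  # {'calories': [100, 200, 50]}
```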

Example of attaching listeners for specific events

ob_streamer = ObjectStreamer()

def pair_listener(pair):
    print('Explicit listener: Key: {} - Value: {}'.format(pair[0],pair[1]))
    
ob_streamer.add_listener('pair', pair_listener) #same for JSONStreamer
ob_streamer.consume(json_object)

ob_streamer.remove_listener(pair_listener) #if you need to remove the listener explicitly

Even easier way of attaching listeners

class MyClass:
    
    def __init__(self):
        self._obj_streamer = ObjectStreamer() #same for JSONStreamer
        
        # this automatically finds listeners in this class and attaches them if they are named
        # using the following convention '_on_eventname'. Note method names in this class
        self._obj_streamer.auto_listen(self) 
    
    def _on_object_stream_start(self):
        print ('Root Object Started')
        
    def _on_pair(self, pair):
        print('Key: {} - Value: {}'.format(pair[0],pair[1]))
        
    def parse(self, data):
        self._obj_streamer.consume(data)
        
        
m = MyClass()
m.parse(json_object)
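The '_on_eventname' convention can be pictured with the following sketch, which scans an object for methods with that prefix and maps them to event names. This is not the library's implementation, only an illustration of the idea; find_listeners and Demo are hypothetical names.

```python
def find_listeners(obj, prefix='_on_'):
    """Map event names to the callables on obj whose names start with prefix."""
    listeners = {}
    for name in dir(obj):
        if name.startswith(prefix):
            attr = getattr(obj, name)
            if callable(attr):
                # '_on_pair' becomes the event name 'pair'
                listeners[name[len(prefix):]] = attr
    return listeners


class Demo:
    def __init__(self):
        self.seen = []

    def _on_object_stream_start(self):
        self.seen.append('start')

    def _on_pair(self, pair):
        self.seen.append(pair)


demo = Demo()
listeners = find_listeners(demo)
print(sorted(listeners))  # ['object_stream_start', 'pair']

# Calling a discovered listener behaves like calling the method directly.
listeners['pair'](('fruits', ['apple']))
print(demo.seen)
```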

Troubleshooting

  • If you get an OSError('Yajl cannot be found.'), please ensure that libyajl is available in the relevant directory. For example, on macOS, /usr/local/lib should contain "libyajl.dylib". The corresponding file names are libyajl.so on Linux and yajl.dll on Windows.
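One way to check whether a yajl shared library is visible to Python at all is the standard library's ctypes.util.find_library. This is a generic diagnostic sketch; it does not necessarily mirror how jsonstreamer itself searches for the library, and describe_yajl is a hypothetical helper name.

```python
from ctypes.util import find_library


def describe_yajl():
    """Report whether the system's library search can locate yajl."""
    name = find_library('yajl')
    if name is None:
        return 'yajl not found by ctypes.util.find_library'
    return 'yajl found: {}'.format(name)


print(describe_yajl())
```

If this reports that yajl is not found even though you installed it, the library is probably in a directory that the system's library search does not cover.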
Comments
  • Trouble using 'jsonstreamer` with 'yajl-2' on Ubuntu 14.04

    Hey @kashifrazzaqui

    I have been trying to use your library json-streamer for implementing a Streaming API..

    As directed, I have installed yajl on my Ubuntu 14.04 system and also verified it's presence and correct installation (refer: [1] & [2])

    Still, on running the command python3 -m jsonstreamer.jsonstreamer < test.json i.e. using it with jsonstreamer gives me the following :

      File "/usr/local/lib/python3.4/dist-packages/jsonstreamer/yajl/parse.py", line 29, in load_lib
        raise OSError('Yajl cannot be found.')
    OSError: Yajl cannot be found.
    

    Following up in https://github.com/lloyd/yajl/issues/190 it seems that there might be an issue in the parse.py file itself ? Maybe it's looking for yajl1 and not yajl2.

    Any pointers on this one ? Help appreciated.


    [1] Running gcc -lyajl yields:

    $ gcc -lyajl
    ....
    /usr/lib/gcc/x86_64-linux-gnu/4.8/../../../x86_64-linux-gnu/crt1.o: In function `_start':
    (.text+0x20): undefined reference to `main'
    collect2: error: ld returned 1 exit status
    

    [2] And sudo ldconfig -p | grep yajl results in:

    $ sudo ldconfig -p | grep yajl
        libyajl.so.2 (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libyajl.so.2
    
    opened by jigyasa-grover 10
  • Ensure exception __str__ methods return strings

    Hi there,

    Issues that throw JSONStreamerException classes are difficult to debug because there is no expectation that a str will be returned. This makes debugging a PITA.

    awesome_module.py", line 51, in map_step
        url + '\n' + str(e))
    TypeError: __str__ returned non-string (type bytes)
    
    opened by mach-kernel 3
  • Missing tests & tags

    PyPI has 1.3.6 , and no tests.

    GitHub only has a tag for v1.0.0 , so I cant use that.

    Could you tag v1.3.6 in GitHub, so I can use it to get tests, and finish https://build.opensuse.org/package/show/home:jayvdb:py-new/python-jsonstreamer after https://github.com/kashifrazzaqui/again/issues/8 is also fixed.

    opened by jayvdb 2
  • SyntaxError: invalid syntax

    Traceback (most recent call last):
      File "test_jsonstreamer.py", line 3, in <module>
        from jsonstreamer import JSONStreamer
      File "/usr/local/lib/python2.7/dist-packages/jsonstreamer/__init__.py", line 9, in <module>
        from jsonstreamer.jsonstreamer import JSONStreamer, ObjectStreamer
      File "/usr/local/lib/python2.7/dist-packages/jsonstreamer/jsonstreamer.py", line 12, in <module>
        from again import events
      File "/usr/local/lib/python2.7/dist-packages/again/__init__.py", line 4, in <module>
        from .events import EventSource, AsyncEventSource
      File "/usr/local/lib/python2.7/dist-packages/again/events.py", line 49
        yield from each(*args, **kwargs)
        ^
    SyntaxError: invalid syntax

    python --version
    Python 2.7.3

    opened by tuhaolam 2
  • Want to split a 22M JSON file into smaller files to track a problem

    I have a large JSON file that has an error somewhere. I want to split the up the JSON file into smaller files that are also JSON so that I can find out where the error is. Possible with your package ?

    opened by winash12 1
  • Trouble using 'jsonstreamer` with 'yajl' on Windows 10

    Hey @kashifrazzaqui

    I have been trying to use your library json-streamer for implementing a Streaming API..

    As directed, I have installed yajl on my Windows 10 system and installed it as below:

    C:\Users\mianand\Downloads\lloyd-yajl-2.1.0-0-ga0ecdde\lloyd-yajl-66cb08c\build>nmake install

    Microsoft (R) Program Maintenance Utility Version 14.00.24210.0
    Copyright (C) Microsoft Corporation. All rights reserved.

    [ 30%] Built target yajl_s
    [ 60%] Built target yajl
    [ 66%] Built target yajl_test
    [ 72%] Built target gen-extra-close
    [ 78%] Built target json_reformat
    [ 84%] Built target json_verify
    [ 90%] Built target parse_config
    [100%] Built target perftest
    Install the project...
    -- Install configuration: "Release"
    -- Up-to-date: C:/Program Files (x86)/YetAnotherJSONParser/lib/yajl.lib
    -- Up-to-date: C:/Program Files (x86)/YetAnotherJSONParser/lib/yajl.dll
    -- Up-to-date: C:/Program Files (x86)/YetAnotherJSONParser/lib/yajl_s.lib
    -- Up-to-date: C:/Program Files (x86)/YetAnotherJSONParser/include/yajl/yajl_parse.h
    -- Up-to-date: C:/Program Files (x86)/YetAnotherJSONParser/include/yajl/yajl_gen.h
    -- Up-to-date: C:/Program Files (x86)/YetAnotherJSONParser/include/yajl/yajl_common.h
    -- Up-to-date: C:/Program Files (x86)/YetAnotherJSONParser/include/yajl/yajl_tree.h
    -- Up-to-date: C:/Program Files (x86)/YetAnotherJSONParser/include/yajl/yajl_version.h
    -- Up-to-date: C:/Program Files (x86)/YetAnotherJSONParser/share/pkgconfig/yajl.pc
    -- Up-to-date: C:/Program Files (x86)/YetAnotherJSONParser/bin/json_reformat.exe
    -- Up-to-date: C:/Program Files (x86)/YetAnotherJSONParser/bin/json_verify.exe

    Still, on running the conda with python 3.6 gives me the following :

    from jsonstreamer import JSONStreamer
    Traceback (most recent call last):
      File "", line 1, in <module>
      File "C:\Users\mianand\AppData\Local\Continuum\anaconda3\envs\pycharm_venv\lib\site-packages\jsonstreamer\__init__.py", line 9, in <module>
        from jsonstreamer.jsonstreamer import JSONStreamer, ObjectStreamer
      File "C:\Users\mianand\AppData\Local\Continuum\anaconda3\envs\pycharm_venv\lib\site-packages\jsonstreamer\jsonstreamer.py", line 14, in <module>
        from .yajl.parse import YajlParser, YajlListener, YajlError
      File "C:\Users\mianand\AppData\Local\Continuum\anaconda3\envs\pycharm_venv\lib\site-packages\jsonstreamer\yajl\parse.py", line 32, in <module>
        yajl = load_lib()
      File "C:\Users\mianand\AppData\Local\Continuum\anaconda3\envs\pycharm_venv\lib\site-packages\jsonstreamer\yajl\parse.py", line 29, in load_lib
        raise OSError('Yajl cannot be found.')
    OSError: Yajl cannot be found.

    Any pointers on this one ? Help appreciated.

    opened by mitendraanand 1
  • Not looking for yajl.dll when loading Yajl

    In the method load_lib(), there is never an attempt to load Yajl from yajl.dll, which is the name of Yajl on windows. I think it would be rather easy to add this, and make this package useful on Windows as well.

    opened by Groomtar 1
  • pypi version ahead of master branch

    Please update the PyPI entry of json-streamer https://pypi.python.org/pypi/jsonstreamer/1.3.6 and consider linking there from the short text description here.

    opened by johnyf 1
  • outdated pypi package

    Hi,

    Could you update the pypi package? As far as I see, there were some commits since the last pypi upload. Also, I think it is a bit confusing that there is one tagged release, which is 1.0, while pypi package has 1.3.6 version number, but both of them almost a year older than some important fixes, e.g. the exponential floats. (I can install the file on my own, but I think it would be nice to update the releases.)

    opened by dvolgyes 0
Releases: v1.3.8
Owner: Kashif Razzaqui
https://medium.com/@kashifrazzaqui