A Persistent Embedded Graph Database for Python

Overview

Cog - Embedded Graph Database for Python

cogdb.io

New release: 2.0.5!

Installing Cog

pip install cogdb

Cog is a persistent embedded graph database implemented purely in Python. Torque is Cog's graph query language. Cog also provides a low level API to its fast persistent key-value store.

Cog is ideal for Python applications that do not require a full-featured database. Cog can easily be used as a library from within a Python application. It can also be used interactively in an IPython environment such as Jupyter notebooks.

Cog can load a graph stored as N-Triples, a serialization format for RDF. See Wikipedia, W3C for details.

In short, an N-Triple is a sequence of subject, predicate, and object on a single line that defines a connection between two vertices:

vertex <predicate> vertex

Learn more about RDF triples
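
As a rough illustration of that structure, the sketch below splits one simplified N-Triples line into its three parts. The function name parse_ntriple and the example.org IRIs are made up for this example. It handles only IRI terms and plain literals without escapes, not the full W3C grammar, and it is not how Cog parses its input.

```python
import re

# Simplified pattern: <subject> <predicate> followed by an IRI or a plain
# "literal", terminated by " .".
# Assumption: no blank nodes, escapes, language tags or datatypes.
_TRIPLE = re.compile(
    r'^\s*<([^>]*)>\s+<([^>]*)>\s+(?:<([^>]*)>|"([^"]*)")\s*\.\s*$'
)


def parse_ntriple(line):
    """Return (subject, predicate, object) for one simplified N-Triples line."""
    match = _TRIPLE.match(line)
    if match is None:
        raise ValueError("not a simple N-Triples line: %r" % line)
    subject, predicate, iri_obj, literal_obj = match.groups()
    # Exactly one of the two object groups is set.
    obj = iri_obj if iri_obj is not None else literal_obj
    return subject, predicate, obj


line = '<http://example.org/alice> <http://example.org/follows> <http://example.org/bob> .'
print(parse_ntriple(line))
```

Each parsed tuple lines up with the subject, predicate, object arguments that g.put() takes in the next section.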

Creating a graph

from cog.torque import Graph
g = Graph("people")
g.put("alice","follows","bob")
g.put("bob","follows","fred")
g.put("bob","status","cool_person")
g.put("charlie","follows","bob")
g.put("charlie","follows","dani")
g.put("dani","follows","bob")
g.put("dani","follows","greg")
g.put("dani","status","cool_person")
g.put("emily","follows","fred")
g.put("fred","follows","greg")
g.put("greg","status","cool_person")

Create a graph from a CSV file

from cog.torque import Graph
g = Graph("books")
g.load_csv('test/test-data/books.csv', "isbn")

Torque query examples

Scan vertices

g.scan(3)

{'result': [{'id': 'bob'}, {'id': 'emily'}, {'id': 'charlie'}]}

Scan edges

g.scan(3, 'e')

{'result': [{'id': 'status'}, {'id': 'follows'}]}

Starting from a vertex, follow all outgoing edges and list all vertices

g.v("bob").out().all()

{'result': [{'id': 'cool_person'}, {'id': 'fred'}]}

Everyone with status 'cool_person'

g.v().has("status", 'cool_person').all()

{'result': [{'id': 'bob'}, {'id': 'dani'}, {'id': 'greg'}]}

Include edges in the results

g.v().has("follows", "fred").inc().all('e')

{'result': [{'id': 'dani', 'edges': ['follows']}, {'id': 'charlie', 'edges': ['follows']}, {'id': 'alice', 'edges': ['follows']}]}

Starting from a vertex, follow all outgoing edges and count vertices

g.v("bob").out().count()

'2'

See who is following who and create a view of that network

Note: render() is supported only in an IPython environment such as a Jupyter notebook; otherwise, use view(..).url.

By tagging the vertices 'from' and 'to', the resulting graph can be visualized.

g.v().tag("from").out("follows").tag("to").view("follows").render()

g.v().tag("from").out("follows").tag("to").view("follows").url

file:///Path/to/your/cog_home/views/follows.html

List all views

g.lsv()

['follows']

Load existing visualization

g.getv('follows').render()

Starting from a vertex, follow all outgoing edges and tag them

g.v("bob").out().tag("from").out().tag("to").all()

{'result': [{'from': 'fred', 'id': 'greg', 'to': 'greg'}]}
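
Because a tagged result is a plain dictionary, it can be post-processed with ordinary Python. A small sketch, assuming the result shape shown above; the helper name tagged_pairs is made up for this example and is not part of Cog.

```python
def tagged_pairs(result, source_tag="from", target_tag="to"):
    """Collect (source, target) pairs from a Torque-style result dictionary."""
    pairs = []
    for row in result.get("result", []):
        # Skip rows that are missing either tag.
        if source_tag in row and target_tag in row:
            pairs.append((row[source_tag], row[target_tag]))
    return pairs


# Result copied from the query above.
result = {'result': [{'from': 'fred', 'id': 'greg', 'to': 'greg'}]}
print(tagged_pairs(result))  # [('fred', 'greg')]
```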

Starting from a vertex, follow all incoming edges and list all vertices

g.v("bob").inc().all()

{'result': [{'id': 'alice'}, {'id': 'charlie'}, {'id': 'dani'}]}
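
To make the meaning of the out(), inc() and has() queries above concrete, the sketch below mimics them over a plain Python list holding the triples from "Creating a graph". It only illustrates the traversal semantics. The function names mirror the Torque calls for readability, but this is not how Cog stores or queries data.

```python
# The same triples as in "Creating a graph", held in memory.
TRIPLES = [
    ("alice", "follows", "bob"),
    ("bob", "follows", "fred"),
    ("bob", "status", "cool_person"),
    ("charlie", "follows", "bob"),
    ("charlie", "follows", "dani"),
    ("dani", "follows", "bob"),
    ("dani", "follows", "greg"),
    ("dani", "status", "cool_person"),
    ("emily", "follows", "fred"),
    ("fred", "follows", "greg"),
    ("greg", "status", "cool_person"),
]


def out(vertex, predicate=None):
    """Vertices reached by following outgoing edges from `vertex`."""
    return {o for s, p, o in TRIPLES if s == vertex and predicate in (None, p)}


def inc(vertex, predicate=None):
    """Vertices that have an edge pointing at `vertex`."""
    return {s for s, p, o in TRIPLES if o == vertex and predicate in (None, p)}


def has(predicate, obj):
    """Vertices with a `predicate` edge leading to `obj`."""
    return {s for s, p, o in TRIPLES if p == predicate and o == obj}


print(sorted(out("bob")))                    # ['cool_person', 'fred']
print(sorted(inc("bob")))                    # ['alice', 'charlie', 'dani']
print(sorted(has("status", "cool_person")))  # ['bob', 'dani', 'greg']
```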

Loading data from a file

Triples file

from cog.torque import Graph
g = Graph(graph_name="people")
g.load_triples("/path/to/triples.nt", "people")

Edgelist file

from cog.torque import Graph
g = Graph(graph_name="people")
g.load_edgelist("/path/to/edgelist", "people")

Low level key-value store API:

Every record inserted into Cog's key-value store is persisted directly to disk. The store places and retrieves records by the hash of their keys, so both lookups and inserts are fast (O(1) on average).

from cog.database import Cog

cogdb = Cog('path/to/dbdir')

# create a namespace
cogdb.create_namespace("my_namespace")

# create new table
cogdb.create_table("new_db", "my_namespace")

# put some data
cogdb.put(('key','val'))

# retrieve data 
cogdb.get('key')

# put some more data
cogdb.put(('key2','val2'))

# scan all records in the current table
scanner = cogdb.scanner()
for r in scanner:
    print(r)

# delete data (use a key that was inserted above)
cogdb.delete('key')

Config

If no config is provided when creating a Cog instance, it will use the defaults:

COG_PATH_PREFIX = "/tmp"
COG_HOME = "cog-test"

Example updating config

from cog import config
from cog.database import Cog

# override the default home directory before creating the instance
config.COG_HOME = "app1_home"
data = ('user_data:id=1','{"firstname":"Hari","lastname":"seldon"}')
cog = Cog(config)
cog.create_namespace("test")
cog.create_table("db_test", "test")
cog.put(data)
scanner = cog.scanner()
for r in scanner:
    print(r)

Performance

Put and Get calls performance:

put ops/second: 18968

get ops/second: 39113

The perf test scripts are included with the tests: insert_bench.py and get_bench.py
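
For a sense of how an ops/second figure like the ones above can be measured, here is a generic timing sketch. It is not the project's insert_bench.py or get_bench.py. The ops_per_second helper is made up for this example and a plain dict stands in for the store, so the numbers it prints say nothing about Cog itself.

```python
import time


def ops_per_second(operation, keys):
    """Call operation(key) for every key and return calls per second."""
    start = time.perf_counter()
    for key in keys:
        operation(key)
    elapsed = time.perf_counter() - start
    # Guard against a zero reading on very coarse timers.
    return len(keys) / max(elapsed, 1e-9)


store = {}  # stand-in for a key-value store (assumption, not Cog)
keys = ["key%d" % n for n in range(100000)]

put_rate = ops_per_second(lambda k: store.__setitem__(k, "val"), keys)
get_rate = ops_per_second(store.__getitem__, keys)
print("put ops/second: %d" % put_rate)
print("get ops/second: %d" % get_rate)
```

Swapping the two callables for real put and get calls would be the natural next step, though the project's own scripts remain the reference for the figures quoted here.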

INDEX_LOAD_FACTOR on an index determines when a new index file is created. Cog uses linear probing to resolve index collisions. A higher INDEX_LOAD_FACTOR leads to slightly lower performance for operations on index files that have reached the target load.
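
The interplay between a load factor and linear probing can be sketched with a toy in-memory index. Everything here (the ProbingIndex class, its capacity and load_factor parameters, raising an error at the target load) is an assumption made for illustration. Cog's real index is file-based and its internals may differ.

```python
class ProbingIndex:
    """Toy fixed-capacity hash index using linear probing (illustration only)."""

    def __init__(self, capacity=8, load_factor=0.75):
        self.slots = [None] * capacity
        self.load_factor = load_factor
        self.count = 0

    def is_full(self):
        # At this point a real store would start a new index instead.
        return self.count >= self.load_factor * len(self.slots)

    def _probe(self, key):
        # Start at the hashed slot and walk forward, wrapping around.
        start = hash(key) % len(self.slots)
        for step in range(len(self.slots)):
            yield (start + step) % len(self.slots)

    def put(self, key, value):
        for i in self._probe(key):
            slot = self.slots[i]
            if slot is None:
                if self.is_full():
                    raise RuntimeError("target load reached")
                self.slots[i] = (key, value)
                self.count += 1
                return
            if slot[0] == key:
                # Existing key: overwrite in place.
                self.slots[i] = (key, value)
                return
        raise RuntimeError("index has no free slot")

    def get(self, key):
        for i in self._probe(key):
            slot = self.slots[i]
            if slot is None:
                return None  # an empty slot ends the probe sequence
            if slot[0] == key:
                return slot[1]
        return None


index = ProbingIndex(capacity=8, load_factor=0.75)
for n in range(6):
    index.put("key%d" % n, n)
print(index.get("key3"), index.is_full())  # 3 True
```

The fuller the table, the longer the runs of occupied slots a probe has to walk, which is the intuition behind the slightly lower performance at a higher load factor.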

Comments
  • Installation Problem with pip

    I get the following message when I try to install with pip. Any ideas for troubleshooting?

    bash-3.2$ pip install cogdb
    Collecting cogdb
    Could not find a version that satisfies the requirement cogdb (from versions: )
    No matching distribution found for cogdb
    
    good first issue 
    opened by npmontgomery 6
  • Can't use it...

    Toms-MacBook-Pro:graphtest tomsmith$ pip3 install cogdb
    Collecting cogdb
      Using cached cogdb-0.1.2.tar.gz (9.0 kB)
    Building wheels for collected packages: cogdb
      Building wheel for cogdb (setup.py) ... done
      Created wheel for cogdb: filename=cogdb-0.1.2-py3-none-any.whl size=8679 sha256=4081944f7748d8b81a35d568a923d0ef3b64629ad2182c6f8f545faf7f4d1d04
      Stored in directory: /Users/tomsmith/Library/Caches/pip/wheels/8b/31/32/daa3d657e6c6bf56132ccca6081671b85dd37a302302e3cefc
    Successfully built cogdb
    Installing collected packages: cogdb
    Successfully installed cogdb-0.1.2
    WARNING: You are using pip version 20.3.1; however, version 20.3.3 is available.
    You should consider upgrading via the '/usr/local/opt/python@3.9/bin/python3.9 -m pip install --upgrade pip' command.
    Toms-MacBook-Pro:graphtest tomsmith$ python
    Python 3.9.1 (default, Dec 17 2020, 03:41:37)
    [Clang 12.0.0 (clang-1200.0.32.27)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> from cog.torque import Graph
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/local/lib/python3.9/site-packages/cog/torque.py", line 1, in <module>
        from cog.database import Cog
      File "/usr/local/lib/python3.9/site-packages/cog/database.py", line 18, in <module>
        from core import Table
    ModuleNotFoundError: No module named 'core'

    opened by everythingability 3
  • Attribute datatypes

    This is an awesome library.

    I just started digging into this, but are there plans to support additional types (other than strings)? It would allow us to extend functionality with more extensive querying, such as getting all nodes with a score attribute greater than x, etc.

    Thanks!

    opened by madhu-kt 1
  • Help with example

    Maybe I am missing it, but how do I get all edges connecting two vertices? Is it possible? I can't figure it out without filtering edges after the query.

    opened by arnicas 1
  • Web View

    I just discovered cogdb and so far it looks like it has potential. I am still looking through everything, but is there a way to set the cog graph view width/height automatically, other than manually editing the HTML file? Also, when showing the full graph for books.csv, it renders slowly in the web browser and takes a while to load (I am using Firefox). Is there any way to speed this up? Running the JavaScript source locally didn't help. I thought it might have to do with Cloudflare, but maybe running on the GPU would help.

    opened by fastlaner 1
  • Python3 support

    Hello! I took a first stab at porting this library to python3 (#13). I've gotten it to a state where the tests pass without warnings (and still pass in python 2.7), but I haven't validated it on an existing codebase, since I was checking this out for a new project :)

    opened by uniphil 1
  • Exception during an import

    On Python 3.7, importing Graph throws an exception:

    >>> from cog.core import Table
    >>> from cog.torque import Graph
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/Users/vulogov/Library/Python/3.7/lib/python/site-packages/cog/torque.py", line 1, in <module>
        from cog.database import Cog
      File "/Users/vulogov/Library/Python/3.7/lib/python/site-packages/cog/database.py", line 18, in <module>
        from core import Table
    ModuleNotFoundError: No module named 'core'

    opened by vulogov 1
  • column abstraction

    At the database level, implement a column abstraction. Columns should be stored as the value in the kv store. The DB object parses the columns and returns the column values.

    A straightforward implementation would simply store a Python dict or JSON in the values, with each property being a column name.

    Columns will eventually be used for "select" queries.

    Use dumps to serialize json: https://docs.python.org/2/library/json.html and then index it.

    opened by arun1729 1
  • R 3.0.0

    • New and improved indexing and storage.
    • Introduced caching.
    • Query performance improvements.
    • Bug fixes.
    • Note: Data stored in CogDB 2.xx is not compatible with 3.xx
    opened by arun1729 0
  • Update render method in Torque to include view height, width parameters

    Update the render method in Torque to include two arguments, height and width, which should be used to set the iframe height and width in the HTML template. The current values should be the default values.

    Render: https://github.com/arun1729/cog/blob/242dbc9bb188263158223e79bc9da339e03da111/cog/torque.py#L318

    help wanted good first issue 
    opened by arun1729 0
  • R 2.0.0 alpha

    Since most graph operations are read heavy, there is a need for faster read response times. This PR contains the following:

    • Performance optimization for core - 100x index look up and read speed up.
    • Design change to use a larger index file, this optimization would enable storing and querying of larger graphs. Trade off is storage and memory, which is reasonable in this case.
    opened by arun1729 0
  • Save graph to file

    Currently cog only provides APIs for loading a graph from a file, such as load_csv, load_triples and load_edgelist. Can you provide corresponding APIs for saving, such as save_csv, save_triples and save_edgelist?

    enhancement 
    opened by dalek-who 2
  • render() throws error

    Environment:
    Python 3.8.9 (default, Apr 21 2021, 23:14:29) [GCC 10.2.0] on cygwin
    cogdb==2.0.5
    xxhash==2.0.0

    Issue: When running your introductory example from the main page of https://cogdb.io/ (in a jupyter notebook), the following error was thrown:


    OSError Traceback (most recent call last) in
         14 g.put("greg","status","cool_person")
         15
    ---> 16 g.v().tag("from").out("follows").tag("to").view("follows").render()

    /usr/local/lib/python3.8/site-packages/cog/torque.py in render(self)
        324 iframe_html = r""" """.format(self.html)
        325 from IPython.core.display import display, HTML
    --> 326 display(HTML(iframe_html))
        327
        328 def persist(self):

    /usr/lib/python3.8/site-packages/IPython/core/display.py in __init__(self, data, url, filename, metadata)
        716 if warn():
        717 warnings.warn("Consider using IPython.display.IFrame instead")
    --> 718 super(HTML, self).__init__(data=data, url=url, filename=filename, metadata=metadata)
        719
        720 def _repr_html_(self):

    /usr/lib/python3.8/site-packages/IPython/core/display.py in __init__(self, data, url, filename, metadata)
        628 self.metadata = {}
        629
    --> 630 self.reload()
        631 self._check_data()
        632

    /usr/lib/python3.8/site-packages/IPython/core/display.py in reload(self)
        653 """Reload the raw data from file or URL."""
        654 if self.filename is not None:
    --> 655 with open(self.filename, self._read_flags) as f:
        656 self.data = f.read()
        657 elif self.url is not None:

    OSError: [Errno 91] File name too long: ' <iframe srcdoc='\n\n\n \n Cog Graph\n \n\n\n <script\n type="text/javascript"\n src="https://cdnjs.cloudflare.com/ajax/libs/vis/4.21.0/vis.min.js"\n >\n \n \n \n \n\n \n \n\n\n' width="700" height="700"> '

    Thanks

    opened by mgierdal 5
Releases(3.0.2)