Python function to extract all the rows from a SQLite database file while iterating over its bytes, such as while downloading it

Related tags

Databasestream-sqlite
Overview

stream-sqlite CircleCI Test Coverage

Python function to extract all the rows from a SQLite database file concurrently with iterating over its bytes, without needing random access to the file.

Note that the SQLite file format is not designed to be streamed; the data is arranged in pages of a fixed number of bytes, and the information to identify a page often comes after the page in the stream (sometimes a great deal after). Therefore, pages are buffered in memory until they can be identified.

Installation

pip install stream-sqlite

Usage

from stream_sqlite import stream_sqlite
import httpx

# Iterable that yields the bytes of a sqlite file
def sqlite_bytes():
    with httpx.stream('GET', 'http://www.parlgov.org/static/stable/2020/parlgov-stable.db') as r:
        yield from r.iter_bytes(chunk_size=65_536)

# If there is a single table in the file, there will be exactly one iteration of the outer loop.
# If there are multiple tables, each can appear multiple times.
for table_name, pragma_table_info, rows in stream_sqlite(sqlite_bytes(), max_buffer_size=1_048_576):
    for row in rows:
        print(row)

Recommendations

If you have control over the SQLite file, VACUUM; should be run on it before streaming. In addition to minimising the size of the file, VACUUM; arranges the pages in a way that often reduces the buffering required when streaming. This is especially true if it was the target of intermingled INSERTs and/or DELETEs over multiple tables.

Also, indexes are not used for extracting the rows while streaming. If streaming is the only use case of the SQLite file, and you have control over it, indexes should be removed, and VACUUM; then run.

Some tests suggest that if the file is written in autovacuum mode, i.e. PRAGMA auto_vacuum = FULL;, then the pages are arranged in a way that reduces the buffering required when streaming. Your mileage may vary.

Owner
Department for International Trade
Department for International Trade
A Simple , ☁️ Lightweight , 💪 Efficent JSON based database for 🐍 Python.

A Simple, Lightweight, Efficent JSON based DataBase for Python The current stable version is v1.6.1 pip install pysondb==1.6.1 Support the project her

PysonDB 282 Jan 07, 2023
Simple json type database for python3

What it is? Simple json type database for python3! What about speed? The speed is great! All data is stored in RAM until saved. How to install? pip in

3 Feb 11, 2022
Monty, Mongo tinified. MongoDB implemented in Python !

Monty, Mongo tinified. MongoDB implemented in Python ! Was inspired by TinyDB and it's extension TinyMongo

David Lai 523 Jan 02, 2023
ChaozzDBPy - A python implementation based on the original ChaozzDB from Chaozznl with some new features

ChaozzDBPy About ChaozzDBPy is a python implementation based on the original Cha

Igor Iglesias 1 May 25, 2022
Tiny local JSON database for Python.

Pylowdb Simple to use local JSON database 🦉 # This is pure python, not specific to pylowdb ;) db.data['posts'] = ({ 'id': 1, 'title': 'pylowdb is awe

Hussein Sarea 3 Jan 26, 2022
Code for a db backend that relies on bash tools (grep, cat, echo, etc)

Simple-nosql-db is a python backend for a database that relies on unix tools such as cat, echo and grep. Funny enough I got the idea from this discuss

Sebastian Alonso 10 Aug 13, 2019
A simple GUI that interacts with a database to keep track of a collection of US coins.

CoinCollectorGUI A simple gui designed to interact with a database. The goal of the database is to make keeping track of collected coins simple. The G

Builder212 1 Nov 09, 2021
Elara DB is an easy to use, lightweight NoSQL database that can also be used as a fast in-memory cache.

Elara DB is an easy to use, lightweight NoSQL database written for python that can also be used as a fast in-memory cache for JSON-serializable data. Includes various methods and features to manipula

Saurabh Pujari 101 Jan 04, 2023
Given a metadata file with relevant schema, an SQL Engine can be run for a subset of SQL queries.

Mini-SQL-Engine Given a metadata file with relevant schema, an SQL Engine can be run for a subset of SQL queries. The query engine supports Project, A

Prashant Raj 1 Dec 03, 2021
Codeqlcompile - 自动反编译闭源应用,创建codeql数据库

codeql_compile 自动反编译闭源应用,创建codeql数据库 准备 首先下载ecj.jar和idea提供反编译的java-decompiler.ja

236 Jan 05, 2023
Postgres full text search options (tsearch, trigram) examples

postgres-full-text-search Postgres full text search options (tsearch, trigram) examples. Create DB CREATE DATABASE ftdb; To feed db with an example

Jarosław Orzeł 97 Dec 30, 2022
Simple embedded in memory json database

dbj dbj is a simple embedded in memory json database. It is easy to use, fast and has a simple query language. The code is fully documented, tested an

Pedro Gonring 25 Aug 12, 2022
Simpledb-py: Simple JSON database

Simpledb-py: Simple JSON database

тейлс 2 Feb 09, 2022
Makes google's political ad database actually useful

Making Google's political ad transparency library suck less This is a series of scripts that takes Google's political ad transparency data and makes t

The Guardian 7 Apr 28, 2022
TinyDB is a lightweight document oriented database optimized for your happiness :)

Quick Links Example Code Supported Python Versions Documentation Changelog Extensions Contributing Introduction TinyDB is a lightweight document orien

Markus Siemens 5.6k Dec 30, 2022
Youtube Kanalinda tanittigim ve Programladigim SQLite3 ile calisan Kütüphane Programi

SQLite3 Kütüphane Uygulamasi SQLite3 ile calisan Kütüphane Arayüzü Yükleme Yerel veritabani olusacaktir. Yaptiginiz islemler kaybolmaz! Temel Gereksin

Mikael Pikulski 6 Aug 13, 2022
Mongita is to MongoDB as SQLite is to SQL

Mongita is a lightweight embedded document database that implements a commonly-used subset of the MongoDB/PyMongo interface. Mongita differs from MongoDB in that instead of being a server, Mongita is

Scott Rogowski 809 Jan 07, 2023
MyReplitDB - the most simplistic and easiest wrapper to use for replit's database system.

MyReplitDB is the most simplistic and easiest wrapper to use for replit's database system. Installing You can install it from the PyPI Or y

kayle 4 Jul 03, 2022
AWS Tags As A Database is a Python library using AWS Tags as a Key-Value database.

AWS Tags As A Database is a Python library using AWS Tags as a Key-Value database. This database is completely free* 💸

Oren Leung 42 Nov 25, 2022
A Modular MWDB Utility to Collect Fresh Malware Samples

MWDB Feeds A Modular MWDB Utility to Collect Fresh Malware Samples This project is FREE as in FREE 🍺 , use it commercially, privately or however you

c3rb3ru5 32 Jul 07, 2022