Python virtual filesystem for SQLite to read from and write to S3

Overview

sqlite-s3vfs CircleCI Test Coverage

Python virtual filesystem for SQLite to read from and write to S3.

No locking is performed, so client code must ensure that writes do not overlap with other writes or reads. If multiple writes happen at the same time, the database will probably become corrupt and data be lost.

Inspired by phiresky's sql.js-httpvfs, dacort's Stack Overflow answer, and sqlite-s3-query.

How does it work?

sqlite-s3vfs stores the SQLite database in fixed-sized blocks, and each is stored as a separate object in S3. SQLite stores its data in fixed-size pages, and always writes exactly a page at a time. This virtual filesystem translates page reads and writes to block reads and writes. In the case of SQLite pages being the same size as blocks, which is the case by default, each page write results in exactly one block write.

Separate objects are required since S3 does not support the partial replace of an object; to change even 1 byte, it must be re-uploaded in full.

Installation

sqlite-s3vfs depends on APSW, which is not officially available on PyPI, but can be installed directly from GitHub.

pip install sqlite-s3vfs
pip install https://github.com/rogerbinns/apsw/releases/download/3.36.0-r1/apsw-3.36.0-r1.zip --global-option=fetch --global-option=--version --global-option=3.36.0 --global-option=--sqlite --global-option=build --global-option=--enable-all-extensions

Usage

sqlite-s3vfs is an APSW virtual filesystem that requires boto3 for its communication with S3.

import apsw
import boto3
import sqlite_s3vfs

# A boto3 bucket resource
bucket = boto3.Session().resource('s3').Bucket('my-bucket')

# An S3VFS for that bucket
s3vfs = sqlite_s3vfs.S3VFS(bucket=bucket)

# sqlite-s3vfs stores many objects under this prefix
# Note that it's not typical to start a key prefix with '/'
key_prefix = 'my/path/cool.sqlite'

# Connect, insert data, and query
with apsw.Connection(key_prefix, vfs=s3vfs.name) as db:
    cursor = db.cursor()
    cursor.execute(f'''
        CREATE TABLE foo(x,y);
        INSERT INTO foo VALUES(1,2);
    ''')
    cursor.execute('SELECT * FROM foo;')
    print(cursor.fetchall())

See the APSW documentation for more examples.

Serializing (getting a regular SQLite file out of the VFS)

The bytes corresponding to a regular SQLite file can be extracted with the serialize_iter function, which returns an iterable,

for chunk in s3vfs.serialize_iter(key_prefix=key_prefix):
    print(chunk)

or with serialize_fileobj, which returns a non-seekable file-like object. This can be passed to Boto3's upload_fileobj method to upload a regular SQLite file to S3.

target_obj = boto3.Session().resource('s3').Bucket('my-target-bucket').Object('target/cool.sqlite')
target_obj.upload_fileobj(s3vfs.serialize_fileobj(key_prefix=key_prefix))

Deserializing (getting a regular SQLite file into the VFS)

# Any iterable that yields bytes can be used. In this example, bytes come from
# a regular SQLite file already in S3
source_obj = boto3.Session().resource('s3').Bucket('my-source-bucket').Object('source/cool.sqlite')
bytes_iter = source_obj.get()['Body'].iter_chunks()

s3vfs.deserialize_iter(key_prefix='my/path/cool.sqlite', bytes_iter=bytes_iter)

Tests

The tests require the dev dependencies and APSW to installed, and MinIO started

pip install -r requirements-dev.txt
pip install https://github.com/rogerbinns/apsw/releases/download/3.36.0-r1/apsw-3.36.0-r1.zip --global-option=fetch --global-option=--version --global-option=3.36.0 --global-option=--all --global-option=build --global-option=--enable-all-extensions
./start-minio.sh

can be run with pytest

pytest

and finally Minio stopped

./stop-minio.sh
Owner
Department for International Trade
Department for International Trade
Small Python script to generate a calendar (.ics) file from SIMASTER courses schedule.

simaster.ics Small Python script to generate a calendar (.ics) file from SIMASTER courses schedule. Usage Getting the events.json file from SIMASTER O

Faiz Jazadi 8 Nov 02, 2022
File-manager - A basic file manager, written in Python

File Manager A basic file manager, written in Python. Installation Install Pytho

Samuel Ko 1 Feb 05, 2022
shred - A cross-platform library for securely deleting files beyond recovery.

shred Help the project financially: Donate: https://smartlegion.github.io/donate/ Yandex Money: https://yoomoney.ru/to/4100115206129186 PayPal: https:

4 Sep 04, 2021
Object-oriented file system path manipulation

path (aka path pie, formerly path.py) implements path objects as first-class entities, allowing common operations on files to be invoked on those path

Jason R. Coombs 1k Dec 28, 2022
Remove [x]_ from StudIP zip Archives and archive_filelist.csv completely

This tool removes the "[x]_" at the beginning of StudIP zip Archives. It also deletes the "archive_filelist.csv" file

Kelke vl 1 Jan 19, 2022
CSV To VCF (Multiples en un archivo)

CSV To VCF Convierte archivo CSV a Tarjeta VCF (varias en una) How to use En main.py debes reemplazar CONTACTOS.csv por tu archivo csv, y debes respet

Jorge Ivaldi 2 Jan 12, 2022
Various converters to convert value sets from CSV to JSON, etc.

ValueSet Converters Tools for converting value sets in different formats. Such as converting extensional value sets in CSV format to JSON format able

Health Open Terminology Ecosystem 4 Sep 08, 2022
Dragon Age: Origins toolset to extract/build .erf files, patch language-specific .dlg files, and view the contents of files in the ERF or GFF format

DAOTools This is a set of tools for Dragon Age: Origins modding. It can patch the text lines of .dlg files, extract and build an .erf file, and view t

8 Dec 06, 2022
Uncompress DEFLATE streams in pure Python

stream-inflate Uncompress DEFLATE streams in pure Python. Installation pip install stream-inflate Usage from stream_inflate import stream_inflate impo

Michal Charemza 7 Oct 13, 2022
Extract the windows major and minor build numbers from an ISO file, and automatically sort the iso files.

WindowsBuildFromISO Extract the windows major and minor build numbers from an ISO file, and automatically sort the iso files. Features Parse multiple

Podalirius 9 Nov 09, 2022
A JupyterLab extension that allows opening files and directories with external desktop applications.

A JupyterLab extension that allows opening files and directories with external desktop applications.

martinRenou 0 Oct 14, 2021
Extract an archive file (zip file or tar file) stored on AWS S3

S3 Extract Extract an archive file (zip file or tar file) stored on AWS S3. Details Downloads archive from S3 into memory, then extract and re-upload

Evan 1 Dec 14, 2021
LightCSV - This CSV reader is implemented in just pure Python.

LightCSV Simple light CSV reader This CSV reader is implemented in just pure Python. It allows to specify a separator, a quote char and column titles

Jose Rodriguez 6 Mar 05, 2022
Copy only text-like files from the folder

copy-only-text-like-files-from-folder-python copy only text-like files from the folder This project is for those who want to copy only source code or

1 May 17, 2022
Listreqs is a simple requirements.txt generator. It's an alternative to pipreqs

⚡ Listreqs Listreqs is a simple requirements.txt generator. It's an alternative to pipreqs. Where in Pipreqs, it helps you to Generate requirements.tx

Soumyadip Sarkar 4 Oct 15, 2021
Python library for reading and writing tabular data via streams.

tabulator-py A library for reading and writing tabular data (csv/xls/json/etc). [Important Notice] We have released Frictionless Framework. This frame

Frictionless Data 231 Dec 09, 2022
Nmap XML output to CSV and HTTP/HTTPS URLS.

xml-to-csv-url Convert NMAP's XML output to CSV file and print URL addresses for HTTP/HTTPS ports. NOTE: OS Version Parsing is not working properly ye

1 Dec 21, 2021
A simple file module for creating, editing and saving files.

A simple file module for creating, editing and saving files.

1 Nov 25, 2021
A tiny Python library for writing multi-channel TIFF stacks.

xtiff A tiny Python library for writing multi-channel TIFF stacks. The aim of this library is to provide an easy way to write multi-channel image stac

23 Dec 27, 2022
Python function to stream unzip all the files in a ZIP archive: without loading the entire ZIP file or any of its files into memory at once

Python function to stream unzip all the files in a ZIP archive: without loading the entire ZIP file or any of its files into memory at once

Department for International Trade 206 Jan 02, 2023