Python function to stream unzip all the files in a ZIP archive: without loading the entire ZIP file or any of its files into memory at once

Overview

stream-unzip CircleCI Test Coverage

Python function to stream unzip all the files in a ZIP archive, without loading the entire ZIP file into memory or any of its uncompressed files.

While the ZIP format does have its main directory at the end, each compressed file in the archive can be prefixed with a header that contains its name, compressed size, and uncompressed size: this is what makes streaming decompression of ZIP files possible.

Unfortunately not all ZIP files have this: some have their compressed and uncompressed sizes after the file data in the stream. In this case a ValueError will be raised.

Installation

pip install stream-unzip

Usage

A single function is exposed, stream_unzip, that takes a single argument: an iterable that should yield the bytes of a ZIP file. It returns an iterable, where each yielded item is a tuple of the file name, file size, and another iterable itself yielding the unzipped bytes of that file.

from stream_unzip import stream_unzip
import httpx

def zipped_chunks():
    # Any iterable that yields a zip file
    with httpx.stream('GET', 'https://www.example.com/my.zip') as r:
        yield from r.iter_bytes()

for file_name, file_size, unzipped_chunks in stream_unzip(zipped_chunks()):
    for chunk in unzipped_chunks:
        print(chunk)

The file name and file size are extracted as reported from the file. If you don't trust the creator of the ZIP file, these should be treated as untrusted input.

Owner
Department for International Trade
Department for International Trade
A simple tool to find and replace all the matches of a regular expression in file(s).

FindREp A simple tool to find and replace all the matches of a regular expression in file(s). You can either select the file(s) directly or select a f

Biraj 5 Oct 18, 2022
A simple file sharing tool written in python

Share it A simple file sharing tool written in python Installation If you are using Windows os you can directly Run .exe file -- download If you are

Sachit Yadav 7 Dec 16, 2022
Automatically generates a TypeQL script for doing entity and relationship insertions from a .csv file, so you don't have to mess with writing TypeQL.

Automatically generates a TypeQL script for doing entity and relationship insertions from a .csv file, so you don't have to mess with writing TypeQL.

3 Feb 09, 2022
A python script generate password files in plain text

KeePass (or any desktop pw manager?) Helper WARNING: This script will generate password files in plain text. ITS NOT SECURE. I needed help remembering

Eric Thomas 1 Nov 21, 2021
A tiny Configuration File Parser for Python Projects

A tiny Configuration File Parser for Python Projects. Currently working on JSON Config Files only.

Tanmoy Sen Gupta 1 Feb 12, 2022
PaddingZip - a tool that you can craft a zip file that contains the padding characters between the file content.

PaddingZip - a tool that you can craft a zip file that contains the padding characters between the file content.

phithon 53 Nov 07, 2022
Yadl - it is a simple library for working with both dotenv files and environment variables.

Yadl Yadl - it is a simple library for working with both dotenv files and environment variables. Features Validation of whitespaces. Validation of num

Ivan Kapranov 3 Oct 19, 2021
Media file renamer and organizion tool

mnamer mnamer (media renamer) is an intelligent and highly configurable media organization utility. It parses media filenames for metadata, searches t

Jessy Williams 533 Dec 29, 2022
BREP : Binary Search in plaintext and gzip files

BREP : Binary Search in plaintext and gzip files Search large files in O(log n) time using binary search. We support plaintext and Gzipped files. Benc

Arnaud de Saint Meloir 5 Dec 24, 2021
Find potentially sensitive files

find_files Find potentially sensitive files This script searchs for potentially sensitive files based off of file name or string contained in the file

4 Aug 20, 2022
This is just a GUI that detects your file's real extension using the filetype module.

Real-file.extnsn This is just a GUI that detects your file's real extension using the filetype module. Requirements Python 3.4 and above filetype modu

1 Aug 08, 2021
fast change directory with python and ruby

fcdir fast change directory with python and ruby run run python script , chose drirectoy and change your directory need you need python and ruby deskt

XCO 2 Jun 20, 2022
The best way to convert files on your computer, be it .pdf to .png, .pdf to .docx, .png to .ico, or anything you can imagine.

The best way to convert files on your computer, be it .pdf to .png, .pdf to .docx, .png to .ico, or anything you can imagine.

JareBear 2 Nov 20, 2021
OnedataFS is a PyFilesystem interface to Onedata virtual file system

OnedataFS OnedataFS is a PyFilesystem interface to Onedata virtual file system. As a PyFilesystem concrete class, OnedataFS allows you to work with On

onedata 0 Jan 10, 2022
Simple, convenient and cross-platform file date changing library. 📝📅

Simple, convenient and cross-platform file date changing library.

kubinka0505 15 Dec 18, 2022
A Python library that provides basic functions to read / write Aseprite format files

A Python library that provides basic functions to read / write Aseprite format files

Joe Trewin 1 Jan 13, 2022
Python code snippets for extracting PDB codes from .fasta files

Python_snippets_for_bioinformatics Python code snippets for extracting PDB codes from .fasta files If you have a single .fasta file for all protein se

Sofi-Mukhtar 3 Feb 09, 2022
Python virtual filesystem for SQLite to read from and write to S3

Python virtual filesystem for SQLite to read from and write to S3

Department for International Trade 70 Jan 04, 2023
This python project contains a class FileProcessor which allows one to grab a file and get some meta data and header information from it

This python project contains a class FileProcessor which allows one to grab a file and get some meta data and header information from it. In the current state, it outputs a PrettyTable to txt file as

Joshua Wren 1 Nov 09, 2021
A tool written in python to generate basic repo files from github

A tool written in python to generate basic repo files from github

Riley 7 Dec 02, 2021