A Python module and command line utility for working with web archive data using the WACZ format specification

Overview

py-wacz

The py-wacz repository contains a Python module and command line utility for working with web archive data using the WACZ format specification. Web Archive Collection Zipped (WACZ) allows web archives to be shared and distributed by providing a predictable way of packaging up web archive data and metadata as a ZIP file. The wacz command line utility supports converting any WARC files into WACZ files, and optionally generating full-text search indices of pages.

Install

Use pip to install the module and a command line utility:

pip install wacz

Once installed you can use the wacz command line utility to create and validate WACZ files.

Create

To create a WACZ package you can point wacz at a WARC file and tell it where to write the WACZ with the -o option:

wacz create -o myfile.wacz 
   

   

The resulting myfile.wacz should be loadable via ReplayWeb.page.

wacz accepts the following options for customizing how the WACZ file is assembled.

-f --file

Explicitly declare the file being passed to the create function.

wacz create -f tests/fixtures/example-collection.warc

-o --output

Explicitly declare the name of the wacz being created

wacz create tests/fixtures/example-collection.warc -o mywacz.wacz

-t --text

Generates pages.jsonl page index with a full-text index, must be run in conjunction with --detect-pages. Will have no effect if run alone

wacz create tests/fixtures/example-collection.warc -t

--detect-pages

Generates pages.jsonl page index without a full-text index

wacz create tests/fixtures/example-collection.warc --detect-pages

-p --pages

Overrides the pages index generation with the passed jsonl pages.

wacz create tests/fixtures/example-collection.warc -p passed_pages.jsonl

-t --text

You can add a full text index by including the --text tag

wacz create tests/fixtures/example-collection.warc -p passed_pages.jsonl --text

--ts

Overrides the ts metadata value in the datapackage.json file

wacz create tests/fixtures/example-collection.warc --ts TIMESTAMP

--url

Overrides the url metadata value in the datapackage.json file

wacz create tests/fixtures/example-collection.warc --url URL

--title

Overrides the titles metadata value in the datapackage.json file

wacz create tests/fixtures/example-collection.warc --title TITLE

--desc

Overrides the desc metadata value in the datapackage.json file

wacz create tests/fixtures/example-collection.warc --desc DESC

--hash-type

Allows the user to specify the hash type used: (sha256 or md5):

wacz create tests/fixtures/example-collection.warc --hash-type md5

Validate

You can also validate an existing WACZ file by running:

wacz validate myfile.wacz

-f --file

Explicitly declare the file being passed to the validate function.

wacz validate -f tests/fixtures/example-collection.warc

Testing

If you are developing wacz you can run the unit tests with pytest:

pytest tests
Owner
Webrecorder
Webrecorder Project
Webrecorder
Present - A terminal-based presentation tool with colors and effects.

present A terminal-based presentation tool with colors and effects. You can also play a codio (pre-recorded code block) on a slide. present is built o

Vinayak Mehta 4.2k Jan 03, 2023
Objexplore is an interactive Python object explorer for the terminal.

Objexplore is an interactive Python object explorer for the terminal. Use it while debugging, or exploring a new library, or whatever! 9D1FAC73-B2A5-4

kylepollina 249 Dec 23, 2022
Command line parser for common log format (Nginx default).

Command line parser for common log format (Nginx default).

Lucian Marin 138 Dec 19, 2022
3DigitDev 29 Jan 17, 2022
A command-line based, minimal torrent streaming client made using Python and Webtorrent-cli.

ABOUT A command-line based, minimal torrent streaming client made using Python and Webtorrent-cli. Installation pip install -r requirements.txt It use

Janardon Hazarika 17 Dec 11, 2022
Pyrdle - Play Wordle in the CLI. Write an algorithm to play Wordle for you. Ruin all of the fun you've been having

Pyrdle - Play Wordle in the CLI. Write an algorithm to play Wordle for you. Ruin all of the fun you've been having

Charles Tapley Hoyt 11 Feb 11, 2022
A communist shell written in Python

kash A communist shell written in Python It doesn't support escapes, quotes, comment lines, |, &&, , or similar yet. If you need help, get it from

Çınar Yılmaz 1 Dec 10, 2021
AthenaCLI is a CLI tool for AWS Athena service that can do auto-completion and syntax highlighting.

Introduction AthenaCLI is a command line interface (CLI) for the Athena service that can do auto-completion and syntax highlighting, and is a proud me

dbcli 192 Jan 07, 2023
Personal and work vim 8 configuration with submodules

vimfiles Windows Vim 8 configuration files based on the recommendations of Ruslan Osipov, Keep Your vimrc file clean and The musings of bluz71. :help

1 Aug 27, 2022
Fetch is use to get information about anything on the shell using Wikipedia.

Fetch Search wikipedia article on command line [Why This?] [Support the Project] [Installation] [Configuration] Why this? Fetch helps you to quickly l

Yash Singh 340 Dec 18, 2022
dotfilery, configuration, environment settings, automation, etc.

┌┬┐┌─┐┌─┐┌─┐┬ ┬┌┬┐┬ ┬┬┌─┐ │││├┤ │ ┬├─┤│ │ │ ├─┤││ :: bits & bobs, dots & things. ┴ ┴└─┘└─┘┴ ┴┴─┘┴ ┴ ┴ ┴┴└─┘ @megalithic 🚀 Instal

Seth Messer 89 Dec 25, 2022
A Terminal UI for Discord

ToastCord ToastCord is a Discord Terminal UI. At the moment you can only look at Direct messages. TODO: - Add support for guilds - Message sending sup

toast 82 Dec 18, 2022
A command line tool to publish ads on ebay-kleinanzeigen.de

kleinanzeigen-bot Feedback and high-quality pull requests are highly welcome! About Installation Usage Development Notes License About kleinanzeigen-b

83 Dec 26, 2022
A simple CLI productivity tool to quickly display the syntax of a desired piece of code

Iforgor Iforgor is a customisable and easy to use command line tool to manage code samples. It's a good way to quickly get your hand on syntax you don

Solaris 21 Jan 03, 2023
Faza - Faza terminal, Faza help to beginners for pen testing

Faza terminal simple tool for pen testers Use small letter only for commands Don't use space after command 'help' for more information Installation gi

Ag3ntQ 5 Feb 20, 2022
adds flavor of interactive filtering to the traditional pipe concept of UNIX shell

percol __ ____ ___ ______________ / / / __ \/ _ \/ ___/ ___/ __ \/ / / /_/ / __/ / / /__/ /_/ / / / .__

Masafumi Oyamada 3.2k Jan 07, 2023
CLI translator based on Google translate API

Translate-CLI CLI переводчик основанный на Google translate API как пользоваться ? запустить в консоли скомпилированный скрипт (exe - windows, bin - l

7 Aug 13, 2022
Wordle - Wordle solver with python

wordle what is wordle? https://www.powerlanguage.co.uk/wordle/ preparing $ pip i

shidocchi 0 Jan 24, 2022
A simple terminal-based localhost chat application written in python

Chat House A simple terminal-based localhost chat application written in python How to Use? Clone the repo git clone https://github.com/heksadecimal/c

Heks 10 Nov 09, 2021
Urial (URI Addition tooL) intelligently updates URIs stored in Finder comments of macOS files

Urial Urial (URI addition tool) is a simple but intelligent command-line tool to add or replace URIs found inside macOS Finder comments. Table of cont

Mike Hucka 3 Sep 14, 2022