The Internet Archive Research Assistant - Daily search Internet Archive for new items matching your keywords

Last update: Dec 25, 2022

Overview

tiara - The Internet Archive Research Assistant

by Kay Savetz, May 2021.

Searches the Internet Archive, using its full text search, for new items matching the keywords you specify. Run this script once a day via crontab for daily updates about new items relevant to your ongoing research subjects. It keeps track of the items it has already found, so it only alerts you to new-to-you items. The script writes its findings to an HTML file and can optionally email that file to you via SendGrid or your system mail (e.g. Sendmail or Postfix).
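The "new-to-you" bookkeeping described above can be sketched roughly as follows. This is a minimal illustration and not tiara's actual code: the file name seen_items.txt and all function names are hypothetical, and it assumes each search hit has already been reduced to its Internet Archive identifier string.

```python
from pathlib import Path


def load_seen(path):
    """Return the set of identifiers recorded on earlier runs (empty if no file yet)."""
    p = Path(path)
    if not p.exists():
        return set()
    lines = p.read_text(encoding="utf-8").splitlines()
    return {line.strip() for line in lines if line.strip()}


def filter_new(found_ids, seen):
    """Keep identifiers not seen before, preserving order and dropping repeats."""
    new_ids = []
    for identifier in found_ids:
        if identifier not in seen and identifier not in new_ids:
            new_ids.append(identifier)
    return new_ids


def record_seen(path, new_ids):
    """Append newly reported identifiers so later runs skip them."""
    with open(path, "a", encoding="utf-8") as fh:
        for identifier in new_ids:
            fh.write(identifier + "\n")
```

A daily run would call load_seen, pass the day's hits through filter_new, report only the result, and then call record_seen. Keeping the list in a plain file next to the script is one simple way to meet the stated need for read-write access to the script's directory.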

Put your keywords in searchlist.txt, one search term per line. Very general terms (like "dogs") provide too many daily hits to be useful. More specific phrases work better.
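Reading a one-term-per-line file such as searchlist.txt might look like the sketch below. The function name read_search_terms is hypothetical, and skipping blank lines and trimming surrounding whitespace are assumptions; the original does not say how tiara treats them.

```python
def read_search_terms(path="searchlist.txt"):
    """Return the non-empty lines of the file, one search term per line."""
    with open(path, encoding="utf-8") as fh:
        # Assumption: blank lines are ignored and surrounding whitespace is trimmed.
        return [line.strip() for line in fh if line.strip()]
```

With a file containing the lines Pliny the Elder and Atari 800, this returns those two phrases as a list of strings.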

Dependency: Internet Archive command line tool (install with pip install internetarchive). The script also requires read-write access to the directory it lives in.
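Both requirements can be checked before a first run with a small preflight sketch like this one. It is not part of tiara: the function name check_environment is hypothetical, and it assumes the command line tool is available on PATH under the name ia, which is the command that the internetarchive package installs.

```python
import os
import shutil


def check_environment(script_dir):
    """Return a list of human-readable problems; an empty list means both checks passed."""
    problems = []
    # `ia` is the command installed by `pip install internetarchive`.
    if shutil.which("ia") is None:
        problems.append("Internet Archive command line tool 'ia' not found on PATH")
    # The script needs to read and write files in its own directory.
    if not os.access(script_dir, os.R_OK | os.W_OK):
        problems.append(f"no read-write access to {script_dir}")
    return problems
```

Calling check_environment with the directory that holds the script and printing any returned messages is enough to catch a missing install or a read-only directory before the cron job is set up.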

Issue: Internet Archive cannot generate thumbnails for all items. In these cases, you may see a broken image icon.

Issue: Internet Archive's full text search doesn't seem to allow exact phrase matching. So a search for "Pliny The Elder" may turn up items mentioning Pliny The Younger, or with "Pliny" on one page and "elder" on another.
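If you have the matched text of a hit on hand, one possible workaround for the phrase-matching issue is to post-filter results locally. The sketch below is not something tiara does: the function name contains_exact_phrase is hypothetical, and it assumes you can obtain a text snippet for each hit, which the description above does not promise.

```python
import re


def contains_exact_phrase(text, phrase):
    """True if `phrase` occurs in `text` as consecutive words, ignoring case and spacing."""
    words = [re.escape(word) for word in phrase.split()]
    if not words:
        return False
    # Words must appear in order, separated only by whitespace.
    pattern = r"\b" + r"\s+".join(words) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None
```

A snippet mentioning only Pliny the Younger, or one with "Pliny" and "elder" far apart, would be rejected, while "Pliny the Elder" in any letter case would pass.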

If you find this tool useful, please donate to the Internet Archive.

Contains descriptions and code of the mini-projects developed in various programming languages

KEGE (computer-based Unified State Exam) informatics tasks for 2021, in Python

A Japanese tokenizer based on recurrent neural networks

This project, built for the FALLABOUT2021 event under SRMMIC, deals with NLP poetry generation.

Modeling cumulative cases of Covid-19 in the US during the Covid-19 Delta wave using Bayesian methods.

Training open neural machine translation models

Task-based datasets, preprocessing, and evaluation for sequence models.

BERT has a Mouth, and It Must Speak: BERT as a Markov Random Field Language Model

A crowdsourced dataset of dialogues grounded in social contexts involving utilization of commonsense.

Contact Extraction with Question Answering.

Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing

Deeply Supervised, Layer-wise Prediction-aware (DSLP) Transformer for Non-autoregressive Neural Machine Translation

A 30000+ Chinese MRC dataset - Delta Reading Comprehension Dataset

GooAQ 🥑 : Google Answers to Google Questions!

PyTorch Implementation of the paper Single Image Texture Translation for Data Augmentation

Scikit-learn style model finetuning for NLP

Code for Findings of ACL 2022 Paper "Sentiment Word Aware Multimodal Refinement for Multimodal Sentiment Analysis with ASR Errors"

Official code for "Parser-Free Virtual Try-on via Distilling Appearance Flows", CVPR 2021

PyTorch implementation of convolutional neural networks-based text-to-speech synthesis models

This project is part of Eleuther AI's quest to create a massive repository of high quality text data for training language models.