I³ Tracker for Essential Open Innovation Datasets

Overview

This repository tracks, versions, and accepts contributions to the I³ Essential Open Innovation Dataset Index, a set of lists of datasets and tools relevant to innovation data. The index can be edited collaboratively, either by editing the markdown files in this repository or by editing metadata in the Google Sheet.

The repository checks the Google Sheet for changes every 5 minutes and updates the site if there are any. It also rebuilds the site automatically when somebody makes an edit via git. The site is generated from the markdown files in this repository using the static site generator Jekyll.
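The "update only if something changed" check can be pictured as comparing a fingerprint of the last known sheet export with the latest one. The sketch below is an illustration only: the function names are hypothetical, and the repository's actual workflow may detect changes differently (for example by letting git diff the committed csv).

```python
import hashlib


def fingerprint(csv_text):
    """Return a stable hash of a CSV export, ignoring line-ending differences."""
    normalized = csv_text.replace("\r\n", "\n").strip()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def needs_rebuild(previous_csv, latest_csv):
    """True when the latest export differs from the previously stored one."""
    return fingerprint(previous_csv) != fingerprint(latest_csv)


# Hypothetical exports: the second adds one row, the third only changes line endings.
old_export = "title,url\nExample dataset,https://example.org/data\n"
new_export = old_export + "Another dataset,https://example.org/other\n"
same_export = old_export.replace("\n", "\r\n")

print(needs_rebuild(old_export, new_export))
print(needs_rebuild(old_export, same_export))
```

Normalising line endings before hashing avoids a rebuild when the export differs only in formatting.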

Add/edit a Dataset using Git

Each record in the index has a corresponding auto-generated markdown file in the folder datasets/. The frontmatter of each file holds the record's basic metadata, and the body can hold longer-form information such as details of queries, images, and other written notes. Both the frontmatter and the body are editable.

When a markdown file is added to the datasets/ folder, a GitHub action publishes the metadata in its frontmatter to the Google Sheet and to the archive csv, to keep the records up to date. This action calls various metadata scrapers to automatically pull information like permalinks, citation information, and versioning. Once the file has been successfully committed, a second action runs to refresh the website so that it reflects the edits.
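To make the frontmatter-to-csv step concrete, here is a minimal sketch of how a dataset file could be split into metadata and body and then flattened into one csv row. It is not the repository's actual action: the function names, the column order, and the simple `key: value` parsing (rather than a full YAML parser) are all assumptions made for illustration.

```python
import csv
import io


def parse_frontmatter(text):
    """Split a markdown string into (metadata dict, body text).

    Only flat "key: value" lines are handled; this is not a full YAML parser.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no frontmatter block at the top of the file
    meta = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":  # closing delimiter: the rest is body text
            return meta, "\n".join(lines[i + 1:]).strip()
        key, sep, value = line.partition(":")
        if sep:
            value = value.strip()
            # Template hints such as "#required" or "#scrapeable" are not values.
            meta[key.strip()] = "" if value.startswith("#") else value
    return {}, text  # frontmatter was never closed


def to_csv_row(meta, columns):
    """Serialise the chosen columns of a metadata dict as one csv line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, extrasaction="ignore")
    writer.writerow({column: meta.get(column, "") for column in columns})
    return buffer.getvalue()


# Hypothetical dataset file, following the shape of the template.
example = """---
title: Example dataset
url: https://example.org/data
doi: #scrapeable
---

Notes about queries go here.
"""

meta, body = parse_frontmatter(example)
row = to_csv_row(meta, ["title", "url", "doi"])
print(meta)
print(row)
```

Unknown keys are kept in the metadata dict and simply ignored when writing the row, so extra fields in a file do not cause an error in this sketch.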

Contribution Steps

  1. Fork the repository and create a markdown file in the folder 'datasets' based on the template file.
  2. Add as much metadata as you like, and create a pull request in this repository.
  3. All being well, the pull request should merge automatically. If it doesn't, first make sure the file is in the correct folder and has a .md file extension, then check the GitHub actions log or open an issue.

If the dataset is hosted on a platform with parseable citation metadata (for example Dataverse, Zenodo, ICPSR, and major university repositories), then the tool will automatically pull most of the data associated with the dataset -- fields that will auto-fill are indicated by a comment. If the dataset is hosted on e.g. a personal site, then you might want to include some more information -- but ultimately, only a title and URL are really necessary. However you fill out your dataset, a uuid and timestamp will be generated for it automatically; you don't need to include these fields, which is why they are not in the template.
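As a rough picture of what "only a title and URL" plus an automatic uuid and timestamp means for a record, consider the sketch below. The function name, the key names `uuid` and `timestamp`, the use of a random version-4 UUID, and the UTC ISO 8601 timestamp format are assumptions for illustration; the repository's own tooling may generate these values differently.

```python
import uuid
from datetime import datetime, timezone

REQUIRED_FIELDS = ("title", "url")


def finalize_record(meta):
    """Check the required fields and add generated identifiers to a copy of meta."""
    missing = [field for field in REQUIRED_FIELDS
               if not str(meta.get(field, "")).strip()]
    if missing:
        raise ValueError("missing required field(s): " + ", ".join(missing))
    record = dict(meta)  # leave the caller's dict untouched
    # Existing values are kept so that re-processing a record is stable.
    record.setdefault("uuid", str(uuid.uuid4()))
    record.setdefault("timestamp", datetime.now(timezone.utc).isoformat())
    return record


# Hypothetical minimal contribution: just a title and a URL.
record = finalize_record({"title": "Example dataset",
                          "url": "https://example.org/data"})
print(sorted(record))
```

A file with an empty title or url raises a `ValueError` here, which mirrors the idea that those two fields are the only ones a contributor must supply.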

The reason we've done this is to save you from copy-pasting a lot of information from existing repositories, and to make it easier for you to curate more useful and harder-to-scrape metadata -- such as the timeframes of datasets, links to code and documentation, and datasets that build on a given dataset but don't cite it in an easy-to-parse way. So definitely prioritise these fields!

If you're unsure of how to make a pull request, GitHub has some good guides to doing this. You can also just edit the Google Sheet, which has an equivalent effect.

If there's a piece of metadata you think we should collect but don't, please add it to the frontmatter of the markdown files you contribute. (Nothing will break!) Then open an issue mentioning the new field, so that we can discuss adding it to the repository officially too.

To contribute a new dataset via pull request, please use the template file datasets/_template.md as a reference:

---
title: #required
url: #required
doi: #scrapeable
citation: #scrapeable
description: #scrapeable
timeframe:
documentation:
error_metrics:
code:
versioning: #scrapeable
terms_of_use: #scrapeable
tags:
references:
---


body text. info about `queries`, links and images goes here :)

Collections

The site also indexes collections, which are pages containing thematic information about datasets, tools and resources. These are housed in the folder collections/. The collection intro.md is an example -- this particular collection is also rendered on the front page of the site.

As with datasets, collection files can be added or edited using pull requests: fork the repository, make your additions or edits to the collections, and open a pull request. The collections are not currently tracked via Google Sheets, and so may only be edited via git.

To create a new collection, the collection template may be copied to use as a reference:

---
title:
author:
tags:
---

Collections are a way to list resources around a theme, relevant to a research agenda or set of papers, or as an introduction to various aspects of the field. They are formatted in markdown:

To list a dataset that's in the index, use a relative link, e.g.

```markdown
[local dataset name](/datasets/dataset_shortname)
```

Dataset shortnames can be found either by looking at the URLs directly, or through the 'shortnames' column of the Google Sheet.

Index

A versioned .csv file containing the index may be accessed in the folder index_archive. If you'd like to browse and query the index, you can do so using GitHub's Flat Data tool. The GitHub Action that pulls the sheet is based on DoltHub's Gsheets-to-csv action.
