Repositorio com arquivos processados da CPI da COVID para facilitar analise

Related tags

Miscellaneouscpi4all
Overview

cpi4all

Repositorio com arquivos processados da CPI da COVID para facilitar analise

Organização

No site do senado é possivel encontrar a lista de todos os documentos coletados pela CPI da COVID.

A tabela no site possui a seguinte estrutura:

No Arquivos Data de recebimento Remetente Origem Descrição Caixa Em Resposta
1 Link1 ... ... ... ... ... ...
2 Link2/link3 ... ... ... ... ... ...

Esses links levam ao download de arquivos PDF com os documentos em questão.

Nesse repositorio você podera encontrar a versão txt desses arquivos. O nome do arquivo nesse repositorio é formado por <No do documento>_<numero do link>. Por exemplo:

link1 = 1_1 porque ele é relativo ao arquivo No 1, e é o primeiro link.

link2 = 2_1 porque ele é relativo ao arquivo No 2, e é o primeiro link dessa linha.

link3 = 2_2 porque ele é relativo ao arquivo No 2, e é o segundo link da linha.

A versão texto de todos os documentos está na pasta database/txts/.

Exemplos:

Arquivo No 1, primeiro link: 1_1

Arquivo No 4, quarto link: 3_4

Nota 1: Nem todos os arquivos foram convertidos ainda

Nota 2: A conversão usa reconhecimento de imagem e pode ficar bem ruim as vezes, gerando erros ortograficos ou palavras sem nexo algum.

Para desenvolvedores

Os scripts funcionam na seguinte sequencia:

  1. extract_rows.py: Vai no site do senado e extrai as informações de cada linha da tabela. Todos os dados são salvos em database/rows.
  2. extract_headers.py: Para cada link em cada linha, esse script pega metadados do arquivo (tamanho, tipo) que vão ser uteis depois. Esses dados são salvos em database/headers.
  3. download_pdfs.py: Baixa todos os PDFs descritos em database/headers e salva em database/pdfs.
  4. convert_pdf_to_jpg.py: Converte todos os PDFs em database/pdfs para imagens em database/jpgs.
  5. convert_jpg_to_txt.py: Converte todos as imagens em database/jpgs para texto em database/txt.

Por motivos de performance, apenas as pastas database/rows, database/headers e database/txts sao salvas nesse repositorio.

TODO: 0. Melhorar esse readme :)

  1. Usar o githubpages para gerar um site estatico que permite pesquisar em todos os txt
  2. Terminar de converter todos os arquivos
  3. Investigar arquivos em que a conversão ficou pessima.
  4. Fazer extração automatica de datas e prover um json com a ordem cronologica dos arquivos.
Owner
Breno Rodrigues Guimarães
Breno Rodrigues Guimarães
An-7 tool for python

***An-7 tool - Anonime-X Team*** An-x Menu : SPAM Android web malware interpreter Spam Tools : scampages letters mailers smtpcrack wpbrute shell Andro

Hamza Anonime 8 Nov 18, 2021
A simple PID tuner and simulator.

PIDtuner-V0.1 PlantPy PID tuner version 0.1 Features Supports first order and ramp process models. Supports Proportional action on PV or error or a sp

3 Jun 23, 2022
Change ACLs for QNAP LXD unprivileged container.

qnaplxdunpriv If Advanced Folder Permissions is enabled in QNAP NAS, unprivileged LXD containers won't start. qnaplxdunpriv changes ACLs of some Conta

1 Jan 10, 2022
Starscape is a Blender add-on for adding stars to the background of a scene.

Starscape Starscape is a Blender add-on for adding stars to the background of a scene. Features The add-on provides the following features: Procedural

Marco Rossini 5 Jun 24, 2022
The purpose of this tool is to check RDP capabilities of a user on specific targets.

RDPChecker The purpose of this tool is to check RDP capabilities of a user on specific targets. Programming concept was taken from RDPassSpray and thu

Hypnoze57 57 Aug 04, 2022
Swim between bookmarks in the Windows terminal

Marlin Swim between bookmarks in the terminal! Marlin is an easy to use bookmark manager for the terminal. Choose a folder, bookmark it and swim there

wilfredinni 7 Nov 03, 2022
An electron application to check battery of bluetooth devices connected to linux devices.

bluetooth-battery-electron An electron application to check battery of bluetooth devices connected to linux devices. This project provides an electron

Vasu Sharma 15 Dec 03, 2022
A python package to adjust the bias of probabilistic forecasts/hindcasts using "Mean and Variance Adjustment" method.

Documentation A python package to adjust the bias of probabilistic forecasts/hindcasts using "Mean and Variance Adjustment" method. Read documentation

1 Feb 02, 2022
Feapder的管道扩展

FEAPDER 管道扩展 简介 此模块为feapder的pipelines扩展,感谢广大开发者对feapder的贡献 随着feapder支持的pipelines越来越多,为减少feapder的体积,特将pipelines提出,使用者可按需安装 管道 PostgreSQL 贡献者:沈瑞祥 联系方式:r

boris 9 Dec 07, 2022
Python script which allows for automatic registration in Golfbox

Python script which allows for automatic registration in Golfbox

Guðni Þór Björnsson 8 Dec 04, 2021
Arabic to Roman Converter in Python

Arabic-to-Roman-Converter Made together with https://github.com/goltaraya . Arabic to Roman Converter in Python. -Instructions: 1 - Make sure you have

Pedro Lucas Tomazeti Fernandes 6 Oct 28, 2021
Annotates sequences with Eggnog-mapper and hhblits against PDB70

Annotating "hypothetical" proteins with the PDB See config/ for configuration information. This workflow takes as input a set of protein sequences. It

1 Apr 05, 2022
Coffeematcher is a python library to randomly match participants for coffee meetings.

coffeematcher coffeematcher is a python library to randomly match participants for coffee meetings. Installation Clone the repository: git clone https

Thomas Wesselink 3 May 06, 2022
The most widely used Python to C compiler

Welcome to Cython! Cython is a language that makes writing C extensions for Python as easy as Python itself. Cython is based on Pyrex, but supports mo

7.6k Jan 03, 2023
Blender pluggin (python script) that adds a randomly generated tree with random branches and bend orientations

Blender pluggin (python script) that adds a randomly generated tree with random branches and bend orientations

Travis Gruber 2 Dec 24, 2021
Transform a Google Drive server into a VFX pipeline ready server

Google Drive VFX Server VFX Pipeline About The Project Quick tutorial to setup a Google Drive Server for multiple machines access, and VFX Pipeline on

Valentin Beaumont 17 Jun 27, 2022
This is a fork of the BakeTool with some improvements that I did to have better workflow.

blender-bake-tool This is a fork of the BakeTool with some improvements that I did to have better workflow. 99.99% of work was done by BakeTool team.

Acvarium 3 Oct 04, 2022
Simple application that does transformation with HPF and LPFs.

Simple application that applies Butterworth, Gaussian & Ideal kernels on HPF and LPFs -aka Frequency Domain Filtering- Upload image from sidebar, set

Merve Noyan 3 Jul 06, 2022
Alerts for Western Australian Covid-19 exposure locations via email and Slack

WA Covid Mailer Sends alerts from Healthy WA's Covid19 Exposure Locations via email and slack. Setup Edit the configuration items in wacovidmailer.py

13 Mar 29, 2022
Unofficial Python implementation of the DNMF overlapping community detection algorithm

DNMF Unofficial Python implementation of the Discrete Non-negative Matrix Factorization (DNMF) overlapping community detection algorithm Paper Ye, Fan

Andrej Janchevski 3 Nov 30, 2021