Dados Públicos de CNPJ disponibilizados pela Receita Federal do Brasil

Overview

Dados Públicos CNPJ

  • Fonte oficial da Receita Federal do Brasil, aqui.
  • Layout dos arquivos, aqui.

A Receita Federal do Brasil disponibiliza bases com os dados públicos do cadastro nacional de pessoas jurídicas (CNPJ).

De forma geral, nelas constam as mesmas informações que conseguimos ver no cartão do CNPJ, quando fazemos uma consulta individual, acrescidas de outros dados de Simples Nacional, sócios e etc. Análises muito ricas podem sair desses dados, desde econômicas, mercadológicas até investigações.

Nesse repositório consta um processo de ETL para i) baixar os arquivos; ii) descompactar; iii) ler, tratar e iv) inserir num banco de dados relacional PostgreSQL.


Infraestrutura necessária:

  • Python 3.8 - libraries:

    • wget
    • pandas
    • ftplib
    • datetime
    • gzip
    • urllib
    • bs4
    • re
    • os
    • zipfile
    • sqlalchemy
    • psycopg2
    • time
  • Banco de dados:


How to use:

  1. Com o Postgre instalado, inicie a instância do servidor (pode ser local) e crie o banco de dados conforme o arquivo banco_de_dados.sql.

  2. Conforme o seu ambiente, substitua as variáveis abaixo no arquivo ETL_coletar_dados_e_gravar_BD.py:

    • output_files: diretório de destino para o donwload dos arquivos
    • user: usuário do banco de dados criado pelo arquivo banco_de_dados.sql
    • passw: senha do usuário do BD
    • host: host da conexão com o BD
    • port: porta da conexão com o BD
    • database: nome da base de dados na instância (Dados_RFB - conforme arquivo banco_de_dados.sql)
  3. Executar o arquivo ETL_coletar_dados_e_gravar_BD.py e aguardar a finalização do processo.

    • Os arquivos são grandes: dependendo da infraestrutura isso deve levar muitas horas para conclusão.
    • Arquivos de 08/05/2021: 4,68 GB compactados e 17,1 GB descompactados.

Tabelas geradas:

  • Para maiores informações, consulte o layout.

    • empresa: dados cadastrais da empresa em nível de matriz
    • estabelecimento: dados analíticos da empresa por unidade / estabelecimento (telefones, endereço, filial, etc)
    • socios: dados cadastrais dos sócios das empresas
    • simples: dados de MEI e Simples Nacional
    • cnae: código e descrição dos CNAEs
    • quals: tabela de qualificação das pessoas físicas - sócios, responsável e representante legal.
    • natju: tabela de naturezas jurídicas - código e descrição.
    • moti: tabela de motivos da situação cadastral - código e descrição.
    • pais: tabela de países - código e descrição.
    • munic: tabela de municípios - código e descrição.
  • Pelo volume de dados, as tabelas empresa, estabelecimento, socios e simples possuem índices para a coluna cnpj_basico, que é a principal chave de ligação entre elas.

Modelo de Entidade Relacionamento:

alt text

Owner
Aphonso Henrique do Amaral Rafael
Economist, accountant and data & analytics enthusiastic. Data science and statistics permanently student.
Aphonso Henrique do Amaral Rafael
A minimal open source mtg-like tcg game made in python that can be played on a terminal emulator using a keyboard.

TCG-TERM Project state: 🔧 🚧 🚧 🚧 Incomplete, In development 🚧 🚧 🚧 👷 (Keep in mind that at the moment, This project is currently undone, and wil

Amos 3 Aug 29, 2021
unofficial library for discord components(on development)

discord.py-buttons unofficial library for discord buttons(on development) Install pip install --upgrade discord_buttons Example from discord import Cl

kiki7000 129 Dec 31, 2022
🎵 RythmReloaded 🎵 A bot that can play music on Telegram Group and Channel Voice Chats

🎵 RythmReloaded 🎵 A bot that can play music on Telegram Group and Channel Voice Chats POWERED BY MARSHALX TGCALLS Available on telegram as @OptimusP

0 Nov 03, 2021
A Python Script to automate searching of available vaccination centers in the city and hence booking

Cowin Vaccine Availability Notifier Cowin Vaccine Availability Notifier takes your City or PIN code as an input and automatically notifies you via ema

Jayesh Padhiar 7 Sep 05, 2021
The Best Multipurpose Discord Bot!

Polsu The Best Multipurpose Discord Bot! • Introduction • Screenshots • Setup • License Introduction Polsu is a Multipurpose Discord Bot. Polsu has a

Polsulpicien 1 Nov 09, 2021
An script where it logs in your instagram account and follows people and likes their posts

InstaFollower An script where it logs in your instagram account and follows people and likes their posts (uses the tags to fetch people) Requirements:

Bless 3 Nov 29, 2022
Ini Hanya Shortcut Untuk Menambahkan Kunci Tambahan Pada Termux & Membantu Para Nub Yang Decode Script Orang:v

Ini Hanya Shortcut Untuk Menambahkan Kunci Tambahan Pada Termux & Membantu Para Nub Yang Decode Script Orang:v

Lord_Ammar 1 Jan 23, 2022
This repository contains the best Data Science free hand-picked resources to equip you with all the industry-driven skills and interview preparation kit.

Best Data Science Resources Hey, Data Enthusiasts out there! Finally, after lots of requests from the community I finally came up with the best free D

Mohit Kumar 415 Dec 31, 2022
Demo of using Telegram to send alert message

MIAI_Telegram Demo of using Telegram to send alert message Video link: https://youtu.be/oZ9CsIrlMgg #MìAI Fanpage: http://facebook.com/miaiblog Group

4 Jun 20, 2021
A Python library to access Instagram's private API.

Instagram Private API A Python wrapper for the Instagram private API with no 3rd party dependencies. Supports both the app and web APIs. Overview I wr

2.6k Jan 05, 2023
A python API for BSCScan (Binance Smart Chain Explorer), available on PyPI.

bscscan-python A complete Python API for BscScan.com, available on PyPI. Powered by BscScan.com APIs. This is a gently modified fork of the etherscan-

Panagiotis Kotsias 246 Dec 31, 2022
A bot that can play songs in Telegram group voice chats like AK 47

🎧 47Music Player 🎧 A bot that can play songs in Telegram group voice chats like AK 47 ✨ Easy To Deploy Pyrogram Session Config Vars API_ID : Assista

Janindu Malshan 23 Dec 07, 2022
Python client and API for monitoring and controling energy diversion devices from MyEnergi

Python client and API for monitoring and controling energy diversion devices from MyEnergi A set of library functions and objects for interfacing with

1 Dec 17, 2021
A python oriented telegram with API of yobit.net

YoBit-BTC A python oriented telegram bot with API of https://yobit.net/ Developed By @riz4d What is Yobit? ➪ YoBit is a cryptocurrency exchange that w

Muhammed Rizad 6 Apr 02, 2022
PackMyPayload - Emerging Threat of Containerized Malware

This tool takes a file or directory on input and embeds them into an output file acting as an archive/container.

Mariusz Banach 594 Dec 29, 2022
Simple python program to execute terminal commands on telegram chats directly.

Small python code which can be handy when using telegram and you don't want to use VPS again and again. By configuring the code in your VPS, You can execute commands and get your output within telegr

Veshraj Ghimire 34 Dec 05, 2022
Migration Manager (MM) is a very small utility that can list source servers in a target account and apply mass launch template modifications.

Migration Manager Migration Manager (MM) is a very small utility that can list source servers in a target account and apply mass launch template modif

Cody 2 Nov 04, 2021
Step by Step Guide To Install Discord Py Master Branch on Replit

Guide to Install Discord Py Master Branch on Replit Step 1 Create an empty repl on replit Step 2 Add this Basic Code to the file main.py so as to chec

Pranav Saxena 7 Nov 18, 2022
A python bot that will allow you to have maximum luck during Veve drops.

VeveBot You can follow me here Github | Twitter Features: - Click on the purchase at the time of the drop. - Be able to choose to do more than one tes

Rodz 1 Dec 04, 2021
A suite of utilities for AWS Lambda Functions that makes tracing with AWS X-Ray, structured logging and creating custom metrics asynchronously easier

A suite of utilities for AWS Lambda Functions that makes tracing with AWS X-Ray, structured logging and creating custom metrics asynchronously easier

Amazon Web Services - Labs 1.9k Jan 07, 2023