Crawler for the Fundamentus.com site using the Scrapy framework, covering both the detailed tab and the summary tab.

Overview

Fundamentus with the Scrapy framework


It downloads information that the other Fundamentus scrapers do not collect.

To start, from inside the fundamentus folder run: scrapy crawl detalhes -O nomedoarquivocriado.csv or scrapy crawl resultado -O nomedoarquivocriado.csv (replace nomedoarquivocriado.csv with the name of the CSV file you want to create).

The code is not elegant, but it is functional and scrapes the site quickly.
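For reference, below is a minimal sketch of what a Scrapy spider for the detail page can look like. It is not the spider shipped in this repository; the class name, start URL and CSS selectors are assumptions made only for illustration.

    # Minimal illustrative sketch, NOT the repository's actual spider.
    # The detail-page URL pattern and the CSS selectors are assumptions.
    import scrapy

    class DetalhesSpider(scrapy.Spider):
        name = "detalhes"  # matches: scrapy crawl detalhes -O nomedoarquivocriado.csv
        start_urls = ["https://www.fundamentus.com.br/detalhes.php?papel=PETR4"]

        def parse(self, response):
            # The detail page presents indicators as label/value pairs in table cells.
            labels = response.css("td.label span.txt::text").getall()
            values = response.css("td.data span.txt::text").getall()
            yield dict(zip(labels, values))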

The downloaded fields are listed below (a short sketch for loading the exported CSV follows the list):

       columns = [
                  # identification and quote data
                  'Papel', 'Cotação', 'Tipo', 'Data ult cot', 'Empresa', 'Min 52 sem',
                  'Setor', 'Max 52 sem', 'Subsetor', 'Vol $ méd (2m)', 'Valor de mercado',
                  'Últ balanço processado', 'Valor da firma', 'Nro. Ações',

                  # price oscillations (Dia, Mês, 30 dias, 12 meses, 2015-2021)
                  # interleaved with fundamentalist indicators, following the
                  # two-column layout of the detail page
                  'Dia', 'P/L',
                  'LPA', 'Mês', 'P/VP', 'VPA', '30 dias', 'P/EBIT', 'Marg. Bruta',
                  '12 meses', 'PSR', 'Marg. EBIT', '2021', 'P/Ativos', 'Marg. Líquida',
                  '2020', 'P/Cap. Giro', 'EBIT / Ativo', '2019', 'P/Ativ Circ Liq',
                  'ROIC', '2018', 'Div. Yield', 'ROE', '2017', 'EV / EBITDA',
                  'Liquidez Corr', '2016', 'EV / EBIT', 'Div Br/ Patrim', '2015',
                  'Cres. Rec (5a)', 'Giro Ativos',

                  # balance-sheet data
                  'Ativo',
                  'Dív. Bruta',
                  'Disponibilidades',
                  'Dív. Líquida',
                  'Ativo Circulante',
                  'Depósitos',
                  'Cart. de Crédito',
                  'Patrim. Líq',

                  # income-statement data (trailing 12 months / last 3 months)
                  'Receita Líquida_12meses',
                  'Receita Líquida_3meses', 'EBIT_12meses', 'EBIT_3meses',
                  'Lucro Líquido_12meses', 'Lucro Líquido_3meses']
                  
and a few more fields...
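A short sketch of loading the exported CSV with pandas, assuming an export named nomedoarquivocriado.csv and the column names listed above (the column subset printed here is just an example):

    # Load the crawler output and inspect a few of the fields listed above.
    # Note: scraped values arrive as text (comma decimals, % signs) and may
    # need cleaning before numeric analysis.
    import pandas as pd

    df = pd.read_csv("nomedoarquivocriado.csv")
    print(df.shape)  # one row per ticker, one column per field
    print(df[["Papel", "Setor", "Subsetor", "P/L", "ROE"]].head())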

I built this project as a learning exercise and because I could not find any scraper on GitHub that collects all the information I needed, such as sectors and subsectors, for building KNN and KMC machine learning models.
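As an illustration of that kind of downstream use (not part of this repository), the scraped indicators could feed a clustering model along these lines; the chosen feature columns, the naive numeric conversion and the number of clusters are all assumptions:

    # Illustrative sketch only: clustering tickers on a few scraped indicators.
    import pandas as pd
    from sklearn.preprocessing import StandardScaler
    from sklearn.cluster import KMeans

    df = pd.read_csv("nomedoarquivocriado.csv")
    features = ["P/L", "P/VP", "ROE", "Div. Yield", "Marg. Líquida"]

    # A real pipeline would clean comma decimals and % signs before conversion.
    X = df[features].apply(pd.to_numeric, errors="coerce").fillna(0)
    X_scaled = StandardScaler().fit_transform(X)

    km = KMeans(n_clusters=5, n_init=10, random_state=0)
    df["cluster"] = km.fit_predict(X_scaled)
    print(df[["Papel", "Setor", "Subsetor", "cluster"]].head())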

Owner
Guilherme Silva Uchoa
Biomedical scientist by training, passionate about technology.