Newsscraper - A simple Python 3 module to get crypto or news articles and their content from various RSS feeds.

Overview

NewsScraper

A simple Python 3 module to get crypto or news articles and their content from various RSS feeds.

🔧 Installation

  1. Clone the repo locally.
  2. Use the package manager pip to install the requirements.
pip install -r requirements.txt

Basic Usage

import NewsScraper

all_data = NewsScraper.fetch_all()
news_data = NewsScraper.fetch_news_data()
crypto_data = NewsScraper.fetch_crypto_data()

fetch_all()

Returns a set of NewsScraper.Result containing fetched results from all available RSS feeds

Can include categories: GLOBAL, US, EU, CRYPTO, BLOCKCHAIN, BTC, ETH, LTC.

fetch_news_data()

Returns a set of NewsScraper.Result containing fetched results from CNN, ABC News, Yahoo News, Fox News RSS feeds

Can include categories: GLOBAL, US, EU.

fetch_crypto_data()

Returns a set of NewsScraper.Result containing fetched results from CoinJournal, Crypto Currency News RSS feeds.

Can include categories: CRYPTO, BLOCKCHAIN, BTC, ETH, LTC.

🔨 Advanced Usage

NewsScraper.Result class

A class used to represent a returned article.

Attributes
  • context : str

    A string describing the category of the article.

    ex. "GLOBAL", "US", "BLOCKCHAIN", "BTC".

  • title : str

    A string containing the name of the article.

  • summary : str

    A string containing the summary of the article.

    NOTE: sometimes it can have the value of "", because the RSS feed didn't provide a summary.

  • content : str

    A string containing the content of the article.

Methods
  • Result.json()

    Returns a dictionary with the attributes of the class formatted in JSON.

    ex.

{
  "context": "global",
  "title": "title of the article",
  "summary": "summary of the article",
  "content": "content of the article"
}

News RSS Feeds

All of these functions return a set of NewsScraper.Result containing fetched results of the described RSS feeds.

fetch_abc()
fetch_cnn()
fetch_yahoo()
fetch_fox_news()

Can include categories: GLOBAL, US, EU.

Alternatively, you can use fetch_news_data() to receive results from all of them.


Crypto RSS Feeds

All of these functions return a set of NewsScraper.Result containing fetched results of the described RSS feeds.

fetch_coinjournal()
fetch_cryptocurrencynews()

Can include categories: CRYPTO, BLOCKCHAIN, BTC, ETH, LTC.

Alternatively, you can use fetch_news_data() to receive results from all of them.

🤝 Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

📝 License

This project is licensed under the MIT license.

Owner
Rokas
Rokas
CreamySoup - a helper script for automated SourceMod plugin updates management.

CreamySoup/"Creamy SourceMod Updater" (or just soup for short), a helper script for automated SourceMod plugin updates management.

3 Jan 03, 2022
Web mining module for Python, with tools for scraping, natural language processing, machine learning, network analysis and visualization.

Pattern Pattern is a web mining module for Python. It has tools for: Data Mining: web services (Google, Twitter, Wikipedia), web crawler, HTML DOM par

Computational Linguistics Research Group 8.4k Jan 08, 2023
A simple proxy scraper that utilizes the requests module in python.

Proxy Scraper A simple proxy scraper that utilizes the requests module in python. Usage Depending on your python installation your commands may vary.

3 Sep 08, 2021
Demonstration on how to use async python to control multiple playwright browsers for web-scraping

Playwright Browser Pool This example illustrates how it's possible to use a pool of browsers to retrieve page urls in a single asynchronous process. i

Bernardas Ališauskas 8 Oct 27, 2022
Proxy scraper. Format: IP | PORT | COUNTRY | TYPE

proxy scraper 🔎 Installation: git clone https://github.com/ebankoff/proxy_scraper Required pip libraries (pip install library name): lxml beautifulso

Eban'ko 19 Dec 07, 2022
Searching info from Google using Python Scrapy

Python-Search-Engine-Scrapy || Python-爬虫-索引/利用爬虫获取谷歌信息**/ Searching info from Google using Python Scrapy /* 利用 PYTHON 爬虫获取天气信息,以及城市信息和资料**/ translatio

HONGVVENG 1 Jan 06, 2022
A low-code tool that generates python crawler code based on curl or url

KKBA Intruoduction A low-code tool that generates python crawler code based on curl or url Requirement Python = 3.6 Install pip install kkba Usage Co

8 Sep 20, 2021
Dex-scrapper - Hobby project for scrapping dex data on VeChain

Folders /zumo_abis # abi extracted from zumo repo /zumo_pools # runtime e

3 Jan 20, 2022
An utility library to scrape data from TikTok, Instagram, Twitch, Youtube, Twitter or Reddit in one line!

Social Media Scraper An utility library to scrape data from TikTok, Instagram, Twitch, Youtube, Twitter or Reddit in one line! Go to the website » Vie

2 Aug 03, 2022
A web scraper which checks price of a product regularly and sends price alerts by email if price reduces.

Amazon-Web-Scarper Created a web scraper using simple functions to check price of a product on amazon (can be duplicated to check price at other marke

Swaroop Todankar 1 Jan 17, 2022
SkyScrapers: A collection of variety of Scraping Apps

SkyScrapers Collection of variety of Web Scraping Apps The web-scrapers involved

Biplov Pokhrel 3 Feb 17, 2022
An helper library to scrape data from TikTok in one line, using the Influencer Hunters APIs.

TikTok Scraper An utility library to scrape data from TikTok hassle-free Go to the website » View Demo · Report Bug · Request Feature About The Projec

6 Jan 08, 2023
An helper library to scrape data from Instagram effortlessly, using the Influencer Hunters APIs.

Instagram Scraper An utility library to scrape data from Instagram hassle-free Go to the website » View Demo · Report Bug · Request Feature About The

2 Jul 06, 2022
一些爬虫相关的签名、验证码破解

cracking4crawling 一些爬虫相关的签名、验证码破解,目前已有脚本: 小红书App接口签名(shield)(2020.12.02) 小红书滑块(数美)验证破解(2020.12.02) 海南航空App接口签名(hnairSign)(2020.12.05) 说明: 脚本按目标网站、App命

XNFA 90 Feb 09, 2021
Parsel lets you extract data from XML/HTML documents using XPath or CSS selectors

Parsel Parsel is a BSD-licensed Python library to extract and remove data from HTML and XML using XPath and CSS selectors, optionally combined with re

Scrapy project 859 Dec 29, 2022
A spider for Universal Online Judge(UOJ) system, converting problem pages to PDFs.

Universal Online Judge Spider Introduction This is a spider for Universal Online Judge (UOJ) system (https://uoj.ac/). It also works for all other Onl

TriNitroTofu 1 Dec 07, 2021
Console application for downloading images from Reddit in Python

RedditImageScraper Console application for downloading images from Reddit in Python Introduction This short Python script was created for the mass-dow

James 0 Jul 04, 2021
This tool crawls a list of websites and download all PDF and office documents

This tool crawls a list of websites and download all PDF and office documents. Then it analyses the PDF documents and tries to detect accessibility issues.

AccessibilityLU 7 Sep 30, 2022
Scrapy, a fast high-level web crawling & scraping framework for Python.

Scrapy Overview Scrapy is a fast high-level web crawling and web scraping framework, used to crawl websites and extract structured data from their pag

Scrapy project 45.5k Jan 07, 2023
An introduction to free, automated web scraping with GitHub’s powerful new Actions framework.

An introduction to free, automated web scraping with GitHub’s powerful new Actions framework Published at palewi.re/docs/first-github-scraper/ Contrib

Ben Welsh 15 Nov 24, 2022