Newsemble is an API that provides easy access to the current news for programmatic analysis

Overview

📰 Newsemble 📰


Logo
An API for fetching the current news.

python   Flask   MongoDB  Heroku

GitHub release Visits Badge Stars Badge Fork Badge Github all releases watchers Badge

🔖 About 🔖


Blog Post

Newsemble is an API that provides easy access to the current news for programmatic analysis. It has been built using Python, BeautifulSoup and MongoDB.
The data is scraped from these news websites every hour, stored in a database on the cloud and whenever requested, the most recent articles are promptly served.
Developers can make use of this API to fetch current data with each article having the following fields:
Headlines, Content, Source, Link and Time.



🗒️ Table of contents

💻 Technologies

Newsemble is created with:

  • Python 3
  • Flask
  • PyMongo
  • BeautifulSoup

📂 File Structure and Description

  • app.py - Flask code for the API
  • scraper.py - Collection of scrapers for the various news sites.
  • db.py - Connecting and Using MongoDB
  • utils.py - Utility Functions
  • scheduler.py - Scheduler
  • Procfile - For Deployment
  • requirements.txt - Python Requirments

🛠️ Pipeline

Newsemble pipeline

🚀 Getting-started

This project can be accessed by using following setup

Links

Links Description
www.newsemble.ml/news Link to fetch all the data from all sources
www.newsemble.ml/news/toi Link to fetch data from Times of India
www.newsemble.ml/news/th Link to fetch data from The Hindu
www.newsemble.ml/news/tie Link to fetch data from The Indian Express
www.newsemble.ml/news/ndtv Link to fetch data from NDTV news
www.newsemble.ml/news/it Link to fetch data from India Today

Request format

$ import requests
$ url = "http://www.newsemble.ml/news/"
$ requests.get(url).json()

Response format

{   
    ‘link’      :  $source_link$,
    ‘content’   :  $content_text$,    
    ‘source’    :  $news_source$,
    ‘title’     :  $headline$, 
    ‘time       :  $date_time_of_article$  
 }

Sample output

image

⚙️ Currently Supported Sites



🙏 Thanks!

All contributions are welcome and appreciated. 👍
If you liked this project, or found it useful in any way, please drop a 🌟 !

✍️ Authors ✍️

✒️ Rishabh Gupta
✒️ Vishal Singhania
✒️ Roshan Kumar

You might also like...
music downloader written in python.   (Uses jiosaavn API)
music downloader written in python. (Uses jiosaavn API)

music downloader written in python. (Uses jiosaavn API)

Mobile based API for Crunchyroll BETA (and Downloader).

Mobile based API for Crunchyroll BETA (and Downloader). Not restricted on servers and NO CLOUDFLARE

Pypixiv - A fully-typed, asynchronous api wrapper for pixiv

pypixiv this library is a fully-typed, asynchronous api wrapper for pixiv. featu

This project is helps to download contents from Streamtape by utilizing the API

It scrapes Streamtape api and download contents from the site.

A python script that discovers hidden YouTube API clients. Just a research project.

YouTube-Internal-Clients A script that discovers hidden internal clients of the YouTube (Innertube) API using bruteforce methods. The script tries cli

Simple Python script to download images and videos from public subreddits without using Reddit's API 😎
Simple Python script to download images and videos from public subreddits without using Reddit's API 😎

Subreddit Media Downloader Download images and videos from any public subreddit without using Reddit's API Made with ❤ by Nico 💬 About: This script a

This script just scrapes the most recent Nepali news from Kathmandu Post and notifies the user about current events at regular intervals.It sends out the most recent news at random!

Nepali-news-notifier This script just scrapes the most recent Nepali news from Kathmandu Post and notifies the user about current events at regular in

News-app - This is a news web app for reading news from different sources and topics
News-app - This is a news web app for reading news from different sources and topics

News-app - This is a news web app for reading news from different sources and topics

Nasdaq Cloud Data Service (NCDS) provides a modern and efficient method of delivery for realtime exchange data and other financial information. This repository provides an SDK for developing applications to access the NCDS.

Nasdaq Cloud Data Service (NCDS) Nasdaq Cloud Data Service (NCDS) provides a modern and efficient method of delivery for realtime exchange data and ot

The windML framework provides an easy-to-use access to wind data sources within the Python world, building upon numpy, scipy, sklearn, and matplotlib. Renewable Wind Energy, Forecasting, Prediction

windml Build status : The importance of wind in smart grids with a large number of renewable energy resources is increasing. With the growing infrastr

Repo Home WPDrawBot - (Repo, Home, WP) A powerful programmatic 2D drawing application for MacOS X which generates graphics from Python scripts. (graphics, dev, mac)

DrawBot DrawBot is a powerful, free application for macOS that invites you to write Python scripts to generate two-dimensional graphics. The built-in

A curated list of programmatic weak supervision papers and resources
A curated list of programmatic weak supervision papers and resources

A curated list of programmatic weak supervision papers and resources

Learning trajectory representations using self-supervision and programmatic supervision.
Learning trajectory representations using self-supervision and programmatic supervision.

Trajectory Embedding for Behavior Analysis (TREBA) Implementation from the paper: Jennifer J. Sun, Ann Kennedy, Eric Zhan, David J. Anderson, Yisong Y

Programmatic interface to Synapse services for Python

A Python client for Sage Bionetworks' Synapse, a collaborative, open-source research platform that allows teams to share data, track analyses, and collaborate

Django API that scrapes and provides the last news of the city of Carlos Casares by semantic way (RDF format).

"Casares News" API Api that scrapes and provides the last news of the city of Carlos Casares by semantic way (RDF format). Usage Consume the articles

This python module can analyse cryptocurrency news for any number of coins given and return a sentiment. Can be easily integrated with a Trading bot to keep an eye on the news.

Python script that analyses news headline or body sentiment and returns the overall media sentiment of any given coin. It can take multiple coins an

NLP project that works with news (NER, context generation, news trend analytics)
NLP project that works with news (NER, context generation, news trend analytics)

СоАвтор СоАвтор – платформа и открытый набор инструментов для редакций и журналистов-фрилансеров, который призван сделать процесс создания контента ма

Your own movie streaming service. Easy to install, easy to use. Download, manage and watch your favorite movies conveniently from your browser or phone. Install it on your server, access it anywhere and enjoy.
Your own movie streaming service. Easy to install, easy to use. Download, manage and watch your favorite movies conveniently from your browser or phone. Install it on your server, access it anywhere and enjoy.

Vigilio Your own movie streaming service. Easy to install, easy to use. Download, manage and watch your favorite movies conveniently from your browser

💰 An Alfred Workflow that provides current price of cryptocurrency
💰 An Alfred Workflow that provides current price of cryptocurrency

Coin Ticker for Alfred Workflow An Alfred Workflow that provides current price and status about cryptocurrency from cryptocompare.com. Supports Alfred

Comments
  • Bump lxml from 4.5.0 to 4.6.5

    Bump lxml from 4.5.0 to 4.6.5

    Bumps lxml from 4.5.0 to 4.6.5.

    Changelog

    Sourced from lxml's changelog.

    4.6.5 (2021-12-12)

    Bugs fixed

    • A vulnerability (GHSL-2021-1038) in the HTML cleaner allowed sneaking script content through SVG images (CVE-2021-43818).

    • A vulnerability (GHSL-2021-1037) in the HTML cleaner allowed sneaking script content through CSS imports and other crafted constructs (CVE-2021-43818).

    4.6.4 (2021-11-01)

    Features added

    • GH#317: A new property system_url was added to DTD entities. Patch by Thirdegree.

    • GH#314: The STATIC_* variables in setup.py can now be passed via env vars. Patch by Isaac Jurado.

    4.6.3 (2021-03-21)

    Bugs fixed

    • A vulnerability (CVE-2021-28957) was discovered in the HTML Cleaner by Kevin Chung, which allowed JavaScript to pass through. The cleaner now removes the HTML5 formaction attribute.

    4.6.2 (2020-11-26)

    Bugs fixed

    • A vulnerability (CVE-2020-27783) was discovered in the HTML Cleaner by Yaniv Nizry, which allowed JavaScript to pass through. The cleaner now removes more sneaky "style" content.

    4.6.1 (2020-10-18)

    ... (truncated)

    Commits
    • a9611ba Fix a test in Py2.
    • a3eacbc Prepare release of 4.6.5.
    • b7ea687 Update changelog.
    • 69a7473 Cleaner: cover some more cases where scripts could sneak through in specially...
    • 54d2985 Fix condition in test decorator.
    • 4b220b5 Use the non-depcrecated TextTestResult instead of _TextTestResult (GH-333)
    • d85c6de Exclude a test when using the macOS system libraries because it fails with li...
    • cd4bec9 Add macOS-M1 as wheel build platform.
    • fd0d471 Install automake and libtool in macOS build to be able to install the latest ...
    • f233023 Cleaner: Remove SVG image data URLs since they can embed script content.
    • Additional commits viewable in compare view

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
    • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
    • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
    • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

    You can disable automated security fix PRs for this repo from the Security Alerts page.

    dependencies 
    opened by dependabot[bot] 0
Releases(v1.0)
Owner
Rishabh
CSE Undergrad at IIIT-D.
Rishabh
This repository contains code for a youtube-dl GUI written in PyQt.

youtube-dl-GUI This repository contains code for a youtube-dl GUI written in PyQt. It is based on youtube-dl which is a Video downloading script maint

M.Yasoob Ullah Khalid ☺ 191 Jan 02, 2023
Tool To download Amazon 4k SDR HDR 1080, CDM IS Not Included

WV-AMZN-4K-RIPPER Tool To download Amazon 4k SDR HDR 1080, CDM IS Not Included For CDM You can Mail :- Denis Trunov 179 Dec 17, 2022

Fast TikTok NO Watermark Video Downloader (username or url)

💎 TD [ TikDown v4 ] Star ⭐ if you want more Discord Server * discord.gg/onlp | Waxor#9999 Why not open source anymore ? * BECAUSE PEOPLE SKID, STEA

Tekky 26 Dec 01, 2022
music downloader written in python. (Uses jiosaavn API)

music downloader written in python. (Uses jiosaavn API)

Rohn Chatterjee 35 Jul 20, 2022
An automatic beatmapset downloader via txt file, suitable for tourney mappools.

Pooler Pooler is a bulk osu! mapset downloader, perfect for use with osu! Tournament Mappools. Prerequisites Python 3.10 Requests (pip install request

Thomas 0 Feb 11, 2022
Code to scrape , download and upload to youtube daily

Youtube_Automated_Channel Code to scrape , download and upload to youtube daily INSTRUCTIONS Download the Github Repository Download and install Pytho

Atsiksdong 2 Dec 19, 2021
GTK4 + Python tutorial with code examples

Taiko's GTK4 Python tutorial Wanna make apps for Linux but not sure how to start with GTK? This guide will hopefully help! The intent is to show you h

190 Jan 08, 2023
A downloader for the ISIS service of TU Berlin

isis_dl A downloading utility for the ISIS tool of TU-Berlin. Version 0.4 Features Downloads all Material from all courses of your ISIS page. Efficien

1 Nov 06, 2021
YouTube-Downloader - YouTube Video Downloader made using python

YouTube-Downloader YouTube Videos Downloder made using python.

Shivam 1 Jan 16, 2022
A cross platform front-end GUI of the popular youtube-dl written in wxPython.

youtube-dlG A cross platform front-end GUI of the popular youtube-dl media downloader written in wxPython. Supported sites Screenshots Requirements Py

8.7k Dec 31, 2022
ComicDownloader - Downloads Comics from readcomiconline.li

ComicDownloader Downloads Comics from readcomiconline.li To use this script from

2 Nov 08, 2022
Stremio addon for fetching videos from your google drive.

stremio-gdrive Instructions: There are two ways to go about: Method 1 is hard and long but might give you better performance and you need to make your

72 Dec 31, 2022
Youtube_dl_helper - A hacky python script meant to automate the process of downloading mp3 files from YouTube using youtube-dl library

youtube_dl_helper A helper program meant to automate the process of downloading mp3 files from YouTube using youtube-dl library Dependencies In order

Guilherme Bittencourt de Borba 1 Jan 04, 2022
YTPY Youtube Downloader Made by: Ferreira, Amarau and Rodric

YTPY Youtube Downloader Made by: Ferreira, Amarau and Rodric How to Install on Linux: sudo apt install python3 python3-pip git pip install pytube git

7 Nov 24, 2022
A Python package for downloading / archiving all available episodes from a podcast RSS feed.

allcasts 📻 🗃 A Python package for downloading all available episodes from a podcast RSS feed. Useful for making private archives of your favourite p

Lewis Gentle 5 Nov 20, 2022
A collection of modules I have created to programmatically search for/download imagery from live cam feeds across the state of California.

A collection of modules that I have created to programmatically search for/download imagery from all publicly available live cam feeds across the state of California. In no way am I affiliated with a

Chad Groom 5 Nov 21, 2022
Download Web-10K data by querying Bing Image Search

gpv2-web10k This repository contains the script to download images from the Web-10K dataset. The script takes in a list of queries, queries Bing Image

AI2 8 Sep 06, 2022
This is Yt Downloader. Coded with Python (my first repository)

Get Started Download & install Python first before using this software. Download Python Installing Python and Pytube Library (IMPORTANT) Installing Py

Qi 2 Oct 25, 2021
A program which takes an Anime name or URL and downloads the specified range of episodes.

super-anime-downloader A console application written in Python3.x (GUI will be added soon) which takes a Anime Name/URL as input and downloads the ran

Sayyid Ali Sajjad Rizavi 26 Jul 18, 2022
Smule Video Downloader

Smule Video Downloader Using Requests,Re & Urllib Installation - apt install git (for vps) or pkg install git (for termux)

Hansen Gianto 4 Aug 31, 2022