A universal package of scraper scripts for humans

Table of Contents
  1. About The Project
  2. Getting Started
  3. Usage
  4. Contributing
  5. Sponsors
  6. License
  7. Contact
  8. Acknowledgements

About The Project

Scrapera is a completely Chromedriver-free package that provides scraper scripts for the most commonly used machine learning and data science domains. It scrapes directly and asynchronously from public API endpoints, which removes the heavy browser overhead and makes Scrapera fast and robust to DOM changes. Currently, Scrapera supports the following crawlers:

  • Images
  • Text
  • Audio
  • Videos
  • Miscellaneous

The main aim of this package is to group common scraping tasks in one place, so that ML researchers and engineers can focus on their models rather than on the data collection process.
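To illustrate the idea of scraping asynchronously from API endpoints instead of driving a browser, here is a minimal sketch using only the standard library. It is not Scrapera's actual implementation: `fetch_all`, `fake_fetch`, and the `api.example.com` URLs are hypothetical names chosen for this example, and `fake_fetch` simulates a network call so the snippet runs offline.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Iterable, Tuple


async def fetch_all(
    urls: Iterable[str],
    fetch: Callable[[str], Awaitable[str]],
    max_concurrency: int = 5,
) -> Dict[str, str]:
    """Fetch every URL concurrently, with at most max_concurrency in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded_fetch(url: str) -> Tuple[str, str]:
        # The semaphore keeps the number of simultaneous requests bounded.
        async with semaphore:
            return url, await fetch(url)

    pairs = await asyncio.gather(*(bounded_fetch(url) for url in urls))
    return dict(pairs)


async def fake_fetch(url: str) -> str:
    # Hypothetical stand-in for a real HTTP request to a public API endpoint.
    await asyncio.sleep(0.01)
    return f"payload for {url}"


if __name__ == "__main__":
    urls = [f"https://api.example.com/items/{i}" for i in range(3)]
    results = asyncio.run(fetch_all(urls, fake_fetch))
    for url, body in results.items():
        print(url, "->", body)
```

In a real scraper, `fake_fetch` would be replaced by a coroutine that issues the HTTP request. No browser or DOM is involved, which is where the speed and robustness described above would come from.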

    DISCLAIMER: The owner and contributors take no responsibility for misuse of data obtained through Scrapera. Contact the owner if any module provided by Scrapera violates copyright terms.

    Prerequisites

    Prerequisites can be installed separately from the requirements.txt file:

    pip install -r requirements.txt

    Installation

    Scrapera is built with Python 3 and can be installed directly with pip:

    pip install scrapera

    Alternatively, to install the latest version directly from GitHub, run:

    pip install git+https://github.com/DarshanDeshpande/Scrapera.git
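After either install command, one way to confirm that the package is visible to your interpreter is to query its installed version. This is a small sketch and not part of Scrapera: `installed_version` is a hypothetical helper, and it assumes Python 3.8+ (for `importlib.metadata`) and that the distribution name is `scrapera`, as in the pip command above.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("scrapera")
    if version is None:
        print("scrapera is not installed in this environment")
    else:
        print("scrapera version:", version)
```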

    Usage

    To use any sub-module, import its scraper class, instantiate it, and run it:

    from scrapera.video.vimeo import VimeoScraper
    scraper = VimeoScraper()
    scraper.scrape('https://vimeo.com/191955190', '540p')

    For more examples, refer to the test folders of the individual modules.
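When scraping several URLs in one run, it can help to record failures instead of stopping at the first one. The sketch below is a hypothetical helper, not part of Scrapera. It assumes only the call shape shown above, `scrape(url, resolution)`, and takes the scrape function as an argument so it can be tried without the package installed.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def scrape_many(
    scrape: Callable[[str, str], object],
    urls: Iterable[str],
    resolution: str,
) -> Tuple[List[str], Dict[str, str]]:
    """Call scrape(url, resolution) for each URL and collect failures."""
    succeeded: List[str] = []
    failed: Dict[str, str] = {}
    for url in urls:
        try:
            scrape(url, resolution)
        except Exception as exc:  # deliberately broad: this is only a sketch
            failed[url] = repr(exc)
        else:
            succeeded.append(url)
    return succeeded, failed
```

With Scrapera installed, this could be called as `scrape_many(VimeoScraper().scrape, ['https://vimeo.com/191955190'], '540p')`. Any URL that raises ends up in the second return value, which is useful material when raising an issue.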

    Contributing

    Scrapera welcomes all contributions and scraper requests. Please raise an issue if a scraper fails. Feel free to fork the repository and add your own scrapers to help the community!
    For more guidelines, refer to CONTRIBUTING.

    License

    Distributed under the MIT License. See LICENSE for more information.

    Sponsors

    Logo

    Contact

    Feel free to reach out with any issues or requests related to Scrapera.

    Darshan Deshpande (Owner) - Email | LinkedIn

    Acknowledgements
