Web Downloader With Python

Overview

Web Downloader

Introduction

This module will provide API to download the webpage components : html file, image file, css fil, javascript file, href link file based on the input url (the url must start with 'http' or 'https' ).

To prosses multiple URLs at the same time, The user can list all the url he wants to download in the file "urllist.txt" as shown below:

# Add the URL you want to download line by line(The url must start with 'http' or 'https' ):
# example: https://www.google.com
https://www.google.com
https://www.carousell.sg/
https://www.google.com/search?q=github&sxsrf=AOaemvJh3t5_h8H85AE8Ajbb1IMnBrRISA%3A1636698503535&source=hp&ei=hwmOYY6mHdGkqtsPq8S9sAY&iflsig=ALs-wAMAAAAAYY4Xl7GLWS16_xc2Q9XrG0p3q277DpkL&oq=&gs_lcp=Cgdnd3Mtd2l6EAEYADIHCCMQ6gIQJzIHCCMQ6gIQJzIHCCMQ6gIQJzIHCCMQ6gIQJzIHCCMQ6gIQJzIHCCMQ6gIQJzINCC4QxwEQowIQ6gIQJzIHCCMQ6gIQJzIHCCMQ6gIQJzIHCCMQ6gIQJ1AAWABgjgdoAXAAeACAAQCIAQCSAQCYAQCwAQo&sclient=gws-wiz
https://stackoverflow.com/questions/66022042/how-to-let-kubernetes-pod-run-a-local-script/66025424

Program Setup

Development Environment : python 3.7.4
Additional Lib/Software Need
  1. beautifulsoup4 4.10.0

    install:

    pip install beautifulsoup4
    

    Lib link: https://pypi.org/project/beautifulsoup4/

Hardware Needed : None
Program File List

version: v0.1

Program File Execution Env Description
webDownload.py python 3 Main executable program use the API.
urllist.txt url record list.

Program Usage

Module API Usage
  1. Downloader init:
soup = urlDownloader(imgFlg=True, linkFlg=True, scriptFlg=True)
  • imgFlg: Set to "True" to download all the "" tag files.
  • linkFlg: Set to "True" to download all the html section, image, icon, css file imported by ""
  • scriptFlg: set to "True" to download all the js file.
  1. Call API method savePage to scape url and save the data in a folder

    soup.savePage('
         
          ', '
          
           ')
    
    # Exampe:
    soup.savePage('https://www.google.com', 'www_google_com')
    
          
         
Program Execution
  1. Copy the url you want to check in the url record file "urllist.txt"

  2. Cd to the program folder and run program execution cmd:

    python webDownload.py
    
  3. Check the result:

    For example, if you copy the url "https://www.carousell.sg/" as the first url you want to check into the file "urllist.txt" file, all the html files, image file and js files will be under folder "1_www.carousell.sg_files"

    • The main web page will be saved as: "1_www.carousell.sg_files/1_www.carousell.sg.html"
    • The image used in the page will be saved in folder: "1_www.carousell.sg_files/img"
    • The html/imge/css import by href will be saved in folder: "1_www.carousell.sg_files/link"
    • The js file used by the page will be saved in fodler: "1_www.carousell.sg_files/script"

Problem and Solution

Problem[0]: Files download got slight different

Why there is a slight different between the files which download by using the program and the files which downlaod I use some-webBrowser's "page save as " for the same URL such as www.google.com

OS Platform : n.a

Error Message: n.a

Type: n.a

Solution:

This is normal situation, the logic of web scrape and browser display are different: if you type www.google.ccom if different people's browser, you can see the page shown on different browser are also different. This is because the browser cache, token in the local storage , cookie will make influence of the "GET" request. So when different people type in the google URL in their browser, they can see their own Gmail Icon shows on the right top corner. If you remove all the cache, token in the local storage , cookie of your browser and try "page save as ", the file downloaded by "page save as " should be same as the program.

Problem[2]: Some download Image are empty

OS Platform : n.a

Error Message: n.a

Type: n.a

Solution:

If a web use third party's storage to save the image and the net-storage need to authorization before download, our program download request will be reject and got 'null' when download the file. Then the saved image will be empty.


Last edit by LiuYuancheng([email protected]) at 13/11/2021

This script fully automates of downloading tiktok videos, editing them,compiling them and finally uploading them to youtube.

This script fully automates of downloading tiktok videos, editing them,compiling them and finally uploading them to youtube. If you wanted to create a tiktok video compiilation youtubbe channel this

Supriyo Sarkar 32 Dec 16, 2022
A Python script that allows you to download all of an anime's episodes at once.

BitAnime A Python script that allows you to download all of an anime's episodes at once. · Download executable version · About BitAnime BitAnime is a

sh1nobu 17 Aug 10, 2022
A YouTube downloader app built with Django.

YouTube Downloader ⭐️ Star this project ⭐️ Requirements Python3+ Git Installation Install the dependencies and start the server. git clone https://git

Gabriel Tavares 26 Aug 19, 2022
Downloads .ksy files and their dependencies straight from the official kaitai-struct format gallery.

ksy-dl Downloads .ksy files and their dependencies straight from the official kaitai-struct format gallery. This tool will: Fetch any of the official

3 Jun 20, 2022
Implementation of Cross-category Video Highlight Detection via Set-based Learning (ICCV 2021).

Cross-category Video Highlight Detection via Set-based Learning Introduction This project is an implementation of ``Cross-category Video Highlight Det

Minghao (Alan) Xu 49 Dec 17, 2022
This simple Python script allows you to download songs on Telegram🌸❤️😁

SongsDownloaderTgBot 📺 YouTube Song Downloader Bot For Telegram 🔮 3X Fast Telethon Based Bot ⚜ Open Source Bot 👨🏻‍💻 Demo : 𝗔𝗻𝗻𝗶𝗲 - 𝗘𝗹𝗶𝘇?

Sehath Perera 23 Dec 03, 2022
A Python package for downloading / archiving all available episodes from a podcast RSS feed.

allcasts 📻 🗃 A Python package for downloading all available episodes from a podcast RSS feed. Useful for making private archives of your favourite p

Lewis Gentle 5 Nov 20, 2022
Easy automated ebook downloader using openbooks as the backend

Easy automated ebook downloader using openbooks as the backend

27 Nov 06, 2022
Python library to download bulk of images from Bing.com

Python library to download bulk of images form Bing.com. This package uses async url, which makes it very fast while downloading.

Guru Prasad Singh 105 Dec 14, 2022
ImageScraper is a cross-platform tool for downloading a specified count from xkcd, Astronomy Picture of the Day and Existential Comics

ImageScraper The ImageScraper is a cross-platform tool for downloading a specified count from xkcd, Astronomy Picture of the Day and Existential Comic

1amnobody 1 Jan 25, 2022
Python code to crawl computer vision papers from top CV conferences. Currently it supports CVPR, ICCV, ECCV, NeurIPS, ICML, ICLR, SIGGRAPH

Python code to crawl computer vision papers from top CV conferences. Currently it supports CVPR, ICCV, ECCV, NeurIPS, ICML, ICLR, SIGGRAPH. It leverages selenium, a website testing framework to crawl

Xiaoyang Huang 39 Nov 21, 2022
Download Web-10K data by querying Bing Image Search

gpv2-web10k This repository contains the script to download images from the Web-10K dataset. The script takes in a list of queries, queries Bing Image

AI2 8 Sep 06, 2022
Throttle qBittorrent on Plex stream Start/Stop

Dependencies Python 3.6+ 'qbittorrent-api' Python Library Tautulli Script Setup Edit qbittorrent_throttle.py and set qBittorrent username, password an

6 Sep 24, 2022
利用python3,爬取并下载91porn网站上面的视频

91porn_python 利用python3,爬取并下载91porn网站上面的视频 增加爬取t66y论坛图片的脚本 该脚本支持一下功能: 支持多线程 下载视频有进度条显示 支持从特定页的特定视频开始下载 将m3u8和mp4格式的视频下载到不同文件夹,加以分类 自动过滤已经下载过的视频

253 Feb 23, 2021
TikTok downloader video without watermark from Telegram bot

⬇️ How to download video from Tik Tok via telegram bot? Send a link to the video from tik tok to our telegram bot and it will send you a video without

1 Mar 04, 2022
Heroic-gogdl - GOG Downloading module for Heroic Games Launcher

heroic-gogdl GOG download module for Heroic Games Launcher Purpose This will tak

Paweł Lidwin 36 Dec 23, 2022
Python youtube playlist downloader

Youtube-Playlist-Downloader-python 👍 This program is a simple Youtube playlist downloader where you input the playlist link, and then the desired pat

Pepczenko 2 Dec 25, 2021
New York Times Front Page Downloader.

TIMETRAVELER New York Times Front Page Downloader. Usage python3 timetraveler.py All data will be saved at ~/timetraveler/ Goals To keep a historica

Daeshon Jones 0 Oct 31, 2021
Um projeto modesto para baixar vídeos do youtube usando tkinter como gui

Youtube Downloader Um projeto modesto para baixar vídeos do youtube usando tkinter como gui Instalação dos requirements: python3 setup.py ou python se

Sunlyx 2 Nov 25, 2021