⏬ Dumb downloader that scrapes the web

You-Get

NOTICE: Read this if you are looking for the conventional "Issues" tab.


You-Get is a tiny command-line utility to download media contents (videos, audios, images) from the Web, in case there is no other handy way to do it.

Here's how you use you-get to download a video from YouTube:

$ you-get 'https://www.youtube.com/watch?v=jNQXAC9IVRw'
site:                YouTube
title:               Me at the zoo
stream:
    - itag:          43
      container:     webm
      quality:       medium
      size:          0.5 MiB (564215 bytes)
    # download-with: you-get --itag=43 [URL]

Downloading Me at the zoo.webm ...
 100% (  0.5/  0.5MB) ├██████████████████████████████████┤[1/1]    6 MB/s

Saving Me at the zoo.en.srt ... Done.

And here's why you might want to use it:

  • You enjoyed something on the Internet, and just want to download it for your own pleasure.
  • You watch your favorite videos online from your computer, but you are prohibited from saving them. You feel that you have no control over your own computer. (And it's not how an open Web is supposed to work.)
  • You want to get rid of any closed-source technology or proprietary JavaScript code, and disallow things like Flash running on your computer.
  • You are an adherent of hacker culture and free software.

What you-get can do for you:

  • Download videos / audios from popular websites such as YouTube, Youku, Niconico, and a bunch more. (See the full list of supported sites)
  • Stream an online video in your media player. No web browser, no more ads.
  • Download images (of interest) by scraping a web page.
  • Download arbitrary non-HTML contents, i.e., binary files.

Interested? Install it now and get started by examples.

Are you a Python programmer? Then check out the source and fork it!

Installation

Prerequisites

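The following dependencies are necessary:

  • Python 3
  • FFmpeg (needed to join videos streamed in multiple parts; see the notes under "Download a video")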

Option 1: Install via pip

The official release of you-get is distributed on PyPI, and can be installed easily from a PyPI mirror via the pip package manager. Note that you must use the Python 3 version of pip:

$ pip3 install you-get

Option 2: Install via Antigen (for Zsh users)

Add the following line to your .zshrc:

antigen bundle soimort/you-get

Option 3: Download from GitHub

You may either download the stable (identical with the latest release on PyPI) or the develop (more hotfixes, unstable features) branch of you-get. Unzip it, and put the directory containing the you-get script into your PATH.

Alternatively, run

$ [sudo] python3 setup.py install

Or

$ python3 setup.py install --user

to install you-get to a permanent path.

Option 4: Git clone

This is the recommended way for all developers, even if you don't often code in Python.

$ git clone https://github.com/soimort/you-get.git

Then put the cloned directory into your PATH, or run ./setup.py install to install you-get to a permanent path.

Option 5: Homebrew (Mac only)

You can install you-get easily via:

$ brew install you-get

Option 6: pkg (FreeBSD only)

You can install you-get easily via:

# pkg install you-get
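
Whichever option you use, it can help to confirm that the pieces this README relies on are actually reachable. The following is a minimal Python sketch, not part of you-get; the function name check_environment is made up for illustration. It only checks that Python 3 is running and that the you-get and ffmpeg executables are on your PATH.

```python
import shutil
import sys


def check_environment():
    """Report whether the tools this README relies on can be found."""
    return {
        # you-get is installed with the Python 3 version of pip.
        "python3": sys.version_info[0] >= 3,
        # shutil.which returns None when an executable is not on PATH.
        "you-get": shutil.which("you-get") is not None,
        "ffmpeg": shutil.which("ffmpeg") is not None,
    }


if __name__ == "__main__":
    for name, found in check_environment().items():
        print(f"{name}: {'found' if found else 'missing'}")
```

If "you-get" is reported as missing after an install from a download or a clone, the directory containing the you-get script is probably not in your PATH yet.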

Shell completion

Completion definitions for Bash, Fish and Zsh can be found in contrib/completion. Please consult your shell's manual for how to take advantage of them.

Upgrading

Based on which option you chose to install you-get, you may upgrade it via:

$ pip3 install --upgrade you-get

or download the latest release via:

$ you-get https://github.com/soimort/you-get/archive/master.zip

In order to get the latest develop branch without messing up your pip setup, you can try:

$ pip3 install --upgrade git+https://github.com/soimort/you-get@develop

Getting Started

Download a video

When you find a video of interest, you might want to use the --info/-i option to see all available qualities and formats:

$ you-get -i 'https://www.youtube.com/watch?v=jNQXAC9IVRw'
site:                YouTube
title:               Me at the zoo
streams:             # Available quality and codecs
    [ DASH ] ____________________________________
    - itag:          242
      container:     webm
      quality:       320x240
      size:          0.6 MiB (618358 bytes)
    # download-with: you-get --itag=242 [URL]

    - itag:          395
      container:     mp4
      quality:       320x240
      size:          0.5 MiB (550743 bytes)
    # download-with: you-get --itag=395 [URL]

    - itag:          133
      container:     mp4
      quality:       320x240
      size:          0.5 MiB (498558 bytes)
    # download-with: you-get --itag=133 [URL]

    - itag:          278
      container:     webm
      quality:       192x144
      size:          0.4 MiB (392857 bytes)
    # download-with: you-get --itag=278 [URL]

    - itag:          160
      container:     mp4
      quality:       192x144
      size:          0.4 MiB (370882 bytes)
    # download-with: you-get --itag=160 [URL]

    - itag:          394
      container:     mp4
      quality:       192x144
      size:          0.4 MiB (367261 bytes)
    # download-with: you-get --itag=394 [URL]

    [ DEFAULT ] _________________________________
    - itag:          43
      container:     webm
      quality:       medium
      size:          0.5 MiB (568748 bytes)
    # download-with: you-get --itag=43 [URL]

    - itag:          18
      container:     mp4
      quality:       small
    # download-with: you-get --itag=18 [URL]

    - itag:          36
      container:     3gp
      quality:       small
    # download-with: you-get --itag=36 [URL]

    - itag:          17
      container:     3gp
      quality:       small
    # download-with: you-get --itag=17 [URL]
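
The listing above is plain text, so a script can pick an itag from it. Here is a rough Python sketch that parses text laid out like the sample; parse_streams and first_itag_for are hypothetical helpers, and the layout is assumed from the example output only, not from any documented format.

```python
def parse_streams(info_text):
    """Collect stream entries from text shaped like the `you-get -i` sample."""
    streams = []
    current = None
    for raw in info_text.splitlines():
        line = raw.strip()
        if line.startswith("- "):
            # "- itag: 242" opens a new stream entry.
            current = {}
            streams.append(current)
            line = line[2:].strip()
        elif not line or line.startswith("#") or line.startswith("["):
            # Blank lines end an entry; "# download-with" hints and
            # "[ DASH ]" banners carry no key/value data.
            if not line:
                current = None
            continue
        if current is None:
            continue  # header lines such as "site:" or "title:"
        key, _, value = line.partition(":")
        current[key.strip()] = value.strip()
    return streams


def first_itag_for(streams, container):
    """Return the itag of the first stream using the given container."""
    for stream in streams:
        if stream.get("container") == container:
            return stream.get("itag")
    return None
```

Because this depends on the human-readable layout, it may break whenever the output format changes.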

By default, the one on the top is the one you will get. If that looks cool to you, download it:

$ you-get 'https://www.youtube.com/watch?v=jNQXAC9IVRw'
site:                YouTube
title:               Me at the zoo
stream:
    - itag:          242
      container:     webm
      quality:       320x240
      size:          0.6 MiB (618358 bytes)
    # download-with: you-get --itag=242 [URL]

Downloading Me at the zoo.webm ...
 100% (  0.6/  0.6MB) ├██████████████████████████████████████████████████████████████████████████████┤[2/2]    2 MB/s
Merging video parts... Merged into Me at the zoo.webm

Saving Me at the zoo.en.srt ... Done.

(If a YouTube video has any closed captions, they will be downloaded together with the video file, in SubRip subtitle format.)

Or, if you prefer another format (mp4), just use whichever option you-get shows you:

$ you-get --itag=18 'https://www.youtube.com/watch?v=jNQXAC9IVRw'

Note:

  • At this point, format selection has not been generally implemented for most of our supported sites; in that case, the default format to download is the one with the highest quality.
  • ffmpeg is a required dependency for downloading and joining videos streamed in multiple parts (e.g. on some sites like Youku), and for YouTube videos of 1080p or higher resolution.
  • If you don't want you-get to join video parts after downloading them, use the --no-merge/-n option.
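
When calling you-get from a script, the options from this section can be assembled into an argument list. The Python sketch below uses only the --itag and --no-merge options described above; build_download_command is an illustrative name, not something you-get provides.

```python
def build_download_command(url, itag=None, merge=True):
    """Assemble a you-get command line as a list of arguments."""
    command = ["you-get"]
    if itag is not None:
        # Same form as the "# download-with" hint in the listing.
        command.append(f"--itag={itag}")
    if not merge:
        # Keep the downloaded parts separate instead of joining them.
        command.append("--no-merge")
    command.append(url)
    return command
```

The resulting list can be handed to subprocess.run(command, check=True). Passing a list rather than a single string avoids shell quoting problems with URLs that contain characters such as ? and &.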

Download anything else

If you already have the URL of the exact resource you want, you can download it directly with:

$ you-get https://stallman.org/rms.jpg
Site:       stallman.org
Title:      rms
Type:       JPEG Image (image/jpeg)
Size:       0.06 MiB (66482 Bytes)

Downloading rms.jpg ...
100.0% (  0.1/0.1  MB) ├████████████████████████████████████████┤[1/1]  127 kB/s

Otherwise, you-get will scrape the web page and try to figure out if there's anything interesting to you:

$ you-get http://kopasas.tumblr.com/post/69361932517
Site:       Tumblr.com
Title:      kopasas
Type:       Unknown type (None)
Size:       0.51 MiB (536583 Bytes)

Site:       Tumblr.com
Title:      tumblr_mxhg13jx4n1sftq6do1_1280
Type:       Portable Network Graphics (image/png)
Size:       0.51 MiB (536583 Bytes)

Downloading tumblr_mxhg13jx4n1sftq6do1_1280.png ...
100.0% (  0.5/0.5  MB) ├████████████████████████████████████████┤[1/1]   22 MB/s

Note:

  • This feature is an experimental one and far from perfect. It works best on scraping large-sized images from popular websites like Tumblr and Blogger, but there is really no universal pattern that can apply to any site on the Internet.

Search on Google Videos and download

You can pass literally anything to you-get. If it isn't a valid URL, you-get will do a Google search and download the most relevant video for you. (It might not be exactly the thing you wish to see, but still very likely.)

$ you-get "Richard Stallman eats"

Pause and resume a download

You may use Ctrl+C to interrupt a download.

A temporary .download file is kept in the output directory. Next time you run you-get with the same arguments, the download progress will resume from the last session. In case the file is completely downloaded (the temporary .download extension is gone), you-get will just skip the download.

To enforce re-downloading, use the --force/-f option. (Warning: doing so will overwrite any existing file or temporary file with the same name!)
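
The behavior described above can be summarized as a small decision table. The Python sketch below is only an illustration of that description, not you-get's actual code; it assumes the temporary file is named after the target file with .download appended, which the text above does not spell out.

```python
import tempfile
from pathlib import Path


def plan_download(output_dir, filename, force=False):
    """Return "download", "resume", "skip" or "overwrite"."""
    target = Path(output_dir) / filename
    # Assumption: the temporary file is "<filename>.download".
    partial = Path(output_dir) / (filename + ".download")
    if force and (target.exists() or partial.exists()):
        return "overwrite"  # --force discards whatever is there
    if partial.exists():
        return "resume"     # an interrupted session left a partial file
    if target.exists():
        return "skip"       # already completely downloaded
    return "download"


def demo():
    """Walk through the four cases in a throwaway directory."""
    states = []
    with tempfile.TemporaryDirectory() as folder:
        partial = Path(folder) / "zoo.webm.download"
        states.append(plan_download(folder, "zoo.webm"))
        partial.write_bytes(b"partial")
        states.append(plan_download(folder, "zoo.webm"))
        states.append(plan_download(folder, "zoo.webm", force=True))
        partial.unlink()
        (Path(folder) / "zoo.webm").write_bytes(b"complete")
        states.append(plan_download(folder, "zoo.webm"))
    return states
```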

Set the path and name of downloaded file

Use the --output-dir/-o option to set the path, and --output-filename/-O to set the name of the downloaded file:

$ you-get -o ~/Videos -O zoo.webm 'https://www.youtube.com/watch?v=jNQXAC9IVRw'

Tips:

  • These options are helpful if you encounter problems with the default video titles, which may contain special characters that do not play well with your current shell / operating system / filesystem.
  • These options are also helpful if you write a script to batch download files and put them into designated folders with designated names.
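
As a sketch of the batch scenario in the tips above, the following Python snippet builds one command per job using the -o and -O options. The helper names and the set of characters replaced by safe_name are assumptions chosen for illustration; adjust them to whatever your shell and filesystem actually reject.

```python
import re


def safe_name(title):
    """Replace characters that commonly upset shells or filesystems."""
    cleaned = re.sub(r'[\\/:*?"<>|\s]+', "_", title).strip("_")
    return cleaned or "download"


def batch_commands(jobs):
    """Turn (url, folder, title) tuples into you-get argument lists."""
    commands = []
    for url, folder, title in jobs:
        commands.append(["you-get", "-o", folder, "-O", safe_name(title), url])
    return commands
```

Each list can then be run with subprocess.run, one download at a time.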

Proxy settings

You may specify an HTTP proxy for you-get to use, via the --http-proxy/-x option:

$ you-get -x 127.0.0.1:8087 'https://www.youtube.com/watch?v=jNQXAC9IVRw'

However, the system proxy setting (i.e. the environment variable http_proxy) is applied by default. To disable any proxy, use the --no-proxy option.

Tips:

  • If you need to use proxies a lot (in case your network is blocking certain sites), you might want to use you-get with proxychains and set alias you-get="proxychains -q you-get" (in Bash).
  • For some websites (e.g. Youku), if you need access to some videos that are only available in mainland China, there is an option of using a specific proxy to extract video information from the site: --extractor-proxy/-y.
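
The precedence described in this section (explicit -x, then the http_proxy environment variable, unless --no-proxy is given) can be sketched in Python as follows. proxy_choice is a hypothetical helper for scripts that wrap you-get; it does not inspect what you-get itself does internally.

```python
import os


def proxy_choice(http_proxy=None, use_proxy=True, environ=None):
    """Return (extra you-get arguments, proxy expected to be in effect)."""
    environ = os.environ if environ is None else environ
    if not use_proxy:
        # --no-proxy disables any proxy, including the system setting.
        return ["--no-proxy"], None
    if http_proxy:
        # An explicit proxy is passed with -x (--http-proxy).
        return ["-x", http_proxy], http_proxy
    # No option given: the http_proxy environment variable applies, if set.
    return [], environ.get("http_proxy")
```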

Watch a video

Use the --player/-p option to feed the video into your media player of choice, e.g. mpv or vlc, instead of downloading it:

$ you-get -p vlc 'https://www.youtube.com/watch?v=jNQXAC9IVRw'

Or, if you prefer to watch the video in a browser, just without ads or comment section:

$ you-get -p chromium 'https://www.youtube.com/watch?v=jNQXAC9IVRw'

Tips:

  • It is possible to use the -p option to start another download manager, e.g., you-get -p uget-gtk 'https://www.youtube.com/watch?v=jNQXAC9IVRw', though they may not play together very well.

Load cookies

Not all videos are publicly available. If you need to log in to your account to access something (e.g., a private video), you will have to feed your browser cookies to you-get via the --cookies/-c option.

Note:

  • As of now, we are supporting two formats of browser cookies: Mozilla cookies.sqlite and Netscape cookies.txt.
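
Before handing a Netscape cookies.txt to you-get, you may want to check that the file parses at all. A possible Python sketch uses the standard library's http.cookiejar.MozillaCookieJar, which reads the same Netscape format; this is independent of how you-get itself loads cookies, and the sample cookie in demo is made up.

```python
import http.cookiejar
import os
import tempfile


def count_netscape_cookies(path):
    """Return the number of cookies in a cookies.txt, or None if unreadable."""
    jar = http.cookiejar.MozillaCookieJar()
    try:
        # Keep session and expired cookies so the count reflects the file.
        jar.load(path, ignore_discard=True, ignore_expires=True)
    except (http.cookiejar.LoadError, OSError):
        return None
    return len(jar)


def demo():
    """Check one well-formed and one malformed file in a temp directory."""
    with tempfile.TemporaryDirectory() as folder:
        good = os.path.join(folder, "cookies.txt")
        with open(good, "w") as handle:
            handle.write("# Netscape HTTP Cookie File\n")
            handle.write(".example.com\tTRUE\t/\tFALSE\t4102444800\tsession\tabc123\n")
        bad = os.path.join(folder, "not-cookies.txt")
        with open(bad, "w") as handle:
            handle.write("this is not a cookie file\n")
        return count_netscape_cookies(good), count_netscape_cookies(bad)
```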

Reuse extracted data

Use --url/-u to get a list of downloadable resource URLs extracted from the page. Use --json to get an abstract of extracted data in the JSON format.

Warning:

  • For the time being, this feature has NOT been stabilized and the JSON schema may have breaking changes in the future.
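
Since the JSON schema is not stable, a script that only needs the resource URLs could read the plain --url output instead. The Python sketch below assumes the "Real URLs:" layout seen in sample output further down this page; parse_real_urls is an illustrative helper, and that layout is not a documented interface either.

```python
def parse_real_urls(output):
    """Extract the lines listed under "Real URLs:" in --url style output."""
    urls = []
    in_section = False
    for line in output.splitlines():
        text = line.strip()
        if text == "Real URLs:":
            in_section = True
            continue
        if in_section:
            if not text:
                in_section = False  # a blank line ends the URL list
            else:
                urls.append(text)
    return urls
```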

Supported Sites

Site URL
YouTube https://www.youtube.com/
Twitter https://twitter.com/
VK http://vk.com/
Vine https://vine.co/
Vimeo https://vimeo.com/
Veoh http://www.veoh.com/
Tumblr https://www.tumblr.com/
TED http://www.ted.com/
SoundCloud https://soundcloud.com/
SHOWROOM https://www.showroom-live.com/
Pinterest https://www.pinterest.com/
MTV81 http://www.mtv81.com/
Mixcloud https://www.mixcloud.com/
Metacafe http://www.metacafe.com/
Magisto http://www.magisto.com/
Khan Academy https://www.khanacademy.org/
Internet Archive https://archive.org/
Instagram https://instagram.com/
InfoQ http://www.infoq.com/presentations/
Imgur http://imgur.com/
Heavy Music Archive http://www.heavy-music.ru/
Freesound http://www.freesound.org/
Flickr https://www.flickr.com/
FC2 Video http://video.fc2.com/
Facebook https://www.facebook.com/
eHow http://www.ehow.com/
Dailymotion http://www.dailymotion.com/
Coub http://coub.com/
CBS http://www.cbs.com/
Bandcamp http://bandcamp.com/
AliveThai http://alive.in.th/
interest.me http://ch.interest.me/tvn
755 ナナゴーゴー http://7gogo.jp/
niconico ニコニコ動画 http://www.nicovideo.jp/
163 网易视频 网易云音乐 http://v.163.com/ http://music.163.com/
56网 http://www.56.com/
AcFun http://www.acfun.cn/
Baidu 百度贴吧 http://tieba.baidu.com/
爆米花网 http://www.baomihua.com/
bilibili 哔哩哔哩 http://www.bilibili.com/
豆瓣 http://www.douban.com/
斗鱼 http://www.douyutv.com/
凤凰视频 http://v.ifeng.com/
风行网 http://www.fun.tv/
iQIYI 爱奇艺 http://www.iqiyi.com/
激动网 http://www.joy.cn/
酷6网 http://www.ku6.com/
酷狗音乐 http://www.kugou.com/
酷我音乐 http://www.kuwo.cn/
乐视网 http://www.le.com/
荔枝FM http://www.lizhi.fm/
懒人听书 http://www.lrts.me/
秒拍 http://www.miaopai.com/
MioMio弹幕网 http://www.miomio.tv/
MissEvan 猫耳FM http://www.missevan.com/
痞客邦 https://www.pixnet.net/
PPTV聚力 http://www.pptv.com/
齐鲁网 http://v.iqilu.com/
QQ 腾讯视频 http://v.qq.com/
企鹅直播 http://live.qq.com/
Sina 新浪视频 微博秒拍视频 http://video.sina.com.cn/ http://video.weibo.com/
Sohu 搜狐视频 http://tv.sohu.com/
Tudou 土豆 http://www.tudou.com/
阳光卫视 http://www.isuntv.com/
Youku 优酷 http://www.youku.com/
战旗TV http://www.zhanqi.tv/lives
央视网 http://www.cntv.cn/
Naver 네이버 http://tvcast.naver.com/
芒果TV http://www.mgtv.com/
火猫TV http://www.huomao.com/
阳光宽频网 http://www.365yg.com/
西瓜视频 https://www.ixigua.com/
新片场 https://www.xinpianchang.com/
快手 https://www.kuaishou.com/
抖音 https://www.douyin.com/
TikTok https://www.tiktok.com/
中国体育(TV) http://v.zhibo.tv/ http://video.zhibo.tv/
知乎 https://www.zhihu.com/

For all other sites not on the list, the universal extractor will take care of finding and downloading interesting resources from the page.

Known bugs

If something is broken and you-get can't get you things you want, don't panic. (Yes, this happens all the time!)

Check if it's already a known problem on https://github.com/soimort/you-get/wiki/Known-Bugs. If not, follow the guidelines on how to report an issue.

Getting Involved

You can reach us on the Gitter channel #soimort/you-get (here's how you set up your IRC client for Gitter). If you have a quick question regarding you-get, ask it there.

If you are seeking to report an issue or contribute, please make sure to read the guidelines first.

Legal Issues

This software is distributed under the MIT license.

In particular, please be aware that

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Translated to human words:

In case your use of the software forms the basis of copyright infringement, or you use the software for any other illegal purposes, the authors cannot take any responsibility for you.

We only ship the code here, and how you are going to use it is left to your own discretion.

Authors

Made by @soimort, who is in turn powered by 🍺 and 🍜.

You can find the list of all contributors here.

Comments
  • [youku]update api

    NOTE:

    1. yxon is not implemented. For instance, if a video has 10 segments in hd3, 8 in hd2, and 5 each in mp4 and flv, 28 extra network requests would be sent, which makes the program seem stuck or floods the debugging log with noise. See also #1058, where someone asked for this.

    2. Downloading http://v.youku.com/v_show/id_XMjc3MjQ0MjEwMA==.html (episode 40 of Zetianji) fails in hd3, while other resolutions work. Episodes 1 and 45 download fine. 1080p requires a VIP account, so it cannot be checked in a browser either; the Youku CDN is probably to blame.
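
    The request count in note 1 is simply the sum of the segment counts over all formats. As a tiny Python illustration, using the figures from the note (the function name is made up):

```python
def extra_requests(segments_per_format):
    """One extra request per segment, summed over every format."""
    return sum(segments_per_format.values())


# Segment counts quoted in the note above.
example = {"hd3": 10, "hd2": 8, "mp4": 5, "flv": 5}
```

    With these figures, extra_requests(example) gives the 28 requests mentioned in the note.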

    $ ./you-get -d 'http://v.youku.com/v_show/id_XMjc4MTQ3MzY5Mg==.html'
    [DEBUG] get_content: https://ups.youku.com/ups/get.json?vid=XMjc4MTQ3MzY5Mg==&ccode=0401&client_ip=192.168.1.1&utid=3nOsESeJhAUCAbZyXYm8UflX&client_ts=1495631326
    site:                优酷 (Youku)
    title:               李志、电声与管弦乐 03.尽头
    stream:
        - format:        hd3
          container:     flv
          video-profile: 1080P
          size:          60.3 MiB (63205554 bytes)
        # download-with: you-get --format=hd3 [URL]
    
    Downloading 李志、电声与管弦乐 03.尽头.flv ...
     100% ( 60.3/ 60.3MB) ├█████████████████████████████████████████┤[2/2]    1 MB/s
    Merging video parts... ffmpeg version 3.3.1 Copyright (c) 2000-2017 the FFmpeg developers
      built with Apple LLVM version 8.1.0 (clang-802.0.42)
      configuration: --prefix=/usr/local/Cellar/ffmpeg/3.3.1 --enable-shared --enable-pthreads --enable-gpl --enable-version3 --enable-hardcoded-tables --enable-avresample --cc=clang --host-cflags= --host-ldflags= --enable-libmp3lame --enable-libx264 --enable-libxvid --enable-opencl --disable-lzma --enable-vda
      libavutil      55. 58.100 / 55. 58.100
      libavcodec     57. 89.100 / 57. 89.100
      libavformat    57. 71.100 / 57. 71.100
      libavdevice    57.  6.100 / 57.  6.100
      libavfilter     6. 82.100 /  6. 82.100
      libavresample   3.  5.  0 /  3.  5.  0
      libswscale      4.  6.100 /  4.  6.100
      libswresample   2.  7.100 /  2.  7.100
      libpostproc    54.  5.100 / 54.  5.100
    [flv @ 0x7fa849800a00] Auto-inserting h264_mp4toannexb bitstream filter
    Input #0, concat, from './李志、电声与管弦乐 03.尽头.mp4.txt':
      Duration: N/A, start: 0.000000, bitrate: 1877 kb/s
        Stream #0:0: Video: h264 (High), yuv420p(tv, bt709, progressive), 1920x1080 [SAR 1:1 DAR 16:9], 1748 kb/s, 25.03 fps, 25 tbr, 1k tbn, 50 tbc
        Stream #0:1: Audio: aac (LC), 44100 Hz, stereo, fltp, 128 kb/s
    Output #0, mp4, to './李志、电声与管弦乐 03.尽头.mp4':
      Metadata:
        encoder         : Lavf57.71.100
        Stream #0:0: Video: h264 (High) ([33][0][0][0] / 0x0021), yuv420p(tv, bt709, progressive), 1920x1080 [SAR 1:1 DAR 16:9], q=2-31, 1748 kb/s, 25.03 fps, 25 tbr, 16k tbn, 1k tbc
        Stream #0:1: Audio: aac (LC) ([64][0][0][0] / 0x0040), 44100 Hz, stereo, fltp, 128 kb/s
    Stream mapping:
      Stream #0:0 -> #0:0 (copy)
      Stream #0:1 -> #0:1 (copy)
    Press [q] to stop, [?] for help
    [flv @ 0x7fa849800a00] Auto-inserting h264_mp4toannexb bitstream filter
    frame= 3451 fps=0.0 q=-1.0 size=   31623kB time=00:02:17.98 bitrate=1877.4kbits/
    frame= 4651 fps=4651 q=-1.0 size=   45169kB time=00:03:06.00 bitrate=1989.4kbits
    frame= 5542 fps=3676 q=-1.0 size=   55299kB time=00:03:41.62 bitrate=2044.1kbits
    frame= 6595 fps=3615 q=-1.0 Lsize=   61601kB time=00:04:23.85 bitrate=1912.5kbits/s speed= 145x
    video:57281kB audio:4123kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.320389%
    Merged into 李志、电声与管弦乐 03.尽头.mp4
    

    fix #2012 #2014 #2037 #1847

    It's a quick fix, far from satisfactory but too many users need youku.py.

    opened by rosynirvana 73
  • Bilibili video download error

    Error
    Traceback (most recent call last):
      File "/home/ylc/PycharmProjects/you-get/tests/test.py", line 44, in test_bilibili
        bilibili.download("https://www.bilibili.com/video/BV1Nx411f7MT/?spm_id_from=autoNext", info=True)
      File "/home/ylc/PycharmProjects/you-get/src/you_get/extractor.py", line 48, in download_by_url
        self.prepare(**kwargs)
      File "/home/ylc/PycharmProjects/you-get/src/you_get/extractors/bilibili.py", line 220, in prepare
        current_quality = playinfo['data']['quality'] or None  # 0 indicates an error, fallback to None
    KeyError: 'data'
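
    The KeyError comes from indexing playinfo['data'] when the response has no such key. One defensive way to read the value is sketched below in Python; this is a hypothetical helper for illustration, not the fix that the project applied, and the shape of playinfo is assumed from the traceback line alone.

```python
def current_quality(playinfo):
    """Return the reported quality, or None when it is missing or 0."""
    data = playinfo.get("data") if isinstance(playinfo, dict) else None
    if not isinstance(data, dict):
        return None  # no 'data' key: treat like the error case
    # As in the original line, 0 indicates an error and falls back to None.
    return data.get("quality") or None
```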

    opened by qcymkxyc 45
  • iqiyi: support more stream quality

    Algorithm from @ERioK. Thank you, @ERioK!

    Signed-off-by: Zhang Ning [email protected]

    for #1211 but no M3u8 parsing..

    python3 you-get -id http://www.iqiyi.com/v_19rrldw9sg.html 
    [DEBUG] get_content: http://cache.m.iqiyi.com/tmts/493448700/a0326d0224db24415f2c0ccacc800266/?t=1467202675054&sc=99cf652172f294d864d967ec6ce7d8ba&src=76f90cbd92f94a2e925d83e8ccd22cb7
    site:                爱奇艺 (Iqiyi)
    title:               狭路第1集
    streams:             # Available quality and codecs
        [ DEFAULT ] _________________________________
        - format:        BD
          container:     m3u8
          video-profile: 全高清
          size:          0.0 MiB (0 bytes)
        # download-with: you-get --format=BD [URL]
    
        - format:        FD
          container:     m3u8
          video-profile: 超高清
          size:          0.0 MiB (0 bytes)
        # download-with: you-get --format=FD [URL]
    
        - format:        TD
          container:     m3u8
          video-profile: 超清
          size:          0.0 MiB (0 bytes)
        # download-with: you-get --format=TD [URL]
    
        - format:        HD
          container:     m3u8
          video-profile: 高清
          size:          0.0 MiB (0 bytes)
        # download-with: you-get --format=HD [URL]
    
        - format:        SD
          container:     m3u8
          video-profile: 标清
          size:          0.0 MiB (0 bytes)
        # download-with: you-get --format=SD [URL]
    
        - format:        LD
          container:     m3u8
          video-profile: 流畅
          size:          0.0 MiB (0 bytes)
        # download-with: you-get --format=LD [URL]
    

    opened by zhangn1985 26
  • Fixed bilibili download 403 error

    It's not just the UA's fault, because other browsers can access the site successfully with the same UA. Adding an Accept request header seems to make everything work.
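
    A minimal Python sketch of sending both headers with the standard library follows. The header values are placeholders, since the comment above does not say which Accept value was used, and build_request is not a you-get function.

```python
import urllib.request


def build_request(url, user_agent, accept="*/*"):
    """Create a request carrying both a User-Agent and an Accept header."""
    request = urllib.request.Request(url)
    request.add_header("User-Agent", user_agent)
    # Assumption: any Accept header is enough; "*/*" is just a placeholder.
    request.add_header("Accept", accept)
    return request
```

    The request object can then be passed to urllib.request.urlopen; constructing it does not touch the network.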

    opened by FSpark 18
  • bilibili multi fixes

    ~~Added a get_content_and_redirected function to common.py that returns content and redirected_url as a tuple~~

    1. Handle redirects with url_locations, which fixes #1693. Test case:
    ./you-get -i 'http://www.bilibili.com/video/av7873772/'
    Site:       bilibili.com
    Title:      URARA迷路帖 [1 少女与占卜、偶尔的露肚皮]
    Type:       Flash video (video/x-flv)
    Size:       345.69 MiB (362486000 Bytes)
    
    1. Handle bangumi episode titles correctly. Test case:
    ./you-get -i 'http://bangumi.bilibili.com/anime/835/play#15014' 'http://bangumi.bilibili.com/anime/3462/play#89073' 'http://bangumi.bilibili.com/anime/1660/play'
    Site:       bilibili.com
    Title:      我们仍未知道那天所看见的花的名字。 [1 超和平busters]
    Type:       Flash video (video/x-flv)
    Size:       176.81 MiB (185402909 Bytes)
    
    Site:       bilibili.com
    Title:      双星之阴阳师 [17 师傅赋予的红色证明]
    Type:       Flash video (video/x-flv)
    Size:       259.48 MiB (272085344 Bytes)
    
    Site:       bilibili.com
    Title:      夏目友人帐 [1 猫和友人帐]
    Type:       Flash video (video/x-flv)
    Size:       143.16 MiB (150117218 Bytes)
    

    Partially fixes #1735.

    1. Restrict the timeout try/except block to download_urls, which fixes #1897. The timeout is increased to 15.
    ./you-get http://www.bilibili.com/video/av9954010/
    Site:       bilibili.com
    Title:      【西瓜新】一个小小的八音盒【有一点花絮】
    Type:       MPEG-4 video (video/mp4)
    Size:       25.6 MiB (26847580 Bytes)
    
    Downloading 【西瓜新】一个小小的八音盒【有一点花絮】.mp4 ...
     100% ( 25.6/ 25.6MB) ├█████████████████████████████████████████┤[1/1]    9 MB/s
    
    Downloading 【西瓜新】一个小小的八音盒【有一点花絮】.cmt.xml ...
    
     ./you-get http://bangumi.bilibili.com/anime/1660/play
    Site:       bilibili.com
    Title:      夏目友人帐 [1 猫和友人帐]
    Type:       Flash video (video/x-flv)
    Size:       143.16 MiB (150117218 Bytes)
    
    Downloading 夏目友人帐 [1 猫和友人帐].flv ...
     100% (143.2/143.2MB) ├█████████████████████████████████████████┤[1/1]  566 kB/s
    
    Downloading 夏目友人帐 [1 猫和友人帐].cmt.xml ...
    
    ./you-get http://www.bilibili.com/video/av10199346
    Site:       bilibili.com
    Title:      【GNZ48王盈】20170429 出道一周年联合特别公演《糖》王盈 陈桂君 农燕萍
    Type:       Flash video (video/x-flv)
    Size:       85.81 MiB (89977095 Bytes)
    
    Downloading 【GNZ48王盈】20170429 出道一周年联合特别公演《糖》王盈 陈桂君 农燕萍.flv ...
     100% ( 85.8/ 85.8MB) ├█████████████████████████████████████████┤[1/1]    5 MB/s
    
    Downloading 【GNZ48王盈】20170429 出道一周年联合特别公演《糖》王盈 陈桂君 农燕萍.cmt.xml ...
    

    print_info is no longer printed a second time.

    1. Fix the dynamically loaded cid on live.bilibili.com. Test case:
    ./you-get -u 'http://live.bilibili.com/1329719'
    Site:       bilibili.com
    Title:      画点健康的,,身体吃不消l - 冰山小公主不二 - 哔哩哔哩直播
    Type:       Flash video (video/x-flv)
    Size:       0.0 MiB (0 Bytes)
    
    Real URLs:
    http://txy.live-play.acgvideo.com/live-txy/958553/live_3568670_2103044.flv?wsSecret=3df3922b3e4b0604f2f5bdf148d4eb50&wsTime=1493788374
    

    opened by rosynirvana 17
  • [youku] Update ccode for youku.py

    Update ccode to 0503 for youku.py

    Get from:

    http://g.alicdn.com/player/ykplayer/0.5.48/youku-player.min.js

    {"0505":"interior","050F":"interior","0501":"interior","0502":"interior","0503":"interior","0510":"adshow","0512":"BDskin","0590":"BDskin"}

    Before the fix:

    you-get http://v.youku.com/v_show/id_XMzU5NjkxNTM1Ng==.html
    you-get: 客户端无权播放,201
    

    After the fix:

    you-get http://v.youku.com/v_show/id_XMzU5NjkxNTM1Ng==.html
    site:                优酷 (Youku)
    title:               货币战争(下):头号玩家
    stream:
        - format:        mp4hd2v2
          container:     mp4
          video-profile: 超清
          size:          313.3 MiB (328519347 bytes)
          m3u8_url:      http://pl-ali.youku.com/playlist/m3u8?vid=XMzU5NjkxNTM1Ng%3D%3D&type=hd2&ups_client_netip=2656b782&utid=VDmbEwBJ5BICASZWt4LcxGeI&ccode=0590&psid=1a8f1df72d66951b32f9b69076859a6d&duration=3071&expire=18000&drm_type=1&drm_device=10&ups_ts=1528056664&onOff=0&encr=0&ups_key=89ee30ca8564351c4e59998d1fd08fcf
        # download-with: you-get --format=mp4hd2v2 [URL]
    
    opened by chenrui333 15
  • icourse: add support

    Add support for iCourses

    Should close #1451

    Known issues

    • Playlist downloads can ONLY originate from pages like http://www.icourses.cn/coursestatic/course_*.html
    • Sometimes, the server will refuse the request due to high traffic (rush hour?)
    • Sometimes, the request will fail due to poor network condition (but will retry)

    opened by liushuyu 15
  • bilibili fails to download properly, and a fix

    bilibili downloading is not working. It shows:

    you-get -d https://www.bilibili.com/video/av46679537/?p=2
    [DEBUG] get_content: https://www.bilibili.com/video/av46679537/?p=2
    [DEBUG] HTTP Error with code403
    [DEBUG] HTTP Error with code403
    [DEBUG] HTTP Error with code403
    you-get: version 0.4.1328, a tiny downloader that scrapes the web.
    you-get: Namespace(URL=['https://www.bilibili.com/video/av46679537/?p=2'], auto_rename=False, cookies=None, debug=True, extractor_proxy=None, force=False, format=None, help=False, http_proxy=None, info=False, input_file=None, insecure=False, itag=None, json=False, no_caption=False, no_merge=False, no_proxy=False, output_dir='.', output_filename=None, password=None, player=None, playlist=False, skip_existing_file_size_check=False, socks_proxy=None, stream=None, timeout=600, url=False, version=False)
    Traceback (most recent call last):
      File "e:\ruanjian2\python37\lib\runpy.py", line 193, in _run_module_as_main
        "__main__", mod_spec)
      File "e:\ruanjian2\python37\lib\runpy.py", line 85, in _run_code
        exec(code, run_globals)
      File "E:\ruanjian2\python37\Scripts\you-get.exe\__main__.py", line 9, in <module>
      File "e:\ruanjian2\python37\lib\site-packages\you_get\__main__.py", line 92, in main
        main(**kwargs)
      File "e:\ruanjian2\python37\lib\site-packages\you_get\common.py", line 1759, in main
        script_main(any_download, any_download_playlist, **kwargs)
      File "e:\ruanjian2\python37\lib\site-packages\you_get\common.py", line 1647, in script_main
        **extra
      File "e:\ruanjian2\python37\lib\site-packages\you_get\common.py", line 1303, in download_main
        download(url, **kwargs)
      File "e:\ruanjian2\python37\lib\site-packages\you_get\common.py", line 1750, in any_download
        m.download(url, **kwargs)
      File "e:\ruanjian2\python37\lib\site-packages\you_get\extractor.py", line 48, in download_by_url
        self.prepare(**kwargs)
      File "e:\ruanjian2\python37\lib\site-packages\you_get\extractors\bilibili.py", line 173, in prepare
        initial_state = json.loads(initial_state_text)
      File "e:\ruanjian2\python37\lib\json\__init__.py", line 341, in loads
        raise TypeError(f'the JSON object must be str, bytes or bytearray, '
    TypeError: the JSON object must be str, bytes or bytearray, not NoneType

    Fix and changes made: the User-Agent in the original code appears to have been blocked; after changing the User-Agent, things work again.

    Tested: with the change, downloads work normally.

    confirmed 
    opened by GuanFoxyier 14
  • Added test case for bilibili nonetype error

    you-get returned this error when I tried to download a video.

    Traceback (most recent call last):
      File "tests/test.py", line 35, in test_bilibili_nonetype_error
        bilibili.download('https://www.bilibili.com/video/av22195308/?p=3', playlist=True, output_dir='test_output', merge=True)
      File "/usr/lib/python3.7/site-packages/you_get/extractor.py", line 59, in download_by_url
        self.download(**kwargs)
      File "/usr/lib/python3.7/site-packages/you_get/extractor.py", line 201, in download
        self.p(stream_id)
      File "/usr/lib/python3.7/site-packages/you_get/extractor.py", line 139, in p
        self.p_stream(stream_id)
      File "/usr/lib/python3.7/site-packages/you_get/extractor.py", line 108, in p_stream
        if 'size' in stream and stream['container'].lower() != 'm3u8':
    AttributeError: 'NoneType' object has no attribute 'lower'
    

    The test was run against the latest develop branch, and the error remains.

    A test case has been added to test.py.
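
    The failing condition calls .lower() on stream['container'], which is None in this case. One possible guard is sketched below in Python. It is a hypothetical illustration, not the project's fix, and it makes the assumption that a missing container should be treated as "not m3u8".

```python
def has_printable_size(stream):
    """Mirror the failing check, but tolerate a missing container."""
    # Fall back to an empty string so .lower() is always safe to call.
    container = (stream.get("container") or "").lower()
    return "size" in stream and container != "m3u8"
```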

    opened by RedTailBullet 14
  • Support ximalaya

    test case:

    1. single
    ./you-get 'http://www.ximalaya.com/5916736/sound/27694473'
    Site:        ximalaya.com
    title:       【奇迹暖暖原创同人曲】叶格尔的情诗
    Type:        MPEG-4 audio m4a
    Size:        N/A
    Downloading 【奇迹暖暖原创同人曲】叶格尔的情诗.m4a ...
     100% (  1.8/  1.8MB) ├█████████████████████████████████████████┤[1/1]    2 MB/s
    
    1. album
    ./you-get -dl 'http://www.ximalaya.com/5916736/album/6293696'
    [DEBUG] get_content: http://www.ximalaya.com/5916736/album/6293696
    [DEBUG] get_content: http://www.ximalaya.com/5916736/album/6293696
    [DEBUG] get_content: http://www.ximalaya.com/tracks/27694500.json
    [DEBUG] ximalaya_download_by_id: http://audio.xmcdn.com/group22/M04/08/59/wKgJLlhjMMzD_aIHAC-t6uS6CVc270.m4a
    Site:        ximalaya.com
    title:       再起荣耀——《全职高手》宣传曲
    Type:        MPEG-4 audio m4a
    Size:        N/A
    Downloading 再起荣耀——《全职高手》宣传曲.m4a ...
     100% (  3.0/  3.0MB) ├█████████████████████████████████████████┤[1/1]    2 MB/s
    
    [DEBUG] get_content: http://www.ximalaya.com/tracks/27694473.json
    [DEBUG] ximalaya_download_by_id: http://audio.xmcdn.com/group21/M06/07/2E/wKgJLVhjMIiyRjdGAB1o9XQ265M942.m4a
    Site:        ximalaya.com
    title:       【奇迹暖暖原创同人曲】叶格尔的情诗
    Type:        MPEG-4 audio m4a
    Size:        N/A
    Skipping ./【奇迹暖暖原创同人曲】叶格尔的情诗.m4a: file already exists
    
    1. multi-page album tested with this

    opened by rosynirvana 14
  • [youku] Update cccode

    Update ccode to 0510 in youku.py.

    The value is taken from:

    http://g.alicdn.com/player/ykplayer/0.5.28/youku-player.min.js

    {"0505":"interior","050F":"interior","0501":"interior","0502":"interior","0503":"interior","0510":"adshow","0512":"BDskin","0590":"BDskin"}

    Before the fix:

    you-get --debug http://v.youku.com/v_show/id_XMzU5NjkxNTM1Ng==.html?spm=a2h0z.8244218.2371631.5
    [DEBUG] get_content: https://ups.youku.com/ups/get.json?vid=XMzU5NjkxNTM1Ng==&ccode=0502&client_ip=192.168.1.1&utid=%2BoR%2BE7ve%2BDsCARi5QmohYOQw&client_ts=1526175486&ckey=DIl58SLFxFNndSV1GFNnMQVYkx1PP5tKe1siZu/86PR1u/Wh1Ptd%2BWOZsHHWxysSfAOhNJpdVWsdVJNsfJ8Sxd8WKVvNfAS8aS8fAOzYARzPyPc3JvtnPHjTdKfESTdnuTW6ZPvk2pNDh4uFzotgdMEFkzQ5wZVXl2Pf1/Y6hLK0OnCNxBj3%2Bnb0v72gZ6b0td%2BWOZsHHWxysSo/0y9D2K42SaB8Y/%2BaD2K42SaB8Y/%2BahU%2BWOZsHcrxysooUeND
    you-get: 用户账户异常、请重新登录
    

    After the fix:

    you-get --debug http://v.youku.com/v_show/id_XMzU5NjkxNTM1Ng==.html?spm=a2h0z.8244218.2371631.5
    [DEBUG] get_content: https://ups.youku.com/ups/get.json?vid=XMzU5NjkxNTM1Ng==&ccode=0510&client_ip=192.168.1.1&utid=IYV%2BE0QbrVcCARi5Qmqd6hjL&client_ts=1526175525&ckey=DIl58SLFxFNndSV1GFNnMQVYkx1PP5tKe1siZu/86PR1u/Wh1Ptd%2BWOZsHHWxysSfAOhNJpdVWsdVJNsfJ8Sxd8WKVvNfAS8aS8fAOzYARzPyPc3JvtnPHjTdKfESTdnuTW6ZPvk2pNDh4uFzotgdMEFkzQ5wZVXl2Pf1/Y6hLK0OnCNxBj3%2Bnb0v72gZ6b0td%2BWOZsHHWxysSo/0y9D2K42SaB8Y/%2BaD2K42SaB8Y/%2BahU%2BWOZsHcrxysooUeND
    site:                优酷 (Youku)
    title:               货币战争(下):头号玩家
    stream:
        - format:        mp4hd3v2
          container:     mp4
          video-profile: 1080P
          size:          567.0 MiB (594580265 bytes)
          m3u8_url:      http://pl-ali.youku.com/playlist/m3u8?vid=XMzU5NjkxNTM1Ng%3D%3D&type=hd3&ups_client_netip=18b9426a&utid=IYV%2BE0QbrVcCARi5Qmqd6hjL&ccode=0510&psid=b336626aa0973701c5248c01cfa2f3f8&duration=3071&expire=18000&drm_type=1&drm_device=7&ups_ts=1526175526&onOff=0&encr=0&ups_key=bcfaf87bad590f7a77bfa4ac5ffc8793
        # download-with: you-get --format=mp4hd3v2 [URL]
    
    opened by zhenglu0 13
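The two debug lines above differ in the `ccode` query parameter (`0502` before, `0510` after). As a rough sketch of how such a request URL can be assembled, the snippet below builds the query string with the standard library. The function name `build_ups_url` and the placeholder `ckey` value are assumptions for illustration; the parameter names are the ones visible in the log, and the percent-encoding produced by `urlencode` may differ from what you-get itself emits.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Endpoint as it appears in the debug output above.
UPS_ENDPOINT = 'https://ups.youku.com/ups/get.json'


def build_ups_url(vid, ccode, client_ip, utid, client_ts, ckey):
    """Assemble the get.json URL from the parameters seen in the log.

    Hypothetical helper, not the function used in youku.py.
    """
    params = {
        'vid': vid,
        'ccode': ccode,
        'client_ip': client_ip,
        'utid': utid,
        'client_ts': client_ts,
        'ckey': ckey,
    }
    return UPS_ENDPOINT + '?' + urlencode(params)


if __name__ == '__main__':
    url = build_ups_url(
        vid='XMzU5NjkxNTM1Ng==',
        ccode='0510',
        client_ip='192.168.1.1',
        utid='IYV+E0QbrVcCARi5Qmqd6hjL',
        client_ts=1526175525,
        ckey='PLACEHOLDER',  # placeholder, not a real key
    )
    # Round-trip the query string to confirm the ccode that will be sent.
    print(parse_qs(urlsplit(url).query)['ccode'])
```

Keeping `ccode` as an explicit argument makes it easy to switch values when the player script changes its mapping again.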
  • Download multipage video collection

    When a collection spans more than one page, download the videos from every page; the current code only handles the first page.

    Applies to 'space_channel_series' and 'space_channel_collection'.

    opened by arix00 1
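The idea of walking every page of a collection, rather than only the first, can be sketched as a simple loop. Everything here is an assumption for illustration: `fetch_page` is a hypothetical callable that returns the list of video identifiers on a given page (empty once the pages run out), and the actual extractor may detect the last page differently.

```python
def collect_all_videos(fetch_page, first_page=1):
    """Gather items from consecutive pages until an empty page is returned.

    `fetch_page(page_number)` is a hypothetical callable supplied by the
    caller; it is not a you-get API.
    """
    videos = []
    page = first_page
    while True:
        items = fetch_page(page)
        if not items:
            # An empty page is taken to mean the collection is exhausted.
            break
        videos.extend(items)
        page += 1
    return videos


if __name__ == '__main__':
    # Fake three-page collection standing in for real network requests.
    fake_pages = {1: ['v1', 'v2'], 2: ['v3', 'v4'], 3: ['v5']}
    print(collect_all_videos(lambda n: fake_pages.get(n, [])))
```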
  • Add complete for shell by shtab

    ❯ you-get --print-completion bash | sudo tee /usr/share/bash-completion/completions/you-get
    ❯ you-get --print-completion zsh | sudo tee /usr/share/zsh/site-functions/_you-get
    ❯ you-get --print-completion tcsh | sudo tee /etc/profile.d/you-get.completion.csh
    

    Because of a bug in shtab's zsh support, I fixed it temporarily and sent a PR to shtab. The fixed output:

    ❯ you-get --print-completion zsh | xsel -ib
    #compdef you-get
    
    # AUTOMATCALLY GENERATED by `shtab`
    
    
    _shtab_you_get_commands() {
      local _commands=(
        
      )
      _describe 'you-get commands' _commands
    }
    
    _shtab_you_get_options=(
      "(- : *)--print-completion[print shell completion script]:print_completion:(bash zsh tcsh)"
      {-V,--version}"[Print version and exit]"
      {-h,--help}"[Print this help message and exit]"
      {-i,--info}"[Print extracted information]"
      {-u,--url}"[Print extracted information with URLs]"
      "--json[Print extracted URLs in JSON format]"
      {-n,--no-merge}"[Do not merge video parts]"
      "--no-caption[Do not download captions (subtitles, lyrics, danmaku, ...)]"
      "--postfix[Postfix downloaded files with unique identifiers]"
      {-f,--force}"[Force overwriting existing files]"
      "--skip-existing-file-size-check[Skip existing file without checking file size]"
      {-F,--format}"[Set video format to STREAM_ID]:format:"
      {-O,--output-filename}"[Set output filename]:output_filename:_files"
      {-o,--output-dir}"[Set output directory]:output_dir:_files -/"
      {-p,--player}"[Stream extracted URL to a PLAYER]:player:"
      {-c,--cookies}"[Load cookies.txt or cookies.sqlite]:cookies:_files"
      {-t,--timeout}"[Set socket timeout]:timeout:"
      {-d,--debug}"[Show traceback and other debug info]"
      {-I,--input-file}"[Read non-playlist URLs from FILE]:input_file:_files"
      {-P,--password}"[Set video visit password to PASSWORD]:password:"
      {-l,--playlist}"[Prefer to download a playlist]"
      "--first[the first number]:first:"
      "--last[the last number]:last:"
      {--size,--page-size}"[the page size number]:size:"
      {-a,--auto-rename}"[Auto rename same name different files]"
      {-k,--insecure}"[ignore ssl errors]"
      {-x,--http-proxy}"[Use an HTTP proxy for downloading]:http_proxy:_hosts"
      {-y,--extractor-proxy}"[Use an HTTP proxy for extracting only]:extractor_proxy:_hosts"
      "--no-proxy[Never use a proxy]"
      {-s,--socks-proxy}"[Use an SOCKS5 proxy for downloading]:socks_proxy:_hosts_users"
      {-m,--m3u8}"[download video using an m3u8 url]"
      "(*)::URL:_urls"
    )
    
    
    _shtab_you_get() {
      local context state line curcontext="$curcontext"
    
      one_or_more='(-)*'
      reminder='(*)'
      if ((${_shtab_you_get_options[(I)${(q)one_or_more}*]} + ${_shtab_you_get_options[(I)${(q)reminder}*]} == 0)); then  # noqa: E501
        _shtab_you_get_options+=(': :_shtab_you_get_commands' '*::: :->you-get')
      fi
      _arguments -C $_shtab_you_get_options
    
      case $state in
        you-get)
          words=($line[1] "${words[@]}")
          (( CURRENT += 1 ))
          curcontext="${curcontext%:*:*}:_shtab_you_get-$line[1]:"
          case $line[1] in
            
          esac
      esac
    }
    
    # Custom Preamble
    _hosts_users() {
      _alternative 'hosts: :_hosts' 'users: :_users'
    }
    
    # End Custom Preamble
    
    
    typeset -A opt_args
    _shtab_you_get "$@"
    
    opened by Freed-Wu 2
  • QQ download fhd

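    After importing cookies for Tencent Video, download the 1080P version.

    Cookies from different subdomains that share the same key but carry different values overwrite each other and cause authentication to fail, so only same-domain cookies are now loaded.

    Before the change: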

    
    you-get -i https://v.qq.com/x/cover/mzc00200mhgrghr/p00398zxwzu.html -c cookies.txt
    Site:       QQ.com
    Title:      能源_16
    Type:       MPEG-4 video (video/mp4)
    Size:       8.71 MiB (9135762 Bytes)
    

    After the change:

    you-get -i https://v.qq.com/x/cover/mzc00200mhgrghr/p00398zxwzu.html -c cookies.txt
    Site:       QQ.com
    Title:      能源_16
    Type:       MPEG-4 video (video/mp4)
    Size:       70.1 MiB (73504132 Bytes)
    
    opened by 4ft35t 2
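The cookie collision described in the QQ pull request can be illustrated with a small filter that keeps only the cookies applicable to the requested host. This is a sketch under assumptions, not the code from the pull request: cookies are represented as `(domain, name, value)` tuples, as they might be read from a cookies.txt file, and `cookies_for_host` is a hypothetical helper. The matching rule used here (exact host or parent domain, with the more specific domain winning) is one possible reading of "same-domain".

```python
def cookies_for_host(cookies, host):
    """Return a name -> value dict of the cookies that apply to `host`.

    `cookies` is an iterable of (domain, name, value) tuples. Cookies set
    for unrelated subdomains are dropped, so they cannot overwrite a
    cookie of the same name that belongs to `host`.
    """
    host = host.lower()

    def applies(domain):
        domain = domain.lstrip('.').lower()
        return host == domain or host.endswith('.' + domain)

    matching = [c for c in cookies if applies(c[0])]
    # Apply less specific domains first so the most specific value wins.
    matching.sort(key=lambda c: len(c[0].lstrip('.')))
    return {name: value for _, name, value in matching}


if __name__ == '__main__':
    jar = [
        ('.qq.com', 'token', 'a'),
        ('film.qq.com', 'token', 'b'),
        ('v.qq.com', 'token', 'c'),
        ('v.qq.com', 'sid', '1'),
    ]
    print(cookies_for_host(jar, 'v.qq.com'))
```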
Releases(v0.4.1650)
Automatically download and crop key information from the arxiv daily paper.

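Arxiv daily quick overview. Features: filter each day's latest arXiv papers by keyword, automatically fetch the abstracts, and automatically crop tables and figures from the papers. 1 Test environment: Ubuntu 16+, Python3.7, torch 1.9, Colab GPU. 2 Usage demo: first download the weights from baiduyun (extraction code: il87) and place them in code/Pars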

HeoLis 20 Jul 30, 2022
This scraper scrapes the email IDs of faculty members from a given link/page and stores them in a CSV file

This scraper scrapes the email IDs of faculty members from a given link/page and stores them in a CSV file

Devansh Singh 1 Feb 10, 2022
Snowflake database loading utility with Scrapy integration

Snowflake Stage Exporter Snowflake database loading utility with Scrapy integration. Meant for streaming ingestion of JSON serializable objects into S

Oleg T. 0 Dec 06, 2021
Scrap the 42 Intranet's elearning videos in a single click

42intra_scraper Scrap the 42 Intranet's elearning videos in a single click. Why you would want to use it ? Adjust speed at your convenience. (The intr

Noufel 5 Oct 27, 2022
A Python package that scrapes Google News article data while remaining undetected by Google.

A Python package that scrapes Google News article data while remaining undetected by Google. Our scraper can scrape page data up until the last page and never trigger a CAPTCHA (download stats: https

Geminid Systems, Inc 6 Aug 10, 2022
Amazon scraper using scrapy, a python framework for crawling websites.

#Amazon-web-scraper This is a python program, which use scrapy python framework to crawl all pages of the product and scrap products data. This progra

Akash Das 1 Dec 26, 2021
A simplistic scraper made to download tons of random screenshots made by people.

printStealer 1.1 What is this tool? This tool is developed to show the insecurity of the screenshot utility called prnt sc. It is a site that stores s

appelsiensam 4 Jul 26, 2022
Basic-html-scraper - A complete how to of web scraping with Python for beginners

basic-html-scraper Code from YT Video This video includes a complete how to of w

John 12 Oct 22, 2022
Complete pipeline for crawling online newspaper article.

Complete pipeline for crawling online newspaper article. The articles are stored to MongoDB. The whole pipeline is dockerized, thus the user does not need to worry about dependencies. Additionally, d

newspipe 4 May 27, 2022
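A Python script that uses selenium to locate the relevant elements and then drives them with click events to automate the browser.

A Python script that uses selenium to locate the relevant elements and then drives them with click events to automate the browser.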

N0el4kLs 5 Nov 19, 2021
Demonstration on how to use async python to control multiple playwright browsers for web-scraping

Playwright Browser Pool This example illustrates how it's possible to use a pool of browsers to retrieve page urls in a single asynchronous process. i

Bernardas Ališauskas 8 Oct 27, 2022
Scrape data on SpaceX: Capsules, Rockets, Cores, Roadsters, SpaceX Info

SpaceX Software I developed software to scrape data on SpaceX: Capsules, Rockets, Cores, Roadsters, SpaceX Info. To use the software you need Python a

Maxence Rémy 16 Aug 02, 2022
Examine.com supplement research scraper!

ExamineScraper Examine.com supplement research scraper! Why I want to be able to search pages for a specific term. For example, I want to be able to s

Tyler 15 Dec 06, 2022
A modern CSS selector implementation for BeautifulSoup

Soup Sieve Overview Soup Sieve is a CSS selector library designed to be used with Beautiful Soup 4. It aims to provide selecting, matching, and filter

Isaac Muse 151 Dec 23, 2022
A tool can scrape product in aliexpress: Title, Price, and URL Product.

Scrape-Product-Aliexpress A tool can scrape product in aliexpress: Title, Price, and URL Product. Usage: 1. Install Python 3.8 3.9 padahal halaman ins

Rahul Joshua Damanik 1 Dec 30, 2021
a small library for extracting rich content from urls

A small library for extracting rich content from urls. what does it do? micawber supplies a few methods for retrieving rich metadata about a variety o

Charles Leifer 588 Dec 27, 2022
Pyrics is a tool to scrape lyrics, get rhymes, generate relevant lyrics with rhymes.

Pyrics Pyrics is a tool to scrape lyrics, get rhymes, generate relevant lyrics with rhymes. ./test/run.py provides the full function in terminal cmd

MisterDK 1 Feb 12, 2022
A scrapy pipeline that provides an easy way to store files and images using various folder structures.

scrapy-folder-tree This is a scrapy pipeline that provides an easy way to store files and images using various folder structures. Supported folder str

Panagiotis Simakis 7 Oct 23, 2022
Comment Webpage Screenshot is a GitHub Action that captures screenshots of web pages and HTML files located in the repository

Comment Webpage Screenshot is a GitHub Action that helps maintainers visually review HTML file changes introduced on a Pull Request by adding comments with the screenshots of the latest HTML file cha

Maksudul Haque 21 Sep 29, 2022
Scrapes the Sun Life of Canada Philippines web site for historical prices of their investment funds and then saves them as CSV files.

slocpi-scraper Sun Life of Canada Philippines Inc Investment Funds Scraper Install dependencies pip install -r requirements.txt Usage General format:

Daryl Yu 2 Jan 07, 2022