Highly available distributed IP proxy pool, powered by Scrapy and Redis

Overview

A highly available IP proxy pool

README | Chinese documentation (中文文档)

All of the proxy IPs collected by this project come from the public internet. The vision is to provide large crawling projects with a highly available, low-latency pool of high-anonymity proxy IPs.

Highlights

  • Rich sources of proxies
  • Accurate proxy crawling and extraction
  • Strict and sensible proxy validation
  • Comprehensive monitoring and strong robustness
  • Flexible architecture that is easy to extend
  • Every component can be deployed in a distributed fashion

Quick start

Note: please download the code from the release list; the code on the master branch is not guaranteed to run stably.

Standalone deployment

Server

  • Install Python 3 and Redis. If you run into problems, the relevant sections of this article may help.

  • Adjust REDIS_HOST, REDIS_PASSWORD, and related parameters in the project config file config/settings.py to match your actual Redis setup.

  • Install scrapy-splash and set SPLASH_URL in the config file config/settings.py; a sample of these settings follows.
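
    A minimal sketch of the relevant settings, assuming Redis and Splash both run locally; the values below are illustrative, not defaults:

    REDIS_HOST = '127.0.0.1'              # where your Redis instance listens
    REDIS_PASSWORD = '123456'             # the password configured in redis.conf, if any
    SPLASH_URL = 'http://127.0.0.1:8050'  # the local scrapy-splash endpoint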

  • Install the project dependencies

    pip install -r requirements.txt

  • Start the Scrapy workers, i.e. the proxy IP crawler and the proxy IP validator

    python crawler_booter.py --usage crawler

    python crawler_booter.py --usage validator

  • Start the schedulers, which handle the timed scheduling of proxy IP crawling and validation

    python scheduler_booter.py --usage crawler

    python scheduler_booter.py --usage validator

Client

People keep asking how to get the list of usable proxy IPs out of this project. haipproxy does not expose proxies through an API; it provides them through concrete clients. Currently a Python client and a language-agnostic squid secondary proxy are supported.

Python client example

from client.py_cli import ProxyFetcher
args = dict(host='127.0.0.1', port=6379, password='123456', db=0)
# 'zhihu' means: fetch IPs from the proxy validation queue associated with 'zhihu'
# the reason is that the same proxy IP performs differently on different target sites
fetcher = ProxyFetcher('zhihu', strategy='greedy', redis_args=args)
# get one usable proxy
print(fetcher.get_proxy())
# get the list of usable proxies
print(fetcher.get_proxies())  # or print(fetcher.pool)

A more complete example can be found in examples/zhihu
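
A minimal end-to-end sketch of feeding a fetched proxy into an actual request; it assumes requests is installed and that get_proxy() returns a full proxy URL such as http://ip:port.

import requests
from client.py_cli import ProxyFetcher

args = dict(host='127.0.0.1', port=6379, password='123456', db=0)
fetcher = ProxyFetcher('zhihu', strategy='greedy', redis_args=args)

# assumption: get_proxy() returns a proxy URL like 'http://1.2.3.4:8080'
proxy = fetcher.get_proxy()
resp = requests.get('https://www.zhihu.com',
                    proxies={'http': proxy, 'https': proxy},
                    timeout=10)
print(resp.status_code)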

Using squid as a secondary proxy

  • Install squid, adjust and back up its configuration file, and start squid (Ubuntu example):

    sudo apt-get install squid

    sudo sed -i 's/http_access deny all/http_access allow all/g' /etc/squid/squid.conf

    sudo cp /etc/squid/squid.conf /etc/squid/squid.conf.backup

    sudo service squid start

  • Adjust SQUID_BIN_PATH, SQUID_CONF_PATH, SQUID_TEMPLATE_PATH, and related parameters in the project config file config/settings.py according to your operating system

  • Start the program that periodically regenerates the squid configuration

    sudo python squid_update.py

  • Use squid as a proxy middle layer to request the target site. The default proxy URL is 'http://squid_host:3128'. A Python example:

    import requests
    # every request goes through the local squid, which forwards it to one of the
    # validated upstream proxies listed in its configuration
    proxies = {'https': 'http://127.0.0.1:3128'}
    resp = requests.get('https://httpbin.org/ip', proxies=proxies)
    print(resp.text)

Docker deployment

  • Install Docker

  • Install docker-compose

    pip install -U docker-compose

  • Set the SPLASH_URL and REDIS_HOST parameters in settings.py

    # Note: if you are using the code on the master branch, this step can be skipped
    SPLASH_URL = 'http://splash:8050'
    REDIS_HOST = 'redis'

  • Start all application components with docker-compose

    docker-compose up

This approach also deploys squid, so you can consume the proxy pool either through squid or through the client, exactly as in the standalone deployment.

Notes

  • This project depends heavily on Redis: besides message passing and data storage, the IP validation and task scheduling tools also use several Redis data structures. If you want to replace Redis with something else, weigh the effort yourself.
  • Because of the GFW, some proxy source sites can only be reached and crawled through a censorship-circumventing connection. If you cannot access sites outside the wall, either set the enable attribute to 0 for the tasks in rules.py whose task_queue is SPIDER_GFW_TASK or SPIDER_AJAX_GFW_TASK, or specify the spider types common and ajax when starting the crawler

    python crawler_booter.py --usage crawler common ajax

  • The same proxy IP can perform very differently against different target sites. If the generic proxies do not meet your needs, you can write a proxy IP validator for a specific site; see the sketch below.
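
    haipproxy ships its own Scrapy-based validators; purely as an illustration of the idea, and independent of the project's internal validator classes, a standalone check of a proxy against one particular site could look like this (the function name and acceptance criteria are hypothetical):

    import requests

    def works_for_zhihu(proxy, timeout=10):
        """Return True if `proxy` can fetch zhihu.com and the page looks genuine."""
        proxies = {'http': proxy, 'https': proxy}
        try:
            resp = requests.get('https://www.zhihu.com', proxies=proxies, timeout=timeout)
        except requests.RequestException:
            return False
        # site-specific criteria: adjust the status code and content checks to
        # whatever a non-blocked response from the target site looks like
        return resp.status_code == 200 and 'zhihu' in resp.text.lower()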

Workflow

Performance test

haipproxy and the test code were deployed in standalone mode, with Zhihu as the target site; the measured crawling results are as follows

The test code can be found in examples/zhihu

Project monitoring (optional)

Project monitoring relies mainly on Sentry and Prometheus. By instrumenting the business logic at key points, the project is monitored along several dimensions, which improves its robustness.

Sentry is used as the bug tracking tool; with it, the health of the project is easy to follow.

Prometheus plus Grafana are used for business monitoring, to show the current state of the project; a small instrumentation sketch follows.
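
As an illustration of what such instrumentation could look like (this is not the project's actual code; the metric name and port below are made up), a counter exposed to Prometheus via the prometheus_client library:

import time
from prometheus_client import Counter, start_http_server

# hypothetical metric: number of proxies that passed validation
PROXIES_VALIDATED = Counter('haipproxy_proxies_validated_total',
                            'Number of proxy IPs that passed validation')

if __name__ == '__main__':
    start_http_server(9100)       # Prometheus scrapes http://host:9100/metrics
    while True:
        PROXIES_VALIDATED.inc()   # in real code: increment after each successful validation
        time.sleep(60)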

Donate to the author

Open source takes real effort. If this project is useful to you, consider making a small donation to support its continued maintenance.

Similar projects

This project drew on various open-source crawler proxy implementations on GitHub; thanks to their authors for the work. The projects consulted are listed below, in no particular order.

dungproxy

proxyspider

ProxyPool

proxy_pool

ProxyPool

IPProxyTool

IPProxyPool

proxy_list

proxy_pool

ProxyPool

scylla

Comments
  • Docker deployment problem

    When running docker-compose up, the build fails at step 14/15. I have tried all kinds of fixes on several operating systems without success, and have spent hours on this one problem. The error output is below:

    Collecting cffi>=1.7 (from cryptography>=2.1.4->pyOpenSSL->Scrapy==1.5.0->-r requirements.txt (line 1)) Downloading https://pypi.doubanio.com/packages/59/cc/0e1635b4951021ef35f5c92b32c865ae605fac2a19d724fb6ff99d745c81/cffi-1.11.5-cp35-cp35m-manylinux1_x86_64.whl (420kB) Requirement already satisfied (use --upgrade to upgrade): setuptools in /usr/lib/python3/dist-packages (from zope.interface>=4.0.2->Twisted>=13.1.0->Scrapy==1.5.0->-r requirements.txt (line 1)) Collecting pycparser (from cffi>=1.7->cryptography>=2.1.4->pyOpenSSL->Scrapy==1.5.0->-r requirements.txt (line 1)) Downloading https://pypi.doubanio.com/packages/8c/2d/aad7f16146f4197a11f8e91fb81df177adcc2073d36a17b1491fd09df6ed/pycparser-2.18.tar.gz (245kB) Building wheels for collected packages: PyDispatcher, Twisted, cryptography, pycparser Running setup.py bdist_wheel for PyDispatcher: started Running setup.py bdist_wheel for PyDispatcher: finished with status 'done' Stored in directory: /root/.cache/pip/wheels/a4/bb/5b/62151ae4ace1811e779c41f59a3a7b1a2243fa9a5611be4a7d Running setup.py bdist_wheel for Twisted: started Running setup.py bdist_wheel for Twisted: finished with status 'done' Stored in directory: /root/.cache/pip/wheels/de/88/b1/0662e9a08f542b8943f0e13306a6aaf096643b8741c943edd0 Running setup.py bdist_wheel for cryptography: started Running setup.py bdist_wheel for cryptography: finished with status 'error' Complete output from command /usr/bin/python3 -u -c "import setuptools, tokenize;file='/tmp/pip-build-2k1hxp0x/cryptography/setup.py';exec(compile(getattr(tokenize, 'open', open)(file).read().replace('\r\n', '\n'), file, 'exec'))" bdist_wheel -d /tmp/tmp7cz9qgczpip-wheel- --python-tag cp35: c/_cffi_backend.c:15:17: fatal error: ffi.h: No such file or directory compilation terminated. Traceback (most recent call last): File "/usr/lib/python3.5/distutils/unixccompiler.py", line 118, in _compile extra_postargs) File "/usr/lib/python3.5/distutils/ccompiler.py", line 909, in spawn spawn(cmd, dry_run=self.dry_run) File "/usr/lib/python3.5/distutils/spawn.py", line 36, in spawn _spawn_posix(cmd, search_path, dry_run=dry_run) File "/usr/lib/python3.5/distutils/spawn.py", line 159, in _spawn_posix % (cmd, exit_status)) distutils.errors.DistutilsExecError: command 'x86_64-linux-gnu-gcc' failed with exit status 1

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last): File "/usr/lib/python3.5/distutils/core.py", line 148, in setup dist.run_commands() File "/usr/lib/python3.5/distutils/dist.py", line 955, in run_commands self.run_command(cmd) File "/usr/lib/python3.5/distutils/dist.py", line 974, in run_command cmd_obj.run() File "/usr/lib/python3/dist-packages/setuptools/command/bdist_egg.py", line 161, in run cmd = self.call_command('install_lib', warn_dir=0) File "/usr/lib/python3/dist-packages/setuptools/command/bdist_egg.py", line 147, in call_command self.run_command(cmdname) File "/usr/lib/python3.5/distutils/cmd.py", line 313, in run_command self.distribution.run_command(command) File "/usr/lib/python3.5/distutils/dist.py", line 974, in run_command cmd_obj.run() File "/usr/lib/python3/dist-packages/setuptools/command/install_lib.py", line 23, in run self.build() File "/usr/lib/python3.5/distutils/command/install_lib.py", line 109, in build self.run_command('build_ext') File "/usr/lib/python3.5/distutils/cmd.py", line 313, in run_command self.distribution.run_command(command) File "/usr/lib/python3.5/distutils/dist.py", line 974, in run_command cmd_obj.run() File "/usr/lib/python3/dist-packages/setuptools/command/build_ext.py", line 49, in run _build_ext.run(self) File "/usr/lib/python3.5/distutils/command/build_ext.py", line 338, in run self.build_extensions() File "/usr/lib/python3.5/distutils/command/build_ext.py", line 447, in build_extensions self._build_extensions_serial() File "/usr/lib/python3.5/distutils/command/build_ext.py", line 472, in _build_extensions_serial self.build_extension(ext) File "/usr/lib/python3/dist-packages/setuptools/command/build_ext.py", line 174, in build_extension _build_ext.build_extension(self, ext) File "/usr/lib/python3.5/distutils/command/build_ext.py", line 532, in build_extension depends=ext.depends) File "/usr/lib/python3.5/distutils/ccompiler.py", line 574, in compile self._compile(obj, src, ext, cc_args, extra_postargs, pp_opts) File "/usr/lib/python3.5/distutils/unixccompiler.py", line 120, in _compile raise CompileError(msg) distutils.errors.CompileError: command 'x86_64-linux-gnu-gcc' failed with exit status 1

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last): File "/usr/lib/python3/dist-packages/setuptools/sandbox.py", line 154, in save_modules yield saved File "/usr/lib/python3/dist-packages/setuptools/sandbox.py", line 195, in setup_context yield File "/usr/lib/python3/dist-packages/setuptools/sandbox.py", line 243, in run_setup DirectorySandbox(setup_dir).run(runner) File "/usr/lib/python3/dist-packages/setuptools/sandbox.py", line 273, in run return func() File "/usr/lib/python3/dist-packages/setuptools/sandbox.py", line 242, in runner _execfile(setup_script, ns) File "/usr/lib/python3/dist-packages/setuptools/sandbox.py", line 46, in _execfile exec(code, globals, locals) File "/tmp/easy_install-rxlnrgt4/cffi-1.11.5/setup.py", line 240, in def run_tests(self): File "/usr/lib/python3.5/distutils/core.py", line 163, in setup raise SystemExit("error: " + str(msg)) SystemExit: error: command 'x86_64-linux-gnu-gcc' failed with exit status 1

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last): File "/usr/lib/python3/dist-packages/setuptools/command/easy_install.py", line 1087, in run_setup run_setup(setup_script, args) File "/usr/lib/python3/dist-packages/setuptools/sandbox.py", line 246, in run_setup raise File "/usr/lib/python3.5/contextlib.py", line 77, in exit self.gen.throw(type, value, traceback) File "/usr/lib/python3/dist-packages/setuptools/sandbox.py", line 195, in setup_context yield File "/usr/lib/python3.5/contextlib.py", line 77, in exit self.gen.throw(type, value, traceback) File "/usr/lib/python3/dist-packages/setuptools/sandbox.py", line 166, in save_modules saved_exc.resume() File "/usr/lib/python3/dist-packages/setuptools/sandbox.py", line 141, in resume six.reraise(type, exc, self._tb) File "/usr/lib/python3/dist-packages/pkg_resources/_vendor/six.py", line 685, in reraise raise value.with_traceback(tb) File "/usr/lib/python3/dist-packages/setuptools/sandbox.py", line 154, in save_modules yield saved File "/usr/lib/python3/dist-packages/setuptools/sandbox.py", line 195, in setup_context yield File "/usr/lib/python3/dist-packages/setuptools/sandbox.py", line 243, in run_setup DirectorySandbox(setup_dir).run(runner) File "/usr/lib/python3/dist-packages/setuptools/sandbox.py", line 273, in run return func() File "/usr/lib/python3/dist-packages/setuptools/sandbox.py", line 242, in runner _execfile(setup_script, ns) File "/usr/lib/python3/dist-packages/setuptools/sandbox.py", line 46, in _execfile exec(code, globals, locals) File "/tmp/easy_install-rxlnrgt4/cffi-1.11.5/setup.py", line 240, in def run_tests(self): File "/usr/lib/python3.5/distutils/core.py", line 163, in setup raise SystemExit("error: " + str(msg)) SystemExit: error: command 'x86_64-linux-gnu-gcc' failed with exit status 1

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last): File "", line 1, in File "/tmp/pip-build-2k1hxp0x/cryptography/setup.py", line 319, in **keywords_with_side_effects(sys.argv) File "/usr/lib/python3.5/distutils/core.py", line 108, in setup _setup_distribution = dist = klass(attrs) File "/usr/lib/python3/dist-packages/setuptools/dist.py", line 269, in init self.fetch_build_eggs(attrs['setup_requires']) File "/usr/lib/python3/dist-packages/setuptools/dist.py", line 313, in fetch_build_eggs replace_conflicting=True, File "/usr/lib/python3/dist-packages/pkg_resources/init.py", line 826, in resolve dist = best[req.key] = env.best_match(req, ws, installer) File "/usr/lib/python3/dist-packages/pkg_resources/init.py", line 1092, in best_match return self.obtain(req, installer) File "/usr/lib/python3/dist-packages/pkg_resources/init.py", line 1104, in obtain return installer(requirement) File "/usr/lib/python3/dist-packages/setuptools/dist.py", line 380, in fetch_build_egg return cmd.easy_install(req) File "/usr/lib/python3/dist-packages/setuptools/command/easy_install.py", line 663, in easy_install return self.install_item(spec, dist.location, tmpdir, deps) File "/usr/lib/python3/dist-packages/setuptools/command/easy_install.py", line 693, in install_item dists = self.install_eggs(spec, download, tmpdir) File "/usr/lib/python3/dist-packages/setuptools/command/easy_install.py", line 873, in install_eggs return self.build_and_install(setup_script, setup_base) File "/usr/lib/python3/dist-packages/setuptools/command/easy_install.py", line 1101, in build_and_install self.run_setup(setup_script, setup_base, args) File "/usr/lib/python3/dist-packages/setuptools/command/easy_install.py", line 1089, in run_setup raise DistutilsError("Setup script exited with %s" % (v.args[0],)) distutils.errors.DistutilsError: Setup script exited with error: command 'x86_64-linux-gnu-gcc' failed with exit status 1


    Failed building wheel for cryptography Running setup.py clean for cryptography Complete output from command /usr/bin/python3 -u -c "import setuptools, tokenize;file='/tmp/pip-build-2k1hxp0x/cryptography/setup.py';exec(compile(getattr(tokenize, 'open', open)(file).read().replace('\r\n', '\n'), file, 'exec'))" clean --all: c/_cffi_backend.c:15:17: fatal error: ffi.h: No such file or directory compilation terminated. Traceback (most recent call last): File "/usr/lib/python3.5/distutils/unixccompiler.py", line 118, in _compile extra_postargs) File "/usr/lib/python3.5/distutils/ccompiler.py", line 909, in spawn spawn(cmd, dry_run=self.dry_run) File "/usr/lib/python3.5/distutils/spawn.py", line 36, in spawn _spawn_posix(cmd, search_path, dry_run=dry_run) File "/usr/lib/python3.5/distutils/spawn.py", line 159, in _spawn_posix % (cmd, exit_status)) distutils.errors.DistutilsExecError: command 'x86_64-linux-gnu-gcc' failed with exit status 1

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last): File "/usr/lib/python3.5/distutils/core.py", line 148, in setup dist.run_commands() File "/usr/lib/python3.5/distutils/dist.py", line 955, in run_commands self.run_command(cmd) File "/usr/lib/python3.5/distutils/dist.py", line 974, in run_command cmd_obj.run() File "/usr/lib/python3/dist-packages/setuptools/command/bdist_egg.py", line 161, in run cmd = self.call_command('install_lib', warn_dir=0) File "/usr/lib/python3/dist-packages/setuptools/command/bdist_egg.py", line 147, in call_command self.run_command(cmdname) File "/usr/lib/python3.5/distutils/cmd.py", line 313, in run_command self.distribution.run_command(command) File "/usr/lib/python3.5/distutils/dist.py", line 974, in run_command cmd_obj.run() File "/usr/lib/python3/dist-packages/setuptools/command/install_lib.py", line 23, in run self.build() File "/usr/lib/python3.5/distutils/command/install_lib.py", line 109, in build self.run_command('build_ext') File "/usr/lib/python3.5/distutils/cmd.py", line 313, in run_command self.distribution.run_command(command) File "/usr/lib/python3.5/distutils/dist.py", line 974, in run_command cmd_obj.run() File "/usr/lib/python3/dist-packages/setuptools/command/build_ext.py", line 49, in run _build_ext.run(self) File "/usr/lib/python3.5/distutils/command/build_ext.py", line 338, in run self.build_extensions() File "/usr/lib/python3.5/distutils/command/build_ext.py", line 447, in build_extensions self._build_extensions_serial() File "/usr/lib/python3.5/distutils/command/build_ext.py", line 472, in _build_extensions_serial self.build_extension(ext) File "/usr/lib/python3/dist-packages/setuptools/command/build_ext.py", line 174, in build_extension _build_ext.build_extension(self, ext) File "/usr/lib/python3.5/distutils/command/build_ext.py", line 532, in build_extension depends=ext.depends) File "/usr/lib/python3.5/distutils/ccompiler.py", line 574, in compile self._compile(obj, src, ext, cc_args, extra_postargs, pp_opts) File "/usr/lib/python3.5/distutils/unixccompiler.py", line 120, in _compile raise CompileError(msg) distutils.errors.CompileError: command 'x86_64-linux-gnu-gcc' failed with exit status 1

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last): File "/usr/lib/python3/dist-packages/setuptools/sandbox.py", line 154, in save_modules yield saved File "/usr/lib/python3/dist-packages/setuptools/sandbox.py", line 195, in setup_context yield File "/usr/lib/python3/dist-packages/setuptools/sandbox.py", line 243, in run_setup DirectorySandbox(setup_dir).run(runner) File "/usr/lib/python3/dist-packages/setuptools/sandbox.py", line 273, in run return func() File "/usr/lib/python3/dist-packages/setuptools/sandbox.py", line 242, in runner _execfile(setup_script, ns) File "/usr/lib/python3/dist-packages/setuptools/sandbox.py", line 46, in _execfile exec(code, globals, locals) File "/tmp/easy_install-f142_vde/cffi-1.11.5/setup.py", line 240, in def run_tests(self): File "/usr/lib/python3.5/distutils/core.py", line 163, in setup raise SystemExit("error: " + str(msg)) SystemExit: error: command 'x86_64-linux-gnu-gcc' failed with exit status 1

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last): File "/usr/lib/python3/dist-packages/setuptools/command/easy_install.py", line 1087, in run_setup run_setup(setup_script, args) File "/usr/lib/python3/dist-packages/setuptools/sandbox.py", line 246, in run_setup raise File "/usr/lib/python3.5/contextlib.py", line 77, in exit self.gen.throw(type, value, traceback) File "/usr/lib/python3/dist-packages/setuptools/sandbox.py", line 195, in setup_context yield File "/usr/lib/python3.5/contextlib.py", line 77, in exit self.gen.throw(type, value, traceback) File "/usr/lib/python3/dist-packages/setuptools/sandbox.py", line 166, in save_modules saved_exc.resume() File "/usr/lib/python3/dist-packages/setuptools/sandbox.py", line 141, in resume six.reraise(type, exc, self._tb) File "/usr/lib/python3/dist-packages/pkg_resources/_vendor/six.py", line 685, in reraise raise value.with_traceback(tb) File "/usr/lib/python3/dist-packages/setuptools/sandbox.py", line 154, in save_modules yield saved File "/usr/lib/python3/dist-packages/setuptools/sandbox.py", line 195, in setup_context yield File "/usr/lib/python3/dist-packages/setuptools/sandbox.py", line 243, in run_setup DirectorySandbox(setup_dir).run(runner) File "/usr/lib/python3/dist-packages/setuptools/sandbox.py", line 273, in run return func() File "/usr/lib/python3/dist-packages/setuptools/sandbox.py", line 242, in runner _execfile(setup_script, ns) File "/usr/lib/python3/dist-packages/setuptools/sandbox.py", line 46, in _execfile exec(code, globals, locals) File "/tmp/easy_install-f142_vde/cffi-1.11.5/setup.py", line 240, in def run_tests(self): File "/usr/lib/python3.5/distutils/core.py", line 163, in setup raise SystemExit("error: " + str(msg)) SystemExit: error: command 'x86_64-linux-gnu-gcc' failed with exit status 1

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last): File "", line 1, in File "/tmp/pip-build-2k1hxp0x/cryptography/setup.py", line 319, in **keywords_with_side_effects(sys.argv) File "/usr/lib/python3.5/distutils/core.py", line 108, in setup _setup_distribution = dist = klass(attrs) File "/usr/lib/python3/dist-packages/setuptools/dist.py", line 269, in init self.fetch_build_eggs(attrs['setup_requires']) File "/usr/lib/python3/dist-packages/setuptools/dist.py", line 313, in fetch_build_eggs replace_conflicting=True, File "/usr/lib/python3/dist-packages/pkg_resources/init.py", line 826, in resolve dist = best[req.key] = env.best_match(req, ws, installer) File "/usr/lib/python3/dist-packages/pkg_resources/init.py", line 1092, in best_match return self.obtain(req, installer) File "/usr/lib/python3/dist-packages/pkg_resources/init.py", line 1104, in obtain return installer(requirement) File "/usr/lib/python3/dist-packages/setuptools/dist.py", line 380, in fetch_build_egg return cmd.easy_install(req) File "/usr/lib/python3/dist-packages/setuptools/command/easy_install.py", line 663, in easy_install return self.install_item(spec, dist.location, tmpdir, deps) File "/usr/lib/python3/dist-packages/setuptools/command/easy_install.py", line 693, in install_item dists = self.install_eggs(spec, download, tmpdir) File "/usr/lib/python3/dist-packages/setuptools/command/easy_install.py", line 873, in install_eggs return self.build_and_install(setup_script, setup_base) File "/usr/lib/python3/dist-packages/setuptools/command/easy_install.py", line 1101, in build_and_install self.run_setup(setup_script, setup_base, args) File "/usr/lib/python3/dist-packages/setuptools/command/easy_install.py", line 1089, in run_setup raise DistutilsError("Setup script exited with %s" % (v.args[0],)) distutils.errors.DistutilsError: Setup script exited with error: command 'x86_64-linux-gnu-gcc' failed with exit status 1


    Failed cleaning build dir for cryptography Running setup.py bdist_wheel for pycparser: started Running setup.py bdist_wheel for pycparser: finished with status 'done' Stored in directory: /root/.cache/pip/wheels/37/33/b4/350db3d57a6f530e7b9315c3e0fcedd631d06e4f47d269ec26 Successfully built PyDispatcher Twisted pycparser Failed to build cryptography Installing collected packages: six, PyDispatcher, idna, asn1crypto, pycparser, cffi, cryptography, pyOpenSSL, w3lib, zope.interface, constantly, incremental, attrs, Automat, hyperlink, Twisted, cssselect, lxml, parsel, pyasn1, pyasn1-modules, service-identity, queuelib, Scrapy, urllib3, certifi, chardet, requests, redis, schedule, scrapy-splash, click Running setup.py install for cryptography: started Running setup.py install for cryptography: finished with status 'error' Complete output from command /usr/bin/python3 -u -c "import setuptools, tokenize;file='/tmp/pip-build-2k1hxp0x/cryptography/setup.py';exec(compile(getattr(tokenize, 'open', open)(file).read().replace('\r\n', '\n'), file, 'exec'))" install --record /tmp/pip-9a9fk4sl-record/install-record.txt --single-version-externally-managed --compile: /usr/lib/python3.5/distutils/dist.py:261: UserWarning: Unknown distribution option: 'python_requires' warnings.warn(msg) running install running build running build_py creating build creating build/lib.linux-x86_64-3.5 creating build/lib.linux-x86_64-3.5/cryptography copying src/cryptography/init.py -> build/lib.linux-x86_64-3.5/cryptography copying src/cryptography/utils.py -> build/lib.linux-x86_64-3.5/cryptography copying src/cryptography/about.py -> build/lib.linux-x86_64-3.5/cryptography copying src/cryptography/fernet.py -> build/lib.linux-x86_64-3.5/cryptography copying src/cryptography/exceptions.py -> build/lib.linux-x86_64-3.5/cryptography creating build/lib.linux-x86_64-3.5/cryptography/hazmat copying src/cryptography/hazmat/init.py -> build/lib.linux-x86_64-3.5/cryptography/hazmat creating build/lib.linux-x86_64-3.5/cryptography/x509 copying src/cryptography/x509/extensions.py -> build/lib.linux-x86_64-3.5/cryptography/x509 copying src/cryptography/x509/init.py -> build/lib.linux-x86_64-3.5/cryptography/x509 copying src/cryptography/x509/name.py -> build/lib.linux-x86_64-3.5/cryptography/x509 copying src/cryptography/x509/oid.py -> build/lib.linux-x86_64-3.5/cryptography/x509 copying src/cryptography/x509/base.py -> build/lib.linux-x86_64-3.5/cryptography/x509 copying src/cryptography/x509/certificate_transparency.py -> build/lib.linux-x86_64-3.5/cryptography/x509 copying src/cryptography/x509/general_name.py -> build/lib.linux-x86_64-3.5/cryptography/x509 creating build/lib.linux-x86_64-3.5/cryptography/hazmat/bindings copying src/cryptography/hazmat/bindings/init.py -> build/lib.linux-x86_64-3.5/cryptography/hazmat/bindings creating build/lib.linux-x86_64-3.5/cryptography/hazmat/primitives copying src/cryptography/hazmat/primitives/init.py -> build/lib.linux-x86_64-3.5/cryptography/hazmat/primitives copying src/cryptography/hazmat/primitives/constant_time.py -> build/lib.linux-x86_64-3.5/cryptography/hazmat/primitives copying src/cryptography/hazmat/primitives/mac.py -> build/lib.linux-x86_64-3.5/cryptography/hazmat/primitives copying src/cryptography/hazmat/primitives/serialization.py -> build/lib.linux-x86_64-3.5/cryptography/hazmat/primitives copying src/cryptography/hazmat/primitives/hmac.py -> build/lib.linux-x86_64-3.5/cryptography/hazmat/primitives copying src/cryptography/hazmat/primitives/cmac.py -> 
build/lib.linux-x86_64-3.5/cryptography/hazmat/primitives copying src/cryptography/hazmat/primitives/padding.py -> build/lib.linux-x86_64-3.5/cryptography/hazmat/primitives copying src/cryptography/hazmat/primitives/keywrap.py -> build/lib.linux-x86_64-3.5/cryptography/hazmat/primitives copying src/cryptography/hazmat/primitives/hashes.py -> build/lib.linux-x86_64-3.5/cryptography/hazmat/primitives creating build/lib.linux-x86_64-3.5/cryptography/hazmat/backends copying src/cryptography/hazmat/backends/init.py -> build/lib.linux-x86_64-3.5/cryptography/hazmat/backends copying src/cryptography/hazmat/backends/interfaces.py -> build/lib.linux-x86_64-3.5/cryptography/hazmat/backends creating build/lib.linux-x86_64-3.5/cryptography/hazmat/bindings/openssl copying src/cryptography/hazmat/bindings/openssl/init.py -> build/lib.linux-x86_64-3.5/cryptography/hazmat/bindings/openssl copying src/cryptography/hazmat/bindings/openssl/_conditional.py -> build/lib.linux-x86_64-3.5/cryptography/hazmat/bindings/openssl copying src/cryptography/hazmat/bindings/openssl/binding.py -> build/lib.linux-x86_64-3.5/cryptography/hazmat/bindings/openssl creating build/lib.linux-x86_64-3.5/cryptography/hazmat/primitives/twofactor copying src/cryptography/hazmat/primitives/twofactor/init.py -> build/lib.linux-x86_64-3.5/cryptography/hazmat/primitives/twofactor copying src/cryptography/hazmat/primitives/twofactor/utils.py -> build/lib.linux-x86_64-3.5/cryptography/hazmat/primitives/twofactor copying src/cryptography/hazmat/primitives/twofactor/totp.py -> build/lib.linux-x86_64-3.5/cryptography/hazmat/primitives/twofactor copying src/cryptography/hazmat/primitives/twofactor/hotp.py -> build/lib.linux-x86_64-3.5/cryptography/hazmat/primitives/twofactor creating build/lib.linux-x86_64-3.5/cryptography/hazmat/primitives/asymmetric copying src/cryptography/hazmat/primitives/asymmetric/init.py -> build/lib.linux-x86_64-3.5/cryptography/hazmat/primitives/asymmetric copying src/cryptography/hazmat/primitives/asymmetric/x25519.py -> build/lib.linux-x86_64-3.5/cryptography/hazmat/primitives/asymmetric copying src/cryptography/hazmat/primitives/asymmetric/dsa.py -> build/lib.linux-x86_64-3.5/cryptography/hazmat/primitives/asymmetric copying src/cryptography/hazmat/primitives/asymmetric/utils.py -> build/lib.linux-x86_64-3.5/cryptography/hazmat/primitives/asymmetric copying src/cryptography/hazmat/primitives/asymmetric/dh.py -> build/lib.linux-x86_64-3.5/cryptography/hazmat/primitives/asymmetric copying src/cryptography/hazmat/primitives/asymmetric/rsa.py -> build/lib.linux-x86_64-3.5/cryptography/hazmat/primitives/asymmetric copying src/cryptography/hazmat/primitives/asymmetric/padding.py -> build/lib.linux-x86_64-3.5/cryptography/hazmat/primitives/asymmetric copying src/cryptography/hazmat/primitives/asymmetric/ec.py -> build/lib.linux-x86_64-3.5/cryptography/hazmat/primitives/asymmetric creating build/lib.linux-x86_64-3.5/cryptography/hazmat/primitives/kdf copying src/cryptography/hazmat/primitives/kdf/hkdf.py -> build/lib.linux-x86_64-3.5/cryptography/hazmat/primitives/kdf copying src/cryptography/hazmat/primitives/kdf/init.py -> build/lib.linux-x86_64-3.5/cryptography/hazmat/primitives/kdf copying src/cryptography/hazmat/primitives/kdf/x963kdf.py -> build/lib.linux-x86_64-3.5/cryptography/hazmat/primitives/kdf copying src/cryptography/hazmat/primitives/kdf/scrypt.py -> build/lib.linux-x86_64-3.5/cryptography/hazmat/primitives/kdf copying src/cryptography/hazmat/primitives/kdf/pbkdf2.py -> 
build/lib.linux-x86_64-3.5/cryptography/hazmat/primitives/kdf copying src/cryptography/hazmat/primitives/kdf/concatkdf.py -> build/lib.linux-x86_64-3.5/cryptography/hazmat/primitives/kdf copying src/cryptography/hazmat/primitives/kdf/kbkdf.py -> build/lib.linux-x86_64-3.5/cryptography/hazmat/primitives/kdf creating build/lib.linux-x86_64-3.5/cryptography/hazmat/primitives/ciphers copying src/cryptography/hazmat/primitives/ciphers/init.py -> build/lib.linux-x86_64-3.5/cryptography/hazmat/primitives/ciphers copying src/cryptography/hazmat/primitives/ciphers/modes.py -> build/lib.linux-x86_64-3.5/cryptography/hazmat/primitives/ciphers copying src/cryptography/hazmat/primitives/ciphers/aead.py -> build/lib.linux-x86_64-3.5/cryptography/hazmat/primitives/ciphers copying src/cryptography/hazmat/primitives/ciphers/algorithms.py -> build/lib.linux-x86_64-3.5/cryptography/hazmat/primitives/ciphers copying src/cryptography/hazmat/primitives/ciphers/base.py -> build/lib.linux-x86_64-3.5/cryptography/hazmat/primitives/ciphers creating build/lib.linux-x86_64-3.5/cryptography/hazmat/backends/openssl copying src/cryptography/hazmat/backends/openssl/decode_asn1.py -> build/lib.linux-x86_64-3.5/cryptography/hazmat/backends/openssl copying src/cryptography/hazmat/backends/openssl/init.py -> build/lib.linux-x86_64-3.5/cryptography/hazmat/backends/openssl copying src/cryptography/hazmat/backends/openssl/x25519.py -> build/lib.linux-x86_64-3.5/cryptography/hazmat/backends/openssl copying src/cryptography/hazmat/backends/openssl/dsa.py -> build/lib.linux-x86_64-3.5/cryptography/hazmat/backends/openssl copying src/cryptography/hazmat/backends/openssl/aead.py -> build/lib.linux-x86_64-3.5/cryptography/hazmat/backends/openssl copying src/cryptography/hazmat/backends/openssl/utils.py -> build/lib.linux-x86_64-3.5/cryptography/hazmat/backends/openssl copying src/cryptography/hazmat/backends/openssl/x509.py -> build/lib.linux-x86_64-3.5/cryptography/hazmat/backends/openssl copying src/cryptography/hazmat/backends/openssl/ciphers.py -> build/lib.linux-x86_64-3.5/cryptography/hazmat/backends/openssl copying src/cryptography/hazmat/backends/openssl/hmac.py -> build/lib.linux-x86_64-3.5/cryptography/hazmat/backends/openssl copying src/cryptography/hazmat/backends/openssl/dh.py -> build/lib.linux-x86_64-3.5/cryptography/hazmat/backends/openssl copying src/cryptography/hazmat/backends/openssl/rsa.py -> build/lib.linux-x86_64-3.5/cryptography/hazmat/backends/openssl copying src/cryptography/hazmat/backends/openssl/cmac.py -> build/lib.linux-x86_64-3.5/cryptography/hazmat/backends/openssl copying src/cryptography/hazmat/backends/openssl/ec.py -> build/lib.linux-x86_64-3.5/cryptography/hazmat/backends/openssl copying src/cryptography/hazmat/backends/openssl/encode_asn1.py -> build/lib.linux-x86_64-3.5/cryptography/hazmat/backends/openssl copying src/cryptography/hazmat/backends/openssl/backend.py -> build/lib.linux-x86_64-3.5/cryptography/hazmat/backends/openssl copying src/cryptography/hazmat/backends/openssl/hashes.py -> build/lib.linux-x86_64-3.5/cryptography/hazmat/backends/openssl running egg_info writing src/cryptography.egg-info/PKG-INFO writing top-level names to src/cryptography.egg-info/top_level.txt writing dependency_links to src/cryptography.egg-info/dependency_links.txt writing requirements to src/cryptography.egg-info/requires.txt warning: manifest_maker: standard file '-c' not found

    reading manifest file 'src/cryptography.egg-info/SOURCES.txt'
    reading manifest template 'MANIFEST.in'
    no previously-included directories found matching 'docs/_build'
    warning: no previously-included files matching '*' found under directory 'vectors'
    writing manifest file 'src/cryptography.egg-info/SOURCES.txt'
    running build_ext
    generating cffi module 'build/temp.linux-x86_64-3.5/_padding.c'
    creating build/temp.linux-x86_64-3.5
    generating cffi module 'build/temp.linux-x86_64-3.5/_constant_time.c'
    generating cffi module 'build/temp.linux-x86_64-3.5/_openssl.c'
    building '_openssl' extension
    creating build/temp.linux-x86_64-3.5/build
    creating build/temp.linux-x86_64-3.5/build/temp.linux-x86_64-3.5
    x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/include/python3.5m -c build/temp.linux-x86_64-3.5/_openssl.c -o build/temp.linux-x86_64-3.5/build/temp.linux-x86_64-3.5/_openssl.o -Wconversion -Wno-error=sign-conversion
    build/temp.linux-x86_64-3.5/_openssl.c:493:30: fatal error: openssl/opensslv.h: No such file or directory
    compilation terminated.
    error: command 'x86_64-linux-gnu-gcc' failed with exit status 1
    
    ----------------------------------------
    

    Command "/usr/bin/python3 -u -c "import setuptools, tokenize;file='/tmp/pip-build-2k1hxp0x/cryptography/setup.py';exec(compile(getattr(tokenize, 'open', open)(file).read().replace('\r\n', '\n'), file, 'exec'))" install --record /tmp/pip-9a9fk4sl-record/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /tmp/pip-build-2k1hxp0x/cryptography/ You are using pip version 8.1.1, however version 9.0.3 is available. You should consider upgrading via the 'pip install --upgrade pip' command. ERROR: Service 'haipproxy' failed to build: The command '/bin/sh -c pip install -i https://pypi.douban.com/simple/ -r requirements.txt' returned a non-zero code: 1

    opened by fei-ju 8
  • Crawler starts and finishes in 1 second

    Hey, first of all thank you for your great work. I have a problem with this repo.

    My machine is on Aliyun. Splash is installed via Docker and runs at localhost:8058; squid is installed via yum and runs at *:3128. I installed all your requirements with pip.

    my python version is 3.6.4

    python crawler_booter.py --usage crawler
    

    and the log in the logs directory looks like this:

    2018-05-22 16:24:19 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min) 2018-05-22 16:24:19 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min) 2018-05-22 16:24:19 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min) 2018-05-22 16:24:19 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min) 2018-05-22 16:24:19 [scrapy.statscollectors] INFO: Dumping Scrapy stats: {'finish_reason': 'finished', 'finish_time': datetime.datetime(2018, 5, 22, 8, 24, 19, 267658), 'log_count/INFO': 4, 'memusage/max': 46108672, 'memusage/startup': 46108672, 'start_time': datetime.datetime(2018, 5, 22, 8, 24, 19, 242833)} 2018-05-22 16:24:19 [scrapy.statscollectors] INFO: Dumping Scrapy stats: {'finish_reason': 'finished', 'finish_time': datetime.datetime(2018, 5, 22, 8, 24, 19, 269282), 'log_count/INFO': 4, 'memusage/max': 46108672, 'memusage/startup': 46108672, 'start_time': datetime.datetime(2018, 5, 22, 8, 24, 19, 250614)} 2018-05-22 16:24:19 [scrapy.statscollectors] INFO: Dumping Scrapy stats: {'finish_reason': 'finished', 'finish_time': datetime.datetime(2018, 5, 22, 8, 24, 19, 270711), 'log_count/INFO': 4, 'memusage/max': 46309376, 'memusage/startup': 46309376, 'start_time': datetime.datetime(2018, 5, 22, 8, 24, 19, 258190)} 2018-05-22 16:24:19 [scrapy.statscollectors] INFO: Dumping Scrapy stats: {'finish_reason': 'finished', 'finish_time': datetime.datetime(2018, 5, 22, 8, 24, 19, 272169), 'log_count/INFO': 4, 'memusage/max': 46559232, 'memusage/startup': 46559232, 'start_time': datetime.datetime(2018, 5, 22, 8, 24, 19, 265645)}

    Can you tell me how I can get this to run? Thanks.

    opened by sherry0429 6
  • Scheduler fails to start

    2018-04-13 11:20:22 [validator] INFO: crawler scheduler is starting...
    multiprocessing.pool.RemoteTraceback:
    """
    Traceback (most recent call last):
      File "/usr/local/lib/python3.6/multiprocessing/pool.py", line 119, in worker
        result = (True, func(*args, **kwds))
      File "/usr/local/lib/python3.6/multiprocessing/pool.py", line 44, in mapstar
        return list(map(*args))
      File "/custom-disk1/haipproxy-0.1/scheduler/scheduler.py", line 112, in schedule_task_with_lock
        if not r or (now - int(r.decode('utf-8'))) >= internal * 60:
    TypeError: unsupported operand type(s) for *: 'NoneType' and 'int'
    """
    
    The above exception was the direct cause of the following exception:
    
    Traceback (most recent call last):
      File "scheduler_booter.py", line 22, in <module>
        scheduler_start()
      File "/usr/local/lib/python3.6/site-packages/click/core.py", line 722, in __call__
        return self.main(*args, **kwargs)
      File "/usr/local/lib/python3.6/site-packages/click/core.py", line 697, in main
        rv = self.invoke(ctx)
      File "/usr/local/lib/python3.6/site-packages/click/core.py", line 895, in invoke
        return ctx.invoke(self.callback, **ctx.params)
      File "/usr/local/lib/python3.6/site-packages/click/core.py", line 535, in invoke
        return callback(*args, **kwargs)
      File "/custom-disk1/haipproxy-0.1/scheduler/scheduler.py", line 186, in scheduler_start
        scheduler.schedule_all_right_now()
      File "/custom-disk1/haipproxy-0.1/scheduler/scheduler.py", line 73, in schedule_all_right_now
        pool.map(self.schedule_task_with_lock, self.tasks)
      File "/usr/local/lib/python3.6/multiprocessing/pool.py", line 266, in map
        return self._map_async(func, iterable, mapstar, chunksize).get()
      File "/usr/local/lib/python3.6/multiprocessing/pool.py", line 644, in get
        raise self._value
    TypeError: unsupported operand type(s) for *: 'NoneType' and 'int'
    
    
    opened by ColorfulGhost 6
  • docker cannot perform

    Environment: Ubuntu 16.04, 64-bit

    When I run the docker-compose up command, it gets stuck as shown below. How do I solve this problem?

    By the way, I have already set REDIS_HOST='redis' and SPLASH_URL='http://splash:8050'.

    Has anyone managed to run this Docker container successfully?

    opened by xyrobo 6
  • An error when deploying with Docker

    I ran into two errors when deploying with Docker:

    #1 c/_cffi_backend.c:15:17: fatal error: ffi.h: No such file or directory
    
    #2 build/temp.linux-x86_64-3.5/_openssl.c:493:30: fatal error: openssl/opensslv.h: No such file or directory
    

    Environment: CentOS 7.4, but I have already installed the relevant -devel packages.

    opened by metroluffy 5
  • docker-compose up fails on startup

    When executing docker-compose up, startup fails with an error:

    (screenshot)

    Here is the log from the run: squid.log

    Because I am using a cloud server, the specs are fairly low, as shown below:

    (screenshot)

    I looked at the errors in the log; they seem to be caused by a failure to connect to Redis, but the error message says memory could not be obtained, so it may be that the machine simply has too little RAM. Please take a look at what the problem is.

    opened by JMwill 5
  • A strange error message

    haipproxy_1 | 2018/03/10 09:53:43| WARNING: HTTP: Invalid Response: Bad header encountered from http://userapi.plu.cn/user/collect?roomId=2115304 AKA userapi.plu.cn/user/collect?roomId=2115304

    上下文: haipproxy_1 | 2018/03/10 09:53:18| TCP connection to 160.16.223.156/8080 failed haipproxy_1 | 2018/03/10 09:53:20| TCP connection to 160.16.223.156/8080 failed haipproxy_1 | 2018/03/10 09:53:20| Detected DEAD Parent: proxy-34 haipproxy_1 | 2018/03/10 09:53:20| Detected REVIVED Parent: proxy-34 haipproxy_1 | 2018/03/10 09:53:24| WARNING: HTTP: Invalid Response: Bad header encountered from http://userapi.longzhu.com/user/collect?roomId=2220760 AKA userapi.longzhu.com/user/collect?roomId=2220760 haipproxy_1 | 2018/03/10 09:53:28| local=172.18.0.4:3128 remote=120.76.222.200:63864 FD 223 flags=1: read/write failure: (32) Broken pipe haipproxy_1 | 2018/03/10 09:53:30| TCP connection to 160.16.214.186/8080 failed haipproxy_1 | 2018/03/10 09:53:35| TCP connection to 160.16.223.146/8080 failed haipproxy_1 | 2018/03/10 09:53:35| TCP connection to 160.16.214.186/8080 failed haipproxy_1 | 2018/03/10 09:53:35| TCP connection to 95.215.25.233/8080 failed haipproxy_1 | 2018/03/10 09:53:35| Detected DEAD Parent: proxy-31 haipproxy_1 | 2018/03/10 09:53:35| Detected REVIVED Parent: proxy-31 haipproxy_1 | 2018/03/10 09:53:42| TCP connection to 95.215.25.233/8080 failed haipproxy_1 | 2018/03/10 09:53:42| TCP connection to 160.16.213.241/8080 failed haipproxy_1 | 2018/03/10 09:53:43| WARNING: HTTP: Invalid Response: Bad header encountered from http://userapi.plu.cn/user/collect?roomId=2115304 AKA userapi.plu.cn/user/collect?roomId=2115304

    I never wrote a crawler for Longzhu, so why is the Longzhu live-streaming API showing up here? Could the server have been compromised?

    opened by BadReese 5
  • ARM support?

    Trying to deploy with Docker on a Raspberry Pi 3B+:

    [email protected]:~/haipproxy $ sudo docker-compose up
    Starting haipproxy_splash_1 ... done
    Starting haipproxy_redis_1  ... done
    Starting haipproxy_haipproxy_1 ... done
    Attaching to haipproxy_redis_1, haipproxy_splash_1, haipproxy_haipproxy_1
    redis_1      | 1:C 08 Apr 05:27:23.229 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
    redis_1      | 1:C 08 Apr 05:27:23.229 # Redis version=4.0.9, bits=32, commit=00000000, modified=0, pid=1, just started
    redis_1      | 1:C 08 Apr 05:27:23.229 # Configuration loaded
    redis_1      | 1:M 08 Apr 05:27:23.237 # Warning: 32 bit instance detected but no memory limit set. Setting 3 GB maxmemory limit with 'noeviction' policy now.
    redis_1      | 1:M 08 Apr 05:27:23.238 * Running mode=standalone, port=6379.
    redis_1      | 1:M 08 Apr 05:27:23.239 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
    redis_1      | 1:M 08 Apr 05:27:23.239 # Server initialized
    redis_1      | 1:M 08 Apr 05:27:23.239 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
    redis_1      | 1:M 08 Apr 05:27:23.239 * Ready to accept connections
    splash_1     | standard_init_linux.go:190: exec user process caused "exec format error"
    haipproxy_1  | standard_init_linux.go:190: exec user process caused "exec format error"
    haipproxy_splash_1 exited with code 1
    haipproxy_haipproxy_1 exited with code 1
    

    I suspect this is an architecture issue?

    opened by zingdle 4
  • Everything installed successfully, but not a single IP is collected; has anyone else run into this?

    (screenshot)

    2018-09-05 10:29:07 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min) Read 0 requests from haipproxy:spider:common 2018-09-05 10:29:07 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min) Read 0 requests from haipproxy:spider:ajax Read 0 requests from haipproxy:spider:common Read 0 requests from haipproxy:spider:ajax Read 0 requests from haipproxy:spider:common Read 0 requests from haipproxy:spider:ajax Read 0 requests from haipproxy:spider:common Read 0 requests from haipproxy:spider:ajax

    opened by amassplus 3
  • Error at runtime: Unhandled error in Deferred:

    The environment is set up. Following the docs, I ran python crawler_booter.py --usage crawler and got the following errors:

    root@~#python crawler_booter.py --usage crawler Unhandled error in Deferred:

    Unhandled error in Deferred:

    Unhandled error in Deferred:

    Unhandled error in Deferred:

    Unhandled error in Deferred:


    root@~#scrapy version -v

    Scrapy       : 1.5.0
    lxml         : 4.2.4.0
    libxml2      : 2.9.8
    cssselect    : 1.0.3
    parsel       : 1.5.0
    w3lib        : 1.19.0
    Twisted      : 17.1.0
    Python       : 3.5.6 (default, Aug 8 2018, 18:36:31) - [GCC 4.4.7 20120313 (Red Hat 4.4.7-18)]
    pyOpenSSL    : 18.0.0 (OpenSSL 1.1.0h 27 Mar 2018)
    cryptography : 2.3
    Platform     : Linux-2.6.32-504.el6.x86_64-x86_64-with-centos-6.6-Final


    root@~#docker run -p 8050:8050 scrapinghub/splash

    2018-08-09 03:18:26+0000 [-] Log opened. 2018-08-09 03:18:26.263532 [-] Splash version: 3.2 2018-08-09 03:18:26.264100 [-] Qt 5.9.1, PyQt 5.9, WebKit 602.1, sip 4.19.3, Twisted 16.1.1, Lua 5.2 2018-08-09 03:18:26.264174 [-] Python 3.5.2 (default, Nov 23 2017, 16:37:01) [GCC 5.4.0 20160609] 2018-08-09 03:18:26.264231 [-] Open files limit: 10240 2018-08-09 03:18:26.264281 [-] Can't bump open files limit 2018-08-09 03:18:26.367108 [-] Xvfb is started: ['Xvfb', ':2053627847', '-screen', '0', '1024x768x24', '-nolisten', 'tcp'] QStandardPaths: XDG_RUNTIME_DIR not set, defaulting to '/tmp/runtime-root' 2018-08-09 03:18:26.442532 [-] proxy profiles support is enabled, proxy profiles path: /etc/splash/proxy-profiles 2018-08-09 03:18:26.521230 [-] verbosity=1 2018-08-09 03:18:26.521334 [-] slots=50 2018-08-09 03:18:26.521397 [-] argument_cache_max_entries=500 2018-08-09 03:18:26.521640 [-] Web UI: enabled, Lua: enabled (sandbox: enabled) 2018-08-09 03:18:26.521711 [-] Server listening on 0.0.0.0:8050 2018-08-09 03:18:26.522198 [-] Site starting on 8050 2018-08-09 03:18:26.522284 [-] Starting factory <twisted.web.server.Site object at 0x7fb37add67b8>


    I have googled many possible solutions but none of them worked. Does anyone know how to fix this? Thanks!

    opened by musicking 3
  • AttributeError: 'NoneType' object has no attribute 'decode'

    (screenshot)

    validator scheduler is starting... multiprocessing.pool.RemoteTraceback: """ Traceback (most recent call last): File "/usr/lib/python3.6/multiprocessing/pool.py", line 119, in worker result = (True, func(*args, **kwds)) File "/usr/lib/python3.6/multiprocessing/pool.py", line 44, in mapstar return list(map(*args)) File "/haipproxy/haipproxy/scheduler/scheduler.py", line 161, in schedule_task_with_lock release_lock(conn, task_name, lock_indentifier) File "/haipproxy/haipproxy/utils/redis_util.py", line 41, in release_lock identifier_origin = pipe.get(lock_name).decode() AttributeError: 'NoneType' object has no attribute 'decode' """

    The above exception was the direct cause of the following exception:

    Traceback (most recent call last): File "scheduler_booter.py", line 22, in scheduler_start() File "/usr/lib/python3.6/site-packages/click/core.py", line 722, in call return self.main(*args, **kwargs) File "/usr/lib/python3.6/site-packages/click/core.py", line 697, in main rv = self.invoke(ctx) File "/usr/lib/python3.6/site-packages/click/core.py", line 895, in invoke return ctx.invoke(self.callback, **ctx.params) File "/usr/lib/python3.6/site-packages/click/core.py", line 535, in invoke return callback(*args, **kwargs) File "/haipproxy/haipproxy/scheduler/scheduler.py", line 197, in scheduler_start scheduler.schedule_all_right_now() File "/haipproxy/haipproxy/scheduler/scheduler.py", line 73, in schedule_all_right_now pool.map(self.schedule_task_with_lock, self.tasks) File "/usr/lib/python3.6/multiprocessing/pool.py", line 266, in map return self._map_async(func, iterable, mapstar, chunksize).get() File "/usr/lib/python3.6/multiprocessing/pool.py", line 644, in get raise self._value AttributeError: 'NoneType' object has no attribute 'decode'

    opened by cxiaolng 3
  • fix(sec): upgrade scrapy-splash to 0.8.0

    What happened?

    There is 1 security vulnerability found in scrapy-splash 0.7.2

    What did I do?

    Upgrade scrapy-splash from 0.7.2 to 0.8.0 for vulnerability fix

    What did you expect to happen?

    Ideally, no insecure libs should be used.

    The specification of the pull request

    PR Specification from OSCS

    opened by chncaption 0
  • docker: running docker-compose reports an error

    listing workers for Build: failed to list workers: Unavailable: connection error: desc = "transport: Error while dialing unable to upgrade to h2c, received 404"

    Do I need to downgrade docker-compose? If so, which version should I downgrade to?

    opened by colappp 0
  • Bump twisted from 17.9.0 to 22.10.0

    Bumps twisted from 17.9.0 to 22.10.0.

    Release notes

    Sourced from twisted's releases.

    Twisted 22.10.0 (2022-10-30)

    This release contains a security fix for CVE-2022-39348. This is a low-severity security bug.

    Twisted 22.10.0rc1 release candidate was released on 2022-10-26 and there are no changes between the release candidate and the final release.

    Features

    • The systemd: endpoint parser now supports "named" file descriptors. This is a more reliable mechanism for choosing among several inherited descriptors. (#8147)

    Improved Documentation

    • The systemd endpoint parser's index parameter is now documented as leading to non-deterministic results in which descriptor is selected. The new name parameter is now documented as preferred. (#8146)
    • The implementers of Zope interfaces are once more displayed in the documentations. (#11690)

    Deprecations and Removals

    • twisted.protocols.dict, which was deprecated in 17.9, has been removed. (#11725)

    Misc

    Conch

    Bugfixes

    
    - twisted.conch.manhole.ManholeInterpreter now captures tracebacks even if sys.excepthook has been modified. ([#11638](https://github.com/twisted/twisted/issues/11638))
    

    Web

    Features

    ... (truncated)

    Changelog

    Sourced from twisted's changelog.

    Twisted 22.10.0 (2022-10-30)

    This release contains a security fix for CVE-2022-39348. This is a low-severity security bug.

    Twisted 22.10.0rc1 release candidate was released on 2022-10-26 and there are no changes between the release candidate and the final release.

    Features

    • The systemd: endpoint parser now supports "named" file descriptors. This is a more reliable mechanism for choosing among several inherited descriptors. (#8147)

    Improved Documentation

    • The systemd endpoint parser's index parameter is now documented as leading to non-deterministic results in which descriptor is selected. The new name parameter is now documented as preferred. (#8146)
    • The implementers of Zope interfaces are once more displayed in the documentations. (#11690)

    Deprecations and Removals

    • twisted.protocols.dict, which was deprecated in 17.9, has been removed. (#11725)

    Misc

    Conch

    Bugfixes

    
    - twisted.conch.manhole.ManholeInterpreter now captures tracebacks even if sys.excepthook has been modified. ([#11638](https://github.com/twisted/twisted/issues/11638))
    

    Web

    Features

    ... (truncated)

    Commits
    • 39ee213 Update news for final version.
    • 7e76513 python -m incremental.update Twisted --newversion 22.10.0
    • 3f1f502 Apply suggestions from twm.
    • 3185b01 Add info about CVE at the start of the release notes.
    • 15aa477 tox -e towncrier
    • 0a29d34 python -m incremental.update Twisted --rc
    • f2f5e81 Merge pull request from GHSA-vg46-2rrj-3647
    • b0545bc Merge branch 'trunk' into advisory-fix-1
    • 50761f4 #11715: Use NEXT in deprecation examples (#11720)
    • 927a5dc Add newsfragment
    • Additional commits viewable in compare view

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
    • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
    • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
    • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

    You can disable automated security fix PRs for this repo from the Security Alerts page.

    dependencies 
    opened by dependabot[bot] 0
  • Bump scrapy from 1.5.0 to 2.6.2

    Bumps scrapy from 1.5.0 to 2.6.2.

    Release notes

    Sourced from scrapy's releases.

    2.6.2

    Fixes a security issue around HTTP proxy usage, and addresses a few regressions introduced in Scrapy 2.6.0.

    See the changelog.

    2.6.1

    Fixes a regression introduced in 2.6.0 that would unset the request method when following redirects.

    2.6.0

    • Security fixes for cookie handling (see details below)
    • Python 3.10 support
    • asyncio support is no longer considered experimental, and works out-of-the-box on Windows regardless of your Python version
    • Feed exports now support pathlib.Path output paths and per-feed item filtering and post-processing

    See the full changelog

    Security bug fixes

    • When a Request object with cookies defined gets a redirect response causing a new Request object to be scheduled, the cookies defined in the original Request object are no longer copied into the new Request object.

      If you manually set the Cookie header on a Request object and the domain name of the redirect URL is not an exact match for the domain of the URL of the original Request object, your Cookie header is now dropped from the new Request object.

      The old behavior could be exploited by an attacker to gain access to your cookies. Please, see the cjvr-mfj7-j4j8 security advisory for more information.

      Note: It is still possible to enable the sharing of cookies between different domains with a shared domain suffix (e.g. example.com and any subdomain) by defining the shared domain suffix (e.g. example.com) as the cookie domain when defining your cookies. See the documentation of the Request class for more information.

    • When the domain of a cookie, either received in the Set-Cookie header of a response or defined in a Request object, is set to a public suffix <https://publicsuffix.org/>_, the cookie is now ignored unless the cookie domain is the same as the request domain.

      The old behavior could be exploited by an attacker to inject cookies from a controlled domain into your cookiejar that could be sent to other domains not controlled by the attacker. Please, see the mfjm-vh54-3f96 security advisory for more information.

    2.5.1

    Security bug fix:

    If you use HttpAuthMiddleware (i.e. the http_user and http_pass spider attributes) for HTTP authentication, any request exposes your credentials to the request target.

    To prevent unintended exposure of authentication credentials to unintended domains, you must now additionally set a new, additional spider attribute, http_auth_domain, and point it to the specific domain to which the authentication credentials must be sent.

    If the http_auth_domain spider attribute is not set, the domain of the first request will be considered the HTTP authentication target, and authentication credentials will only be sent in requests targeting that domain.

    If you need to send the same HTTP authentication credentials to multiple domains, you can use w3lib.http.basic_auth_header instead to set the value of the Authorization header of your requests.

    If you really want your spider to send the same HTTP authentication credentials to any domain, set the http_auth_domain spider attribute to None.

    Finally, if you are a user of scrapy-splash, know that this version of Scrapy breaks compatibility with scrapy-splash 0.7.2 and earlier. You will need to upgrade scrapy-splash to a greater version for it to continue to work.

    2.5.0

    • Official Python 3.9 support
    • Experimental HTTP/2 support
    • New get_retry_request() function to retry requests from spider callbacks

    ... (truncated)

    Changelog

    Sourced from scrapy's changelog.

    Scrapy 2.6.2 (2022-07-25)

    Security bug fix:

    • When HttpProxyMiddleware (scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware) processes a request with proxy metadata, and that proxy metadata includes proxy credentials, HttpProxyMiddleware sets the Proxy-Authorization header, but only if that header is not already set.

      There are third-party proxy-rotation downloader middlewares that set different proxy metadata every time they process a request.

      Because of request retries and redirects, the same request can be processed by downloader middlewares more than once, including both HttpProxyMiddleware and any third-party proxy-rotation downloader middleware.

      These third-party proxy-rotation downloader middlewares could change the proxy metadata of a request to a new value, but fail to remove the Proxy-Authorization header associated with the previous value of the proxy metadata, causing the credentials of one proxy to be sent to a different proxy.

      To prevent the unintended leaking of proxy credentials, the behavior of HttpProxyMiddleware is now as follows when processing a request:

      • If the request being processed defines proxy metadata that includes credentials, the Proxy-Authorization header is always updated to feature those credentials.

      • If the request being processed defines proxy metadata without credentials, the Proxy-Authorization header is removed unless it was originally defined for the same proxy URL.

        To remove proxy credentials while keeping the same proxy URL, remove the Proxy-Authorization header.

      • If the request has no proxy metadata, or that metadata is a falsy value (e.g. None), the Proxy-Authorization header is removed.

        It is no longer possible to set a proxy URL through the proxy metadata but set the credentials through the Proxy-Authorization header. Set proxy credentials through the proxy metadata instead.

    ... (truncated)
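
    A minimal sketch of how a proxy-rotation downloader middleware should behave under the rules described above (the middleware and proxy URLs are hypothetical, not part of Scrapy or this project): set credentials through the proxy metadata, and drop any stale Proxy-Authorization header when switching proxies:

    ```python
    import random

    class RotateProxyMiddleware:
        """Hypothetical downloader middleware that rotates proxies safely."""

        proxies = [
            "http://user1:pass1@proxy-a.example.com:3128",
            "http://user2:pass2@proxy-b.example.com:3128",
        ]

        def process_request(self, request, spider):
            new_proxy = random.choice(self.proxies)
            if request.meta.get("proxy") != new_proxy:
                # Remove the header tied to the previous proxy so its
                # credentials are never sent to the new one; HttpProxyMiddleware
                # re-creates it from the credentials in the new proxy URL.
                request.headers.pop("Proxy-Authorization", None)
                request.meta["proxy"] = new_proxy
    ```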

    Commits

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
    • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
    • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
    • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

    You can disable automated security fix PRs for this repo from the Security Alerts page.

    dependencies 
    opened by dependabot[bot] 0
  • Docker: docker-compose up fails at RUN apk upgrade --no-cache

    Docker: docker-compose up fails at RUN apk upgrade --no-cache

    => [2/7] RUN echo -e "https://mirrors.tuna.tsinghua.edu.cn/alpine/v3.7/main/\nhttps://mirrors.tuna.tsinghua.edu.   1.6s
    => ERROR [3/7] RUN apk upgrade --no-cache && apk add --no-cache squid libxml2-dev libxml2 libxslt-dev   0.9s

    [3/7] RUN apk upgrade --no-cache && apk add --no-cache squid libxml2-dev libxml2 libxslt-dev libxslt libffi-dev python3-dev && rm -rf /var/cache/* && rm -rf /root/.cache/*:
    #7 0.679 fetch https://mirrors.tuna.tsinghua.edu.cn/alpine/v3.7/main/x86_64/APKINDEX.tar.gz
    #7 0.740 140373535435724:error:14007086:SSL routines:CONNECT_CR_CERT:certificate verify failed:ssl_clnt.c:1026:
    #7 0.741 WARNING: Ignoring https://mirrors.tuna.tsinghua.edu.cn/alpine/v3.7/main/x86_64/APKINDEX.tar.gz: Permission denied
    #7 0.741 fetch https://mirrors.tuna.tsinghua.edu.cn/alpine/v3.7/community/x86_64/APKINDEX.tar.gz
    #7 0.774 140373535435724:error:14007086:SSL routines:CONNECT_CR_CERT:certificate verify failed:ssl_clnt.c:1026:
    #7 0.774 WARNING: Ignoring https://mirrors.tuna.tsinghua.edu.cn/alpine/v3.7/community/x86_64/APKINDEX.tar.gz: Permission denied
    #7 0.782 OK: 273 MiB in 55 packages
    #7 0.797 fetch https://mirrors.tuna.tsinghua.edu.cn/alpine/v3.7/main/x86_64/APKINDEX.tar.gz
    #7 0.829 139900340419532:error:14007086:SSL routines:CONNECT_CR_CERT:certificate verify failed:ssl_clnt.c:1026:
    #7 0.830 WARNING: Ignoring https://mirrors.tuna.tsinghua.edu.cn/alpine/v3.7/main/x86_64/APKINDEX.tar.gz: Permission denied
    #7 0.830 fetch https://mirrors.tuna.tsinghua.edu.cn/alpine/v3.7/community/x86_64/APKINDEX.tar.gz
    #7 0.861 139900340419532:error:14007086:SSL routines:CONNECT_CR_CERT:certificate verify failed:ssl_clnt.c:1026:
    #7 0.862 WARNING: Ignoring https://mirrors.tuna.tsinghua.edu.cn/alpine/v3.7/community/x86_64/APKINDEX.tar.gz: Permission denied
    #7 0.862 ERROR: unsatisfiable constraints:
    #7 0.866 libffi-dev (missing): required by: world[libffi-dev]
    #7 0.866 libxml2 (missing): required by: world[libxml2]
    #7 0.866 libxml2-dev (missing): required by: world[libxml2-dev]
    #7 0.866 libxslt (missing): required by: world[libxslt]
    #7 0.866 libxslt-dev (missing): required by: world[libxslt-dev]
    #7 0.866 squid (missing): required by: world[squid]

    executor failed running [/bin/sh -c apk upgrade --no-cache && apk add --no-cache squid libxml2-dev libxml2 libxslt-dev libxslt libffi-dev python3-dev && rm -rf /var/cache/* && rm -rf /root/.cache/*]: exit code: 6
    ERROR: Service 'haipproxy' failed to build : Build failed

    Could someone advise how to fix this?

    opened by nankaimy 1
Releases: v0.1

Owner

SpiderClub, a group interested in web crawlers.