Distributed Crawling with Nutch 2.1
2022-06-29 19:57:00 【星哥玩云】
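This post builds on the earlier article at http://www.linuxidc.com/Linux/2014-01/95796.htm.
1. Prepare the environment: a Hadoop cluster, Java, and a MySQL database. The code should already run in Eclipse and be able to insert data into MySQL in standalone mode.
2. Edit the configuration file nutch-site.xml: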
<property>
<name>plugin.folders</name>
<value>./plugins</value>
<description>Directories where nutch plugins are located. Each
element may be a relative or absolute path. If absolute, it is used
as is. If relative, it is searched for on the classpath.</description>
</property>
In Eclipse, select build.xml, choose Run As > Ant Build, and run the runtime target. If the build succeeds, it creates a folder named runtime.
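3. Upload the runtime folder to the master server of the Hadoop cluster (I have not verified whether other nodes would work as well). After uploading, mine is located at /home/hadoop/nutch/runtime. Then set the environment variable by adding this line to /etc/profile:
export NUTCH_HOME=/home/hadoop/nutch/runtime/local
Run source /etc/profile so that the change takes effect.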
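The environment setup in step 3 can be sketched as a small script that adds the export only once and then checks that the path really contains the Nutch launcher. This is a minimal sketch under assumptions: the `PROFILE` variable and the scratch file `./profile.sample` are hypothetical names introduced here so the script can be tried without touching /etc/profile.

```shell
#!/bin/sh
# Sketch: persist NUTCH_HOME and confirm it resolves to a Nutch runtime.
# PROFILE defaults to a scratch file (an assumption for safe testing);
# set PROFILE=/etc/profile and run as root to follow the article exactly.
PROFILE="${PROFILE:-./profile.sample}"
NUTCH_HOME_VALUE="/home/hadoop/nutch/runtime/local"

# Append the export only if it is not already present,
# so re-running the script does not duplicate the line.
touch "$PROFILE"
if ! grep -q '^export NUTCH_HOME=' "$PROFILE"; then
    echo "export NUTCH_HOME=$NUTCH_HOME_VALUE" >> "$PROFILE"
fi

# Load the setting into the current shell (same effect as `source`).
. "$PROFILE"

# bin/nutch should exist and be executable under NUTCH_HOME.
if [ -x "$NUTCH_HOME/bin/nutch" ]; then
    echo "NUTCH_HOME OK: $NUTCH_HOME"
else
    echo "warning: $NUTCH_HOME/bin/nutch not found or not executable"
fi
```

If the warning appears, the runtime folder was probably uploaded to a different location than the one exported.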
4. The next step should be to upload the URL seed file to Hadoop. I never got my seed file to work, so I skipped this step.
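For reference, uploading a seed list to HDFS usually looks something like the sketch below. It was not verified against the setup in this post, where the step was skipped. The directory name `urls`, the file name `seed.txt`, and the seed URL are all illustrative assumptions; `hadoop fs -put` and `hadoop fs -ls` are standard Hadoop file system shell commands.

```shell
#!/bin/sh
# Sketch: create a seed list locally and copy it into HDFS.
# SEED_DIR, seed.txt and the URL below are hypothetical examples.
SEED_DIR="urls"
mkdir -p "$SEED_DIR"
echo "http://nutch.apache.org/" > "$SEED_DIR/seed.txt"

if command -v hadoop >/dev/null 2>&1; then
    # Copy the local directory into the current user's HDFS home directory.
    # This fails if the destination already exists in HDFS.
    hadoop fs -put "$SEED_DIR" "$SEED_DIR"
    # List the uploaded files to confirm the copy.
    hadoop fs -ls "$SEED_DIR"
else
    echo "hadoop not on PATH; skipping upload"
fi
```

One line per URL is the usual seed file layout. If the upload succeeds, the HDFS path `urls` is what a crawl started from the deploy directory would be pointed at.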
5. From the /home/hadoop/nutch/runtime/deploy directory, run:
./bin/nutch crawl -dir crawl -depth 2 -threads 4 -topN 50
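One lesson learned: from Nutch 2 onward, you no longer need to distribute the configuration files (conf) to every machine in the cluster. However, after changing a configuration file you must rebuild the package with ant before the change takes effect.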
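The rebuild can also be done from the command line instead of Eclipse. The following is a minimal sketch, assuming the Nutch source tree lives at a hypothetical path held in `NUTCH_SRC`; `ant runtime` is the same build target that step 2 runs from Eclipse, and the helper name `rebuild_runtime` is introduced here for illustration only.

```shell
#!/bin/sh
# Sketch: rebuild the runtime folder after editing files under conf/.
# rebuild_runtime is a hypothetical helper; its argument is the source tree.
rebuild_runtime() {
    src="$1"
    if [ ! -f "$src/build.xml" ]; then
        echo "no build.xml under $src" >&2
        return 1
    fi
    if ! command -v ant >/dev/null 2>&1; then
        echo "ant not on PATH" >&2
        return 1
    fi
    # Run in a subshell so the caller's working directory is unchanged.
    ( cd "$src" && ant runtime )
}

# NUTCH_SRC is an assumed location; adjust it to your checkout.
rebuild_runtime "${NUTCH_SRC:-$HOME/nutch-src}" || echo "runtime not rebuilt"
```

After a successful rebuild, the new runtime folder has to be uploaded to the master server again as in step 3, since the crawl runs from the copy on the cluster.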