Nutch 2.1 distributed crawling
2022-06-29 20:02:00 【Brother Xing plays with the clouds】
This article builds on http://www.linuxidc.com/Linux/2014-01/95796.htm.
1 Prepare the environment: a Hadoop cluster, Java, and a MySQL database. The code should already run in Eclipse and, in standalone (local) mode, be able to insert data into the MySQL database.
2 Modify the configuration file nutch-site.xml by adding the following property:
<property>
<name>plugin.folders</name>
<value>./plugins</value>
<description>Directories where nutch plugins are located. Each
element may be a relative or absolute path. If absolute, it is used
as is. If relative, it is searched for on the classpath.</description>
</property>
In Eclipse, select build.xml, choose Run As > Ant Build, and run the runtime target. A successful build produces a runtime folder.
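To check which plugin directory a given nutch-site.xml actually specifies, the property from step 2 can be read back with Python's standard library. This is a minimal sketch, not part of Nutch: the helper names read_property and split_folders are hypothetical, and splitting the value on commas is an assumption about how multiple plugin folders would be listed.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Union


def read_property(xml_text: Union[str, bytes], name: str) -> Optional[str]:
    """Return the <value> of the <property> whose <name> equals `name`, or None."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if (prop.findtext("name") or "").strip() == name:
            return (prop.findtext("value") or "").strip()
    return None


def split_folders(value: str) -> List[str]:
    """Split a comma-separated list (assumed format) into trimmed, non-empty entries."""
    return [part.strip() for part in value.split(",") if part.strip()]


# Hypothetical sample mirroring the property added in step 2
SAMPLE = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>plugin.folders</name>
    <value>./plugins</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    value = read_property(SAMPLE, "plugin.folders")
    print(split_folders(value or ""))  # prints ['./plugins']
```

For a real file, pass its bytes, for example read_property(Path("conf/nutch-site.xml").read_bytes(), "plugin.folders") with pathlib.Path imported; the conf/ path is an assumption about where your copy lives.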
3 Upload the runtime folder to the master server of the Hadoop cluster (I have not verified whether other servers would also work). I uploaded it to /home/hadoop/nutch/runtime. Then set the environment variable by adding the following line to /etc/profile:
export NUTCH_HOME=/home/hadoop/nutch/runtime/local
and run source /etc/profile to make the change take effect.
4 Upload the URL seed file to Hadoop. I never managed to get my seed file upload to work, so I skipped this step.
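For readers who do want to try step 4: a Nutch seed list is usually a directory containing plain-text files with one URL per line. The sketch below only prepares such a directory locally. The directory name urls, the file name seed.txt, and the helper write_seed_file are assumptions for illustration. The upload itself would then be done with the Hadoop FS shell, for example hadoop fs -put urls urls.

```python
from pathlib import Path
from typing import Iterable


def write_seed_file(seed_dir: str, urls: Iterable[str]) -> Path:
    """Write one URL per line to <seed_dir>/seed.txt, skipping blanks and '#' comments."""
    cleaned = [u.strip() for u in urls if u.strip() and not u.strip().startswith("#")]
    if not cleaned:
        raise ValueError("seed list is empty")
    directory = Path(seed_dir)
    directory.mkdir(parents=True, exist_ok=True)  # create the seed directory if missing
    seed = directory / "seed.txt"
    seed.write_text("\n".join(cleaned) + "\n", encoding="utf-8")
    return seed


if __name__ == "__main__":
    path = write_seed_file("urls", ["http://nutch.apache.org/"])
    print(path.read_text(encoding="utf-8"), end="")
```

Keeping the seeds in a directory rather than a single file matches how the seed path is normally handed to Nutch, so more files can be added later without changing the command.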
5 Run the following command in the /home/hadoop/nutch/runtime/deploy directory:
./bin/nutch crawl -dir crawl -depth 2 -threads 4 -topN 50
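If the crawl is launched from a script, the step 5 command can be kept as an explicit argument list so that depth, threads, and topN can be changed per run. The working directory must still be the deploy directory. This is a sketch. The helpers build_crawl_cmd and run_crawl are hypothetical, and the flags are copied verbatim from the command above without being checked against Nutch 2.1's usage message.

```python
import subprocess
from pathlib import Path
from typing import List


def build_crawl_cmd(crawl_dir: str = "crawl", depth: int = 2,
                    threads: int = 4, top_n: int = 50) -> List[str]:
    """Assemble the step-5 crawl command as an argument list (defaults copied from the article)."""
    return ["./bin/nutch", "crawl",
            "-dir", crawl_dir,
            "-depth", str(depth),
            "-threads", str(threads),
            "-topN", str(top_n)]


def run_crawl(deploy_dir: str = "/home/hadoop/nutch/runtime/deploy", **kwargs) -> int:
    """Run the crawl with deploy_dir as the working directory; raises CalledProcessError on failure."""
    cmd = build_crawl_cmd(**kwargs)
    completed = subprocess.run(cmd, cwd=Path(deploy_dir), check=True)
    return completed.returncode


if __name__ == "__main__":
    # Only print the command; call run_crawl() on the master node to actually start it.
    print(" ".join(build_crawl_cmd()))
```

Passing a list instead of a shell string avoids quoting problems, and check=True makes a failed job surface as an exception rather than a silently ignored exit code.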
A lesson learned: with Nutch 2 there is no need to distribute the configuration files (conf) to every machine in the cluster, but after modifying a configuration file you must rebuild with ant before the change takes effect.
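The rebuild is needed because, in deploy mode, the configuration travels to the cluster inside the job archive generated under runtime/deploy rather than being read from each node. A quick way to spot a forgotten rebuild is to compare the bundled copy of a config file with the local one. The job file is a zip-format archive, so Python's zipfile can read it. This is a minimal sketch. The function name stale_conf_files, the default file list, and the example job file name are assumptions. It matches entries by file name anywhere in the archive rather than assuming a particular internal path.

```python
import zipfile
from pathlib import Path, PurePosixPath
from typing import Iterable, List


def stale_conf_files(job_path, conf_dir,
                     names: Iterable[str] = ("nutch-site.xml",)) -> List[str]:
    """Return config names whose local copy differs from (or is missing in) the job archive."""
    names = tuple(names)
    stale = []
    with zipfile.ZipFile(job_path) as zf:
        # Map base file name -> archive entry, wherever it sits inside the archive.
        bundled = {PurePosixPath(e).name: e for e in zf.namelist()
                   if PurePosixPath(e).name in names}
        for name in names:
            local = Path(conf_dir) / name
            if not local.exists():
                continue  # nothing local to compare against
            if name not in bundled or zf.read(bundled[name]) != local.read_bytes():
                stale.append(name)
    return stale


if __name__ == "__main__":
    # Hypothetical paths; adjust to your runtime layout and Nutch version.
    job = "/home/hadoop/nutch/runtime/deploy/apache-nutch-2.1.job"
    print(stale_conf_files(job, "conf"))
```

An empty list means the job archive already carries the current settings. Any name in the list means ant should be run again before crawling.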