Web crawler knowledge day03
2022-07-03 16:40:00 【Little Chen Gong】
1. Crawler case study
1.1 Requirements analysis
Visit Jingdong, search for mobile phones, analyze the results page, and capture the following data for each product:
product image, price, title, and product details page URL
1.1.1 SPU and SKU
Beyond the four attributes above, notice that the Apple phone in the figure above is sold as four distinct products, and we should capture each of them. To do that we first need to understand the concepts of SPU and SKU.
SPU = Standard Product Unit
An SPU is the smallest unit at which product information is aggregated. It is a reusable, easily searchable set of standardized information that describes the characteristics of a product. Put simply, items that share the same attribute values and characteristics belong to one SPU. For example, the Apple phone in the figure above is one SPU, which covers the red, dark grey, gold, and silver variants.
SKU = Stock Keeping Unit
An SKU is the unit used to measure stock moving in and out of inventory. It can be counted in pieces, boxes, pallets, and so on. An SKU is the smallest inventory unit that cannot be physically divided further, and how it is handled depends on the business format and management model. The term is used most often for clothing and footwear. For example, the Apple phone in the figure above comes in several variants, and the red Apple phone is one SKU.
The difference is also visible in the page source.
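The SPU/SKU relationship described above is one-to-many, which can be sketched as a map from an SPU id to the list of SKU ids it covers. This is only an illustration: the class name `SpuSkuDemo`, the method `groupBySpu`, and all ids are made up and do not come from Jingdong.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of the SPU -> SKU relationship; ids are invented.
class SpuSkuDemo {

    // Each input row is {spuId, skuId}. SKUs are grouped under their SPU,
    // keeping the order in which they were first seen.
    static Map<Long, List<Long>> groupBySpu(long[][] spuSkuPairs) {
        Map<Long, List<Long>> grouped = new LinkedHashMap<>();
        for (long[] pair : spuSkuPairs) {
            grouped.computeIfAbsent(pair[0], spu -> new ArrayList<>()).add(pair[1]);
        }
        return grouped;
    }

    public static void main(String[] args) {
        long[][] pairs = {
            {100L, 1001L}, // SPU 100, e.g. the red variant
            {100L, 1002L}, // SPU 100, e.g. the dark grey variant
            {200L, 2001L}  // a different product with a single variant
        };
        // prints {100=[1001, 1002], 200=[2001]}
        System.out.println(groupBySpu(pairs));
    }
}
```

A crawler that stores one row per SKU, as the table in the next section does, can rebuild this grouping at any time from the `spu` column.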
1.2 Development preparation
1.2.1 Database table analysis
Based on the requirements analysis, we create the following table:
CREATE TABLE `jd_item` (
  `id` bigint(10) NOT NULL AUTO_INCREMENT COMMENT 'Primary key id',
  `spu` bigint(15) DEFAULT NULL COMMENT 'SPU id (product aggregate)',
  `sku` bigint(15) DEFAULT NULL COMMENT 'SKU id (smallest product unit)',
  `title` varchar(100) DEFAULT NULL COMMENT 'Product title',
  `price` bigint(10) DEFAULT NULL COMMENT 'Product price',
  `pic` varchar(200) DEFAULT NULL COMMENT 'Product image',
  `url` varchar(200) DEFAULT NULL COMMENT 'Product details page URL',
  `created` datetime DEFAULT NULL COMMENT 'Creation time',
  `updated` datetime DEFAULT NULL COMMENT 'Update time',
  PRIMARY KEY (`id`),
  KEY `sku` (`sku`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8 COMMENT='Jingdong product table';
1.2.2 Add dependencies
The project uses Spring Boot, Spring Data JPA, and scheduled tasks. Create a Maven project and add the following dependencies:
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<parent>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-parent</artifactId>
<version>2.0.2.RELEASE</version>
</parent>
<groupId>cn.itcast.crawler</groupId>
<artifactId>itcast-crawler-jd</artifactId>
<version>1.0-SNAPSHOT</version>
<dependencies>
<!--SpringMVC-->
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-web</artifactId>
</dependency>
<!--SpringData Jpa-->
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-data-jpa</artifactId>
</dependency>
<!-- MySQL connector -->
<dependency>
<groupId>mysql</groupId>
<artifactId>mysql-connector-java</artifactId>
</dependency>
<!-- HttpClient -->
<dependency>
<groupId>org.apache.httpcomponents</groupId>
<artifactId>httpclient</artifactId>
</dependency>
<!--Jsoup-->
<dependency>
<groupId>org.jsoup</groupId>
<artifactId>jsoup</artifactId>
<version>1.10.3</version>
</dependency>
<!-- Utility library -->
<dependency>
<groupId>org.apache.commons</groupId>
<artifactId>commons-lang3</artifactId>
</dependency>
</dependencies>
</project>
1.2.3 Add the configuration file
Add the application.properties configuration file:
#DB Configuration:
spring.datasource.driverClassName=com.mysql.jdbc.Driver
spring.datasource.url=jdbc:mysql://127.0.0.1:3306/crawler
spring.datasource.username=root
spring.datasource.password=root
#JPA Configuration:
spring.jpa.database=MySQL
spring.jpa.show-sql=true
1.2.4 Code implementation
Write the POJO
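A minimal sketch of an entity mapped to the `jd_item` table from section 1.2.1, assuming the class is named `Item` (the name used by `ItemDao` below) and that column names map one-to-one to field names. The field types follow the table definition, so `price` is a `Long` here. That choice is an assumption, as is the omission of a package declaration. The `javax.persistence` annotations come from the `spring-boot-starter-data-jpa` dependency added above.

```java
import java.util.Date;

import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;
import javax.persistence.Table;

// Sketch of the entity for the jd_item table; field types mirror the DDL.
@Entity
@Table(name = "jd_item")
public class Item {

    // Primary key, generated by MySQL AUTO_INCREMENT
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;
    private Long spu;      // SPU id
    private Long sku;      // SKU id
    private String title;  // product title
    private Long price;    // product price (bigint in the table)
    private String pic;    // product image
    private String url;    // product details page URL
    private Date created;  // creation time
    private Date updated;  // update time

    public Long getId() { return id; }
    public void setId(Long id) { this.id = id; }
    public Long getSpu() { return spu; }
    public void setSpu(Long spu) { this.spu = spu; }
    public Long getSku() { return sku; }
    public void setSku(Long sku) { this.sku = sku; }
    public String getTitle() { return title; }
    public void setTitle(String title) { this.title = title; }
    public Long getPrice() { return price; }
    public void setPrice(Long price) { this.price = price; }
    public String getPic() { return pic; }
    public void setPic(String pic) { this.pic = pic; }
    public String getUrl() { return url; }
    public void setUrl(String url) { this.url = url; }
    public Date getCreated() { return created; }
    public void setCreated(Date created) { this.created = created; }
    public Date getUpdated() { return updated; }
    public void setUpdated(Date updated) { this.updated = updated; }
}
```

Using wrapper types such as `Long` rather than primitives keeps unset fields `null`, which matters if the entity is later used as a query probe.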
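Write the DAO
import org.springframework.data.jpa.repository.JpaRepository;

// Spring Data JPA generates the implementation of this repository at runtime
public interface ItemDao extends JpaRepository<Item, Long> {
}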
Write the Service
ItemService interface:
import java.util.List;

public interface ItemService {
    // Query items that match the given criteria
    List<Item> findAll(Item item);

    // Save an item
    void save(Item item);
}
ItemServiceImpl implementation class:
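One way to implement `ItemService` on top of `ItemDao` is sketched below. It assumes that "query by criteria" can be served by Spring Data's query-by-example support, which `JpaRepository` exposes through `findAll(Example)`. The original implementation may differ. It also assumes `Item`, `ItemDao`, and `ItemService` live in the same package, so no imports are shown for them.

```java
import java.util.List;

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.data.domain.Example;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

// Sketch of a service implementation that delegates to the JPA repository.
@Service
public class ItemServiceImpl implements ItemService {

    @Autowired
    private ItemDao itemDao;

    @Override
    public List<Item> findAll(Item item) {
        // Non-null properties of the probe object become equality conditions
        Example<Item> example = Example.of(item);
        return itemDao.findAll(example);
    }

    @Override
    @Transactional
    public void save(Item item) {
        // Inserts a new row, or updates the existing one if the id is set
        itemDao.save(item);
    }
}
```

With this approach, a crawler can check whether an SKU was already stored by calling `findAll` with an `Item` whose only non-null field is `sku`.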
Write the bootstrap class
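A minimal sketch of a Spring Boot entry point for this project. The class name `Application` is an assumption. `@EnableScheduling` is included because the project is meant to use scheduled tasks, and without it Spring does not run methods annotated with `@Scheduled`.

```java
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.scheduling.annotation.EnableScheduling;

// Hypothetical entry point; the class name is not taken from the article.
@SpringBootApplication
@EnableScheduling // switch on support for @Scheduled methods
public class Application {

    public static void main(String[] args) {
        // Starts the Spring context and the embedded web server
        SpringApplication.run(Application.class, args);
    }
}
```

Placing this class in the root package of the project lets component scanning find the service and repository in the sub-packages beneath it.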