Web crawler knowledge day03
2022-07-03 16:40:00 【Little Chen Gong】
1. Crawler case
1.1 Requirements analysis
Visit Jingdong (JD.com), search for mobile phones, analyze the results page, and capture the following data for each product:
product image, price, title, and product detail page URL
1.1.1 SPU and SKU
Besides the four attributes above, note that a single Apple phone in the search results comes in four variants, and we should capture every one of them. To do that we first need to understand the concepts of SPU and SKU.
SPU = Standard Product Unit
An SPU is the smallest unit for aggregating product information: a reusable, easily retrievable set of standardized information that describes the characteristics of a product. Put simply, goods that share the same attribute values and characteristics belong to one SPU. For example, the Apple phone above is one SPU, which covers the red, dark grey, gold, and silver variants.
SKU = Stock Keeping Unit
An SKU is the unit in which inventory is counted in and out, such as a piece, a box, or a pallet. It is the smallest physically indivisible unit of stock, and how it is handled depends on the business format and the management model. SKUs are most commonly used for clothing and footwear. For example, of the several variants of the Apple phone above, the red Apple phone is one SKU.
The difference is also visible in the source code of the page.
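The SPU/SKU relationship can be sketched in a few lines of Java. This is only an illustration: the class name `SpuSkuDemo`, the method `groupBySpu`, and all ids are made up and do not come from Jingdong or from the original project.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo class; the ids below are invented for illustration.
class SpuSkuDemo {

    // Groups SKU ids under their SPU id, preserving insertion order.
    // Each input row is a two-element array: {spuId, skuId}.
    static Map<Long, List<Long>> groupBySpu(long[][] rows) {
        Map<Long, List<Long>> skusBySpu = new LinkedHashMap<>();
        for (long[] row : rows) {
            skusBySpu.computeIfAbsent(row[0], k -> new ArrayList<>()).add(row[1]);
        }
        return skusBySpu;
    }

    public static void main(String[] args) {
        long[][] rows = {
            {1000L, 1001L}, // one phone model (SPU 1000), red variant
            {1000L, 1002L}, // same model, dark grey variant
            {1000L, 1003L}, // same model, gold variant
            {2000L, 2001L}  // a different model with a single variant
        };
        // Every SKU is a separate item to crawl; the SPU ties the variants together.
        groupBySpu(rows).forEach((spu, skus) ->
            System.out.println("SPU " + spu + " -> SKUs " + skus));
    }
}
```

One SPU maps to many SKUs, which is why the table in the next section stores both ids on every row.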

1.2 Development preparation
1.2.1 Database table analysis
Based on the requirements analysis, we create the following table:
CREATE TABLE `jd_item` (
  `id` bigint(10) NOT NULL AUTO_INCREMENT COMMENT 'Primary key id',
  `spu` bigint(15) DEFAULT NULL COMMENT 'SPU id (product aggregate)',
  `sku` bigint(15) DEFAULT NULL COMMENT 'SKU id (smallest product unit)',
  `title` varchar(100) DEFAULT NULL COMMENT 'Product title',
  `price` bigint(10) DEFAULT NULL COMMENT 'Product price',
  `pic` varchar(200) DEFAULT NULL COMMENT 'Product image',
  `url` varchar(200) DEFAULT NULL COMMENT 'Product detail page URL',
  `created` datetime DEFAULT NULL COMMENT 'Creation time',
  `updated` datetime DEFAULT NULL COMMENT 'Update time',
  PRIMARY KEY (`id`),
  KEY `sku` (`sku`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8 COMMENT='Jingdong product table';
1.2.2 Add dependencies
The project uses Spring Boot with Spring Data JPA and scheduled tasks. Create a Maven project and add the following dependencies:
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<parent>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-parent</artifactId>
<version>2.0.2.RELEASE</version>
</parent>
<groupId>cn.itcast.crawler</groupId>
<artifactId>itcast-crawler-jd</artifactId>
<version>1.0-SNAPSHOT</version>
<dependencies>
<!--SpringMVC-->
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-web</artifactId>
</dependency>
<!--SpringData Jpa-->
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-data-jpa</artifactId>
</dependency>
<!-- MySQL connector -->
<dependency>
<groupId>mysql</groupId>
<artifactId>mysql-connector-java</artifactId>
</dependency>
<!-- HttpClient -->
<dependency>
<groupId>org.apache.httpcomponents</groupId>
<artifactId>httpclient</artifactId>
</dependency>
<!--Jsoup-->
<dependency>
<groupId>org.jsoup</groupId>
<artifactId>jsoup</artifactId>
<version>1.10.3</version>
</dependency>
<!-- Utility library -->
<dependency>
<groupId>org.apache.commons</groupId>
<artifactId>commons-lang3</artifactId>
</dependency>
</dependencies>
</project>
1.2.3 Add the configuration file
Add an application.properties configuration file:
#DB Configuration:
spring.datasource.driverClassName=com.mysql.jdbc.Driver
spring.datasource.url=jdbc:mysql://127.0.0.1:3306/crawler
spring.datasource.username=root
spring.datasource.password=root
#JPA Configuration:
spring.jpa.database=MySQL
spring.jpa.show-sql=true
1.2.4 Code implementation
Write the POJO
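A minimal sketch of what such a POJO could look like, assuming its fields simply mirror the columns of the `jd_item` table from 1.2.1. The field types are an assumption based on the column types. The JPA mapping annotations are left out so that the snippet compiles without any dependencies, and `ItemDemo` is a hypothetical helper added only to show usage.

```java
import java.util.Date;

// Plain-Java sketch of the entity; field names mirror the jd_item columns.
// In the Spring Data JPA project this class would also carry mapping
// annotations (@Entity, @Table(name = "jd_item"), @Id, @GeneratedValue),
// omitted here so the snippet compiles without extra dependencies.
class Item {
    private Long id;        // primary key
    private Long spu;       // SPU id
    private Long sku;       // SKU id
    private String title;   // product title
    private Long price;     // stored as bigint in the table
    private String pic;     // product image
    private String url;     // product detail page URL
    private Date created;   // creation time
    private Date updated;   // update time

    public Long getId() { return id; }
    public void setId(Long id) { this.id = id; }
    public Long getSpu() { return spu; }
    public void setSpu(Long spu) { this.spu = spu; }
    public Long getSku() { return sku; }
    public void setSku(Long sku) { this.sku = sku; }
    public String getTitle() { return title; }
    public void setTitle(String title) { this.title = title; }
    public Long getPrice() { return price; }
    public void setPrice(Long price) { this.price = price; }
    public String getPic() { return pic; }
    public void setPic(String pic) { this.pic = pic; }
    public String getUrl() { return url; }
    public void setUrl(String url) { this.url = url; }
    public Date getCreated() { return created; }
    public void setCreated(Date created) { this.created = created; }
    public Date getUpdated() { return updated; }
    public void setUpdated(Date updated) { this.updated = updated; }
}

// Hypothetical usage: fill in one crawled SKU (values are invented).
class ItemDemo {
    public static void main(String[] args) {
        Item item = new Item();
        item.setSpu(1000L);
        item.setSku(1001L);
        item.setTitle("Phone, red");
        item.setCreated(new Date());
        item.setUpdated(item.getCreated());
        System.out.println(item.getSku() + " " + item.getTitle());
    }
}
```

The wrapper type `Long` is used instead of `long` so that unset fields stay `null`, which matches the `DEFAULT NULL` columns.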

Write the DAO
public interface ItemDao extends JpaRepository<Item,Long> {
}
Write the Service
ItemService interface:
public interface ItemService {
// Query data according to criteria
public List<Item> findAll(Item item);
// Save the data
public void save(Item item);
}
ItemServiceImpl implementation class
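As a rough illustration of the `ItemService` contract, here is an in-memory stand-in. It is not the article's `ItemServiceImpl`, which would presumably delegate to `ItemDao`. The sketch assumes that "query according to criteria" means every non-null field of the passed-in item must match, which is how Spring Data's query-by-example treats a probe by default. `InMemoryItemService`, `ItemServiceDemo`, and the trimmed-down `Item` are hypothetical names for this example only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Trimmed-down Item with only the two fields this sketch needs.
class Item {
    Long sku;
    String title;

    Item(Long sku, String title) {
        this.sku = sku;
        this.title = title;
    }
}

interface ItemService {
    // Query data according to criteria
    List<Item> findAll(Item item);

    // Save the data
    void save(Item item);
}

// Hypothetical in-memory stand-in; a real implementation would use the DAO.
class InMemoryItemService implements ItemService {
    private final List<Item> store = new ArrayList<>();

    @Override
    public List<Item> findAll(Item probe) {
        List<Item> result = new ArrayList<>();
        for (Item candidate : store) {
            if (fieldMatches(probe.sku, candidate.sku)
                    && fieldMatches(probe.title, candidate.title)) {
                result.add(candidate);
            }
        }
        return result;
    }

    @Override
    public void save(Item item) {
        store.add(item);
    }

    // A null probe value means "no constraint on this field".
    private static boolean fieldMatches(Object wanted, Object actual) {
        return wanted == null || Objects.equals(wanted, actual);
    }
}

class ItemServiceDemo {
    public static void main(String[] args) {
        ItemService service = new InMemoryItemService();
        service.save(new Item(1001L, "Phone, red"));
        service.save(new Item(1002L, "Phone, grey"));
        // One plausible crawler use: check whether a SKU was stored already.
        boolean alreadyStored = !service.findAll(new Item(1001L, null)).isEmpty();
        System.out.println("SKU 1001 already stored: " + alreadyStored);
    }
}
```

Passing a probe object instead of separate parameters keeps the interface stable when more search fields are added later.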

Write the bootstrap class
