[Note] 2022.5.28: Fetching data from a web page and writing it to a database
2022-06-30 03:40:00 【Sprite. Nym】
```python
"""
example11 - Persisting crawler data to a database

HTTP status codes worth recognising:
    400 - Bad Request
    401 - Unauthorized
    403 - Forbidden
    404 - Not Found
    405 - Method Not Allowed
    418 - I'm a teapot

create table `tb_top_movie`
(
    `mov_id`             bigint unsigned auto_increment comment 'number',
    `mov_title`          varchar(200) not null comment 'title',
    `mov_rating_num`     decimal(3,1) not null comment 'score',
    `mov_comments_count` bigint       not null comment 'comment count',
    primary key (`mov_id`)
) engine=innodb auto_increment=1001 comment 'movie data table';

Author: Hao
Date: 2022/5/28
"""
import bs4
import pymysql
import requests
from pymysql.cursors import Cursor


def fetch_page(session, url):
    """Fetch a page.

    :param session: requests Session object
    :param url: uniform resource locator
    :return: the page's HTML, or '' if the status code is not 200
    """
    resp = session.get(url=url)
    return resp.text if resp.status_code == 200 else ''


def parse_page(html_code):
    """Parse one Top 250 list page.

    :param html_code: the page's HTML
    :return: list of (title, rating, comment count) tuples
    """
    soup = bs4.BeautifulSoup(html_code, 'html.parser')
    movie_items_list = soup.select('#content > div > div.article > ol > li')
    data = []
    for movie_item in movie_items_list:
        title = movie_item.select_one('div > div.info > div.hd > a > span.title').text
        rating_num = movie_item.select_one('div > div.info > div.bd > div > span.rating_num').text
        # The 4th span reads like "2950000人评价" ("N people rated"); [:-3] drops that 3-character suffix
        comments_count = movie_item.select_one('div > div.info > div.bd > div > span:nth-child(4)').text[:-3]
        data.append((title, rating_num, comments_count))
    return data


def save_to_db(conn, data):
    """Save the data to the database.

    :param conn: database connection
    :param data: list of (title, rating, comment count) tuples
    """
    with conn.cursor() as cursor:  # type: Cursor
        # One batched INSERT per page; pymysql fills in the %s placeholders safely
        cursor.executemany(
            'insert into tb_top_movie (mov_title, mov_rating_num, mov_comments_count) '
            'values (%s, %s, %s)',
            data
        )
    conn.commit()


def main():
    session = requests.Session()
    # Send a crawler User-Agent instead of the python-requests default, which Douban tends to refuse
    session.headers.update({'User-Agent': 'Baiduspider'})
    conn = pymysql.connect(host='localhost', port=3306,
                           user='guest', password='Guest.618',
                           database='hrs', charset='utf8mb4')
    try:
        # 10 pages x 25 movies per page = the full Top 250
        for page in range(10):
            url = f'https://movie.douban.com/top250?start={25 * page}'
            html_code = fetch_page(session, url)
            data = parse_page(html_code)
            save_to_db(conn, data)
    finally:
        conn.close()


if __name__ == '__main__':
    main()
```
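`fetch_page` calls `session.get` without a timeout, so a stalled connection can hang the whole crawl, and a single transient 5xx response silently turns into an empty page. The following is a minimal sketch of a hardened variant. It assumes that 429 and 5xx responses are worth retrying, while others (such as 403 or the 418 listed in the docstring) are not. The name `fetch_page_with_retry` and its `retries`, `timeout` and `backoff` parameters are hypothetical, not part of the original script.

```python
import time

# Assumption: 429 (rate limited) and these 5xx codes are transient; anything else is not retried
RETRYABLE_STATUS = {429, 500, 502, 503, 504}


def fetch_page_with_retry(session, url, retries=3, timeout=10, backoff=1.0):
    """Hypothetical hardened fetch_page: adds a timeout and retries transient failures.

    Returns the page HTML on 200, or '' on a non-retryable status (e.g. 403, 418)
    or once all attempts are used up.
    """
    for attempt in range(retries):
        try:
            resp = session.get(url=url, timeout=timeout)
        except OSError:
            # requests' exceptions (timeouts, connection errors) derive from OSError
            resp = None
        if resp is not None:
            if resp.status_code == 200:
                return resp.text
            if resp.status_code not in RETRYABLE_STATUS:
                return ''
        if attempt < retries - 1:
            # Exponential backoff: backoff, 2*backoff, 4*backoff, ...
            time.sleep(backoff * 2 ** attempt)
    return ''
```

It keeps `fetch_page`'s contract (HTML on success, `''` otherwise). That means it could replace the call in `main()` without changing `parse_page`.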
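`parse_page` uses `.text[:-3]` to drop the suffix 人评价 ("people rated") from the comment count. This only works while that suffix is exactly three characters long. The rating is also passed to MySQL as a string. The sketch below is a slightly more defensive way to normalise one scraped row: it extracts the digits with a regular expression and converts the rating to `Decimal`, which matches the `decimal(3,1)` column. The helper `clean_row` is hypothetical.

```python
import re
from decimal import Decimal


def clean_row(title, rating_text, comments_text):
    """Hypothetical helper: normalise one scraped row before it reaches the database.

    - strips stray whitespace from the title
    - converts the rating to Decimal, matching the decimal(3,1) column
    - pulls the digits out of the comment-count text instead of slicing off a
      fixed 3-character suffix, so both '2950000人评价' and '2950000' work
    The count is None if no digits are found.
    """
    match = re.search(r'\d+', comments_text.replace(',', ''))
    comments = int(match.group()) if match else None
    return title.strip(), Decimal(rating_text.strip()), comments
```

Inside `parse_page`'s loop, this would wrap the three extracted values, e.g. `data.append(clean_row(title, rating_num, comments_count))`. Rows whose count comes back as `None` could then be skipped instead of failing the `not null` constraint.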
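`save_to_db` sends each page as a single `executemany` call and commits once. You can try the same pattern without a MySQL server by swapping in Python's built-in `sqlite3` module. This is a stand-in for experimentation, not the author's setup. The table below is a simplified SQLite translation of the DDL in the docstring: the `comment` clauses and the `auto_increment=1001` starting value are dropped, and `?` replaces pymysql's `%s` placeholder. Using the connection as a context manager commits on success and rolls back if any row fails, so a page is never half-written.

```python
import sqlite3


def create_table(conn):
    """Simplified SQLite version of tb_top_movie (assumption: comments and start id dropped)."""
    conn.execute(
        'create table if not exists tb_top_movie ('
        ' mov_id integer primary key autoincrement,'
        ' mov_title varchar(200) not null,'
        ' mov_rating_num decimal(3,1) not null,'
        ' mov_comments_count bigint not null)'
    )


def save_to_sqlite(conn, data):
    """Insert one page of (title, rating, comment count) rows as a single transaction."""
    with conn:  # commits if the block succeeds, rolls back if it raises
        conn.executemany(
            'insert into tb_top_movie (mov_title, mov_rating_num, mov_comments_count) '
            'values (?, ?, ?)',  # sqlite3 uses ? where pymysql uses %s
            data,
        )


if __name__ == '__main__':
    conn = sqlite3.connect(':memory:')
    create_table(conn)
    save_to_sqlite(conn, [('Movie A', '9.7', '100'), ('Movie B', '9.6', '200')])
    print(conn.execute('select count(*) from tb_top_movie').fetchone()[0])  # 2
```

With pymysql, the closest equivalent is to call `conn.rollback()` in an `except` branch around the `executemany` before re-raising.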
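`main()` walks ten pages with `start=25 * page` (10 × 25 = 250 movies) and sends the requests back to back. If Douban starts refusing the crawler mid-run, `fetch_page` returns `''` and the loop keeps going with empty pages. The following is a sketch of a politer loop. It assumes that a random 1 to 3 second pause is acceptable (the range is arbitrary) and that an empty page means the crawl should stop. The function `crawl` and its parameters are hypothetical. The fetch, parse and save steps are passed in, so the loop can be exercised without a network or a database.

```python
import random
import time

BASE_URL = 'https://movie.douban.com/top250?start={}'


def crawl(fetch, parse, save, pages=10, page_size=25, min_delay=1.0, max_delay=3.0):
    """Hypothetical reworking of main()'s loop.

    fetch(url) -> html, parse(html) -> rows and save(rows) are injected callables.
    Returns the number of rows handed to save().
    """
    saved = 0
    for page in range(pages):
        rows = parse(fetch(BASE_URL.format(page_size * page)))
        if not rows:
            # Blocked (fetch returned '') or past the last page: stop instead of hammering the site
            break
        save(rows)
        saved += len(rows)
        if page < pages - 1:
            # Random pause between requests so the crawl is less bursty
            time.sleep(random.uniform(min_delay, max_delay))
    return saved
```

Inside `main()`'s `try` block, this would be wired up as `crawl(lambda url: fetch_page(session, url), parse_page, lambda rows: save_to_db(conn, rows))`.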