Qidian (起点中文网) font anti-crawling: the web page displays numbers and letters, but the page source is garbled or blank
2022-07-28 07:05:00 【ithicker】
I started from this piece of code:
# -*- coding: utf-8 -*-
"""
Created on Tue Mar 23 14:38:01 2021
@author: xinyi
"""
import xlwt
import requests
from lxml import etree
import time

all_info_list = []  # one row per book, collected across all pages


def get_info(url):
    """Scrape one listing page and append each book's fields to all_info_list."""
    html = requests.get(url)
    selector = etree.HTML(html.text)
    infos = selector.xpath('//ul[@class="all-img-list cf"]/li')
    for info in infos:
        title = info.xpath('div[2]/h4/a/text()')[0]
        author = info.xpath('div[2]/p[1]/a[1]/text()')[0]
        style_1 = info.xpath('div[2]/p[1]/a[2]/text()')[0]
        style_2 = info.xpath('div[2]/p[1]/a[3]/text()')[0]
        style = style_1 + '·' + style_2
        complete = info.xpath('div[2]/p[1]/span/text()')[0]
        introduce = info.xpath('div[2]/p[2]/text()')[0].strip()
        # Word count; strip the trailing unit '万字' (ten thousand characters)
        word = info.xpath('div[2]/p[3]/span/text()')[0].strip('万字')
        info_list = [title, author, style, complete, introduce, word]
        all_info_list.append(info_list)
    time.sleep(1)  # pause between pages


if __name__ == '__main__':
    urls = ['http://a.qidian.com/?page={}'.format(str(i)) for i in range(1, 101)]
    for url in urls:
        get_info(url)
    header = ['title', 'author', 'style', 'complete', 'introduce', 'word']
    book = xlwt.Workbook(encoding='utf-8')
    sheet = book.add_sheet('Sheet1')
    # Row 0 holds the column headers
    for col, name in enumerate(header):
        sheet.write(0, col, name)
    # Data rows start at row 1
    for row, info_list in enumerate(all_info_list, start=1):
        for col, data in enumerate(info_list):
            sheet.write(row, col, data)
    book.save('xiaoshuo.xls')
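
The symptom in the title (digits visible in the browser, garbled or blank in the page source) most likely shows up in the `word` field, since that is the numeric value on the page. A common cause is a custom web font that remaps code points to digit glyphs. The following is only a minimal sketch of the decoding step under that assumption: `decode_obfuscated`, `GLYPH_TO_CHAR`, and the sample values are hypothetical names and data, and the glyph names ('one', 'period', ...) are assumed to be what the font's cmap table reports (for example via fontTools' `TTFont.getBestCmap()`).

```python
import re

# Assumption: the font names its digit glyphs with English number words.
GLYPH_TO_CHAR = {
    'zero': '0', 'one': '1', 'two': '2', 'three': '3', 'four': '4',
    'five': '5', 'six': '6', 'seven': '7', 'eight': '8', 'nine': '9',
    'period': '.',
}


def decode_obfuscated(text, cmap):
    """Translate obfuscated characters back to digits.

    text: string from the page source, either raw characters or
          numeric entities such as '&#100234;'.
    cmap: dict mapping code point (int) -> glyph name (str).
    """
    # Turn numeric entities into the characters they stand for.
    text = re.sub(r'&#(\d+);', lambda m: chr(int(m.group(1))), text)
    out = []
    for ch in text:
        name = cmap.get(ord(ch))
        # Characters that are not in the font mapping pass through unchanged.
        out.append(GLYPH_TO_CHAR.get(name, ch))
    return ''.join(out)


# Hypothetical cmap for illustration only; a real one must be read
# from the font file referenced by the page being scraped.
sample_cmap = {100234: 'one', 100236: 'two', 100237: 'period', 100238: 'five'}
sample_text = '&#100234;&#100236;&#100237;&#100238;'
decoded = decode_obfuscated(sample_text, sample_cmap)
print(decoded)
```

In a scraper like the one above, a function of this kind would be applied to the raw `word` text before it is stored, with `cmap` loaded from the font that the page itself references.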

Three. The final code
# -*- coding: utf-8 -*-
"""
Created on Tue Mar 23 14:38:01 2021
@author: xinyi
"""
import xlwt
import requests
from lxml import etree
import time

all_info_list = []  # one row per book, collected across all pages


def get_info(url):
    """Scrape one listing page and append each book's fields to all_info_list."""
    html = requests.get(url)
    selector = etree.HTML(html.text)
    infos = selector.xpath('//ul[@class="all-img-list cf"]/li')
    for info in infos:
        title = info.xpath('div[2]/h4/a/text()')[0]
        author = info.xpath('div[2]/p[1]/a[1]/text()')[0]
        style_1 = info.xpath('div[2]/p[1]/a[2]/text()')[0]
        style_2 = info.xpath('div[2]/p[1]/a[3]/text()')[0]
        style = style_1 + '·' + style_2
        complete = info.xpath('div[2]/p[1]/span/text()')[0]
        introduce = info.xpath('div[2]/p[2]/text()')[0].strip()
        # Word count; strip the trailing unit '万字' (ten thousand characters)
        word = info.xpath('div[2]/p[3]/span/text()')[0].strip('万字')
        info_list = [title, author, style, complete, introduce, word]
        all_info_list.append(info_list)
    time.sleep(1)  # pause between pages


if __name__ == '__main__':
    urls = ['http://a.qidian.com/?page={}'.format(str(i)) for i in range(1, 101)]
    for url in urls:
        get_info(url)
    header = ['title', 'author', 'style', 'complete', 'introduce', 'word']
    book = xlwt.Workbook(encoding='utf-8')
    sheet = book.add_sheet('Sheet1')
    # Row 0 holds the column headers
    for col, name in enumerate(header):
        sheet.write(0, col, name)
    # Data rows start at row 1
    for row, info_list in enumerate(all_info_list, start=1):
        for col, data in enumerate(info_list):
            sheet.write(row, col, data)
    book.save('xiaoshuo.xls')