Epidemic data analysis platform work report [8.5] additional crawlers and drawings
2022-06-12 04:14:00 【m0_ fifty-five million six hundred and seventy-five thousand ei】
Import dependencies
import csv
import time
import requests
from bs4 import BeautifulSoup
Set the URL to crawl
url = 'https://ncov2019.live/'
Set the request headers (User-Agent)
header = {
    'content-type': 'text/html;charset=UTF-8',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.66 Safari/537.36 Edg/87.0.664.41',
}
Data fields
data = {
'Name': '', 'Confirmed': '', 'Confirmed Per Million': '', 'Confirmed Changes Today': '',
'Confirmed Percentage Day Change': '', 'Critical': '', 'Deceased': '', 'Deceased Per Million': '',
'Deceased Changes Today': '', 'Death Percentage Day Change': '', 'Tests': '', 'Active': '', 'Recovered': '',
'Recovered Per Million': '', 'Population': ''}
Configure the output
csv_file = open('csvFile/Covid19Data' + time.strftime("%Y-%m-%d") + '.csv', "w", newline='', encoding="utf_8_sig")
csv_writer = csv.DictWriter(csv_file,
fieldnames=data.keys())
csv_writer.writeheader()
r = requests.get(url=url, headers=header)
# Keep a timestamped copy of the raw page
with open('htmlFile/page' + time.strftime("%Y-%m-%d-%H-%M") + '.html', 'w', encoding='utf-8') as f:
    f.write(r.text)
soup = BeautifulSoup(r.text, 'html5lib')
items = soup.find('table', id='sortable_table_world').find('tbody').find_all('tr')
Loop over the table rows
for item in items:
    index = 0
    # Read each field of the row in column order
    for key in data.keys():
        s = item.select('td')[index].text
        # Handle special characters: drop the leading marker token
        if '*' in s:
            s = ' '.join(s.split()[1:])
        s = s.strip()
        # Handle missing data
        if s == 'Unknown':
            s = 'NA'
        # Handle countries with fewer than a million people
        if s == '0' and (
                key == 'Confirmed Per Million' or key == 'Deceased Per Million' or key == 'Recovered Per Million'):
            s = 'NA'
        data[key] = s
        index += 1
    csv_writer.writerow(data)
    print(data['Name'])
csv_file.close()
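The per-cell cleaning rules in the loop above can be pulled out into a small function, which makes them easy to check without fetching the page. This is a minimal sketch only: `clean_cell` is a hypothetical helper name that does not appear in the original script, and it assumes the same three rules (marker token, 'Unknown', zero per-million values).

```python
# Hypothetical helper: not part of the original script.
PER_MILLION_KEYS = {
    'Confirmed Per Million',
    'Deceased Per Million',
    'Recovered Per Million',
}


def clean_cell(key, text):
    """Apply the same cleaning rules as the crawl loop to one table cell."""
    s = text
    if '*' in s:
        # Drop the first whitespace-separated token (the marker)
        s = ' '.join(s.split()[1:])
    s = s.strip()
    if s == 'Unknown':
        return 'NA'
    if s == '0' and key in PER_MILLION_KEYS:
        # A zero per-million value is treated as missing
        return 'NA'
    return s
```

Inside the loop, the body would then reduce to `data[key] = clean_cell(key, item.select('td')[index].text)`.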
Overall trend plots
# Overall trend of the global COVID-19 figures over 15 days
import matplotlib.pyplot as plt
import pandas as pd
from pandas import DataFrame
plt.rcParams['font.sans-serif'] = ['Microsoft YaHei']  # Font that can render Chinese labels
plt.rcParams['savefig.dpi'] = 300  # Resolution of saved images
plt.rcParams['figure.dpi'] = 300  # Resolution of displayed figures
plt.style.use('Solarize_Light2')
df = DataFrame()
for i in range(1, 16):
    str_num = f'{i:02d}'  # Zero-pad the day: 1 -> '01'
    # Read only the first data row (the global totals) and strip the thousands separators
    df[str(i)] = pd.read_csv('csvFile/Covid19Data2020-12-' + str_num + '.csv',
                             encoding='utf-8', thousands=',', nrows=1).loc[0]
print(df)
print(df.at['Confirmed', '1'])
print(df.loc['Confirmed'].to_list())
fig, ax = plt.subplots(nrows=2, ncols=3, figsize=(12, 8))  # 2 x 3 grid, one panel per metric
day_list = list(range(1, 16))
# 1. Show the change of the total confirmed number of COVID-19 in the world
Confirmed_list = df.loc['Confirmed'].to_list()
ax[0, 0].plot(day_list, Confirmed_list, linewidth=2.0, marker='o')
ax[0, 0].text(day_list[0] - 1.5, Confirmed_list[0] + 3e5, '{:.2e}'.format(Confirmed_list[0]), color='r', size=13,
weight='bold')
ax[0, 0].text(day_list[-1] - 1.5, Confirmed_list[-1] + 3e5, '{:.2e}'.format(Confirmed_list[-1]), color='r', size=13,
weight='bold')
ax[0, 0].set_xticks(day_list)
ax[0, 0].set_xlabel('Date')
ax[0, 0].set_ylabel('Confirmed cases')
ax[0, 0].set_title('Global confirmed COVID-19 cases, December 1 to 15', y=1.1, size=13)
# 2. Show the daily growth rate of global confirmed cases
Confirmed_Percentage_list = df.loc['Confirmed Percentage Day Change'].to_list()
Confirmed_Percentage_list = [float(i[:-1]) for i in Confirmed_Percentage_list]
ax[0, 1].plot(day_list, Confirmed_Percentage_list, linewidth=2.0, marker='o')
ax[0, 1].set_xticks(day_list)
ax[0, 1].set_xlabel('Date')
ax[0, 1].set_yticks([0.10, 0.15, 0.20, 0.25, 0.30, 0.35])
ax[0, 1].set_yticklabels(['0.10%', '0.15%', '0.20%', '0.25%', '0.30%', '0.35%'])
ax[0, 1].set_ylabel('Daily growth rate of confirmed cases')
ax[0, 1].set_title('Daily growth rate of global confirmed COVID-19 cases, December 1 to 15', y=1.1, size=13)
# 3. Show the number of currently active cases
Active_list = df.loc['Active'].to_list()
ax[0, 2].plot(day_list, Active_list, linewidth=2.0, marker='o')
ax[0, 2].text(day_list[0] - 1.5, Active_list[0] + 8e4, '{:.2e}'.format(Active_list[0]), color='r', size=13,
weight='bold')
ax[0, 2].text(day_list[-1] - 1.5, Active_list[-1] + 8e4, '{:.2e}'.format(Active_list[-1]), color='r', size=13,
weight='bold')
ax[0, 2].set_xticks(day_list)
ax[0, 2].set_xlabel('Date')
ax[0, 2].set_ylabel('Active cases')
ax[0, 2].set_title('Global active COVID-19 cases, December 1 to 15', y=1.1, size=13)
# 4. Show the change in the global COVID-19 death toll
Deceased_list = df.loc['Deceased'].to_list()
ax[1, 0].plot(day_list, Deceased_list, linewidth=2.0, marker='o')
ax[1, 0].text(day_list[0] - 1.5, Deceased_list[0] + 6e3, '{:.2e}'.format(Deceased_list[0]), color='r', size=13,
weight='bold')
ax[1, 0].text(day_list[-1] - 1.5, Deceased_list[-1] + 6e3, '{:.2e}'.format(Deceased_list[-1]), color='r', size=13,
weight='bold')
ax[1, 0].set_xticks(day_list)
ax[1, 0].set_xlabel('Date')
ax[1, 0].set_ylabel('Deaths')
ax[1, 0].set_title('Global COVID-19 deaths, December 1 to 15', y=1.1, size=13)
# 5. Show the change in the number of people recovered from COVID-19 worldwide
Recovered_list = df.loc['Recovered'].to_list()
ax[1, 1].plot(day_list, Recovered_list, linewidth=2.0, marker='o')
ax[1, 1].text(day_list[0] - 1.5, Recovered_list[0] + 3e5, '{:.2e}'.format(Recovered_list[0]), color='r', size=13,
weight='bold')
ax[1, 1].text(day_list[-1] - 1.5, Recovered_list[-1] + 3e5, '{:.2e}'.format(Recovered_list[-1]), color='r', size=13,
weight='bold')
ax[1, 1].set_xticks(day_list)
ax[1, 1].set_xlabel('Date')
ax[1, 1].set_ylabel('Recovered cases')
ax[1, 1].set_title('Global COVID-19 recoveries, December 1 to 15', y=1.1, size=13)
# 6. Show the change in the number of COVID-19 tests worldwide
Tests_list = df.loc['Tests'].to_list()
ax[1, 2].plot(day_list, Tests_list, linewidth=2.0, marker='o')
ax[1, 2].text(day_list[0] - 1.5, Tests_list[0] + 3e6, '{:.2e}'.format(Tests_list[0]), color='r', size=13,
weight='bold')
ax[1, 2].text(day_list[-1] - 1.5, Tests_list[-1] + 3e6, '{:.2e}'.format(Tests_list[-1]), color='r', size=13,
weight='bold')
ax[1, 2].set_xticks(day_list)
ax[1, 2].set_xlabel('Date')
ax[1, 2].set_ylabel('Tests')
ax[1, 2].set_title('Global COVID-19 tests, December 1 to 15', y=1.1, size=13)
plt.tight_layout()
plt.subplots_adjust()
plt.savefig('imgResult/Overall change trend.png')
plt.show()
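Two small steps in the plotting script are easy to get wrong: building the zero-padded daily file names and turning values such as '0.25%' into floats. The following sketch shows both using only the standard library. The function names `daily_csv_paths` and `parse_percent` and the `prefix` parameter are illustrative assumptions, not part of the original script; the file-name pattern is the one the crawler writes.

```python
from datetime import date, timedelta


def daily_csv_paths(start, days, prefix='csvFile/Covid19Data'):
    """Return one CSV path per day, e.g. csvFile/Covid19Data2020-12-01.csv."""
    # date.isoformat() yields YYYY-MM-DD, matching time.strftime("%Y-%m-%d")
    return [f'{prefix}{(start + timedelta(days=i)).isoformat()}.csv'
            for i in range(days)]


def parse_percent(text):
    """Convert a string such as '0.25%' to the float 0.25."""
    return float(text.strip().rstrip('%'))


paths = daily_csv_paths(date(2020, 12, 1), 15)
```

Using real dates instead of a hard-coded '2020-12-' prefix means the same code would keep working if the 15-day window crossed a month boundary.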
Recovery rate: the countries with the highest ratio of recovered to confirmed cases
# """
# The 10 countries with the highest recovery rate (recovered cases / confirmed cases)
# """
import matplotlib.pyplot as plt
import pandas as pd
plt.rcParams['font.sans-serif'] = ['Microsoft YaHei']  # Font that can render Chinese labels
plt.rcParams['savefig.dpi'] = 300  # Resolution of saved images
plt.rcParams['figure.dpi'] = 300  # Resolution of displayed figures
plt.style.use('Solarize_Light2')
df = pd.read_csv('csvFile/Covid19Data2020-12-15.csv', encoding='utf-8', skiprows=[1], thousands=',', usecols=[0, 1, 12])
df['Recovered rate'] = df['Recovered'] / df['Confirmed']
print(df)
df.sort_values(by='Recovered rate', inplace=True, ascending=False)
# Take the 10 countries with the highest recovery rate
df_res = df[0:10]
df_res = df_res.reset_index(drop=True) # Reset index
print(df_res)
plt.bar(list(range(0, 50, 5)), df_res['Recovered rate'].to_list(), width=2, alpha=0.5, color='orange')
plt.xticks(list(range(0, 50, 5)), labels=df_res['Name'].to_list(), rotation=35)
plt.yticks([0.0, 0.2, 0.4, 0.6, 0.8, 1.0], ['0%', '20%', '40%', '60%', '80%', '100%'])  # The rate is a fraction, so 0.2 is 20%
plt.tick_params(labelsize=9)
for a, b in zip(list(range(0, 50, 5)), df_res['Recovered rate'].to_list()):  # Show the value above each bar
    plt.text(a, b + 0.008, '%.2f%%' % (b * 100), ha='center', va='bottom', fontsize=9, color='black')
plt.title('Top 10 countries by recovery rate')
plt.xlabel("Country")
plt.ylabel("Recovery rate")
plt.tight_layout()
plt.savefig('imgResult/Top 10 countries by recovery rate.png')
plt.show()
df_res.to_csv('csvResult/Top 10 countries by recovery rate.csv', index=False)
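The ranking itself (recovered / confirmed, sorted in descending order, first 10 rows) can be checked on a tiny in-memory CSV without pandas. The sketch below is an illustration only: `to_int` and `top_recovery_rates` are hypothetical names, the sample data is made up, and it assumes, as the script above does with `skiprows=[1]`, that the first data row holds the world totals.

```python
import csv
import io


def to_int(text):
    """Parse counts such as '1,234'; return None for 'NA' or empty cells."""
    text = text.strip().replace(',', '')
    return int(text) if text.isdigit() else None


def top_recovery_rates(csv_text, n=10):
    """Return the n (name, rate) pairs with the highest recovered/confirmed ratio."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = []
    for i, row in enumerate(reader):
        if i == 0:
            continue  # Skip the world-total row
        confirmed = to_int(row['Confirmed'])
        recovered = to_int(row['Recovered'])
        if not confirmed or recovered is None:
            continue  # Skip rows with missing data or zero confirmed cases
        rows.append((row['Name'], recovered / confirmed))
    rows.sort(key=lambda pair: pair[1], reverse=True)
    return rows[:n]


# Made-up sample data, for illustration only
sample = '''Name,Confirmed,Recovered
World,"1,000","600"
A,"200","150"
B,"100","90"
C,"50",NA
D,0,0
'''
result = top_recovery_rates(sample, n=2)
print(result)
```

Rows with missing or zero counts are dropped explicitly here; in the pandas version they become NaN and are sorted to the end by `sort_values`.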