
Epidemic Data Analysis Platform Work Report [2]: API Interfaces

2022-06-12 04:09:00 【m0_55675803】

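API requests

Endpoint: /nCoV/api/overall
Method: GET
Returns the virus research status and the nationwide epidemic overview from the time the crawler started running (4:00 PM on January 24, 2020) to the present. The response can be either the most recently published data or time-series data.

Parameter	Description
latest	1: return the latest data (default); 0: return time-series data (deprecated)

Response fields

Field	Description
generalRemark	Overview of nationwide epidemic information
remarkX	Remark text, where X ranges from 1 to 5
note1	Virus name
note2	Source of infection
note3	Transmission route
currentConfirmedCount(Incr)	Current confirmed cases (increase over the previous day), equal to confirmedCount(Incr) - curedCount(Incr) - deadCount(Incr)
confirmedCount(Incr)	Cumulative confirmed cases (increase over the previous day)
suspectedCount(Incr)	Suspected cases (increase over the previous day)
curedCount(Incr)	Cured cases (increase over the previous day)
deadCount(Incr)	Deaths (increase over the previous day)
seriousCount(Incr)	Severe cases (increase over the previous day)
updateTime	Time the data last changed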
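
As a minimal sketch of how the endpoint and the currentConfirmedCount formula above fit together, the snippet below builds a request URL and recomputes the current confirmed count from a record. The host in BASE_URL is a placeholder (the endpoint list gives only paths, not a host), and the numbers in `sample` are made up purely to illustrate the arithmetic.

```python
from urllib.parse import urlencode

# Placeholder host (assumption): only endpoint paths are documented above.
BASE_URL = "https://example.invalid"


def build_url(path, **params):
    """Join the base URL, an endpoint path, and optional query parameters."""
    query = urlencode(params)
    return f"{BASE_URL}{path}?{query}" if query else f"{BASE_URL}{path}"


def current_confirmed(record):
    """Recompute current confirmed cases: confirmed - cured - dead."""
    return record["confirmedCount"] - record["curedCount"] - record["deadCount"]


# Made-up values, not real data.
sample = {"confirmedCount": 1000, "curedCount": 300, "deadCount": 50}

print(build_url("/nCoV/api/overall", latest=1))
print(current_confirmed(sample))
```

Comparing the recomputed value with the currentConfirmedCount field of a real response is one simple way to sanity-check the data.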

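Endpoint: /nCoV/api/provinceName
Method: GET
Returns the list of countries, provinces, regions, and municipalities that have entries in the database.

Parameter	Description
lang	Language of the returned data. Chinese: zh (default); English: en

Examples

/nCoV/api/provinceName
Returns the list of countries, provinces, regions, or municipalities in Chinese.
/nCoV/api/provinceName?lang=en
Returns the list of countries, provinces, regions, or municipalities in English.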

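Endpoint: /nCoV/api/area
Method: GET
Returns time-series data (down to city level) of all epidemic information changes for every province, region, or municipality in China and for other countries worldwide, from 3:00 AM on January 22, 2020 (when the crawler started running) to the present. The time series of confirmed, suspected, cured, and death counts can be traced from it.
Note: between 3:00 AM on January 22, 2020 and 3:40 AM on January 24, 2020 only province-level data exists; DXY (丁香园) began compiling and publishing city-level data on January 24, 2020.

Parameter	Description
latest	1: return the latest data; 0: return time-series data
province	Chinese name of the country, province, region, or municipality, e.g. 美国, 湖北省, 香港, 北京市. The full list of names is available from /nCoV/api/provinceName?lang=zh.
provinceEng	English name of the country, province, region, or municipality, e.g. Hubei, Beijing. The full list of names is available from /nCoV/api/provinceName?lang=en. Names are case-sensitive and must match those returned by /nCoV/api/provinceName?lang=en.

Response fields

Field	Description
locationId	City ID. For cities in mainland China this is the postal code; the numbering rule for cities outside mainland China is not yet known.
continent(English)Name	Continent name (English)
country(English)Name	Country name (English)
province(English)Name	Full name of the province, region, or municipality (English)
provinceShortName	Short name of the province, region, or municipality
currentConfirmedCount	Current confirmed cases, equal to confirmedCount - curedCount - deadCount
confirmedCount	Cumulative confirmed cases
suspectedCount	Suspected cases
curedCount	Cured cases
deadCount	Deaths
comment	Other information
cities	Data for subordinate cities
updateTime	Data update time

Examples

/nCoV/api/area?latest=1&province=湖北省
Returns the latest epidemic data for Hubei Province.
/nCoV/api/area?latest=0&province=湖北省
Returns the time-series epidemic data for Hubei Province.
/nCoV/api/area?latest=1
Returns the latest epidemic data for all cities in China and for other countries worldwide.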
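
Because the province parameter takes Chinese names, the query string needs percent-encoding when it is built by hand. The sketch below uses only the standard library; `area_query` is a hypothetical helper name, not part of the API.

```python
from urllib.parse import urlencode


def area_query(province=None, latest=1):
    """Build the path and query string for /nCoV/api/area."""
    params = {"latest": latest}
    if province is not None:
        params["province"] = province
    # urlencode percent-encodes non-ASCII values as UTF-8 by default.
    return "/nCoV/api/area?" + urlencode(params)


print(area_query("湖北省"))
# /nCoV/api/area?latest=1&province=%E6%B9%96%E5%8C%97%E7%9C%81
print(area_query())
# /nCoV/api/area?latest=1
```

HTTP libraries such as requests perform the same encoding automatically when the parameters are passed as a dict.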

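Endpoint: /nCoV/api/news
Method: GET
Returns all epidemic-related news items, including the data source and a link to it.
Items are sorted by publication time, newest first.

Parameter	Description
page	Page number of the news to return. Default: page 1.
num	Number of news items per page. Default: 10.

Response fields

Field	Description
pubDate	News publication time
title	News title
summary	Summary of the news content
infoSource	Data source
sourceUrl	Source link
province	Name of the province or municipality
provinceId	Code of the province or municipality

Examples

/nCoV/api/news?page=1&num=10
Returns page 1 of the news across all regions, 10 items per page.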

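Endpoint: /nCoV/api/rumors
Method: GET
Returns epidemic-related rumors together with DXY's rebuttals.
Items are sorted by publication time, newest first.

Parameter	Description
rumorType	0: return rumors (default); 1: return trustworthy information; 2: return unverified information
page	Page number of the rumors to return. Default: page 1.
num	Number of rumors per page. Default: 10.

Response fields

Field	Description
id	Rumor ID
title	Rumor title
mainSummary	Summary of the rebuttal
body	Full text of the rebuttal
sourceUrl	Source link

Examples

/nCoV/api/rumors?page=2&num=10&rumorType=1
Returns page 2 of the trustworthy information, 10 items per page, i.e. items 11 to 20 of all trustworthy information.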
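
The mapping from page and num to an item range (page 2 with 10 items per page covers items 11 to 20) can be sketched as follows. `page_bounds` is a hypothetical helper for illustration and assumes pages are numbered from 1, as the parameter defaults suggest.

```python
def page_bounds(page, num=10):
    """Return the 1-based (first, last) item positions covered by a page."""
    if page < 1 or num < 1:
        raise ValueError("page and num must be positive integers")
    first = (page - 1) * num + 1
    last = page * num
    return first, last


print(page_bounds(2, 10))  # (11, 20)
print(page_bounds(1))      # (1, 10)
```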

COVID-19 datasets hosted by Microsoft

# JSON schema of full text documents

{
    "paper_id": <str>,                      # 40-character sha1 of the PDF
    "metadata": {
        "title": <str>,
        "authors": [                        # list of author dicts, in order
            {
                "first": <str>,
                "middle": <list of str>,
                "last": <str>,
                "suffix": <str>,
                "affiliation": <dict>,
                "email": <str>
            },
            ...
        ],
        "abstract": [                       # list of paragraphs in the abstract
            {
                "text": <str>,
                "cite_spans": [             # list of character indices of inline citations
                                            # e.g. citation "[7]" occurs at positions 151-154 in "text"
                                            # linked to bibliography entry BIBREF3
                    {
                        "start": 151,
                        "end": 154,
                        "text": "[7]",
                        "ref_id": "BIBREF3"
                    },
                    ...
                ],
                "ref_spans": <list of dicts similar to cite_spans>,     # e.g. inline reference to "Table 1"
                "section": "Abstract"
            },
            ...
        ],
        "body_text": [                      # list of paragraphs in full body
                                            # paragraph dicts look the same as above
            {
                "text": <str>,
                "cite_spans": [],
                "ref_spans": [],
                "eq_spans": [],
                "section": "Introduction"
            },
            ...
            {
                ...,
                "section": "Conclusion"
            }
        ],
        "bib_entries": {
            "BIBREF0": {
                "ref_id": <str>,
                "title": <str>,
                "authors": <list of dict>,      # same structure as earlier,
                                                # but without `affiliation` or `email`
                "year": <int>,
                "venue": <str>,
                "volume": <str>,
                "issn": <str>,
                "pages": <str>,
                "other_ids": {
                    "DOI": [
                        <str>
                    ]
                }
            },
            "BIBREF1": {},
            ...
            "BIBREF25": {}
        },
        "ref_entries": {
            "FIGREF0": {
                "text": <str>,                  # figure caption text
                "type": "figure"
            },
            ...
            "TABREF13": {
                "text": <str>,                  # table caption text
                "type": "table"
            }
        },
        "back_matter": <list of dict>           # same structure as body_text
    }
}
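
To illustrate how the paragraph dicts in this schema might be consumed, the sketch below groups paragraph text by section and resolves cite_spans against bib_entries. It works on a list of paragraph dicts, so it does not depend on where body_text sits in a given file. The sample paragraphs and bibliography entry are invented for the example, and the function names are hypothetical.

```python
def text_by_section(paragraphs):
    """Join paragraph texts per section, keeping first-seen section order."""
    sections = {}
    for para in paragraphs:
        sections.setdefault(para.get("section", ""), []).append(para["text"])
    return {name: "\n".join(texts) for name, texts in sections.items()}


def resolve_citations(paragraph, bib_entries):
    """Return (cited text, bibliography title) for each cite span."""
    resolved = []
    for span in paragraph.get("cite_spans", []):
        # "start" is inclusive and "end" is exclusive, matching the
        # schema example where "[7]" occupies positions 151-154.
        cited = paragraph["text"][span["start"]:span["end"]]
        entry = bib_entries.get(span.get("ref_id"), {})
        resolved.append((cited, entry.get("title")))
    return resolved


# Invented sample data that follows the schema above.
text = "Coronaviruses are enveloped RNA viruses [7]."
start = text.index("[7]")
sample_paragraphs = [
    {
        "text": text,
        "cite_spans": [
            {"start": start, "end": start + 3, "text": "[7]", "ref_id": "BIBREF3"}
        ],
        "ref_spans": [],
        "section": "Introduction",
    },
    {
        "text": "Further work is needed.",
        "cite_spans": [],
        "ref_spans": [],
        "section": "Conclusion",
    },
]
sample_bib = {"BIBREF3": {"ref_id": "BIBREF3", "title": "A hypothetical cited paper"}}

print(text_by_section(sample_paragraphs))
print(resolve_citations(sample_paragraphs[0], sample_bib))
```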

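The following code connects to the Azure storage account that hosts the dataset. It uses BlockBlobService, which belongs to the legacy azure-storage-blob SDK (version 2.x).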

from azure.storage.blob import BlockBlobService

# storage account details
azure_storage_account_name = "azureopendatastorage"
azure_storage_sas_token = "sv=2019-02-02&ss=bfqt&srt=sco&sp=rlcup&se=2025-04-14T00:21:16Z&st=2020-04-13T16:21:16Z&spr=https&sig=JgwLYbdGruHxRYTpr5dxfJqobKbhGap8WUtKFadcivQ%3D"

# create a blob service
blob_service = BlockBlobService(
    account_name=azure_storage_account_name,
    sas_token=azure_storage_sas_token,
)

The CORD-19 data is stored in the covid19temp container. The file structure of the container is shown below, together with sample files.

metadata.csv
custom_license/
    pdf_json/
        0001418189999fea7f7cbe3e82703d71c85a6fe5.json        # filename is sha-hash
        ...
    pmc_json/
        PMC1065028.xml.json                                  # filename is the PMC ID
        ...
noncomm_use_subset/
    pdf_json/
        0036b28fddf7e93da0970303672934ea2f9944e7.json
        ...
    pmc_json/
        PMC1616946.xml.json
        ...
comm_use_subset/
    pdf_json/
        000b7d1517ceebb34e1e3e817695b6de03e2fa78.json
        ...
    pmc_json/
        PMC1054884.xml.json
        ...
biorxiv_medrxiv/                                             # note: there is no pmc_json subdir
    pdf_json/
        0015023cc06b5362d332b3baf348d11567ca2fbb.json
        ...

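Each .json file corresponds to one article in the dataset. The title, authors, abstract, and (where available) the full text are stored there. The dataset also ships with a metadata.csv file that records basic information about each article.
The code below reads that file and the relevant columns.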

import pandas as pd

# container housing CORD-19 data
container_name = "covid19temp"

# download metadata.csv
metadata_filename = 'metadata.csv'
blob_service.get_blob_to_path(
    container_name=container_name,
    blob_name=metadata_filename,
    file_path=metadata_filename
)

# read metadata.csv into a DataFrame; the code below relies on `metadata`
metadata = pd.read_csv(metadata_filename)

# columns shown in the preview
simple_schema = ['cord_uid', 'source_x', 'title', 'abstract', 'authors', 'full_text_file', 'url']

def make_clickable(address):
    '''Make the url clickable'''
    return '<a href="{0}">{0}</a>'.format(address)

def preview(text):
    '''Show only a preview of the text data.'''
    return text[:30] + '...'

format_ = {
    'title': preview, 'abstract': preview, 'authors': preview, 'url': make_clickable}

metadata[simple_schema].head().style.format(format_)

num_entries = len(metadata)
print("There are {} many entries in this dataset:".format(num_entries))

# keep only the rows that reference a full-text file
metadata_with_text = metadata[metadata['full_text_file'].notna()]
with_full_text = len(metadata_with_text)
print("-- {} have full text entries".format(with_full_text))

with_doi = metadata['doi'].count()
print("-- {} have DOIs".format(with_doi))

with_pmcid = metadata['pmcid'].count()
print("-- {} have PubMed Central (PMC) ids".format(with_pmcid))

with_microsoft_id = metadata['Microsoft Academic Paper ID'].count()
print("-- {} have Microsoft Academic paper ids".format(with_microsoft_id))
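
Once the metadata is loaded, the blob name of a full-text file can be assembled from the container layout listed earlier. The sketch below assumes that the full_text_file column holds the subset folder name (for example custom_license) and that the SHA hash is available from the metadata; both are assumptions, and `blob_name` is a hypothetical helper.

```python
def blob_name(subset, sha=None, pmcid=None):
    """Build a blob path that follows the container layout listed earlier."""
    if sha:
        # PDF parses are named after the SHA hash of the PDF.
        return f"{subset}/pdf_json/{sha}.json"
    if pmcid:
        # PMC parses are named after the PMC ID.
        return f"{subset}/pmc_json/{pmcid}.xml.json"
    raise ValueError("either sha or pmcid is required")


print(blob_name("custom_license", sha="0001418189999fea7f7cbe3e82703d71c85a6fe5"))
print(blob_name("comm_use_subset", pmcid="PMC1054884"))
```

The resulting string could then be passed as blob_name to the same download call used for metadata.csv.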

Bing COVID-19 dataset from trusted, reliable sources

The curated dataset is available in CSV, JSON, JSON-Lines, and Parquet formats.
All four are listed below:
https://pandemicdatalake.blob.core.windows.net/public/curated/covid-19/bing_covid-19_data/latest/bing_covid-19_data.csv
https://pandemicdatalake.blob.core.windows.net/public/curated/covid-19/bing_covid-19_data/latest/bing_covid-19_data.json
https://pandemicdatalake.blob.core.windows.net/public/curated/covid-19/bing_covid-19_data/latest/bing_covid-19_data.jsonl
https://pandemicdatalake.blob.core.windows.net/public/curated/covid-19/bing_covid-19_data/latest/bing_covid-19_data.parquet


Loading and validating the dataset

import pandas as pd
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt

df = pd.read_parquet("https://pandemicdatalake.blob.core.windows.net/public/curated/covid-19/bing_covid-19_data/latest/bing_covid-19_data.parquet")
df.head(10)

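# keep only the worldwide aggregate rows
df_Worldwide = df[df['country_region'] == 'Worldwide']
# average the numeric metrics per (country_region, updated) pair
value_columns = ['confirmed', 'deaths', 'confirmed_change', 'deaths_change']
df_Worldwide_pivot = df_Worldwide.pivot_table(values=value_columns, index=['country_region', 'updated'])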

df_Worldwide_pivot

df_Worldwide.plot(kind='line',x='updated',y="confirmed",grid=True)
df_Worldwide.plot(kind='line',x='updated',y="deaths",grid=True)
df_Worldwide.plot(kind='line',x='updated',y="confirmed_change",grid=True)
df_Worldwide.plot(kind='line',x='updated',y="deaths_change",grid=True)

Data sources provided by Our World in Data

Metrics	Source	Updated	Countries
Vaccinations	Official data collated by the Our World in Data team	Daily	218
Tests & positivity	Official data collated by the Our World in Data team	Weekly	193
Hospital & ICU	Official data collated by the Our World in Data team	Daily	47
Confirmed cases	JHU CSSE COVID-19 Data	Daily	217
Confirmed deaths	JHU CSSE COVID-19 Data	Daily	217
Reproduction rate	Arroyo-Marioli F, Bullano F, Kucinskas S, Rondón-Moreno C	Daily	192
Policy responses	Oxford COVID-19 Government Response Tracker	Daily	187
Other variables of interest	International organizations (UN, World Bank, OECD, IHME…)	Fixed	241

Variable	Description
total_cases	Total confirmed cases of COVID-19. Counts can include probable cases, where reported.
new_cases	New confirmed cases of COVID-19. Counts can include probable cases, where reported. In rare cases where our source reports a negative daily change due to a data correction, we set this metric to NA.
new_cases_smoothed	New confirmed cases of COVID-19 (7-day smoothed). Counts can include probable cases, where reported.
total_cases_per_million	Total confirmed cases of COVID-19 per 1,000,000 people. Counts can include probable cases, where reported.
new_cases_per_million	New confirmed cases of COVID-19 per 1,000,000 people. Counts can include probable cases, where reported.
new_cases_smoothed_per_million	New confirmed cases of COVID-19 (7-day smoothed) per 1,000,000 people. Counts can include probable cases, where reported.
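
The derived variables above can be approximated from a series of cumulative totals. The sketch below is a plain-Python illustration, not Our World in Data's actual pipeline: it assumes a trailing 7-day window for smoothing and leaves the first day's change undefined, and the function names are hypothetical.

```python
def daily_new(totals):
    """Day-over-day change; negative changes (data corrections) become None."""
    result = [None]  # no previous day to compare against
    for prev, cur in zip(totals, totals[1:]):
        diff = cur - prev
        result.append(diff if diff >= 0 else None)
    return result


def smoothed(values, window=7):
    """Trailing rolling mean; None until a full window without gaps exists."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        if len(chunk) < window or any(v is None for v in chunk):
            out.append(None)
        else:
            out.append(sum(chunk) / window)
    return out


def per_million(count, population):
    """Scale a count to a rate per 1,000,000 people."""
    return count * 1_000_000 / population


print(daily_new([10, 15, 12, 20]))           # [None, 5, None, 8]
print(smoothed([1, 2, 3, 4, 5, 6, 7, 8]))    # six None values, then 4.0, 5.0
print(per_million(500, 2_000_000))           # 250.0
```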

import requests

url = 'https://api.tianapi.com/ncovabroad/index'

# International epidemic news, epidemic overview, risk areas

# Name Type Example Description
# modifyTime int 1584159933000 time the data was modified
# continents string 欧洲 (Europe) continent
# provinceName string 意大利 (Italy) region name
# currentConfirmedCount int 14955 current confirmed cases
# confirmedCount int 17660 cumulative confirmed cases
# curedCount int 1439 cured cases
# deadCount int 1266 deaths
# locationId int 965008 location ID
# countryShortCode string ITA country code

query_params = {
    "key": 'd334721cf6eba2d619a5855420ec352c'}

res = requests.get(url, params=query_params)
res_dict = res.json()
print(res_dict)
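
Two details of the field table above can be checked with a few lines of code. The sketch assumes that modifyTime is a Unix timestamp in milliseconds (its 13-digit example value suggests this, but the source does not say so), and it reuses the example values from the table to confirm that current confirmed cases equal confirmed minus cured minus deaths. The function names are hypothetical.

```python
from datetime import datetime, timezone


def modify_time_to_datetime(modify_time_ms):
    """Convert a millisecond Unix timestamp to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(modify_time_ms / 1000, tz=timezone.utc)


def current_confirmed(confirmed, cured, dead):
    """Current confirmed cases: cumulative confirmed minus cured minus deaths."""
    return confirmed - cured - dead


print(modify_time_to_datetime(1584159933000).isoformat())
# 2020-03-14T04:25:33+00:00
print(current_confirmed(17660, 1439, 1266))
# 14955
```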

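import requests

url = 'http://api.tianapi.com/ncov/index'

# Domestic epidemic news, epidemic overview, risk areas

# Name Type Example Description
# news object (news object) list of epidemic news updates
# desc object (epidemic overview object) detailed global epidemic data
# riskarea object (risk area object) risk areas nationwide: high = high risk, mid = medium risk
# currentConfirmedCount int 55881 current confirmed cases
# confirmedCount int 74679 cumulative confirmed cases
# suspectedCount int 2053 cumulative imported cases
# curedCount int 16676 cumulative cured cases
# deadCount int 2122 cumulative deaths
# seriousCount int 306 current asymptomatic cases
# suspectedIncr int 8 new imported cases
# currentConfirmedIncr int -2002 change in current confirmed cases since yesterday
# confirmedIncr int 403 change in cumulative confirmed cases since yesterday
# curedIncr int 2289 new cured cases since yesterday
# deadIncr int 116 new deaths since yesterday
# seriousIncr int 4 change in current asymptomatic cases since yesterday


query_params = {
    "key": 'd334721cf6eba2d619a5855420ec352c'}

res = requests.get(url, params=query_params)
res_dict = res.json()
print(res_dict)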

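import requests

url = 'https://api.muxiaoguo.cn/api/epidemic'

# MXG API
# Warning: requests to this endpoint time out easily
# Query parameter "type" accepts one of:
# [macroscopically (high-risk areas), epidemicInfectionData (epidemic data), epidemicHotspot (epidemic hotspots)]

query_params = {
    "type": 'macroscopically'}

# set a timeout so a stalled request fails instead of hanging indefinitely
res = requests.get(url, params=query_params, timeout=10)
res_dict = res.json()
print(res_dict)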
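
Since this endpoint is flagged as timing out easily, one option is to wrap the call in a small retry helper. The sketch below is a generic standard-library wrapper, not part of any API shown here; `with_retries` and its parameters are hypothetical names, and a simulated flaky function stands in for the network call.

```python
import time


def with_retries(fn, attempts=3, delay=1.0, retry_on=(Exception,)):
    """Call fn() up to `attempts` times, sleeping longer after each failure."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except retry_on as exc:
            last_error = exc
            if attempt < attempts:
                time.sleep(delay * attempt)  # linear backoff between tries
    raise last_error


# Simulated flaky call: fails twice, then succeeds.
calls = {"n": 0}


def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("simulated timeout")
    return {"ok": True}


print(with_retries(flaky, attempts=3, delay=0, retry_on=(TimeoutError,)))
```

With requests, the wrapped call could be a lambda around requests.get(...) with a timeout argument, passing requests.exceptions.Timeout in retry_on.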

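import requests
url = 'https://view.inews.qq.com/g2/getOnsInfo?name=disease_other'
# Domestic historical data
# Warning: requests to this endpoint time out easily
# set a timeout so a stalled request fails instead of hanging indefinitely
res = requests.get(url, timeout=10)
res_dict = res.json()
print(res_dict)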
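
The source does not show what this endpoint returns. Some JSON APIs nest the actual payload as a JSON-encoded string under a data key; assuming that may be the case here (an unverified assumption), a defensive helper like the hypothetical `unwrap_data` below handles both a string and an already-parsed object.

```python
import json


def unwrap_data(payload):
    """Return payload['data'], decoding it first if it is a JSON string."""
    data = payload.get("data")
    if isinstance(data, str):
        return json.loads(data)
    return data


# Invented payloads illustrating both shapes.
nested_as_string = {"ret": 0, "data": json.dumps({"days": [1, 2]})}
already_parsed = {"ret": 0, "data": {"days": [1, 2]}}

print(unwrap_data(nested_as_string))
print(unwrap_data(already_parsed))
```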


Copyright notice
This article was written by [m0_55675803]. Please include the original link when reposting. Thank you.
https://blog.csdn.net/m0_55675803/article/details/125021902
