HoloViews -- Tabular Datasets
2022-08-01 08:23:00 【terminal frog】
Original article: HoloViews -- Tabular Datasets
Tabular Datasets
As we have seen, an Element is a simple wrapper around your data that provides a semantically meaningful visual representation. HoloViews can work with many kinds of data, but most of them fall into one of two categories:
- Tabular: data laid out as tables of columns
- Gridded: array-like data
These two kinds of structure are covered in two separate chapters. This article covers tabular data, which includes dictionaries of NumPy arrays, pandas DataFrames, dask DataFrames, and xarray DataArrays and Datasets.
Tabular data
Tabular data (also called columnar data) is one of the most common, general, and versatile data formats, corresponding to how data is laid out in a spreadsheet.
There are many ways to put data into tabular form. For interactive analysis, tidy data provides a simple and flexible approach. In tidy data, the columns of the table represent variables or dimensions, and the rows represent observations. The best way to understand this format is to look at a dataset:
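```python
import numpy as np
import pandas as pd
import holoviews as hv
from holoviews import opts

# Load the bokeh and matplotlib plotting backends
hv.extension('bokeh', 'matplotlib')
```
```python
import os

# Location of the HoloViews example notebooks on the author's machine; adjust to your own installation
path = r"C:\Users\samsung\AppData\Local\Programs\Python\Python38\Lib\site-packages\holoviews\examples\getting_started"
os.chdir(path)

# Read the example dataset, relative to the directory above
diseases = pd.read_csv('../assets/diseases.csv.gz')
diseases.head()
```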

This particular dataset was the subject of an excellent piece of visual journalism in the [Wall Street Journal](http://graphics.wsj.com/infectious-diseases-and-vaccines/#b02g20t20w15). The WSJ piece details the incidence of various diseases over time, and the data can be downloaded from the University of Pittsburgh's Project Tycho. We can see that there are five data columns. Each column holds either an independent variable that specifies a particular measurement ('Year', 'Week', 'State') or an observed, dependent variable that reports what was actually measured (the 'measles' or 'pertussis' incidence).
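To make the tidy layout concrete, here is a minimal pandas sketch. The table and its numbers are made up for illustration (they are not taken from diseases.csv.gz), and the names `wide` and `tidy` are hypothetical. It reshapes a "wide" table with one column per state into the tidy form described above, where every column is a variable and every row is one observation.

```python
import pandas as pd

# Made-up numbers in a "wide" layout: one column per state.
wide = pd.DataFrame({
    "Year": [2000, 2001],
    "StateA": [1.0, 2.0],
    "StateB": [3.0, 4.0],
})

# Tidy layout: 'Year' and 'State' are the independent variables,
# 'measles' is the measured value, and each row is one observation.
tidy = wide.melt(id_vars="Year", var_name="State", value_name="measles")
tidy = tidy.sort_values(["Year", "State"]).reset_index(drop=True)
print(tidy)
```

The tidy form has four rows (two years times two states), which is the shape a Dataset with key dimensions 'Year' and 'State' expects.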
Knowing the distinction between these two kinds of variables is crucial for visualization, but the table itself does not record it. Plotting 'Week' against 'State' would clearly be meaningless, whereas the measles rate for each state (averaged over the other dimensions) would be fine, and there is no way to deduce those constraints from the tabular format. So we first create a HoloViews Dataset object, which declares the independent variables (the kdims, or key dimensions) and the dependent variables (the vdims, or value dimensions):
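```python
# (name, label) tuples give each value dimension a descriptive label
vdims = [('measles', 'Measles Incidence'), ('pertussis', 'Pertussis Incidence')]
# Key dimensions: Year and State; 'Week' is deliberately left undeclared
ds = hv.Dataset(diseases, ['Year', 'State'], vdims)
```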
Here we used the optional tuple-based syntax **(name, label)** to give the vdims more descriptive labels, while the kdims keep their default names. Note that we did not declare what to do with 'Week', because here we are only interested in 'Year' and 'State', so the next step averages over it:
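```python
# Average the value dimensions over everything that is not a key dimension (here: Week)
ds = ds.aggregate(function=np.mean)
ds
```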

The repr shows us the kdims (in square brackets) and the vdims (in parentheses) of the Dataset. Because a Dataset can hold any combination of dimensions, it is not immediately visualizable: there is no single clear mapping from these four dimensions onto a two-dimensional page, so only the textual representation is shown above.
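For readers who think in pandas, the averaging step above can be sketched as a plain group-by. This is only an illustration with made-up weekly records, not what HoloViews runs internally; the names `weekly`, `kdims`, `vdims` and `yearly` are hypothetical.

```python
import pandas as pd

# Made-up weekly records with the same column layout as the article's table.
weekly = pd.DataFrame({
    "Year":      [2000, 2000, 2000, 2000, 2001, 2001],
    "Week":      [1, 2, 1, 2, 1, 2],
    "State":     ["StateA", "StateA", "StateB", "StateB", "StateA", "StateA"],
    "measles":   [1.0, 3.0, 10.0, 20.0, 5.0, 7.0],
    "pertussis": [0.0, 2.0, 4.0, 4.0, 1.0, 1.0],
})

kdims = ["Year", "State"]          # independent variables we keep
vdims = ["measles", "pertussis"]   # measured values we average

# Group by the key dimensions and average the value dimensions;
# 'Week' is neither grouped on nor selected, so it drops out.
yearly = weekly.groupby(kdims, as_index=False)[vdims].mean()
print(yearly)
```

The result has one row per (Year, State) pair, which is why the aggregated data is so much smaller than the weekly table.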
To make this data visualizable, we need to provide a bit more metadata by choosing one of the large library of Elements that can help answer the questions we want to ask about the data. Perhaps the most obvious representation of this dataset is a Curve showing the incidence in each state for every year. We could pull out the columns one by one from the original dataset, but now that we have declared information about the dimensions, the cleanest approach is to map the dimensions of the Dataset onto the dimensions of an Element using .to:
```python
# One Curve per disease with Year on the x-axis, stacked in a single column
layout = (ds.to(hv.Curve, 'Year', 'measles') + ds.to(hv.Curve, 'Year', 'pertussis')).cols(1)
layout.opts(
    opts.Curve(width=600, height=250, framewise=True))
```

Here we specified two Curve elements showing the measles and pertussis incidence respectively (the vdims) per year (the kdim), and laid them out in a vertical column. You will notice that even though we specified only the short names of the value dimensions, the plots show the longer labels ("Measles Incidence", "Pertussis Incidence") that we declared on the Dataset.
You will also notice that we automatically received a dropdown menu for selecting which 'State' to view. Each Curve ignores unused value dimensions, because additional measurements do not affect each other, but HoloViews has to do something with every key dimension for every such plot. If 'State' (or any other key dimension) is not plotted or aggregated over in some way, HoloViews has to let the user choose a value for it, hence the selection widget. Other options for handling extra dimensions, or for working with just a subset of the data, are shown below.
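One way to picture what the dropdown is switching between is a dictionary with one small (Year, measles) table per state. The pandas sketch below uses made-up numbers and hypothetical names (`yearly`, `curves`); it mimics the grouping idea only, not the HoloViews data structures.

```python
import pandas as pd

# Made-up yearly averages for two hypothetical states.
yearly = pd.DataFrame({
    "Year":    [2000, 2001, 2000, 2001],
    "State":   ["StateA", "StateA", "StateB", "StateB"],
    "measles": [2.0, 6.0, 15.0, 9.0],
})

# 'State' is the key dimension that was not mapped onto the curve,
# so the data is split on it: one (Year, measles) table per state.
curves = {
    state: group[["Year", "measles"]].reset_index(drop=True)
    for state, group in yearly.groupby("State")
}

for state, table in curves.items():
    print(state)
    print(table)
```

Picking an entry in the widget corresponds to picking one key of this dictionary.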
Selecting
Something we often want to do is select only a subset of the data. The select method makes this very simple, letting us select a single value, a list of values, or a range given as a tuple. Here we use select to pick the measles cases in four states. After applying the select, we use the .to method to display the result as a bar chart, with 'Year' and 'State' as the kdims and 'Measles Incidence' as the vdim:
```python
states = ['New York', 'New Jersey', 'California', 'Texas']
# A list selects specific values; a tuple selects a range
bars = ds.select(State=states, Year=(1980, 1990)).to(hv.Bars, ['Year', 'State'], 'measles').sort()
bars.opts(
    opts.Bars(width=800, height=400, tools=['hover'], xrotation=90, show_legend=False))
```
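The selection above can be read as two filters combined: a membership test for the list and a range test for the tuple. Here is a rough pandas sketch with made-up rows. It assumes the range includes the lower bound and excludes the upper bound; that is my assumption for the sketch, so check the HoloViews documentation for the exact rule.

```python
import pandas as pd

# Made-up rows; only the column names follow the article.
yearly = pd.DataFrame({
    "Year":    [1979, 1980, 1989, 1990, 1985],
    "State":   ["Texas", "Texas", "New York", "New York", "Ohio"],
    "measles": [1.0, 2.0, 3.0, 4.0, 5.0],
})

states = ["New York", "New Jersey", "California", "Texas"]
start, stop = 1980, 1990

# List of values -> membership test.
in_states = yearly["State"].isin(states)
# Tuple -> range test (assumed here: start inclusive, stop exclusive).
in_years = (yearly["Year"] >= start) & (yearly["Year"] < stop)

subset = yearly[in_states & in_years]
print(subset)
```

Only the 1980 Texas row and the 1989 New York row survive both filters in this toy table.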

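Editor's note:
To be honest, I could not produce this effect by writing Bokeh directly. Having got this far, though, it feels like there is relatively little you can control here, and writing Bokeh directly feels more dependable. Still, I hope to use HoloViews to tame plotting for large datasets, so I will keep learning it from the beginning.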
Faceting
Above we saw what happens to key dimensions that we did not explicitly assign to an Element using .to: they are grouped over, popping up a set of widgets so the user can choose which values to show at any one time. However, widgets are not always the most effective way to view the data, and a Dataset lets you specify alternatives using the .overlay, .grid and .layout methods. For example, we can lay out each state separately using .grid:
```python
grouped = ds.select(State=states, Year=(1930, 2005)).to(hv.Curve, 'Year', 'measles')
# Show one panel per state instead of a dropdown
gridspace = grouped.grid('State')
gridspace.opts(
    opts.Curve(width=200, tools=['hover'], color='indianred'))
```
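On the data side, faceting by state means every state gets its own series over the same 'Year' axis. A loose pandas analogy, with made-up numbers and the hypothetical names `yearly` and `panels`, is a pivot that puts one column per state side by side. This is not how .grid is implemented, only a way to picture the result.

```python
import pandas as pd

# Made-up yearly averages for two hypothetical states.
yearly = pd.DataFrame({
    "Year":    [2000, 2001, 2000, 2001],
    "State":   ["StateA", "StateA", "StateB", "StateB"],
    "measles": [2.0, 6.0, 15.0, 9.0],
})

# One column per state, all sharing the Year index:
# each column corresponds to the data behind one panel of the grid.
panels = yearly.pivot(index="Year", columns="State", values="measles")
print(panels)
```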

Besides selecting subsets of the data, another common operation supported by HoloViews is computing aggregates. When we first loaded this dataset, we aggregated over the 'Week' column to compute the mean annual incidence, which greatly reduced the size of the data. The aggregate method is therefore very useful for computing statistics from our data.
A simple example using our dataset is to compute the mean and standard deviation of the measles incidence by 'Year'. We express this by passing the key dimensions to aggregate over (in this case just 'Year'), along with a function and an optional spreadfn that compute the statistics we want. The name of the spreadfn function is appended to the dimension name (here giving 'measles_std'), so that the computed value can be referenced separately. Once the aggregate has been computed, we can easily cast it to a Curve and ErrorBars:
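```python
# Mean per year, plus the standard deviation as the spread (stored as 'measles_std')
agg = ds.aggregate('Year', function=np.mean, spreadfn=np.std)
# Draw an error bar for every second year only
errorbars = hv.ErrorBars(agg, vdims=['measles', 'measles_std']).iloc[::2]
overlay = (hv.Curve(agg) * errorbars).redim.range(measles=(0, None))
overlay.opts(width=600)
```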

In this way we can summarize a multi-dimensional dataset as something that can be visualized directly, while still being able to compute arbitrary statistics along a dimension.
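The mean-plus-spread aggregation can also be sketched in pandas, which shows where a column such as 'measles_std' comes from. The numbers are made up and the names `yearly` and `agg` are hypothetical. The sketch uses the population standard deviation (NumPy's default, ddof=0); I am not claiming HoloViews produces exactly this variant for every data backend.

```python
import numpy as np
import pandas as pd

# Made-up yearly averages for two hypothetical states.
yearly = pd.DataFrame({
    "Year":    [2000, 2000, 2001, 2001],
    "State":   ["StateA", "StateB", "StateA", "StateB"],
    "measles": [2.0, 4.0, 6.0, 10.0],
})

# Aggregate across states for each year: the mean goes in 'measles',
# the spread goes in a second column named after the spread function.
agg = (
    yearly.groupby("Year")
    .agg(
        measles=("measles", "mean"),
        measles_std=("measles", lambda s: float(np.std(s.to_numpy()))),
    )
    .reset_index()
)
print(agg)
```

With a mean column and a spread column per year, the table has the shape needed for a curve with error bars.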
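Finally
Experts and learners of every level are welcome to join in and explore together. I have started a new group for discussing Python visualization, focused mainly on Bokeh and HoloViews.
Search for the group "python bokeh" on QQ (group number 859297380) to chat and share at any time.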