About Simple Data Visualization
2022-06-26 00:07:00 【Dreamer DBA】
After completing this tutorial, you will know:
- How to chart time series data with line plots and categorical quantities with bar charts.
- How to summarize data distributions with histograms and box plots.
- How to summarize the relationship between variables with scatter plots.
1.1 Tutorial Overview
- Data Visualization
- Introduction to Matplotlib
- Line Plot
- Bar Chart
- Histogram Plot
- Box and Whisker Plot
- Scatter Plot
1.2 Data Visualization
Data visualization is an important skill in applied statistics and machine learning. Statistics focuses on quantitative descriptions and estimations of data; data visualization provides an important suite of tools for gaining a qualitative understanding. With a little domain knowledge, data visualizations can express and demonstrate key relationships in plots and charts that are more visceral to you and your stakeholders than measures of association or significance.
There are five key plots that you need to know well for basic data visualization: the Line Plot, Bar Chart, Histogram Plot, Box and Whisker Plot, and Scatter Plot. With a knowledge of these plots, you can quickly get a qualitative understanding of most data that you come across.
1.3 Introduction to Matplotlib
The Matplotlib library is the foundation for many other plotting libraries and for the plotting support in higher-level libraries such as Pandas. Matplotlib provides a context in which one or more plots can be drawn before the image is shown or saved to a file. The context is accessed via functions in pyplot and can be imported as follows:
# import matplotlib context
from matplotlib import pyplot
There is a common convention to import this context and name it plt:
# import matplotlib context
import matplotlib.pyplot as plt
The examples in this tutorial keep the name pyplot. Charts and plots are made by calling functions on the context:
# create a plot
pyplot.plot(...)
The drawings on the context can be shown in a new window by calling the show() function:
# display the plot
pyplot.show()
Alternatively, the drawings on the context can be saved to a file, such as a PNG image file. If you want both, call savefig() before show(), because the figure is usually discarded once the window is closed. The savefig() function can be used to save images:
# save plot to file
pyplot.savefig('my_image.png')
1.4 Line Plot
A line plot is generally used to present observations collected at regular intervals. The x-axis represents the regular interval, such as time. The y-axis shows the observations, ordered by the x-axis and connected by a line. A line plot can be created by calling the plot() function and passing the x-axis data for the regular interval and the y-axis data for the observations.
# create line plot
pyplot.plot(x, y)
Line plots are useful for presenting time series data as well as any sequence data where there is an ordering between observations. The example below creates a sequence of 100 floating point values for the x-axis and, for the y-axis, a sine wave computed as a function of those x values. The results are plotted as a line plot.
# Example of a line plot
from numpy import sin
from matplotlib import pyplot
# consistent interval for x-axis
x = [x*0.1 for x in range(100)]
# function of x for y-axis
y = sin(x)
# create line plot
pyplot.plot(x,y)
# show line plot
pyplot.show()
Running the example creates a line plot showing the familiar sine wave pattern on the y-axis across the x-axis, with a consistent interval between observations.
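Line plots often carry more than one series. The sketch below is a minimal illustration, assuming a headless environment (the non-interactive Agg backend), an added cos(x) series for comparison, and a hypothetical output file name; it labels both axes and builds a legend from the label= arguments.

```python
# Minimal sketch: two labeled series on one line plot, saved to a file.
# Assumptions: headless environment (Agg backend), an added cos(x) series,
# and a hypothetical output file name.
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is opened
from matplotlib import pyplot

# same regular interval as the example above: 0.0, 0.1, ..., 9.9
x = np.arange(100) * 0.1

# plot() returns a list of Line2D objects; unpack the single line from each call
sin_line, = pyplot.plot(x, np.sin(x), label="sin(x)")
cos_line, = pyplot.plot(x, np.cos(x), label="cos(x)")

pyplot.xlabel("x")
pyplot.ylabel("value")
legend = pyplot.legend()  # legend entries come from the label= arguments

pyplot.savefig("line_plot.png")
pyplot.close()
```

Keeping a reference to each returned line makes it easy to restyle a series later, for example with sin_line.set_linestyle("--").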

1.5 Bar Chart
A bar chart is generally used to present relative quantities for multiple categories. The x-axis represents the categories, which are spaced evenly. The y-axis represents the quantity for each category, drawn as a bar from the baseline to the appropriate level on the y-axis. A bar chart can be created by calling the bar() function and passing the category names for the x-axis and the quantities for the y-axis.
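The bar() call can be made concrete with a small self-contained sketch. It assumes fixed, made-up quantities instead of random ones, a headless setup using the non-interactive Agg backend, and a hypothetical output file name; it confirms that each drawn bar's height matches the quantity passed for its category.

```python
# Minimal sketch: bar() draws one bar per category with the given height.
# Assumptions: made-up quantities, headless environment (Agg backend) and a
# hypothetical output file name.
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is opened
from matplotlib import pyplot

categories = ["red", "green", "blue"]  # category names on the x-axis
quantities = [12, 30, 25]              # hypothetical quantity per category

# bar() returns a BarContainer holding one Rectangle patch per bar
bars = pyplot.bar(categories, quantities)
heights = [bar.get_height() for bar in bars]

pyplot.xlabel("category")
pyplot.ylabel("quantity")
pyplot.savefig("bar_chart.png")
pyplot.close()
```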
# create bar chart
pyplot.bar(x, y)
Bar charts can be useful for comparing multiple point quantities or estimations. The example below creates a dataset with three categories, each defined with a string label. A single random integer value is drawn for the quantity in each category.
# example of a bar chart
from random import seed
from random import randint
from matplotlib import pyplot
# seed the random number generator
seed(1)
# names for categories
x = ['red','green','blue']
# quantities for each category
y = [randint(0,100),randint(0,100),randint(0, 100)]
# create bar chart
pyplot.bar(x, y)
# show bar chart
pyplot.show()
1.6 Histogram Plot
A histogram plot is generally used to summarize the distribution of a data sample. The x-axis represents discrete bins or intervals for the observations. For example, observations with values between 1 and 10 may be split into five bins: the values [1,2] would be allocated to the first bin, [3,4] to the second bin, and so on. The y-axis represents the frequency, or count, of the observations in the dataset that belong to each bin. Essentially, a data sample is transformed into a bar chart where each category on the x-axis represents an interval of observation values.
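The binning described above can be checked numerically. A minimal sketch, assuming the integer observations 1 to 10 and explicit bin edges placed (for illustration) halfway between integers so that each bin holds exactly two values; numpy.histogram() does the counting, and pyplot.hist(), which bins the data with NumPy, produces the same counts. The output file name is hypothetical.

```python
# Sketch of the binning step behind a histogram: values 1..10 in five bins.
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; no window is opened
from matplotlib import pyplot

values = np.arange(1, 11)  # observations 1, 2, ..., 10
# explicit edges so bin 1 holds {1, 2}, bin 2 holds {3, 4}, and so on
edges = [0.5, 2.5, 4.5, 6.5, 8.5, 10.5]

counts, _ = np.histogram(values, bins=edges)
print(counts)  # → [2 2 2 2 2]

# pyplot.hist() performs the same counting and draws one bar per bin
n, bin_edges, patches = pyplot.hist(values, bins=edges)
pyplot.savefig("binning_example.png")
pyplot.close()
```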
A histogram plot can be created by calling the hist() function and passing in a list or array that represents the data sample.
# example of creating a histogram plot
# create histogram plot
pyplot.hist(x)
Histograms are valuable for summarizing the distribution of data samples. The example below creates a dataset of 1,000 random numbers drawn from a standard Gaussian distribution, then plots the dataset as a histogram.
# example of a histogram plot
from numpy.random import seed
from numpy.random import randn
from matplotlib import pyplot
# seed the random number generator
seed(1)
# random numbers drawn from a Gaussian distribution
x = randn(1000)
# create histogram plot
pyplot.hist(x)
# show histogram plot
pyplot.show()
Running the example, we can see that the shape of the bars shows the bell-shaped curve of the Gaussian distribution. Because no bins argument was given, the function used its default, splitting the range of the values into 10 equal-width bins.

Often, careful choice of the number of bins can help to better expose the shape of the data distribution. The number of bins can be specified by setting the bins argument; for example:
# example of a histogram plot
from numpy.random import seed
from numpy.random import randn
from matplotlib import pyplot
# seed the random number generator
seed(1)
# random numbers drawn from a Gaussian distribution
x = randn(1000)
# create histogram plot
pyplot.hist(x, bins=100)
# show histogram plot
pyplot.show()
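As an alternative to picking the number of bins by hand, the sketch below assumes matplotlib's support for NumPy's named binning strategies (bins="auto") and its density=True option, which rescales the bars so their total area is 1. That puts the bars on the same scale as a probability density, so the standard Gaussian curve can be overlaid for comparison. It uses NumPy's newer default_rng() Generator, so the sample differs from the seeded randn() example, and the output file name is hypothetical.

```python
# Minimal sketch: automatic bin selection plus a Gaussian density overlay.
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; no window is opened
from matplotlib import pyplot

rng = np.random.default_rng(1)  # values differ from seed(1) + randn()
x = rng.standard_normal(1000)

# bins="auto" lets NumPy choose the bin edges; density=True makes the
# bar areas sum to 1 so they are comparable with a probability density
n, edges, _ = pyplot.hist(x, bins="auto", density=True)

# overlay the standard Gaussian probability density function
grid = np.linspace(edges[0], edges[-1], 200)
pdf = np.exp(-grid**2 / 2) / np.sqrt(2 * np.pi)
pyplot.plot(grid, pdf)

pyplot.savefig("histogram_density.png")
pyplot.close()

area = np.sum(n * np.diff(edges))  # total area under the bars
print(area)
```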
1.7 Box and Whisker Plot
A box and whisker plot, or boxplot for short, is generally used to summarize the distribution of a data sample. The x-axis is used to represent the data sample, and multiple boxplots can be drawn side by side on the x-axis if desired. The y-axis represents the observation values.
Boxplots can be drawn by calling the boxplot() function and passing in the data sample as an array or list.
# create box and whisker plot
pyplot.boxplot(x)
Boxplots are useful for summarizing the distribution of a data sample as an alternative to the histogram. They help you quickly get an idea of the range of common and sensible values, shown by the box and by the whiskers respectively. Because we are not looking at the shape of the distribution explicitly, this method is often used when the data has an unknown or unusual distribution, such as a non-Gaussian one.
# example of a box and whisker plot
from numpy.random import seed
from numpy.random import randn
from matplotlib import pyplot
# seed the random number generator
seed(1)
# three Gaussian samples with standard deviations of 1, 5 and 10
x = [randn(1000), 5 * randn(1000), 10 * randn(1000)]
# create box and whisker plot
pyplot.boxplot(x)
# show box and whisker plot
pyplot.show()
Running the example creates a chart showing the three box and whisker plots. We can see that the same scale is used on the y-axis for each, making the first plot look squashed and the last plot look spread out. In this case, we can see the black box for the middle 50% of the data, the orange line for the median, the lines for the whiskers summarizing the range of sensible data, and finally the circles for the possible outliers.
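To see where the parts of each boxplot come from, the summary values can be computed by hand. A minimal sketch, assuming the 1.5 × IQR whisker rule (matplotlib's boxplot() defaults to whis=1.5) and NumPy's default linear-interpolation percentiles; box_summary() is a hypothetical helper, not part of matplotlib, and matplotlib's own computation may differ in edge cases. The small sample is made up so that the outlier is obvious.

```python
# Sketch of the statistics a box and whisker plot draws for one sample.
import numpy as np

def box_summary(sample, whis=1.5):
    """Hypothetical helper: quartiles, whisker ends and outliers of a sample."""
    data = np.asarray(sample, dtype=float)
    q1, median, q3 = np.percentile(data, [25, 50, 75])  # box edges and median
    iqr = q3 - q1                       # height of the box (middle 50%)
    low_fence = q1 - whis * iqr         # points below this are outliers
    high_fence = q3 + whis * iqr        # points above this are outliers
    inside = data[(data >= low_fence) & (data <= high_fence)]
    outliers = data[(data < low_fence) | (data > high_fence)]
    return {
        "q1": q1, "median": median, "q3": q3,
        # whiskers stop at the most extreme points still inside the fences
        "whisker_low": inside.min(), "whisker_high": inside.max(),
        "outliers": outliers.tolist(),
    }

stats = box_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
print(stats)
```

Here the box spans 3.25 to 7.75, the whiskers stop at 1 and 9, and 100 lies beyond the upper fence, so it would be drawn as an outlier circle.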

1.8 Scatter Plot
A scatter plot, or scatterplot, is generally used to summarize the relationship between two paired data samples. Paired data samples means that two measures were recorded for a given observation, such as the weight and height of a person. The x-axis represents observation values for the first sample, and the y-axis represents the observation values for the second sample. Each point on the plot represents a single observation.
Scatter plots can be created by calling the scatter() function and passing the two data sample arrays.
# create scatter plot
pyplot.scatter(x, y)
Scatter plots are useful for showing the association or correlation between two variables. A relationship can also be quantified, for example with a line of best fit, which can be drawn as a line plot on the same chart to make the relationship clearer. A dataset may have more than two measures (variables or columns) for a given observation. A scatter plot matrix is a chart containing scatter plots for each pair of variables in a dataset with more than two variables. The example below creates two data samples that are related. The first is a sample of random numbers drawn from a Gaussian distribution with a mean of 100 and a standard deviation of 20. The second depends on the first: each of its values is the corresponding value of the first sample plus a random Gaussian value with a mean of 50 and a standard deviation of 10.
# example of a scatter plot
from numpy.random import seed
from numpy.random import randn
from matplotlib import pyplot
# seed the random number generator
seed(1)
# first variable
x = 20 * randn(1000) + 100
# second variable
y = x + (10 * randn(1000) + 50)
# create scatter plot
pyplot.scatter(x, y)
# show scatter plot
pyplot.show()
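The correlation and line of best fit mentioned above can be added to the same chart. A minimal sketch, assuming NumPy's corrcoef() for Pearson's correlation coefficient and polyfit() with degree 1 for a least-squares line; it regenerates the two samples with NumPy's newer default_rng() Generator, so the exact values differ from the seeded example, and the output file name is hypothetical. Because the second variable is the first plus independent noise, the slope should come out close to 1, the intercept close to 50, and the correlation close to 20 / sqrt(20² + 10²) ≈ 0.89.

```python
# Minimal sketch: scatter plot with Pearson correlation and a best-fit line.
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; no window is opened
from matplotlib import pyplot

rng = np.random.default_rng(1)  # values differ from the seeded example
x = 20 * rng.standard_normal(1000) + 100        # first variable
y = x + (10 * rng.standard_normal(1000) + 50)   # second variable

r = np.corrcoef(x, y)[0, 1]               # Pearson correlation coefficient
slope, intercept = np.polyfit(x, y, 1)    # least-squares straight line

pyplot.scatter(x, y, s=5)
xs = np.array([x.min(), x.max()])
pyplot.plot(xs, slope * xs + intercept, color="red")  # line of best fit
pyplot.title(f"Pearson r = {r:.2f}")

pyplot.savefig("scatter_best_fit.png")
pyplot.close()
print(r, slope, intercept)
```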