Gaussian and Summary Stats
2022-06-27 00:11:00 【梦想家DBA】
1. Gaussian Distribution
2. Sample vs Population
3. Test Dataset
4. Central Tendencies
5. Variance
6. Describing a Gaussian
1.2 Gaussian Distribution
Let’s look at a normal distribution. Below is some code to generate and plot an idealized Gaussian distribution.
# generate and plot an idealized gaussian
from numpy import arange
from matplotlib import pyplot
from scipy.stats import norm
# x-axis for the plot
x_axis = arange(-3, 3, 0.001)
# y-axis as the gaussian
y_axis = norm.pdf(x_axis,0,1)
# plot data
pyplot.plot(x_axis, y_axis)
pyplot.show()
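The curve drawn by norm.pdf() follows the closed-form Gaussian density. As a minimal sketch, the function below writes that formula out using only the standard library. The name gaussian_pdf and its parameter names are assumptions made for illustration; they are not part of SciPy.

```python
# closed-form gaussian density, written out by hand
from math import exp, pi, sqrt

def gaussian_pdf(x, mu=0.0, sigma=1.0):
    """Density at x of a Gaussian with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return exp(-0.5 * z * z) / (sigma * sqrt(2.0 * pi))

# height of the standard gaussian at its mean
print('pdf(0): %.4f' % gaussian_pdf(0.0))
# the curve is symmetric about the mean
print('symmetric:', gaussian_pdf(-1.0) == gaussian_pdf(1.0))
```

The second and third arguments correspond to the 0 and 1 passed to norm.pdf() above: the mean and the standard deviation.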
1.3 Sample vs Population
The data that we collect is called a data sample, whereas all possible data that could be collected is called the population.
- Data Sample: A subset of observations from a group.
- Data Population: All possible observations from a group.
Two examples of data samples that you will encounter in machine learning include:
- The train and test datasets.
- The performance scores for a model.
When using statistical methods, we often want to make claims about the population using only observations in the sample. Two clear examples of this include:
- The training sample must be representative of the population of observations so that we can fit a useful model.
- The test sample must be representative of the population of observations so that we can develop an unbiased evaluation of the model skill.
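The relationship between a sample and its population can be sketched in code. The example below treats a large array of generated values as a stand-in population (an assumption made purely for illustration, since a real population is rarely available) and draws a smaller random sample from it.

```python
# compare a small random sample with the group it was drawn from
from numpy.random import seed, randn, choice
from numpy import mean

# seed the random number generator
seed(1)
# stand-in "population": a large, fixed set of observations (illustrative assumption)
population = 5 * randn(100000) + 50
# data sample: 1,000 observations drawn at random without replacement
sample = choice(population, size=1000, replace=False)
print('Population mean: %.3f' % mean(population))
print('Sample mean: %.3f' % mean(sample))
```

If the sample is representative, its mean should land close to the population mean, though it will not match it exactly.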
1.4 Test Dataset
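Before we explore some important summary statistics for data with a Gaussian distribution, we need a sample of data to work with. We can use the randn() NumPy function to generate a sample of random numbers drawn from a Gaussian distribution with a mean of 0 and a standard deviation of 1. Multiplying by 5 and adding 50 rescales the sample to a standard deviation of 5 and a mean of 50.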
We can then plot the dataset using a histogram and look for the expected shape of the plotted data. The complete example is listed below.
# generate a sample of random gaussians
from numpy.random import seed
from numpy.random import randn
from matplotlib import pyplot
# seed the random number generator
seed(1)
# generate univariate observations
data = 5 * randn(10000) + 50
# histogram of generated data
pyplot.hist(data)
pyplot.show()
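To see what the histogram is counting, we can reproduce the binning without a plot. The sketch below assumes that pyplot.hist() groups the data the same way as NumPy's histogram() function with 10 equal-width bins, which is the default number of bins for the plot.

```python
# count observations per bin without plotting
from numpy.random import seed, randn
from numpy import histogram

# seed the random number generator
seed(1)
# generate univariate observations
data = 5 * randn(10000) + 50
# split the range of the data into 10 equal-width bins and count each one
counts, edges = histogram(data, bins=10)
for left, right, count in zip(edges[:-1], edges[1:], counts):
    print('%.1f to %.1f: %d' % (left, right, count))
```

The counts should rise toward the middle bins and fall away on either side, which is the bell shape seen in the plot.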
The default of 10 bins gives a blocky plot. Increasing the number of bins to 100 shows the bell shape of the Gaussian more clearly, as in the example below.
# generate a sample of random gaussians
from numpy.random import seed
from numpy.random import randn
from matplotlib import pyplot
# seed the random number generator
seed(1)
# generate univariate observations
data = 5 * randn(10000) + 50
# histogram of generated data
pyplot.hist(data, bins=100)
pyplot.show()
1.5 Central Tendency
The central tendency of a distribution refers to the middle or typical value in the distribution: the most common or most likely value. In the Gaussian distribution, the central tendency is called the mean, or more formally, the arithmetic mean, and it is one of the two main parameters that define any Gaussian distribution. The mean of a sample is calculated as the sum of the observations divided by the number of observations in the sample.

The example below demonstrates this on the test dataset developed in the previous section.
# calculate the mean of a sample
from numpy.random import seed
from numpy.random import randn
from numpy import mean
# seed the random number generator
seed(1)
# generate univariate observations
data = 5 * randn(10000) + 50
# calculate mean
result = mean(data)
print('Mean: %.3f' % result)
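As a quick check on what mean() does, the same value can be computed by hand as the sum of the observations divided by their count. This is only a sketch for comparison; the variable name manual is an illustrative choice.

```python
# calculate the mean by hand and compare with numpy
from numpy.random import seed, randn
from numpy import mean

# seed the random number generator
seed(1)
# generate univariate observations
data = 5 * randn(10000) + 50
# sum of the observations divided by the number of observations
manual = sum(data) / len(data)
print('Manual mean: %.3f' % manual)
print('NumPy mean: %.3f' % mean(data))
```

Both lines should print the same value, close to 50.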
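The mean is easily skewed by outliers, so it can be misleading when the data are not Gaussian. The median is another measure of central tendency: it is calculated by first sorting all the data and then locating the middle value in the sample. With an even number of observations, the median is the average of the two middle values.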
The example below demonstrates this on the test dataset.
# calculate the median of a sample
from numpy.random import seed
from numpy.random import randn
from numpy import median
# seed the random number generator
seed(1)
# generate univariate observations
data = 5 * randn(10000) + 50
# calculate median
result = median(data)
print('Median: %.3f' % result)
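The sort-and-pick-the-middle procedure can be written out directly. The helper manual_median below is a hypothetical name used for illustration; it averages the two middle values when the number of observations is even.

```python
# calculate the median by sorting and locating the middle value
from numpy.random import seed, randn
from numpy import median, mean

def manual_median(values):
    """Middle value of the sorted data (average of the two middle values if even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2.0

# seed the random number generator
seed(1)
# generate univariate observations
data = 5 * randn(10000) + 50
print('Manual median: %.3f' % manual_median(data))
print('NumPy median: %.3f' % median(data))
# a single outlier moves the mean far more than the median
small = [1, 2, 3, 4, 100]
print('Median: %s, Mean: %s' % (manual_median(small), mean(small)))
```

For Gaussian data the median and the mean should be nearly the same; on the small list with one extreme value they differ sharply.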
1.6 Variance
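The variance of a distribution refers to how much, on average, observations vary or differ from the mean value. It is useful to think of the variance as a measure of the spread of a distribution. A distribution with low variance has values grouped tightly around the mean; one with high variance has values spread further from it.
The example below plots two idealized Gaussians with different spread. Note that the third argument to norm.pdf() is the standard deviation, not the variance.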
# generate and plot gaussians with different variance
from numpy import arange
from matplotlib import pyplot
from scipy.stats import norm
# x-axis for the plot
x_axis = arange(-3, 3, 0.001)
# plot low variance
pyplot.plot(x_axis, norm.pdf(x_axis,0,0.5))
# plot high variance
pyplot.plot(x_axis,norm.pdf(x_axis,0,1))
pyplot.show()
Running the example plots two idealized Gaussian distributions: the blue with a low variance grouped around the mean and the orange with a higher variance with more spread.

The variance of a data sample drawn from a Gaussian distribution is calculated as the average squared difference of each observation in the sample from the sample mean.
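In symbols, the calculation can be written as follows. The notation is assumed here for illustration: n observations x_i, sample mean \bar{x}, and s^2 for the variance. This is the divide-by-n form that matches the "average" wording; many texts divide by n - 1 instead when estimating the variance of the population.

```latex
s^2 = \frac{1}{n} \sum_{i=1}^{n} \left( x_i - \bar{x} \right)^2,
\qquad
\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i
```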

The example below demonstrates calculating variance on the test problem.
# calculate the variance of a sample
from numpy.random import seed
from numpy.random import randn
from numpy import var
# seed the random number generator
seed(1)
# generate univariate observations
data = 5 * randn(10000) + 50
# calculate variance
result = var(data)
print('Variance: %.3f' % result)
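It can help to compute the variance by hand and compare it with var(). The sketch below also shows the ddof argument: by default var() divides by n, while ddof=1 divides by n - 1. With 10,000 observations the two results are nearly identical.

```python
# calculate the variance by hand and compare with numpy
from numpy.random import seed, randn
from numpy import var, mean

# seed the random number generator
seed(1)
# generate univariate observations
data = 5 * randn(10000) + 50
# average squared difference from the sample mean (divides by n)
manual = mean((data - mean(data)) ** 2)
print('Manual variance: %.3f' % manual)
print('var() default: %.3f' % var(data))
# ddof=1 divides by n - 1 instead of n
print('var(ddof=1): %.3f' % var(data, ddof=1))
```

The values should be close to 25, the square of the standard deviation of 5 used to generate the data.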
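The variance is hard to interpret because its units are the square of the units of the observations. Taking the square root of the variance gives the standard deviation, which is in the same units as the observations. The standard deviation is often written as s or as the Greek lowercase letter sigma (σ). It can be calculated directly in NumPy for an array via the std() function. The example below demonstrates the calculation of the standard deviation on the test problem.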
# calculate the standard deviation of a sample
from numpy.random import seed
from numpy.random import randn
from numpy import std
# seed the random number generator
seed(1)
# generate univariate observations
data = 5 * randn(10000) + 50
# calculate standard deviation
result = std(data)
print('Standard Deviation: %.3f' % result)
Running the example calculates and prints the standard deviation of the sample. The value matches the square root of the variance and is very close to 5.0, the value specified in the definition of the problem.
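The link between the two statistics can be checked directly. This minimal sketch takes the square root of var() and compares it with std() on the same test data.

```python
# confirm that the standard deviation is the square root of the variance
from math import sqrt
from numpy.random import seed, randn
from numpy import std, var

# seed the random number generator
seed(1)
# generate univariate observations
data = 5 * randn(10000) + 50
print('sqrt(variance): %.3f' % sqrt(var(data)))
print('Standard Deviation: %.3f' % std(data))
```

Both lines should print the same value because std() and var() use the same divide-by-n default.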
