[Function docs] torch.histc vs. paddle.histogram vs. numpy.histogram

2022-07-28 04:39:00 【氵文大师】

1. torch.histc

Excerpted from:
https://pytorch.org/docs/stable/generated/torch.histc.html

torch.histc(input, bins=100, min=0, max=0, *, out=None) → Tensor

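Computes the histogram of a tensor.
Elements are sorted into equal-width bins between min and max. If min and max are both zero, the minimum and maximum values of the data are used.

Elements smaller than min or larger than max are ignored.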

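Parameters

input (Tensor): the input tensor.
bins (int): the number of histogram bins.
min (Scalar): the lower end of the range (inclusive).
max (Scalar): the upper end of the range (inclusive).
out (Tensor, optional): the output tensor.

Return value

The histogram, represented as a tensor.

Return type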

Tensor

Example

import torch

input_tensor = torch.tensor([1., 2, 1, 2.5])
res = torch.histc(input_tensor, bins=4, min=0, max=3)
print(res)

tensor([0., 2., 1., 1.])
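
To make the binning rule above concrete, here is a minimal pure-Python sketch. The helper name equal_width_hist and its parameters lo and hi are hypothetical and chosen for illustration; this is not how torch.histc is implemented. It only mirrors the documented behaviour: equal-width bins, an inclusive upper bound, out-of-range elements ignored, and a fallback to the data range when both bounds are zero. It assumes hi > lo after the fallback.

```python
def equal_width_hist(values, bins, lo=0.0, hi=0.0):
    """Hypothetical helper: count values into `bins` equal-width bins on [lo, hi]."""
    values = list(values)
    # Documented fallback: lo == hi == 0 means "use the data's min and max".
    if lo == 0 and hi == 0:
        lo, hi = min(values), max(values)
    if hi <= lo:
        raise ValueError("this sketch assumes hi > lo")
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        if v < lo or v > hi:
            continue  # elements outside [lo, hi] are ignored
        # Clamp so that v == hi lands in the last bin (inclusive upper bound).
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return counts


in_range = equal_width_hist([1.0, 2.0, 1.0, 2.5], bins=4, lo=0, hi=3)
with_outlier = equal_width_hist([1.0, 2.0, 1.0, 2.5, 7.0], bins=4, lo=0, hi=3)
data_range = equal_width_hist([1.0, 2.0, 1.0, 2.5], bins=3)

print(in_range)      # [0, 2, 1, 1]
print(with_outlier)  # [0, 2, 1, 1]  (7.0 is outside [0, 3] and is dropped)
print(data_range)    # [2, 0, 2]     (range falls back to [1.0, 2.5])
```

The first call gives the same counts as the torch.histc example above, with bins [0, 0.75), [0.75, 1.5), [1.5, 2.25), and [2.25, 3].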

2. paddle.histogram

Excerpted from:
https://www.paddlepaddle.org.cn/documentation/docs/zh/api/paddle/histogram_cn.html#histogram

paddle.histogram(input, bins=100, min=0, max=0)

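Computes the histogram of the input tensor.
Using min and max as the boundaries of the range, the range is split evenly into bins bars, and the sorted data is then assigned to those bars (bins).
If min and max are both 0, the minimum and maximum values in the data are used as the boundaries.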

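Parameters

input (Tensor) - The input Tensor. It may be multi-dimensional; its data type is int32, int64, float32, or float64.
bins (int) - The number of histogram bins (bars). Defaults to 100.
min (int) - The lower boundary of the range (inclusive). Defaults to 0.
max (int) - The upper boundary of the range (inclusive). Defaults to 0.

Return value

A Tensor of type int64 with shape (nbins,).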

Code example

import paddle

inputs = paddle.to_tensor([1, 2, 1])
result = paddle.histogram(inputs, bins=4, min=0, max=3)
print(result) # [0, 2, 1, 0]

3. numpy.histogram

hist, bin_edges = numpy.histogram(a, bins=10, 
                                     range=None, 
                                     normed=None, 
                                     weights=None, 
                                     density=None)

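a is the array of data to be counted.
bins specifies the number of bins, or the bin edges themselves when a sequence is passed in.
range is a tuple of length 2 giving the minimum and maximum of the range to count over. The default, None, means the range is taken from the data: (a.min(), a.max()).
weights assigns a weight to each element of the array; histogram() sums the weights of the elements that fall into each bin.
When density is True, the probability density of each bin is returned; when it is False, the number of elements in each bin is returned.
normed is deprecated and should not be used (it was removed in NumPy 1.24).

Let's go straight to a few examples: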

import numpy as np
hist, bin_edges = np.histogram([1., 2, 1, 2.5], 4, (0, 3))

hist
# Out: array([0, 2, 1, 1], dtype=int64)

bin_edges
# Out: array([0. , 0.75, 1.5 , 2.25, 3. ])

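bin_edges holds the edges of the 4 bins that (0, 3) is divided into, and hist is the number of elements in each bin. The usage is the same as torch.histc and paddle.histogram above.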
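
As a small check of how the edges are laid out, the sketch below compares bin_edges with np.linspace. The variable names expected_edges and auto_edges are only illustrative. It covers both an explicit range and range=None, where the edges span the data's own minimum and maximum.

```python
import numpy as np

data = [1.0, 2.0, 1.0, 2.5]

# Explicit range: bins + 1 evenly spaced edges over (0, 3).
hist, bin_edges = np.histogram(data, bins=4, range=(0, 3))
expected_edges = np.linspace(0, 3, 4 + 1)
print(np.allclose(bin_edges, expected_edges))  # True
print(hist.tolist())                           # [0, 2, 1, 1]

# range=None: the edges span (min(data), max(data)) instead.
_, auto_edges = np.histogram(data, bins=4)
print(np.allclose(auto_edges, np.linspace(1.0, 2.5, 5)))  # True
```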

import numpy as np

input_ = np.arange(5)
bins = np.array([0., 0.4, 0.8, 1.9])
hist, bin_edges = np.histogram(input_, bins=bins)

bin_edges
# Out: array([0. , 0.4, 0.8, 1.9])

hist
# Out: array([1, 0, 1], dtype=int64)

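Here the bin edges are passed in directly as bins. You can see that bin_edges == bins and that hist is the count for each bin; elements below the smallest edge or above the largest edge are ignored.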
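
One detail worth checking with explicit edges is which side of each bin is closed. In NumPy every bin is half-open, [left, right), except the last one, which also includes its right edge. The values in this sketch are picked to sit exactly on the edges and are not from the original post.

```python
import numpy as np

edges = np.array([0.0, 0.4, 0.8, 1.9])
# -0.1 is below the first edge, 0.4 is an interior edge,
# 1.9 is the last edge, and 2.0 is above the last edge.
values = np.array([-0.1, 0.4, 1.9, 2.0])

edge_hist, _ = np.histogram(values, bins=edges)
print(edge_hist.tolist())  # [0, 1, 1]
```

0.4 is counted in the second bin [0.4, 0.8), 1.9 is counted in the last bin [0.8, 1.9], and the two out-of-range values are dropped.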

import numpy as np

input_ = np.random.random(5) * 5
input_
# Out (one random draw; the results below use these values):
# array([2.27585698, 0.32795885, 3.16672458, 4.55222666, 3.71125298])

hist, bin_edges = np.histogram(input_, bins=5, range=(0, 6), density=True)
hist
# Out: array([0.16666667, 0.16666667, 0.16666667, 0.33333333, 0.        ])

hist, bin_edges = np.histogram(input_, bins=5, range=(0, 6), density=False)
hist
# Out: array([1, 1, 1, 2, 0], dtype=int64)

bin_edges
# Out: array([0. , 1.2, 2.4, 3.6, 4.8, 6. ])

You can see that with density=False, hist is still a count, while with density=True, hist becomes a density whose integral over the range is 1.
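
The relation between the two results can be written out by hand: each density value is the bin count divided by (total count × bin width). The sketch below reuses the five sample values from the example above; the name manual is only illustrative.

```python
import numpy as np

input_ = np.array([2.27585698, 0.32795885, 3.16672458, 4.55222666, 3.71125298])

counts, bin_edges = np.histogram(input_, bins=5, range=(0, 6))
density, _ = np.histogram(input_, bins=5, range=(0, 6), density=True)

widths = np.diff(bin_edges)                # every bin is 1.2 wide here
manual = counts / (counts.sum() * widths)  # count / (total count * width)

print(counts.tolist())               # [1, 1, 1, 2, 0]
print(np.allclose(density, manual))  # True
```

Note that the total used for normalization is the number of elements that fall inside the range, so elements outside range do not contribute to the density.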

Finally, let's see how to use the weights parameter:

import numpy as np

input_ = np.array([2.27585698, 0.32795885, 3.16672458, 4.55222666, 3.71125298])
weight = np.array([1, 1, 2, 2, 3])
hist, bin_edges = np.histogram(input_, bins=5, range=(0, 6), weights=weight)

bin_edges
# Out: array([0. , 1.2, 2.4, 3.6, 4.8, 6. ])

hist
# Out: array([1, 1, 2, 5, 0])

Passing weights replaces the default in which every element has a weight of 1, so the shape of weights must match the shape of the input array.
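
To see what the weighted histogram is summing, the sketch below rebuilds the same result with an explicit loop over the bins. It is only an illustration of the per-bin weight sum, not NumPy's implementation, and the names manual and in_bin are hypothetical.

```python
import numpy as np

input_ = np.array([2.27585698, 0.32795885, 3.16672458, 4.55222666, 3.71125298])
weight = np.array([1, 1, 2, 2, 3])
hist, bin_edges = np.histogram(input_, bins=5, range=(0, 6), weights=weight)

manual = []
n_bins = len(bin_edges) - 1
for i in range(n_bins):
    left, right = bin_edges[i], bin_edges[i + 1]
    if i == n_bins - 1:
        in_bin = (input_ >= left) & (input_ <= right)  # last bin is closed
    else:
        in_bin = (input_ >= left) & (input_ < right)   # other bins are half-open
    # Sum the weights of the elements that fall into this bin.
    manual.append(int(weight[in_bin].sum()))

print(manual)         # [1, 1, 2, 5, 0]
print(hist.tolist())  # [1, 1, 2, 5, 0]
```

The fourth bin holds 4.55222666 (weight 2) and 3.71125298 (weight 3), which is where the 5 comes from.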

The official documentation also has this example:

>>> a = np.arange(5)
>>> hist, bin_edges = np.histogram(a, density=True)
>>> hist
array([0.5, 0. , 0.5, 0. , 0. , 0.5, 0. , 0.5, 0. , 0.5])
>>> hist.sum()
2.4999999999999996
>>> np.sum(hist * np.diff(bin_edges))
1.0