Lunch break train & problem thinking: counting the elements of multidimensional arrays
2022-07-24 19:58:00 【Python advanced】
Material for this article was provided by: 【(This is the back of the moon)】
Inspired by: the Python Advanced account owner's article "A roundup of 4 ways to count the elements equal to 1 in an array"
Maintaining an official account is not easy and my own ability is limited, so I would like to invite all the experts to sit in. I have set up a *Lunch break train* column to share interesting knowledge and lively discussions collected from various sources. After intense study, a change of thinking helps you jump out of mental traps and enjoy learning.
About this series:
Series name: the problem or interesting piece of knowledge shared in this installment
Problem
Counting how often each element occurs in a one-dimensional array is simple and well supported. What should we do when we need to count the elements of a multidimensional array?
Thought process
Among the modules I use most often, I have not found a simple, direct way to count the elements of a multidimensional array. Since the direct route is unavailable, take the indirect one: it is much more convenient to convert the array into a one-dimensional array first and then count. Here are some examples, for reference only:
Two-dimensional arrays:
Data preparation
Use numpy to fix the random seed and randomly generate a two-dimensional array.
import numpy as np
import pandas as pd
np.random.seed(2022) # Set the random seed
a = np.random.randint(0, 10, (100, 3)) # 100*3 two-dimensional array
Counting elements
numpy arrays have no dedicated method for counting elements (np.unique with return_counts=True is the closest built-in), so another tool is needed. For simplicity, the Counter class from the collections module is used here.
from collections import Counter
# Counter cannot count the elements of a multidimensional numpy.array directly
# Use the .flatten method to flatten it first
Counter(a.flatten())
# Or use .flat to get a numpy.flatiter iterable object
Counter(a.flat)
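As a cross-check on the Counter approach above, numpy's own np.unique can return counts as well. The following is a minimal sketch, assuming the same seed and array shape as in the data preparation step; the names unique_counts and counter_counts are only illustrative.

```python
from collections import Counter

import numpy as np

np.random.seed(2022)                      # same seed as in the article
a = np.random.randint(0, 10, (100, 3))    # 100*3 two-dimensional array

# np.unique flattens the input when no axis is given and,
# with return_counts=True, also returns how often each value occurs.
values, counts = np.unique(a, return_counts=True)
unique_counts = dict(zip(values.tolist(), counts.tolist()))

# The Counter route from the article, converted to plain Python ints.
counter_counts = dict(Counter(a.flatten().tolist()))

print(unique_counts == counter_counts)
```

Both routes count every one of the 300 elements, so the two dictionaries should agree.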
For counting elements, pandas' value_counts also comes to mind easily. Because value_counts counts the values of one column at a time, the melt method can be used before value_counts to convert the multi-column data into a single column.
# After melt the values are in a column named value; run value_counts on that column
pd.DataFrame(a).melt().value_counts('value')
The count for each number is the same as the Counter result.
Multidimensional arrays:
Regular arrays
np.random.seed(2022) # Set the random seed
a = np.random.randint(0, 10, (10, 3, 4)) # 10*3*4 three-dimensional array
In numpy, as long as the array is regular, meaning every component has the same size (the three-dimensional array generated above consists of 10 two-dimensional arrays of size 3*4), .flatten() conveniently converts it into a one-dimensional array, after which the occurrences of each element can be counted. Higher-dimensional arrays work the same way.
pandas has requirements on what can be converted into its data structures: a DataFrame can be built only from an array of at most two dimensions, so a three-dimensional array cannot be handed to pandas as it is and then counted.
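One possible way around this restriction, which the original does not spell out, is to flatten the array with numpy first and only then hand the one-dimensional result to pandas. A minimal sketch, assuming the same seed and shape as above; direct_ok and counts are illustrative names.

```python
import numpy as np
import pandas as pd

np.random.seed(2022)
a = np.random.randint(0, 10, (10, 3, 4))  # 10*3*4 three-dimensional array

# Building a DataFrame straight from the 3-D array is rejected by pandas.
try:
    pd.DataFrame(a)
    direct_ok = True
except ValueError:
    direct_ok = False

# Flatten to one dimension first, then count with Series.value_counts.
counts = pd.Series(a.ravel()).value_counts()

print(direct_ok)
print(counts.sort_index())
```

ravel is used here instead of flatten because no copy is needed just to count; either would give the same counts.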
Irregular arrays
When a multidimensional array is irregular, for example when lists are nested inside a list at varying depths, .flatten() alone cannot spread it into a one-dimensional array.
list_b = [[[1, 2, 3, 4, [1, 2, 3, 4, [1, 2, 3, 4, [1, 2, 3, 4]], 6, 7], 8, 9, [1, 2, 3, 4, [1, 2, 3, 4]], 6, 7]]]
# The nested lists have to be stored as elements (dtype object)
np.array(list_b, dtype='O').flatten()
Such a result obviously can no longer be counted with Counter, because the lists that remain as elements are unhashable.
Consider defining a function that flattens the list:
def flatten(values):
    # Build a numpy.flatiter iterator before recursing
    values = np.array(values, dtype='O').flat
    for value in values:
        if isinstance(value, (list, np.ndarray)):
            yield from flatten(value)
        else:
            yield value
Traverse the multidimensional array: if an element is a list or an array, recurse into it; otherwise yield the current element. Converting the array to a numpy.flatiter iterable object before the traversal avoids the runaway recursion depth that otherwise occurs when processing a numpy.matrix object.

The figure above shows what happens when a matrix is processed without first converting it to a numpy.flatiter iterable object: even though the matrix is only 3*3, an error is still raised.
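The figure is not reproduced here. As a hedged illustration of the likely cause, the sketch below shows that iterating a numpy.matrix always yields another two-dimensional matrix, so a recursion that only checks for np.ndarray never reaches a scalar, while .flat yields the scalars directly. The names m, row, inner and flat_values are illustrative.

```python
import numpy as np

m = np.matrix(np.arange(9).reshape(3, 3))  # a 3*3 matrix

row = next(iter(m))      # iterating a matrix yields 1x3 matrices, not 1-D rows
inner = next(iter(row))  # iterating that row yields a 1x3 matrix again

print(row.shape, inner.shape)
print(isinstance(inner, np.ndarray))  # still an ndarray subclass, so recursion continues

# .flat walks the scalars directly, which is why the conversion helps.
flat_values = [int(v) for v in m.flat]
print(flat_values)
```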
At this point, the flatten function defined above can easily count the elements of list_b:
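The original result image is missing, so the following is a sketch of what the call presumably looked like: the flatten function from the article combined with Counter. The variable name counts is an assumption for illustration.

```python
from collections import Counter

import numpy as np


def flatten(values):
    # Build a numpy.flatiter iterator before recursing
    values = np.array(values, dtype='O').flat
    for value in values:
        if isinstance(value, (list, np.ndarray)):
            # Nested list or array: recurse into it
            yield from flatten(value)
        else:
            # Plain element: hand it back to the caller
            yield value


list_b = [[[1, 2, 3, 4, [1, 2, 3, 4, [1, 2, 3, 4, [1, 2, 3, 4]], 6, 7],
            8, 9, [1, 2, 3, 4, [1, 2, 3, 4]], 6, 7]]]

counts = Counter(flatten(list_b))
print(counts)
# 1, 2, 3 and 4 each occur 6 times, 6 and 7 twice, 8 and 9 once
```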

pandas' value_counts can also be tried on such a multidimensional array: flatten it with the flatten function first, then count.
pd.DataFrame(list_b).stack().map(flatten).map(tuple).explode().value_counts()
Here stack is similar to melt and its result is a Series object; the methods and functions are then chained to complete the count.
Summary
The above is my thinking on counting the elements of multidimensional arrays with the modules I use most often. The breadth and depth may be lacking, and it is only one person's view, so it is hard to judge which approach is better. Still, posing a question gets the brain working and is a chance to dig out what you know, which is a pleasure in itself.
Half a day of sun and half a day of rain; moss greens the steps, and flowers are everywhere.
Made on May 10, 2002