A detailed explanation of the groupby function

2022-06-09 05:36:00 Vergil_ Zsh

I. Grouping principle

Core points:

1. The grouping key can be an array, list, dictionary, Series, or function. Any of these can be passed to groupby as long as it lines up with the axis being grouped (an array or list must have the same length as that axis).

2. The default, axis=0, groups by rows; specify axis=1 to group by columns.
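As a rough illustration of the key types listed above, the sketch below groups one small frame in four ways. The frame `df`, its index labels, and the group names are made up for this example and do not come from the original article.

```python
import pandas as pd

# Hypothetical four-row frame, used only for illustration
df = pd.DataFrame({'x': [1, 2, 3, 4]}, index=['a', 'b', 'c', 'd'])

# List: one label per row, same length as the row axis
by_list = df.groupby(['g1', 'g2', 'g1', 'g2'])['x'].sum()

# Dictionary: maps each index label to a group name
by_dict = df.groupby({'a': 'g1', 'b': 'g1', 'c': 'g2', 'd': 'g2'})['x'].sum()

# Series: aligned with df on the index
key = pd.Series(['u', 'u', 'v', 'v'], index=df.index)
by_series = df.groupby(key)['x'].sum()

# Function: called once per index label; its return value is the group name
by_func = df.groupby(lambda label: 'early' if label < 'c' else 'late')['x'].sum()

print(by_list, by_dict, by_series, by_func, sep='\n\n')
```

In each case the key only decides which rows belong together; the aggregation (here a sum) is applied afterwards.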

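groupby() syntax

DataFrame.groupby(by=None, axis=0, level=None, as_index=True, sort=True, group_keys=True, squeeze=False, observed=False, **kwargs)

Note that this is the signature from older pandas releases; the squeeze parameter, for example, was removed in pandas 2.0, so check the documentation for the version you have installed.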
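To make one of these parameters concrete, here is a minimal sketch of what as_index changes. It reuses the sample columns that appear later in this article; the variable names `indexed` and `flat` are only for illustration.

```python
import pandas as pd

data = pd.DataFrame({
    'name': ['Tom', 'Kaggle', 'Litter', 'Sam', 'Sam', 'Sam'],
    'race': ['B', 'C', 'D', 'E', 'B', 'C'],
    'age': [37.0, 61.0, 56.0, 87.0, 58.0, 34.0],
})

# Default (as_index=True): the group key becomes the index of the result
indexed = data.groupby('race')['age'].mean()

# as_index=False: the group key stays an ordinary column
flat = data.groupby('race', as_index=False)['age'].mean()

print(indexed)
print(flat)
```

The second form is often convenient when the result will be merged or written to a file, since no reset_index() call is needed.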

How groupby works

groupby splits the data into groups according to a key. For example, grouping a data set by column A collects all rows that share the same value of A into one group.
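The splitting step can be sketched by hand to show what groupby does conceptually. The frame `df` and its columns A and B are hypothetical, and the manual loop is only a mental model, not how pandas implements grouping internally.

```python
import pandas as pd

# Hypothetical data set with a grouping column A
df = pd.DataFrame({'A': ['x', 'y', 'x', 'y'], 'B': [1, 2, 3, 4]})

# Manual split: one sub-frame per distinct value of A
manual = {key: df[df['A'] == key] for key in df['A'].unique()}

# The same split produced by groupby
grouped = {key: group for key, group in df.groupby('A')}

print(manual['x'])
print(grouped['x'])
```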

Implementing this with groupby

import numpy as np
import pandas as pd

data = pd.DataFrame({
    'name': ['Tom', 'Kaggle', 'Litter', 'Sam', 'Sam', 'Sam'],
    'race': ['B', 'C', 'D', 'E', 'B', 'C'],
    'age': [37.0, 61.0, 56.0, 87.0, 58.0, 34.0],
    'signs_of_mental_illness': [True, True, False, False, False, False]
})

data.groupby('race')

This returns a DataFrameGroupBy object, which pandas does not display directly. You can pass it to list() to see its contents.
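A minimal sketch of inspecting the grouped object, using the same sample data as above. Each element of the list is a (key, sub-DataFrame) pair; get_group is shown as one other way to look at a single group.

```python
import pandas as pd

data = pd.DataFrame({
    'name': ['Tom', 'Kaggle', 'Litter', 'Sam', 'Sam', 'Sam'],
    'race': ['B', 'C', 'D', 'E', 'B', 'C'],
    'age': [37.0, 61.0, 56.0, 87.0, 58.0, 34.0],
    'signs_of_mental_illness': [True, True, False, False, False, False]
})

grouped = data.groupby('race')

# Materialize the groups as a list of (key, sub-DataFrame) pairs
groups = list(grouped)
for key, frame in groups:
    print(key)
    print(frame)

# Fetch a single group by its key
group_b = grouped.get_group('B')
print(group_b)
```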

Functions commonly used with groupby()

Function            Use               Remarks
.mean()             Mean
.count()            Count
.min()              Minimum
.mean().unstack()   Mean              Unstacks the hierarchical index of the aggregated table
.size()             Group size        GroupBy's size method returns a Series of group sizes
.apply()
.agg()
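The functions in the table that are not demonstrated elsewhere in this article can be sketched as follows, again on the sample data. The variable names are only for illustration.

```python
import pandas as pd

data = pd.DataFrame({
    'name': ['Tom', 'Kaggle', 'Litter', 'Sam', 'Sam', 'Sam'],
    'race': ['B', 'C', 'D', 'E', 'B', 'C'],
    'age': [37.0, 61.0, 56.0, 87.0, 58.0, 34.0],
})

# .min(): youngest age per name
youngest = data.groupby('name')['age'].min()

# .size(): number of rows per name, returned as a Series
sizes = data.groupby('name').size()

# .mean().unstack(): group by two keys, then pivot the inner key (race)
# from the hierarchical index into columns
table = data.groupby(['name', 'race'])['age'].mean().unstack()

print(youngest)
print(sizes)
print(table)
```

Combinations of name and race that do not occur in the data appear as NaN in the unstacked table.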

Here is a demonstration of .mean() and .count():

# mean()
data.groupby('name')['age'].mean()
# count()
data.groupby('name')['age'].count()
data.groupby('age').count()
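One detail worth knowing when choosing between .count() and .size(): count skips missing values, while size counts every row. The sketch below uses a small hypothetical frame with one missing age to show the difference.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing age
df = pd.DataFrame({
    'name': ['Sam', 'Sam', 'Tom'],
    'age': [58.0, np.nan, 37.0],
})

counts = df.groupby('name')['age'].count()  # non-missing values only
sizes = df.groupby('name')['age'].size()    # all rows in the group

print(counts)
print(sizes)
```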

You can also aggregate several columns by a single key:

# Single-key, multi-column aggregation
data.groupby('name')[['race', 'age']].count()

The .agg operation lets you apply several functions at once. Sometimes we need both the mean and the count (you can also pass just one function).

Passing a list to agg

print(data.groupby('name')['age'].agg(['mean']))

print(data.groupby('name')['age'].agg(['mean','count']))

You can also pass a dictionary to apply different operations to different columns within each group:

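# String names are used here because recent pandas versions warn when NumPy functions such as np.median are passed to agg
print(data.groupby('race').agg({'age': 'median', 'signs_of_mental_illness': 'mean'}))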
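A related option is named aggregation, which also lets you choose the output column names. This is a sketch assuming pandas 0.25 or later; the output names `median_age` and `illness_rate` are invented for the example.

```python
import pandas as pd

data = pd.DataFrame({
    'name': ['Tom', 'Kaggle', 'Litter', 'Sam', 'Sam', 'Sam'],
    'race': ['B', 'C', 'D', 'E', 'B', 'C'],
    'age': [37.0, 61.0, 56.0, 87.0, 58.0, 34.0],
    'signs_of_mental_illness': [True, True, False, False, False, False]
})

# Each keyword is an output column: (input column, aggregation)
result = data.groupby('race').agg(
    median_age=('age', 'median'),
    illness_rate=('signs_of_mental_illness', 'mean'),
)

print(result)
```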

.apply()

With .apply() you can use functions that you have written yourself.

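print('Before apply')
grouped = data.groupby('name')
# Iterating over a GroupBy object yields (key, sub-DataFrame) pairs
for name, group in grouped:
    print(name)
    print(group)
print('\n')
print('After apply')
# Keep at most the first two rows of each group
print(data.groupby('name').apply(lambda x: x.head(2)))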
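As a sketch of applying a function of your own, the example below computes the age range within each name group. The helper `age_range` is hypothetical and written only for this illustration.

```python
import pandas as pd

data = pd.DataFrame({
    'name': ['Tom', 'Kaggle', 'Litter', 'Sam', 'Sam', 'Sam'],
    'age': [37.0, 61.0, 56.0, 87.0, 58.0, 34.0],
})


def age_range(ages):
    # Receives the ages of one group as a Series and returns a single number
    return ages.max() - ages.min()


ranges = data.groupby('name')['age'].apply(age_range)
print(ranges)
```

Because the function returns one value per group, the result is a Series indexed by name.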

That completes the basic introduction to simple operations.

Sometimes you need to gather the values of another column together for each key while removing the duplicate keys. This can be done as follows.

The data used here is constructed sample data with ID and Time columns. The order time needs to be processed first: here we compute month + day/30. Then we deduplicate on the ID column and put the values calculated from the Time column together for each ID.

import numpy as np
import pandas as pd


data = pd.read_excel('Order time forecast 2.xlsx')


def cut_m_d(x):
    # Convert a timestamp to month + day/30, rounded to two decimals
    return round(x.month + x.day / 30, 2)


data['m_d'] = data['Time'].apply(cut_m_d)
grouped = data.groupby('ID')
# Deduplicate the values within each ID; the author notes that skipping this step causes errors
result = grouped['m_d'].unique()
result2 = result.reset_index()
print(result2)
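Since the Excel file is not included with the article, here is a self-contained sketch of the same steps on a small in-memory frame. The IDs and timestamps are made up, and only the column names ID and Time follow the code above.

```python
import pandas as pd

# Hypothetical stand-in for the Excel file
data = pd.DataFrame({
    'ID': [1, 1, 1, 2],
    'Time': pd.to_datetime(['2022-01-15', '2022-01-15', '2022-03-03', '2022-06-09']),
})


def cut_m_d(x):
    # Convert a timestamp to month + day/30, rounded to two decimals
    return round(x.month + x.day / 30, 2)


data['m_d'] = data['Time'].apply(cut_m_d)

# One row per ID, holding the distinct m_d values in order of appearance
result2 = data.groupby('ID')['m_d'].unique().reset_index()
print(result2)
```

Each cell in the m_d column of the result is an array, so the repeated value for ID 1 appears only once.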

More complex operations will be covered later.

Copyright notice
This article was written by [Vergil_ Zsh]. Please include the original link when reposting. Thank you.
https://yzsam.com/2022/160/202206090516521899.html