Detailed explanation of the groupby function
2022-06-09 05:36:00 【Vergil_ Zsh】
I. Grouping principle
Key points:
1. Whether the grouping key is an array, a list, a dictionary, a Series, or a function, it can be passed to groupby as long as it lines up with the axis being grouped (arrays and lists must match the axis length).
2. By default, axis=0 groups by row; specify axis=1 to group by column.
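The key types listed above can be sketched as follows. The small DataFrame, its index labels, and the group names 'x' and 'y' are made up for illustration and are not from the original article.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: four rows labelled a-d
df = pd.DataFrame({'score': [10, 20, 30, 40]}, index=['a', 'b', 'c', 'd'])

# 1) An array with one group label per row (length matches the row axis)
by_array = df.groupby(np.array(['x', 'y', 'x', 'y']))['score'].sum()

# 2) A dictionary mapping index labels to group names
mapping = {'a': 'x', 'b': 'y', 'c': 'x', 'd': 'y'}
by_dict = df.groupby(mapping)['score'].sum()

# 3) A Series, aligned with df on its index
by_series = df.groupby(pd.Series(mapping))['score'].sum()

# 4) A function, called once per index label
by_func = df.groupby(lambda label: 'x' if label in ('a', 'c') else 'y')['score'].sum()

print(by_array)
print(by_func)
```

All four calls describe the same split, so they should produce the same sums per group.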
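groupby() syntax
DataFrame.groupby(by=None, axis=0, level=None, as_index=True, group_keys=True, squeeze=False, observed=False, **kwargs)
This is the signature from older pandas releases; squeeze was removed in pandas 2.0, and axis is deprecated in newer versions.
How groupby works
groupby splits the data into groups according to some key. For example, grouping a data set by column A collects all rows that share the same value of A into one group.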

Implementing this with groupby
import numpy as np
import pandas as pd
data = pd.DataFrame({
'name': ['Tom', 'Kaggle', 'Litter', 'Sam', 'Sam', 'Sam'],
'race': ['B', 'C', 'D', 'E', 'B', 'C'],
'age': [37.0, 61.0, 56.0, 87.0, 58.0, 34.0],
'signs_of_mental_illness': [True, True, False, False, False, False]
})
data.groupby('race')
This returns a DataFrameGroupBy object. pandas does not display its contents directly, but you can call list() on it to see the groups.
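As a minimal sketch of how to look inside the grouped object, the snippet below rebuilds the article's DataFrame and inspects it with list(), ngroups, and get_group(). The variable names grouped and pairs are chosen here for illustration.

```python
import pandas as pd

data = pd.DataFrame({
    'name': ['Tom', 'Kaggle', 'Litter', 'Sam', 'Sam', 'Sam'],
    'race': ['B', 'C', 'D', 'E', 'B', 'C'],
    'age': [37.0, 61.0, 56.0, 87.0, 58.0, 34.0],
    'signs_of_mental_illness': [True, True, False, False, False, False]
})

grouped = data.groupby('race')

# list() turns the object into (key, sub-DataFrame) pairs
pairs = list(grouped)
for race, frame in pairs:
    print(race, frame['name'].tolist())

# Number of groups, and the rows belonging to a single group
print(grouped.ngroups)
print(grouped.get_group('B'))
```

Each pair holds the group key and the rows of the original DataFrame that fall into that group.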

| Function | Use case | Remarks |
| .mean() | Mean | |
| .count() | Count | Counts non-null values |
| .min() | Minimum | |
| .mean().unstack() | Compute the mean, then unstack the hierarchical index of the aggregated table | |
| .size() | Group size | GroupBy's size method returns a Series containing the size of each group |
| .apply() | Apply a custom function to each group | |
| .agg() | Apply one or more aggregation functions | |
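Two rows of the table are easy to mix up or misread: size() versus count(), and mean().unstack(). The sketch below illustrates both on a reduced copy of the article's data; one age is replaced by NaN here purely to make the difference visible, which is an assumption of this example and not part of the original data.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({
    'name': ['Tom', 'Kaggle', 'Litter', 'Sam', 'Sam', 'Sam'],
    'race': ['B', 'C', 'D', 'E', 'B', 'C'],
    # Litter's age is set to NaN for illustration only
    'age': [37.0, 61.0, np.nan, 87.0, 58.0, 34.0],
})

# size() counts rows per group, missing values included
sizes = data.groupby('name').size()

# count() counts only the non-null values of the selected column
counts = data.groupby('name')['age'].count()

# Grouping by two keys gives a hierarchical (name, race) index;
# unstack() pivots the inner level (race) into columns
mean_by_name_race = data.groupby(['name', 'race'])['age'].mean()
wide = mean_by_name_race.unstack()

print(sizes)
print(counts)
print(wide)
```

In the unstacked table, combinations of name and race that never occur appear as NaN.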
The following demonstrates .mean() and .count():
# mean()
data.groupby('name')['age'].mean()
# count()
data.groupby('name')['age'].count()
data.groupby('age').count()
You can also aggregate several columns by a single key:
# Single key, multiple columns
data.groupby('name')[['race', 'age']].count()
.agg() can apply several functions at once, which is useful when we need both the mean and the count (passing a single function also works).
Passing a list to agg:
print(data.groupby('name')['age'].agg(['mean']))
print(data.groupby('name')['age'].agg(['mean','count']))
You can also pass a dictionary to apply a different operation to each column within the groups:
print(data.groupby('race').agg({'age': 'median', 'signs_of_mental_illness': 'mean'}))
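The dictionary form can be pushed a little further. As a hedged sketch that goes beyond what the article shows, a dictionary value may itself be a list of functions, and agg also accepts a custom callable. The names summary and age_range are illustrative.

```python
import pandas as pd

data = pd.DataFrame({
    'name': ['Tom', 'Kaggle', 'Litter', 'Sam', 'Sam', 'Sam'],
    'race': ['B', 'C', 'D', 'E', 'B', 'C'],
    'age': [37.0, 61.0, 56.0, 87.0, 58.0, 34.0],
    'signs_of_mental_illness': [True, True, False, False, False, False]
})

# Several functions for one column, a single function for another
summary = data.groupby('race').agg({
    'age': ['min', 'max'],
    'signs_of_mental_illness': 'sum',   # True counts as 1
})

# A custom callable: the spread between oldest and youngest in each group
age_range = data.groupby('race')['age'].agg(lambda s: s.max() - s.min())

print(summary)
print(age_range)
```

When a column receives a list of functions, the result has two-level column labels such as ('age', 'min').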
.apply()
apply lets you use functions you have written yourself.
print('Before apply')
grouped = data.groupby('name')
for name, group in grouped:
    print(name)
    print(group)
    print('\n')
print('After apply')
# Keep the first two rows of each group
print(data.groupby('name').apply(lambda x: x.head(2)))
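The lambda above can be replaced by a named function. The sketch below assumes a hypothetical helper, oldest, that returns the rows with the largest age in each group; it selects the columns explicitly before calling apply so the function only sees the non-key columns.

```python
import pandas as pd

data = pd.DataFrame({
    'name': ['Tom', 'Kaggle', 'Litter', 'Sam', 'Sam', 'Sam'],
    'race': ['B', 'C', 'D', 'E', 'B', 'C'],
    'age': [37.0, 61.0, 56.0, 87.0, 58.0, 34.0],
})

def oldest(group, n=1):
    """Hypothetical helper: return the n rows with the largest age in one group."""
    return group.nlargest(n, 'age')

# apply calls oldest once per group and stacks the results
top1 = data.groupby('name')[['race', 'age']].apply(oldest)

# Extra keyword arguments are forwarded to the function
top2 = data.groupby('name')[['race', 'age']].apply(oldest, n=2)

print(top1)
print(top2)
```

The result is indexed by the group key plus the original row label, so top2.loc['Sam'] gives the two oldest rows named Sam.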
That completes the basic introduction to the simple operations.
Sometimes you need to gather the values of another column for each group while removing the duplicate values within each key. This can be done as follows.

In the constructed data set, the order time needs to be processed: here each timestamp is converted to month + day/30. The rows are then deduplicated by ID, and the values computed from the Time column are gathered together for each ID.
import numpy as np
import pandas as pd
data = pd.read_excel(' Order time forecast 2.xlsx')
def cut_m_d(x):
    # Encode a timestamp as month + day/30, rounded to two decimals
    return round(x.month + x.day / 30, 2)
data['m_d'] = data['Time'].apply(cut_m_d)
grouped = data.groupby('ID')
# Deduplicate the values for each ID (skipping this step causes errors)
result = grouped['m_d'].unique()
result2 = result.reset_index()
print(result2)
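Since the spreadsheet is not included with the article, the same steps can be tried on an in-memory table. The IDs and dates below are invented for illustration; only the column names ID and Time and the function cut_m_d come from the original code.

```python
import pandas as pd

# Hypothetical stand-in for the Excel file: ID 1 has a duplicated order time
orders = pd.DataFrame({
    'ID': [1, 1, 1, 2, 2],
    'Time': pd.to_datetime([
        '2022-01-15', '2022-01-15', '2022-03-03',
        '2022-02-10', '2022-06-30',
    ]),
})

def cut_m_d(x):
    # Encode a timestamp as month + day/30, rounded to two decimals
    return round(x.month + x.day / 30, 2)

orders['m_d'] = orders['Time'].apply(cut_m_d)

# unique() keeps each distinct value once per ID and returns one array per group
result = orders.groupby('ID')['m_d'].unique().reset_index()
print(result)
```

Each row of result holds one ID and an array of its distinct m_d values. Note that with this encoding the 30th of a month maps to the next whole number (June 30 becomes 7.0), so the values are only an approximate position within the year.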
More complex operations will be covered later.