R language deduplication: unique, duplicated, and filter
2022-06-30 11:25:00 【Analysis of breeding data】
Suppose you have a data frame and want to deduplicate on its first column, ID, by deleting every row whose ID occurs more than once. Reaching for the unique or duplicated function here can easily give the wrong result. Here's a summary.
Simulated data
set.seed(123)
dat = data.frame(ID = c(1:10,9,4,4,9,9,2),y = rnorm(16))
dat

Requirement:
Remove every row whose ID is duplicated, so that only rows whose ID occurs exactly once remain.
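To make the target concrete, it can help to count how often each ID occurs before trying any deduplication. The following is a minimal sketch using base R's table on the simulated data above; the variable names counts, dup_ids and keep_ids are illustrative and not part of the original post.

```r
# Recreate the simulated data from the post
set.seed(123)
dat <- data.frame(ID = c(1:10, 9, 4, 4, 9, 9, 2), y = rnorm(16))

# Count the occurrences of each ID
counts <- table(dat$ID)

# IDs that occur more than once: every row with these IDs must go
dup_ids <- as.integer(names(counts)[counts > 1])
dup_ids    # 2 4 9

# IDs that occur exactly once: only these rows should remain
keep_ids <- as.integer(names(counts)[counts == 1])
keep_ids   # 1 3 5 6 7 8 10
```

A correct solution should therefore return 7 of the 16 rows.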
Error 1: using the unique function
unique drops the repeated copies of an ID but still keeps one copy of it. For example, in 1,2,3,1 the value 1 is repeated; unique returns 1,2,3 rather than 2,3.
Therefore the following steps are wrong: every ID in dat is also in uid, so all rows are returned.
uid = unique(dat$ID)
dat[dat$ID %in% uid,]
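A quick check shows why this approach removes nothing. This is a small sketch on the same simulated data; wrong1 is an illustrative name.

```r
# Recreate the simulated data from the post
set.seed(123)
dat <- data.frame(ID = c(1:10, 9, 4, 4, 9, 9, 2), y = rnorm(16))

# unique() keeps one copy of every ID, including the duplicated ones
uid <- unique(dat$ID)
uid                     # 1 2 3 4 5 6 7 8 9 10

# Every ID in dat is in uid, so the subset is the whole data frame
wrong1 <- dat[dat$ID %in% uid, ]
nrow(wrong1)            # 16
all(dat$ID %in% uid)    # TRUE
```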

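Error 2: using the duplicated function
duplicated returns a logical vector of TRUE and FALSE that marks the second and later occurrences of each value. Subsetting with !duplicated therefore keeps the first row of every ID instead of removing all rows whose ID is duplicated. Like unique, it's not what we want.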
dat[!duplicated(dat$ID),]
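The same kind of check can be run here. In this sketch (flag and wrong2 are illustrative names), the duplicated IDs 2, 4 and 9 each survive once in the result.

```r
# Recreate the simulated data from the post
set.seed(123)
dat <- data.frame(ID = c(1:10, 9, 4, 4, 9, 9, 2), y = rnorm(16))

# TRUE only for the second and later occurrences of an ID
flag <- duplicated(dat$ID)
sum(flag)               # 6

# Dropping the flagged rows keeps the first occurrence of every ID
wrong2 <- dat[!flag, ]
wrong2$ID               # 1 2 3 4 5 6 7 8 9 10

# The IDs that should have been removed entirely are still present
c(2, 4, 9) %in% wrong2$ID   # TRUE TRUE TRUE
```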

Correct 1: using the filter function
First count the occurrences of each ID, use filter to keep the IDs that occur exactly once, and then extract the rows with those IDs.
library(dplyr)  # provides %>%, count, filter and select
uid = dat %>% count(ID) %>% filter(n ==1) %>% select(ID)
uid
dat[dat$ID %in% uid$ID,]

Correct 2: using %in%
First use duplicated to collect the IDs that occur more than once, and then use filter with !ID %in% to exclude every row that has one of those IDs.
uid2 = dat$ID[duplicated(dat$ID)]
uid2
dat %>% filter(!ID %in% uid2)
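For readers who prefer to avoid a dplyr dependency, the same result can probably be reached in base R by combining duplicated with its fromLast argument. This is an alternative sketch, not the method used in the post; dup_any and res are illustrative names.

```r
# Recreate the simulated data from the post
set.seed(123)
dat <- data.frame(ID = c(1:10, 9, 4, 4, 9, 9, 2), y = rnorm(16))

# duplicated() flags later occurrences; with fromLast = TRUE it flags
# earlier ones. Taken together they flag every occurrence of a repeated ID.
dup_any <- duplicated(dat$ID) | duplicated(dat$ID, fromLast = TRUE)

# Keep only the rows whose ID occurs exactly once
res <- dat[!dup_any, ]
res$ID      # 1 3 5 6 7 8 10
nrow(res)   # 7
```

This should give the same rows as the two correct approaches above.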

Complete test code :
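library(dplyr)
set.seed(123)
dat = data.frame(ID = c(1:10,9,4,4,9,9,2),y = rnorm(16))
dat
# Wrong way 1: unique keeps one copy of every ID, so all rows are returned
uid = unique(dat$ID)
dat[dat$ID %in% uid,]
# Wrong way 2: !duplicated keeps the first row of every ID
dat[!duplicated(dat$ID),]
# The right way 1: keep only the IDs that occur exactly once
uid = dat %>% count(ID) %>% filter(n ==1) %>% select(ID)
uid
dat %>% filter(ID %in% uid$ID)
# The right way 2: exclude every ID that is duplicated
uid2 = dat$ID[duplicated(dat$ID)]
uid2
dat %>% filter(!ID %in% uid2)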