De-duplication in R: unique, duplicated, and filter
2022-06-30 11:25:00 【Analysis of breeding data】
Suppose you have a data frame and want to de-duplicate it by the ID in the first column, deleting every row whose ID is repeated (not just the extra copies). Reaching for the unique and duplicated functions here can give the wrong result. Here is a summary.
Simulated data
set.seed(123)
dat = data.frame(ID = c(1:10,9,4,4,9,9,2),y = rnorm(16))
dat

Requirement:
Remove every row whose ID occurs more than once.
Error 1: using the unique function
unique drops the repeated copies of an ID but keeps one copy of each. For 1,2,3,1, where 1 is repeated, unique returns 1,2,3 instead of 2,3.
So the following steps are wrong: every ID in dat is also in uid, so no row is removed.
uid = unique(dat$ID)
dat[dat$ID %in% uid,]
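
To make the difference concrete, here is a minimal sketch on the toy vector 1,2,3,1 from the text. It uses base R only; the variable names are illustrative and the table() step is just one possible way to count occurrences, not necessarily the author's.

```r
# Toy vector from the text: the value 1 appears twice.
x <- c(1, 2, 3, 1)

# unique() keeps one copy of every value, so the repeated 1 survives.
unique_vals <- unique(x)

# Every element of x is in unique(x), so this filter drops nothing.
kept_by_unique <- x[x %in% unique_vals]

# What the requirement asks for: only values that occur exactly once.
counts <- table(x)
wanted <- x[x %in% as.numeric(names(counts)[counts == 1])]

print(unique_vals)     # 1 2 3
print(kept_by_unique)  # 1 2 3 1
print(wanted)          # 2 3
```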

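Error 2: using the duplicated function
duplicated returns a logical vector that is TRUE for the second and later occurrences of a value and FALSE for the first. Filtering with !duplicated therefore keeps the first row of each ID instead of removing every row with a repeated ID. Like unique, this is not what we want.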
dat[!duplicated(dat$ID),]
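
As a side note, duplicated also accepts a fromLast argument, and combining a forward and a backward scan flags every row of a repeated ID. The sketch below is a base R alternative under that assumption, not one of the article's two methods; the names dup_forward, dup_backward and is_repeated are illustrative.

```r
set.seed(123)
dat <- data.frame(ID = c(1:10, 9, 4, 4, 9, 9, 2), y = rnorm(16))

# Forward scan: TRUE for the second and later occurrences of an ID.
dup_forward <- duplicated(dat$ID)

# Backward scan: TRUE for everything except the last occurrence.
dup_backward <- duplicated(dat$ID, fromLast = TRUE)

# A row belongs to a repeated ID if either scan flags it.
is_repeated <- dup_forward | dup_backward

first_only <- dat[!dup_forward, ]  # one row per ID (10 rows): not what we want
singletons <- dat[!is_repeated, ]  # only IDs that occur exactly once (7 rows)

singletons$ID  # 1 3 5 6 7 8 10
```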

Correct 1: using the filter function
First count how many times each ID occurs, keep the IDs that occur exactly once, and then use them to extract the rows.
library(dplyr)  # provides %>%, count(), filter() and select()
uid = dat %>% count(ID) %>% filter(n ==1) %>% select(ID)
uid
dat[dat$ID %in% uid$ID,]
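
If dplyr is not available, the same count-then-keep idea can be sketched in base R. This assumes ave() is an acceptable stand-in for count(); n_per_row and once_only are illustrative names.

```r
set.seed(123)
dat <- data.frame(ID = c(1:10, 9, 4, 4, 9, 9, 2), y = rnorm(16))

# ave() applies length() within each ID group and returns one value per row,
# so n_per_row[i] is the number of rows sharing the ID of row i.
n_per_row <- ave(dat$ID, dat$ID, FUN = length)

# Keep only rows whose ID occurs exactly once.
once_only <- dat[n_per_row == 1, ]

once_only$ID  # 1 3 5 6 7 8 10
```

Because the count is attached to each row, no separate lookup of IDs is needed before subsetting.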

Correct 2: using duplicated and %in%
First use duplicated to collect the IDs that occur more than once, then use filter to exclude every row with one of those IDs.
uid2 = dat$ID[duplicated(dat$ID)]
uid2
dat %>% filter(!ID %in% uid2)
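
One detail of this approach: uid2 can list the same ID several times, because duplicated flags each extra copy. The base R sketch below (illustrative names, no dplyr) suggests this is harmless, since %in% only tests membership.

```r
set.seed(123)
dat <- data.frame(ID = c(1:10, 9, 4, 4, 9, 9, 2), y = rnorm(16))

uid2 <- dat$ID[duplicated(dat$ID)]  # 9 4 4 9 9 2: one entry per extra copy
uid2_distinct <- unique(uid2)       # 9 4 2

# ! applies to the whole %in% expression, so both lines drop repeated IDs.
res_raw <- dat[!dat$ID %in% uid2, ]
res_distinct <- dat[!dat$ID %in% uid2_distinct, ]

identical(res_raw, res_distinct)  # TRUE
res_raw$ID                        # 1 3 5 6 7 8 10
```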

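Complete test code:
library(dplyr)  # provides %>%, count(), filter() and select()
set.seed(123)
dat = data.frame(ID = c(1:10,9,4,4,9,9,2),y = rnorm(16))
dat
# Wrong way 1: unique() keeps one copy of each ID, so every row matches
uid = unique(dat$ID)
dat[dat$ID %in% uid,]
# Wrong way 2: !duplicated() keeps the first row of each ID
dat[!duplicated(dat$ID),]
# The right way 1: keep only the IDs that occur exactly once
uid = dat %>% count(ID) %>% filter(n ==1) %>% select(ID)
uid
dat %>% filter(ID %in% uid$ID)
# The right way 2: drop every ID that duplicated() flags
uid2 = dat$ID[duplicated(dat$ID)]
uid2
dat %>% filter(!ID %in% uid2)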