R notes mice
2022-07-28 03:48:00 【UQI-LIUWJ】
1 MICE algorithm: theory
MICE (Multiple Imputation by Chained Equations) is a robust and informative method for handling missing data in a dataset. It "fills in" (imputes) the missing values through a series of iterative predictive models.
In each iteration, every specified variable in the dataset is imputed using the other variables in the dataset. The iterations are repeated until the imputed values converge.
1.1 A MICE example

This process continues until all specified variables have been imputed. If the result has not converged, additional iterations can be run, although usually no more than 5 iterations are needed.
The accuracy of the imputation depends on the information density of the dataset. A dataset of completely independent, uncorrelated variables will not yield accurate imputations.
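The iteration described above can be sketched in a few lines of base R. This is only a simplified illustration, assuming linear regression for every variable and deterministic predictions; the function name `chained_impute` and the toy data are hypothetical, and the real mice package adds random draws that this sketch leaves out.

```r
# Simplified chained-equations loop: deterministic regression imputation,
# NOT the stochastic procedure implemented in the mice package.
chained_impute <- function(df, n_iter = 5) {
  miss <- is.na(df)                 # remember which cells were missing
  filled <- df
  # Step 0: use column means as placeholder values
  for (v in names(df)) {
    filled[miss[, v], v] <- mean(df[[v]], na.rm = TRUE)
  }
  for (it in seq_len(n_iter)) {
    for (v in names(df)) {
      if (!any(miss[, v])) next
      # Fit v on all other variables, using the rows where v was observed
      fit <- lm(reformulate(setdiff(names(df), v), response = v),
                data = filled[!miss[, v], , drop = FALSE])
      # Replace the current guesses for v with the model's predictions
      filled[miss[, v], v] <- predict(fit, newdata = filled[miss[, v], , drop = FALSE])
    }
  }
  filled
}

# Toy data (an assumption for illustration): hide some iris measurements
set.seed(1)
toy <- iris[, 1:4]
toy[sample(nrow(toy), 15), "Petal.Length"] <- NA
toy[sample(nrow(toy), 15), "Sepal.Width"] <- NA

result <- chained_impute(toy)
sum(is.na(result))   # 0: every missing cell has been filled
```

Because the iris measurements are strongly correlated, the predictions are informative here; on uncorrelated columns the same loop would do little better than the column means.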
1.2 Predictive Mean Matching (PMM)
MICE can use predictive mean matching (PMM) to choose the value to impute. PMM picks a data point from the original non-missing data whose predicted value is close to the predicted value of the sample with the missing entry.
The N closest data points are taken as candidates, and one of their values is chosen at random to fill in the missing entry.

2 MICE in R
2.0 Load packages
library(magrittr)
library(dplyr)
library(mice)
library(missForest)
2.1 Load data
data(iris)
summary(iris)
# Sepal.Length Sepal.Width Petal.Length
# Min. :4.300 Min. :2.000 Min. :1.000
# 1st Qu.:5.100 1st Qu.:2.800 1st Qu.:1.600
# Median :5.800 Median :3.000 Median :4.350
# Mean :5.843 Mean :3.057 Mean :3.758
# 3rd Qu.:6.400 3rd Qu.:3.300 3rd Qu.:5.100
# Max. :7.900 Max. :4.400 Max. :6.900
# Petal.Width Species
# Min. :0.100 setosa :50
# 1st Qu.:0.300 versicolor:50
# Median :1.300 virginica :50
# Mean :1.199
# 3rd Qu.:1.800
# Max. :2.500
2.2 Randomly remove some of the data
Randomly replace 10% of the values in the data with missing values, and drop the categorical variable Species.
iris_mis <- missForest::prodNA(iris, noNA = 0.1) %>% select(-Species)
summary(iris_mis)
# Sepal.Length Sepal.Width Petal.Length
# Min. :4.300 Min. :2.000 Min. :1.000
# 1st Qu.:5.100 1st Qu.:2.800 1st Qu.:1.500
# Median :5.800 Median :3.000 Median :4.300
# Mean :5.856 Mean :3.049 Mean :3.707
# 3rd Qu.:6.400 3rd Qu.:3.300 3rd Qu.:5.100
# Max. :7.900 Max. :4.200 Max. :6.900
# NA's :16 NA's :15 NA's :7
# Petal.Width
# Min. :0.100
# 1st Qu.:0.300
# Median :1.300
# Mean :1.201
# 3rd Qu.:1.800
# Max. :2.500
# NA's :13
2.3 Visualize missing data
md.pattern(iris_mis)
In the resulting pattern plot, Petal.Length has 7 missing values in total: two of them occur in rows where the second column, Petal.Width, is also missing, and the remaining 5 occur in rows where only Petal.Length is missing.
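The same kind of counts can be cross-checked without a plot. The sketch below uses base R only; the data frame `x` is a stand-in built here for illustration (it is not the `iris_mis` from above, so the numbers will differ), and the 1 = observed, 0 = missing coding of the pattern strings is a choice made for this sketch.

```r
# Stand-in data: remove 60 of the 600 numeric iris cells at random
set.seed(123)
m <- as.matrix(iris[, 1:4])
m[sample(length(m), 60)] <- NA
x <- as.data.frame(m)

# Missing values per column
na_per_col <- colSums(is.na(x))

# Rows where Petal.Length and Petal.Width are missing together
both <- sum(is.na(x$Petal.Length) & is.na(x$Petal.Width))

# Rows where Petal.Length is the only missing value
only_pl <- sum(is.na(x$Petal.Length) & rowSums(is.na(x)) == 1)

# Frequency of every missingness pattern, one digit per column
patterns <- table(apply(!is.na(x), 1, function(r) paste(as.integer(r), collapse = "")))

na_per_col
patterns
```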
2.4 Impute
imputed_Data <- mice(iris_mis, m = 5, maxit = 50, method = 'pmm', seed = 123)
- m = 5: generate 5 imputed datasets
- maxit = 50: the number of iterations used to generate each imputed dataset, here 50
- method = 'pmm': use the Predictive Mean Matching method introduced in 1.2 (suitable for continuous data)
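To see what these parameters produce, the returned object can be inspected directly. The sketch below is self-contained and therefore builds its own stand-in data (`demo_mis`, an assumption for illustration) and uses fewer iterations than above just to keep the run short; it assumes the mice package is installed.

```r
library(mice)

# Stand-in data: remove 60 of the 600 numeric iris cells at random
set.seed(123)
m0 <- as.matrix(iris[, 1:4])
m0[sample(length(m0), 60)] <- NA
demo_mis <- as.data.frame(m0)

# maxit is reduced to 10 here only to keep the demo quick
imp <- mice(demo_mis, m = 5, maxit = 10, method = "pmm",
            seed = 123, printFlag = FALSE)

imp$method                   # imputation method used for each column
head(imp$imp$Petal.Length)   # one row per missing cell, one column per imputed dataset
plot(imp)                    # trace of mean and SD per iteration, to judge convergence
```

If the traces in the plot still show a trend at the last iteration, that would suggest running more iterations.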
2.5 View the data
Because we generated 5 imputed datasets, each one can be inspected separately.
completeData <- mice::complete(imputed_Data, 2)
The trailing 2 selects which of the datasets to view.
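All datasets can also be extracted at once for comparison. The following is a sketch on stand-in data (`demo_mis`, built here for illustration, with a short `maxit` to keep it quick), assuming the mice package is installed.

```r
library(mice)

# Stand-in data: remove 60 of the 600 numeric iris cells at random
set.seed(123)
m0 <- as.matrix(iris[, 1:4])
m0[sample(length(m0), 60)] <- NA
demo_mis <- as.data.frame(m0)

imp <- mice(demo_mis, m = 5, maxit = 10, method = "pmm",
            seed = 123, printFlag = FALSE)

second <- mice::complete(imp, 2)       # the second imputed dataset
anyNA(second)                          # FALSE: no missing values remain

# Stack all 5 datasets; .imp holds the dataset number, .id the original row
all_long <- mice::complete(imp, action = "long")
table(all_long$.imp)
```

Cells that were observed in the original data are identical in every dataset; only the imputed cells can differ from one dataset to the next.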