Introduction to Resampling
2022-07-05 17:54:00 【梦想家DBA】
Data sampling refers to statistical methods for selecting observations from the domain with the objective of estimating a population parameter, whereas data resampling refers to methods for economically using a collected dataset to improve the estimate of the population parameter and help quantify the uncertainty of the estimate. Both data sampling and data resampling are methods that are required in a predictive modeling problem.
- Sampling is an active process of gathering observations with the intent of estimating a population variable.
- Resampling is a methodology of economically using a data sample to improve the accuracy and quantify the uncertainty of a population parameter.
- Resampling methods are, in effect, nested sampling methods: they draw new samples from a data sample that was itself drawn from the population.
1.1 Statistical Sampling
Observations made in a domain represent samples of some broader, idealized, and unknown population of all possible observations that could be made in the domain.
Sampling consists of selecting some part of the population to observe so that one may estimate something about the whole population.
1.1.1 How to Sample
Some aspects to consider prior to collecting a data sample include:
- Sample Goal
- Population
- Selection Criteria
- Sample Size
Statistical sampling is a large field of study, but in applied machine learning there are three types of sampling that you are likely to use: simple random sampling, systematic sampling, and stratified sampling.
- Simple Random Sampling: Samples are drawn with a uniform probability from the domain.
- Systematic Sampling: Samples are drawn using a pre-specified pattern, such as at fixed intervals.
- Stratified Sampling: Samples are drawn within pre-specified categories.
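The three sampling types above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not a reference implementation; the helper names (`simple_random_sample`, `systematic_sample`, `stratified_sample`) and the toy `population` are assumptions made for the example.

```python
import random
from collections import defaultdict


def simple_random_sample(items, n, seed=0):
    """Draw n items without replacement, each with equal probability."""
    rng = random.Random(seed)
    return rng.sample(items, n)


def systematic_sample(items, step, start=0):
    """Take every `step`-th item, beginning at index `start`."""
    return items[start::step]


def stratified_sample(items, key, fraction, seed=0):
    """Draw the same fraction from each category returned by `key`."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for item in items:
        strata[key(item)].append(item)
    sample = []
    for members in strata.values():
        n = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, n))
    return sample


# Toy population (hypothetical): 100 (id, label) records, 80 "a" and 20 "b".
population = [(i, "a" if i < 80 else "b") for i in range(100)]

random_part = simple_random_sample(population, 10)
systematic_part = systematic_sample(population, step=10)
stratified_part = stratified_sample(population, key=lambda r: r[1], fraction=0.1)

print("simple random:", sorted(random_part))
print("systematic:   ", systematic_part)
print("stratified:   ", sorted(stratified_part))
```

Note that the stratified sample preserves the 80/20 label ratio of the population, which a small simple random sample is not guaranteed to do.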
1.1.2 Sampling Errors
The two main types of error are selection bias and sampling error.
- Selection Bias: Caused when the method of drawing observations skews the sample in some way.
- Sampling Error: Caused by the random nature of drawing observations, which skews the sample in some way.
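To make the distinction concrete, the following sketch simulates both kinds of error. The population (10,000 values from a normal distribution) and the biased selection rule (observing only the largest half of the values) are assumptions chosen purely for illustration.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical population: 10,000 values from a normal distribution, mean 50.
population = [rng.gauss(50, 10) for _ in range(10_000)]
true_mean = statistics.mean(population)

# Sampling error: each random sample gives a slightly different estimate,
# but the estimates scatter around the population mean.
random_estimates = [
    statistics.mean(rng.sample(population, 30)) for _ in range(200)
]

# Selection bias: observing only the largest values shifts every estimate upward.
largest_half = sorted(population)[5_000:]
biased_estimates = [
    statistics.mean(rng.sample(largest_half, 30)) for _ in range(200)
]

print(f"population mean:        {true_mean:.2f}")
print(f"random samples, mean:   {statistics.mean(random_estimates):.2f}")
print(f"random samples, spread: {statistics.stdev(random_estimates):.2f}")
print(f"biased samples, mean:   {statistics.mean(biased_estimates):.2f}")
```

Sampling error shows up as the spread of the random estimates and shrinks as the sample size grows; selection bias shows up as a systematic offset that a larger sample does not remove.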
1.2 Statistical Resampling
Statistical resampling methods are procedures that describe how to economically use available data to estimate a population parameter. Resampling methods are very easy to use, requiring little mathematical knowledge. They are easy to understand and implement compared to specialized statistical methods that may require deep technical skill to select and interpret.
Two commonly used resampling methods that you may encounter are k-fold cross-validation and the bootstrap.
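- Bootstrap. Samples are drawn from the dataset with replacement, so the same instance may appear more than once in the data sample; those instances not drawn into the data sample may be used for the test set.
- k-fold Cross-Validation. A dataset is partitioned into k groups, where each group is given the opportunity to serve as the held-out test set while the remaining groups form the training set.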
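The two procedures can be sketched as plain index manipulation. This is a minimal illustration with the standard library only; the function names `bootstrap_split` and `k_fold_split` and the ten-item `data` list are assumptions for the example.

```python
import random


def bootstrap_split(data, seed=0):
    """Return (sample, out_of_bag) for one bootstrap draw of len(data) items."""
    rng = random.Random(seed)
    # Drawing indices with replacement means some rows repeat and some are missed.
    indices = [rng.randrange(len(data)) for _ in data]
    drawn = set(indices)
    sample = [data[i] for i in indices]
    out_of_bag = [row for i, row in enumerate(data) if i not in drawn]
    return sample, out_of_bag


def k_fold_split(data, k, seed=0):
    """Shuffle the data and partition it into k groups of near-equal size."""
    rng = random.Random(seed)
    shuffled = list(data)
    rng.shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


data = list(range(10))

sample, out_of_bag = bootstrap_split(data)
print(f"bootstrap sample: {sorted(sample)}  out-of-bag: {out_of_bag}")

folds = k_fold_split(data, k=5)
for i, test_fold in enumerate(folds):
    # Every group takes one turn as the test set; the rest form the training set.
    train = [row for j, fold in enumerate(folds) if j != i for row in fold]
    print(f"fold {i}: train={sorted(train)} test={sorted(test_fold)}")
```

In the bootstrap, the out-of-bag rows are exactly those never drawn, so they can act as a test set; in k-fold cross-validation, every row appears in a test set exactly once.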
The k-fold cross-validation method specifically lends itself to use in the evaluation of predictive models that are repeatedly trained on one subset of the data and evaluated on a second held-out subset of the data.
Generally, resampling techniques for estimating model performance operate similarly: a subset of samples is used to fit a model and the remaining samples are used to estimate the efficacy of the model. This process is repeated multiple times and the results are aggregated and summarized. The differences between techniques usually center on the method by which subsamples are chosen.
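The fit, evaluate, repeat, and aggregate loop described above might look like the following sketch. The model (a least-squares line), the error metric (mean absolute error), and the synthetic data are all assumptions chosen to keep the example self-contained; any model and metric could take their place.

```python
import random
import statistics


def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_absolute_error(model, xs, ys):
    """Average absolute difference between predictions and targets."""
    slope, intercept = model
    return statistics.mean(abs(y - (slope * x + intercept)) for x, y in zip(xs, ys))


def cross_validate(rows, k, seed=0):
    """Return one held-out error per fold for a k-fold evaluation."""
    rng = random.Random(seed)
    rows = list(rows)
    rng.shuffle(rows)
    folds = [rows[i::k] for i in range(k)]
    scores = []
    for i, test in enumerate(folds):
        # Fit on the other k - 1 folds, then score on the held-out fold.
        train = [row for j, fold in enumerate(folds) if j != i for row in fold]
        model = fit_line(*zip(*train))
        scores.append(mean_absolute_error(model, *zip(*test)))
    return scores


# Synthetic data (an assumption for illustration): y = 2x + 1 plus Gaussian noise.
data_rng = random.Random(1)
rows = [(x, 2 * x + 1 + data_rng.gauss(0, 1)) for x in range(50)]

scores = cross_validate(rows, k=5)
print(f"MAE per fold: {[round(s, 3) for s in scores]}")
print(f"mean MAE: {statistics.mean(scores):.3f} (std {statistics.stdev(scores):.3f})")
```

Swapping the fold construction for repeated bootstrap draws would change only how the subsamples are chosen; the fit, score, and aggregate steps stay the same.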
The bootstrap method can be used for the same purpose, but is a more general and simpler method intended for estimating a population parameter.
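As a rough illustration of using the bootstrap to estimate a population parameter, the sketch below resamples a single data sample many times and summarizes the resulting estimates of the mean with a percentile interval. The helper names, the number of resamples, and the synthetic sample are assumptions for the example.

```python
import random
import statistics


def bootstrap_estimates(sample, statistic, n_resamples=1000, seed=0):
    """Compute `statistic` on n_resamples bootstrap resamples of `sample`."""
    rng = random.Random(seed)
    size = len(sample)
    # random.Random.choices draws with replacement.
    return [statistic(rng.choices(sample, k=size)) for _ in range(n_resamples)]


def percentile_interval(values, confidence=0.95):
    """Approximate percentile interval taken from the sorted estimates."""
    ordered = sorted(values)
    tail = (1 - confidence) / 2
    lower = ordered[int(tail * len(ordered))]
    upper = ordered[int((1 - tail) * len(ordered)) - 1]
    return lower, upper


# Hypothetical data sample: 100 observations from a normal distribution.
data_rng = random.Random(7)
sample = [data_rng.gauss(50, 10) for _ in range(100)]

estimates = bootstrap_estimates(sample, statistics.mean)
lower, upper = percentile_interval(estimates)

print(f"sample mean:        {statistics.mean(sample):.2f}")
print(f"bootstrap interval: [{lower:.2f}, {upper:.2f}]")
```

The same two helpers work for other statistics, for example passing `statistics.median` instead of `statistics.mean`.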