Introduction to Resampling
2022-07-05 17:54:00 【梦想家DBA】
Data sampling refers to statistical methods for selecting observations from the domain with the objective of estimating a population parameter. Data resampling, by contrast, refers to methods for economically using a collected dataset to improve the estimate of the population parameter and to help quantify the uncertainty of that estimate. Both data sampling and data resampling are required in a predictive modeling problem.
- Sampling is an active process of gathering observations with the intent of estimating a population variable.
- Resampling is a methodology of economically using a data sample to improve the accuracy of, and quantify the uncertainty in, the estimate of a population parameter.
- Resampling methods, in fact, make use of a nested resampling method: samples are repeatedly drawn from the data sample itself.
1.1 Statistical Sampling
Observations made in a domain represent samples of some broader, idealized, and unknown population of all possible observations that could be made in the domain.
Sampling consists of selecting some part of the population to observe so that one may estimate something about the whole population.
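To make the idea concrete, the following minimal Python sketch assumes a synthetic population (something we never have in practice) so that a sample estimate can be compared against the true value. The population parameters (mean 50, standard deviation 10) and the sample size of 100 are arbitrary illustrative choices.

```python
import random
import statistics

random.seed(1)

# Hypothetical synthetic population. In practice the full population is unknown;
# here we generate one so the estimate can be checked against the truth.
population = [random.gauss(50, 10) for _ in range(100_000)]

# Observe only a small part of the population...
sample = random.sample(population, 100)

# ...and use it to estimate something about the whole population.
population_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)

print(f"population mean: {population_mean:.2f}")
print(f"sample estimate: {sample_mean:.2f}")
```

The sample mean lands close to, but not exactly on, the population mean; that gap is what the rest of the section is concerned with.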
1.1.1 How to Sample
Some aspects to consider prior to collecting a data sample include:
- Sample Goal
- Population
- Selection Criteria
- Sample Size
Statistical sampling is a large field of study, but in applied machine learning there are three types of sampling that you are likely to use: simple random sampling, systematic sampling, and stratified sampling.
- Simple Random Sampling: Samples are drawn with a uniform probability from the domain.
- Systematic Sampling: Samples are drawn using a pre-specified pattern, such as at fixed intervals.
- Stratified Sampling: Samples are drawn within pre-specified categories.
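The three sampling designs can be sketched in Python as below. The population, its `group` field, and the helper function names are hypothetical, and the stratified version uses proportional allocation, which is one common choice rather than the only one.

```python
import random

random.seed(0)

# Hypothetical population of 1,000 records; one in four belongs to group "A".
population = [{"id": i, "group": "A" if i % 4 == 0 else "B"} for i in range(1000)]


def simple_random_sample(data, n):
    # Every record has the same probability of being selected.
    return random.sample(data, n)


def systematic_sample(data, n):
    # Pre-specified pattern: a random starting point, then every k-th record.
    k = len(data) // n
    start = random.randrange(k)
    return data[start::k][:n]


def stratified_sample(data, n, key):
    # Group records into strata by the given key.
    strata = {}
    for row in data:
        strata.setdefault(row[key], []).append(row)
    # Draw within each stratum in proportion to its share of the population
    # (rounding can make the total differ slightly from n in general).
    sample = []
    for rows in strata.values():
        share = round(n * len(rows) / len(data))
        sample.extend(random.sample(rows, share))
    return sample


srs = simple_random_sample(population, 100)
sys_s = systematic_sample(population, 100)
strat = stratified_sample(population, 100, "group")
print(len(srs), len(sys_s), len(strat))        # 100 100 100
print(sum(r["group"] == "A" for r in strat))   # 25
```

Stratified sampling guarantees that group "A" keeps its 25% share in every sample, whereas a simple random sample only matches that share on average.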
1.1.2 Sampling Errors
Two main types of error can occur: selection bias and sampling error.
- Selection Bias: Caused when the method of drawing observations skews the sample in some way.
- Sampling Error: Caused by the random nature of drawing observations, which skews the sample in some way.
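The difference between the two error types can be illustrated with a small simulation. This is a sketch under assumed conditions: a synthetic normal population, and a deliberately flawed selection method (only observations above the true mean can be drawn) standing in for selection bias.

```python
import random
import statistics

random.seed(2)

# Hypothetical population whose true mean we can compute.
population = [random.gauss(50, 10) for _ in range(50_000)]
true_mean = statistics.mean(population)

# Sampling error: repeated random samples give different estimates purely by chance.
random_estimates = [statistics.mean(random.sample(population, 50)) for _ in range(200)]

# Selection bias: a flawed drawing method that can only reach the upper half of the population.
upper_half = [x for x in population if x > true_mean]
biased_estimates = [statistics.mean(random.sample(upper_half, 50)) for _ in range(200)]

print(f"true mean:                    {true_mean:.2f}")
print(f"random samples, avg estimate: {statistics.mean(random_estimates):.2f}")
print(f"random samples, spread (sd):  {statistics.stdev(random_estimates):.2f}")
print(f"biased samples, avg estimate: {statistics.mean(biased_estimates):.2f}")
```

The random estimates scatter around the true mean, while the biased estimates are systematically too high no matter how many samples are drawn.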
1.1.3 Statistical Resampling
Statistical resampling methods are procedures that describe how to economically use available data to estimate a population parameter. Resampling methods are very easy to use, requiring little mathematical knowledge. They are easy to understand and implement compared to specialized statistical methods, which may require deep technical skill to select and interpret.
Two commonly used resampling methods that you may encounter are k-fold cross-validation and the bootstrap.
- Bootstrap. Samples are drawn from the dataset with replacement, and the instances not drawn into the data sample may be used as the test set.
- k-fold Cross-Validation. A dataset is partitioned into k groups, where each group is given the opportunity of being used as a held-out test set while the remaining groups are used for training.
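A minimal sketch of how the two methods choose training and test indices, using only the Python standard library. The function names are hypothetical; libraries such as scikit-learn provide equivalents (for example `sklearn.model_selection.KFold` and `sklearn.utils.resample`).

```python
import random


def bootstrap_split(n, rng):
    # Draw n indices with replacement: some indices repeat, others are never drawn.
    train = [rng.randrange(n) for _ in range(n)]
    # Instances that were never drawn (out-of-bag) can serve as the test set.
    drawn = set(train)
    test = [i for i in range(n) if i not in drawn]
    return train, test


def kfold_splits(n, k, rng):
    # Shuffle the indices, then partition them into k groups of (nearly) equal size.
    idx = list(range(n))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    # Each group takes one turn as the held-out test set; the rest form the training set.
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


rng = random.Random(3)
train, test = bootstrap_split(10, rng)
print("bootstrap train:", sorted(train), "out-of-bag test:", test)
for train, test in kfold_splits(10, 5, rng):
    print("k-fold train:", sorted(train), "test:", sorted(test))
```

Note the structural difference: bootstrap training sets contain duplicates and the test set size varies from draw to draw, while k-fold uses every instance exactly once for testing.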
The k-fold cross-validation method specifically lends itself to use in the evaluation of predictive models that are repeatedly trained on one subset of the data and evaluated on a second held-out subset of the data.
Generally, resampling techniques for estimating model performance operate similarly: a subset of samples is used to fit a model and the remaining samples are used to estimate the efficacy of the model. This process is repeated multiple times, and the results are aggregated and summarized. The differences between techniques usually center on the method by which subsamples are chosen.
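The fit, evaluate, repeat, and aggregate loop can be sketched with k-fold cross-validation. To keep the example self-contained, the "model" is a hypothetical baseline that always predicts the mean of its training targets, and performance is measured as mean absolute error; any real model and metric could be substituted.

```python
import random
import statistics


def kfold_indices(n, k, seed=0):
    # Shuffle the indices and deal them into k folds.
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_mean_model(ys):
    # Hypothetical "model": always predicts the mean of its training targets.
    return statistics.mean(ys)


def mae(model, ys):
    # Mean absolute error of the constant prediction on the given targets.
    return statistics.mean(abs(y - model) for y in ys)


def cross_validate(ys, k=5):
    scores = []
    for fold in kfold_indices(len(ys), k):
        held_out = set(fold)
        train_y = [ys[i] for i in range(len(ys)) if i not in held_out]
        test_y = [ys[i] for i in fold]
        model = fit_mean_model(train_y)     # fit on one subset
        scores.append(mae(model, test_y))   # evaluate on the held-out subset
    # Aggregate and summarize the repeated estimates.
    return statistics.mean(scores), statistics.stdev(scores), scores


rng = random.Random(4)
ys = [rng.gauss(10, 2) for _ in range(100)]
mean_score, sd_score, scores = cross_validate(ys, k=5)
print(f"MAE per fold: {[round(s, 2) for s in scores]}")
print(f"MAE: {mean_score:.2f} (sd {sd_score:.2f})")
```

Reporting the standard deviation alongside the mean is what lets resampling quantify how much the performance estimate itself might vary.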
The bootstrap method can be used for the same purpose, but is a more general and simpler method intended for estimating a population parameter.
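Used on its own, the bootstrap can estimate a population parameter together with its uncertainty. The sketch below computes a percentile bootstrap confidence interval for the mean; the number of bootstrap samples (1,000), the 95% level, and the synthetic data are assumptions chosen for illustration.

```python
import random
import statistics


def bootstrap_ci(data, stat=statistics.mean, n_boot=1000, alpha=0.05, seed=0):
    rng = random.Random(seed)
    n = len(data)
    # Each bootstrap sample has the same size as the data and is drawn with replacement.
    boots = sorted(stat(rng.choices(data, k=n)) for _ in range(n_boot))
    # Percentile interval: read the alpha/2 and 1 - alpha/2 quantiles off the sorted statistics.
    lo = boots[int((alpha / 2) * n_boot)]
    hi = boots[int((1 - alpha / 2) * n_boot) - 1]
    return stat(data), lo, hi


rng = random.Random(5)
sample = [rng.gauss(50, 10) for _ in range(60)]
estimate, lo, hi = bootstrap_ci(sample)
print(f"estimate {estimate:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```

Passing a different function as `stat` (for example `statistics.median`) reuses the same procedure for another population parameter, which is part of what makes the bootstrap so general.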