Introduction to Resampling
2022-07-05 18:13:00 【Dreamer DBA】
Data sampling refers to statistical methods for selecting observations from the domain with the objective of estimating a population parameter, whereas data resampling refers to methods for economically using a collected dataset to improve the estimate of the population parameter and help quantify the uncertainty of the estimate. Both data sampling and data resampling are methods that are required in a predictive modeling problem.
- Sampling is an active process of gathering observations with the intent of estimating a population variable.
- Resampling is a methodology of economically using a data sample to improve the accuracy and quantify the uncertainty of a population parameter.
- Resampling methods make use of a nested resampling method.
1.1 Statistical Sampling
Observations made in a domain represent samples of some broader, idealized, and unknown population of all possible observations that could be made in the domain.
Sampling consists of selecting some part of the population to observe so that one may estimate something about the whole population.
1.1.1 How to Sample
Some aspects to consider prior to collecting a data sample include:
- Sample Goal
- Population
- Selection Criteria
- Sample Size
Statistical sampling is a large field of study, but in applied machine learning there are three types of sampling that you are likely to use: simple random sampling, systematic sampling, and stratified sampling.
- Simple Random Sampling: Samples are drawn with a uniform probability from the domain.
- Systematic Sampling: Samples are drawn using a pre-specified pattern, such as at intervals.
- Stratified Sampling: Samples are drawn within pre-specified categories.
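The three sampling types above can be sketched with the Python standard library. This is a minimal illustration, not a reference implementation: the function names, the toy `population` of labelled records, and the choice to draw an equal fraction from each category are all assumptions made for the example.

```python
import random
from collections import defaultdict

def simple_random_sample(population, n, seed=0):
    """Draw n items without replacement, each item equally likely."""
    rng = random.Random(seed)
    return rng.sample(population, n)

def systematic_sample(population, step, start=0):
    """Take every `step`-th item, beginning at index `start`."""
    return population[start::step]

def stratified_sample(population, key, fraction, seed=0):
    """Draw the same fraction from each category returned by `key`."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for item in population:
        strata[key(item)].append(item)
    sample = []
    for members in strata.values():
        n = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, n))
    return sample

# Hypothetical population: 100 records labelled "a" (80%) or "b" (20%).
population = [(i, "a" if i < 80 else "b") for i in range(100)]

srs = simple_random_sample(population, 10)
sys_s = systematic_sample(population, step=10)
strat = stratified_sample(population, key=lambda r: r[1], fraction=0.1)
```

Only the stratified sample is guaranteed to preserve the 80/20 split between the two categories; the simple random sample matches it only on average.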
1.1.2 Sampling Errors
The two main types of error are selection bias and sampling error.
- Selection Bias: Caused when the method of drawing observations skews the sample in some way.
- Sampling Error: Caused by the random nature of drawing observations, which skews the sample in some way.
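A small simulation may make the difference between the two errors concrete. It is a sketch under stated assumptions: the normally distributed `population`, the sample size of 100, and the deliberately biased rule of sampling only from the upper half are all invented for illustration.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical population: 10,000 values drawn once from a normal distribution.
population = [rng.gauss(50.0, 10.0) for _ in range(10_000)]
true_mean = statistics.fmean(population)

# Sampling error: unbiased random samples still miss the true mean by a
# different amount each time, purely because of chance.
random_means = [statistics.fmean(rng.sample(population, 100)) for _ in range(200)]

# Selection bias: drawing only from the largest half skews every sample
# in the same direction, no matter how often the draw is repeated.
upper_half = sorted(population)[5_000:]
biased_means = [statistics.fmean(rng.sample(upper_half, 100)) for _ in range(200)]

random_error = statistics.fmean(random_means) - true_mean
biased_error = statistics.fmean(biased_means) - true_mean
print(f"average error, random sampling: {random_error:+.2f}")
print(f"average error, biased sampling: {biased_error:+.2f}")
```

Averaged over many repetitions the sampling error shrinks toward zero, while the error from selection bias does not.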
1.2 Statistical Resampling
Statistical resampling methods are procedures that describe how to economically use available data to estimate a population parameter. Resampling methods are very easy to use, requiring little mathematical knowledge. They are easy to understand and implement compared to specialized statistical methods that may require deep technical skill to select and interpret.
Two commonly used resampling methods that you may encounter are k-fold cross-validation and the bootstrap.
- Bootstrap. Samples are drawn from the dataset with replacement, and those instances not drawn into the data sample may be used for the test set.
- k-fold Cross-Validation. A dataset is partitioned into k groups, where each group is given the opportunity to be used as a held-out test set while the remaining groups are used as the training set.
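The index bookkeeping behind both methods is short enough to sketch directly. The helper names `bootstrap_split` and `k_fold_splits` are hypothetical, and shuffling before partitioning is an assumption of this example rather than a requirement of the method.

```python
import random

def bootstrap_split(n_items, seed=0):
    """Return (drawn, out_of_bag) index lists for one bootstrap sample."""
    rng = random.Random(seed)
    # Draw n_items indices with replacement, so duplicates are expected.
    drawn = [rng.randrange(n_items) for _ in range(n_items)]
    drawn_set = set(drawn)
    # Indices that were never drawn can serve as the test set.
    out_of_bag = [i for i in range(n_items) if i not in drawn_set]
    return drawn, out_of_bag

def k_fold_splits(n_items, k, seed=0):
    """Yield (train, test) index lists; each fold is the test set exactly once."""
    rng = random.Random(seed)
    indices = list(range(n_items))
    rng.shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

drawn, oob = bootstrap_split(10)
splits = list(k_fold_splits(10, k=5))
```

In the bootstrap split the same index can appear several times in `drawn`, whereas every k-fold split uses each index exactly once, either in `train` or in `test`.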
The k-fold cross-validation method specifically lends itself to use in the evaluation of predictive models that are repeatedly trained on one subset of the data and evaluated on a second held-out subset of the data.
Generally, resampling techniques for estimating model performance operate similarly: a subset of samples is used to fit a model and the remaining samples are used to estimate the efficacy of the model. This process is repeated multiple times and the results are aggregated and summarized. The differences in techniques usually center around the method in which subsamples are chosen.
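The fit, evaluate, repeat, and aggregate loop described above might look like the following with k-fold cross-validation. Everything specific here is an assumption for the sake of a runnable sketch: the synthetic data (y = 2x + 1 plus noise), the hand-written least-squares line as the "model", and mean squared error as the score.

```python
import random
import statistics

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def mean_squared_error(model, xs, ys):
    """Average squared difference between predictions and targets."""
    slope, intercept = model
    return statistics.fmean([(slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)])

rng = random.Random(1)
# Hypothetical data: y = 2x + 1 plus Gaussian noise with standard deviation 1.
xs = [rng.uniform(0, 10) for _ in range(100)]
ys = [2 * x + 1 + rng.gauss(0, 1) for x in xs]

k = 5
indices = list(range(len(xs)))
rng.shuffle(indices)
folds = [indices[i::k] for i in range(k)]

scores = []
for i, test_idx in enumerate(folds):
    # Fit on every fold except fold i, then score on the held-out fold i.
    train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
    model = fit_line([xs[t] for t in train_idx], [ys[t] for t in train_idx])
    scores.append(mean_squared_error(model, [xs[t] for t in test_idx], [ys[t] for t in test_idx]))

# Aggregate and summarize the per-fold results.
print(f"MSE per fold: {[round(s, 2) for s in scores]}")
print(f"mean = {statistics.fmean(scores):.2f}, stdev = {statistics.stdev(scores):.2f}")
```

Reporting the spread across folds alongside the mean gives a rough sense of how much the performance estimate depends on which observations were held out.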
The bootstrap method can be used for the same purpose, but is a more general and simpler method intended for estimating a population parameter.
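As a sketch of the bootstrap used to estimate a population parameter, the example below estimates a mean and a rough 95% percentile interval. The sample itself is synthetic, and the number of resamples (1000) and the percentile method for the interval are choices made for this illustration.

```python
import random
import statistics

def bootstrap_means(sample, n_resamples=1000, seed=0):
    """Mean of each bootstrap resample (same size as sample, with replacement)."""
    rng = random.Random(seed)
    n = len(sample)
    return [statistics.fmean(rng.choices(sample, k=n)) for _ in range(n_resamples)]

rng = random.Random(7)
# Hypothetical sample of 50 observations from an unknown population.
sample = [rng.gauss(100.0, 15.0) for _ in range(50)]

means = sorted(bootstrap_means(sample))
estimate = statistics.fmean(means)
# Approximate percentile interval read off the sorted resample means.
lower = means[int(0.025 * len(means))]
upper = means[int(0.975 * len(means)) - 1]
print(f"mean = {estimate:.1f}, 95% interval = [{lower:.1f}, {upper:.1f}]")
```

The width of the interval is what quantifies the uncertainty of the estimate: a larger original sample would typically produce a narrower one.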