Introduction to Neural Networks (Part 2)

2022-07-04 07:27:00 【Uncertainty!!】

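Disclaimer: I am a beginner studying this material for the first time, and this post is my study notes. If you spot any mistakes, please point them out!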

Observation = Signal + Noise
A model should fit the signal, not the noise

What is Noise in Machine Learning?

Humans are prone to making mistakes when collecting data, and data collection instruments may be unreliable, resulting in dataset errors. The errors are referred to as noise. Data noise in machine learning can cause problems since the algorithm interprets the noise as a pattern and can start generalizing from it. -- from: What is Noise in Machine Learning

Machine learning noise detection and removal

PCA attempts to eliminate corrupted data from a signal or picture using preservative noise while maintaining the critical features -- from: What is Noise in Machine Learning

I wrote an earlier note on PCA; see: Principal Component Analysis (PCA)
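To make the idea of "Observation = Signal + Noise" concrete, here is a minimal sketch of PCA-based denoising using only NumPy. Everything in it is an assumption for illustration: the data is synthetic (points on a line in 3-D plus Gaussian noise), and the number of kept components `k` is chosen by hand because the signal is known to be one-dimensional. It is not the method used in the quoted article.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 200 points on a line in 3-D (the "signal") plus Gaussian noise.
t = rng.uniform(-1.0, 1.0, size=200)
direction = np.array([1.0, 2.0, -1.0])
signal = np.outer(t, direction)                  # shape (200, 3), rank 1
noise = 0.1 * rng.standard_normal(signal.shape)
observation = signal + noise                     # Observation = Signal + Noise

# PCA via the SVD of the centered data.
mean = observation.mean(axis=0)
centered = observation - mean
U, S, Vt = np.linalg.svd(centered, full_matrices=False)

k = 1  # number of principal components kept (assumed known for this toy data)
denoised = (U[:, :k] * S[:k]) @ Vt[:k] + mean    # project back onto the top-k subspace

err_before = float(np.mean((observation - signal) ** 2))
err_after = float(np.mean((denoised - signal) ** 2))
print(f"MSE vs. true signal before PCA: {err_before:.5f}")
print(f"MSE vs. true signal after  PCA: {err_after:.5f}")
```

Keeping only the leading component discards the noise that lies outside the signal's direction; the noise along that direction remains, so the reconstruction is cleaner but not exact.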

1.1 Overfitting

The overfitting phenomenon

In mathematical modeling, overfitting is “the production of an analysis that corresponds too closely or exactly to a particular set of data, and may therefore fail to fit to additional data or predict future observations reliably” -- from: Overfitting

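The green line represents the overfitted model (that is, a function). It fits the training data very well, but it depends on that data too heavily: when it predicts unseen data that is not in the training set, the error is large. An overfitted model lacks the ability to generalize.
The black line represents the regularized model (an improvement on the overfitted model that generalizes better).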
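The contrast between an unregularized and a regularized fit can be sketched as follows. This is only an illustration under stated assumptions: the data is synthetic, the model is a degree-9 polynomial, and the regularizer is an L2 (ridge) penalty with a hand-picked strength `lam`. The source does not say which kind of regularization produced the black line.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical training data: a smooth signal plus noise.
x_train = np.linspace(-1.0, 1.0, 15)
y_train = np.sin(np.pi * x_train) + 0.2 * rng.standard_normal(x_train.size)

# Unseen inputs with noise-free targets, used only for evaluation.
x_new = np.linspace(-0.95, 0.95, 200)
y_new = np.sin(np.pi * x_new)

DEGREE = 9  # deliberately flexible for only 15 points


def features(x):
    """Polynomial features 1, x, x^2, ..., x^DEGREE."""
    return np.vander(x, DEGREE + 1, increasing=True)


def fit(x, y, lam):
    """Minimize ||A w - y||^2 + lam * ||w||^2 (lam = 0 means no regularization)."""
    A = features(x)
    # Ridge regression written as an ordinary least-squares problem on augmented data.
    A_aug = np.vstack([A, np.sqrt(lam) * np.eye(A.shape[1])])
    y_aug = np.concatenate([y, np.zeros(A.shape[1])])
    w, *_ = np.linalg.lstsq(A_aug, y_aug, rcond=None)
    return w


def mse(w, x, y):
    return float(np.mean((features(x) @ w - y) ** 2))


w_plain = fit(x_train, y_train, lam=0.0)    # flexible, unregularized fit
w_ridge = fit(x_train, y_train, lam=1e-2)   # same model with an L2 penalty

for name, w in [("unregularized", w_plain), ("ridge", w_ridge)]:
    print(f"{name:>13}: train MSE = {mse(w, x_train, y_train):.4f}, "
          f"new-data MSE = {mse(w, x_new, y_new):.4f}, "
          f"|w| = {np.linalg.norm(w):.2f}")
```

The unregularized fit always reaches a training error at least as low as the ridge fit, and the penalty always shrinks the weight vector. Whether the error on new data improves depends on the data and on `lam`, which is why held-out data is needed to judge the result.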

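Add one more group of data to make the overfitting more obvious

After adding the data, we retrain the model
and plot the result with the new data included

The newly trained model (a surface) is shown below
The black points are the training set data

We find that some of the predictions no longer match reality; this is caused by the model overfitting
As the figure below shows, when Hours Sleep is fixed at some value, TestScore first decreases and then increases as Hours Study grows, which clearly does not match reality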

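1.2 Detecting overfitting

How do we detect whether a model is overfitting?
First, we split the dataset into a training set and a test set
1. Training set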

Your training data is a subset of your dataset that you use to teach a machine learning model to recognize patterns or perform your criteria. -- from: What is Training Data?

2. Test set

Once your machine learning model is built (with your training data), you need unseen data to test your model. This data is called testing data, and you can use it to evaluate the performance and progress of your algorithms’ training and adjust or optimize it for improved results. -- from: What is Testing Data?

Testing data has two main criteria. It should:
1. Represent the actual dataset
2. Be large enough to generate meaningful predictions

Further reading: contrastive dataset

Assume you need to clean a noisy dataset that includes big background patterns as noise that a data scientist isn’t interested in. Then, using an adaptive noise cancellation approach, this method offers a solution by eliminating the noisy signal. This technique employs two signals: one is the target signal, and the other is a noise-free background signal. -- from: What is Noise in Machine Learning

Fourier transform

Researches have already shown that our signal or data has a structure, we can remove noise from it directly. The Fourier Transform of the signal is used to translate the signal into the frequency domain in this process. -- from: What is Noise in Machine Learning

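The Fourier transform is often used to move a signal into the frequency domain, where the components that correspond to a particular noise source can be removed

I wrote an earlier note on the Fourier transform; see: Fourier Series, Fourier Transform, and Spectrum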
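Removing a specific noise component in the frequency domain can be sketched with NumPy's FFT routines. The setup is assumed for illustration: a 5 Hz sine as the signal, a 60 Hz sine as the interference, a 512 Hz sampling rate, and a cutoff of 20 Hz picked by hand. Real noise is rarely this cleanly separated from the signal.

```python
import numpy as np

n = 512
t = np.arange(n) / n                          # one second sampled at 512 Hz (assumed)
signal = np.sin(2 * np.pi * 5 * t)            # hypothetical 5 Hz signal
noise = 0.5 * np.sin(2 * np.pi * 60 * t)      # hypothetical 60 Hz interference
observation = signal + noise

# Move to the frequency domain (real-input FFT).
spectrum = np.fft.rfft(observation)
freqs = np.fft.rfftfreq(n, d=1.0 / n)         # bin frequencies in Hz

# Zero every component above the hand-picked cutoff, then transform back.
cutoff_hz = 20.0
spectrum[freqs > cutoff_hz] = 0
denoised = np.fft.irfft(spectrum, n=n)

print("max error before filtering:", float(np.max(np.abs(observation - signal))))
print("max error after  filtering:", float(np.max(np.abs(denoised - signal))))
```

Because both sines fall exactly on FFT bins here, zeroing the 60 Hz bin recovers the signal up to floating-point error. With broadband noise, a low-pass cutoff like this only reduces the noise rather than eliminating it.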

The figure below is from LaTeX工作室

Original dataset

Training set and test set

We use the test set to detect overfitting
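A minimal sketch of this check, assuming synthetic data and polynomial models fitted with `np.polyfit`: hold out part of the dataset, fit on the rest, and compare the error on the two parts. The 75/25 split ratio and the two degrees are arbitrary choices for illustration, not values from the original experiment.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dataset: 40 noisy samples of a smooth function.
x = rng.uniform(-1.0, 1.0, size=40)
y = np.sin(np.pi * x) + 0.2 * rng.standard_normal(x.size)

# Shuffle the indices, then hold out 25% of the samples as the test set.
idx = rng.permutation(x.size)
n_test = x.size // 4
test_idx, train_idx = idx[:n_test], idx[n_test:]
x_train, y_train = x[train_idx], y[train_idx]
x_test, y_test = x[test_idx], y[test_idx]


def mse(coeffs, xs, ys):
    """Mean squared error of a polynomial (np.polyfit coefficient order)."""
    return float(np.mean((np.polyval(coeffs, xs) - ys) ** 2))


results = {}
for degree in (3, 10):
    coeffs = np.polyfit(x_train, y_train, degree)   # fit on the training set only
    results[degree] = (mse(coeffs, x_train, y_train), mse(coeffs, x_test, y_test))
    train_err, test_err = results[degree]
    print(f"degree {degree:2d}: train MSE = {train_err:.4f}, test MSE = {test_err:.4f}")
```

A training error that keeps falling as the model gets more flexible while the test error stays flat or rises is the usual sign of overfitting. The more flexible model can never do worse on the training set, so the training error alone cannot reveal the problem.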