Naive Bayes--Study Notes--Basic Principles and Code Implementation

2022-08-01 09:23:00 【Miracle Fan】

Naive Bayes

Overview

Naive Bayes first uses the training set to estimate two distributions: the prior probability distribution and the conditional probability distribution. Both are obtained by estimating the population from the sample (maximum likelihood estimation), so the frequencies observed in the training set serve as an approximation of the true distributions. That is, we compute:

Prior probability distribution: the probability of each class label, i.e. the proportion of training samples that belong to each class.
Conditional probability distribution: the probability of each feature value occurring, given that the label is known.

Once these two probabilities are known, Bayes' theorem
$P(A|B)=\frac{P(AB)}{P(B)}=\frac{P(B|A)P(A)}{P(B)}$
gives the probability of each label once the sample's features are known, and hence the predicted label.
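As a small numeric illustration of Bayes' theorem, the sketch below computes $P(A|B)$ from $P(A)$, $P(B|A)$ and $P(B|\bar{A})$. The function name `bayes_posterior` and the numbers are made up for illustration only.

```python
def bayes_posterior(prior_a, p_b_given_a, p_b_given_not_a):
    """Return P(A|B) given P(A), P(B|A) and P(B|not A)."""
    # Law of total probability: P(B) = P(B|A)P(A) + P(B|not A)P(not A)
    evidence = p_b_given_a * prior_a + p_b_given_not_a * (1 - prior_a)
    # Bayes' theorem: P(A|B) = P(B|A)P(A) / P(B)
    return p_b_given_a * prior_a / evidence


# Hypothetical numbers: P(A)=0.3, P(B|A)=0.8, P(B|not A)=0.1
posterior = bayes_posterior(0.3, 0.8, 0.1)
print(posterior)
```

Here the numerator is 0.24 and the evidence is 0.31, so observing B raises the probability of A from 0.3 to roughly 0.77.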

Basic method

Prior probability distribution:
$P(Y=c_k),\quad k=1,2,\cdots ,K$
Conditional probability distribution: this is where the "naive" in naive Bayes comes from. The assumption is that the different features of a sample are conditionally independent of each other given the class:
$\begin{aligned} P\left(X=x \mid Y=c_{k}\right) &=P\left(X^{(1)}=x^{(1)}, \cdots, X^{(n)}=x^{(n)} \mid Y=c_{k}\right) \\ &=\prod_{j=1}^{n} P\left(X^{(j)}=x^{(j)} \mid Y=c_{k}\right) \end{aligned}$
The joint distribution $P (X, Y)$ then follows from the multiplication rule, and Bayes' theorem gives the posterior probability distribution. In other words, given the feature values of a sample, we obtain the probability that it belongs to class $c_k$, which is what makes prediction possible.

$\begin{aligned} P\left(Y=c_{k} \mid X=x\right)&= \frac{P\left(X=x \mid Y=c_{k}\right) P\left(Y=c_{k}\right)}{P(X=x)} \\&=\frac{P\left(X=x \mid Y=c_{k}\right) P\left(Y=c_{k}\right)}{\sum_{k} P\left(X=x \mid Y=c_{k}\right) P\left(Y=c_{k}\right)} \\ &=\frac{P\left(Y=c_{k}\right) \prod_{j} P\left(X^{(j)}=x^{(j)} \mid Y=c_{k}\right)}{\sum_{k} P\left(Y=c_{k}\right) \prod_{j} P\left(X^{(j)}=x^{(j)} \mid Y=c_{k}\right)} \end{aligned}$

Maximum likelihood estimation

For the population as a whole we generally know neither the distribution of the classes nor, within a given class, the distribution of each feature value. Maximum likelihood estimation lets us estimate these population probabilities from the sample.

Maximum likelihood estimate of the prior probability:
$P(Y=c_k)=\frac{\sum_{i=1}^{N}I(y_i=c_k)}{N},\quad k=1,2,\cdots ,K$
Maximum likelihood estimate of the conditional probability:
$P\left(X^{(j)}=a_{j l} \mid Y=c_{k}\right)=\frac{\sum_{i=1}^{N} I\left(x_{i}^{(j)}=a_{j l}, y_{i}=c_{k}\right)}{\sum_{i=1}^{N} I\left(y_{i}=c_{k}\right)}, \quad j=1,2, \cdots, n ; \quad l=1,2, \cdots, S_{j} ; \quad k=1,2, \cdots, K$
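The two estimates above are just relative frequencies, which can be sketched in a few lines of plain Python. The helper name `fit_mle` and the toy data are assumptions made for illustration, not part of the original notes.

```python
from collections import Counter, defaultdict


def fit_mle(X, y):
    """Estimate P(Y=c_k) and P(X^(j)=a | Y=c_k) by relative frequencies."""
    n_samples = len(y)
    class_counts = Counter(y)
    prior = {c: count / n_samples for c, count in class_counts.items()}

    # feature_counts[(j, c)][a] = number of samples of class c with x^(j) == a
    feature_counts = defaultdict(Counter)
    for features, label in zip(X, y):
        for j, value in enumerate(features):
            feature_counts[(j, label)][value] += 1

    # Divide each count by the number of samples in the corresponding class
    conditional = {
        (j, c): {a: count / class_counts[c] for a, count in counter.items()}
        for (j, c), counter in feature_counts.items()
    }
    return prior, conditional


# Hypothetical toy data: two categorical features, labels in {-1, 1}
X = [(1, "S"), (1, "M"), (2, "M"), (2, "S"), (3, "L"), (3, "L")]
y = [-1, -1, 1, -1, 1, 1]
prior, conditional = fit_mle(X, y)
print(prior)
print(conditional[(0, -1)])
```

Values that never occur together with a class are simply absent from the inner dictionaries, which corresponds to an estimated probability of 0.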

Naive Bayes algorithm

Input: training data $T=\left\{\left(x_{1}, y_{1}\right),\left(x_{2}, y_{2}\right), \cdots,\left(x_{N}, y_{N}\right)\right\}$ , where $x_{i}=\left(x_{i}^{(1)}, x_{i}^{(2)}, \cdots, x_{i}^{(n)}\right)^{T}$ , $x_i^{(j)}$ is the $j$-th feature of the $i$-th sample, $x_i^{(j)}\in\{a_{j1},a_{j2},\dots,a_{jS_j}\}$ , $a_{jl}$ is the $l$-th value that the $j$-th feature can take, $j=1,2, \cdots, n$ , $l=1,2, \cdots,S_{j}$ , $y_{i} \in\{c_{1},c_2, \cdots , c_{K}\}$ ; an instance $x$ ;
Output: the class of instance $x$ .

Compute the prior and conditional probabilities

$\begin{array}{l} P\left(Y=c_{k}\right)=\frac{\sum_{i=1}^{N} I\left(y_{i}=c_{k}\right)}{N}, \quad k=1,2, \cdots, K \\ P\left(X^{(j)}=a_{j l} \mid Y=c_{k}\right)=\frac{\sum_{i=1}^{N} I\left(x_{i}^{(j)}=a_{j l}, y_{i}=c_{k}\right)}{\sum_{i=1}^{N} I\left(y_{i}=c_{k}\right)} \\ j=1,2, \cdots, n ; \quad l=1,2, \cdots, S_{j} ; \quad k=1,2, \cdots, K \end{array}$

For a given instance $x=\left(x^{(1)}, x^{(2)}, \cdots, x^{(n)}\right)^{\mathrm{T}}$ , the denominator of the posterior derived above sums the product of prior and conditional probabilities over all classes, so it is the same for every class $c_k$ and acts as a known constant. It is therefore enough to compute the numerator for the final prediction:

$P\left(Y=c_{k}\right) \prod_{j=1}^{n} P\left(X^{(j)}=x^{(j)} \mid Y=c_{k}\right), \quad k=1,2, \cdots, K$

Determine the class of instance $x$ : evaluate this quantity for every class using the known feature values, then assign $x$ to the class with the largest value.

$y=\arg \max _{c_{k}} P\left(Y=c_{k}\right) \prod_{j=1}^{n} P\left(X^{(j)}=x^{(j)} \mid Y=c_{k}\right)$
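The decision rule can be sketched as follows. The probability tables are hard-coded, hypothetical values rather than estimates from real data, and `predict` is an illustrative helper name.

```python
def predict(x, prior, conditional):
    """Return the arg-max class and the unnormalized score of every class."""
    scores = {}
    for c, p_c in prior.items():
        score = p_c
        for j, value in enumerate(x):
            # A value never seen with class c has probability 0 under plain MLE
            score *= conditional[(j, c)].get(value, 0.0)
        scores[c] = score
    return max(scores, key=scores.get), scores


# Hypothetical tables: prior[c] = P(Y=c), conditional[(j, c)][a] = P(X^(j)=a | Y=c)
prior = {-1: 0.4, 1: 0.6}
conditional = {
    (0, -1): {1: 0.5, 2: 0.5},
    (0, 1): {1: 0.25, 2: 0.75},
    (1, -1): {"S": 0.8, "M": 0.2},
    (1, 1): {"S": 0.1, "M": 0.9},
}
label, scores = predict((2, "S"), prior, conditional)
print(label, scores)
```

For the instance `(2, "S")` the scores are 0.4 × 0.5 × 0.8 = 0.16 for class -1 and 0.6 × 0.75 × 0.1 = 0.045 for class 1, so class -1 is chosen even though class 1 has the larger prior.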

Probability density function

Bernoulli model

For boolean features (true and false, or 1 and 0), use the Bernoulli model.

If the feature value is 1, then $P\left(x_{i} \mid y_{k}\right)=P\left(x_{i}=1 \mid y_{k}\right)$

If the feature value is 0, then $P\left(x_{i} \mid y_{k}\right)=1-P\left(x_{i}=1 \mid y_{k}\right)$
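A minimal sketch of the Bernoulli likelihood for one class: each feature contributes $P(x_i=1 \mid y_k)$ when it is 1 and $1-P(x_i=1 \mid y_k)$ when it is 0. The function name and the probabilities are illustrative assumptions.

```python
def bernoulli_likelihood(x, p_one):
    """P(x | y_k) for a binary vector x, where p_one[i] = P(x_i = 1 | y_k)."""
    likelihood = 1.0
    for x_i, p in zip(x, p_one):
        # Feature present: use p; feature absent: use 1 - p
        likelihood *= p if x_i == 1 else 1.0 - p
    return likelihood


# Hypothetical per-feature probabilities for a single class
value = bernoulli_likelihood([1, 0, 1], [0.9, 0.2, 0.5])
print(value)
```

With these numbers the likelihood is 0.9 × (1 - 0.2) × 0.5 = 0.36.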

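Bayesian estimation: smoothing

With maximum likelihood estimation, a probability that needs to be estimated may come out as 0. This affects the computed posterior probability and biases the classification. The remedy is to use Bayesian estimation. Specifically, the Bayesian estimate of the conditional probability is
$P_{\lambda}\left(X^{(j)}=a_{j l} \mid Y=c_{k}\right)=\frac{\sum_{i=1}^{N} I\left(x_{i}^{(j)}=a_{j l}, y_{i}=c_{k}\right)+\lambda}{\sum_{i=1}^{N} I\left(y_{i}=c_{k}\right)+S_{j} \lambda}$
where $\lambda \geqslant 0$ . This is equivalent to adding a positive number $\lambda>0$ to the count of each value of the random variable. When $\lambda=0$ it reduces to maximum likelihood estimation. A common choice is $\lambda=1$ , which is called Laplace smoothing. Clearly, for any $l=1,2, \cdots, S_{j}$ , $k=1,2, \cdots, K$ , we have
$\begin{array}{l} P_{\lambda}\left(X^{(j)}=a_{j l} \mid Y=c_{k}\right)>0 \\ \sum_{l=1}^{S_{j}} P_{\lambda}\left(X^{(j)}=a_{j l} \mid Y=c_{k}\right)=1 \end{array}$
Similarly, the Bayesian estimate of the prior probability is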
$P_{\lambda}\left(Y=c_{k}\right)=\frac{\sum_{i=1}^{N} I\left(y_{i}=c_{k}\right)+\lambda}{N+K \lambda}$
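The smoothed estimates translate directly into code. The sketch below uses made-up counts to show that a value never observed with a class still gets a positive probability, and that the smoothed conditional probabilities still sum to 1. The function names are illustrative.

```python
def smoothed_conditional(count_value_and_class, count_class, n_values, lam=1.0):
    """Bayesian estimate of P(X^(j)=a_jl | Y=c_k); n_values is S_j."""
    return (count_value_and_class + lam) / (count_class + n_values * lam)


def smoothed_prior(count_class, n_samples, n_classes, lam=1.0):
    """Bayesian estimate of P(Y=c_k); n_classes is K."""
    return (count_class + lam) / (n_samples + n_classes * lam)


# Hypothetical counts: a feature with S_j = 3 values, observed 0, 4 and 5 times
# among the 9 samples of one class; N = 15 samples and K = 2 classes overall
counts = [0, 4, 5]
probs = [smoothed_conditional(c, 9, 3) for c in counts]
print(probs)
print(smoothed_prior(9, 15, 2))
```

With $\lambda=1$ the three conditional probabilities are 1/12, 5/12 and 6/12, and the prior is 10/17. Setting `lam=0` recovers the maximum likelihood estimates.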

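GaussianNB: Gaussian naive Bayes

The likelihood of each feature is assumed to be Gaussian.

Probability density function:
$P(x_i | y_k)=\frac{1}{\sqrt{2\pi\sigma^2_{yk}}}\exp\left(-\frac{(x_i-\mu_{yk})^2}{2\sigma^2_{yk}}\right)$

Mean (mathematical expectation): $\mu$

Variance: $\sigma^2=\frac{\sum(X-\mu)^2}{N}$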
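As a quick check of these formulas, the sketch below estimates the mean and the population variance (divisor $N$, matching the formula above) for one feature of one class using only the standard library, then evaluates the density. The sample values are hypothetical.

```python
import math
from statistics import fmean, pvariance


def gaussian_pdf(x, mu, var):
    """Gaussian density with mean mu and variance var, evaluated at x."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


# Hypothetical values of one feature within a single class
values = [4.0, 5.0, 6.0]
mu = fmean(values)            # sample mean
var = pvariance(values, mu)   # population variance: sum((x - mu)^2) / N
density = gaussian_pdf(5.0, mu, var)
print(mu, var, density)
```

Note that `pvariance` divides by $N$, whereas `statistics.variance` divides by $N-1$. NumPy's `np.std` also divides by $N$ by default.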

Code implementation:

import math

import numpy as np


class NaiveBayes:
    def __init__(self):
        # Maps each label to an array of per-feature (mean, std) rows
        self.model = None

    def summarize(self, train_data):
        # Return an (n_features, 2) array holding the mean and std of each feature
        train_data = np.array(train_data)
        mean = np.mean(train_data, axis=0)
        std = np.std(train_data, axis=0)
        summaries = np.stack((mean, std), axis=1)
        return summaries

    def fit(self, X, y):
        # Group the training samples by label, then summarize each group
        labels = list(set(y))
        data = {label: [] for label in labels}
        for f, label in zip(X, y):
            data[label].append(f)
        self.model = {label: self.summarize(value) for label, value in data.items()}
        return 'gaussianNB train done!'

    # Gaussian probability density function
    def gaussian_probability(self, x, mean, stdev):
        exponent = math.exp(-(math.pow(x - mean, 2) / (2 * math.pow(stdev, 2))))
        prod = (1 / (math.sqrt(2 * math.pi) * stdev)) * exponent
        return prod

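    def gaussian_probability_np(self, x, summarize):
        # Vectorized version: reshape x into a column so it lines up with the
        # (n_features, 1) mean and std columns split from the summaries
        x = np.array(x)
        x = x.reshape(x.shape[0], 1)
        mean, std = np.hsplit(summarize, indices_or_sections=2)
        exponent = np.exp(-((x - mean) ** 2 / (2 * (std ** 2))))
        prod = (1 / (np.sqrt(2 * np.pi) * std)) * exponent
        # Multiply the per-feature densities together (naive independence assumption)
        prod = np.prod(prod, axis=0)
        return prod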

    # Compute the likelihood of input_data under every label
    def calculate_probabilities_np(self, input_data):
        probabilities = {}
        for label, value in self.model.items():
            # Start from 1; note that the class prior P(Y=c_k) is not multiplied in here
            probabilities[label] = 1
            # Product of the Gaussian densities over all features
            probabilities[label] *= self.gaussian_probability_np(input_data, value)
        return probabilities

    def calculate_probabilities(self, input_data):
        probabilities = {}
        for label, value in self.model.items():
            # Start from 1, then multiply in one Gaussian density per feature
            probabilities[label] = 1
            for i in range(len(value)):
                mean, stdev = value[i]
                probabilities[label] *= self.gaussian_probability(input_data[i], mean, stdev)
        print('math:', probabilities)
        return probabilities

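    # Predict the class label
    def predict(self, X_test):
        # Example scores: {0.0: 2.9680340789325763e-27, 1.0: 3.5749783019849535e-26}
        # Sort the (label, score) pairs by score and keep the label with the largest one
        label = sorted(
            self.calculate_probabilities_np(X_test).items(),
            key=lambda x: x[-1])
        label = label[-1][0]
        return label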

    def score(self, X_test, y_test):
        right = 0
        for X, y in zip(X_test, y_test):
            label = self.predict(X)
            if label == y:
                right += 1
        return right / float(len(X_test))
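The example scores in `predict` are on the order of 1e-27, and products of many small densities can underflow as the number of features grows. One common workaround, sketched here as an optional variant rather than as part of the class above, is to sum log-densities instead. The model dictionary below is hypothetical but follows the same label to (mean, std) layout as `NaiveBayes.model`, and like the class it scores only the likelihood term.

```python
import math


def gaussian_log_pdf(x, mean, stdev):
    """Logarithm of the Gaussian density with the given mean and std."""
    return -0.5 * math.log(2 * math.pi * stdev ** 2) - (x - mean) ** 2 / (2 * stdev ** 2)


def log_scores(model, x):
    """Sum of per-feature log-densities for every label in the model."""
    return {
        label: sum(gaussian_log_pdf(x_i, m, s) for x_i, (m, s) in zip(x, summaries))
        for label, summaries in model.items()
    }


# Hypothetical model: label -> list of (mean, std), one pair per feature
model = {0.0: [(5.0, 0.4), (1.5, 0.2)], 1.0: [(6.0, 0.5), (4.5, 0.5)]}
scores = log_scores(model, [5.1, 1.4])
best = max(scores, key=scores.get)
print(best, scores)
```

Because the logarithm is monotonic, the arg-max over log-scores picks the same label as the arg-max over the products.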

[Figure: the final probabilities obtained when classifying the dataset with the naive Bayes classifier]

Worked example

1. Iris dataset

import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
import math


# data
def create_data():
    iris = load_iris()
    df = pd.DataFrame(iris.data, columns=iris.feature_names)
    df['label'] = iris.target
    df.columns = ['sepal length', 'sepal width', 'petal length', 'petal width', 'label']
    data = np.array(df.iloc[:, :])

    return data[:, :-1], data[:, -1]


X, y = create_data()
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
model = NaiveBayes()
model.fit(X_train, y_train)
score=model.score(X_test, y_test)
print(model.predict([4.4, 3.2, 1.3, 0.2]))
print(score)