Calculating Weights with the Entropy Method
2022-07-03 10:36:00 【Also far away】
I. Principle of the Entropy Method
1. Select the data

Suppose there are $m$ samples and $n$ indicators, and let $X_{ij}$ denote the value of the $j$-th indicator for the $i$-th sample, where $i = 1, 2, \ldots, m$ and $j = 1, 2, \ldots, n$.
2. Standardize the data

When the indicators have different measurement units or directions, the data must be standardized first. To avoid taking the logarithm of zero when the entropy is computed later, a small positive constant can be added to each standardized value (for example 0.001; the code below uses 0.01).
(1) For positive indicators (the larger the better):

$$X'_{ij} = \frac{X_{ij} - Min(X_{ij})}{Max(X_{ij}) - Min(X_{ij})}$$

(2) For negative indicators (the smaller the better):

$$X'_{ij} = \frac{Max(X_{ij}) - X_{ij}}{Max(X_{ij}) - Min(X_{ij})}$$
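A minimal NumPy sketch of both directions, assuming a toy matrix `X` whose rows are samples and whose columns are indicators (the values are taken from the table in Section II, and the `+ 0.01` shift matches the complete code below):

```python
import numpy as np

# Toy matrix: 3 samples x 2 indicators (first two columns of the grape table)
X = np.array([[2027.96, 101.22],
              [2128.82,  64.43],
              [8397.28, 108.07]])

col_min, col_max = X.min(axis=0), X.max(axis=0)
X_pos = (X - col_min) / (col_max - col_min) + 0.01  # positive indicators: larger is better
X_neg = (col_max - X) / (col_max - col_min) + 0.01  # negative indicators: smaller is better
```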
3. Compute the proportion of sample $i$ under indicator $j$

$$P_{ij} = \frac{X'_{ij}}{\sum_{i=1}^{m} X'_{ij}}$$

where the sum runs over the $m$ samples and $X'_{ij}$ are the standardized values.
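Continuing the sketch above, the proportion matrix is one line in NumPy:

```python
P = X_pos / X_pos.sum(axis=0)  # P[i, j]: proportion of sample i under indicator j
```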
4. Compute the entropy of indicator $j$

$$e_j = -\frac{1}{\ln(m)} \sum_{i=1}^{m} P_{ij} \ln(P_{ij})$$

where $m$ is the number of samples.
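With `P` from the previous step, the entropy of every indicator can be computed at once (continuing the toy example):

```python
m = P.shape[0]                                # number of samples
e = -(P * np.log(P)).sum(axis=0) / np.log(m)  # entropy of each indicator
```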
5. Compute the difference coefficient of indicator $j$

The information utility value of an indicator is the difference between 1 and its information entropy, and it directly determines the size of the weight: the larger the information utility value, the more important the indicator is to the evaluation and the larger its weight.

$$d_j = 1 - e_j$$
6. Compute the weight of each evaluation indicator

The entropy method estimates each indicator's weight from its difference coefficient: the larger the difference coefficient, the more important the indicator to the evaluation (equivalently, the larger its weight and its contribution to the result). The weight of indicator $j$ is

$$w_j = \frac{d_j}{\sum_{j=1}^{n} d_j}$$

where the sum runs over the $n$ indicators.
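On the toy example, steps 5 and 6 reduce to two lines:

```python
d = 1 - e        # difference (information utility) coefficient of each indicator
w = d / d.sum()  # weights of the indicators, normalized to sum to 1
```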
7. Compute the composite score of each sample

$$z_i = \sum_{j=1}^{n} w_j X'_{ij}$$

using the standardized values $X'_{ij}$.
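Applied to the toy example, the composite scores are a single matrix-vector product of the standardized values with the weights:

```python
z = X_pos @ w  # composite score of each sample
```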
II. Test Case
1. The data set for this case is a subset of the data from Problem A of the 2012 China Undergraduate Mathematical Contest in Modeling.
| Sample | Total amino acids | Aspartic acid | Threonine | Serine | Glutamate | Proline |
|---|---|---|---|---|---|---|
| Grape sample 1 | 2027.96 | 101.22 | 393.42 | 77.61 | 266.6 | 723.88 |
| Grape sample 2 | 2128.82 | 64.43 | 140.62 | 71.94 | 39.26 | 1560.97 |
| Grape sample 3 | 8397.28 | 108.07 | 222.35 | 173.08 | 67.54 | 7472.28 |
| Grape sample 4 | 2144.68 | 79.39 | 133.83 | 158.74 | 156.72 | 1182.23 |
| Grape sample 5 | 1844 | 52.28 | 145.09 | 164.05 | 102.43 | 816.08 |
| Grape sample 6 | 3434.17 | 68.01 | 102.42 | 75.78 | 80.6 | 2932.76 |
| Grape sample 7 | 2391.16 | 65.1 | 267.76 | 239.2 | 208.97 | 1096.28 |
| Grape sample 8 | 1950.76 | 72.09 | 345.87 | 44.23 | 176.02 | 962.01 |
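For convenience, here is a hypothetical inline construction of this table as a pandas DataFrame; it can replace the `pd.read_csv('redputao.csv', ...)` line in the complete code below if the CSV file is not at hand:

```python
import pandas as pd

data = pd.DataFrame(
    {
        "Total amino acids": [2027.96, 2128.82, 8397.28, 2144.68, 1844.00, 3434.17, 2391.16, 1950.76],
        "Aspartic acid":     [101.22, 64.43, 108.07, 79.39, 52.28, 68.01, 65.10, 72.09],
        "Threonine":         [393.42, 140.62, 222.35, 133.83, 145.09, 102.42, 267.76, 345.87],
        "Serine":            [77.61, 71.94, 173.08, 158.74, 164.05, 75.78, 239.20, 44.23],
        "Glutamate":         [266.60, 39.26, 67.54, 156.72, 102.43, 80.60, 208.97, 176.02],
        "Proline":           [723.88, 1560.97, 7472.28, 1182.23, 816.08, 2932.76, 1096.28, 962.01],
    },
    index=[f"Grape sample {i}" for i in range(1, 9)],
)
```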
2. Complete code
import numpy as np
import pandas as pd

# Read the data: rows are samples, columns are indicators
data = pd.read_csv('redputao.csv', encoding='utf-8', index_col=0)
indicator = data.columns.tolist()  # indicator names (the columns)
project = data.index.tolist()      # sample names (the rows)
value = data.values
print(indicator)
print(project)
print(value)

# Data standardization. To avoid the undefined logarithm of zero when the
# entropy is computed, every standardized value is shifted by a small
# positive constant (0.01 here).
def std_data(value, flag):
    for i in range(len(indicator)):
        col = value[:, i]
        if flag[i] == '+':    # positive indicator: the larger the better
            value[:, i] = (col - col.min()) / (col.max() - col.min()) + 0.01
        elif flag[i] == '-':  # negative indicator: the smaller the better
            value[:, i] = (col.max() - col) / (col.max() - col.min()) + 0.01
    return value
# Entropy method: compute the weight of each indicator
def cal_weight(indicator, project, value):
    p = np.zeros((len(project), len(indicator)))
    for i in range(len(indicator)):
        # proportion of each sample under indicator i
        p[:, i] = value[:, i] / np.sum(value[:, i], axis=0)
    e = -1 / np.log(len(project)) * np.sum(p * np.log(p), axis=0)  # entropy of each indicator
    g = 1 - e          # difference (information utility) coefficient
    w = g / np.sum(g)  # weight of each indicator
    return w
# Direction of each indicator: '+' for positive, '-' for negative;
# all six indicators in this data set are treated as positive
flag = ["+"] * len(indicator)
# Standardize the data according to each indicator's direction
std_value = std_data(value, flag)
# Compute the weight of each indicator
w = cal_weight(indicator, project, std_value)
w = pd.DataFrame(w, index=data.columns, columns=['weight'])
print(w)
# Composite score: standardized values times the weights, rounded to four decimal places
score = np.dot(std_value, w).round(4)
score = pd.DataFrame(score, index=data.index, columns=['score']).sort_values(by=['score'], ascending=False)
print(score)
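A note on the `+ 0.01` shift in `std_data`: after min-max standardization the smallest value in each column is exactly 0, and since `np.log(0)` is `-inf`, the product `0 * np.log(0)` evaluates to NaN, so every value is shifted by a small positive constant before the proportions are formed. The shift slightly perturbs the proportions and therefore the weights; a smaller constant, such as the 0.001 mentioned in Section I, keeps that distortion smaller.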
III. Running Results

*Running results (shown as a screenshot in the original post).*
IV. Test Data

1. Copy the table above directly into Excel, or
2. use the CSV download link (extraction code: 1234).