Calculating Weights with the Entropy Method
2022-07-03 10:36:00 【Also far away】
I. Principle of the Entropy Method
1. Select the data

Suppose there are $m$ samples and $n$ indicators, and let $X_{ij}$ denote the value of the $j$-th indicator for the $i$-th sample, where $i = 1, 2, \ldots, m$ and $j = 1, 2, \ldots, n$.
2. Standardize the data

When the indicators have different measurement units or directions, the data must be standardized first. To avoid taking the logarithm of zero when the entropy is computed later, a small positive constant can be added to each standardized value (for example 0.001; the code below uses 0.01).
(1) For positive indicators (the larger the better):

$$X'_{ij} = \frac{X_{ij} - Min(X_{ij})}{Max(X_{ij}) - Min(X_{ij})}$$

(2) For negative indicators (the smaller the better):

$$X'_{ij} = \frac{Max(X_{ij}) - X_{ij}}{Max(X_{ij}) - Min(X_{ij})}$$
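A minimal NumPy sketch of both directions, assuming a toy matrix `X` whose rows are samples and whose columns are indicators (the values are taken from the table in Section II, and the `+ 0.01` shift matches the complete code below):

```python
import numpy as np

# Toy matrix: 3 samples x 2 indicators (first two columns of the grape table)
X = np.array([[2027.96, 101.22],
              [2128.82,  64.43],
              [8397.28, 108.07]])

col_min, col_max = X.min(axis=0), X.max(axis=0)
X_pos = (X - col_min) / (col_max - col_min) + 0.01  # positive indicators: larger is better
X_neg = (col_max - X) / (col_max - col_min) + 0.01  # negative indicators: smaller is better
```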
3. Compute the proportion of sample $i$ under indicator $j$

$$P_{ij} = \frac{X'_{ij}}{\sum_{i=1}^{m} X'_{ij}}$$

where the sum runs over the $m$ samples and $X'_{ij}$ are the standardized values.
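Continuing the sketch above, the proportion matrix is one line in NumPy:

```python
P = X_pos / X_pos.sum(axis=0)  # P[i, j]: proportion of sample i under indicator j
```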
4. Compute the entropy of indicator $j$

$$e_j = -\frac{1}{\ln(m)} \sum_{i=1}^{m} P_{ij} \ln(P_{ij})$$

where $m$ is the number of samples.
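With `P` from the previous step, the entropy of every indicator can be computed at once (continuing the toy example):

```python
m = P.shape[0]                                # number of samples
e = -(P * np.log(P)).sum(axis=0) / np.log(m)  # entropy of each indicator
```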
5. Compute the difference coefficient of indicator $j$

The information utility value of an indicator is the difference between 1 and its information entropy, and it directly determines the size of the weight: the larger the information utility value, the more important the indicator is to the evaluation and the larger its weight.

$$d_j = 1 - e_j$$
6. Compute the weight of each evaluation indicator

The entropy method estimates each indicator's weight from its difference coefficient: the larger the difference coefficient, the more important the indicator to the evaluation (equivalently, the larger its weight and its contribution to the result). The weight of indicator $j$ is

$$w_j = \frac{d_j}{\sum_{j=1}^{n} d_j}$$

where the sum runs over the $n$ indicators.
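On the toy example, steps 5 and 6 reduce to two lines:

```python
d = 1 - e        # difference (information utility) coefficient of each indicator
w = d / d.sum()  # weights of the indicators, normalized to sum to 1
```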
7. Compute the composite score of each sample

$$z_i = \sum_{j=1}^{n} w_j X'_{ij}$$

using the standardized values $X'_{ij}$.
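Applied to the toy example, the composite scores are a single matrix-vector product of the standardized values with the weights:

```python
z = X_pos @ w  # composite score of each sample
```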
II. Test Case
1. The data set for this case is a subset of the data from Problem A of the 2012 China Undergraduate Mathematical Contest in Modeling.
| Sample | Total amino acids | Aspartic acid | Threonine | Serine | Glutamate | Proline |
|---|---|---|---|---|---|---|
| Grape sample 1 | 2027.96 | 101.22 | 393.42 | 77.61 | 266.6 | 723.88 |
| Grape sample 2 | 2128.82 | 64.43 | 140.62 | 71.94 | 39.26 | 1560.97 |
| Grape sample 3 | 8397.28 | 108.07 | 222.35 | 173.08 | 67.54 | 7472.28 |
| Grape sample 4 | 2144.68 | 79.39 | 133.83 | 158.74 | 156.72 | 1182.23 |
| Grape sample 5 | 1844 | 52.28 | 145.09 | 164.05 | 102.43 | 816.08 |
| Grape sample 6 | 3434.17 | 68.01 | 102.42 | 75.78 | 80.6 | 2932.76 |
| Grape sample 7 | 2391.16 | 65.1 | 267.76 | 239.2 | 208.97 | 1096.28 |
| Grape sample 8 | 1950.76 | 72.09 | 345.87 | 44.23 | 176.02 | 962.01 |
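For convenience, here is a hypothetical inline construction of this table as a pandas DataFrame; it can replace the `pd.read_csv('redputao.csv', ...)` line in the complete code below if the CSV file is not at hand:

```python
import pandas as pd

data = pd.DataFrame(
    {
        "Total amino acids": [2027.96, 2128.82, 8397.28, 2144.68, 1844.00, 3434.17, 2391.16, 1950.76],
        "Aspartic acid":     [101.22, 64.43, 108.07, 79.39, 52.28, 68.01, 65.10, 72.09],
        "Threonine":         [393.42, 140.62, 222.35, 133.83, 145.09, 102.42, 267.76, 345.87],
        "Serine":            [77.61, 71.94, 173.08, 158.74, 164.05, 75.78, 239.20, 44.23],
        "Glutamate":         [266.60, 39.26, 67.54, 156.72, 102.43, 80.60, 208.97, 176.02],
        "Proline":           [723.88, 1560.97, 7472.28, 1182.23, 816.08, 2932.76, 1096.28, 962.01],
    },
    index=[f"Grape sample {i}" for i in range(1, 9)],
)
```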
2. Complete code
import numpy as np
import pandas as pd

# Read the data: rows are samples, columns are indicators
data = pd.read_csv('redputao.csv', encoding='utf-8', index_col=0)
indicator = data.columns.tolist()  # indicator names (the columns)
project = data.index.tolist()      # sample names (the rows)
value = data.values
print(indicator)
print(project)
print(value)

# Data standardization. To avoid the undefined logarithm of zero when the
# entropy is computed, every standardized value is shifted by a small
# positive constant (0.01 here).
def std_data(value, flag):
    for i in range(len(indicator)):
        col = value[:, i]
        if flag[i] == '+':    # positive indicator: the larger the better
            value[:, i] = (col - col.min()) / (col.max() - col.min()) + 0.01
        elif flag[i] == '-':  # negative indicator: the smaller the better
            value[:, i] = (col.max() - col) / (col.max() - col.min()) + 0.01
    return value
# Entropy method: compute the weight of each indicator
def cal_weight(indicator, project, value):
    p = np.zeros((len(project), len(indicator)))
    for i in range(len(indicator)):
        # proportion of each sample under indicator i
        p[:, i] = value[:, i] / np.sum(value[:, i], axis=0)
    e = -1 / np.log(len(project)) * np.sum(p * np.log(p), axis=0)  # entropy of each indicator
    g = 1 - e          # difference (information utility) coefficient
    w = g / np.sum(g)  # weight of each indicator
    return w
# Direction of each indicator: '+' for positive, '-' for negative;
# all six indicators in this data set are treated as positive
flag = ["+"] * len(indicator)
# Standardize the data according to each indicator's direction
std_value = std_data(value, flag)
# Compute the weight of each indicator
w = cal_weight(indicator, project, std_value)
w = pd.DataFrame(w, index=data.columns, columns=['weight'])
print(w)
# Composite score: standardized values times the weights, rounded to four decimal places
score = np.dot(std_value, w).round(4)
score = pd.DataFrame(score, index=data.index, columns=['score']).sort_values(by=['score'], ascending=False)
print(score)
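A note on the `+ 0.01` shift in `std_data`: after min-max standardization the smallest value in each column is exactly 0, and since `np.log(0)` is `-inf`, the product `0 * np.log(0)` evaluates to NaN, so every value is shifted by a small positive constant before the proportions are formed. The shift slightly perturbs the proportions and therefore the weights; a smaller constant, such as the 0.001 mentioned in Section I, keeps that distortion smaller.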
III. Running Results

*Running results (shown as a screenshot in the original post).*
IV. Test Data

1. Copy the table above directly into Excel, or
2. use the CSV download link (extraction code: 1234).