当前位置：网站首页>ML's shap: Based on the adult census income binary prediction data set (whether the predicted annual income exceeds 50K), use the shap decision diagram combined with the lightgbm model to realize the

ML's shap: Based on the adult census income binary prediction data set (whether the predicted annual income exceeds 50K), use the shap decision diagram combined with the lightgbm model to realize the

2022-07-07 05:58:00 【A Virgo procedural ape】

ML And shap： be based on adult Census income two classification forecast data set ( Whether the predicted annual income exceeds 50k) utilize shap Decision diagram combination LightGBM A detailed introduction to the case of outlier detection based on the model

Catalog

be based on adult Census income two classification forecast data set ( Whether the predicted annual income exceeds 50k) utilize shap Decision diagram combination LightGBM A detailed introduction to the case of outlier detection based on the model

# 1、 Define datasets

# 2、 Data set preprocessing

# 2.1、 Preliminary screening of modeling features

# 2.2、 Target feature binarization

# 2.3、 Category feature coding digitization

# 2.4、 Separate features from labels

#3、 Model training and reasoning

# 3.1、 Data set segmentation

# 3.2、 Model building and training

# 3.3、 Model to predict

# 4、 utilize shap Decision graph for outlier detection

# 4.1、 A small part of the original data and the preprocessed data are sampled respectively

# 4.2、 establish Explainer And calculate SHAP value

# 4.3、shap Visualization of decision diagram

be based on adult Census income two classification forecast data set ( Whether the predicted annual income exceeds 50k) utilize shap Decision diagram combination LightGBM A detailed introduction to the case of outlier detection based on the model

# 1、 Define datasets

age

workclass

fnlwgt

education

education_num

marital_status

occupation

relationship

race

sex

capital_gain

capital_loss

hours_per_week

native_country

salary

State-gov

77516

Bachelors

Never-married

Adm-clerical

Not-in-family

White

Male

2174

United-States

<=50K

Self-emp-not-inc

83311

Bachelors

Married-civ-spouse

Exec-managerial

Husband

White

Male

United-States

<=50K

Private

215646

HS-grad

Divorced

Handlers-cleaners

Not-in-family

White

Male

United-States

<=50K

Private

234721

11th

Married-civ-spouse

Handlers-cleaners

Husband

Black

Male

United-States

<=50K

Private

338409

Bachelors

Married-civ-spouse

Prof-specialty

Wife

Black

Female

Cuba

<=50K

Private

284582

Masters

Married-civ-spouse

Exec-managerial

Wife

White

Female

United-States

<=50K

Private

160187

9th

Married-spouse-absent

Other-service

Not-in-family

Black

Female

Jamaica

<=50K

Self-emp-not-inc

209642

HS-grad

Married-civ-spouse

Exec-managerial

Husband

White

Male

United-States

>50K

Private

45781

Masters

Never-married

Prof-specialty

Not-in-family

White

Female

14084

United-States

>50K

Private

159449

Bachelors

Married-civ-spouse

Exec-managerial

Husband

White

Male

5178

United-States

>50K

# 2、 Data set preprocessing

# 2.1、 Preliminary screening of modeling features

df.columns
14

# 2.2、 Target feature binarization

# 2.3、 Category feature coding digitization

	age	workclass	education_num	marital_status	occupation	relationship	race	sex	capital_gain	capital_loss	hours_per_week	native_country	salary
0	39	7	13	4	1	1	4	1	2174	0	40	39	0
1	50	6	13	2	4	0	4	1	0	0	13	39	0
2	38	4	9	0	6	1	4	1	0	0	40	39	0
3	53	4	7	2	6	0	2	1	0	0	40	39	0
4	28	4	13	2	10	5	2	0	0	0	40	5	0
5	37	4	14	2	4	5	4	0	0	0	40	39	0
6	49	4	5	3	8	1	2	0	0	0	16	23	0
7	52	6	9	2	4	0	4	1	0	0	45	39	1
8	31	4	14	4	10	1	4	0	14084	0	50	39	1
9	42	4	13	2	4	0	4	1	5178	0	40	39	1

# 2.4、 Separate features from labels

age	workclass	education_num	marital_status	occupation	relationship	race	sex	capital_gain	capital_loss	hours_per_week	native_country
39	7	13	4	1	1	4	1	2174	0	40	39
50	6	13	2	4	0	4	1	0	0	13	39
38	4	9	0	6	1	4	1	0	0	40	39
53	4	7	2	6	0	2	1	0	0	40	39
28	4	13	2	10	5	2	0	0	0	40	5
37	4	14	2	4	5	4	0	0	0	40	39
49	4	5	3	8	1	2	0	0	0	16	23
52	6	9	2	4	0	4	1	0	0	45	39
31	4	14	4	10	1	4	0	14084	0	50	39
42	4	13	2	4	0	4	1	5178	0	40	39

salary
0
0
0
0
0
0
0
1
1
1

#3、 Model training and reasoning

# 3.1、 Data set segmentation

X_test

	age	workclass	education_num	marital_status	occupation	relationship	race	sex	capital_gain	capital_loss	hours_per_week	native_country
1342	47	3	10	0	1	1	4	1	0	0	40	35
1338	71	3	13	0	13	3	4	0	2329	0	16	35
189	58	6	16	2	10	0	4	1	0	0	1	35
1332	23	3	9	4	7	1	2	1	0	0	35	35
1816	46	2	9	2	3	0	4	1	0	1902	40	35
1685	37	3	9	2	4	0	4	1	0	1902	45	35
657	34	3	9	2	3	0	4	1	0	0	45	35
1846	21	0	10	4	0	3	4	0	0	0	40	35
554	33	1	11	0	3	4	2	0	0	0	40	35
1963	49	3	13	2	12	0	4	1	0	0	50	35

# 3.2、 Model building and training

params = {
    "max_bin": 512, "learning_rate": 0.05,
    "boosting_type": "gbdt", "objective": "binary",
    "metric": "binary_logloss", "verbose": -1,
     "min_data": 100, "random_state": 1,
    "boost_from_average": True, "num_leaves": 10 }

LGBMC = lgb.train(params, lgbD_train, 10000, 
                  valid_sets=[lgbD_test], 
                  early_stopping_rounds=50, 
                  verbose_eval=1000)

# 3.3、 Model to predict

	age	workclass	education_num	marital_status	occupation	relationship	race	sex	capital_gain	capital_loss	hours_per_week	native_country	y_test_predi	y_test
1342	47	3	10	0	1	1	4	1	0	0	40	35	0.045225575	0
1338	71	3	13	0	13	3	4	0	2329	0	16	35	0.074799172	0
189	58	6	16	2	10	0	4	1	0	0	1	35	0.30014332	1
1332	23	3	9	4	7	1	2	1	0	0	35	35	0.003966427	0
1816	46	2	9	2	3	0	4	1	0	1902	40	35	0.363861294	0
1685	37	3	9	2	4	0	4	1	0	1902	45	35	0.738628671	1
657	34	3	9	2	3	0	4	1	0	0	45	35	0.376412174	0
1846	21	0	10	4	0	3	4	0	0	0	40	35	0.002309884	0
554	33	1	11	0	3	4	2	0	0	0	40	35	0.060345836	1
1963	49	3	13	2	12	0	4	1	0	0	50	35	0.703506366	1

# 4、 utilize shap Decision graph for outlier detection

# 4.1、 A small part of the original data and the preprocessed data are sampled respectively

# 4.2、 establish Explainer And calculate SHAP value

shap2exp.values.shape (100, 12, 2) 
 [[[-5.97178729e-01  5.97178729e-01]
  [-5.18879297e-03  5.18879297e-03]
  [ 1.70566444e-01 -1.70566444e-01]
  ...
  [ 0.00000000e+00  0.00000000e+00]
  [ 6.58794799e-02 -6.58794799e-02]
  [ 0.00000000e+00  0.00000000e+00]]

 [[-4.45574118e-01  4.45574118e-01]
  [-1.00665452e-03  1.00665452e-03]
  [-8.12237233e-01  8.12237233e-01]
  ...
  [ 0.00000000e+00  0.00000000e+00]
  [ 8.56381961e-01 -8.56381961e-01]
  [ 0.00000000e+00  0.00000000e+00]]

 [[-3.87412165e-01  3.87412165e-01]
  [ 1.52848351e-01 -1.52848351e-01]
  [-1.02755954e+00  1.02755954e+00]
  ...
  [ 0.00000000e+00  0.00000000e+00]
  [ 1.10240434e+00 -1.10240434e+00]
  [ 0.00000000e+00  0.00000000e+00]]

 ...

 [[-5.28928223e-01  5.28928223e-01]
  [ 7.14116015e-03 -7.14116015e-03]
  [-8.82241728e-01  8.82241728e-01]
  ...
  [ 0.00000000e+00  0.00000000e+00]
  [ 7.47521189e-02 -7.47521189e-02]
  [ 0.00000000e+00  0.00000000e+00]]

 [[ 2.20002984e+00 -2.20002984e+00]
  [ 7.75916086e-03 -7.75916086e-03]
  [ 3.95152810e-01 -3.95152810e-01]
  ...
  [ 0.00000000e+00  0.00000000e+00]
  [ 1.52566789e-01 -1.52566789e-01]
  [ 0.00000000e+00  0.00000000e+00]]

 [[-8.28965461e-01  8.28965461e-01]
  [-4.43687947e-02  4.43687947e-02]
  [ 3.37305776e-01 -3.37305776e-01]
  ...
  [ 0.00000000e+00  0.00000000e+00]
  [ 8.26477289e-03 -8.26477289e-03]
  [ 0.00000000e+00  0.00000000e+00]]]
shap2array.shape (100, 12) 
LightGBM binary classifier with TreeExplainer shap values output has changed to a list of ndarray
 [[ 5.97178729e-01  5.18879297e-03 -1.70566444e-01 ...  0.00000000e+00
  -6.58794799e-02  0.00000000e+00]
 [ 4.45574118e-01  1.00665452e-03  8.12237233e-01 ...  0.00000000e+00
  -8.56381961e-01  0.00000000e+00]
 [ 3.87412165e-01 -1.52848351e-01  1.02755954e+00 ...  0.00000000e+00
  -1.10240434e+00  0.00000000e+00]
 ...
 [ 5.28928223e-01 -7.14116015e-03  8.82241728e-01 ...  0.00000000e+00
  -7.47521189e-02  0.00000000e+00]
 [-2.20002984e+00 -7.75916086e-03 -3.95152810e-01 ...  0.00000000e+00
  -1.52566789e-01  0.00000000e+00]
 [ 8.28965461e-01  4.43687947e-02 -3.37305776e-01 ...  0.00000000e+00
  -8.26477289e-03  0.00000000e+00]]
mode_exp_value： -1.9982244224656025

# 4.3、shap Visualization of decision diagram

# Stacking the decision diagrams together helps shap Locate outliers , That is, the sample deviates from the dense group

原网站

版权声明
本文为[A Virgo procedural ape]所创，转载请带上原文链接，感谢
https://yzsam.com/2022/188/202207070033279187.html

当前位置：网站首页>ML's shap: Based on the adult census income binary prediction data set (whether the predicted annual income exceeds 50K), use the shap decision diagram combined with the lightgbm model to realize the

ML's shap: Based on the adult census income binary prediction data set (whether the predicted annual income exceeds 50K), use the shap decision diagram combined with the lightgbm model to realize the

be based on adult Census income two classification forecast data set ( Whether the predicted annual income exceeds 50k) utilize shap Decision diagram combination LightGBM A detailed introduction to the case of outlier detection based on the model

# 1、 Define datasets

# 2、 Data set preprocessing

# 2.1、 Preliminary screening of modeling features

# 2.2、 Target feature binarization

# 2.3、 Category feature coding digitization

# 2.4、 Separate features from labels

#3、 Model training and reasoning

# 3.1、 Data set segmentation

# 3.2、 Model building and training

# 3.3、 Model to predict

# 4、 utilize shap Decision graph for outlier detection

# 4.1、 A small part of the original data and the preprocessed data are sampled respectively

# 4.2、 establish Explainer And calculate SHAP value

# 4.3、shap Visualization of decision diagram

边栏推荐

猜你喜欢

随机推荐