ML and SHAP: outlier detection with SHAP decision plots and a LightGBM model on the adult census income binary classification dataset (predicting whether annual income exceeds 50K)

2022-07-07 05:58:00 A Virgo procedural ape

ML and SHAP: a detailed walkthrough of outlier detection using SHAP decision plots with a LightGBM model, based on the adult census income binary classification dataset (predicting whether annual income exceeds 50K)

Contents

Outlier detection using SHAP decision plots with a LightGBM model on the adult census income binary classification dataset (predicting whether annual income exceeds 50K)

# 1、 Define the dataset

# 2、 Dataset preprocessing

# 2.1、 Preliminary screening of modeling features

# 2.2、 Target feature binarization

# 2.3、 Encoding categorical features as integers

# 2.4、 Separate features from labels

# 3、 Model training and inference

# 3.1、 Dataset split

# 3.2、 Model building and training

# 3.3、 Model prediction

# 4、 Outlier detection with SHAP decision plots

# 4.1、 Sample a small subset of both the original data and the preprocessed data

# 4.2、 Create the Explainer and compute SHAP values

# 4.3、 Visualize the SHAP decision plots

Outlier detection using SHAP decision plots with a LightGBM model on the adult census income binary classification dataset (predicting whether annual income exceeds 50K)

# 1、 Define the dataset

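| age | workclass | fnlwgt | education | education_num | marital_status | occupation | relationship | race | sex | capital_gain | capital_loss | hours_per_week | native_country | salary |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 39 | State-gov | 77516 | Bachelors | 13 | Never-married | Adm-clerical | Not-in-family | White | Male | 2174 | 0 | 40 | United-States | <=50K |
| 50 | Self-emp-not-inc | 83311 | Bachelors | 13 | Married-civ-spouse | Exec-managerial | Husband | White | Male | 0 | 0 | 13 | United-States | <=50K |
| 38 | Private | 215646 | HS-grad | 9 | Divorced | Handlers-cleaners | Not-in-family | White | Male | 0 | 0 | 40 | United-States | <=50K |
| 53 | Private | 234721 | 11th | 7 | Married-civ-spouse | Handlers-cleaners | Husband | Black | Male | 0 | 0 | 40 | United-States | <=50K |
| 28 | Private | 338409 | Bachelors | 13 | Married-civ-spouse | Prof-specialty | Wife | Black | Female | 0 | 0 | 40 | Cuba | <=50K |
| 37 | Private | 284582 | Masters | 14 | Married-civ-spouse | Exec-managerial | Wife | White | Female | 0 | 0 | 40 | United-States | <=50K |
| 49 | Private | 160187 | 9th | 5 | Married-spouse-absent | Other-service | Not-in-family | Black | Female | 0 | 0 | 16 | Jamaica | <=50K |
| 52 | Self-emp-not-inc | 209642 | HS-grad | 9 | Married-civ-spouse | Exec-managerial | Husband | White | Male | 0 | 0 | 45 | United-States | >50K |
| 31 | Private | 45781 | Masters | 14 | Never-married | Prof-specialty | Not-in-family | White | Female | 14084 | 0 | 50 | United-States | >50K |
| 42 | Private | 159449 | Bachelors | 13 | Married-civ-spouse | Exec-managerial | Husband | White | Male | 5178 | 0 | 40 | United-States | >50K |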

# 2、 Dataset preprocessing

# 2.1、 Preliminary screening of modeling features

df.columns 14

# 2.2、 Target feature binarization
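
The code for this step is not shown in the source. A minimal sketch, assuming the raw labels are the strings `<=50K` and `>50K` as in the section 1 table and that `>50K` is mapped to 1; the three-row frame is made up for illustration:

```python
import pandas as pd

# Toy frame for illustration only; the real df holds the full adult dataset.
df = pd.DataFrame({"age": [39, 52, 31],
                   "salary": ["<=50K", ">50K", ">50K"]})

# Strip stray whitespace (raw adult files often pad values with spaces),
# then map ">50K" to 1 and everything else to 0.
df["salary"] = (df["salary"].str.strip() == ">50K").astype(int)

print(df["salary"].tolist())
```

Comparing against the positive label, rather than replacing strings one by one, leaves any unexpected label value as 0 instead of raising an error, which may or may not be the behavior you want.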

# 2.3、 Encoding categorical features as integers

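|   | age | workclass | education_num | marital_status | occupation | relationship | race | sex | capital_gain | capital_loss | hours_per_week | native_country | salary |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 39 | 7 | 13 | 4 | 1 | 1 | 4 | 1 | 2174 | 0 | 40 | 39 | 0 |
| 1 | 50 | 6 | 13 | 2 | 4 | 0 | 4 | 1 | 0 | 0 | 13 | 39 | 0 |
| 2 | 38 | 4 | 9 | 0 | 6 | 1 | 4 | 1 | 0 | 0 | 40 | 39 | 0 |
| 3 | 53 | 4 | 7 | 2 | 6 | 0 | 2 | 1 | 0 | 0 | 40 | 39 | 0 |
| 4 | 28 | 4 | 13 | 2 | 10 | 5 | 2 | 0 | 0 | 0 | 40 | 5 | 0 |
| 5 | 37 | 4 | 14 | 2 | 4 | 5 | 4 | 0 | 0 | 0 | 40 | 39 | 0 |
| 6 | 49 | 4 | 5 | 3 | 8 | 1 | 2 | 0 | 0 | 0 | 16 | 23 | 0 |
| 7 | 52 | 6 | 9 | 2 | 4 | 0 | 4 | 1 | 0 | 0 | 45 | 39 | 1 |
| 8 | 31 | 4 | 14 | 4 | 10 | 1 | 4 | 0 | 14084 | 0 | 50 | 39 | 1 |
| 9 | 42 | 4 | 13 | 2 | 4 | 0 | 4 | 1 | 5178 | 0 | 40 | 39 | 1 |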
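
The encoding code itself is not shown in the source. The integer codes above look like alphabetical category codes, so the following is a sketch under that assumption, using pandas category codes on a made-up four-row frame. The list `cat_cols` is a hypothetical name, and the codes it produces differ from the table because a small sample contains fewer distinct categories than the full dataset.

```python
import pandas as pd

# Toy frame for illustration only (values taken from the section 1 sample).
df = pd.DataFrame({
    "workclass": ["State-gov", "Self-emp-not-inc", "Private", "Private"],
    "sex": ["Male", "Male", "Male", "Female"],
    "age": [39, 50, 38, 28],
})

# Hypothetical list of the string-valued columns to encode.
cat_cols = ["workclass", "sex"]

# Categories are sorted alphabetically, so each code depends on which
# values actually occur in the column.
for col in cat_cols:
    df[col] = df[col].astype("category").cat.codes

print(df)
```

Integer codes like these suit tree models such as LightGBM; they impose an arbitrary order on the categories, so they are usually a poor fit for linear models.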

# 2.4、 Separate features from labels

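| age | workclass | education_num | marital_status | occupation | relationship | race | sex | capital_gain | capital_loss | hours_per_week | native_country |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 39 | 7 | 13 | 4 | 1 | 1 | 4 | 1 | 2174 | 0 | 40 | 39 |
| 50 | 6 | 13 | 2 | 4 | 0 | 4 | 1 | 0 | 0 | 13 | 39 |
| 38 | 4 | 9 | 0 | 6 | 1 | 4 | 1 | 0 | 0 | 40 | 39 |
| 53 | 4 | 7 | 2 | 6 | 0 | 2 | 1 | 0 | 0 | 40 | 39 |
| 28 | 4 | 13 | 2 | 10 | 5 | 2 | 0 | 0 | 0 | 40 | 5 |
| 37 | 4 | 14 | 2 | 4 | 5 | 4 | 0 | 0 | 0 | 40 | 39 |
| 49 | 4 | 5 | 3 | 8 | 1 | 2 | 0 | 0 | 0 | 16 | 23 |
| 52 | 6 | 9 | 2 | 4 | 0 | 4 | 1 | 0 | 0 | 45 | 39 |
| 31 | 4 | 14 | 4 | 10 | 1 | 4 | 0 | 14084 | 0 | 50 | 39 |
| 42 | 4 | 13 | 2 | 4 | 0 | 4 | 1 | 5178 | 0 | 40 | 39 |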

salary
0
0
0
0
0
0
0
1
1
1
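
The two printouts above are the feature matrix and the label column. A minimal sketch of how such a separation might be done, assuming the label column is named `salary` as in the tables; `X`, `y`, and `target` are illustrative names, and the frame is a toy stand-in:

```python
import pandas as pd

# Toy encoded frame for illustration only.
df = pd.DataFrame({"age": [39, 50, 52],
                   "hours_per_week": [40, 13, 45],
                   "salary": [0, 0, 1]})

target = "salary"
X = df.drop(columns=[target])   # feature matrix: every column except the label
y = df[target]                  # label vector

print(X.shape, y.shape)
```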

# 3、 Model training and inference

# 3.1、 Dataset split

X_test

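|   | age | workclass | education_num | marital_status | occupation | relationship | race | sex | capital_gain | capital_loss | hours_per_week | native_country |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1342 | 47 | 3 | 10 | 0 | 1 | 1 | 4 | 1 | 0 | 0 | 40 | 35 |
| 1338 | 71 | 3 | 13 | 0 | 13 | 3 | 4 | 0 | 2329 | 0 | 16 | 35 |
| 189 | 58 | 6 | 16 | 2 | 10 | 0 | 4 | 1 | 0 | 0 | 1 | 35 |
| 1332 | 23 | 3 | 9 | 4 | 7 | 1 | 2 | 1 | 0 | 0 | 35 | 35 |
| 1816 | 46 | 2 | 9 | 2 | 3 | 0 | 4 | 1 | 0 | 1902 | 40 | 35 |
| 1685 | 37 | 3 | 9 | 2 | 4 | 0 | 4 | 1 | 0 | 1902 | 45 | 35 |
| 657 | 34 | 3 | 9 | 2 | 3 | 0 | 4 | 1 | 0 | 0 | 45 | 35 |
| 1846 | 21 | 0 | 10 | 4 | 0 | 3 | 4 | 0 | 0 | 0 | 40 | 35 |
| 554 | 33 | 1 | 11 | 0 | 3 | 4 | 2 | 0 | 0 | 0 | 40 | 35 |
| 1963 | 49 | 3 | 13 | 2 | 12 | 0 | 4 | 1 | 0 | 0 | 50 | 35 |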
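
The split code is not shown in the source. The `X_test` printout keeps its original row labels out of order, which is what a shuffled split produces. A sketch using scikit-learn's `train_test_split` on synthetic data follows; `test_size=0.3`, `random_state=1`, and stratification are assumptions, not values from the article.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the encoded features and labels.
rng = np.random.default_rng(0)
X = pd.DataFrame({"age": rng.integers(17, 90, size=200),
                  "hours_per_week": rng.integers(1, 99, size=200)})
y = pd.Series(rng.integers(0, 2, size=200), name="salary")

# test_size and random_state are assumed values, not taken from the source.
# stratify=y keeps the class ratio the same in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y)

print(X_train.shape, X_test.shape)
```

Because the row labels are preserved, a test row can later be traced back to its untransformed record, which is useful when inspecting outliers.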

# 3.2、 Model building and training

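import lightgbm as lgb

# lgbD_train / lgbD_test are lgb.Dataset objects built from the split in 3.1
# (their construction is not shown in the source).
params = {
    "max_bin": 512, "learning_rate": 0.05,
    "boosting_type": "gbdt", "objective": "binary",
    "metric": "binary_logloss", "verbose": -1,
    "min_data": 100, "random_state": 1,
    "boost_from_average": True, "num_leaves": 10}

# Train for up to 10000 rounds; stop once the validation log loss has not
# improved for 50 rounds, and log the metric every 1000 rounds.
# (On LightGBM < 4.0 the equivalent keyword arguments to lgb.train are
# early_stopping_rounds=50 and verbose_eval=1000.)
LGBMC = lgb.train(params, lgbD_train, 10000,
                  valid_sets=[lgbD_test],
                  callbacks=[lgb.early_stopping(50),
                             lgb.log_evaluation(1000)])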

# 3.3、 Model prediction

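|   | age | workclass | education_num | marital_status | occupation | relationship | race | sex | capital_gain | capital_loss | hours_per_week | native_country | y_test_predi | y_test |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1342 | 47 | 3 | 10 | 0 | 1 | 1 | 4 | 1 | 0 | 0 | 40 | 35 | 0.045225575 | 0 |
| 1338 | 71 | 3 | 13 | 0 | 13 | 3 | 4 | 0 | 2329 | 0 | 16 | 35 | 0.074799172 | 0 |
| 189 | 58 | 6 | 16 | 2 | 10 | 0 | 4 | 1 | 0 | 0 | 1 | 35 | 0.30014332 | 1 |
| 1332 | 23 | 3 | 9 | 4 | 7 | 1 | 2 | 1 | 0 | 0 | 35 | 35 | 0.003966427 | 0 |
| 1816 | 46 | 2 | 9 | 2 | 3 | 0 | 4 | 1 | 0 | 1902 | 40 | 35 | 0.363861294 | 0 |
| 1685 | 37 | 3 | 9 | 2 | 4 | 0 | 4 | 1 | 0 | 1902 | 45 | 35 | 0.738628671 | 1 |
| 657 | 34 | 3 | 9 | 2 | 3 | 0 | 4 | 1 | 0 | 0 | 45 | 35 | 0.376412174 | 0 |
| 1846 | 21 | 0 | 10 | 4 | 0 | 3 | 4 | 0 | 0 | 0 | 40 | 35 | 0.002309884 | 0 |
| 554 | 33 | 1 | 11 | 0 | 3 | 4 | 2 | 0 | 0 | 0 | 40 | 35 | 0.060345836 | 1 |
| 1963 | 49 | 3 | 13 | 2 | 12 | 0 | 4 | 1 | 0 | 0 | 50 | 35 | 0.703506366 | 1 |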
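
With `"objective": "binary"`, a LightGBM booster's `predict` returns the probability of the positive class, which is what the `y_test_predi` column holds. The sketch below turns those probabilities into hard labels for the ten rows shown and compares them with `y_test`. The 0.5 cut-off and the name `y_test_label` are assumptions; the source does not state a threshold.

```python
import numpy as np

# Last two columns of the ten rows shown above.
y_test_predi = np.array([0.045225575, 0.074799172, 0.30014332, 0.003966427,
                         0.363861294, 0.738628671, 0.376412174, 0.002309884,
                         0.060345836, 0.703506366])
y_test = np.array([0, 0, 1, 0, 0, 1, 0, 0, 1, 1])

# 0.5 is an assumed cut-off; the source does not state one.
y_test_label = (y_test_predi >= 0.5).astype(int)
accuracy = (y_test_label == y_test).mean()

print(y_test_label, accuracy)
```

Rows where the true label is 1 but the predicted probability is low are natural candidates to examine in the decision plots of section 4.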

# 4、 Outlier detection with SHAP decision plots

# 4.1、 Sample a small subset of both the original data and the preprocessed data

# 4.2、 Create the Explainer and compute SHAP values

shap2exp.values.shape (100, 12, 2) 
 [[[-5.97178729e-01  5.97178729e-01]
  [-5.18879297e-03  5.18879297e-03]
  [ 1.70566444e-01 -1.70566444e-01]
  ...
  [ 0.00000000e+00  0.00000000e+00]
  [ 6.58794799e-02 -6.58794799e-02]
  [ 0.00000000e+00  0.00000000e+00]]

 [[-4.45574118e-01  4.45574118e-01]
  [-1.00665452e-03  1.00665452e-03]
  [-8.12237233e-01  8.12237233e-01]
  ...
  [ 0.00000000e+00  0.00000000e+00]
  [ 8.56381961e-01 -8.56381961e-01]
  [ 0.00000000e+00  0.00000000e+00]]

 [[-3.87412165e-01  3.87412165e-01]
  [ 1.52848351e-01 -1.52848351e-01]
  [-1.02755954e+00  1.02755954e+00]
  ...
  [ 0.00000000e+00  0.00000000e+00]
  [ 1.10240434e+00 -1.10240434e+00]
  [ 0.00000000e+00  0.00000000e+00]]

 ...

 [[-5.28928223e-01  5.28928223e-01]
  [ 7.14116015e-03 -7.14116015e-03]
  [-8.82241728e-01  8.82241728e-01]
  ...
  [ 0.00000000e+00  0.00000000e+00]
  [ 7.47521189e-02 -7.47521189e-02]
  [ 0.00000000e+00  0.00000000e+00]]

 [[ 2.20002984e+00 -2.20002984e+00]
  [ 7.75916086e-03 -7.75916086e-03]
  [ 3.95152810e-01 -3.95152810e-01]
  ...
  [ 0.00000000e+00  0.00000000e+00]
  [ 1.52566789e-01 -1.52566789e-01]
  [ 0.00000000e+00  0.00000000e+00]]

 [[-8.28965461e-01  8.28965461e-01]
  [-4.43687947e-02  4.43687947e-02]
  [ 3.37305776e-01 -3.37305776e-01]
  ...
  [ 0.00000000e+00  0.00000000e+00]
  [ 8.26477289e-03 -8.26477289e-03]
  [ 0.00000000e+00  0.00000000e+00]]]
shap2array.shape (100, 12) 
LightGBM binary classifier with TreeExplainer shap values output has changed to a list of ndarray
 [[ 5.97178729e-01  5.18879297e-03 -1.70566444e-01 ...  0.00000000e+00
  -6.58794799e-02  0.00000000e+00]
 [ 4.45574118e-01  1.00665452e-03  8.12237233e-01 ...  0.00000000e+00
  -8.56381961e-01  0.00000000e+00]
 [ 3.87412165e-01 -1.52848351e-01  1.02755954e+00 ...  0.00000000e+00
  -1.10240434e+00  0.00000000e+00]
 ...
 [ 5.28928223e-01 -7.14116015e-03  8.82241728e-01 ...  0.00000000e+00
  -7.47521189e-02  0.00000000e+00]
 [-2.20002984e+00 -7.75916086e-03 -3.95152810e-01 ...  0.00000000e+00
  -1.52566789e-01  0.00000000e+00]
 [ 8.28965461e-01  4.43687947e-02 -3.37305776e-01 ...  0.00000000e+00
  -8.26477289e-03  0.00000000e+00]]
mode_exp_value: -1.9982244224656025
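
In the printout above, the last axis of the `(100, 12, 2)` array holds one column per class, and the `(100, 12)` array matches the second column. The sketch below reproduces that slice on a few copied values and converts the base value to a probability. It assumes `mode_exp_value` is the class-1 base value in log-odds, the usual raw output for a tree explainer on a binary LightGBM model; `base_prob` is an illustrative name.

```python
import numpy as np

# First three features of the first two samples, copied from the
# (100, 12, 2) printout above; the last axis is (class 0, class 1).
values = np.array([
    [[-5.97178729e-01,  5.97178729e-01],
     [-5.18879297e-03,  5.18879297e-03],
     [ 1.70566444e-01, -1.70566444e-01]],
    [[-4.45574118e-01,  4.45574118e-01],
     [-1.00665452e-03,  1.00665452e-03],
     [-8.12237233e-01,  8.12237233e-01]],
])

# Keep the contributions towards class 1 (income > 50K).
shap2array = values[:, :, 1]

# The base value is in log-odds; the logistic function maps it to a probability.
mode_exp_value = -1.9982244224656025
base_prob = 1.0 / (1.0 + np.exp(-mode_exp_value))

print(shap2array.shape, round(base_prob, 4))
```

The two class columns are exact negatives of each other, so either one carries the full information; keeping class 1 makes positive SHAP values push towards "income exceeds 50K".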

# 4.3、 Visualize the SHAP decision plots

# Stacking the decision plots together helps locate outliers, i.e. samples whose paths deviate from the dense group.
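
The plotting call is not shown in the source; in SHAP the function for this is `shap.decision_plot`. To make concrete what the stacked lines represent, the sketch below computes the cumulative paths by hand with NumPy on synthetic SHAP values. Each path starts at the base value and adds one feature's contribution at a time. The deviation score at the end is a hypothetical heuristic added for illustration and is not part of SHAP.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins: 100 samples x 12 features of SHAP values.
shap_values = rng.normal(scale=0.3, size=(100, 12))
shap_values[7] += 1.5             # plant one sample with an unusual path
base_value = -1.9982244224656025  # value printed in 4.2

# Order features from least to most important (mean |SHAP|), as the plot
# draws them bottom to top, then accumulate from the base value.
order = np.argsort(np.abs(shap_values).mean(axis=0))
paths = base_value + np.cumsum(shap_values[:, order], axis=1)

# Hypothetical heuristic (not part of SHAP): distance of each path from the
# median path; the largest distances are the visually isolated lines.
median_path = np.median(paths, axis=0)
deviation = np.abs(paths - median_path).max(axis=1)
outliers = np.argsort(deviation)[-3:][::-1]

print(outliers)
```

The end point of every path equals the base value plus the sum of that sample's SHAP values, i.e. the model's raw output for that sample, so a line that ends far from the others also has an unusual prediction.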

Copyright notice
This article was written by [A Virgo procedural ape]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/188/202207070033279187.html