Day14. Using an interpretable machine learning method to distinguish intestinal tuberculosis from Crohn's disease
2022-07-27 05:51:00 【Ignorant graduate student】
Title:
Differentiation of intestinal tuberculosis and Crohn’s disease through explainable machine learning method
Distinguishing intestinal tuberculosis from Crohn's disease using an explainable machine learning method
Keywords:
Intestinal tuberculosis; Crohn’s disease; Shapley Value; Machine learning
Intestinal tuberculosis, Crohn's disease, Shapley value, machine learning
The Shapley value method was proposed by Lloyd Shapley (1923-2016), a renowned economist and 2012 Nobel laureate in economics. It is mainly used to divide the benefits of a cooperative game among the participating parties, avoiding the awkward situation of being "able to share hardship, but unable to share the rewards". (From: Manage afternoon tea: Shapley Value Method, https://zhuanlan.zhihu.com/p/165051523)
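To make the idea of "benefit distribution in a cooperative game" concrete, here is a minimal sketch that computes exact Shapley values by averaging each party's marginal contribution over every possible joining order. The three parties A, B, C and the payoff table are made-up numbers for illustration only; the function name `shapley_values` is also hypothetical and is not taken from the paper or from the SHAP library.

```python
from fractions import Fraction
from itertools import permutations


def shapley_values(players, value):
    """Exact Shapley values: average marginal contribution over all orderings."""
    totals = {p: Fraction(0) for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            # Marginal contribution of p when it joins the current coalition.
            totals[p] += Fraction(value(with_p)) - Fraction(value(coalition))
            coalition = with_p
    return {p: totals[p] / len(orderings) for p in players}


# Hypothetical payoff of every coalition (illustrative numbers, not real data).
PAYOFF = {
    frozenset(): 0,
    frozenset("A"): 10, frozenset("B"): 20, frozenset("C"): 30,
    frozenset("AB"): 40, frozenset("AC"): 50, frozenset("BC"): 60,
    frozenset("ABC"): 90,
}

phi = shapley_values("ABC", PAYOFF.__getitem__)
for player, share in phi.items():
    print(player, share)
```

The shares always add up to the payoff of the full coalition, which is the property that makes the method useful for attribution. Enumerating all orderings grows factorially with the number of parties, so this brute-force form is only practical for very small games.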
Abstract:
This study set out to develop an effective framework to distinguish Crohn’s disease from intestinal tuberculosis through an explainable machine learning (ML) model. A cohort consisting of 200 patient data (CD = 160, ITB = 40) is used in training and validating models. After feature selection, a total of nine variables are extracted, including intestinal surgery, abdominal, bloody stool, PPD, knot, ESAT-6, CFP-10, intestinal dilatation and comb sign. Besides, we compared the predictive performance of the ML models with traditional statistical methods. This work also provides insights into the ML model’s outcome through the SHAP method. Results illustrate that the XGBoost algorithm outperforms other classifiers in terms of area under the receiver operating characteristic curve (AUC), sensitivity, specificity, precision and Matthews correlation coefficient (MCC), yielding values of 0.891, 0.813, 0.969, 0.867 and 0.801 respectively. More importantly, the prediction outcomes of XGBoost can be effectively explained through the SHAP method. The proposed framework proves that the effectiveness of distinguishing CD from ITB through interpretable machine learning, which has potential value in clinical application.
This study aims to build an effective framework for distinguishing Crohn's disease (CD) from intestinal tuberculosis (ITB) with an interpretable machine learning (ML) model. A cohort of 200 patients (CD = 160, ITB = 40) is used to train and validate the models. After feature selection, nine variables are retained: intestinal surgery, abdomen, bloody stool, tuberculin test (PPD), tubercle, ESAT-6, CFP-10, intestinal dilatation and comb sign. The predictive performance of the ML models is also compared with that of traditional statistical methods, and the SHAP method (a post-hoc attribution analysis for black-box models) is used to give insight into the models' outputs. The results show that XGBoost outperforms the other classifiers in area under the receiver operating characteristic curve (AUC), sensitivity, specificity, precision and Matthews correlation coefficient (MCC), with values of 0.891, 0.813, 0.969, 0.867 and 0.801 respectively. More importantly, the predictions of XGBoost can be explained effectively with SHAP. The framework demonstrates that interpretable machine learning can distinguish CD from ITB effectively, which has potential value in clinical application.
Matthews correlation coefficient:
The Matthews correlation coefficient (MCC) is used in machine learning as a quality measure for binary (two-class) classification. It was introduced by the biochemist Brian W. Matthews in 1975. It returns a value between -1 and +1: a coefficient of +1 indicates a perfect prediction, 0 indicates a prediction no better than random, and -1 indicates complete disagreement between prediction and observation. In statistics it is also known as the phi coefficient. MCC can be calculated directly from the confusion matrix, where TP, TN, FP and FN are the numbers of true positives, true negatives, false positives and false negatives:
MCC = (TP × TN − FP × FN) / √((TP + FP)(TP + FN)(TN + FP)(TN + FN))
(From: [Machine learning] Matthews correlation coefficient, https://blog.csdn.net/ARPOSPF/article/details/84997220)
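As a small illustration of how MCC is obtained from the four confusion-matrix counts, the sketch below implements the standard formula in plain Python. The function name `mcc_from_counts` and the example counts are hypothetical and are not the paper's data; returning 0 when the denominator is zero is a common convention that is assumed here.

```python
import math


def mcc_from_counts(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: if any row or column of the matrix is empty,
    # the coefficient is undefined and 0.0 is returned instead.
    return numerator / denominator if denominator else 0.0


# Illustrative counts only (made up, not taken from the study).
perfect = mcc_from_counts(tp=5, tn=5, fp=0, fn=0)
inverted = mcc_from_counts(tp=0, tn=0, fp=5, fn=5)
chance = mcc_from_counts(tp=5, tn=5, fp=5, fn=5)
print(perfect, inverted, chance)
```

Because all four cells of the confusion matrix enter the formula, MCC stays informative on imbalanced data such as a 160 versus 40 split, where accuracy alone can look good for a classifier that mostly predicts the majority class.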
The proposed framework consists of three components. The first level performs the imbalanced treatment of the dataset using a SMOTE algorithm (Chawla et al., 2002). In the second level, a tree-based model is applied to detect CD from ITB. At the last level, the interpretation and visualization of the model are demonstrated through Shapley values (Lundberg and Lee, 2017b). To validate the superiority of the proposed method, we compare the performance of six different classical algorithms, including Linear Discriminant Analysis (LDA), Logistic Regression (LOG), Support Vector Machine (SVM), Artificial Neural Network (ANN), Random Forest (RF) and Adaptive Boosting (Adaboost) (Fisher, 1936; Kleinbaum et al., 2002; Noble, 2006; Wang, 2003; Breiman, 2001; Hastie et al., 2009). The main contribution of this research addresses a real-world problem, differentiating CD from ITB based on explainable machine learning. This method can provide local interpretation and direct results of visualization without losing the classification accuracy.
The framework proposed in this study has three parts. First, the SMOTE algorithm is used to handle the class imbalance of the dataset (Chawla et al., 2002). Second, a tree-based model is used to distinguish Crohn's disease from intestinal tuberculosis. Finally, Shapley values are used to interpret and visualize the model (Lundberg and Lee, 2017b). To verify the superiority of this method, the authors compare it with six classical algorithms: LDA, logistic regression, support vector machine, artificial neural network, random forest and adaptive boosting (Fisher, 1936; Kleinbaum et al., 2002; Noble, 2006; Wang, 2003; Breiman, 2001; Hastie et al., 2009). The main contribution of the study is that it addresses a practical problem, namely distinguishing Crohn's disease from intestinal tuberculosis with interpretable machine learning. The method provides local interpretation and visual results without losing classification accuracy.
Introduction to the SMOTE algorithm:
To address the problem of imbalanced data, Chawla et al. proposed the SMOTE algorithm in 2002. SMOTE stands for Synthetic Minority Over-sampling Technique and is an improvement on random oversampling. It is a common way to handle imbalanced data and is widely accepted in both academia and industry. Its theoretical idea is briefly described below. The basic idea of SMOTE is to analyze the minority-class samples, simulate new samples from them, and add these artificially generated samples to the dataset so that the classes in the original data are no longer severely imbalanced. The simulation uses the k-nearest-neighbor (KNN) technique, and new samples are generated as follows:
(1) Using the nearest-neighbor algorithm, find the K nearest neighbors of each minority-class sample;
(2) Randomly select N samples from the K nearest neighbors and perform random linear interpolation;
(3) Construct the new minority-class samples;
(4) Combine the new samples with the original data to form a new training set.
(Copyright notice: this is an original article by CSDN blogger 「MXuDong」, released under the CC 4.0 BY-SA license. When reprinting, please attach the original source link and this statement. Original link: https://blog.csdn.net/qq_33472765/article/details/87891320)
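The four SMOTE steps listed above can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions, not the implementation used in the paper and not the API of any library: the function name `smote_sketch`, its parameters `n_per_sample`, `k` and `seed`, and the toy points are all hypothetical, distances are Euclidean, and neighbours are drawn with replacement.

```python
import math
import random


def smote_sketch(minority, n_per_sample=1, k=5, seed=0):
    """Create synthetic minority samples by interpolating toward near neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for i, base in enumerate(minority):
        # Step 1: the k nearest minority-class neighbours of this sample.
        others = minority[:i] + minority[i + 1:]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        # Step 2: randomly pick N of those neighbours (with replacement here).
        for neighbour in rng.choices(neighbours, k=n_per_sample):
            gap = rng.random()  # random interpolation factor in [0, 1)
            # Step 3: the new sample lies on the segment base -> neighbour.
            synthetic.append(
                tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
            )
    return synthetic


# Toy minority class: the four corners of the unit square (made-up data).
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
synthetic = smote_sketch(minority, n_per_sample=2, k=2)

# Step 4: merge the synthetic samples with the original minority samples.
augmented_minority = minority + synthetic
print(len(minority), len(synthetic), len(augmented_minority))
```

Because every synthetic point is an interpolation between two existing minority samples, it always lies inside the region those samples already span. In practice oversampling of this kind is usually applied to the training split only, so that no synthetic point derived from a test sample leaks into evaluation.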