
MIT-6874-Deep Learning in the Life Sciences Week 7

2022-07-05 05:51:00 【木姑娘】

Lecture 05 Interpretable Deep Learning

Interpretable Deep Learning
1. Intro to Interpretability
2. Interpreting Deep Neural Networks
3. Evaluating Attribution Methods

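Interpretable Deep Learning

This lecture covers the interpretability of deep learning. A trained model encodes knowledge, and for "black-box" models such as deep networks, interpretability is what lets us explain why the model reached a particular decision and how it got there, which in turn helps steer the model toward the behavior people expect. It has broad application prospects in areas such as recommendation and healthcare.

The outline of this lecture:
[Figure: lecture outline]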

1. Intro to Interpretability

1a. Interpretability definition: Convert implicit NN information to human-interpretable information


1b. Motivation: Verify model works as intended; debug classifier; make discoveries; Right to explanation

Why Interpretability?
1. Verify that model works as expected: Wrong decisions can be costly and dangerous
2. Improve / Debug classifier
3. Make new discoveries
4. Right to explanation
“Right to be given an explanation for an output of the algorithm”

1c. Ante-hoc (train interpretable model) vs. Post-hoc (interpret complex model; degree of “locality”)

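Recommended Zhihu reading: a survey of explainable AI
 Post-hoc explanation vs. self-explanation

Two ways to obtain model interpretability (i.e. a taxonomy of interpretability):
Ante-hoc & Post-hoc

1. Ante-hoc interpretability (interpretability built into the model)

Train a model that is interpretable by design, so that its results come with an explanation.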

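Common interpretable models:
Naive Bayes
Linear regression
Decision trees
Rule-based models

However, the complexity these models can reach is limited, which fundamentally caps their performance.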
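
To make "interpretable by design" concrete, here is a minimal sketch using linear regression, one of the models listed above. The data, the feature names `x0`..`x2`, and the true weights are all made up for illustration; the point is only that the fitted coefficients can be read directly as the explanation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 3 features, of which only the first two influence y.
X = rng.normal(size=(200, 3))
true_w = np.array([2.0, -1.0, 0.0])
y = X @ true_w + 0.5 + 0.01 * rng.normal(size=200)

# Ordinary least squares with an extra column of ones for the bias.
A = np.hstack([X, np.ones((200, 1))])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
weights, bias = coef[:3], coef[3]

# The model explains itself: each weight is the effect of one feature.
for name, w in zip(["x0", "x1", "x2"], weights):
    print(f"{name}: {w:+.2f}")
```

The same transparency is what limits such models: a single weight per feature cannot express interactions or nonlinear effects.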

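2. Post-hoc interpretability

For a black-box model, use some method to expose its decision logic after the fact.

Levels of interpretability:
Model level: why the DNN places its decision boundary where it does
Feature level: which features maximally activate the model
Individual level: why this particular input was classified the way it was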

2. Interpreting Deep Neural Networks

2a. Interpreting Models (macroscopic, understand internals) vs. decisions (microscopic, practical applications)

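(Where the lecture is heading) A taxonomy of model interpretability

Interpreting decisions:
- Attribution method: which input attributes determine the model's current output
- Example-based: which particular examples lead to the model's current output
Interpreting models:
- Representation analysis: the model's internal representations
- Data generation: how to use the model to generate data
- Example-based: relevant examples

DNN interpretability can be divided into a macroscopic level (models) and a microscopic level (decisions).

Interpreting models can be further divided into the four approaches that follow; representation analysis covers weight visualization and surrogate models.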

2b. Interpreting Models: Weight visualization, Surrogate model, Activation maximization, Example-based

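1. Weight visualization

Visualize the filters of each CNN layer to understand what the model is learning at that layer.

2. Surrogate model

Use a simple, "interpretable" model to "summarize" the model's output, in an attempt to explain the output of the "black box".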
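
The surrogate idea can be sketched as follows. The `black_box` function here is a hypothetical stand-in for a trained network, and the linear surrogate is just one possible choice of simple model; the key point is that the surrogate is fit to the black box's outputs rather than to ground-truth labels.

```python
import numpy as np

def black_box(X):
    """Hypothetical opaque model: a fixed nonlinear function of two inputs."""
    return np.tanh(3.0 * X[:, 0]) + 0.5 * X[:, 1] ** 2

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(500, 2))
y_bb = black_box(X)  # targets are the black box's predictions

# Linear surrogate fitted by least squares on [x0, x1, 1].
A = np.hstack([X, np.ones((len(X), 1))])
coef, *_ = np.linalg.lstsq(A, y_bb, rcond=None)
y_sur = A @ coef

# Fidelity: share of the black box's output variance the surrogate reproduces.
r2 = 1.0 - np.sum((y_bb - y_sur) ** 2) / np.sum((y_bb - y_bb.mean()) ** 2)
print("surrogate weights:", coef[:2], "fidelity R^2:", round(r2, 3))
```

In this toy setup the surrogate reports a strong positive weight on `x0` and almost none on `x1`, because the symmetric `x1 ** 2` term has no linear trend. That is a reminder that a surrogate only explains the part of the model it can represent, so its fidelity should be reported alongside its explanation.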

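3. Data generation / Activation maximization

Activation maximization: find the input that maximally activates a neuron, i.e. find the input X that maximizes the model's probability for a given class.

Convolution and deconvolution in the model

The input starts out as noise; as the optimization progresses, the features that distinguish the digits gradually become recognizable.

Advantages and limitations of this approach:

A DNN can be interpreted by finding the input pattern that maximizes a given output.
Connecting the search to the data improves the interpretability of the visualization.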
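
A minimal sketch of activation maximization, assuming a tiny softmax classifier with random weights `W` in place of a trained DNN. It runs gradient ascent on the input to maximize log p(c|x) minus an L2 penalty; the penalty weight `lam`, the step size, and the step count are illustrative choices, not values from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 8))  # hypothetical fixed classifier: logits = W @ x

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def activation_maximization(W, c, steps=500, lr=0.05, lam=0.01):
    """Gradient ascent on x for the objective log p(c|x) - lam * ||x||^2."""
    x = 0.01 * rng.normal(size=W.shape[1])  # start from near-zero noise
    for _ in range(steps):
        p = softmax(W @ x)
        # Gradient of log-softmax w.r.t. x is W[c] - sum_k p_k W[k].
        grad = W[c] - p @ W - 2.0 * lam * x
        x += lr * grad
    return x

x_star = activation_maximization(W, c=1)
p_star = softmax(W @ x_star)
print("class probabilities at x*:", np.round(p_star, 3))
```

The L2 term plays the role of a very crude prior that keeps the input bounded. Replacing it with a term that pulls the input toward realistic data is one way to read the remark above about connecting the search to the data.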

4. Example-based

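Summary:
Visualize the weights of each layer
Substitute a surrogate model that is less accurate but more interpretable
Obtain useful features/information, to some extent, by maximizing activations
Construct prototypes and criticisms effectively, guiding the model toward the most useful, discriminative information for each class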

2c. Interpreting Decisions:

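Example-based

Identify which training examples had a decisive influence on the model's result.

Attribution Methods: why are gradients noisy?

Assign each pixel an attribution score, i.e. how much that pixel contributed to the model producing this result.
The attributions can then be visualized.

The key step is to build a saliency map over the input features.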
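
One common way to build such a map is the plain input gradient. The sketch below assumes a tiny one-hidden-layer ReLU network with random weights (`W1`, `b1`, `W2` are placeholders, not a trained model) and computes the gradient of one class logit with respect to the input by hand.

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(16, 10)), rng.normal(size=16)
W2 = rng.normal(size=(3, 16))

def forward(x):
    """Return the class logits and the hidden pre-activations."""
    h_pre = W1 @ x + b1
    return W2 @ np.maximum(h_pre, 0.0), h_pre

def saliency(x, c):
    """Gradient of logit c w.r.t. x; its absolute value is the saliency map."""
    _, h_pre = forward(x)
    # Chain rule: the ReLU passes gradient only through active hidden units.
    grad = (W2[c] * (h_pre > 0)) @ W1
    return np.abs(grad), grad

x = rng.normal(size=10)  # stand-in for a flattened image
scores, grad = saliency(x, c=0)
print("most salient input index:", int(scores.argmax()))
```

For a real image model the same quantity would normally come from an autodiff framework, and `scores` would be reshaped to the image dimensions for display.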
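To improve saliency maps, first rethink why they look noisy:

Hypothesis 1: the saliency map is truthful

Certain pixels scattered seemingly at random across the image are central to how the network makes its decision.
The noise is important.

Hypothesis 2: gradients are discontinuous
DNNs use piecewise-linear functions (ReLU activation, max-pooling, etc.).
Importance scores jump abruptly under infinitesimal changes to the input.

Hypothesis 3:
A feature may have a strong effect globally but only a small effect (derivative) locally.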
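
Hypotheses 2 and 3 can each be checked on a single scalar unit. The sketch below uses a ReLU for the discontinuity and a sigmoid for the "strong globally, weak locally" case; the sigmoid and the probe point 10.0 are illustrative assumptions, chosen because the sigmoid saturates there.

```python
import numpy as np

def relu_grad(x):
    """Derivative of max(x, 0): 0 for x <= 0, 1 for x > 0."""
    return float(x > 0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)

# Hypothesis 2: an infinitesimal input change flips the gradient from 0 to 1.
jump = relu_grad(1e-9) - relu_grad(-1e-9)

# Hypothesis 3: moving the feature from 0 to 10 changes the output a lot,
# yet the local derivative at 10 is almost zero.
global_effect = sigmoid(10.0) - sigmoid(0.0)
local_grad = sigmoid_grad(10.0)

print(f"ReLU gradient jump: {jump}")
print(f"global effect: {global_effect:.4f}, local gradient: {local_grad:.2e}")
```

A plain gradient would score the saturated feature as nearly irrelevant, which is the motivation for attribution methods that look beyond the gradient at a single point.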

Other attribution methods

Gradient-based Attribution: SmoothGrad, Interior Gradient
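
A rough sketch of both ideas on a toy ReLU network with random weights (a placeholder, not a trained model). SmoothGrad averages gradients over noisy copies of the input. Interior gradients are taken here to mean gradients evaluated at scaled-down inputs `alpha * x`; averaging them and multiplying by `x` gives integrated gradients with a zero baseline. The noise level `sigma`, sample count `n`, and step count are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(16, 10)), rng.normal(size=16)
w2 = rng.normal(size=16)

def f(x):
    """Scalar output of a one-hidden-layer ReLU network."""
    return w2 @ np.maximum(W1 @ x + b1, 0.0)

def grad_f(x):
    """Exact gradient of f w.r.t. x (piecewise constant in x)."""
    return (w2 * (W1 @ x + b1 > 0)) @ W1

def smoothgrad(x, n=50, sigma=0.2):
    """Average the gradient over n Gaussian-perturbed copies of x."""
    noise = sigma * rng.normal(size=(n, x.size))
    return np.mean([grad_f(x + e) for e in noise], axis=0)

def integrated_gradients(x, steps=1000):
    """Average interior gradients along the path 0 -> x, scaled by x."""
    alphas = (np.arange(steps) + 0.5) / steps  # midpoint rule
    interior = np.array([grad_f(a * x) for a in alphas])
    return x * interior.mean(axis=0)

x = rng.normal(size=10)
sg = smoothgrad(x)
ig = integrated_gradients(x)
print("sum of IG:", ig.sum(), "f(x) - f(0):", f(x) - f(np.zeros(10)))
```

The printed pair should nearly agree: with a zero baseline, the integrated-gradient attributions add up to the change in output between the baseline and the input, up to discretization error.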

Backprop-based Attribution: Deconvolution, Guided Backpropagation

Observation: removing more of the gradient yields sharper visualizations.
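
The methods in this family differ mainly in how a signal is passed backward through a ReLU. The sketch below states the three rules as they are commonly described; the function name `relu_backward` and the `mode` strings are made up for illustration.

```python
import numpy as np

def relu_backward(grad_out, pre_act, mode):
    """Backward pass through a ReLU under three different rules."""
    fwd = pre_act > 0   # units that were active in the forward pass
    pos = grad_out > 0  # positions where the backward signal is positive
    if mode == "gradient":     # ordinary backprop: mask by forward activity
        mask = fwd
    elif mode == "deconvnet":  # apply the ReLU to the backward signal instead
        mask = pos
    elif mode == "guided":     # guided backprop: require both conditions
        mask = fwd & pos
    else:
        raise ValueError(f"unknown mode: {mode}")
    return grad_out * mask

pre_act = np.array([1.0, -1.0, 2.0, -2.0])
grad_out = np.array([0.5, 0.5, -0.5, -0.5])
for mode in ("gradient", "deconvnet", "guided"):
    print(mode, relu_backward(grad_out, pre_act, mode))
```

Guided backpropagation zeroes the most entries, since it keeps a value only where both masks agree, which fits the observation that discarding more of the gradient gives cleaner-looking maps.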

3. Evaluating Attribution Methods

3a. Qualitative: Coherence: Attributions should highlight discriminative features / objects of interest

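Attributions should highlight discriminative features.

3b. Qualitative: Class Sensitivity: Attributions should be sensitive to class labels

Attributions should be class-sensitive.

3c. Quantitative: Sensitivity: Removing feature with high attribution --> large decrease in class probability

Removing features with high attribution should cause a large drop in the class probability.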
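
A toy version of this sensitivity check, assuming a hand-written logistic model and gradient*input as the attribution method (both are illustrative choices, not taken from the lecture). "Removing" a feature here means setting it to a zero baseline.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical logistic model and a single input to explain.
w = np.array([3.0, -2.0, 0.5, 0.1, 1.5])
x = np.array([1.0, -1.0, 1.0, 1.0, 1.0])

def prob(x):
    return sigmoid(w @ x)

attr = w * x  # gradient*input attribution (exact for the logit of a linear model)

def prob_after_removal(x, idx):
    """Class probability after setting the chosen features to zero."""
    x2 = x.copy()
    x2[idx] = 0.0
    return prob(x2)

order = np.argsort(-attr)  # feature indices, highest attribution first
drop_top = prob(x) - prob_after_removal(x, order[:2])
drop_bottom = prob(x) - prob_after_removal(x, order[-2:])
print(f"drop after removing top-2: {drop_top:.4f}, bottom-2: {drop_bottom:.4f}")
```

A faithful attribution method should give a much larger drop for the top-ranked features than for the bottom-ranked ones, as it does in this toy case.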

3d. Quantitative: ROAR & KAR. Low class prob cuz image unseen --> remove pixels, retrain, measure acc. drop
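
The concern behind ROAR (RemOve And Retrain) and KAR (Keep And Retrain) is that deleting pixels produces inputs unlike anything the model saw in training, so a probability drop alone is ambiguous. Retraining on the modified data removes that confound. The sketch below is a heavily simplified illustration: it uses synthetic tabular data, a least-squares linear classifier, and a single global feature ranking instead of per-image pixel attributions, all of which are assumptions made for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 400, 10
X = rng.normal(size=(2 * n, d))
y = (X[:, 0] + X[:, 1] > 0).astype(float)  # only features 0 and 1 matter
Xtr, ytr, Xte, yte = X[:n], y[:n], X[n:], y[n:]

def fit(X, y):
    """Least-squares linear classifier on +/-1 targets, with a bias term."""
    A = np.hstack([X, np.ones((len(X), 1))])
    w, *_ = np.linalg.lstsq(A, 2.0 * y - 1.0, rcond=None)
    return w

def accuracy(w, X, y):
    A = np.hstack([X, np.ones((len(X), 1))])
    return np.mean((A @ w > 0) == (y > 0.5))

def remove(X, idx):
    """Replace the chosen features with an uninformative constant."""
    X2 = X.copy()
    X2[:, idx] = 0.0
    return X2

w = fit(Xtr, ytr)
base_acc = accuracy(w, Xte, yte)
ranking = np.argsort(-np.abs(w[:d]))  # global importance (a simplification)

def retrain_acc(idx):
    """Remove features in train and test, retrain, and report test accuracy."""
    return accuracy(fit(remove(Xtr, idx), ytr), remove(Xte, idx), yte)

roar_acc = retrain_acc(ranking[:2])   # ROAR: remove the top-ranked features
kar_acc = retrain_acc(ranking[2:])    # KAR: keep only the top-ranked features
ctrl_acc = retrain_acc(ranking[-2:])  # control: remove the lowest-ranked ones
print(base_acc, roar_acc, kar_acc, ctrl_acc)
```

If the ranking is informative, accuracy after ROAR should fall toward chance while KAR and the control stay close to the baseline; a ranking no better than random would show no such gap.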


Copyright notice
This article was written by [木姑娘]. Please include the original link when reposting. Thank you.
https://blog.csdn.net/m0_37678226/article/details/125585050
