Paper Reading_ICD Coding_MSMN
2022-07-03 04:39:00 【xieyan0811】
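Introduction
English title: Code Synonyms Do Matter: Multiple Synonyms Matching Network for Automatic ICD Coding
Chinese title: 自动ICD编码的同义词匹配网络 (a synonym matching network for automatic ICD coding)
Paper: https://export.arxiv.org/pdf/2203.01515.pdf
Field: natural language processing, biomedicine
Published: 2022
Authors: Zheng Yuan et al., Tsinghua University, Alibaba
Venue: ACL
Code and data: https://github.com/GanjinZero/ICD-MSMN
Read on: 2022.06.14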
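Impressions
By bringing in the external resource UMLS, the paper collects synonyms for each code. This addresses the mismatch where electronic medical records and ICD code descriptions express the same concept in different words.
The algorithm is not as elaborate as some earlier models, but with the external resource the results do improve considerably.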
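Skim
- Problem addressed: one concept being expressed by many different terms in ICD coding
- Core method:
  - Proposes the Multiple Synonyms Matching Network (MSMN)
  - Uses an LSTM plus multi-head attention
  - Uses the synonyms of each code as queries that attend to different phrases in the text, producing representations related to that ICD code
  - Uses a biaffine similarity between the text representation and the ICD code representation for the final classification
- Level of understanding after skimming:
  - Half an hour to read, half an hour to write up (it is a short paper)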
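Method
ICD code synonyms
The ICD code descriptions are expanded with the UMLS (Unified Medical Language System) knowledge graph. First, the code description $l^1$ is aligned to a Concept Unique Identifier (CUI) in UMLS. Then the English terms that share the same CUI are selected from UMLS as synonyms, and additional synonyms are created by removing hyphens and the word "NOS". Each ICD code thus gets the synonym texts $\{l^2, l^3, \ldots, l^M\}$. Below, $N_j$ denotes the number of words in description $l^j$.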
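The hyphen and "NOS" augmentation step can be sketched as follows. This is a minimal illustration, not the authors' preprocessing code: the function name `expand_synonyms` is hypothetical, and it assumes that a hyphen is replaced by a space and that "NOS" is matched as a standalone upper-case word. The UMLS lookup itself is not shown.

```python
import re


def expand_synonyms(terms):
    """Return the input terms plus variants without hyphens and without "NOS".

    Assumptions (not from the paper): hyphens become spaces, "NOS" is
    matched case-sensitively as a whole word, and duplicates are dropped
    while keeping first-seen order.
    """
    expanded = []
    for term in terms:
        no_hyphen = term.replace("-", " ")
        candidates = [
            term,
            no_hyphen,
            re.sub(r"\bNOS\b", "", term),
            re.sub(r"\bNOS\b", "", no_hyphen),
        ]
        for cand in candidates:
            # Collapse repeated whitespace and trim leftover commas/spaces.
            cleaned = " ".join(cand.split()).strip(" ,")
            if cleaned and cleaned not in expanded:
                expanded.append(cleaned)
    return expanded


synonyms = expand_synonyms(["Hypertension NOS", "Non-insulin dependent diabetes"])
print(synonyms)
```

For the two example terms, the extra variants are "Hypertension" and "Non insulin dependent diabetes".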
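Encoding
An LSTM serves as the encoder. Pretrained word embeddings map each word $w_i$ to a vector $x_i$, and a $d$-layer bidirectional LSTM takes the embeddings of the $N$ words in the text as input; its hidden states are used as the text representation, $H = \mathrm{Enc}(x_1, \ldots, x_N)$.
Each synonym is encoded with the same encoder, and max-pooling over its hidden states gives the synonym representation, where $N_j$ is the number of words in synonym $l^j$:

$$q^j = \mathrm{MaxPool}\left(\mathrm{Enc}\left(x^j_1, \ldots, x^j_{N_j}\right)\right)$$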

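Multi-synonym attention
Inspired by multi-head attention, the paper uses multi-synonym attention and splits the hidden states into $M$ blocks ($M$ heads), one per synonym:

$$H = \left(H^1, \ldots, H^M\right)$$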

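Each synonym representation $q^j$ then queries its head $H^j$: the attention scores $\alpha$ are computed from linear transformations of $H^j$ and $q^j$, and $H\alpha$ gives the encoding of the text that is relevant to that synonym. Because the text only needs to match one of a code's synonyms, the code-wise text representation $v$ is aggregated by max-pooling over the synonyms:

$$\alpha_l^j = \mathrm{softmax}\left(W_Q q^j \cdot \tanh\left(W_H H^j\right)\right)$$

$$v_l = \mathrm{MaxPool}\left(H\alpha_l^1, \ldots, H\alpha_l^M\right)$$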
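The attention step described above can be sketched in plain Python. This is a toy illustration under stated assumptions, not the released implementation: the names `multi_synonym_attention`, `W_Q`, `W_H` and the dimension `d_a` are chosen here for illustration, the `tanh` scoring form follows my reading of the paper, weights are random rather than trained, and the hidden size is assumed to be divisible by the number of synonyms.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def multi_synonym_attention(H, Q, W_Q, W_H):
    """H: N token vectors of size h. Q: M synonym vectors of size h.

    Returns (v, alphas): the max-pooled code-wise text vector of size h
    and the M attention distributions over the N tokens.
    """
    M, h = len(Q), len(H[0])
    chunk = h // M  # assumes h is divisible by M
    contexts, alphas = [], []
    for j, q in enumerate(Q):
        query = matvec(W_Q, q)
        scores = []
        for token in H:
            head = token[j * chunk:(j + 1) * chunk]  # j-th head of this token
            key = [math.tanh(k) for k in matvec(W_H, head)]
            scores.append(sum(a * b for a, b in zip(query, key)))
        alpha = softmax(scores)
        # Weighted sum of the full hidden states under this synonym's attention.
        context = [sum(a * token[d] for a, token in zip(alpha, H)) for d in range(h)]
        contexts.append(context)
        alphas.append(alpha)
    # Element-wise max over synonyms: the text only has to match one of them.
    v = [max(ctx[d] for ctx in contexts) for d in range(h)]
    return v, alphas


def rand_matrix(rows, cols):
    return [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
N, h, M, d_a = 5, 4, 2, 3
H = rand_matrix(N, h)            # stand-in for the LSTM hidden states
Q = rand_matrix(M, h)            # stand-in for the synonym representations
W_Q = rand_matrix(d_a, h)
W_H = rand_matrix(d_a, h // M)
v, alphas = multi_synonym_attention(H, Q, W_Q, W_H)
```

Each synonym yields its own attention distribution over the tokens, so different synonyms can focus on different phrases before the max-pooling merges them.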

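Classifier
The classifier decides whether text $S$ contains ICD code $l$. Given the code-wise text representation $v_l$ computed above and the code representation $q_l$, obtained by max-pooling the synonym representations $q^j$, a biaffine transformation measures their similarity for classification:

$$q_l = \mathrm{MaxPool}\left(q^1, \ldots, q^M\right), \qquad \hat{y}_l = \sigma\left(v_l^{\top} W q_l\right)$$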

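Many earlier models classify with code-dependent parameters, so the training set has to contain examples of every code. Here $q$ is a representation built from the text of the code, so what the model learns is a relation between texts, independent of any specific code.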
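A minimal sketch of the biaffine score makes the point about code independence concrete: the same matrix `W` is reused for every code, and only the vectors change. The function name `biaffine_score` and the toy values are illustrative assumptions, not taken from the paper's code.

```python
import math


def biaffine_score(v, W, q):
    """Return sigmoid(v^T W q) for text vector v and code vector q."""
    Wq = [sum(w * qi for w, qi in zip(row, q)) for row in W]
    logit = sum(vi * x for vi, x in zip(v, Wq))
    return 1.0 / (1.0 + math.exp(-logit))


v = [1.0, 0.0]                       # toy code-wise text representation
q = [0.0, 1.0]                       # toy code representation
W_identity = [[1.0, 0.0], [0.0, 1.0]]
W_swap = [[0.0, 1.0], [1.0, 0.0]]

p_identity = biaffine_score(v, W_identity, q)  # logit is 0
p_swap = biaffine_score(v, W_swap, q)          # logit is 1
```

With the identity matrix the two orthogonal vectors give a logit of 0 and a probability of 0.5; a different `W` can relate dimensions of the text and code vectors that do not line up directly.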
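Training
Binary cross-entropy measures the difference between the predicted probabilities and the true labels, summed over the $C$ codes:

$$\mathcal{L} = \sum_{l=1}^{C} \left[-y_l \log \hat{y}_l - \left(1 - y_l\right) \log\left(1 - \hat{y}_l\right)\right]$$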
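The multi-label cross-entropy can be sketched as below. The function name `multilabel_bce` and the clipping constant `eps` are assumptions added for numerical safety; the loss is summed over codes rather than averaged, which is also an assumption about the exact reduction.

```python
import math


def multilabel_bce(probs, labels, eps=1e-12):
    """Sum of binary cross-entropy terms, one per ICD code."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # keep log() away from 0
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total


# Three codes: predicted probabilities and gold labels.
loss = multilabel_bce([0.9, 0.2, 0.5], [1, 0, 1])
```

Confident correct predictions (0.9 for a positive code, 0.2 for a negative one) contribute little, while the uncertain 0.5 on a positive code contributes the largest term.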
