Detailed Explanation of the ConVIRT Paper (Medical Images)
2022-06-12 06:36:00 【PD, I am your true love fan】
Preface
ConVIRT, short for "Contrastive Learning of Medical Visual Representations from Paired Images and Text," is also a contrastive-learning work, but one that combines multiple modalities; it predates CLIP.
- Learning visual representations of medical images is central to medical imaging, but progress has been held back by small manually labeled datasets.
- Existing work usually relies on models pre-trained on ImageNet, but because the image characteristics are completely different, the results are not very good.
- Alternatively, rule-based labels are extracted from the text reports paired with medical images, but such labels are inaccurate and hard to generalize.
How labels are produced in the medical field
- Ask experts to annotate: high quality, but only a small amount of data.
- Use rules to extract labels from reports; medical systems contain many such text-image pairs.
- But the extraction rules are often hard to get right, and some labels are difficult to extract at all.
- Because every doctor writes differently, the same rules are also hard to reuse across hospitals.
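As a toy illustration of such rule-based extraction (this is not the actual CheXpert labeler; the keywords and the crude negation check here are invented for the example), a sketch might look like:

```python
import re

# Hypothetical keyword rules mapping a free-text report to binary labels.
# Real rule-based labelers need far more robust negation and uncertainty
# handling, which is exactly why they generalize poorly across hospitals.
RULES = {
    "pneumonia": re.compile(r"\bpneumonia\b", re.I),
    "effusion": re.compile(r"\bpleural effusion\b", re.I),
}
NEGATION = re.compile(r"\bno\b", re.I)

def extract_labels(report):
    labels = {}
    for name, pattern in RULES.items():
        match = pattern.search(report)
        if not match:
            labels[name] = 0
        else:
            # crude negation check: look at the 30 characters before the hit
            window = report[max(0, match.start() - 30):match.start()]
            labels[name] = 0 if NEGATION.search(window) else 1
    return labels
```

A phrasing the rules were not written for ("pneumonia cannot be excluded", say) already breaks this scheme, which mirrors the brittleness described above.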
The overall architecture

- An image is first randomly cropped, then further augmented, and fed into the image encoder (ResNet-50); a final MLP produces a 512-dimensional feature representation.
- From the paragraph paired with that image, a span of text is randomly sampled (a few words, possibly an incomplete sentence) and fed into the text encoder (BERT); a final MLP likewise produces a 512-dimensional feature representation.
- Because a batch contains N image-text pairs, each sample has one positive and N-1 negatives; an InfoNCE loss is then computed in both the image-to-text and text-to-image directions.
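The bidirectional loss described above can be sketched as follows. This is an illustrative NumPy version, not the authors' code; the default temperature `tau` and weight `lam` are placeholders rather than the paper's exact settings.

```python
import numpy as np

def convirt_loss(img_feats, txt_feats, tau=0.1, lam=0.75):
    """Bidirectional InfoNCE over a batch of N paired features.

    img_feats, txt_feats: (N, 512) outputs of the two MLP heads.
    For each image, its matching text is the positive and the other
    N-1 texts in the batch are negatives, and vice versa.
    """
    img = img_feats / np.linalg.norm(img_feats, axis=1, keepdims=True)
    txt = txt_feats / np.linalg.norm(txt_feats, axis=1, keepdims=True)
    logits = img @ txt.T / tau        # (N, N) scaled cosine similarities
    n = logits.shape[0]

    def xent_diag(l):
        # cross-entropy where the correct class of row i is column i
        l = l - l.max(axis=1, keepdims=True)     # numerical stability
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[np.arange(n), np.arange(n)].mean()

    # weighted sum of the image-to-text and text-to-image directions
    return lam * xent_diag(logits) + (1 - lam) * xent_diag(logits.T)
```

Matched image-text batches (features on the diagonal most similar) should receive a much lower loss than mismatched ones.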

Data sets
- The public MIMIC-CXR database, version 2: a set of chest X-ray images paired with their text reports, which, being public, has become a standard resource for multimodal modeling of medical images. After preprocessing, the dataset contains about 217k image-text pairs, with on average 1.7 images and 6.0 sentences per pair.
- Bone images: a set of musculoskeletal image-text pairs obtained from the Rhode Island Hospital system. After chest images, musculoskeletal images are the second most common type of radiograph in a typical hospital. This dataset contains 48k image-text pairs, with on average 2.5 images and 8.0 sentences per pair.
Experiments
Classification tasks
- RSNA Pneumonia Detection: a binary classification task (pneumonia or not).
- CheXpert image classification: a multi-label classification task on lung conditions.
- COVIDx image classification: a three-way classification task (COVID-19 pneumonia, common pneumonia, or normal).
- MURA bony abnormality detection: a binary task of judging whether a musculoskeletal image is normal.
Two evaluation settings are used: a linear probe, which trains only the classification head, and fine-tuning, which updates the whole network.
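The linear-probe setting can be sketched minimally as below; here a fixed random projection stands in for the frozen pretrained encoder (in the paper this would be the pretrained ResNet-50), and the probe is plain logistic regression trained by gradient descent.

```python
import numpy as np

rng = np.random.default_rng(0)
W_enc = rng.normal(size=(2048, 512))   # frozen "encoder" weights (stand-in)

def encode(x):
    # Frozen feature extractor: never updated during probing.
    return np.tanh(x @ W_enc)

def train_linear_probe(X, y, lr=0.1, steps=200):
    """Train only a linear classifier on top of frozen features."""
    feats = encode(X)                  # (N, 512) frozen features
    w = np.zeros(feats.shape[1])
    b = 0.0
    for _ in range(steps):             # logistic regression by gradient descent
        p = 1.0 / (1.0 + np.exp(-(feats @ w + b)))
        g = p - y                      # gradient of the log loss
        w -= lr * feats.T @ g / len(y)
        b -= lr * g.mean()
    return w, b
```

Fine-tuning would instead backpropagate through `encode` as well; the probe isolates how linearly separable the pretrained features already are.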

- From top to bottom, the baselines are:
- Random initialization;
- ResNet-50 pre-trained on ImageNet;
- Caption-LSTM, an image-captioning network (its architecture is shown in the figure below);
- Caption-Transformer, the same as the previous one but with a Transformer replacing the LSTM (a COCO image-captioning benchmark model);
- Contrastive-Binary, also a contrastive network: its pretraining task is to take a candidate image-text pair as input and judge whether the two belong together.
- Because COVIDx does not have that much data, no experiments were run at the 1% label level.
The figure below visualizes the feature space: on the left, ImageNet pre-training; on the right, ConVIRT.
Zero-shot tasks
Like CLIP, the biggest appeal of a multimodal model is zero-shot transfer: no fine-tuning is needed, and images can be classified through text prompts.
- Image-Image: pass all images through the image encoder; similar images should cluster together. (One image is used as a query and its similarity to all other images is computed, i.e. similarity in the contrastive feature space.)
- The CheXpert dataset is used again, but since this task is not multi-label classification, the authors take images that carry only one label as queries (after expert screening, 10 images are kept per category).
- These queries are matched against the rest by similarity; a retrieved image counts as a positive if it also carries the query's label, and as a negative otherwise.
- Text-Image: experts write descriptions of the symptoms for each CheXpert label (5 per label), which are fed into the text encoder and matched against the features of all images from the image encoder (both taken after the MLP layer). Positives and negatives are determined the same way as above.
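Both retrieval tasks reduce to ranking gallery features by cosine similarity to a query feature. A sketch under that assumption (the query can be an image feature or a text feature, since both live in the same projected space):

```python
import numpy as np

def retrieve(query, gallery, k=5):
    """Rank gallery items by cosine similarity to the query.

    query:   (d,) feature from either encoder's MLP head.
    gallery: (N, d) image features to search over.
    Returns the indices and similarities of the top-k matches.
    """
    q = query / np.linalg.norm(query)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    sims = g @ q                       # cosine similarity to every item
    order = np.argsort(-sims)[:k]      # highest similarity first
    return order, sims[order]
```

Precision@k is then computed by checking how many of the top-k results share the query's label.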

Hyperparameter settings

- A large batch size hurts Image-Image and Text-Image retrieval, probably because some of the negatives are actually latent positives.
- Removing the activation function from the final MLP also degrades Image-Image and Text-Image performance.
- Neither of these two issues affects the classification tasks.
Comparison with other contrastive-learning models. Because SimCLR and MoCo contrast images only with images, the comparison is limited to the Image-Image task.
- All models use the MIMIC-CXR dataset. The SimCLR used here is not the v2 version, so its falling short is perhaps unsurprising; but MoCo v2 clearly uses a larger and more consistent dictionary, so why does it still lose?
- I think there are several reasons: the hyperparameters were not tuned (they were carried over directly), and using text data can improve the model's learning efficiency and accuracy.