ConVIRT Paper Explained (Medical Images)
2022-06-12 06:36:00 【PD, I am your true love fan】
Preface
ConVIRT stands for Contrastive Learning of Medical Visual Representations from Paired Images and Text. It is another contrastive-learning method, but one that combines multiple modalities, and it predates CLIP.
- Learning visual representations of medical images is central to medical imaging, but progress has been held back by the small size of manually labeled datasets.
- Existing work typically relies on models pre-trained on ImageNet; because natural and medical image features differ substantially, this transfers poorly.
- Alternatively, rule-based labels are extracted from the text reports paired with medical images, but such labels are inaccurate and hard to generalize.
How labels are produced in the medical field
- Ask experts to annotate: the quality is high, but the resulting datasets are small.
- Extract labels from reports with hand-written rules; such text-image pairs are abundant in hospital systems.
- But extraction rules are brittle, and some labels are hard to extract reliably.
- Because every doctor writes differently, the same rules rarely transfer across hospitals.
The overall architecture

- An image is first randomly cropped, then further augmented, then passed through the image encoder (ResNet-50) and finally an MLP to obtain a 512-dimensional feature.
- From the text paired with that image, a span is randomly sampled (a few words, or an incomplete sentence), then passed through the text encoder (BERT) and finally an MLP to obtain a 512-dimensional feature.
- Within a batch of N image-text pairs, each sample has one positive and N-1 negatives, so an InfoNCE loss is computed in both directions, image-to-text and text-to-image.
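The bidirectional InfoNCE objective described above can be sketched in a few lines of PyTorch. This is a minimal illustration, not the paper's exact implementation; the temperature `tau` and the weighting `lam` between the two directions are illustrative values.

```python
import torch
import torch.nn.functional as F

def convirt_loss(img_feat, txt_feat, tau=0.1, lam=0.75):
    """Bidirectional InfoNCE loss, a sketch of ConVIRT's objective.

    img_feat, txt_feat: (N, d) projected features from the two encoders.
    tau is the temperature; lam weights the image-to-text direction.
    """
    # Cosine-similarity matrix between every image and every text in the batch.
    img = F.normalize(img_feat, dim=1)
    txt = F.normalize(txt_feat, dim=1)
    logits = img @ txt.t() / tau            # (N, N); diagonal = positive pairs

    targets = torch.arange(img.size(0))     # pair i matches pair i
    loss_i2t = F.cross_entropy(logits, targets)      # image -> matching text
    loss_t2i = F.cross_entropy(logits.t(), targets)  # text -> matching image
    return lam * loss_i2t + (1 - lam) * loss_t2i
```

Each row of `logits` is a softmax classification over the N texts in the batch, with the paired text as the correct class; transposing gives the text-to-image direction.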

Datasets
- Chest images: version 2 of the public MIMIC-CXR database, a set of chest X-ray images paired with their text reports; its public release has made it a standard resource for multimodal modeling of medical images. After preprocessing, the dataset contains about 217k image-text pairs, each pair containing on average 1.7 images and 6.0 sentences.
- Bone images: a set of musculoskeletal image-text pairs obtained from the Rhode Island Hospital system. After chest images, musculoskeletal images are the second most common type of radiograph in a typical hospital. The dataset contains 48k image-text pairs, each pair containing on average 2.5 images and 8.0 sentences.
Experiments
Classification tasks
- RSNA Pneumonia Detection: binary classification of whether pneumonia is present;
- CheXpert image classification: multi-label classification of lung conditions;
- COVIDx image classification: three-way classification into COVID-19 pneumonia, common pneumonia, and normal;
- MURA bony abnormality detection: binary classification of whether the musculoskeletal image is normal;
Two evaluation protocols: linear probing (train only the classification head) and fine-tuning (train the whole network).
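The linear-probe protocol can be sketched as follows. Here a tiny convolutional net stands in for the pretrained ResNet-50 encoder (an assumption for self-containedness); the point is that the encoder is frozen and only a linear head is trained.

```python
import torch
import torch.nn as nn

# Stand-in for the pretrained image encoder (the real one is ResNet-50).
encoder = nn.Sequential(
    nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
)
for p in encoder.parameters():
    p.requires_grad = False          # freeze: linear probing trains the head only

head = nn.Linear(16, 2)              # e.g. RSNA pneumonia: a binary task
opt = torch.optim.Adam(head.parameters(), lr=1e-3)

def probe_step(images, labels):
    with torch.no_grad():            # no gradients flow into the frozen encoder
        feats = encoder(images)
    loss = nn.functional.cross_entropy(head(feats), labels)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```

Under fine-tuning, by contrast, the freezing is dropped and the encoder's parameters are passed to the optimizer as well.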

- From top to bottom, the baselines are:
- Random initialization;
- ResNet-50 pre-trained on ImageNet;
- Caption-LSTM, an image-captioning network (its architecture is shown in the figure below);
- Caption-Transformer, the same idea as the previous one but with a Transformer replacing the LSTM (a COCO image-captioning benchmark model);
- Contrastive-Binary, another contrastive pre-training task: given an image-text pair as input, predict whether the two actually belong together;
- Because COVIDx does not have enough data, the 1% labeled-fraction evaluation was not run on it.
The following figure visualizes the feature space: on the left, ImageNet pre-training; on the right, ConVIRT.
Zero-shot tasks
Like CLIP, the signature capability of a multimodal model is zero-shot transfer: no fine-tuning is needed, and images are classified by matching them against text prompts.
Zero-shot retrieval tasks
- Image-Image: pass all images through the image encoder; semantically similar images should cluster together in feature space. One image serves as the query, and its similarity to every other image is computed (this is exactly the feature-space similarity learned by contrastive training).
- The dataset is again CheXpert, but since retrieval is not framed as a multi-label task, the authors keep only images carrying a single label as queries (after expert screening, 10 images are reserved per category).
- Each query retrieves its most similar images; a retrieved image counts as a positive if it also carries the query's label, and as a negative otherwise.
- Text-Image: experts write a textual description of the symptoms for each CheXpert label (5 in total); these are passed through the text encoder and matched against the features of all images from the image encoder (both, of course, after the MLP projection layer). Positives and negatives are determined as above.
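Both retrieval tasks reduce to ranking a gallery of image features by cosine similarity to a query feature (an image feature for Image-Image, a text feature for Text-Image). A minimal sketch, assuming the features already come out of the encoders' MLP heads:

```python
import torch
import torch.nn.functional as F

def retrieve(query_feat, gallery_feats, k=5):
    """Rank gallery items by cosine similarity to a single query.

    query_feat: (d,) projected feature (image or text);
    gallery_feats: (M, d) projected image features.
    Returns the top-k gallery indices and their similarities.
    """
    q = F.normalize(query_feat, dim=-1)
    g = F.normalize(gallery_feats, dim=-1)
    sims = g @ q                       # cosine similarity to each gallery item
    topk = sims.topk(k)                # topk returns results sorted descending
    return topk.indices, topk.values
```

Precision at k is then just the fraction of the top-k results that share the query's label.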

Hyperparameter settings

- A large batch size hurts the Image-Image and Text-Image retrieval tasks, likely because some of the in-batch negatives are themselves latent positives;
- Removing the activation function from the final MLP also degrades Image-Image and Text-Image retrieval performance;
- Neither of these changes affects the classification results.
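The second ablation concerns the projection head on top of each encoder. A sketch of the two variants, assuming a hidden width equal to the 512-dimensional output (an illustrative choice): removing the non-linearity collapses the head into a single linear map.

```python
import torch.nn as nn

def projection_head(in_dim, out_dim=512, nonlinear=True):
    """Projection MLP on top of an encoder; nonlinear=False is the ablation."""
    if nonlinear:
        return nn.Sequential(
            nn.Linear(in_dim, out_dim), nn.ReLU(),  # non-linearity under ablation
            nn.Linear(out_dim, out_dim),
        )
    return nn.Linear(in_dim, out_dim)               # purely linear projection
```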
Comparison with other contrastive-learning models. Since SimCLR and MoCo contrast images only with images, only the Image-Image retrieval task can be compared.
- All models use the MIMIC-CXR dataset. SimCLR is not the v2 version, so it is perhaps no surprise that it falls short; but MoCo v2 clearly uses a larger and more consistent dictionary, so why does it still lose?
- I see several possible reasons: its hyperparameters were not tuned (they were carried over directly), and using the text data improves the model's learning efficiency and accuracy.