2022 ICLR | CONTRASTIVE LEARNING OF IMAGE- AND STRUCTURE BASED REPRESENTATIONS IN DRUG DISCOVERY
2022-06-13 04:30:00 【Dazed flounder】

CLOOME: a molecular representation tool based on multimodal contrastive learning
This paper, from Ana Sanchez-Fernandez's team at Johannes Kepler University Linz, was recently published at ICLR 2022. Its main points are as follows. The contrastive learning methods CLIP and CLOOB have shown that representations learned from multimodal data transfer well to a large number of different tasks. In drug discovery, microscopy images of molecule-treated cells and chemical structures form a comparable multimodal dataset, yet no contrastive learning study had so far combined the two, even though such an approach is valuable in a field where labels are expensive. This work therefore starts from easily obtained microscopy images and molecular structures and proposes CLOOME (Contrastive Leave One Out Boost for Molecule Encoders), a new contrastive learning method based on CLOOB (Contrastive Leave One Out Boost). Linear probing on molecular activity prediction tasks shows that the learned representations are transferable. In addition, the representations can be used for bioisosteric replacement tasks.
Method
This work applies contrastive learning to microscopy images and chemical structure data to learn molecular representations, yielding a highly transferable molecular encoder (see Figure 1). Compared with conventional molecular encoders or hand-crafted molecular features, the main innovation of CLOOME is that it can optimize the molecular representation without activity data or human prior knowledge as input.
The training data consist of $N$ pairs of microscopy images of molecule-perturbed cells and the chemical structures of the corresponding molecules: $\{(x_1, z_1), \ldots, (x_N, z_N)\}$. An adaptive image encoder $h^x(\cdot)$ and an adaptive structure encoder $h^z(\cdot)$ map images and chemical structures to the embeddings $x_n = h^x(x_n)$ and $z_n = h^z(z_n)$ (from here on, $x_n$ and $z_n$ denote the embeddings). As shown in Figure 1(a), the stacked microscopy image embeddings (that is, the features produced by the image encoder) are denoted $X = (x_1, \ldots, x_N)$, and the stacked structure embeddings are denoted $Z = (z_1, \ldots, z_N)$. The goal of contrastive learning is to increase the similarity of matched pairs and decrease the similarity of mismatched pairs. This is usually achieved by minimizing the InfoNCE loss, which maximizes a lower bound on the mutual information between the embeddings:
$$
L_{\mathrm{InfoNCE}} = -\frac{1}{N}\sum_{i=1}^{N} \ln \frac{\exp(\tau^{-1} x_i^T z_i)}{\sum_{j=1}^{N} \exp(\tau^{-1} x_i^T z_j)} - \frac{1}{N}\sum_{i=1}^{N} \ln \frac{\exp(\tau^{-1} x_i^T z_i)}{\sum_{j=1}^{N} \exp(\tau^{-1} x_j^T z_i)}
$$
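To make the two terms of the InfoNCE loss concrete, here is a minimal pure-Python sketch. It is an illustration only, not the authors' implementation: the helper names (`dot`, `log_sum_exp`, `info_nce`), the temperature value `tau=0.1`, and the toy embeddings are all assumptions made for this example.

```python
import math

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(p * q for p, q in zip(a, b))

def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def info_nce(X, Z, tau=0.1):
    """Symmetric InfoNCE loss for paired embeddings X[i] <-> Z[i].

    X, Z: lists of N vectors (image and structure embeddings).
    tau:  temperature; 0.1 is an illustrative value, not the paper's setting.
    """
    n = len(X)
    # sim[i][j] = x_i^T z_j / tau
    sim = [[dot(x, z) / tau for z in Z] for x in X]
    loss = 0.0
    for i in range(n):
        # First term: image i scored against all structures (row i).
        loss -= sim[i][i] - log_sum_exp(sim[i])
        # Second term: structure i scored against all images (column i).
        loss -= sim[i][i] - log_sum_exp([sim[j][i] for j in range(n)])
    return loss / n

# Toy data (hypothetical): two orthogonal unit embeddings per modality.
X = [[1.0, 0.0], [0.0, 1.0]]
Z_matched = [[1.0, 0.0], [0.0, 1.0]]
Z_shuffled = [[0.0, 1.0], [1.0, 0.0]]
loss_matched = info_nce(X, Z_matched)
loss_shuffled = info_nce(X, Z_shuffled)
```

With correctly matched pairs the loss is close to zero, while swapping the structure embeddings makes it large, which is the behaviour the objective is meant to reward.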
However, the InfoNCE loss tends to over-represent some features while ignoring others. This work therefore builds on CLOOB to improve the contrastive learning.
The CLOOB method. First, embeddings are retrieved from the stored image embeddings $U$ and the stored structure embeddings $V$. $U_{x_i}$ and $U_{z_i}$ denote the embeddings retrieved from $U$ with the image embedding $x_i$ and the structure embedding $z_i$ as queries; $V_{x_i}$ and $V_{z_i}$ are retrieved from $V$ in the same way. As in CLOOB, the retrieval is carried out with modern Hopfield networks:
$$
\begin{aligned}
U_{x_i} &= U \, \mathrm{softmax}(\beta U^T x_i) \\
V_{x_i} &= V \, \mathrm{softmax}(\beta V^T x_i) \\
U_{z_i} &= U \, \mathrm{softmax}(\beta U^T z_i) \\
V_{z_i} &= V \, \mathrm{softmax}(\beta V^T z_i)
\end{aligned}
$$
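The retrieval step above is a softmax-weighted average of the stored embeddings. The following pure-Python sketch shows one such retrieval, $U\,\mathrm{softmax}(\beta U^T x_i)$. The function names, the value `beta=8.0`, and the toy vectors are assumptions for illustration and are not taken from the paper.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def hopfield_retrieve(stored, query, beta=8.0):
    """Return stored * softmax(beta * stored^T query).

    stored: list of stored embedding vectors (the columns of U or V).
    query:  one embedding vector (x_i or z_i).
    beta:   inverse temperature; 8.0 is an illustrative value only.
    """
    scores = [beta * sum(s_k * q_k for s_k, q_k in zip(s, query)) for s in stored]
    weights = softmax(scores)
    dim = len(stored[0])
    # Weighted average of the stored vectors, one coordinate at a time.
    return [sum(w * s[d] for w, s in zip(weights, stored)) for d in range(dim)]

# Toy example (hypothetical): two stored image embeddings, one query.
U = [[1.0, 0.0], [0.0, 1.0]]
x_1 = [0.9, 0.1]
U_x1 = hopfield_retrieve(U, x_1)               # dominated by the first stored vector
U_x1_flat = hopfield_retrieve(U, x_1, beta=0)  # beta = 0 gives a plain average
```

A larger `beta` makes the retrieval concentrate on the stored embedding most similar to the query, while `beta = 0` reduces it to the mean of all stored embeddings.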
Then the InfoLOOB loss is used as the objective function.
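For reference, here is a sketch of the InfoLOOB objective written in the notation above. It follows the formulation in the CLOOB paper, which is an assumption here, since this article does not spell the loss out:

$$
L_{\mathrm{InfoLOOB}} = -\frac{1}{N}\sum_{i=1}^{N} \ln \frac{\exp(\tau^{-1} U_{x_i}^T U_{z_i})}{\sum_{j \neq i}^{N} \exp(\tau^{-1} U_{x_i}^T U_{z_j})} - \frac{1}{N}\sum_{i=1}^{N} \ln \frac{\exp(\tau^{-1} V_{x_i}^T V_{z_i})}{\sum_{j \neq i}^{N} \exp(\tau^{-1} V_{x_j}^T V_{z_i})}
$$

Compared with InfoNCE, the denominators leave out the matched pair ($j \neq i$, the "leave one out" part), and the loss is computed on the Hopfield-retrieved embeddings rather than on the raw embeddings.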
Microscopy images differ from natural images in several ways; for example, staining determines the number of image channels. All experiments in the paper use a ResNet-50 with 5 input channels as the image encoder, and the microscopy images are downscaled to 320×320.
Molecular structure encoder. CLOOME uses a descriptor-based fully connected network. Alternatively, graph neural networks with a suitable pooling operation, message-passing neural networks, or sequence-based neural networks could serve as the structure encoder.
Results

Figure 2. Examples of retrieval task results. Given a microscopy image, CLOOME retrieves the matching molecular structure from a set of candidate structures (the blue box in the figure marks the matched structure). CLOOME can also be used to retrieve molecules that produce similar biological effects on treated cells, that is, bioisosteres.
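The retrieval task in Figure 2 can be pictured as ranking candidate structure embeddings by their similarity to one image embedding. The sketch below assumes cosine similarity as the score and uses made-up three-dimensional embeddings; the function names and data are hypothetical and only illustrate the idea.

```python
import math

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(p * q for p, q in zip(a, b))
    norm = math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(q * q for q in b))
    return dot / norm

def rank_candidates(image_emb, structure_embs):
    """Return candidate indices sorted from most to least similar."""
    sims = [cosine(image_emb, z) for z in structure_embs]
    return sorted(range(len(sims)), key=lambda k: sims[k], reverse=True)

# Hypothetical embeddings: one microscopy image and three candidate structures.
image_emb = [0.8, 0.6, 0.0]
candidates = [
    [0.0, 0.0, 1.0],   # orthogonal to the image embedding
    [0.7, 0.7, 0.1],   # close to the image embedding
    [-1.0, 0.0, 0.0],  # points away from the image embedding
]
ranking = rank_candidates(image_emb, candidates)
best_match = ranking[0]
```

The top-ranked index plays the role of the structure in the blue box, and candidates ranked just below it are the kind of molecules that could be inspected as potential bioisosteres.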