KDD 2021 | MoCL: Contrastive Learning on Molecular Graphs with Multi-level Domain Knowledge
2022-06-10 17:12:00 【DrugAI】
Compiled by Tao Wen | Reviewed by Yang Huidan
This article introduces work published at KDD 2021, a collaboration between Michigan State University and Agios Pharmaceuticals. The authors study graph contrastive learning in the biomedical domain and propose a new framework called MoCL, which uses domain knowledge at both the local and global levels to assist representation learning. Local-level domain knowledge guides the augmentation process, so that variation can be introduced without changing the semantics of a graph. Global-level knowledge encodes similarity information between graphs across the entire dataset and helps learn semantically richer representations. The authors evaluate MoCL and show that it achieves state-of-the-art performance.
1
Introduction
Graph neural networks (GNNs) have achieved state-of-the-art performance on graph-related tasks and have recently been widely applied in the biomedical domain to solve drug-related problems. However, like most deep learning models, they require large amounts of labeled data for training, while in the real world labels for a specific task are usually scarce, so pre-training schemes for GNNs have been actively explored. Unlike in the image domain, contrastive learning on graphs poses its own challenges. First, the structural information and semantics of graphs differ significantly across domains (for example, social networks versus molecular graphs), so it is hard to design a general augmentation scheme that fits all scenarios. Second, most existing graph contrastive learning frameworks ignore the global structure of the data; for example, two graphs with similar structures should lie closer in the embedding space. Third, the contrast scheme is not unique: contrast can take place at the node-graph, node-node, or graph-graph level.
Beyond these graph-specific challenges, some open problems remain in contrastive learning itself. For example, mutual information is difficult to estimate accurately in high-dimensional settings, and the relationship between mutual information maximization and contrastive learning is not fully understood.
The authors therefore aim to address these challenges in the biomedical domain. Their hypothesis is that injecting domain knowledge into the augmentation and contrast schemes leads to better representations, and they propose using both local- and global-level domain knowledge to assist contrastive learning on molecular graphs. They introduce a new augmentation scheme called substructure substitution, in which a valid substructure in a molecule is replaced by a bioisostere, introducing variation without changing the molecular properties. The replacement rules come from domain resources and are treated as local-level domain knowledge. Global-level domain knowledge encodes the global similarity between graphs, and the authors exploit this information through a dual contrastive objective to learn richer representations. This paper is the first attempt to use domain knowledge to assist contrastive learning. The contributions are as follows:
- A molecular graph augmentation scheme based on local-level domain knowledge is proposed, which keeps the semantics of the graph unchanged during augmentation.
- Similarity information between molecular graphs is exploited through an additional global contrastive loss, which encodes the global structure of the data into the graph representations.
- A theoretical justification of the learning objective is given, connecting it to the triplet loss in metric learning and showing the effectiveness of the overall framework.
- MoCL is evaluated on a variety of molecular datasets and shown to outperform state-of-the-art methods.
2
Method
2.1 Contrastive learning framework
Figure 1 shows the overall framework of MoCL. First, two augmented views are generated using local-level domain knowledge. They are then fed, together with the original view (blue), into the same GNN encoder and projection head. The local-level contrast maximizes the mutual information between the two augmented views, while the global-level contrast maximizes the mutual information between two similar graphs, where the similarity information comes from global-level domain knowledge (a loss sketch follows Figure 1).
Figure 1. The overall framework of MoCL
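To make the dual objective concrete, the following PyTorch sketch combines an InfoNCE-style local contrast between the two augmented views with a global contrast that pulls each graph toward its most similar neighbor in the batch. The function names, the temperature, the weighting factor `lam`, and the choice of a single nearest neighbor as the global positive are assumptions made for this sketch, not the authors' exact implementation.

```python
import torch
import torch.nn.functional as F

def info_nce(z1: torch.Tensor, z2: torch.Tensor, tau: float = 0.5) -> torch.Tensor:
    """NT-Xent-style loss: row i of z1 and row i of z2 form a positive pair."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / tau                            # pairwise similarities
    labels = torch.arange(z1.size(0), device=z1.device)   # matching rows are positives
    return F.cross_entropy(logits, labels)

def mocl_dual_loss(h_aug1, h_aug2, h_orig, global_sim, lam=0.5, tau=0.5):
    """
    h_aug1, h_aug2 : projected embeddings of the two augmented views   (B x d)
    h_orig         : projected embeddings of the original graphs       (B x d)
    global_sim     : B x B matrix of graph-graph similarities (e.g. ECFP-based)
    """
    # Local-level contrast: the two augmented views of the same molecule should agree.
    local = info_nce(h_aug1, h_aug2, tau)

    # Global-level contrast: treat each graph's most similar other graph in the
    # batch as an extra positive (a simplification made for this sketch).
    sim = global_sim.clone()
    sim.fill_diagonal_(-1.0)                              # exclude self-similarity
    nearest = sim.argmax(dim=1)

    z = F.normalize(h_orig, dim=1)
    logits = z @ z.t() / tau
    logits = logits.masked_fill(
        torch.eye(z.size(0), dtype=torch.bool, device=z.device),
        float('-inf'))                                    # a graph is not its own positive
    global_loss = F.cross_entropy(logits, nearest)

    return local + lam * global_loss

# Toy usage with random embeddings and a random similarity matrix:
B, d = 8, 32
loss = mocl_dual_loss(torch.randn(B, d), torch.randn(B, d),
                      torch.randn(B, d), torch.rand(B, B))
print(loss.item())
```

In practice the global term would consume precomputed molecular similarities such as the ECFP-based ones described in Section 2.3, rather than a random matrix.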
2.2 Local-level domain knowledge
Most existing augmentation methods may change the semantics of a molecular graph during augmentation (Figure 2a, b, c, d). Among the general-purpose augmentations, only attribute masking (Figure 2d) does not violate the biological assumption, because it does not alter the molecule itself; it only masks some atom and edge attributes.
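For reference, attribute masking perturbs only the node (and possibly edge) features while leaving the graph topology intact. Below is a minimal sketch on a generic node-feature matrix; the zero-fill and the 15% mask rate are illustrative assumptions, not the exact masking used in the compared methods.

```python
import torch

def mask_node_attributes(x: torch.Tensor, mask_rate: float = 0.15) -> torch.Tensor:
    """Zero out the feature rows of a randomly chosen fraction of nodes.

    The graph structure (edges) is untouched, which is why this augmentation
    does not alter the underlying molecule.
    """
    x = x.clone()
    masked = torch.rand(x.size(0)) < mask_rate   # choose nodes to mask
    x[masked] = 0.0
    return x

# Example: mask roughly 15% of the nodes of a 10-node graph with 7-dim features.
print(mask_node_attributes(torch.randn(10, 7)))
```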
The authors therefore inject domain knowledge to assist the augmentation process. They propose substructure substitution, in which a valid substructure in a molecule is replaced by a bioisostere, yielding a new molecule with physical or chemical properties similar to the original (Figure 2e). The authors collected 218 rules from domain resources, each consisting of a source substructure and a target substructure, and added 12 extra rules that remove or add carbon groups from a molecule. MoCL therefore contains 230 rules for generating molecular variants with similar properties (a rule-application sketch follows Figure 2).
Figure 2. Comparison of augmentations
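The sketch below shows how a single substitution rule could be applied with RDKit. The example rule, replacing a carboxylic acid with a tetrazole (a classic bioisosteric pair), is chosen purely for illustration and is not taken from the paper's 230-rule set.

```python
from rdkit import Chem
from rdkit.Chem import AllChem

def apply_rule(smiles: str, source_smarts: str, target_smiles: str):
    """Replace every occurrence of `source_smarts` in the molecule with `target_smiles`."""
    mol = Chem.MolFromSmiles(smiles)
    query = Chem.MolFromSmarts(source_smarts)
    repl = Chem.MolFromSmiles(target_smiles)
    if mol is None or not mol.HasSubstructMatch(query):
        return None                                # rule does not apply to this molecule
    product = AllChem.ReplaceSubstructs(mol, query, repl, replaceAll=True)[0]
    try:
        Chem.SanitizeMol(product)                  # keep only chemically valid variants
    except Exception:
        return None
    return Chem.MolToSmiles(product)

# Example: ibuprofen's carboxylic acid -> tetrazole (a common bioisostere)
print(apply_rule("CC(C)Cc1ccc(cc1)C(C)C(=O)O", "C(=O)[OH]", "c1nn[nH]n1"))
```

Applying such a substitution more than once corresponds to the higher augmentation strengths examined in Section 3.1.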
2.3 Global-level domain knowledge
Maximizing the mutual information between corresponding views yields transformation-invariant representations, but it may ignore the global semantics of the data: graphs with similar structure or semantics should lie closer together in the embedding space. For molecular graphs, such information can be obtained from several sources. For general graph structure, extended-connectivity fingerprints (ECFP) encode molecular substructures and are widely used to compute the structural similarity between molecular graphs. The authors use ECFP to compute the similarity between molecules and propose two strategies to incorporate the global semantics into the learning framework: the first uses the similarity as direct supervision, and the second uses a contrastive objective in which two similar graphs have higher mutual information.
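A minimal sketch of the ECFP-based similarity computation with RDKit is given below; the radius and bit-vector size are common defaults, not necessarily the settings used in the paper.

```python
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem

def ecfp_similarity(smiles_a: str, smiles_b: str,
                    radius: int = 2, n_bits: int = 2048) -> float:
    """Tanimoto similarity between the Morgan (ECFP-like) fingerprints of two molecules."""
    fp_a = AllChem.GetMorganFingerprintAsBitVect(Chem.MolFromSmiles(smiles_a), radius, nBits=n_bits)
    fp_b = AllChem.GetMorganFingerprintAsBitVect(Chem.MolFromSmiles(smiles_b), radius, nBits=n_bits)
    return DataStructs.TanimotoSimilarity(fp_a, fp_b)

# Benzene vs. toluene: structurally close, so the similarity should be relatively high.
print(ecfp_similarity("c1ccccc1", "Cc1ccccc1"))
```

A matrix of such pairwise similarities is the kind of input the global contrast sketched in Section 2.1 would consume.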
3
Experiments
3.1 Local-level domain knowledge
Figure 3 shows the results of different augmentation combinations on all datasets under the linear protocol. Each cell represents the improvement, under the linear protocol, of a model trained with a given pair of augmentations over a GNN trained from scratch; blue denotes a negative value and red a positive one. The prediction accuracy of a linear classifier on top of the representations produced by MoCL-DK is comparable to the supervised GNN on bace, bbbp, and sider, and even better on clintox and mutag. The rows and columns that include MoCL-DK generally show higher values, so combining MoCL-DK with other augmentations almost always yields better results. Attribute masking and MoCL-DK are effective in nearly all scenarios, and combining them usually gives better performance. This supports the authors' earlier hypothesis that MoCL-DK and attribute masking do not violate the biological assumption and therefore outperform the other augmentations.
Figure 3. Augmentation combinations under the linear evaluation protocol
Table 1 shows the results under the linear protocol and the semi-supervised protocol. Compared with other methods that use data augmentation and contrastive learning, MoCL performs best on most datasets.
Table 1. Average AUC of the compared methods
The proposed augmentation, MoCL-DK, can be applied multiple times to generate more complex views. The authors experimented with a range of strengths, where strength refers to the number of augmentations applied (for example, substituting again after a substitution corresponds to strength two). Figure 4 compares the effect of different strengths. For most datasets, performance first rises and then declines as the number of augmentations increases, and MoCL-DK3 usually achieves the better results.
Figure 4. Average AUC of MoCL-Local at different strengths
3.2 Global-level domain knowledge
Figure 5 shows the performance improvement of different augmentation methods after adding global domain knowledge. Global information generally improves the performance of all augmentation methods, and the gains for the proposed domain-driven augmentations (MoCL-DK1 and MoCL-DK3) are much larger than for the other augmentation schemes.
Figure 5. Performance improvement of different augmentation methods after adding global domain knowledge
3.3 Sensitivity analysis
Figure 6 shows the performance surfaces of the proposed method under different hyperparameter combinations. For the global loss, a relatively small (but not too small) neighborhood size together with a larger (but not too large) weight gives the best results.
Figure 6. Average AUC under different hyperparameter combinations
4
Summary
In this work, the authors use multi-level domain knowledge to assist contrastive representation learning on molecular graphs. Local-level domain knowledge supports a new augmentation scheme, while global-level domain knowledge integrates the global structure of the data into the learning process. The authors show that both kinds of knowledge improve the quality of the learned representations.
Reference
Paper link:
https://dl.acm.org/doi/10.1145/3447548.3467186