KDD 2021 | MoCL: Contrastive Learning of Molecular Graphs with Multi-level Domain Knowledge
2022-06-10 17:12:00 【DrugAI】
Compiled by Tao Wen | Reviewed by Yang Huidan
This article introduces research published at KDD 2021 by Michigan State University in collaboration with Agios Pharmaceuticals. The authors study graph contrastive learning in the biomedical domain and propose a new framework called MoCL, which uses domain knowledge at both the local and global levels to assist representation learning. Local-level domain knowledge guides the augmentation process so that variation can be introduced without changing the semantics of the graph. Global-level knowledge encodes similarity information between graphs across the whole dataset and helps learn semantically richer representations. The authors evaluate MoCL on various molecular datasets and show that it achieves state-of-the-art performance.
1 Introduction
Graph neural networks (GNNs) have been shown to achieve state-of-the-art performance on graph-related tasks, and have recently been widely applied in the biomedical domain to solve drug-related problems. However, like most deep learning models, they require large amounts of labeled data for training, while in the real world only a limited number of labels are usually available for a specific task, so pre-training schemes for GNNs have been actively explored recently. Unlike images, however, contrastive learning on graphs has its own challenges. First, the structural information and semantics of graphs differ significantly across domains (for example, social networks versus molecular graphs), so it is difficult to design an augmentation scheme that works universally across all scenarios. Second, most current graph contrastive learning frameworks ignore the global structure of the entire dataset, for example, that two graphs with similar structures should be closer in the embedding space. Third, the contrastive scheme is not unique: contrast can occur at the node-graph, node-node, or graph-graph level.
Beyond these graph-specific challenges, some open problems remain in contrastive learning itself. For example, mutual information is difficult to estimate accurately in high-dimensional settings, and the relationship between mutual information maximization and contrastive learning is not well understood.
The authors therefore aim to address these challenges in the biomedical domain. Their hypothesis is that better representations can be learned by injecting domain knowledge into the augmentation and contrastive schemes. They propose using both local and global domain knowledge to assist contrastive learning on molecular graphs. They introduce a new augmentation scheme called substructure substitution, in which a valid substructure in a molecule is replaced by a bioisostere; the bioisostere introduces variation without changing the molecular properties. The replacement rules come from domain resources, which the authors regard as local-level domain knowledge. Global-level domain knowledge encodes the global similarity between graphs, and the authors propose exploiting this information through a dual contrastive objective to learn richer representations. This paper is the first attempt to use domain knowledge to assist contrastive learning. The contributions are as follows:
- A molecular graph augmentation scheme based on local-level domain knowledge is proposed, which keeps the semantics of the graph unchanged during augmentation.
- The similarity information between molecular graphs is exploited by adding a global contrastive loss, which encodes the global structure of the data into the graph representations.
- A theoretical analysis of the learning objectives and of the triplet loss used in learning is provided, demonstrating the effectiveness of the overall framework.
- MoCL is evaluated on various molecular datasets and shown to outperform state-of-the-art methods.
2 Method
2.1 Contrastive learning framework
Figure 1 shows the overall framework of MoCL. First, two augmented views are generated using local domain knowledge. Then, together with the original view (blue), they are fed into the same GNN encoder and projection head. The local-level contrast maximizes the mutual information between the two augmented views, while the global-level contrast maximizes the mutual information between two similar graphs, where the similarity information comes from global-level domain knowledge.
Figure 1: The overall framework of MoCL
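To make the dual objective concrete, below is a minimal PyTorch sketch of a local-plus-global contrastive loss. The InfoNCE-style formulation, the temperature, and the weighting factor `lam` are illustrative assumptions; the embeddings are assumed to come from the shared GNN encoder and projection head described above, and the exact loss used in the paper may differ.

```python
import torch
import torch.nn.functional as F

def nt_xent(z1, z2, temperature=0.5):
    """InfoNCE-style loss: matching rows of z1 and z2 are positives,
    all other rows in the batch act as negatives."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature              # (B, B) cosine similarities
    labels = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, labels)

def mocl_style_loss(z_view1, z_view2, z_anchor, z_neighbor, lam=0.5):
    """Dual objective: a local term over two augmented views of the same graph,
    plus a global term over a graph and a structurally similar neighbor
    (neighbors come from global domain knowledge such as ECFP similarity)."""
    local_term = nt_xent(z_view1, z_view2)
    global_term = nt_xent(z_anchor, z_neighbor)
    return local_term + lam * global_term
```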
2.2 Local-level domain knowledge
Most existing augmentation methods may change the semantics of a molecular graph during augmentation (Figure 2a-d). Among the general-purpose augmentations, only attribute masking (Figure 2d) does not violate biological assumptions, because it does not alter the molecule itself; it only masks some atom and edge attributes.
Therefore, the author injects domain knowledge to assist the enhancement process . The author proposes a substructure substitution , The effective substructure in the molecule is replaced by a bioelectronic isosteric body , The bioelectronic isosteric body produces a new molecule with physical or chemical properties similar to the original molecule ( chart 2e). The author collected from the domain resources 218 Bar rule , Each rule consists of a source substructure and a target substructure , And increased. 12 An additional rule is to subtract and add carbon groups from a molecule . therefore MoCL contain 230 Bar rule , Used to generate molecular variants with similar properties .
Figure 2: Comparison of augmentations
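As a concrete illustration of rule-based substructure substitution, the sketch below uses RDKit's `ReplaceSubstructs`. The example rule, swapping a carboxylic acid for a tetrazole ring (a classic bioisosteric replacement), is a placeholder for illustration and is not necessarily one of the 230 rules collected by the authors.

```python
from rdkit import Chem
from rdkit.Chem import AllChem

def substitute(smiles, source_smarts, target_smiles):
    """Replace every match of `source_smarts` with `target_smiles` and
    return the SMILES of the first chemically valid product, or None."""
    mol = Chem.MolFromSmiles(smiles)
    source = Chem.MolFromSmarts(source_smarts)
    target = Chem.MolFromSmiles(target_smiles)
    if mol is None or not mol.HasSubstructMatch(source):
        return None
    products = AllChem.ReplaceSubstructs(mol, source, target, replaceAll=True)
    for product in products:
        try:
            Chem.SanitizeMol(product)       # keep only valid products
            return Chem.MolToSmiles(product)
        except Exception:
            continue
    return None

# Example rule: carboxylic acid -> tetrazole, applied to aspirin
print(substitute("CC(=O)Oc1ccccc1C(=O)O", "C(=O)[OX2H1]", "c1nnn[nH]1"))
```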
2.3 Global-level domain knowledge
Maximizing the mutual information between corresponding views learns transformation-invariant representations, but it may ignore the global semantics of the data. For example, graphs with similar structure or semantics should be closer to each other in the embedding space. For molecular graphs, such information can be obtained from multiple sources. For general graph structure, extended-connectivity fingerprints (ECFP) encode molecular substructures and are widely used to compute structural similarity between molecular graphs. The authors use ECFP to compute the similarity between molecular graphs and propose two strategies to incorporate the global semantics into the learning framework: one uses the similarity as direct supervision, and the other uses a contrastive objective in which two similar graphs have higher mutual information.
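Below is a small sketch, for illustration only, of how a global similarity signal can be derived from ECFP fingerprints with RDKit: compute Morgan fingerprints, then take each molecule's Tanimoto nearest neighbors as its "similar graphs" for the global contrastive term. The radius, bit length, and neighborhood size are assumed values, not necessarily those used in the paper.

```python
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem

def ecfp(smiles, radius=2, n_bits=2048):
    """ECFP4-style bit fingerprint (Morgan fingerprint of radius 2)."""
    return AllChem.GetMorganFingerprintAsBitVect(
        Chem.MolFromSmiles(smiles), radius, nBits=n_bits)

def nearest_neighbors(smiles_list, k=5):
    """For each molecule, return the indices of its k most similar molecules
    by Tanimoto similarity; these pairs can serve as positives in the
    global contrastive term."""
    fps = [ecfp(s) for s in smiles_list]
    neighbors = []
    for i, fp in enumerate(fps):
        sims = DataStructs.BulkTanimotoSimilarity(fp, fps)
        sims[i] = -1.0                        # exclude self-similarity
        ranked = sorted(range(len(fps)), key=lambda j: sims[j], reverse=True)
        neighbors.append(ranked[:k])
    return neighbors
```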
3 Experiments
3.1 Local-level domain knowledge
Figure 3 shows the results of different augmentation combinations on all datasets under the linear protocol. Each cell represents the performance improvement under the linear protocol of a model trained with a given augmentation combination over a GNN trained from scratch; blue indicates negative values and red indicates positive values. The prediction accuracy obtained by a linear classifier on top of the representations produced by MoCL-DK is comparable to that of a GNN (bace, bbbp, sider) or even better (clintox, mutag). The rows and columns containing MoCL-DK generally have higher values, so combining MoCL-DK with other augmentation methods almost always yields better results. Attribute masking and MoCL-DK are generally effective in all scenarios, and combining them usually leads to better performance. This confirms the authors' earlier hypothesis that MoCL-DK and attribute masking do not violate biological assumptions and therefore work better than other augmentations.
Figure 3: Augmentation combinations under the linear evaluation protocol
Table 1 shows the experimental results under the linear protocol and the semi-supervised protocol. Compared with other methods that use data augmentation and contrastive learning, MoCL performs best on most datasets.
Table 1: Average AUC of various methods
The proposed augmentation MoCL-DK can be applied multiple times to generate more complex views. The authors tried a range of strengths, where strength refers to the number of augmentations applied (for example, substituting again after a substitution counts as strength two; see the sketch after Figure 4). Figure 4 compares the effects of different strengths. For most datasets, performance first rises and then declines as the number of augmentations increases, and MoCL-DK3 usually achieves better results.
Figure 4: Average AUC of MoCL-Local with different strengths
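A tiny sketch of this notion of strength, under the assumption of a single-step rule applier: strength k simply means applying a substitution step k times in a row (e.g. k = 3 roughly corresponds to MoCL-DK3). `apply_rule` is a hypothetical callable, not part of the paper's code.

```python
import random

def augment_with_strength(smiles, rules, apply_rule, k=3):
    """Apply a single substitution step k times in succession.
    `apply_rule(smiles, rule)` is assumed to return a new SMILES,
    or None when the rule does not apply."""
    current = smiles
    for _ in range(k):
        candidate = apply_rule(current, random.choice(rules))
        if candidate is not None:
            current = candidate
    return current
```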
3.2 Global-level domain knowledge
Figure 5 shows the performance improvement of different augmentation methods after adding global domain knowledge. Global information generally improves the performance of all augmentation methods, and the improvement is much larger for the proposed domain augmentations (MoCL-DK1 and MoCL-DK3) than for the other augmentation schemes.
Figure 5: Performance improvement of different augmentation methods after adding global domain knowledge
3.3 Sensitivity analysis
Figure 6 shows the performance surfaces of the proposed method under different combinations of hyperparameters. For the global loss, using a relatively small (but not too small) neighborhood size and a larger (but not too large) weight yields the best results.
Figure 6: Average AUC under different hyperparameter combinations
4 Summary
In this work, the authors use multi-level domain knowledge to assist contrastive representation learning on molecular graphs. Local domain knowledge supports a new augmentation scheme, and global domain knowledge integrates the global structure of the data into the learning process. The authors show that both kinds of knowledge improve the quality of the learned representations.
References
Paper link:
https://dl.acm.org/doi/10.1145/3447548.3467186