Google | Deep embedding and alignment of protein sequences
2022-07-03 05:14:00 【Zhiyuan community】
Once trained, DEDAL generates gap and substitution scoring matrices computed specifically for each new pair of sequences. Moreover, these gap and substitution scores are contextual: for each pair of positions, they depend on the complete sequences being aligned. A standard Smith-Waterman (SW) algorithm then uses these parameters to compute the optimal alignment. The paper shows that DEDAL can be trained efficiently on modern hardware with accelerators. Once trained, DEDAL improves the quality of alignments predicted for remote homologs by a factor of 2 to 3 compared with standard SW, and it produces alignment scores that detect remote homologs more accurately.
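To make the pipeline in the paragraph above concrete, here is a minimal sketch of Smith-Waterman scoring when every pair of positions (i, j) has its own substitution score and its own gap-open and gap-extend penalties, which is the kind of per-pair parameter set DEDAL is described as producing. The recursion is a standard Gotoh-style affine-gap formulation; the function names, the matrix layout and the toy constant parameters in `constant_params` are assumptions for illustration, not DEDAL's code.

```python
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]
NEG_INF = float("-inf")


def smith_waterman_affine(sub: Matrix, gap_open: Matrix, gap_extend: Matrix) -> float:
    """Best local alignment score with position-specific parameters.

    All three arguments are len(x) x len(y) matrices:
      sub[i][j]        score for aligning x[i] with y[j]
      gap_open[i][j]   (positive) penalty for opening a gap at cell (i, j)
      gap_extend[i][j] (positive) penalty for extending a gap at cell (i, j)
    """
    if not sub or not sub[0]:
        return 0.0
    n, m = len(sub), len(sub[0])
    # H[i][j]: best local alignment score ending at cell (i, j), any state.
    # E[i][j]: best score ending with a gap in x (y[j-1] unmatched).
    # F[i][j]: best score ending with a gap in y (x[i-1] unmatched).
    H = [[0.0] * (m + 1) for _ in range(n + 1)]
    E = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    F = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    best = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            go, ge = gap_open[i - 1][j - 1], gap_extend[i - 1][j - 1]
            E[i][j] = max(H[i][j - 1] - go, E[i][j - 1] - ge)
            F[i][j] = max(H[i - 1][j] - go, F[i - 1][j] - ge)
            H[i][j] = max(0.0, H[i - 1][j - 1] + sub[i - 1][j - 1], E[i][j], F[i][j])
            best = max(best, H[i][j])
    return best


def constant_params(x: str, y: str, match: float = 2.0, mismatch: float = -1.0,
                    go: float = 3.0, ge: float = 1.0) -> Tuple[List[List[float]], ...]:
    """Classic SW as a special case: identical parameters at every cell."""
    sub = [[match if a == b else mismatch for b in y] for a in x]
    gap_open = [[go] * len(y) for _ in x]
    gap_extend = [[ge] * len(y) for _ in x]
    return sub, gap_open, gap_extend


print(smith_waterman_affine(*constant_params("AAAACCCC", "AAAATTCCCC")))  # 12.0
```

With constant parameters this reduces to textbook SW (the example skips "TT" with a length-2 gap, 16 - 3 - 1 = 12). A model like DEDAL would instead fill the three matrices with different values for every cell, which the recursion accepts unchanged.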
The figure above shows an example alignment of two protein domains from Pfam-A seed.
a. Alignments from the Pfam-A seed database (second row), predicted by DEDAL (third row), and predicted with the PFASUM70 substitution matrix (fourth row). All residues of the two sequences aligned in Pfam-A seed and by DEDAL are shown, whereas unaligned residues upstream and downstream of the PFASUM alignment are not shown. Residues highlighted in green are correctly aligned conserved residues; residues shown in red mark differences between the predicted alignment and the Pfam-A seed alignment.
b. Substitution scores between all residue pairs, from the PFASUM substitution matrix.
c. SW parameters predicted by DEDAL.

On the technical side, the authors explore two ways to make the SW alignment module differentiable, which is required to train DEDAL's parameters on the "learning to align" task: smoothing techniques and perturbation techniques. They find no significant performance difference between the two and implement the perturbation-based approach in the final DEDAL model. Regarding the alignments used to train DEDAL, they find that when DEDAL is expected to predict accurate local alignments, it is beneficial to use extended Pfam domains rather than Pfam domains. When pre-training DEDAL on the masked language modeling task, excluding sequences related to out-of-distribution families from the "protein universe" leads to a slight drop in performance on remote homologs, although the performance gap relative to the baseline is not pronounced.
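The two routes to a differentiable SW module can be contrasted on a simplified linear-gap recursion. Below, the smoothing route replaces every max with a temperature-scaled log-sum-exp, and the perturbation route averages the ordinary (hard) SW score over Gaussian noise added to the substitution scores. This is a hedged, forward-only sketch: the linear gap penalty, the `temperature` and `sigma` parameters and all function names are assumptions, and training would obtain gradients of these scores from an autodiff framework, which is omitted here.

```python
import math
import random
from typing import Callable, List, Sequence

Matrix = Sequence[Sequence[float]]
MaxOp = Callable[[List[float]], float]


def hard_max(values: List[float]) -> float:
    return max(values)


def make_soft_max(temperature: float) -> MaxOp:
    """Smoothed max: temperature * logsumexp(values / temperature)."""
    def soft_max(values: List[float]) -> float:
        top = max(values)  # subtract the max for numerical stability
        return top + temperature * math.log(
            sum(math.exp((v - top) / temperature) for v in values))
    return soft_max


def sw_score(sub: Matrix, gap: float, max_op: MaxOp = hard_max) -> float:
    """Linear-gap Smith-Waterman score with a pluggable max operator."""
    n, m = len(sub), len(sub[0])
    H = [[0.0] * (m + 1) for _ in range(n + 1)]
    cells: List[float] = []
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            H[i][j] = max_op([0.0,
                              H[i - 1][j - 1] + sub[i - 1][j - 1],
                              H[i - 1][j] - gap,
                              H[i][j - 1] - gap])
            cells.append(H[i][j])
    # A local alignment may end at any cell, so take a (soft) max over all cells.
    return max_op(cells)


def perturbed_sw_score(sub: Matrix, gap: float, sigma: float,
                       n_samples: int = 200, seed: int = 0) -> float:
    """Monte Carlo estimate of E_Z[SW(sub + sigma * Z)], Z i.i.d. N(0, 1)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        noisy = [[s + sigma * rng.gauss(0.0, 1.0) for s in row] for row in sub]
        total += sw_score(noisy, gap)
    return total / n_samples


def match_matrix(x: str, y: str, match: float = 2.0, mismatch: float = -1.0):
    return [[match if a == b else mismatch for b in y] for a in x]


sub = match_matrix("ACGT", "ACGT")
print(sw_score(sub, 1.0))                        # hard score: 8.0
print(sw_score(sub, 1.0, make_soft_max(0.5)))    # >= 8.0, tends to 8.0 as temperature -> 0
print(perturbed_sw_score(sub, 1.0, sigma=0.5))   # average over noisy inputs
```

Both variants are smooth surrogates of the hard score: because log-sum-exp is monotone and never below the max, the smoothed score never falls below the hard one and approaches it as the temperature shrinks, while with `sigma=0` the perturbed estimate reduces exactly to the hard score.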
Regarding the strategy of training the Transformer and the parameterizer jointly end to end, the authors find that it significantly outperforms the more classic two-step strategy, in which the Transformer encoder is first trained on the masked language modeling task and then frozen while the parameterizer is trained on the "learning to align" task. This suggests that a general-purpose language model such as ESM is not sufficient on its own: it needs at least fine-tuning to reach the best alignment performance.
The figure above shows applications of the learned embeddings to downstream tasks. The authors evaluate the benefit of context-sensitive embeddings by training a model in which substitution costs are constrained to depend only on the pair of amino acids being aligned. Unsurprisingly, this model's performance drops sharply, down to the level of the best substitution matrix.
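The ablation above hinges on what a substitution score is allowed to depend on. As an illustration only, assume scores are the dot product of per-residue embeddings. With a context-free embedding (a lookup table indexed by amino acid), the score for a pair of positions depends only on the two amino acids, so the model collapses to a fixed 20 × 20 substitution matrix; with a contextual embedding, the same amino-acid pair can score differently at different positions. The random `TABLE` and the neighbour-averaging `contextual_embed` are hypothetical stand-ins for a learned table and a Transformer encoder.

```python
import random
from typing import Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIM = 8

_rng = random.Random(0)
# Hypothetical embedding table; in a real model this would be learned.
TABLE: Dict[str, List[float]] = {
    aa: [_rng.gauss(0.0, 1.0) for _ in range(DIM)] for aa in AMINO_ACIDS
}


def context_free_embed(seq: str) -> List[List[float]]:
    """Each residue's vector depends only on its own amino acid."""
    return [list(TABLE[aa]) for aa in seq]


def contextual_embed(seq: str, window: int = 1) -> List[List[float]]:
    """Toy stand-in for an encoder: add the mean of a window of neighbours."""
    base = context_free_embed(seq)
    out = []
    for i in range(len(seq)):
        lo, hi = max(0, i - window), min(len(seq), i + window + 1)
        neigh = base[lo:hi]
        mean = [sum(v[d] for v in neigh) / len(neigh) for d in range(DIM)]
        out.append([b + a for b, a in zip(base[i], mean)])
    return out


def substitution_scores(ex: List[List[float]], ey: List[List[float]]) -> List[List[float]]:
    """S[i][j] = <ex[i], ey[j]>: one score per pair of positions."""
    return [[sum(a * b for a, b in zip(u, v)) for v in ey] for u in ex]


# The same amino-acid pair (A, A), seen inside two sequences and in isolation.
x, y = "WAK", "DAL"
cf = substitution_scores(context_free_embed(x), context_free_embed(y))
cx = substitution_scores(contextual_embed(x), contextual_embed(y))
ref_cf = substitution_scores(context_free_embed("A"), context_free_embed("A"))[0][0]
ref_cx = substitution_scores(contextual_embed("A"), contextual_embed("A"))[0][0]
print(cf[1][1] == ref_cf)  # True: the score depends only on the amino acids
print(cx[1][1] == ref_cx)  # generally False: neighbours change the score
```

The context-free variant corresponds to the constrained model in the ablation; the contextual variant is what lets per-pair scores vary with the surrounding sequence.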