Google | Deep embedding and alignment of protein sequences
2022-07-03 05:14:00 [Zhiyuan Community]
Once trained, DEDAL generates gap and substitution scoring matrices computed specifically for each new pair of sequences. These gap and substitution scores are also context-dependent: for each pair of positions, they depend on the complete sequences being aligned. A standard Smith-Waterman (SW) algorithm then uses these parameters to compute the optimal alignment. The article shows that DEDAL can be trained efficiently on modern hardware with accelerators. Once trained, DEDAL improves the quality of predicted alignments for remote homologs by a factor of 2 to 3 compared with standard SW. It also produces alignment scores that detect remote homologs more accurately.
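The alignment step itself is ordinary dynamic programming. What changes is that every position pair (i, j) carries its own substitution score and gap penalties. The sketch below computes an affine-gap Smith-Waterman score under that assumption. Several parts are hypothetical and are not DEDAL's code: the table layout, the function name smith_waterman_affine, and the helper toy_parameters. That helper fills the tables from a simple match/mismatch rule in place of DEDAL's learned output.

```python
from typing import List, Tuple

Matrix = List[List[float]]
NEG_INF = float("-inf")


def smith_waterman_affine(sub: Matrix, gap_open: Matrix, gap_extend: Matrix) -> float:
    """Best local alignment score with affine gaps (Gotoh-style recursion).

    sub[i][j]        score for aligning residue i of x with residue j of y
    gap_open[i][j]   positive penalty for opening a gap at position pair (i, j)
    gap_extend[i][j] positive penalty for extending a gap at position pair (i, j)
    All three are len(x) x len(y) tables, so each position pair can have its
    own parameters (an assumed layout for per-pair, context-dependent scores).
    """
    n, m = len(sub), len(sub[0])
    # M: path ends with x[i] aligned to y[j]
    # X: path ends with x[i] against a gap; Y: path ends with y[j] against a gap
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    best = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = sub[i - 1][j - 1]
            go, ge = gap_open[i - 1][j - 1], gap_extend[i - 1][j - 1]
            # Local alignment: a match may start a fresh path (the 0.0 term).
            M[i][j] = s + max(0.0, M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            # A gap is opened from a match state or extended within the same gap state.
            X[i][j] = max(M[i - 1][j] - go, X[i - 1][j] - ge)
            Y[i][j] = max(M[i][j - 1] - go, Y[i][j - 1] - ge)
            best = max(best, M[i][j])
    return best


def toy_parameters(x: str, y: str, match: float = 2.0, mismatch: float = -1.0,
                   open_: float = 3.0, extend: float = 1.0) -> Tuple[Matrix, Matrix, Matrix]:
    """Hypothetical stand-in for a learned model: constant per-pair tables."""
    sub = [[match if a == b else mismatch for b in y] for a in x]
    gap_open = [[open_] * len(y) for _ in x]
    gap_extend = [[extend] * len(y) for _ in x]
    return sub, gap_open, gap_extend


print(smith_waterman_affine(*toy_parameters("ACGT", "ACT")))  # 4.0
```

Because the parameters are plain tables, the constant toy values could be replaced by per-pair predictions without changing the recursion. With the same pair of sequences and a gap-open penalty of 1.0, the alignment can bridge the extra G, and the score rises to 5.0.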
The figure above shows an example alignment of two protein domains from the Pfam-A seed database.
a. Alignments from the Pfam-A seed database (second row), predicted by DEDAL (third row), and predicted with the PFASUM70 substitution matrix (fourth row). The Pfam-A seed and DEDAL alignments show all residues of both sequences. The PFASUM alignment omits the unaligned residues upstream and downstream of its aligned region. Residues highlighted in green are conserved residues that are aligned correctly. Residues shown in red mark differences between the predicted alignment and the Pfam-A seed alignment.
b. Substitution scores between all residue pairs in the PFASUM substitution matrix.
c. SW parameters predicted by DEDAL.
On the technical side, the article explores two ways to build a differentiable SW alignment module: smoothing and perturbation techniques. Such a module is needed to train DEDAL's parameters on the "learning to align" task. The article finds no clear difference in performance between the two and uses the perturbation-based approach in the final DEDAL model. As for the alignments used to train DEDAL, the article finds that training on extended Pfam domains rather than plain Pfam domains helps when DEDAL is expected to predict accurate local alignments. The article also tests pre-training DEDAL on the masked language modeling task with sequences related to the out-of-distribution families excluded from the "protein universe". This leads to a slight drop in performance on remote homologs, although the gap relative to the baseline is not significant.
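Training the aligner end to end requires the max operations inside SW to be differentiable, and smoothing is one of the two options mentioned above. The sketch below illustrates that idea. It is not DEDAL's implementation, which according to the article ultimately uses the perturbation-based variant. Every max in a linear-gap SW recursion is replaced by a temperature-scaled log-sum-exp. The result is smooth in the scores and approaches the hard score as the temperature goes to zero. Linear gaps, the function names, and the toy scores are assumptions made for brevity.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def soft_max(values: Sequence[float], temperature: float) -> float:
    """Temperature-scaled log-sum-exp, a smooth upper bound on max(values)."""
    top = max(values)
    # Subtract the maximum before exponentiating to avoid overflow.
    return top + temperature * math.log(sum(math.exp((v - top) / temperature) for v in values))


def sw_score(sub: Matrix, gap: float, temperature: float = 0.0) -> float:
    """Linear-gap Smith-Waterman score; temperature > 0 gives the smoothed score."""
    def combine(values: Sequence[float]) -> float:
        return max(values) if temperature == 0.0 else soft_max(values, temperature)

    n, m = len(sub), len(sub[0])
    H = [[0.0] * (m + 1) for _ in range(n + 1)]
    cells = []
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            H[i][j] = combine([
                0.0,                                  # start a new local alignment here
                H[i - 1][j - 1] + sub[i - 1][j - 1],  # align x[i] with y[j]
                H[i - 1][j] - gap,                    # x[i] against a gap
                H[i][j - 1] - gap,                    # y[j] against a gap
            ])
            cells.append(H[i][j])
    # A local alignment may end at any cell.
    return combine(cells)


x, y = "ACGT", "ACT"
sub = [[2.0 if a == b else -1.0 for b in y] for a in x]  # toy match/mismatch scores
print(sw_score(sub, gap=3.0))                   # hard score: 4.0
print(sw_score(sub, gap=3.0, temperature=1.0))  # smoothed score, larger than 4.0
```

Every step is a log-sum-exp, so the smoothed score is differentiable in sub. Its gradient with respect to sub[i][j] can be read as a soft probability that residues i and j are aligned. That gradient is what lets a loss on alignments flow back into the network that produces the scores.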
The article also compares end-to-end joint training of the transformer and the parameterizer with a more classical two-step strategy. In the two-step strategy, the transformer encoder is first trained on the masked language modeling task and then frozen while the parameterizer is trained on the "learning to align" task. The article finds that joint training is indeed significantly better. This suggests that a general-purpose protein language model such as ESM is not sufficient on its own; it needs at least fine-tuning to reach the best alignment performance.
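The first stage of the two-step strategy above is masked language modeling. The encoder sees a protein sequence with some residues hidden or corrupted and must recover the originals. The article does not give DEDAL's exact masking recipe. The sketch below therefore assumes the common BERT-style scheme: about 15% of positions are selected as targets, and of those, 80% are masked, 10% replaced by a random amino acid, and 10% left unchanged. The names mask_sequence and "<mask>" are hypothetical.

```python
import random
from typing import List, Optional, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
MASK = "<mask>"  # hypothetical mask token


def mask_sequence(seq: str, rate: float = 0.15,
                  seed: int = 0) -> Tuple[List[str], List[Optional[str]]]:
    """BERT-style masking (an assumed recipe, not necessarily DEDAL's).

    Returns (input tokens for the model, per-position targets). Targets are
    None at positions that do not contribute to the training loss.
    """
    rng = random.Random(seed)
    tokens: List[str] = list(seq)
    targets: List[Optional[str]] = [None] * len(seq)
    for i, residue in enumerate(seq):
        if rng.random() >= rate:
            continue  # position not selected as a prediction target
        targets[i] = residue  # the model must recover the original residue here
        roll = rng.random()
        if roll < 0.8:
            tokens[i] = MASK                       # 80%: hide the residue
        elif roll < 0.9:
            tokens[i] = rng.choice(AMINO_ACIDS)    # 10%: random amino acid
        # remaining 10%: keep the original residue as input
    return tokens, targets


tokens, targets = mask_sequence(AMINO_ACIDS * 3, seed=1)
print(tokens)
```

In the two-step strategy, the encoder trained on this objective would then be frozen. Joint training instead keeps updating it with the alignment loss.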
The figure above shows the use of the learned embeddings in downstream tasks. The article assesses the benefit of context-sensitive embeddings by training a model in which the substitution cost may depend only on the two amino acids being aligned. As expected, this model's performance drops sharply. It only reaches the level of the best fixed substitution matrix on the alignment task.
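The restriction in this ablation can be made concrete by looking at the score matrix passed to SW. The sketch below assumes a dictionary-based 20 x 20 residue-pair table and expands it into per-position-pair scores. The table is a hypothetical toy (+2 for identical residues, -1 otherwise), not PFASUM and not DEDAL's learned values. In the result, two occurrences of the same residue pair always receive the same score, whatever their neighbours are.

```python
from typing import Dict, List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical toy table: +2 for identical residues, -1 otherwise.
TOY_TABLE: Dict[Tuple[str, str], float] = {
    (a, b): 2.0 if a == b else -1.0 for a in AMINO_ACIDS for b in AMINO_ACIDS
}


def context_free_scores(x: str, y: str,
                        table: Dict[Tuple[str, str], float]) -> List[List[float]]:
    """Expand a residue-pair table into a len(x) x len(y) substitution matrix.

    Entry (i, j) depends only on x[i] and y[j], never on their neighbours.
    """
    return [[table[(a, b)] for b in y] for a in x]


scores = context_free_scores("AWA", "WA", TOY_TABLE)
# Both A's in x get the same row of scores, whatever surrounds them.
print(scores)  # [[-1.0, 2.0], [2.0, -1.0], [-1.0, 2.0]]
```

A context-sensitive model can give those two A positions different scores, because each embedding is computed from the whole sequence. The ablation removes exactly that freedom.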