Google | Deep embedding and alignment of protein sequences
2022-07-03 05:14:00 【Zhiyuan community】
Once trained, DEDAL generates gap and substitution scoring matrices computed specifically for each new pair of sequences. Moreover, the gap and substitution scores are contextual: for each pair of positions, they depend on the full sequences being aligned. A standard SW (Smith-Waterman) algorithm then uses these parameters to compute the optimal alignment. The paper shows that DEDAL can be trained efficiently on modern hardware with accelerators. Compared with standard SW, the trained DEDAL improves the quality of predicted alignments for remote homologs by a factor of 2-3, and it produces alignment scores that detect remote homologs more accurately.
The figure above shows an example alignment of two protein domain sequences from Pfam-A seed.
a. Alignments taken from the Pfam-A seed database (second row), predicted by DEDAL (third row), and predicted with the PFASUM70 substitution matrix (fourth row). All residues of the two sequences aligned by Pfam-A seed and by DEDAL are shown, but the unaligned residues upstream and downstream of the PFASUM alignment are not. Residues highlighted in green are conserved residues that are correctly aligned; residues shown in red mark differences between the predicted alignment and the Pfam-A seed alignment.
b. Substitution scores between all pairs of residues, taken from the PFASUM substitution matrix.
c. SW parameters predicted by DEDAL.
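
To make the role of the position-specific parameters concrete, here is a minimal sketch of a local alignment score computed from per-position-pair substitution, gap-open and gap-extend values. It is not DEDAL's implementation: the function names `uniform_params` and `sw_score` are hypothetical, the convention of reading the gap penalties at the cell where the gap lands is an assumption, and the toy parameterizer simply applies one fixed rule everywhere, whereas DEDAL predicts these values from the sequences.

```python
def uniform_params(x, y, match=2.0, mismatch=-1.0, gap_open=1.0, gap_extend=1.0):
    """Toy stand-in for a learned parameterizer (hypothetical helper).

    Returns three len(x) by len(y) tables: substitution scores, gap-open
    penalties and gap-extend penalties. Here every position uses the same
    rule; a contextual model would fill each cell differently.
    """
    sub = [[match if a == b else mismatch for b in y] for a in x]
    opens = [[gap_open] * len(y) for _ in x]
    extends = [[gap_extend] * len(y) for _ in x]
    return sub, opens, extends


def sw_score(sub, opens, extends):
    """Best local alignment score under position-specific affine gap penalties."""
    n, m = len(sub), len(sub[0])
    neg_inf = float("-inf")
    # M: alignment ends with x[i] paired to y[j].
    # X: alignment ends with x[i] facing a gap; Y: ends with y[j] facing a gap.
    M = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    X = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    Y = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    best = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = sub[i - 1][j - 1]
            o = opens[i - 1][j - 1]
            e = extends[i - 1][j - 1]
            # The 0.0 option starts a fresh local alignment at this pair.
            M[i][j] = s + max(0.0, M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            X[i][j] = max(M[i - 1][j] - o, X[i - 1][j] - e)
            Y[i][j] = max(M[i][j - 1] - o, X[i][j - 1] - o, Y[i][j - 1] - e)
            best = max(best, M[i][j])
    return best


score_same = sw_score(*uniform_params("ACGT", "ACGT"))
score_gap = sw_score(*uniform_params("AAGAA", "AAAA", mismatch=-3.0))
```

Because the recursion only reads the three tables, swapping the toy parameterizer for a model that outputs different values at every cell leaves the alignment step unchanged, which is the division of labor the article describes.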

Technically, the paper explores two ways to build a differentiable SW alignment module, which is needed to train DEDAL's parameters on the "learning to align" task: smoothing and perturbation techniques. The paper finds no clear difference in performance between the two, and the final DEDAL model implements the perturbation-based method. Regarding the alignments used to train DEDAL, the paper finds that using Pfam extended domains instead of Pfam domains is beneficial when DEDAL is expected to predict accurate local alignments. When DEDAL is pretrained on the masked language modeling task, excluding sequences related to out-of-distribution families from the "protein universe" leads to a slight drop in performance on remote homologs, although the performance gap relative to the baseline is not pronounced.
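
The smoothing idea mentioned above can be illustrated with a short sketch: every hard `max` in the SW recursion is replaced by a temperature-scaled log-sum-exp, which makes the score a smooth function of the scoring parameters. This is only an illustration under simplifying assumptions (a single linear gap penalty, plain Python, hypothetical names `soft_max` and `smoothed_sw_score`); it is not the paper's code, and it does not show the perturbation-based variant that the final model uses.

```python
import math


def soft_max(values, temperature):
    """Smooth maximum: temperature * log(sum(exp(v / temperature))).

    Shifted by the largest value for numerical stability.
    """
    top = max(values)
    total = sum(math.exp((v - top) / temperature) for v in values)
    return top + temperature * math.log(total)


def smoothed_sw_score(sub, gap, temperature):
    """Smoothed local alignment score with one linear gap penalty (assumption)."""
    n, m = len(sub), len(sub[0])
    H = [[0.0] * (m + 1) for _ in range(n + 1)]
    cells = []
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            H[i][j] = soft_max(
                [
                    0.0,                                  # start a new alignment
                    H[i - 1][j - 1] + sub[i - 1][j - 1],  # pair x[i] with y[j]
                    H[i - 1][j] - gap,                    # gap facing x[i]
                    H[i][j - 1] - gap,                    # gap facing y[j]
                ],
                temperature,
            )
            cells.append(H[i][j])
    # The hard algorithm takes the max over all cells; smooth that step too.
    return soft_max(cells, temperature)


x, y = "ACGT", "ACGT"
sub_demo = [[2.0 if a == b else -1.0 for b in y] for a in x]
sharp = smoothed_sw_score(sub_demo, 2.0, 1e-3)  # close to the hard SW score
smooth = smoothed_sw_score(sub_demo, 2.0, 1.0)  # never below the hard SW score
```

As the temperature approaches zero the smoothed score approaches the usual SW score, while larger temperatures trade exactness for smoother gradients with respect to the substitution and gap values.
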
Regarding the strategy of training the transformer and the parameterizer jointly end to end, the paper finds that it is indeed significantly better than the more classical two-step strategy, that is, first training the transformer encoder on the masked language modeling task, then keeping the transformer fixed while training the parameterizer on the "learning to align" task. This suggests that a general-purpose language model such as ESM is not sufficient on its own and needs at least fine-tuning to reach the best alignment performance.
The figure above shows the use of the learned embeddings in downstream tasks. The paper assesses the benefit of contextual embeddings by training a model in which the substitution cost is constrained to depend only on the amino acids being aligned. The performance of this model drops sharply, falling to a level comparable to that of the best substitution matrix.

