Google | Deep embedding and alignment of protein sequences
2022-07-03 05:14:00 【Zhiyuan community】
Once trained, DEDAL generates gap and substitution scoring matrices computed specifically for each new pair of sequences. Moreover, the gap and substitution scores are contextual: for each pair of positions, they depend on the complete sequences being aligned. A standard Smith-Waterman (SW) algorithm then uses these parameters to compute the optimal alignment. The paper shows that DEDAL can be trained efficiently on modern hardware with accelerators. Once trained, DEDAL improves the alignment quality predicted for remote homologs by a factor of 2 to 3 compared with standard SW, and it produces alignment scores that detect remote homologs more accurately.
The figure above shows an example alignment of two protein domain sequences from Pfam-A seed.
a. The alignments taken from the Pfam-A seed database (second row), predicted by DEDAL (third row), and predicted with the PFASUM70 substitution matrix (fourth row). All residues of the two sequences are shown for the Pfam-A seed and DEDAL alignments, but the unaligned residues upstream and downstream of the PFASUM alignment are not shown. Residues highlighted in green are conserved residues that are correctly aligned; residues shown in red mark differences between the predicted alignment and the Pfam-A seed alignment.
b. Substitution scores between all pairs of residues, taken from the PFASUM substitution matrix.
c. SW parameters predicted by DEDAL.
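
The alignment step described above, a standard SW run on parameters that differ for every pair of positions, can be sketched as follows. This is a minimal illustration under stated assumptions, not DEDAL's implementation: the function names, the three-state affine-gap recursion, and the toy scoring values are all hypothetical, and in DEDAL the three matrices would be predicted by the trained model rather than built by hand.

```python
NEG_INF = float("-inf")


def smith_waterman_score(sub, gap_open, gap_extend):
    """Best local alignment score with affine gaps and per-position parameters.

    sub[i][j], gap_open[i][j] and gap_extend[i][j] hold the substitution score
    and the (positive) gap penalties for position i of x and position j of y.
    """
    n, m = len(sub), len(sub[0])
    # match[i][j]: best local alignment ending with x[i-1] aligned to y[j-1].
    # gap_in_y[i][j]: best alignment ending with x[i-1] aligned to a gap.
    # gap_in_x[i][j]: best alignment ending with y[j-1] aligned to a gap.
    match = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    gap_in_y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    gap_in_x = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    best = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            o = gap_open[i - 1][j - 1]
            e = gap_extend[i - 1][j - 1]
            # The 0.0 option lets a local alignment start at any cell.
            match[i][j] = sub[i - 1][j - 1] + max(
                0.0, match[i - 1][j - 1], gap_in_y[i - 1][j - 1], gap_in_x[i - 1][j - 1]
            )
            gap_in_y[i][j] = max(match[i - 1][j] - o, gap_in_y[i - 1][j] - e)
            gap_in_x[i][j] = max(match[i][j - 1] - o, gap_in_x[i][j - 1] - e)
            best = max(best, match[i][j])
    return best


def uniform_params(x, y, match=2.0, mismatch=-1.0, open_cost=2.0, extend_cost=1.0):
    """Toy stand-in for a model: the same scoring rule at every position pair."""
    sub = [[match if a == b else mismatch for b in y] for a in x]
    gap_open = [[open_cost] * len(y) for _ in x]
    gap_extend = [[extend_cost] * len(y) for _ in x]
    return sub, gap_open, gap_extend


sub, gap_open, gap_extend = uniform_params("AAGG", "AACGG")
print(smith_waterman_score(sub, gap_open, gap_extend))
```

Because every entry of the three matrices can be set independently, the same dynamic program covers both a classical fixed scoring scheme (as in `uniform_params`) and scores that vary with sequence context.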

Technically, the paper explores two ways of building a differentiable SW alignment module, which is needed to train DEDAL's parameters on the "learning to align" task: smoothing and perturbation techniques. It finds no clear difference in performance between the two, and the final DEDAL model implements the perturbation-based method. Regarding the alignments used to train DEDAL, the paper finds that using Pfam extended domains instead of Pfam domains is beneficial when DEDAL is expected to predict accurate local alignments. When pre-training DEDAL on the masked language modeling task, excluding sequences related to out-of-distribution families from the "protein universe" leads to a slight drop in performance on remote homologs, although the performance gap relative to the baseline is not pronounced.
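
The difference between smoothing and perturbation can be illustrated on a single max over a few candidate scores, the basic operation inside the SW recursion. The sketch below is only an illustration: the function names, the temperature parameter, and the choice of Gumbel noise are assumptions, and the paper applies these ideas to the whole alignment rather than to one max. Smoothing replaces the max with a log-sum-exp, whose gradient is a softmax; perturbation adds random noise to the scores and averages the resulting hard argmax choices.

```python
import math
import random


def smooth_max(scores, temperature=1.0):
    """Smoothing: log-sum-exp relaxation of max (an upper bound on the true max)."""
    top = max(scores)
    total = sum(math.exp((s - top) / temperature) for s in scores)
    return top + temperature * math.log(total)


def softmax(scores, temperature=1.0):
    """Gradient of smooth_max with respect to the scores."""
    top = max(scores)
    weights = [math.exp((s - top) / temperature) for s in scores]
    norm = sum(weights)
    return [w / norm for w in weights]


def perturbed_argmax(scores, temperature=1.0, num_samples=20000, seed=0):
    """Perturbation: Monte Carlo average of the one-hot argmax under noise.

    Gumbel noise is an assumption made for this sketch; with it, the
    expectation coincides with the softmax (the Gumbel-max trick).
    """
    rng = random.Random(seed)
    counts = [0] * len(scores)
    for _ in range(num_samples):
        noisy = []
        for s in scores:
            u = max(rng.random(), 1e-12)  # keep the logarithm finite
            noisy.append(s + temperature * -math.log(-math.log(u)))
        counts[noisy.index(max(noisy))] += 1
    return [c / num_samples for c in counts]


scores = [2.0, 1.0, 0.0]
print(smooth_max(scores), softmax(scores))
print(perturbed_argmax(scores))
```

Both relaxations turn the piecewise-constant argmax into something with useful gradients; as the temperature goes to zero, each approaches the hard maximum.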
Regarding the strategy of training the transformer and the parameterizer jointly, end to end, the paper finds that this is indeed significantly better than the more classical two-step strategy, that is, first training the transformer encoder on the masked language modeling task, then freezing the transformer and training the parameterizer on the "learning to align" task. This suggests that a general-purpose language model such as ESM is not sufficient on its own and needs at least fine-tuning to reach the best alignment performance.
The figure above shows the application of the learned embeddings to downstream tasks. The paper evaluates the benefit of context-sensitive embeddings by training a model in which the substitution cost is constrained to depend only on the two amino acids being aligned. The performance of this model drops sharply, falling to about the level of the best substitution matrix.
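
The constraint used in this ablation can be made concrete with a small sketch. When the substitution cost depends only on the two amino acids, the score for a pair of positions is a plain table lookup, so the same residue pair always receives the same score wherever it occurs. The function name and the toy table values below are hypothetical; they are not taken from PFASUM or from the paper.

```python
def context_free_scores(seq_x, seq_y, table, default=-1.0):
    """Substitution scores that depend only on residue identity.

    table maps residue pairs to scores and is treated as symmetric;
    pairs missing from the table fall back to `default`.
    """
    return [
        [table.get((a, b), table.get((b, a), default)) for b in seq_y]
        for a in seq_x
    ]


# Toy values for illustration only.
toy_table = {("A", "A"): 4.0, ("A", "G"): 0.0, ("G", "G"): 6.0}
scores = context_free_scores("AGA", "GA", toy_table)

# Both A residues of the first sequence get identical rows,
# regardless of their neighbours.
print(scores[0] == scores[2])
```

A contextual model is free to assign different scores to those two rows, which is exactly the flexibility the ablation removes.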

