Google | Deep embedding and alignment of protein sequences
2022-07-03 05:14:00 【Zhiyuan community】
Once trained, DEDAL generates gap and substitution scoring matrices computed specifically for each new pair of sequences. These gap and substitution scores are also context-dependent: for each pair of positions, they depend on the complete sequences being aligned. A standard Smith-Waterman (SW) algorithm then uses these parameters to compute the optimal alignment. The authors show that DEDAL can be trained efficiently on modern hardware with accelerators. Once trained, DEDAL improves the predicted alignment quality for remote homologs by a factor of 2-3 compared with standard SW, and it produces alignment scores that detect remote homologs more accurately.
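To make this pipeline concrete, here is a minimal NumPy sketch of Smith-Waterman local alignment with affine gaps in which the substitution score, the gap-open penalty and the gap-extend penalty are all given per pair of positions, as they would be if predicted for each sequence pair. It is a standard Gotoh-style recursion written under our own assumptions, not DEDAL's implementation. The names `smith_waterman_affine` and `match_matrix` are hypothetical, and `match_matrix` is only a toy stand-in for learned scores.

```python
import numpy as np


def match_matrix(x, y, match=2.0, mismatch=-1.0):
    """Hypothetical helper: toy match/mismatch scores standing in for learned scores."""
    xs = np.array(list(x))[:, None]
    ys = np.array(list(y))[None, :]
    return np.where(xs == ys, match, mismatch)


def smith_waterman_affine(sub, gap_open, gap_extend):
    """Best local alignment score with position-specific parameters.

    sub, gap_open, gap_extend: arrays of shape (n, m); entry [i, j] refers to
    residue i of sequence x and residue j of sequence y. A gap of length k
    costs gap_open once plus gap_extend for each of the k - 1 extra positions.
    """
    sub = np.asarray(sub, dtype=float)
    gap_open = np.asarray(gap_open, dtype=float)
    gap_extend = np.asarray(gap_extend, dtype=float)
    n, m = sub.shape
    H = np.zeros((n + 1, m + 1))          # best local score ending at (i, j)
    E = np.full((n + 1, m + 1), -np.inf)  # ending in a gap that consumes y residues
    F = np.full((n + 1, m + 1), -np.inf)  # ending in a gap that consumes x residues
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            o = gap_open[i - 1, j - 1]
            e = gap_extend[i - 1, j - 1]
            # Open a new gap from the best cell, or extend an existing one.
            E[i, j] = max(H[i, j - 1] - o, E[i, j - 1] - e)
            F[i, j] = max(H[i - 1, j] - o, F[i - 1, j] - e)
            # Local alignment: scores are floored at 0 so alignments can restart.
            H[i, j] = max(0.0, H[i - 1, j - 1] + sub[i - 1, j - 1], E[i, j], F[i, j])
    return H.max()


S = match_matrix("AAAACCCC", "AAAATCCCC")
print(smith_waterman_affine(S, np.full(S.shape, 2.0), np.full(S.shape, 1.0)))
```

In DEDAL the three (n, m) arrays would come from the network for each sequence pair; here they are constant only to keep the example small. With a gap-open cost of 2, skipping the extra `T` is worth it: eight matches (16) minus one gap (2) gives 14.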
The figure above shows an example alignment of two protein domains taken from the Pfam-A seed set.
a. Alignments from the Pfam-A seed database (second row), predicted by DEDAL (third row), and predicted with the PFASUM70 substitution matrix (fourth row). All residues of both sequences are shown for the Pfam-A seed and DEDAL alignments, but unaligned residues upstream and downstream of the PFASUM alignment are not shown. Residues highlighted in green are conserved residues that are correctly aligned; residues shown in red mark differences between the predicted alignment and the Pfam-A seed alignment.
b. Substitution scores between all residue pairs in the PFASUM substitution matrix.
c. SW parameters predicted by DEDAL.
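One simple way to quantify the agreement highlighted in green and red is to compare the sets of aligned residue pairs in a predicted alignment and a reference alignment. The sketch below computes precision, recall and F1 over those pairs. It is an illustrative assumption, not necessarily the exact metric used in the paper, and it assumes each alignment is given as two equal-length rows spanning the full sequences, with `-` marking gaps. The function names are hypothetical.

```python
def aligned_pairs(row_x, row_y):
    """Turn a gapped pairwise alignment into the set of aligned index pairs (i, j).

    Assumes both rows cover their full sequences and use '-' for gaps.
    """
    if len(row_x) != len(row_y):
        raise ValueError("alignment rows must have equal length")
    pairs, i, j = set(), 0, 0
    for a, b in zip(row_x, row_y):
        if a != "-" and b != "-":
            pairs.add((i, j))  # residue i of x is aligned to residue j of y
        if a != "-":
            i += 1
        if b != "-":
            j += 1
    return pairs


def alignment_precision_recall_f1(predicted, reference):
    """Compare a predicted alignment with a reference one (each a pair of rows)."""
    pred = aligned_pairs(*predicted)
    ref = aligned_pairs(*reference)
    tp = len(pred & ref)  # pairs aligned in both alignments
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1


reference = ("AC-GT", "ACTGT")
predicted = ("ACG-T", "ACTGT")
print(alignment_precision_recall_f1(predicted, reference))
```

In this toy case, three of the four predicted pairs match the reference, so precision, recall and F1 are all 0.75.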

On the technical side, the authors explore two ways to make the SW alignment module differentiable, which is needed to train DEDAL's parameters on the "learning to align" task: smoothing and perturbation techniques. They find no significant performance difference between the two, and the final DEDAL model uses the perturbation-based approach. Regarding the alignments used to train DEDAL, they find that using Pfam extended domains rather than Pfam domains is beneficial when DEDAL is expected to predict accurate local alignments. When DEDAL is pretrained on the masked language modeling task, excluding sequences related to out-of-distribution families from the "protein universe" leads to a slight drop in performance on remote homologs, although the performance gap relative to the baseline remains small.
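The sketch below illustrates the two relaxation ideas on a plain maximum over a vector of scores, which is the operation the SW recursion applies at every cell. Smoothing replaces `max` with a temperature-scaled log-sum-exp whose gradient is a softmax. Perturbation averages the `max` of Gaussian-noised copies of the scores, and the average one-hot `argmax` is a Monte Carlo estimate of that average's gradient (the perturbed-optimizer idea). The function names, the Gaussian noise and the sample count are our assumptions; in DEDAL these relaxations act inside the full alignment recursion rather than on a single vector.

```python
import numpy as np


def smoothed_max(theta, temperature=1.0):
    """Smoothing: temperature * logsumexp(theta / temperature) and its gradient."""
    z = np.asarray(theta, dtype=float) / temperature
    shift = z.max()                      # subtract the max for numerical stability
    weights = np.exp(z - shift)
    value = temperature * (shift + np.log(weights.sum()))
    grad = weights / weights.sum()       # softmax: gradient of the smoothed max
    return value, grad


def perturbed_max(theta, sigma=1.0, num_samples=10000, seed=0):
    """Perturbation: Monte Carlo estimate of E[max(theta + sigma * Z)], Z ~ N(0, I)."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta, dtype=float)
    noisy = theta[None, :] + sigma * rng.standard_normal((num_samples, theta.size))
    value = noisy.max(axis=1).mean()
    one_hot = np.zeros_like(noisy)
    one_hot[np.arange(num_samples), noisy.argmax(axis=1)] = 1.0
    grad = one_hot.mean(axis=0)          # E[one-hot argmax] is the gradient of the value
    return value, grad


scores = [1.0, 2.0, 3.0]
print(smoothed_max(scores, temperature=1.0))
print(perturbed_max(scores, sigma=1.0))
```

Both relaxations replace the piecewise-constant `argmax` with a distribution over candidates, so gradients can flow back to the scores; lowering `temperature` or `sigma` recovers the hard maximum.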
Regarding the strategy of jointly training the transformer encoder and the SW parameterizer end to end, the authors find that it is indeed significantly better than the more classic two-step strategy. In that strategy, the transformer encoder is first trained on the masked language modeling task and then frozen while the parameterizer is trained on the "learning to align" task. This suggests that a generic language model such as ESM is not sufficient on its own and must at least be fine-tuned to achieve the best alignment performance.
The figure above shows the application of the learned embeddings to downstream tasks. The authors also evaluate the benefit of context-sensitive embeddings by training a model in which the substitution cost is restricted to depend only on the two amino acids being aligned. Unsurprisingly, this model performs much worse, dropping to roughly the level of the best substitution matrix among the baselines.
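To illustrate this distinction, the sketch below contrasts a context-free score, looked up from a 20 x 20 matrix indexed only by the two residues, with a context-sensitive score computed from per-residue embeddings that also depend on neighbouring residues. The toy encoder `toy_contextual_embeddings`, the bilinear form with matrix `W`, and all other names are hypothetical stand-ins for DEDAL's transformer encoder and parameterizer, not its actual architecture.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(seq):
    """One row per residue, one column per amino acid."""
    idx = [AMINO_ACIDS.index(a) for a in seq]
    return np.eye(len(AMINO_ACIDS))[idx]


def toy_contextual_embeddings(seq, window=1):
    """Hypothetical encoder: each residue's vector mixes in its local neighbourhood."""
    x = one_hot(seq)
    out = np.zeros_like(x)
    for i in range(len(seq)):
        lo, hi = max(0, i - window), min(len(seq), i + window + 1)
        out[i] = x[i] + 0.5 * x[lo:hi].mean(axis=0)
    return out


def context_sensitive_scores(seq_x, seq_y, W):
    """S[i, j] = e_x[i]^T W e_y[j]: depends on the surrounding sequence context."""
    ex = toy_contextual_embeddings(seq_x)
    ey = toy_contextual_embeddings(seq_y)
    return ex @ W @ ey.T


def context_free_scores(seq_x, seq_y, M):
    """S[i, j] = M[x_i, y_j]: depends only on the two residues being aligned."""
    return one_hot(seq_x) @ M @ one_hot(seq_y).T


W = np.eye(20)  # hypothetical learned bilinear weights
M = np.eye(20)  # hypothetical substitution matrix
print(context_free_scores("A", "AKAW", M))
print(context_sensitive_scores("A", "AKAW", W))
```

With the context-free scorer, both A-A pairs receive the same score; with the contextual scorer, the same A-A pair scores differently depending on its neighbours, which is the information the restricted model gives up.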