Google | Deep embedding and alignment of protein sequences
2022-07-03 05:13:00 [BAAI Community (智源社區)]
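Once trained, DEDAL produces gap and substitution scoring matrices computed specifically for each new pair of sequences. These gap and substitution scores are also contextual: for each pair of positions, they depend on the full sequences being aligned. A standard Smith-Waterman (SW) algorithm then uses these parameters to compute the optimal alignment. The paper shows that DEDAL can be trained efficiently on modern hardware with accelerators. Compared with standard SW, the trained DEDAL improves the quality of predicted alignments for remote homologs by a factor of 2-3, and it produces an alignment score that detects remote homologs more accurately.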
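The figure above shows an example alignment of two protein domain sequences from Pfam-A seed.
a. Alignments taken from the Pfam-A seed database (second row), predicted by DEDAL (third row), and predicted with the PFASUM70 substitution matrix (fourth row). All residues of both sequences are shown for the Pfam-A seed and DEDAL alignments; for the PFASUM alignment, the unaligned residues upstream and downstream of the aligned region are not shown. Residues highlighted in green are correctly aligned conserved residues, while residues in red mark differences between the predicted alignment and the Pfam-A seed alignment.
b. Substitution scores between all residue pairs, taken from the PFASUM substitution matrix.
c. SW parameters predicted by DEDAL.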
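
The SW step described earlier, with scores that vary per pair of positions as in panel c, can be illustrated with a small sketch. This is not the DEDAL implementation: the function name `smith_waterman_score`, the three input matrices, and the simplified affine-gap recursion (a gap may only open after a match state) are assumptions made for illustration, and the toy scores in the usage lines are invented.

```python
NEG_INF = float("-inf")


def smith_waterman_score(sub, gap_open, gap_extend):
    """Best local alignment score with position-specific affine gap costs.

    sub[i][j]        -- score for aligning residue i of x with residue j of y
    gap_open[i][j]   -- positive cost of opening a gap at cell (i, j)
    gap_extend[i][j] -- positive cost of extending a gap at cell (i, j)
    All three are n-by-m nested lists (hypothetical inputs for this sketch).
    """
    n, m = len(sub), len(sub[0])
    # M: alignment ends with a match; X: ends with a gap in y; Y: gap in x.
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    best = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = sub[i - 1][j - 1]
            o = gap_open[i - 1][j - 1]
            e = gap_extend[i - 1][j - 1]
            # The 0.0 option lets a local alignment start at any cell.
            M[i][j] = s + max(0.0, M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            X[i][j] = max(M[i - 1][j] - o, X[i - 1][j] - e)
            Y[i][j] = max(M[i][j - 1] - o, Y[i][j - 1] - e)
            best = max(best, M[i][j])
    return best


# Toy example: x = "AB", y = "AXB"; match = 5, mismatch = -1 (invented values).
sub = [[5, -1, -1], [-1, -1, 5]]
gap_open = [[2] * 3 for _ in range(2)]
gap_extend = [[1] * 3 for _ in range(2)]
score = smith_waterman_score(sub, gap_open, gap_extend)
```

In a classical setup `sub[i][j]` would be looked up from a fixed matrix such as PFASUM using only the two residues, and the gap costs would be two constants. Passing full matrices is what allows every cell to carry its own, context-dependent parameters.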

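On the technical side, the paper explores two ways to build a differentiable SW alignment module, which is needed to train DEDAL's parameters on the "learning to align" task: a smoothing technique and a perturbation technique. The authors found no clear difference in performance between the two and used the perturbation-based approach in the final DEDAL model. As for the alignments used to train DEDAL, they found that using Pfam extended domains instead of Pfam domains helps when DEDAL is expected to predict accurate local alignments. When pre-training DEDAL on the masked language modeling task, excluding sequences related to the out-of-distribution families from the "protein universe" led to a slight drop in performance on remote homologs, although the drop was small relative to the performance gap with the baselines.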
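
To give a rough idea of the two routes to differentiability, the sketch below applies each to a single `max` operator, the building block of the SW recursion. It is an illustration under stated assumptions, not the paper's implementation: `smooth_max` uses a temperature-scaled log-sum-exp, `perturbed_max` averages the hard maximum over Gaussian noise added to the inputs, and the names, the noise distribution, and the default parameters are all hypothetical.

```python
import math
import random


def smooth_max(values, temperature=1.0):
    """Smoothing: replace max with temperature * log-sum-exp(values / temperature).

    The result is differentiable in every input and approaches the hard
    maximum as the temperature goes to zero.
    """
    top = max(values)  # subtract the maximum for numerical stability
    total = sum(math.exp((v - top) / temperature) for v in values)
    return top + temperature * math.log(total)


def perturbed_max(values, sigma=1.0, n_samples=1000, seed=0):
    """Perturbation: Monte Carlo average of the hard max over noisy inputs.

    Each sample adds independent Gaussian noise (an assumed choice) to every
    input, takes the ordinary max, and the samples are averaged.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        total += max(v + sigma * rng.gauss(0.0, 1.0) for v in values)
    return total / n_samples


scores = [1.0, 2.0, 3.0]
soft = smooth_max(scores, temperature=0.5)   # slightly above the hard max of 3.0
noisy = perturbed_max(scores, sigma=0.5)     # average of hard maxima under noise
```

Both functions recover the hard maximum in the limit (`temperature` or `sigma` going to zero). The practical difference is that smoothing changes the recursion itself, while perturbation keeps the standard algorithm and only randomizes its inputs.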
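As for the strategy of jointly training the transformer and the parameterizer end to end, the authors found that it clearly outperforms the more classical two-step strategy, in which the transformer encoder is first trained on the masked language modeling task and the parameterizer is then trained on the "learning to align" task with the transformer kept fixed. This suggests that a generic language model such as ESM is not sufficient on its own and should at least be fine-tuned to reach the best alignment performance.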
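The figure above shows how the learned embeddings are applied to downstream tasks. The authors assessed the benefit of context-dependent embeddings by training a model in which the substitution cost is constrained to depend only on the amino acids being aligned. Unsurprisingly, the performance of that model dropped sharply, down to roughly the level of the best-performing substitution matrix in the "alignment" results.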

