Google | Deep Embedding and Alignment of Protein Sequences
2022-07-03 05:13:00 [智源社區 (BAAI Community)]
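Once trained, DEDAL produces gap and substitution scoring matrices computed specifically for each new pair of sequences. Moreover, the gap and substitution scores are contextual: for each pair of positions, they depend on the full sequences being aligned. A standard Smith-Waterman (SW) algorithm then uses these parameters to compute the optimal alignment. The paper shows that DEDAL can be trained efficiently on modern hardware with accelerators. Once trained, DEDAL improves the quality of predicted alignments for remote homologs by a factor of 2 to 3 compared with standard SW, and produces an alignment score that detects remote homologs more accurately.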
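Figure: an example alignment of two protein domain sequences from Pfam-A seed.
a. Alignments taken from the Pfam-A seed database (second row), predicted by DEDAL (third row), and predicted with the PFASUM70 substitution matrix (fourth row). All residues of both sequences are shown for the Pfam-A seed and DEDAL alignments, whereas for PFASUM the unaligned residues upstream and downstream of the alignment are not shown. Residues highlighted in green are correctly aligned conserved residues; residues shown in red mark differences between the predicted alignment and the Pfam-A seed alignment.
b. Substitution scores between all pairs of residues, taken from the PFASUM substitution matrix.
c. SW parameters predicted by DEDAL.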
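To make the idea of position-specific SW parameters concrete, here is a minimal sketch of a Smith-Waterman scorer with affine gaps in which the substitution score, gap-open penalty, and gap-extend penalty are all given per position pair (i, j). This is not DEDAL's implementation: the function names `sw_score` and `constant_params`, the three-state recursion, and the toy scoring values are assumptions made for illustration. With constant parameters it reduces to classic SW with a fixed scoring scheme.

```python
import numpy as np

def sw_score(sub, gap_open, gap_extend):
    """Best local-alignment score under position-specific parameters.

    sub[i, j]        : score for aligning residue i of x with residue j of y
    gap_open[i, j]   : penalty (>= 0) for opening a gap at position pair (i, j)
    gap_extend[i, j] : penalty (>= 0) for extending a gap at position pair (i, j)
    All three arrays are hypothetical inputs with shape (len(x), len(y)).
    """
    n, m = sub.shape
    M = np.full((n + 1, m + 1), -np.inf)  # alignment ends with x[i] matched to y[j]
    X = np.full((n + 1, m + 1), -np.inf)  # alignment ends with x[i] facing a gap
    Y = np.full((n + 1, m + 1), -np.inf)  # alignment ends with y[j] facing a gap
    best = 0.0                            # the empty local alignment scores 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = sub[i - 1, j - 1]
            o = gap_open[i - 1, j - 1]
            e = gap_extend[i - 1, j - 1]
            # Either start a new local alignment (0) or extend a previous one.
            M[i, j] = s + max(0.0, M[i - 1, j - 1], X[i - 1, j - 1], Y[i - 1, j - 1])
            X[i, j] = max(M[i - 1, j] - o, X[i - 1, j] - e)
            Y[i, j] = max(M[i, j - 1] - o, Y[i, j - 1] - e)
            best = max(best, M[i, j])
    return float(best)

def constant_params(x, y, match=2.0, mismatch=-1.0, gap_open=1.0, gap_extend=1.0):
    """Toy parameters that ignore context, mimicking a fixed scoring scheme."""
    sub = np.array([[match if a == b else mismatch for b in y] for a in x])
    return sub, np.full(sub.shape, gap_open), np.full(sub.shape, gap_extend)

score = sw_score(*constant_params("AAACCC", "AAAGCCC"))
print(score)
```

In a DEDAL-like setting the three arrays would come from a learned model rather than from `constant_params`, so two occurrences of the same residue pair could receive different scores depending on where they sit in the sequences.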

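On the technical side, the paper explores two approaches to building a differentiable SW alignment module, which is needed to train DEDAL's parameters on the "learning to align" task: a smoothing technique and a perturbation technique. It finds no clear difference in performance between the two, and the final DEDAL model implements the perturbation-based approach. Regarding the alignments used to train DEDAL, the paper finds that using Pfam extended domains rather than Pfam domains is beneficial when DEDAL is expected to predict accurate local alignments. When pretraining DEDAL on the masked language modeling task, excluding sequences related to the out-of-distribution families from the "protein universe" leads to a slight drop in performance on remote homologs, although the drop is small relative to the performance gap with the baselines.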
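The difference between smoothing and perturbation can be illustrated on a single `max`, the operation that makes SW non-differentiable. The sketch below is an illustration under stated assumptions, not the paper's code: `smooth_max` replaces the max with a temperature-scaled log-sum-exp whose gradient is a softmax, while `perturbed_max` adds Gaussian noise to the inputs and averages over samples, so the gradient estimate is the empirical frequency with which each entry is the argmax. The function names, noise distribution, and default values are hypothetical.

```python
import numpy as np

def smooth_max(values, temperature=1.0):
    """Smoothed max: temperature * logsumexp(values / temperature)."""
    v = np.asarray(values, dtype=float) / temperature
    shift = v.max()                      # subtract the max for numerical stability
    weights = np.exp(v - shift)
    value = temperature * (shift + np.log(weights.sum()))
    grad = weights / weights.sum()       # gradient w.r.t. values is a softmax
    return value, grad

def perturbed_max(values, sigma=1.0, n_samples=1000, seed=0):
    """Monte Carlo estimate of E[max(values + sigma * Z)] with Gaussian Z."""
    v = np.asarray(values, dtype=float)
    rng = np.random.default_rng(seed)
    noisy = v + sigma * rng.standard_normal((n_samples, v.size))
    value = noisy.max(axis=1).mean()
    # The gradient of the expectation is the probability that each entry wins.
    grad = np.bincount(noisy.argmax(axis=1), minlength=v.size) / n_samples
    return value, grad

scores = [1.0, 2.0, 5.0]
print(smooth_max(scores, temperature=0.5))
print(perturbed_max(scores, sigma=0.5))
```

In both cases the hard one-hot choice of the argmax is replaced by a probability vector, which is what lets gradients flow back to the scores. As the temperature or the noise scale shrinks, both approach the ordinary max.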
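Regarding the strategy of jointly training the transformer and the parameterizer end to end, the paper finds that it clearly outperforms the more classical two-step strategy, in which the transformer encoder is first trained on the masked language modeling task and the parameterizer is then trained on the "learning to align" task with the transformer kept fixed. This suggests that a general-purpose language model such as ESM is not sufficient on its own and should at least be fine-tuned to reach the best alignment performance.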
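Figure: application of the learned embeddings to downstream tasks. The paper evaluates the benefit of context-dependent embeddings by simply training a model in which the substitution cost is restricted to depend only on the amino acids being aligned. Unsurprisingly, this model performs substantially worse, reaching roughly the same performance as the best-performing substitution matrix on alignment.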
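The contrast drawn in this ablation can be sketched in a few lines. In the context-free case, the score for a residue pair is a lookup in a fixed 20x20 matrix, so the same pair of amino acids always gets the same score. In the contextual case, the score is computed from per-position embeddings. The bilinear form below, the random stand-in embeddings, and all names (`context_free_scores`, `contextual_scores`, `toy_matrix`) are assumptions for illustration and are not taken from the article.

```python
import numpy as np

rng = np.random.default_rng(0)
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: k for k, aa in enumerate(ALPHABET)}

def context_free_scores(x, y, matrix):
    """Score each position pair by looking up the two amino acids only."""
    xi = np.array([INDEX[a] for a in x])
    yi = np.array([INDEX[a] for a in y])
    return matrix[np.ix_(xi, yi)]

def contextual_scores(hx, hy, W):
    """Score each position pair from embeddings via a bilinear form (assumed)."""
    return hx @ W @ hy.T

x, y = "ACA", "AA"

# Hypothetical symmetric substitution matrix with random entries.
toy_matrix = rng.normal(size=(20, 20))
toy_matrix = (toy_matrix + toy_matrix.T) / 2
cf = context_free_scores(x, y, toy_matrix)

# Random vectors standing in for transformer embeddings of each position.
dim = 4
hx = rng.normal(size=(len(x), dim))
hy = rng.normal(size=(len(y), dim))
W = rng.normal(size=(dim, dim))
ctx = contextual_scores(hx, hy, W)

print(cf)   # rows 0 and 2 are identical: both positions of x hold "A"
print(ctx)  # rows 0 and 2 differ: each position has its own embedding
```

Either score matrix could be passed to an SW routine; only the contextual one can distinguish two occurrences of the same amino acid.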

