December 19, 2021 [Reading notes] - Bioinformatics and Functional Genomics (Chapter 5: Advanced Database Searching)
2022-06-30 07:38:00 【Muyiqing】
- 5.5 Using BLAST-like alignment tools to search genomic DNA quickly
- Requirements: as genomic DNA databases keep growing, alignment tools have to meet more and more demands
- Find exons within genomic DNA
- Account for sequencing errors in genomic DNA during alignment
- Provide algorithms that handle deletions, duplications, inversions, and translocations between the genomes of related species
- Provide algorithms that handle small differences between DNA sequences, such as SNP sites
- Use benchmark sets to evaluate how well genome alignment performs
- On simulated test sets generated with the ROSE (random model of sequence evolution) software package, the global alignment tool LAGAN had the highest sensitivity, while local alignment tools (such as BLASTZ) aligned some regions more accurately
- PatternHunter: spaced (discontiguous) seeds improve sensitivity
- PatternHunter places positions where a mismatch is tolerated between the positions that must match, which improves both speed and sensitivity (two seed models are described)
- Writing a required match as 1 and a tolerated mismatch as 0, the seed models look like this:
- BLASTN: 11111111111
- PatternHunter: 110100110010101111 (alternative: 11101001010011011)
- Reason: adjacent seed hits share very few positions, so the hits are more independent of each other than with a contiguous seed model
- BLASTZ, MegaBLAST, and protein homology search algorithms also use this strategy
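
The difference between a contiguous and a spaced seed can be sketched in a few lines of Python. This is only an illustration of the idea, not PatternHunter's implementation: the function name `seed_hits`, the toy sequences, and the dictionary-based index are assumptions made for the example. The spaced model is the one written in the notes above.

```python
def seed_hits(query, target, model):
    """Return (i, j) pairs where query[i:] and target[j:] agree at every '1' of model.

    Positions marked '0' in the model are ignored, so a mismatch there
    does not prevent a hit.
    """
    care = [k for k, c in enumerate(model) if c == "1"]
    span = len(model)

    # Index the target by the letters found at the '1' positions of each window.
    index = {}
    for j in range(len(target) - span + 1):
        key = "".join(target[j + k] for k in care)
        index.setdefault(key, []).append(j)

    # Look up every query window with the same mask.
    hits = []
    for i in range(len(query) - span + 1):
        key = "".join(query[i + k] for k in care)
        for j in index.get(key, []):
            hits.append((i, j))
    return hits


CONTIGUOUS = "1" * 11                 # BLASTN-style seed
SPACED = "110100110010101111"         # spaced model as written in the notes

target = "ACGTTGCAAGCTTAGGCT"
query = "ACGTTACAAGCATAGGCT"          # target with substitutions at positions 5 and 11

spaced_hits = seed_hits(query, target, SPACED)
contiguous_hits = seed_hits(query, target, CONTIGUOUS)
```

In this toy case both substitutions fall on `0` positions of the spaced model, so `spaced_hits` is `[(0, 0)]`, while no window of 11 consecutive bases is identical and `contiguous_hits` is empty. Both seeds require 11 matching positions.
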
- BLASTZ
- Designed to align human and mouse genomic DNA sequences.
- How it works:
- Lineage-specific interspersed repeats are removed from both sequences
- Seeds with 12 matching positions (the spaced pattern 1110100110010101111) are extended without gaps; when the score exceeds a threshold, the extension is repeated with gaps allowed
- For the regions between adjacent successful alignments, the second step is repeated with a smaller (more sensitive) word length, such as 7.
- BLASTZ genome alignments can be visualized in the UCSC Genome Browser
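
The "extend without gaps, then test against a threshold" step can be sketched as follows. This is a simplified illustration and not BLASTZ itself: the +1/-1 scores, the X-drop stopping rule, the threshold value, and all function names are assumptions chosen for the example.

```python
def _best_extension(pairs, match, mismatch, xdrop):
    """Walk along aligned letter pairs and return (best_gain, length).

    Stops once the running score has fallen more than `xdrop` below
    the best score seen so far.
    """
    gain = best = length = 0
    for n, (x, y) in enumerate(pairs, start=1):
        gain += match if x == y else mismatch
        if gain > best:
            best, length = gain, n
        elif best - gain > xdrop:
            break
    return best, length


def ungapped_extend(a, b, i, j, seed_len, match=1, mismatch=-1, xdrop=3):
    """Extend the seed a[i:i+seed_len] / b[j:j+seed_len] in both directions, no gaps.

    Returns (score, a_start, a_end) with a_end exclusive.
    """
    seed_score = sum(match if a[i + k] == b[j + k] else mismatch
                     for k in range(seed_len))
    right, r_len = _best_extension(zip(a[i + seed_len:], b[j + seed_len:]),
                                   match, mismatch, xdrop)
    left, l_len = _best_extension(zip(reversed(a[:i]), reversed(b[:j])),
                                  match, mismatch, xdrop)
    return seed_score + left + right, i - l_len, i + seed_len + r_len


a = "TTTTACGTACGTGGGG"
b = "CCCCACGTACGTAAAA"
score, start, end = ungapped_extend(a, b, i=4, j=4, seed_len=4)

THRESHOLD = 6                       # hypothetical cutoff
run_gapped_extension = score >= THRESHOLD
```

Here the 4-base seed grows to the 8-base segment `a[4:12]` with score 8, which passes the hypothetical threshold, so a gapped extension (not shown) would be attempted next.
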
- Enredo and Pecan (covered only briefly)
- Used by Ensembl for multiple sequence alignment; on several evaluation criteria the alignments are more accurate than those of other software
- MegaBLAST and discontiguous MegaBLAST
- MegaBLAST: an NCBI tool optimized for fast alignment of long DNA query sequences. The default word length is 28 and can be raised as high as 256; a larger word length makes the search faster.
- A smaller word length gives higher sensitivity but a slower search.
- A percent-identity threshold for the reported hits can be set
- Match and mismatch scores can be set
- Discontiguous MegaBLAST: a tool for aligning more distantly related genomic sequences.
- BLAST-like alignment tool (BLAT)
- A very fast search tool for genomic DNA
- BLAT breaks the entire genomic DNA database into an index of words, made up of all non-overlapping 11-mers in the genome.
- The database indexing strategy used by BLAT is also used by SSAHA2 and by later versions of MegaBLAST
- Other properties:
- BLAST triggers an extension when two hits occur, whereas BLAT requires multiple hits;
- BLAT is mainly intended to find matches that are at least 95% identical to the query sequence
- BLAT looks for intron-exon boundaries, so in essence it builds a model of gene structure.
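
A minimal sketch of this indexing idea is shown below, assuming a toy genome and a word length of 4 instead of 11 so the example stays readable. Counting hits per diagonal and the `min_hits` parameter are simplifications introduced here to stand in for "multiple hits required"; they are not BLAT's actual hit-clustering logic, and the function names are hypothetical.

```python
from collections import defaultdict


def build_index(genome, k=11):
    """Map each non-overlapping k-mer of the genome to its start positions."""
    index = defaultdict(list)
    for pos in range(0, len(genome) - k + 1, k):    # step k: words do not overlap
        index[genome[pos:pos + k]].append(pos)
    return index


def diagonal_hits(query, index, k=11):
    """Count word hits per diagonal (genome position minus query position)."""
    diagonals = defaultdict(int)
    for q in range(len(query) - k + 1):             # every overlapping query k-mer
        for g in index.get(query[q:q + k], []):
            diagonals[g - q] += 1
    return dict(diagonals)


def candidate_diagonals(query, index, k=11, min_hits=2):
    """Keep only diagonals supported by at least `min_hits` word hits."""
    hits = diagonal_hits(query, index, k)
    return sorted(d for d, n in hits.items() if n >= min_hits)


genome = "AAAACCCCGGGGTTTTACGTTGCA"
query = genome[6:22]                                # a fragment copied from the genome
index = build_index(genome, k=4)
hits = diagonal_hits(query, index, k=4)
candidates = candidate_diagonals(query, index, k=4)
```

Indexing only non-overlapping words keeps the index roughly `k` times smaller than indexing every position; the query is still scanned at every offset, so a true match is found as long as it covers at least one complete indexed word. In the toy run three words agree on diagonal 6, which is where the fragment was taken from.
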
- LAGAN
- Pairwise sequence alignment
- The global pairwise alignment is carried out in three steps
- 1. Generate local alignments between the two sequences to identify a set of anchors, allowing multiple short inexact word matches instead of long exact word matches;
- 2. Build a rough global map: the highest-scoring ordered subset of the anchors;
- 3. Compute the final global alignment, restricted to the region defined by the rough map.
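
Step 2 can be illustrated with a small chaining routine. This is a sketch under simplifying assumptions, not LAGAN's code: each anchor is reduced to a single point `(x, y, score)` rather than an aligned segment, the chain simply has to increase in both coordinates, and a quadratic dynamic program is used for clarity. The function name and the anchor values are made up for the example.

```python
def best_chain(anchors):
    """Return (total_score, chain) for the highest-scoring ordered anchor subset.

    anchors: list of (x, y, score), where x and y are positions in the
    two sequences. A valid chain is strictly increasing in both x and y.
    """
    if not anchors:
        return 0, []

    anchors = sorted(anchors)                  # sort by x, then y
    n = len(anchors)
    best = [a[2] for a in anchors]             # best chain score ending at anchor i
    prev = [-1] * n                            # predecessor of anchor i in that chain

    for i in range(n):
        xi, yi, si = anchors[i]
        for j in range(i):
            xj, yj, _ = anchors[j]
            if xj < xi and yj < yi and best[j] + si > best[i]:
                best[i] = best[j] + si
                prev[i] = j

    # Trace back from the best-scoring end point.
    k = max(range(n), key=best.__getitem__)
    total = best[k]
    chain = []
    while k != -1:
        chain.append(anchors[k])
        k = prev[k]
    chain.reverse()
    return total, chain


anchors = [(10, 12, 5), (30, 8, 9), (40, 45, 7), (60, 50, 4), (55, 90, 6)]
total, chain = best_chain(anchors)
```

For these anchors the best chain is `(30, 8)`, `(40, 45)`, `(55, 90)` with total score 22. The final alignment of step 3 would then only be computed in the neighbourhood of this chain instead of over the full dynamic programming matrix.
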
- SSAHA2
- SSAHA2 converts the DNA database into a hash table of fixed-length words, so a pairwise alignment can find matches quickly by looking them up in the hash table.
- 5.6 Aligning next-generation sequencing reads to a reference genome
- 1977: Sanger sequencing; 2005: next-generation sequencing (NGS)
- What an aligner has to consider:
- Matches and mismatches
- Running speed
- Indexing is introduced: hash tables and suffix trees
- Hash-table-based alignment
- Uses a "seed-and-extend" strategy
- 1. Two kinds of input data:
- The reference genome sequence
- A large number of short sequence fragments (reads)
- 2. Index the fragments and build multiple hash tables
- 3. Search the hash tables to identify the matching regions in the database.
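
The three steps above can be sketched as a toy read mapper. It is an illustration only and does not reproduce any particular tool: the reads are indexed in two hash tables (by their first and last `k` bases), the reference is scanned word by word, and each seed hit is checked by counting mismatches over the whole read. The two-table layout, the parameter names, and the sequences are assumptions for the example.

```python
from collections import defaultdict


def build_read_tables(reads, k):
    """Index the reads twice: by their first k bases and by their last k bases."""
    prefix_table, suffix_table = defaultdict(list), defaultdict(list)
    for rid, read in enumerate(reads):
        prefix_table[read[:k]].append(rid)
        suffix_table[read[-k:]].append(rid)
    return prefix_table, suffix_table


def map_reads(reference, reads, k, max_mismatches=1):
    """Return {read_id: sorted reference start positions} for accepted placements."""
    prefix_table, suffix_table = build_read_tables(reads, k)
    found = defaultdict(set)

    def verify(rid, start):
        # "Extend" step: compare the full read with the reference window.
        read = reads[rid]
        if start < 0 or start + len(read) > len(reference):
            return
        window = reference[start:start + len(read)]
        if sum(x != y for x, y in zip(read, window)) <= max_mismatches:
            found[rid].add(start)

    for pos in range(len(reference) - k + 1):
        word = reference[pos:pos + k]
        for rid in prefix_table.get(word, []):      # seed matched the read's start
            verify(rid, pos)
        for rid in suffix_table.get(word, []):      # seed matched the read's end
            verify(rid, pos + k - len(reads[rid]))
    return {rid: sorted(starts) for rid, starts in found.items()}


reference = "GATTACAGGCTTAACCGGTTAGCA"
reads = [
    "TACAGGCT",    # exact copy of reference[3:11]
    "ATCCGGTT",    # reference[12:20] with one substitution in its first half
    "CCCCCCCC",    # does not occur in the reference
]
placements = map_reads(reference, reads, k=4)
```

Using two non-overlapping seeds per read is what makes the second read mappable: a single substitution can destroy at most one of the two seeds, so the other still produces an exact hash-table hit. Here `placements` is `{0: [3], 1: [12]}` and the third read stays unmapped.
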
- Alignment based on the Burrows-Wheeler transform (suffix trees)
- Using suffix trees and suffix arrays is one way to speed up alignment. BWA and Bowtie2 are commonly used tools; they take read length, sequencing error rate, and gap penalties into account, and support both local and global alignment of reads.
- The BWT rearranges the reference genome so that it can be compressed losslessly, that is, the complete original sequence can be restored from the compressed data.
- 1. Given a string of length N, build an N×N matrix whose rows are the cyclic shifts of the string
- 2. Sort the rows in lexicographic order to obtain the matrix M; the first column is F and the last column is L
- 3. Only the information in F and L (or an index of it) needs to be stored; the matrix M can be restored quickly from it
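
The three steps can be tried out directly with a naive implementation. This sketch sorts all rotations explicitly, which is fine for a short string but far too slow and memory-hungry for a genome; real aligners build the transform from a suffix array instead. It assumes an end marker `$` that does not occur in the text and sorts before every other character; the function names are hypothetical.

```python
def bwt(text, end="$"):
    """Return (F, L): first and last columns of the sorted rotation matrix M."""
    s = text + end
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))   # rows of M
    first = "".join(row[0] for row in rotations)
    last = "".join(row[-1] for row in rotations)
    return first, last


def inverse_bwt(last, end="$"):
    """Recover the original text from L alone.

    F is just L sorted, and the i-th occurrence of a character in L is the
    same text position as the i-th occurrence of that character in F.
    """
    n = len(last)
    # order[f_row] = l_row: stable sort of L's positions gives column F.
    order = sorted(range(n), key=lambda i: (last[i], i))
    lf = [0] * n
    for f_row, l_row in enumerate(order):
        lf[l_row] = f_row

    row = 0                      # row 0 of M starts with the end marker
    out = []
    for _ in range(n - 1):       # walk backwards through the text
        out.append(last[row])
        row = lf[row]
    return "".join(reversed(out))


F, L = bwt("banana")
restored = inverse_bwt(L)
```

For `banana` this gives `F = "$aaabnn"` and `L = "annb$aa"`, and `inverse_bwt(L)` returns `banana` again. Runs of equal letters in L (such as `nn` and `aa`) are what make the transformed string easy to compress.
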
- 5.7 Outlook
- As BLAST searching has become a basic tool for studying proteins and genes, many specialized applications have been developed, including alternative algorithms and specialized databases.
- BLAST is not suited to searching large amounts of genomic DNA; other methods make this possible by using longer word lengths, spaced seeds, and indexes of the database and query sequences.
- Short-read alignment tools are designed specifically to align millions of short sequences to a reference genome; typical applications include finding SNPs and structural variants (SVs).
- 5.8 Common pitfalls
- For any bioinformatics problem, you must be clear about the goal of the database query, that is, what you are trying to achieve
- Watch for false positives in BLAST results, remove them, and set an appropriate expect (E-value) threshold
- Try to use the right tools and databases for the specific goal
- I hope these notes are helpful. You are welcome to join the discussion group, or add WeChat bbplayer2021 to share your experience of learning bioinformatics.
