Biological sequence intelligent analysis platform blog (1)

2022-06-11 07:06:00 Boiled wine cos

This project was written entirely by the team members, and we plan to publish a paper on it.

Background (Introduction)

The following is the general biological and computational background of the project.

Bioinformatics is an important frontier discipline at the intersection of the life sciences and the natural sciences. It is inseparable from the analysis of biological sequences, which is how the meaning of large amounts of biological data is understood. Before sequencing technology matured, the amount of data was small and manual processing could meet the needs of the time. Today, with the large-scale growth of biological sequence data, manual processing has fallen far behind computational processing. At the same time, the rapid development of machine learning, deep learning, natural language processing and related technologies keeps producing new methods for processing biological sequences.

This also places higher demands on biologists: to extract the information behind biological sequences quickly and efficiently, they have to learn programming before they can use the relevant cutting-edge data analysis tools. There is therefore a clear demand for a platform or toolkit that accepts biological sequences and automates sequence analysis, prediction and the related visualization.

However, limited by many factors, traditional machine learning methods are often not accurate enough and cannot handle unbalanced data sets well. Deep learning algorithms from natural language processing have also been widely introduced into biological sequence analysis, for example the Bidirectional Encoder Representations from Transformers (BERT) model, which is built on the attention-based Transformer architecture and has achieved strong results on most natural language processing tasks. Papers presented at the AAAI conference have also shown that BERT models can perform well in biological sequence analysis.
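To make the idea of feeding a sequence to a language model concrete, here is a minimal sketch of one common preprocessing step: splitting a sequence into overlapping k-mers and mapping them to integer ids. The post does not say how the platform tokenizes its input, so the k-mer scheme, the function names (kmer_tokens, build_vocab, encode) and the special tokens [PAD] and [UNK] are all assumptions made for illustration.

```python
from typing import Dict, Iterable, List


def kmer_tokens(sequence: str, k: int = 3) -> List[str]:
    """Split a sequence into overlapping k-mers with stride 1 (assumed scheme)."""
    if k <= 0:
        raise ValueError("k must be positive")
    seq = sequence.strip().upper()
    # A sequence shorter than k yields an empty list.
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_vocab(token_lists: Iterable[List[str]]) -> Dict[str, int]:
    """Assign an integer id to each distinct k-mer, in order of first appearance."""
    vocab = {"[PAD]": 0, "[UNK]": 1}  # hypothetical special tokens
    for tokens in token_lists:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab))
    return vocab


def encode(tokens: List[str], vocab: Dict[str, int], max_len: int) -> List[int]:
    """Convert tokens to ids, truncating or padding to exactly max_len entries."""
    ids = [vocab.get(tok, vocab["[UNK]"]) for tok in tokens][:max_len]
    return ids + [vocab["[PAD]"]] * (max_len - len(ids))


if __name__ == "__main__":
    tokens = kmer_tokens("ATGCA", k=3)
    vocab = build_vocab([tokens])
    print(tokens)
    print(encode(tokens, vocab, max_len=5))
```

The fixed-length id lists produced by encode are the kind of input a BERT-style model expects; a real pipeline would normally use the tokenizer that ships with the chosen pretrained model instead.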

Finally, based on the requirements and analysis above, we developed a web server based on the BERT model. Compared with currently available tools, our server platform has the following main advantages:

  1. As far as we know, our platform is the first BERT-based web platform for binary classification of sequences, and it provides downloading and visualization of the analysis results.
  2. The server can handle unbalanced data sets.
  3. Our server can perform characterization (feature representation) and visualization in preparation for subsequent analysis.
  4. Unlike other machine learning based platforms, the workflow of our server can be treated as a black box. We provide an end-to-end service: the user uploads sequences and gets the results, with no need to set the parameters of a specific machine learning method.
  5. Our deep learning model transfers well, which allows us to adapt it quickly to other follow-up tasks.
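The post does not describe how the server copes with unbalanced data sets. One widely used option, shown here purely as an illustrative sketch and not as the platform's confirmed method, is to weight each class inversely to its frequency so that errors on the rare class count more in the loss. The function names below are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Sequence


def inverse_frequency_weights(labels: Sequence[int]) -> Dict[int, float]:
    """Compute weight_c = n_samples / (n_classes * count_c) for every class c."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {c: n_samples / (n_classes * cnt) for c, cnt in counts.items()}


def weighted_bce(probs: Sequence[float], labels: Sequence[int],
                 weights: Dict[int, float]) -> float:
    """Weighted binary cross-entropy; probs are predicted P(label == 1)."""
    total, weight_sum = 0.0, 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # clamp to avoid log(0)
        loss = -math.log(p) if y == 1 else -math.log(1.0 - p)
        total += weights[y] * loss
        weight_sum += weights[y]
    return total / weight_sum


if __name__ == "__main__":
    labels = [0] * 90 + [1] * 10  # a 9:1 unbalanced toy data set
    weights = inverse_frequency_weights(labels)
    print(weights)  # the minority class receives the larger weight
    print(weighted_bce([0.2] * 100, labels, weights))
```

In a PyTorch training loop, weights of this kind are typically passed to the loss function, for example through the weight argument of torch.nn.CrossEntropyLoss, rather than applied by hand.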

Division of labor

The biological sequence intelligent analysis platform group held its opening meeting on September 20, 2021. The meeting mainly covered task allocation and the work schedule.

The AI part of the project, written in Python, is mainly handled by jinjunru; the back-end interfaces by guochangrui; the front end by Jiang Yi; the research for the paper by chenchaoyi and fengjiuxin; and the writing of the paper by the graduate student yinchenglin. Because there is no ready-made code for the project to build on, all of the code has to be written by hand. We all started work on the second working day, and we plan to finish the code in three to four weeks and deploy it to our lab's servers for global use.

The technology stack used in the project includes Python, PyTorch, Spring Boot, React, antd, CDN and others. The amount of knowledge required is large, and the members of our group need to learn it in a short time.

The computers, experimental equipment and discussion room used by some team members are provided by the laboratory, while knowledge of AI algorithms, network technologies and related topics comes mainly from the students' self-study. After discussion, the team decided to use a client-server (C/S) architecture for the overall framework of the model and the front-end and back-end code.

Python-side architecture

Training uses PyTorch, and storage relies on the back end.

The required packages, i.e. requirements.txt:

torch
matplotlib
seaborn
transformers
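
As a quick sanity check after installing the packages, the following sketch reports which of them cannot be found in the current environment. It uses import names, which can differ from package names: PyTorch is imported as torch. The helper name missing_modules is hypothetical and is not part of the project.

```python
from importlib import util
from typing import List, Sequence

# Import names of the dependencies listed above (PyTorch is imported as "torch").
REQUIRED_MODULES = ["torch", "matplotlib", "seaborn", "transformers"]


def missing_modules(modules: Sequence[str] = REQUIRED_MODULES) -> List[str]:
    """Return the top-level modules that cannot be located, without importing them."""
    return [name for name in modules if util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are available.")
```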