Research progress in DNA digital information storage

2022-06-12 09:35:00 YoungerChina

Abstract:

        With the development of computer technology, digital information storage has changed our lives. Information is being produced at an ever faster rate, which raises the problem of how to store it effectively. Traditional storage media such as disks, hard drives, and magnetic, optical, or flash memory are increasingly unable to meet the world's data storage needs. Thanks to their stability, high storage density, and low maintenance cost, DNA molecules are expected to become a practical new information storage medium. This paper first introduces the workflow of storing data in DNA molecules, and then reviews the history and progress of research on DNA data storage, including storage modes, reading methods, and coding methods. To store information in DNA, binary information is first converted into DNA sequence information by encoding; DNA synthesis then writes the information; finally, the sequence is obtained by sequencing and decoded to recover the original information. Advances in modern molecular biology, especially the leap in DNA synthesis and sequencing technology, have gradually made large-scale storage of artificial data in DNA molecules a reality. The paper then compares the advantages and disadvantages of DNA molecules relative to traditional storage media and discusses the risks and challenges of DNA-based data storage, such as data security and the speed and cost of reading and writing. Finally, it looks ahead to future research directions and introduces emerging biotechnologies with the potential to intersect with this field, such as "DNA barcoding" and "DNA origami".

 

 

        As human observation of the world moves toward higher accuracy and greater breadth, and as diversified, miniaturized, and dynamic sensors are invented and popularized, the amount of human data keeps growing exponentially or even super-exponentially, and the notion of an "astronomical figure" is constantly being overturned. In scientific research, the Atacama Large Millimeter Array adds 2 TB of observation data every day. In health care, digital human and digital medicine efforts cover personal health data and various kinds of clinical and operational big data, and global health care data has reached 2.26 ZB. In finance, industrial production, security, and other fields, networking and real-time operation have become standard in modern society, and data in these fields accumulate per person and per second. According to an estimate by the International Data Corporation (IDC), global data output will reach 175 ZB in 2025 (1 ZB ≈ 1.18×10^21 B), and production of the current mainstream storage media is already overwhelmed. Copying and transmitting massive data is also a challenge. At a civil optical fiber rate of 1 Gbps, exchanging data on the order of PB (1 PB ≈ 10^6 GB) takes much longer than physically transporting the media, yet physical transport incurs considerable extra cost. In addition, existing storage media inevitably wear out with repeated reads and writes and with time, leading to information maintenance costs in the hundreds of millions every year. Practical new data storage media therefore need to be developed to meet the challenge of the information explosion.
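The transmission estimate above can be checked with a short calculation. This is a minimal sketch that assumes decimal units, a fully saturated 1 Gbps link, and no protocol overhead; none of these assumptions come from the original text.

```python
# Rough transfer-time estimate for 1 PB over a 1 Gbps link.
# Assumptions (not from the text): decimal units, a saturated link,
# and no protocol overhead.
PB_IN_BYTES = 10**15          # 1 PB = 10^6 GB, with 1 GB = 10^9 B
LINK_BITS_PER_SECOND = 10**9  # 1 Gbps


def transfer_days(n_bytes: int, bits_per_second: int) -> float:
    """Return the ideal transfer time in days."""
    seconds = n_bytes * 8 / bits_per_second
    return seconds / 86_400


days = transfer_days(PB_IN_BYTES, LINK_BITS_PER_SECOND)
print(f"1 PB over 1 Gbps: about {days:.0f} days")
```

Under these idealized assumptions a single petabyte occupies the link for roughly three months, which is why shipping physical media can be faster.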

        Deoxyribonucleic acid (DNA) is the carrier that organisms use to store genetic information. With the four bases A, T, C, and G, DNA stores all the genetic information of a species and passes it on stably to offspring. Our height, skin color, iris, and other traits are all recorded inside tiny cells, and the genome and the central dogma are among the most exquisite information storage and transmission algorithms in nature. DNA also has the potential to store digital information: data can be transformed into a linear sequence of bases and encoded in DNA as a new information storage medium. Most striking are the storage capacity and storage density of DNA. Studies have shown that the information storage density of DNA can reach 10^19 bit/cm^3, which is 10^6 times that of a hard drive. In addition, DNA is highly stable, lasts a long time, and needs no frequent maintenance. The average half-life of DNA in fossils is estimated at 521 years, and DNA protected by special materials such as synthetic silica or gels can be preserved even longer. The information can also easily be copied (by PCR), cut (by restriction endonucleases), and pasted (by DNA ligase) using biochemical methods. These characteristics make DNA molecules an ideal new data storage medium.

1  Research progress in DNA data storage

1.1  Overview of the DNA information storage workflow

        Storing information in DNA molecules can be divided into information encoding, DNA synthesis (writing), DNA sequencing (reading), and information decoding, as shown in Figure 1.

Figure 1  DNA information storage workflow

        First, the information must be transformed into a sequence of the 4 bases found in DNA molecules. In information science, different data types have different coding and compression algorithms; commonly used ones include Huffman coding, arithmetic coding, and dictionary coding. In addition, errors can occur in DNA molecules during synthesis, copying, and sequencing. Physical redundancy and logical redundancy, that is, error-correcting codes, make it possible to restore the original data when the information is distorted. Figure 2 illustrates the principles of direct conversion, linear block codes, fountain codes, and convolutional codes.

 

Figure 2  Information coding methods used in DNA storage research (forward error correction schemes)

(a) Direct conversion, with no error correction scheme. The data are read as a digit stream and then converted into a DNA sequence. For example, Church et al. and Goldman et al. converted each digit of a binary digit stream and of a ternary digit stream, respectively, into one DNA base.

(b) Linear block codes. Redundancy for error correction (called "check symbols" or "parity symbols") is generated from the original information (the information symbols) by linear operations. During decoding, the check matrix corresponding to the generator matrix is used to test whether the received information contains errors and to correct them.

(c) Fountain codes. The original information is converted into a large number of shorter messages. These shorter messages are not fragments of the original message; each is computed by XORing symbols of the original information selected according to a specific distribution. During decoding, the original information can be recovered once enough short messages have been received.

(d) Convolutional codes, that is, coding schemes "with memory". When a symbol is encoded for transmission, the encoder processes not only the current information symbol but also several information symbols that precede the current position.
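To make the fountain code idea in (c) concrete, here is a minimal sketch in Python. It is an illustration under simplifying assumptions, not the scheme of any study cited here: the function names are hypothetical, the subset size is drawn uniformly rather than from the soliton-type distribution that LT codes use, and no DNA-specific constraints are applied.

```python
import random
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_droplet(segments: list[bytes], seed: int) -> tuple[frozenset[int], bytes]:
    """Build one short message: the XOR of a seed-chosen subset of segments.

    The subset size is uniform here for simplicity (an assumption);
    real LT codes draw it from a soliton-type distribution.
    """
    rng = random.Random(seed)
    degree = rng.randint(1, len(segments))
    chosen = frozenset(rng.sample(range(len(segments)), degree))
    payload = reduce(xor_bytes, (segments[i] for i in chosen))
    return chosen, payload


def peel_decode(droplets, n_segments):
    """Peeling decoder: resolve droplets that cover exactly one unknown segment."""
    pending = [[set(idx), payload] for idx, payload in droplets]
    known = {}
    progress = True
    while progress and len(known) < n_segments:
        progress = False
        for droplet in pending:
            idx, payload = droplet
            # Strip segments that are already recovered out of this droplet.
            for i in list(idx):
                if i in known:
                    payload = xor_bytes(payload, known[i])
                    idx.discard(i)
            droplet[1] = payload
            if len(idx) == 1:
                (i,) = idx
                if i not in known:
                    known[i] = payload
                    progress = True
    # Segments that could not be recovered are returned as None.
    return [known.get(i) for i in range(n_segments)]


# Hand-built droplets so that the example is deterministic.
segs = [b"AB", b"CD", b"EF"]
hand_built = [
    ({0, 1}, xor_bytes(segs[0], segs[1])),
    ({1, 2}, xor_bytes(segs[1], segs[2])),
    ({0}, segs[0]),
]
recovered = peel_decode(hand_built, len(segs))
```

With randomly generated droplets the decoder may need more droplets than there are segments; that surplus is the price paid for being able to lose any individual droplet.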

        After encoding comes DNA synthesis, that is, writing. The evolution of three generations of DNA synthesis technology, namely chemical synthesis (solid-phase phosphoramidite chemistry), microarray-based DNA synthesis, and enzymatic synthesis, has greatly reduced the time and cost of DNA synthesis. In addition, advances in gene assembly and editing technology allow genetic information to be changed flexibly and accurately and allow information to be processed and stored in living cells, which provides favorable conditions for the development of DNA information storage.

        Reading the information depends on gene sequencing technology. Since the first generation of DNA sequencing technology (the Sanger method) appeared in 1977, sequencing has made great progress, and its cost has fallen by a factor of 100,000 compared with the original. Sequencing recovers the base sequence, and the coding scheme determines how much of the information can be expected to be recovered. Once the DNA sequence information has been obtained, the base sequence is converted back into a binary sequence, the error correction built into the code is then applied to correct the sequence automatically, and the original digital information is obtained.
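The encode and decode steps described above can be sketched as a round trip between bytes and bases. The example below uses a simple mapping of 2 bits per base (00→A, 01→C, 10→G, 11→T); this mapping is an assumption chosen for illustration, is not the scheme of Church et al. or Goldman et al., and contains no error correction.

```python
BASES = "ACGT"  # assumed mapping: 00->A, 01->C, 10->G, 11->T


def encode(data: bytes) -> str:
    """Map every 2 bits of the input to one base, most significant bits first."""
    return "".join(
        BASES[(byte >> shift) & 0b11]
        for byte in data
        for shift in (6, 4, 2, 0)
    )


def decode(seq: str) -> bytes:
    """Invert encode(): pack each group of four bases back into one byte."""
    if len(seq) % 4:
        raise ValueError("sequence length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(seq), 4):
        byte = 0
        for base in seq[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)


strand = encode(b"hi")   # "CGGACGGC"
restored = decode(strand)
```

A practical encoder would add constraints that this sketch ignores, such as avoiding long runs of the same base, and would wrap the payload in an error-correcting code before synthesis.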

1.2  History of DNA information storage

        Knowledge of the DNA molecule began with the work of Miescher, Kossel, and others in the 1870s, but it was not until 1953, when Watson and Crick published "Molecular Structure of Nucleic Acids" in Nature, that people gained a clear understanding of its molecular structure. In the same period, Avery et al. and Hershey et al. demonstrated that DNA molecules are the carriers of genetic information in organisms. Subsequent studies made it clear that the genetic information of an organism is stored in the linear arrangement of the 4 nucleotides that make up its DNA: the specific order of the four bases contains the genetic information of the organism.

        These results naturally led to conjectures about, and attempts at, storing artificial data in DNA molecules. However, limited by the immature DNA synthesis and sequencing technology of the time, these attempts failed. It was not until 1996 that Davis encoded a black-and-white image of 35 pixels into a DNA molecule, introduced it into E. coli, and read it out successfully. In 2001, Bancroft et al. encoded the two famous opening sentences of A Tale of Two Cities into DNA molecules, using a method similar to the "codons" with which DNA encodes protein sequences. In 2012 and 2013, Science and Nature published the DNA data storage results of Church et al. at Harvard Medical School and Goldman et al. at the European Bioinformatics Institute, respectively. Unlike earlier studies, both groups stored a considerable amount of data: Church et al. stored 659 KB in DNA molecules, and Goldman et al. stored 739 KB. The success of these two studies depended on great advances in DNA synthesis and sequencing technology, which made it possible to synthesize and sequence tens of thousands of DNA molecules.

        After these two studies, new developments in DNA data storage followed in quick succession. In 2015 and 2016, Grass et al. and Blawat et al. introduced the "forward error correcting codes" of information science into DNA data storage, so that information can still be recovered when errors occur during synthesis and sequencing, which improved the reliability of storing data in DNA molecules. In 2016, Bornholt et al. realized "random access" to data in a DNA storage system. In 2017, Erlich et al. introduced "fountain codes" into the DNA coding system, an approach called "DNA fountain", and achieved a high data storage density. In the same year, Shipman et al. encoded a movie into living cells using CRISPR technology. In 2018, Organick et al. stored as much as 200 MB of data in DNA molecules, realized random access in a large-scale system, and tried using single-molecule sequencing (SMS) to read and restore the data.

        In 2020, Erlich, Grass, and colleagues used fountain codes for information storage and proposed the concept of the "DNA-of-things" (DoT), in which anything can store DNA information. The authors converted the design blueprint of a 3D-printed rabbit, the Stanford bunny, into DNA sequences and synthesized them as oligonucleotide fragments. These short fragments were then encapsulated in silica nanoparticles 160 nm in size and mixed with a degradable thermoplastic polyester for 3D printing. Reading and copying the information is also easy: a small piece cut from the rabbit's ear is dissolved to release the DNA, which is then amplified and sequenced, and the recovered information can be used to 3D print the next generation of rabbits. In the end, the researchers faithfully copied and printed five generations of rabbits, demonstrating the stability and fidelity of DNA as an information storage medium. They also stored a 1.4 MB video in the plexiglass of a pair of glasses. In this study they again used "DNA fountain", that is, LT codes, to cope with errors.

        In 2020, Press et al. developed a DNA coding algorithm called "HEDGES" that can handle insertion and deletion (indel) errors arising in DNA synthesis and sequencing. The algorithm uses RS codes and convolutional codes and decodes with a tree structure. Using the HEDGES code, they synthesized 5865 oligonucleotides 300 bp in length; mutation and deletion errors were then artificially introduced into these DNA molecules, which were sequenced on the Illumina platform. The decoding results show that, at the expense of some coding density, HEDGES can handle a total indel error rate of about 1.2%. The algorithm provides a reference for dealing with more complex DNA error types and thus for guaranteeing the robustness of information stored in DNA molecules.

        Compared with traditional information storage on magnetic media (disks), optical media (compact discs), and electronic media (memory, USB flash drives), reading and writing DNA is slow and tedious. Many researchers are working toward fully automated DNA information storage. Microsoft and the University of Washington have built a fully automated DNA storage and reading device based on column synthesis and third-generation sequencing; storing and reading "hello" took 21 h for the whole process. Although there is still a long way to go, automating the storage and reading of information is of great significance for the industrialization of DNA storage.

        As this history shows, researchers keep combining DNA molecular storage with DNA synthesis and sequencing technology, cell and molecular biology, and information science and communication technology. This cross-fertilization opens up more possibilities for the field, keeps raising the storage potential of DNA molecules, and brings DNA data storage ever closer to practical use in production and daily life.

2  Advantages of DNA information storage

2.1  Storage density

        Magnetic storage media store information through the electromagnetic effects of the magnetic medium. Optical storage media record information in grooves on the surface of the disc, which are then read by a laser; the larger the amount of data, the higher the precision required of the laser. The working resolution of the physical device therefore sets the ultimate density of these traditional media. Carbon-based biomolecules store information at the molecular scale and so have a natural advantage over traditional media.

        Ideally, the storage density of DNA molecules can reach about 460 EB/g, which means that, in principle, only a few grams of DNA could store the information the world produces in a year. DNA has a three-dimensional double-helix structure, so its data density per unit volume is very high. Because DNA molecules cannot be packed together infinitely tightly, volumetric density is a better measure of their actual data storage capacity. It is estimated that one cubic centimeter of DNA can store about 1 EB of information, which is 1000 times the density of the densest current medium (flash memory) and a million times the data storage density of a hard disk. Even if practical factors such as encapsulation and redundancy prevent the maximum storage potential from being realized, the usable storage density is still far higher than that of current mainstream data storage media.
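The order of magnitude of the ideal mass density can be reproduced with a back-of-the-envelope calculation. The sketch below assumes single-stranded DNA with an average nucleotide mass of about 330 g/mol and 2 bits per nucleotide; the molar mass is a textbook approximation introduced here, not a figure from the original text.

```python
AVOGADRO = 6.022e23       # molecules per mole
NUCLEOTIDE_MASS = 330.0   # g/mol, assumed average for one single-stranded nucleotide
BITS_PER_NUCLEOTIDE = 2   # four bases -> log2(4) bits

nucleotides_per_gram = AVOGADRO / NUCLEOTIDE_MASS
bytes_per_gram = nucleotides_per_gram * BITS_PER_NUCLEOTIDE / 8
exabytes_per_gram = bytes_per_gram / 1e18  # 1 EB taken as 10^18 B

print(f"ideal density: about {exabytes_per_gram:.0f} EB/g")
```

The result lands in the mid-400s of EB/g, consistent with the roughly 460 EB/g quoted above. Real systems fall well short of this bound because of redundancy, addressing sequences, and the many physical copies of each strand.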

        Natural DNA contains four bases, so each base can store at most 2 bit of information. However, some research tries to expand the base system, that is, to store information with "artificial bases" or "unnatural bases" in addition to the four natural bases, and so raise the information storage density of DNA molecules. Work on unnatural bases began in the 1980s and has made great breakthroughs in recent years; an 8-base system has now been realized.

        Besides adding unnatural bases, some studies use "degenerate bases" to expand the storage density of DNA molecules. In 2019, several independent studies successfully used degenerate bases for data storage and improved the storage density. Specifically, degenerate bases make the sequence space at each position of a DNA sequence continuous by representing the position as a mixture of the four bases. For example, Anavy et al. defined two new base symbols in their study: M, an equal mixture of A and T, and K, an equal mixture of G and T. With these two symbols added, each position of the DNA molecule can take one of 6 "bases" and can therefore hold 2.58 bit of information. This base system can be extended with more "degenerate base" symbols to raise the storage potential of DNA molecules further. In the work of Anavy et al., a larger base space was used to store a smaller amount of information (22.5 B), and a storage density of 4.29 bit per synthesis cycle was achieved. Choi et al. used a system of 15 "bases" to store 854 B of information and achieved a storage density of 3.37 bit per base.
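The per-position capacities quoted above follow from the size of the alphabet: a position that can take one of n distinguishable symbols carries at most log2(n) bits. A minimal sketch follows; the function name is illustrative.

```python
import math


def bits_per_position(alphabet_size: int) -> float:
    """Upper bound on the information carried by one sequence position."""
    return math.log2(alphabet_size)


# 4 natural bases, 6 symbols (4 bases plus M and K), 8-base system, 15-symbol system.
capacities = {n: bits_per_position(n) for n in (4, 6, 8, 15)}
for n, bits in capacities.items():
    print(f"{n:2d} symbols -> {bits:.2f} bit per position")
```

These values are upper bounds. A reported density such as 3.37 bit per base for a 15-symbol system sits below log2(15) because part of the capacity is spent on redundancy and other overhead.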

        Besides DNA, other carbon-based storage media also show information storage capability. Professor Tao Hu of the Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences, invented a biological memory based on silk protein that can store 64 GB of data per square inch (1 square inch = 6.4516×10^−4 m^2) and can be rewritten repeatedly. Like DNA, silk protein can withstand harsh conditions such as abnormal humidity, radiation, and magnetic fields. Silk protein can also be used to preserve biological samples such as the DNA of organisms, and in the future it may be combined with DNA media for digital storage. Although its storage density is still limited by the resolution of the optical writing device, this work shows that the academic community recognizes carbon-based media as a means of information storage. Metabolite molecules (sugars, amino acids, and so on) are smaller still and can also be used to store information. Inspired by DNA storage, Kennedy et al. at Brown University used droplets of metabolite mixtures spotted in an array on a metal plate to store pictures and other information. Similar to the idea of degenerate bases, they measured the distribution of metabolite components to encode information in a higher-dimensional space.

        Although carbon-based storage, and DNA in particular, has a great advantage in density, the dilute solution conditions and the molecular diffusion rates required for random access limit the amount of information that a 1 L DNA storage pool can hold to the TB to ZB range. An interesting concept in this context is "Storage-on-Chip". The design of storage hardware architecture has to adapt to these practical considerations, and large-scale data storage cannot be separated from innovation in storage systems.

2.2  Data maintenance

        Traditional data storage media always suffer from spontaneous degradation, which damages or destroys information. Hard disks and flash memory retain information for no more than ten years. Maintaining large amounts of data in traditional storage media is extremely expensive. For example, a data center that stores 10^9 GB of data on tape costs up to a billion dollars and more than a decade to build and maintain, and it consumes hundreds of millions of kilowatt-hours of electricity.

        DNA molecules, by contrast, are extremely stable under appropriate conditions, which keeps the information stored in them from being damaged. Fossils provide strong evidence of the data retention capacity of DNA: DNA molecules can sometimes be obtained from fossils hundreds of thousands of years old and their sequence information read. If DNA molecules are kept in a suitable environment, the sequence can last even longer. For example, Grass et al. encapsulated DNA molecules in silica and showed better retention than for pure solid-state DNA powder and other storage media. They determined the first-order kinetic activation energy for the degradation of DNA encapsulated in silica spheres and inferred that, under the same conditions, it can be kept for 2000 years at 9.4 °C or for 2 million years at −18 °C.
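The two lifetimes quoted above can be related through the Arrhenius equation. The sketch below is a back-of-the-envelope check, not the fitting procedure used by Grass et al.: it assumes that the lifetime scales as exp(Ea/(R·T)) with a single activation energy Ea, and it solves for the Ea that the two quoted data points imply.

```python
import math

R = 8.314  # gas constant, J/(mol*K)


def implied_activation_energy(t1_c, life1_years, t2_c, life2_years):
    """Activation energy (J/mol) implied by two (temperature, lifetime) points.

    Assumes lifetime ~ exp(Ea / (R * T)), i.e. simple Arrhenius behaviour.
    """
    t1_k = t1_c + 273.15
    t2_k = t2_c + 273.15
    return R * math.log(life2_years / life1_years) / (1 / t2_k - 1 / t1_k)


ea = implied_activation_energy(9.4, 2_000, -18.0, 2_000_000)
print(f"implied activation energy: about {ea / 1000:.0f} kJ/mol")
```

Under this assumption, cooling from 9.4 °C to −18 °C lengthens the expected lifetime a thousandfold, which corresponds to an activation energy on the order of 150 kJ/mol.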

        At the same time, storing data in DNA molecules requires very little maintenance compared with traditional media. Storing 10^9 GB of data in DNA molecules consumes less than 0.1 W of power. Such low maintenance costs make DNA molecules especially suitable for large-scale "cold data" that is accessed infrequently.

2.3  Potential of in vivo information storage

        So far, most attempts at DNA storage have been made in vitro, for example in DNA oligonucleotide pools (oligo pools), or with DNA fragments physically encapsulated to further enhance storage stability (Figure 3). At the current level of technology, in vitro storage has advantages in storage cost (short fragments are stored, with no need to ligate them into longer fragments or to introduce them into plasmids or genomes), DNA writing (writing into the DNA of living cells must avoid functional genes and their related sequences), DNA reading (second-generation sequencing technology is relatively mature), and stability (the DNA of living cells mutates).

Figure 3  Carriers for DNA information storage

        Even so, more and more scientists are turning their attention to in vivo DNA storage. Because of its durability and biocompatibility, the genomic DNA of living cells has become another potential form of information storage. Compared with in vitro DNA storage, in vivo storage takes advantage of the cell's own mechanisms for DNA replication and proofreading, and it also provides a practical means of microscale random data access. In addition, microorganisms from extreme environments offer considerable room for improvement in the energy consumption of information storage.

        For in vivo DNA storage, researchers first turned to plasmids (Figure 3), because they are easy to handle and simpler to edit and write. Plasmid DNA storage can be traced back to 1996, when Davis stored the "Microvenus" image in an E. coli plasmid. Since then, many researchers have stored text, music, and picture information on plasmids.

        However, the problems of storage capacity and genetic stability limit the application of plasmids as information storage vectors , As an alternative, genome has become a new way of in vivo storage .2010 In a landmark study of synthetic biology ,Venter The team synthesized the entire Mycoplasma genome through chemical synthesis , It is proved that it has biological activity and replication ability . Besides , They added a lot to the synthetic genome “ Watermark information ”, Including the author's name 、 Institute Information and poetry, etc . This is also the first attempt to store information in the genome .2017 year ,Shipman Etc CRISPR Technology will “ A running horse ” Five frames of video are stored in the genome of the colony cells , E. coli was used to replicate the data , It is proved that video can be preserved stably in generations .

        Building on the fidelity and heritability of information stored in DNA in vivo, researchers are trying to use DNA sequence information as labels to track results, information flow, and even logistics; this family of techniques is collectively called "DNA barcoding". Professor Springer in the United States proposed the "BMS" technology, in which designed combinations of DNA barcodes are integrated into the genomes of Bacillus subtilis and Saccharomyces cerevisiae spores. The spores are sprayed onto objects and transferred to whatever comes into contact with them, which enables trace tracking. The DNA barcodes can be identified with SHERLOCK, RPA, Cas13a, and sequencing, so that the origin of food and other goods can be traced. Barcodes can also be combined with CRISPR technology to track lineages and study dynamic processes such as tumor growth and cancer evolution. These proof-of-concept demonstrations point to possible interfaces between in vivo DNA storage and emerging biotechnologies such as cell sensing and cell processors. Beyond the nanoscale Internet of Things and disease detection, DNA that is stored without intervention cannot be changed or erased at will, which makes it naturally suited to building tamper-proof, forgery-proof, and traceable "blockchain" data structures. From the practical standpoint of operating on information, however, a storage system that cannot be erased is severely limited in its applications. Attempts to add a data erasure function to DNA storage systems are summarized in the following section.

        Although in vivo DNA storage has so far taken the form of short fragments, recent advances in synthetic biology, such as yeast artificial chromosomes and the manipulation of large genome fragments, can be applied to DNA storage. In vivo storage in long DNA fragments is suited to third-generation single-molecule sequencing, which may make it possible to read DNA information in real time.

3  Challenges of DNA data storage

3.1  Data security

        Data security is an important issue in information storage and transmission; it covers the integrity, reliability, and confidentiality of information. Although information stored in DNA molecules is dynamically stable, operations such as erasure and anti-counterfeiting are limited by the accuracy of biochemical reactions and cannot be made 100% certain. This cuts both ways for specific applications, and it will drive iterative progress in the related technologies for some time to come.

        At present, the development and application of synthetic biology and gene editing technology have made it possible to rewrite DNA molecules. This helps DNA storage move toward broader application scenarios, but it also places higher demands on data security. In intracellular DNA storage systems, tool enzymes can be used to erase and rewrite information; for example, site-specific recombinases recognize specific DNA sites and then invert, insert, or excise the DNA segment between them. In in vitro DNA storage systems, information can also be "erased" through carefully designed biochemical reactions. In 2020, the research groups of Baym and Zhang encoded true and false information together in a DNA solution and distinguished the two with designed marker strands that hybridize to the information strands: true information hybridizes with "truth marker" oligonucleotides, whereas the marker strands on false information block DNA strand extension and amplification, which ensures that only the true information is read. Exploiting the temperature sensitivity of hybridized DNA, the authors found that at 25 °C the information could still be read stably after 65 days of storage, and they estimated a half-life of more than 15 years at 25 °C, long enough for stable long-term storage. At 95 °C, however, the hybridized DNA dissociates quickly, and heating for only 5 min erases the information permanently. Research on erasing information stored in DNA is still limited by the available means of manipulation and has not gone very deep, but as the technology develops, more general rewriting tools for the various types of storage systems may appear.

        In addition, the principles of encryption and coding from information science also apply to DNA storage. Grass et al. generated a strong 80 bit key from human DNA, used it to encrypt 17 KB of data stored in DNA molecules, and successfully read and restored the original information. DNA origami also has the potential to encrypt three-dimensional information. The Zuo Xiaolei research group at Shanghai Jiao Tong University and the Fan Chunhai research group at the Shanghai Institute of Applied Physics, Chinese Academy of Sciences, have each used the precise positioning and assembly capability of DNA origami to make preliminary attempts at storage. In the future, the diversity of DNA origami patterns may be used in information security applications such as encryption.

3.2  Read/write speed and cost

        With the rapid development of DNA synthesis technology, the cost of synthesizing DNA molecules continues to fall. However, storing a large amount of information requires synthesizing a huge number of DNA molecules, and synthesis becomes the main expense of DNA information storage. At present, array-based (high-throughput) DNA synthesis costs about 0.0001 dollars per base. If each base stores 1 bit of information, storing 1 TB of information costs at least 800 million dollars. By comparison, storing the same amount of data on tape costs only 16 dollars. Clearly, the high cost of DNA synthesis undermines the competitiveness of DNA molecules against traditional storage media and keeps DNA data storage from entering large-scale practical use.
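The cost figure above follows directly from the numbers in the paragraph. A minimal sketch of the arithmetic, taking 1 TB as 10^12 bytes (a decimal-unit assumption):

```python
COST_PER_BASE_USD = 0.0001   # array-based synthesis, from the text
BITS_PER_BASE = 1            # storage assumption stated in the text
TAPE_COST_USD = 16           # tape cost for the same volume, from the text

bits_in_1_tb = 8 * 10**12    # 1 TB taken as 10^12 bytes
bases_needed = bits_in_1_tb / BITS_PER_BASE
synthesis_cost_usd = bases_needed * COST_PER_BASE_USD
cost_ratio = synthesis_cost_usd / TAPE_COST_USD

print(f"synthesis cost for 1 TB: about {synthesis_cost_usd:.1e} USD")
print(f"ratio to tape: about {cost_ratio:.1e}")
```

At these prices DNA synthesis is tens of millions of times more expensive than tape per stored byte. Even a denser code storing 2 bit per base would only halve the figure.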

        Microarray-based DNA synthesis is more efficient, faster, and more cost-effective, and its synthesis rate can reach several thousand bases per second. Third-generation DNA synthesis technology is based on enzymatic synthesis; although still at an early stage of development, it is expected to greatly reduce the time and cost of DNA synthesis. Lee et al. estimated that enzymatic synthesis takes 40 s per cycle, which is 6 times the rate of chemical synthesis. The phosphoramidite reagent used in chemical synthesis costs 0.626 dollars per cycle, whereas enzymatic synthesis is more than 1000 times cheaper per cycle. Once the enzymatic reaction system is miniaturized, the cost is expected to fall by several more orders of magnitude.

        Since the first generation of DNA sequencing technology (the Sanger method) appeared in 1977, sequencing has made great progress, and its cost has fallen by a factor of 100,000 compared with the original. The mainstream form of DNA storage today is short-fragment storage (oligo pools), for which second-generation sequencing is the most suitable reading method. The core idea of second-generation sequencing is massively parallel sequencing: hundreds of thousands to millions of DNA molecules can be sequenced at once, which is enough for the current scale of DNA storage. As the amount of information grows, however, the running speed of second-generation sequencing (a round takes several days, including library construction and reading) can only barely meet the needs of reading cold data.

        The Heliscope single-molecule sequencer from Helicos, the SMRT single-molecule sequencing technology from Pacific Biosciences, and the nanopore single-molecule technology and single-cell genome sequencing technology from Oxford Nanopore Technologies are collectively called third-generation sequencing technology, also known as "single-molecule sequencing technology". In DNA information storage applications, third-generation sequencing is of great help in scaling up data storage and in achieving real-time reading. Besides removing the dependence on PCR amplification, it markedly increases read length and improves read speed, so it has greater advantages for data stored in long fragments and has broad application prospects. Nanopore single-molecule technology has a higher error rate than other biochemical sequencing platforms, but it offers unique advantages and development potential in sequencing throughput, read length, and portability. For example, in the third-generation sequencing products developed by Oxford Nanopore Technologies, DNA passes through the pore at an average of 450 bp/s; the pocket-sized portable sequencer MinION has as many as 512 nanopore channels sequencing simultaneously; and the high-throughput benchtop product PromethION 48 has a data throughput on the order of 7.6 TB (72 h), equivalent to a data read rate of 29 MB/s.
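The throughput figures above can be converted into read rates with simple arithmetic. The sketch below uses decimal units; the MinION estimate additionally assumes that all 512 channels run continuously at 450 bp/s and that each base carries 2 bit of stored information. Both are idealizations introduced here, not claims from the original text.

```python
# PromethION 48: 7.6 TB of data in a 72 h run (figures from the text).
PROMETHION_BYTES = 7.6e12
RUN_SECONDS = 72 * 3600
promethion_mb_per_s = PROMETHION_BYTES / RUN_SECONDS / 1e6

# MinION: 512 channels at 450 bp/s each (figures from the text).
# Assumptions: every channel is always busy, and each base stores 2 bit.
MINION_CHANNELS = 512
BASES_PER_SECOND_PER_PORE = 450
ASSUMED_BITS_PER_BASE = 2
minion_bases_per_s = MINION_CHANNELS * BASES_PER_SECOND_PER_PORE
minion_kb_per_s = minion_bases_per_s * ASSUMED_BITS_PER_BASE / 8 / 1e3

print(f"PromethION 48: about {promethion_mb_per_s:.0f} MB/s of sequencing data")
print(f"MinION upper bound: about {minion_kb_per_s:.1f} kB/s of stored information")
```

The two numbers measure different things: the first is raw sequencer output, while the second is the stored information recoverable per second under the stated assumptions, before any allowance for sequencing coverage or coding overhead.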

        As the technology evolves and algorithms are upgraded, third-generation sequencing may be used to read information stored in long DNA fragments maintained stably in vivo or in vitro, at speeds comparable to the current read speeds of traditional media (KB/s to GB/s). Some DNA storage studies have already tried third-generation sequencing for data readout.

4  Summary and Outlook

        Because of its universality, durability, and compatibility with biological functions, DNA has become an ideal medium for storing artificial information. From practical standpoints such as data stability, transmission, modification, maintenance, and preservation, it has unique advantages, and it may replace traditional storage media in specific areas such as archival storage.

        In terms of storage form, in vitro storage is still the most common. It stores information in a pool of short fragments (oligo pool) and is read mainly by second-generation sequencing. The core idea of second-generation sequencing is massively parallel sequencing: it can sequence hundreds of thousands to millions of DNA molecules in parallel in a single run, and its reads are generally short, so it suits information stored in short fragments in vitro. As the volume of information grows, however, second-generation sequencing can no longer keep up. Third-generation sequencing has a higher error rate, but it has great potential for larger data volumes and real-time reading. Besides removing the dependence on PCR amplification, it significantly increases read length and read speed, which gives it greater advantages for long-fragment data storage and broad application prospects in DNA information storage.

        Even so, some problems still hinder the use and adoption of DNA storage. The first is the high cost of writing and reading. As DNA synthesis and sequencing technologies improve, however, their cost and accuracy are expected to be optimized further, making them better suited to DNA storage. Conversely, the rapid development of DNA storage will also drive a second leap in synthesis and sequencing technology.

        Second, in information encoding and hardware systems, DNA storage will also give momentum to continued technical development. The joint development of coding algorithms and DNA biochemical reaction systems will mainly tackle key issues such as random access, erasing and rewriting, and information encryption. Take random access: efficiently reading a specified file from a storage pool is a challenge. Researchers currently address it by adding specific tags at specific positions or by optimizing retrieval algorithms. For erasing and rewriting, new tools and technologies will make it possible to rewrite information. In particular, recent advances in synthetic biology and genome editing have shown that genetic or artificial information in living cells can be changed flexibly and precisely. Natural and engineered DNA-targeting and DNA-modifying enzymes, including multifunctional variants of recombinases and reverse transcriptases, can serve as writing modules in a DNA storage system. A variety of encoding schemes, together with methods that encrypt information using the three-dimensional structure of DNA, can help secure information stored in DNA. These studies are expected to free DNA storage from the niche of cold-data archival storage and extend it to broader areas of data operation, such as dynamic data storage, new forms of encryption, and blockchain.
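        The tag-based approach to random access can be illustrated with a toy model. In the sketch below, every oligo begins with a file-specific tag, and reading a file means selecting only the oligos that carry that tag, which stands in for PCR amplification with file-specific primers. The tag sequences, the 2-bits-per-base mapping, the one-byte chunk index, and all function names are assumptions made for illustration. Practical systems also need error correction and constraints on GC content and homopolymer runs, which are ignored here.

```python
import random

# Toy model of tag-based random access in an oligo pool.
# Tags, layout, and the 2-bits-per-base mapping are illustrative assumptions.

BASES = "ACGT"  # 00 -> A, 01 -> C, 10 -> G, 11 -> T


def bytes_to_dna(data: bytes) -> str:
    """Encode each byte as four bases, most significant bit pair first."""
    return "".join(BASES[(byte >> shift) & 0b11]
                   for byte in data for shift in (6, 4, 2, 0))


def dna_to_bytes(seq: str) -> bytes:
    """Inverse of bytes_to_dna; len(seq) must be a multiple of 4."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        value = 0
        for base in seq[i:i + 4]:
            value = (value << 2) | BASES.index(base)
        out.append(value)
    return bytes(out)


def write_file(pool: list, tag: str, data: bytes, chunk_size: int = 4) -> None:
    """Append oligos laid out as: file tag + 1-byte chunk index + payload.

    The 1-byte index limits this toy layout to 256 chunks per file.
    """
    for index, start in enumerate(range(0, len(data), chunk_size)):
        chunk = data[start:start + chunk_size]
        pool.append(tag + bytes_to_dna(bytes([index])) + bytes_to_dna(chunk))


def read_file(pool: list, tag: str) -> bytes:
    """Select oligos by tag, then reorder the chunks by their index."""
    selected = [oligo[len(tag):] for oligo in pool if oligo.startswith(tag)]
    chunks = sorted((dna_to_bytes(o[:4])[0], dna_to_bytes(o[4:]))
                    for o in selected)
    return b"".join(payload for _, payload in chunks)


# Hypothetical tags of equal length, so neither is a prefix of the other.
TAG_A = "ACGTACGTAC"
TAG_B = "TGCATGCATG"

pool = []
write_file(pool, TAG_A, b"hello DNA storage")
write_file(pool, TAG_B, b"second file")
random.Random(0).shuffle(pool)  # a real pool has no physical ordering

print(read_file(pool, TAG_A))
print(read_file(pool, TAG_B))
```

        Because the pool is unordered, each oligo must carry its own address. The tag selects the file and the index restores chunk order, at the cost of bases that hold no payload.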

        Finally, by combining living-cell DNA storage with advanced cellular microprocessor technology, data storage and decision-making can be integrated at a small scale, bringing the “storage” and “computation” of data together and moving them to the edge. Realizing this vision will depend on major breakthroughs in DNA storage technology and cellular computing. In the coming era of ultra-large-scale data, living-cell DNA storage may find wide application centered on medicine and health, and it has the potential to become a disruptive technology.

  Source: official account “Strategic Frontier Technology”

Copyright notice
This article was written by [YoungerChina]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/163/202206120921251608.html