The Role of DNA Data Storage in Health and Technology

With the evolution of science and technology, we live in a world where almost everything is digitized, even human DNA. As efforts around DNA data storage grow, it is impossible not to wonder what role it could play in health, science, and technology itself. This post explores DNA data storage, its uses, the problems it could solve, and the concerns it raises.

A brief history of DNA data storage

DNA data storage is the process of encoding binary data into synthesized DNA strands and decoding it back out again. DNA molecules carry the genetic blueprints for living cells and organisms. Although DNA data storage has become a hot topic recently, it is not a modern-day idea. Its origins date back to 1964-65, when Mikhail Neiman, a Soviet physicist, published his work in the journal Radiotehnika. Neiman discussed the general possibility of recording, storing, and retrieving information on DNA molecules. He credited the idea to a 1964 interview with Norbert Wiener, the American mathematician, philosopher, and cybernetics pioneer.

One of the earliest examples of DNA storage was a device created at the University of Arizona in 2007. The device used addressing molecules to encode mismatch sites within a DNA sequence. Performing a restriction digest allowed the mismatches to be read out, recovering the data.

In August 2012, the journal Science published research by George Church, an American geneticist, molecular engineer, and chemist, and a team of scientists from Harvard University. The team encoded DNA with digital information that included an HTML draft of a 53,400-word book written by the lead researcher, along with one JavaScript program and 11 JPG images. Encoding everything as binary data preserved the formatting of the book as well as the images and text. Although the total amount of data was only about what a 5 ¼-inch floppy disk could hold, the storage density reached 5.5 petabits (about 5.5 million gigabits) per cubic millimeter.

In January 2013, the journal Nature published a study in which scientists encoded computer files totaling 739 kilobytes of hard-disk storage, with an estimated Shannon information of 5.2×10⁶ bits, into DNA code. They then synthesized and sequenced the DNA and reconstructed the original files with 100% accuracy. Shannon information is a measure of information content from the mathematical theory of communication developed by mathematician Claude Shannon.

Any history of DNA data storage should also mention a February 2015 report from Switzerland, in which scientists demonstrated a way to keep DNA-encoded data stable over the long term, and a March 2017 study that introduced a method known as DNA Fountain, which achieved perfect retrieval of the stored information at a density of 215 petabytes per gram of DNA.

How does it work?

DNA data storage is a complicated process, and scientists are still working on methods that would make it easier, more stable, and more convenient. Put simply, to encode a binary digital file into a DNA sequence, its individual bits (1s and 0s) must be converted into the letters A, C, G, and T. Each letter stands for one of the four nucleotide bases in DNA:

Adenine
Cytosine
Guanine
Thymine

To recover the data, the DNA is sequenced and the resulting string of A, C, G, and T is decoded back into the original binary sequence of 1s and 0s.
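To make the bit-to-base conversion concrete, here is a minimal Python sketch that assumes the simplest possible scheme: each pair of bits maps to one base (00→A, 01→C, 10→G, 11→T). This mapping and the function names `bytes_to_dna` and `dna_to_bytes` are illustrative assumptions, not the encoding used in any of the studies mentioned above; practical systems also add error correction and avoid sequences that are hard to synthesize or sequence, such as long runs of the same base.

```python
# Hypothetical 2-bit mapping for illustration only; real DNA storage systems
# use more elaborate codes with error correction and sequence constraints.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def bytes_to_dna(data: bytes) -> str:
    """Encode bytes as a DNA string, 2 bits per base (4 bases per byte)."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def dna_to_bytes(dna: str) -> bytes:
    """Decode a DNA string produced by bytes_to_dna back into the original bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in dna)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


if __name__ == "__main__":
    message = b"Hi"
    strand = bytes_to_dna(message)
    print(strand)  # CAGACGGC
    assert dna_to_bytes(strand) == message  # the round trip restores the data
```

Because each base carries two bits, one byte becomes four bases. Decoding simply reverses the lookup, so the round trip from bytes to bases and back returns the original data exactly, as long as no bases are lost or misread along the way.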

Implications of DNA data storage

The implications of DNA data storage are numerous and go beyond encoding a photo or book into a DNA sequence. Building on these breakthroughs at the intersection of biology and technology, Microsoft is developing an apparatus that uses biology to replace tape drives. The tech giant's primary goal is an operational DNA-based storage system working inside a data center, and Microsoft plans to accomplish this within a decade.

Why would tech companies want to store data in DNA? Encoding data in DNA sequences would ease the problem of ever-growing physical storage and the constant need for more data centers. For example, encoded in DNA, every movie ever made would fit into a volume smaller than a sugar cube. Microsoft has already made progress in this field: in 2016, it stored 200 megabytes of data, including a music video, in DNA strands.

With DNA storage, all the data in the world could fit into one room. Imagine that! Big tech companies would no longer need to build gigantic data centers that consume large amounts of money and energy, and the latter is particularly important for the environment. DNA data storage could also give future generations an easy way to learn about our world. DNA is ultracompact and can last for hundreds of thousands of years when kept in a cool, dry place. What's more, human society will be able to decode it for as long as we can read and write DNA. Unlike many other storage media, DNA degrades very slowly under the right conditions, and its format won't become obsolete. Preserving information for future generations could become simpler and more efficient with DNA storage.

While DNA data storage can be practical for tech companies, its implications also extend to health and medicine. Dubai, for example, was recently reported to be planning to sequence the DNA of its entire population. Why? The goal of sequencing and storing this data is to prevent, mitigate, and eradicate diseases in the future. The genome project aims to improve residents' health and, with the help of artificial intelligence, to issue reports that support research, forecast future epidemics and disorders, and plan preventive measures.

Combining artificial intelligence (AI) with DNA data would allow scientists to build a map that decodes the functions of our genes and predicts the effects of gene mutations on the overall genome. This would also enable early detection of some health conditions and, thus, make treatments more effective. Of course, more research and experimentation are needed to make this happen.

There are still some concerns

Although DNA data storage and its combination with AI seem incredibly practical for health, medicine, and technology, there are some concerns about the whole process. The biggest concern, as in many endeavors, is money. Converting digital bits into DNA code is expensive because manufacturing DNA strands requires a chemical synthesis process. Microsoft's 2016 project used 13,448,372 unique pieces of DNA, and the cost of the process was estimated at $800,000. The question is whether Microsoft and other companies will be able to bring that cost down. In fact, Microsoft says the cost of the process needs to fall to $10,000 or lower before DNA storage can be adopted as standard practice.

The second concern is the rate at which data is written to DNA. In Microsoft's experiment, data was moved into DNA at 400 bytes per second, but that rate needs to rise to 100 megabytes per second for the technology to be widely adopted. In short, the process is too slow, and it must be sped up before DNA data storage and sequencing can become standard.
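To put the speed gap in perspective, the short Python calculation below uses only the two rates quoted above. It assumes decimal megabytes (1 MB = 10^6 bytes) and, purely for illustration, a single constant write rate for a 200 MB payload, roughly the size of Microsoft's 2016 archive; the constant and function names are hypothetical.

```python
# Rates quoted in the text; decimal megabytes assumed (1 MB = 10**6 bytes).
CURRENT_RATE = 400          # bytes per second, Microsoft's 2016 experiment
TARGET_RATE = 100 * 10**6   # bytes per second, the stated adoption threshold
PAYLOAD = 200 * 10**6       # bytes, roughly the 200 MB stored in 2016
SECONDS_PER_DAY = 86_400


def write_time_seconds(num_bytes: int, rate: float) -> float:
    """Time to write num_bytes at a constant rate (an illustrative simplification)."""
    return num_bytes / rate


speedup = TARGET_RATE / CURRENT_RATE
days_at_current = write_time_seconds(PAYLOAD, CURRENT_RATE) / SECONDS_PER_DAY
seconds_at_target = write_time_seconds(PAYLOAD, TARGET_RATE)

print(f"Required speed-up: {speedup:,.0f}x")                      # 250,000x
print(f"200 MB at 400 B/s: {days_at_current:.1f} days")           # 5.8 days
print(f"200 MB at 100 MB/s: {seconds_at_target:.0f} seconds")     # 2 seconds
```

Under these assumptions, writing 200 MB at the 2016 rate would take almost six days, while the target rate would do it in about two seconds, a 250,000-fold difference.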

While speed and cost seem like big problems, security concerns are even bigger. As interest in storing data in DNA grows, it is important to address the threats and security risks that could arise. For example, in 2010 a piece of malicious software called Stuxnet caused equipment at a nuclear facility in Iran to malfunction, leading to numerous failures. Unlike typical computer viruses, it targeted not only computers but also the physical equipment they controlled. This raises questions about DNA data storage too. How? Encoding and decoding DNA sequences requires computers, so similar malware could harm not only a computer but also the DNA storage system it controls. There are also concerns that creating a dangerous human pathogen might one day require little more than a good internet connection. Could this lead to bioterrorism, or to viruses that cause genetic mutations and spread illness? For now, many of these questions remain unanswered.

Conclusion

At this point, DNA data storage is still experimental, but it shows promising results. Microsoft plans to store data in DNA within ten years. The practice would make data storage easier for big tech companies and could also contribute to health, medicine, and science. Alongside its many advantages, there are setbacks such as high cost and slow write speeds, and encoding data in DNA also raises security concerns. More experiments are needed to answer these questions, and with growing interest in the topic, we probably won't have to wait long to find out.

Helen Santoro

Helen Santoro is a Boston-based science writer who has worked in science for over six years. Before moving to the bustling city, Helen attended Hamilton College, where she received a Bachelor of Arts in Neuroscience. She then worked as a researcher at Boston Children's Hospital, where she helped uncover the mechanisms behind acute and chronic pain conditions, with the long-term goal of improving patient care. Her writing has spanned topics from genome engineering to biochemistry to gender issues in STEMM fields.