World DNA

Noah, a six-year-old boy from Canada, has a disease that has no name. On MRI scans, doctors can see that part of his brain, the cerebellum, is shrinking. They suspect that somewhere among the millions of words written in the letters of Noah's genetic code there is a typo. So they send the boy's DNA out into the world via the Internet, hoping to find the same mistake in someone else.

A defect can be identified if the same error is found elsewhere, and network tools make that search possible. For this reason, developers from Toronto began testing a system for exchanging genetic information between healthcare institutions in early 2016. The network currently includes hospitals from Canada, the US and the UK. The purpose of the MatchMaker Exchange system is to automate and globalize DNA comparison procedures. The goal of the computer scientists working on the project is to bring gene sequencing methods closer to modern telecommunication technologies. There are already about 200,000 people in the world whose genomes have been sequenced. Soon their number may reach millions.

David Haussler, a co-creator of the Canadian MatchMaker system and a bioinformatics scientist at the University of California, Santa Cruz, co-founded the Global Alliance for Genomics and Health (GA4GH) with a group of others in 2013. He often compares it to W3C, the Internet standards organization. Many well-known figures and entire companies, such as Google, have joined the new organization, the seed of "World DNA". GA4GH is committed to improving protocols and developing programming interfaces (APIs) and file formats for transferring genetic data over the network.

One of the arguments in favor of creating such a "genetic" Internet is the rapidly growing volume of data generated in laboratories. The largest and most efficient centers sequence human genomes at a rate of two per hour (sequencing the first human genome took thirteen years). It is estimated that 85 petabytes of data will be produced worldwide this year. In 2019, there should be twice as much. Unless a global network with search capability is created, all of this will sit in isolated, hard-to-reach databases. Under such conditions it is impossible, for example, to examine all similar mutations that lead to a particular type of cancer and compare them against the drugs and therapies used. The ability to make such comparisons in a global database would be a great tool for clinicians.

So Haussler created a genetic search engine called Beacon, which searched twenty public DNA databases and implemented the GA4GH protocols. A user can ask questions about the genetic "letters" found at given positions on individual chromosomes of the genomes in a database. Although the importance of widespread access to sequenced DNA for the progress of medicine is recognized, there is considerable resistance in society, as well as among doctors and researchers, to exchanging such data. The idea of placing human genomes on the Internet seems controversial to many. To prevent privacy violations, GA4GH proposes a peer-to-peer Internet model.
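The kind of question Beacon is described as answering, whether a given letter occurs at a given position on a chromosome, can be illustrated with a small sketch. This is not the GA4GH API: the `ToyBeacon` class, the `query_all` function, the database names and the positions are all hypothetical, and the sketch assumes each database replies only with yes or no rather than returning any genome.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    chromosome: str
    position: int  # position on the chromosome (numbering is illustrative)
    allele: str    # observed letter: A, C, G or T


class ToyBeacon:
    """Hypothetical in-memory stand-in for one DNA database."""

    def __init__(self, name, variants):
        self.name = name
        self._variants = set(variants)

    def has_allele(self, chromosome, position, allele):
        # Assumption: the database answers only yes or no and
        # never hands the underlying genome to the caller.
        return Variant(chromosome, position, allele.upper()) in self._variants


def query_all(beacons, chromosome, position, allele):
    """Ask every database the same question and collect the answers."""
    return {b.name: b.has_allele(chromosome, position, allele) for b in beacons}


# Invented example data: two databases, one variant each.
beacons = [
    ToyBeacon("hospital_a", [Variant("1", 1000, "T")]),
    ToyBeacon("hospital_b", [Variant("7", 2500, "G")]),
]
answers = query_all(beacons, "1", 1000, "T")
```

Keeping each genome inside its own database and exchanging only answers is one way to read the privacy concern raised above. The article does not describe how the real protocol achieves this.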

Data in eternal chains

On the one hand, we are striving to create an Internet of DNA data. On the other hand, DNA itself is becoming an interesting alternative medium for recording computer data. A few months ago, a group of scientists from the Swiss Federal Institute of Technology in Zurich presented a technique for encoding data in DNA strands in such a way that it could be stored without damage or errors for up to two thousand years. No other known data recording technology can match this durability. Of course, an observant reader will immediately ask how longevity measured in millennia could be demonstrated in a single presentation. The answer is that the Swiss team simulated such a long period by encapsulating the DNA strands in silica spheres and heating them to about 72°C. According to the scientists, a week at this temperature is equivalent to 2,000 years at 10°C. After this simulation, no recording errors were found.

The researchers also highlight other advantages of the DNA helix as a storage medium compared to hard drives or magnetic tapes. By comparison, a book-sized five-terabyte disk can hold its data for at most fifty years, even under optimal conditions. A record in DNA is not binary but is based on the four nucleotide letters A, C, T and G. Reporting on the Swiss achievement, New Scientist gave the following calculation: one gram of DNA can encode 455 exabytes of information, and according to calculations by the company EMC, the total amount of data collected on Earth in 2011 was 1.8 zettabytes. One zettabyte equals 1,000 exabytes, so about 4 grams of DNA would be needed to record the data for 2011. Of course, the volume of global information has grown somewhat since 2011, so perhaps three more grams should be added.
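Both the four-letter recording and the gram estimate above can be checked with a short sketch. The mapping of two bits to one nucleotide is an assumption made for illustration only. The article does not describe the scheme the Zurich team used, and a practical scheme would add error correction. The function and constant names are hypothetical.

```python
# Assumed mapping: two bits per nucleotide (not the researchers' scheme).
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Turn bytes into a string of nucleotide letters."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Turn a string of nucleotide letters back into bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


# Figures quoted in the article.
EXABYTES_PER_GRAM = 455
EXABYTES_PER_ZETTABYTE = 1000


def grams_needed(zettabytes: float) -> float:
    """Grams of DNA needed to hold the given volume of data."""
    return zettabytes * EXABYTES_PER_ZETTABYTE / EXABYTES_PER_GRAM


strand = encode(b"DNA")          # 3 bytes become 12 letters
grams_2011 = grams_needed(1.8)   # 1800 / 455, a little under 4 grams
```

With this mapping each byte becomes four letters, and 1.8 zettabytes divided by 455 exabytes per gram gives a little under 4 grams, which matches the rounded figure in the text.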

Genetic informatics thrives

It is also worth remembering that a programming language for DNA already exists. It was developed in recent years by a group of scientists from the University of Washington in the USA. It is intended to control the operation of a "chemical computer", as the systems used for DNA synthesis are called. The idea is not only to control chemical reactions in the way that automation systems and robots are controlled, but also to control the dosing of drugs. Creating computer algorithms that make it possible, for example, to adapt artificial DNA molecules to the environment of the living tissue in which they are to function is a serious task. The biological world is much more complex and irregular than the machine world. However, difficult does not mean impossible. "Our idea is to create a universal language that can be used for many different tasks," explained Georg Seelig from the DNA programming language team. The technology will eventually be used, among other things, to program self-assembling molecules in cells or to create biosensors that monitor the state of the body at the cellular level.

An algorithm used in DNA sequencing can also help protect against the junk that floods the Internet, that is, spam. The program, called Chung Kwei (after the Chinese feng shui talisman that protects the house from evil spirits), is almost 97 percent effective. It is based on the earlier Teiresias algorithm (named after Tiresias, the mythical Greek soothsayer), developed by bioinformaticians working on DNA sequencing at IBM's Thomas J. Watson Research Center in New York. That program looked for repeating sequences in records of the genetic code, which usually carry important information. Instead of a genome, the scientists used the algorithm to analyze 65,000 of the most common examples of spam. Each email was treated as a strand of DNA. The analysis found 6 million sequences of letters and numbers that repeated, that is, appeared in more than one email.

Then a significant amount of ordinary correspondence (sometimes called "ham", as opposed to "spam", the canned luncheon meat) was analyzed. Sequences that appeared in both ham and spam messages were eliminated. Incoming correspondence was then analyzed. The higher the number of typical "spam sequences" per kilobyte of an email, the more certain it is that the email is spam. Only one out of 65 normal emails was mistakenly stopped, and the spam recognition rate reached 96.56%.
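The three steps described above (collect sequences that repeat across spam, discard those that also occur in ham, then count matches per kilobyte of an incoming message) can be sketched as follows. This is a simplification and not the IBM implementation: it uses fixed-length character sequences, whereas the article does not say how the real algorithm chooses its sequences. The function names, the `length` parameter and the sample messages are all hypothetical.

```python
from collections import Counter


def substrings(text: str, length: int) -> set:
    """All distinct character sequences of a fixed length in one message."""
    return {text[i:i + length] for i in range(len(text) - length + 1)}


def learn_spam_patterns(spam, ham, length=8):
    """Keep sequences seen in more than one spam email and in no ham email."""
    counts = Counter()
    for message in spam:
        counts.update(substrings(message, length))  # counted once per email
    repeated = {seq for seq, n in counts.items() if n > 1}
    ham_sequences = set()
    for message in ham:
        ham_sequences |= substrings(message, length)
    return repeated - ham_sequences


def spam_score(message, patterns, length=8):
    """Occurrences of spam sequences per kilobyte of the message."""
    if not message:
        return 0.0
    hits = sum(
        1
        for i in range(len(message) - length + 1)
        if message[i:i + length] in patterns
    )
    kilobytes = len(message.encode("utf-8")) / 1024
    return hits / kilobytes


# Invented sample data, far smaller than the corpora in the article.
spam = ["buy cheap pills now", "cheap pills for you"]
ham = ["lunch at noon for you?"]
patterns = learn_spam_patterns(spam, ham, length=6)
score_spam = spam_score("get cheap pills today", patterns, length=6)
score_ham = spam_score("see you at lunch", patterns, length=6)
```

A real filter would also need a score threshold above which a message is stopped. The article does not state one, so none is assumed here.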
