Why DNA Is Spelled ATGC

Of the 16 nucleotide bases that could pair up to make DNA, why do only A, T, G, and C make up the genomic alphabet? Researchers have long put it down to the composition of the primordial soup in which the first life arose. But Dónall Mac Dónaill of Trinity College Dublin says the choice embodies a tactic for minimizing errors, similar to the error-detecting codes built into credit card numbers, bank accounts, and airline tickets.

In the error-coding theory first developed in 1950 by Bell Telephone Laboratories researcher Richard Hamming, a so-called parity bit is added to the end of a binary number to make its digits add up to an even number. For example, when transmitting the number 100110, whose digits sum to 3, you would add an extra 1 onto the end (100110,1); the number 100001, whose digits sum to 2, would have a zero added (100001,0). Because the most likely transmission error--switching a single digit from 1 to 0 or vice versa--causes the sum of the digits to be odd, the recipient of a number whose digits sum to an odd total can assume that an error occurred.
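
The parity-bit scheme described above can be sketched in a few lines of Python. This is only an illustration of the even-parity idea, not Hamming's full coding theory; the function names add_parity_bit and has_even_parity are made up for this example.

```python
def add_parity_bit(bits: str) -> str:
    """Append one bit so that the total number of 1s in the word is even."""
    parity = bits.count("1") % 2  # 1 if the count of 1s is odd, else 0
    return bits + str(parity)


def has_even_parity(word: str) -> bool:
    """Return True if the word (data bits plus parity bit) has an even number of 1s."""
    return word.count("1") % 2 == 0


# The two examples from the text.
sent_a = add_parity_bit("100110")  # three 1s, so a 1 is appended
sent_b = add_parity_bit("100001")  # two 1s, so a 0 is appended

# Simulate the most likely transmission error: a single flipped digit.
corrupted = ("0" if sent_a[0] == "1" else "1") + sent_a[1:]

print(sent_a, has_even_parity(sent_a))
print(sent_b, has_even_parity(sent_b))
print(corrupted, has_even_parity(corrupted))
```

A single flipped digit always changes the count of 1s by one, so the check fails; two flipped digits would slip through, which is why a parity bit detects only the most likely kind of error.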

Mac Dónaill asserts, in a forthcoming issue of Chemical Communications, that a similar process was at work in the choice of bases in the genetic alphabet. To demonstrate this, he represented each nucleotide as a four-digit binary number. The first three digits represent the three bonding sites that each nucleotide presents to its partner. Each site is either a hydrogen donor or acceptor; a nucleotide offering donor-acceptor-acceptor sites would be represented as 100 and would only bond with an acceptor-donor-donor nucleotide, or 011. The fourth digit is 1 if the nucleotide is a single-ringed pyrimidine type and 0 if it is a double-ringed purine type. Nucleotides readily bond with members of the other type.
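
As a rough sketch of this encoding, the pairing rule can be written as a check on two four-digit strings. The helper names complement and can_pair are hypothetical, and treating the purine-pyrimidine preference as a strict requirement is a simplifying assumption made for this example, not a claim from Mac Dónaill's paper.

```python
def complement(pattern: str) -> str:
    """Swap donors (1) and acceptors (0) in a bonding pattern."""
    return "".join("1" if site == "0" else "0" for site in pattern)


def can_pair(a: str, b: str) -> bool:
    """Check whether two four-digit nucleotide codes pair ideally.

    The first three digits are the donor/acceptor sites; the fourth is
    1 for a pyrimidine and 0 for a purine. In this simplified model, an
    ideal pair needs fully complementary sites and opposite ring types.
    """
    sites_match = a[:3] == complement(b[:3])
    opposite_types = a[3] != b[3]
    return sites_match and opposite_types


# Donor-acceptor-acceptor (100) pairs with acceptor-donor-donor (011).
print(complement("100"))
print(can_pair("1001", "0110"))
print(can_pair("1001", "0100"))
```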

Mac Dónaill noticed that the final digit acts as a parity bit: The four digits of A, T, G, and C all add up to an even number. Banishing all odd-parity nucleotides from the DNA alphabet reduces errors, Mac Dónaill says. For example, nucleotide C (100,1) binds naturally to nucleotide G (011,0), but it might accidentally bind to the odd-parity nucleotide X (010,0), because only one of the three bonding sites is mismatched. Such a bond would be weak compared to C-G but not impossible. However, C is highly unlikely to bond to any other even-parity nucleotide, such as the idealized amino-adenine (101,0), because two of the bonding sites are mismatched.
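
The parity and mismatch counts in this example can be checked with a short script. It uses only the four codes quoted above; the dictionary name CODES and the functions parity and mismatches are illustrative assumptions, and a mismatch is counted here as a position where two facing sites are both donors or both acceptors.

```python
# Four-digit codes as given in the text: three bonding sites plus ring type.
CODES = {
    "C": "1001",
    "G": "0110",
    "X": "0100",              # the odd-parity nucleotide from the example
    "amino-adenine": "1010",  # the idealized even-parity nucleotide
}


def parity(code: str) -> int:
    """Return 0 if the digits sum to an even number, 1 if odd."""
    return code.count("1") % 2


def mismatches(a: str, b: str) -> int:
    """Count bonding sites where a and b fail to complement each other.

    Two facing sites clash when they are the same kind (donor-donor or
    acceptor-acceptor), i.e. when the digits are equal.
    """
    return sum(x == y for x, y in zip(a[:3], b[:3]))


for name, code in CODES.items():
    print(name, code, "even" if parity(code) == 0 else "odd")

for partner in ("G", "X", "amino-adenine"):
    print("C vs", partner, "->", mismatches(CODES["C"], CODES[partner]), "mismatch(es)")
```

In this small sample, the only nucleotide within a single mismatch of C's proper partner is the odd-parity X, which is the pattern the parity argument relies on.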

"It is a novel idea which should provoke others to explore aspects of informatics in the genetic code," says computational chemist Graham Richards of Oxford University. "Instinctively, one feels that the DNA code should have evolved systems to minimize errors. Mac Dónaill's work shows how this could have been achieved."
