As part of the effort to map the human genome in its entirety, a less celebrated but equally important task is to document the portion of each gene's DNA that is transcribed into messenger RNA (mRNA), the information-carrying molecule that directs the synthesis of proteins. It is the mRNA that actually delivers the genetic instructions for building proteins, which then carry out the many crucial functions needed for life. By comparing how the genome is expressed in both its coding process (controlled by DNA) and its actions (delivered by mRNA), researchers can better understand how genes work.
But a major obstacle to analyzing genes may be the choice of probes (which contain bits of DNA that encode mRNA) that are used in such experiments. In a paper to be published in Genome Research, a scientific team led by Dr. Victor Jongeneel at the Ludwig Institute for Cancer Research estimates that as many as half of all human genes encode multiple mRNA transcripts that end in different locations.
Relatively short variations are known to exist at the ends of mRNA transcripts, and these do not have to be taken into account when preparing probes for analysis by specialized tests called DNA chips or microarrays. However, the current study suggests that variations in mRNA ends are spread out over a longer distance (greater than 1,000 nucleotides) and are driven by multiple signals, making it difficult to capture the full complexity of mRNA through current techniques.
"These findings have profound implications for understanding how genes function," said Dr. Jongeneel, who directs the Office of Information Technology at the Ludwig Institute's Branch in Lausanne, Switzerland. "We need better DNA chips to match the increasing complexity of our genes."
The diversity of mRNA arises because different parts of a gene can be used to make the mRNA, just as different sentences can be constructed by choosing some words and leaving others out. The mRNA samples--or "tags"--that are used for DNA chips are typically derived from a region close to the poly(A) tail, a type of cutoff point that distinguishes one messenger transcript from another. DNA chips can measure thousands of these mRNA tags at a time.
But using just this portion of mRNA misses other transcripts that may deliver different instructions to make proteins. Analyzing the function of genes with incomplete information is like trying to follow a conversation after joining it in the middle of a sentence.
"If you are trying to find how a particular gene is expressed, you will only get a partial answer by looking at just one section of the mRNA," said Dr. Jongeneel.
For the study, which was done in collaboration with the National Cancer Institute, Dr. Jongeneel's team reviewed all publicly available data on the human genome and its corresponding transcriptome. The group analyzed the portion of mRNA found just before the poly(A) tails, which mark the places where mRNA transcripts end. By examining in detail a portion of human chromosome 21, they found that long-range variations in the mRNA ends affect at least half of the genes found in this region. Extrapolated to the whole genome, that would correspond to up to 20,000 of the possibly 35,000 genes that are currently thought to make up the human genome.
Taking these variations into account could improve DNA chips and, ultimately, lead to new insights into how to identify and counter proteins that cause disease.
"Including more mRNA tags for analysis should boost the amount of information that can be gathered in each experiment, and thus open new avenues for the diagnostic uses of DNA chips," said Dr. Jongeneel.
The paper will be posted June 12th on the Genome Research web site (www.genome.org) in advance of the July publication date. The Ludwig Institute for Cancer Research is a non-profit global research organization with branches in seven different countries. More than 900 scientists and support staff from around the world conduct basic and clinical research at the Institute, focusing on cancer genetics and genomics, tumor immunology, and cell growth and differentiation.