3.2 Transcription of Genes
Types of RNA
RNA is broadly classified into three types (Table 3-1): mRNA, rRNA and tRNA, all of which are involved in protein synthesis using genetic information (Fig. 3-4). A number of small RNA molecules, such as snRNA, are also known (see Fig. 3-10).
mRNA (messenger RNA) carries a transcribed copy of the genetic information for the primary structure of a protein to the protein synthesis system. There are as many types of mRNA as there are genes, and since proteins vary in size, mRNA sizes also vary greatly. mRNA makes up less than 1% of the total amount of RNA in a cell.
rRNA (ribosomal RNA) in prokaryotes consists of three types, 5S, 16S and 23S*1 (Table 3-1), while that in eukaryotes includes 5S, 5.8S, 18S and 28S. Approximately 95% of the RNA found in a cell is rRNA, which forms complexes called ribosomes with many proteins. Ribosomes function as the sites of protein synthesis.
tRNA (transfer RNA) is a small type of RNA with a size of around 4S, consisting of fewer than 100 nucleotides. There are 40 to 50 known types, which together represent approximately 5% of RNA overall. During protein synthesis, tRNA molecules bind to amino acids and carry them to the site of protein synthesis. A particular tRNA binds to a particular amino acid; for example, the tRNA that binds phenylalanine is denoted as tRNA^Phe, and the tRNA that binds methionine is denoted as tRNA^Met.
Of the RNA types found in cells, those not translated into the primary structure of a protein (i.e., all RNA other than mRNA) are collectively referred to as non-coding RNA. Although the regulation of RNA synthesis differs among RNA types, the basic mechanism of synthesis is common to all of them.
*1 S: The Svedberg unit, which describes the rate of sedimentation by ultracentrifuge. Although higher molecular weights result in higher S values, there is no linear relationship between the molecular weight and the S value (e.g., a doubling of the molecular weight does not mean a doubled S value).
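The non-linear relationship between molecular weight and S value can be illustrated with a small calculation. The sketch below rests on a simplifying assumption that is not taken from this text: the particles are treated as compact spheres of equal density, in which case the S value scales roughly with the two-thirds power of the molecular weight. Real molecules also differ in shape, so the result is only an approximation, and the function name `s_value_ratio` is chosen for this example.

```python
def s_value_ratio(mass_ratio: float) -> float:
    """Approximate ratio of S values for two particles whose molecular
    weights differ by mass_ratio.

    Assumption (not from the text): both particles are compact spheres of
    equal density, so the S value is proportional to mass ** (2/3).
    """
    return mass_ratio ** (2.0 / 3.0)


# Doubling the molecular weight raises the S value by well under a factor of 2.
print(round(s_value_ratio(2.0), 2))  # 1.59
```

The same effect explains why S values are not additive: the 30S and 50S subunits of the prokaryotic ribosome combine into a 70S ribosome, not an 80S one.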
Characteristics of Transcription
In DNA synthesis, the entire sequence of the parental DNA strand is accurately copied from one end to the other, and the whole DNA is passed on from the parent cell to the daughter cells. RNA transcription, by contrast, occurs only for gene regions rather than for the whole DNA (Fig. 3-5). The DNA region shown in Fig. 3-5A has five genes (a to e), meaning that five mRNAs are synthesized. Genes c and d in the figure are transcribed from the other DNA strand, which is read in the opposite direction. More precisely, the transcribed region includes not only the section containing the information for the amino acid sequence (i.e., the coding region) but also extra portions on both sides of it (Fig. 3-5B). A promoter is a DNA region to which RNA polymerase attaches (discussed later).
In transcription, one of the two DNA strands serves as a template, and nucleotides are connected one by one so that each forms a base pair with the corresponding template base. Since RNA uses U in place of the T in DNA, an A in the DNA pairs with a U in the RNA. The RNA synthesis reaction is simply described as:
$[\mathrm{NMP}]_{n} + \mathrm{NTP} \rightleftharpoons [\mathrm{NMP}]_{n+1} + \mathrm{PP_i}$
The direction of RNA synthesis is, like that of DNA synthesis, from 5’ to 3’, so the growing RNA strand runs in the opposite direction to (antiparallel to) the template DNA strand. In DNA synthesis, no reaction occurs when n=1, meaning that a primer is required; in RNA synthesis, however, the reaction proceeds even at n=1, so no primer is needed.
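The base-pairing rule and the strand directions described here can be sketched in a few lines of Python. This is a minimal illustration only: the function name `transcribe` and the template sequence are hypothetical, and the template is assumed to be written in the 3’ to 5’ direction so that the RNA comes out in the 5’ to 3’ direction without reversing the string.

```python
# RNA base that pairs with each base of the DNA template.
# RNA uses U in place of T, so a template A pairs with U.
TEMPLATE_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3_to_5: str) -> str:
    """Return the RNA (written 5'->3') for a DNA template written 3'->5'.

    Nucleotides are added one at a time, each pairing with the next
    template base, which mirrors the stepwise reaction
    [NMP]n + NTP -> [NMP]n+1 + PPi.
    """
    rna = []
    for base in template_3_to_5.upper():
        rna.append(TEMPLATE_TO_RNA[base])  # chain grows by one nucleotide
    return "".join(rna)


template = "TACGGCATT"  # hypothetical template strand, written 3'->5'
print(transcribe(template))  # AUGCCGUAA
```

Because the template is read 3’ to 5’ while the RNA grows 5’ to 3’, the first template base in the string pairs with the 5’ end of the RNA.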
In E. coli, RNA is synthesized by one type of RNA polymerase. Eukaryotes have at least three RNA polymerase types (I, II and III), each playing a different role. Type I is mainly involved in the synthesis of rRNA, Type II in the synthesis of mRNA, and Type III in the synthesis of tRNA. Compared with the prokaryotic enzyme, the eukaryotic polymerases are structurally much more complex, each consisting of more than 10 subunits*2.
■Upstream and Downstream of Genes, and Base Sequence Numbers
Relative to a gene’s position, the direction leading back to and beyond the point at which RNA synthesis is initiated is called upstream, and the direction in which RNA is synthesized is called downstream (Fig. 3-5). The first base at which RNA synthesis is initiated is No. 1, followed by Nos. 2, 3, 4, etc. in the downstream direction. Conversely, in the upstream direction (the promoter side), the bases are numbered -1, -2, -3, etc. It should be noted that there is no base numbered zero. Additionally, base No. 1 is where RNA synthesis starts, but it is not part of the first codon encoding an amino acid of the protein; the first codon is located further downstream (at bases with larger numbers).
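Because there is no base numbered zero, converting between an ordinary position along the DNA and a base number takes a little care. The sketch below shows one way to do the conversion; the function names and the 0-based indexing of the sequence are assumptions made for this example.

```python
def base_number(index: int, start_index: int) -> int:
    """Convert a 0-based index (counted in the direction of transcription)
    into a base number relative to the transcription start site.

    start_index is the 0-based index of the base where RNA synthesis begins.
    That base is No. 1, the base just upstream of it is -1, and there is
    no base numbered 0.
    """
    offset = index - start_index
    return offset + 1 if offset >= 0 else offset


def index_of(number: int, start_index: int) -> int:
    """Inverse of base_number; rejects the non-existent base No. 0."""
    if number == 0:
        raise ValueError("there is no base numbered zero")
    return start_index + (number - 1 if number > 0 else number)


start = 10  # hypothetical: transcription starts at index 10
print([base_number(i, start) for i in range(8, 13)])  # [-2, -1, 1, 2, 3]
```

One consequence of skipping zero is that the distance between two bases on opposite sides of the start site is one less than the difference of their numbers: bases -1 and 1 are adjacent.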
■Binding of Polymerase to a Promoter
Promoter regions in eukaryotes (Fig. 3-6) contain characteristic base sequences, such as TATA boxes and CCAAT boxes*4, that are recognized by general transcription factors*3 (proteins that promote transcription). Prokaryotes have several types of protein called σ-factors, which promote the binding of RNA polymerase to particular promoters. The processes generally referred to as recognition and binding mean that a protein and a DNA molecule come close together and, if their surface structures fit, attach to each other. Eukaryotes have a more complex mechanism, with a larger number of gene types and many kinds of promoter sequence; however, the basic mechanism is similar in eukaryotes and prokaryotes in that both have commonly used basic promoter sequences to which transcription factors bind, thereby recruiting RNA polymerase.
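As a rough illustration of what a characteristic promoter sequence is, the sketch below scans a DNA string for the TATA box sequence TATAAA and for the CCAAT core of the CCAAT box. The sequence `promoter` is invented for the example and its spacing is not meant to be realistic. Exact string matching is also a simplification: real transcription factors recognize the surface structure of DNA and tolerate some variation in sequence.

```python
def find_motif(sequence: str, motif: str) -> list[int]:
    """Return the 0-based start index of every occurrence of motif in
    sequence, including overlapping occurrences."""
    positions = []
    start = sequence.find(motif)
    while start != -1:
        positions.append(start)
        start = sequence.find(motif, start + 1)
    return positions


# Hypothetical promoter fragment, written 5'->3' (not a real gene).
promoter = "GGCCAATCTGACGTTATAAAGGCTCAGT"

print(find_motif(promoter, "CCAAT"))   # [2]
print(find_motif(promoter, "TATAAA"))  # [14]
```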
■Roles of Promoters and the Initiation of Transcription
An important role of promoters is to determine the binding location and direction of RNA polymerase. Since RNA is synthesized in the 5’ to 3’ direction, the template DNA strand is read by RNA polymerase in the 3’ to 5’ direction. The complex of basal transcription factors and RNA polymerase bound to the DNA separates the two DNA strands, and RNA synthesis is initiated.
■Elongation of Transcription
The 5’-triphosphate of the first nucleotide in the synthesized RNA strand is retained, so the 5’ end of the RNA is either pppA or pppG. The basal transcription factors involved in the binding of RNA polymerase do not move with the enzyme; only the polymerase moves along the DNA. The synthesized RNA strand is immediately released from the DNA, and the two unwound DNA strands re-form their original double strand on completion of RNA synthesis.
■Termination of Transcription
A DNA sequence that signals the termination of transcription in prokaryotes is called a terminator. A number of RNA dissociation mechanisms are known; in one of them, the synthesized RNA forms a double-stranded region within itself (a hairpin structure) and thereby detaches from the template DNA. The termination mechanism in eukaryotes is not clearly understood.
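A hairpin can form when one stretch of an RNA strand is followed, after a short loop, by its own reverse complement, so that the two stretches can pair with each other. The sketch below searches an RNA string for such a pattern. It is a simplified illustration, not a structure-prediction method: only perfect A-U and G-C pairs are considered, and the function names, the default stem and loop lengths, and the example sequence are all assumptions made for this example.

```python
RNA_PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the sequence that would pair with rna in antiparallel fashion."""
    return "".join(RNA_PAIR[base] for base in reversed(rna))


def find_hairpins(rna: str, stem_len: int = 5,
                  min_loop: int = 3, max_loop: int = 8) -> list[tuple[int, int]]:
    """Return (stem_start, loop_length) for every place where a stem of
    stem_len bases is followed, after a loop of min_loop..max_loop bases,
    by its perfect reverse complement."""
    hits = []
    for i in range(len(rna) - 2 * stem_len - min_loop + 1):
        target = reverse_complement(rna[i:i + stem_len])
        for loop in range(min_loop, max_loop + 1):
            j = i + stem_len + loop  # start of the candidate pairing stretch
            if j + stem_len > len(rna):
                break
            if rna[j:j + stem_len] == target:
                hits.append((i, loop))
    return hits


# Hypothetical RNA: stem GCCGC, loop UUAA, pairing stretch GCGGC.
rna = "CCGCCGCUUAAGCGGCUUUU"
print(find_hairpins(rna))  # [(2, 4)]
```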
■Genes for rRNA and tRNA
The number of functioning (transcribed) genes in E. coli is over 2,000, and the figure is believed to be much higher in humans. Based on the information in the mRNA transcribed from these genes, the proteins (the gene products) are all synthesized on ribosomes. A large number of protein synthesis systems are therefore needed to translate the numerous mRNA molecules generated from all these genes, which in turn requires a large number of rRNA and tRNA molecules within cells. These molecules are therefore actively transcribed, and there are many genes for them. It can be said that the genes have been amplified, an arrangement well suited to this purpose.
*2 Subunits: When multiple proteins form a complex and collectively exhibit functions, each constituent protein is called a subunit. The term subunit does not necessarily refer to one protein; as an example, a subunit of a ribosome is a complex of RNA and several dozen proteins.
*3 Basal transcription factors: Proteins needed when RNA polymerase binds to a promoter (trans-acting elements). These factors bind to a particular sequence on the promoter and recruit RNA polymerase to the DNA, thereby initiating RNA synthesis.
*4 TATA and CCAAT boxes: DNA sequences in eukaryotes that are necessary for basal transcription factors to bind to DNA. TATA boxes have the sequence TATAAA, while CCAAT boxes have the sequence GGCCAATCT, and each box is recognized by its own transcription factors. Many other such sequences also exist.
The Possible Existence of More Non-coding RNA in Eukaryotes
Until very recently, it was commonly thought that only genes (rather than other regions) were transcribed from genomic DNA. This idea holds for prokaryotes, since their genomic DNA consists mostly of genes. In humans, by contrast, genes represent only a small portion of genomic DNA: one human cell contains approximately 1,000 times as much DNA as E. coli, but humans have only about five times as many genes. Nevertheless, it was recently reported, as a major revelation, that most of the eukaryotic genome is transcribed. According to a paper published in the journal Science in September 2005, a comprehensive analysis of transcription products in mice, in which the transcription origins of 4.5 million RNA molecules were investigated, showed that they were transcribed from 70% of the entire DNA. It was surprising that so many RNA molecules were transcribed from DNA regions not previously thought to be genes; if this is the case in mice, it is likely to hold true for humans as well. This RNA is believed to be non-coding RNA that functions in the regulation of gene expression.