Source: BiochemFFA_7_4.pdf. The entire textbook is available for free from the authors at http://biochem.science.oregonstate.edu/content/biochemistry-free-and-easy
In the preceding sections, we have discussed the replication of the cell's DNA and the mechanisms by which the integrity of the genetic information is carefully maintained. What do cells do with this information? How does the sequence in DNA control what happens in a cell? If DNA is a giant instruction book containing all of the cell's "knowledge" that is copied and passed down from generation to generation, what are the instructions for? And how do cells use these instructions to make what they need?
You have probably learned in introductory biology courses that genes, many of which are instructions for making proteins, are made of DNA. You also know that the information in genes is copied into temporary instructions called messenger RNAs (mRNAs), which direct the synthesis of specific proteins. This description of the flow of information from DNA to RNA to protein is often called the central dogma of molecular biology (Figure 7.56). It is a good starting point for an examination of how cells use the information in DNA.
Consider that all of the cells in a multicellular organism arise by division from a single fertilized egg, and therefore all have the same DNA. In humans, divisions of that original fertilized egg produce over a trillion cells by the time a baby is born (that's a lot of DNA replication!). Yet a baby is not a giant ball of a trillion identical cells. It has many different kinds of cells that make up tissues such as skin, muscle, bone and nerves. How did cells with identical DNA turn out so different?
The answer lies in gene expression, which is the process by which the information in DNA is used. Although all the cells in a baby have the same DNA, each different cell type uses a different subset of the genes in that DNA to direct the synthesis of a distinctive set of RNAs and proteins. The first step in gene expression is transcription, which we will examine next (Figure 7.52).
Transcription is the process of copying information from DNA sequences into RNA sequences. It is also known as DNA-dependent RNA synthesis. When a sequence of DNA is transcribed, only one of the two DNA strands is copied into RNA. We will consider later what determines which strand is copied.
But apart from copying one strand of DNA rather than both, how does transcription differ from DNA replication? DNA replication copies all of the cell's genetic material and occurs before a cell divides, so that each daughter cell receives a full copy of the genetic information. Transcription, by contrast, copies selected, relatively short stretches of DNA into RNA. Different genes may be copied into RNA at different times in the cell's life cycle. RNAs are, essentially, temporary copies of the information in DNA, and different sets of instructions are copied for use at different times and in different cell types.
Cells make several different kinds of RNA:
- mRNAs that code for proteins
- rRNAs that form part of ribosomes
- tRNAs that serve as adaptors between mRNA and amino acids during translation
- Small RNAs that regulate gene expression, including miRNAs and siRNAs
- Other small RNAs that have a variety of functions, including the small nuclear RNAs that are part of the splicing machinery.
- Long noncoding RNAs (lncRNAs - Figure 7.53)
Building an RNA strand is very similar to building a DNA strand, which is not surprising given that DNA and RNA are very similar molecules. Transcription is catalyzed by the enzyme RNA polymerase. "RNA polymerase" is a general term for an enzyme that makes RNA. Eukaryotic cells have several different kinds of RNA polymerase, whereas in prokaryotes a single type of RNA polymerase is responsible for all transcription.
Like DNA polymerases, RNA polymerases synthesize new strands only in the 5' to 3' direction, but because they are making RNA, they use ribonucleotides (i.e., RNA nucleotides - Figure 7.54) rather than deoxyribonucleotides. Ribonucleotides are joined in exactly the same way as deoxyribonucleotides: the 3'-OH of the last nucleotide on the growing chain is joined to the 5' phosphate of the incoming nucleotide to make a phosphodiester bond (Figure 7.55).
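The base-pairing rule and the 5' to 3' direction of synthesis can be illustrated with a short Python sketch. It follows only the pairing logic, not the chemistry of the enzyme. It assumes the template strand is written 5' to 3' (the usual way sequences are written), so the polymerase, moving along the template 3' to 5', reads the string from right to left. The function names `transcribe` and `coding_strand_to_rna` are hypothetical.

```python
# Pairing rules used when copying a DNA template into RNA:
# a template A specifies U in the RNA (RNA has no T).
DNA_TO_RNA_PAIR = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_5_to_3: str) -> str:
    """Return the RNA (written 5'->3') copied from a DNA template strand.

    The template is supplied 5'->3'. Because the RNA is antiparallel to its
    template and grows only 5'->3', the template is read from its 3' end,
    i.e. from the right-hand end of the string toward the left.
    """
    rna = []
    for base in reversed(template_5_to_3.upper()):
        # Each new ribonucleotide is added to the 3' end of the growing
        # chain, which corresponds to appending to the right of the list.
        rna.append(DNA_TO_RNA_PAIR[base])
    return "".join(rna)


def coding_strand_to_rna(coding_5_to_3: str) -> str:
    """The RNA matches the non-template (coding) strand, with U in place of T."""
    return coding_5_to_3.upper().replace("T", "U")


if __name__ == "__main__":
    coding = "ATGGCTTAC"       # hypothetical non-template strand
    template = "GTAAGCCAT"     # its complementary, antiparallel partner
    print(transcribe(template))          # AUGGCUUAC
    print(coding_strand_to_rna(coding))  # AUGGCUUAC
```

Both functions give the same answer: the RNA has the same sequence as the non-template strand, except that U replaces T.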
One important difference between DNA polymerases and RNA polymerases is that the latter do not require a primer to start making RNA. Once RNA polymerases are in the right place to start copying DNA, they just begin making RNA by joining together RNA nucleotides complementary to the DNA template.
This, of course, brings us to an obvious question: how do RNA polymerases "know" where to start copying the DNA?
Unlike replication, in which every nucleotide of the parental DNA must eventually be copied, transcription, as we have already noted, copies only selected portions of the DNA into RNA at any given time. Consider the challenge: a human cell contains approximately 6 billion base-pairs of DNA. Most of this is non-coding DNA, which does not carry instructions for making proteins. The small percentage of the genome made up of coding sequences still amounts to between 20,000 and 30,000 genes in each cell, and only a small number of these need to be expressed at any given time. What tells an RNA polymerase where to start copying DNA to make a transcript?
It turns out that patterns in the DNA sequence indicate where RNA polymerase should start and end transcription. These sequences are recognized by the RNA polymerase itself or by proteins that help RNA polymerase determine where to bind the DNA and start transcription. A DNA sequence at which RNA polymerase binds to start transcription is called a promoter. The DNA sequence that marks the end of transcription, where the RNA polymerase stops adding nucleotides and dissociates from the template, is known as a terminator sequence. The promoter and terminator thus bracket the region of the DNA that is to be transcribed (Figure 7.61).
A promoter is described as being situated upstream of the gene that it controls (Figure 7.57). This means that, on the DNA strand that the gene is on, the promoter sequence is "before" the gene; to put it differently, it lies on the side of the gene opposite to the direction of transcription. Notice also that the promoter is said to "control" the gene it is associated with. This is because expression of the gene depends on RNA polymerase binding to the promoter sequence to begin transcription. If the RNA polymerase and its helper proteins do not bind at the promoter, the gene cannot be transcribed and will therefore not be expressed.
What is special about a promoter sequence? To answer this question, scientists examined many genes and their surrounding sequences (Figure 7.57). Because the same RNA polymerase has to bind to many different promoters, one would predict that promoters share some similarities in their sequences. As expected, common sequence patterns were found in many promoters.
We will first take a look at prokaryotic promoters.
When prokaryotic genes were examined, the following features commonly emerged:
- A transcription start site (the base in the DNA across from which the first RNA nucleotide is paired), which, by convention, is numbered +1.
- A -10 sequence: this is a 6 bp region centered about 10 bp upstream of the start site. The consensus sequence at this position is TATAAT. In other words, if you count back from the transcription start site, the bases most often found at each position around -10 in the promoters studied spell out TATAAT.
- A -35 sequence: this is a 6 bp sequence about 35 base-pairs upstream of the start of transcription. The consensus sequence at this position is TTGACA.
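The numbering convention can be made concrete with a short Python sketch. It assumes a promoter written 5' to 3' on the non-template strand, places the -35 box at positions -35 to -30 and the -10 box at positions -12 to -7 (a common convention; real boxes sit only at about these positions), and counts how many bases in each box differ from the consensus. Note that there is no position 0: the base just upstream of +1 is -1. The promoter sequence and the helper names are hypothetical.

```python
CONSENSUS_MINUS_35 = "TTGACA"
CONSENSUS_MINUS_10 = "TATAAT"

# Assumed box boundaries in promoter coordinates (+1 = start site, no 0).
MINUS_35_SPAN = (-35, -30)
MINUS_10_SPAN = (-12, -7)


def position_to_index(tss_index: int, position: int) -> int:
    """Map a promoter coordinate to a 0-based index in the sequence string.

    Upstream positions are negative (-1 is the base just before +1) and
    downstream positions are positive; position 0 does not exist.
    """
    if position == 0:
        raise ValueError("there is no position 0 in promoter numbering")
    return tss_index + position if position < 0 else tss_index + position - 1


def extract(seq: str, tss_index: int, span: tuple) -> str:
    """Return the bases from span[0] to span[1], inclusive."""
    first, last = span
    start = position_to_index(tss_index, first)
    end = position_to_index(tss_index, last)
    return seq[start:end + 1]


def mismatches(observed: str, consensus: str) -> int:
    """Count positions where a promoter element differs from the consensus."""
    if len(observed) != len(consensus):
        raise ValueError("sequences must have the same length")
    return sum(a != b for a, b in zip(observed, consensus))


# Hypothetical promoter, written 5'->3' and starting at position -35.
upstream = "TTGACA" + "C" * 17 + "TATGAT" + "G" * 6   # positions -35 ... -1
promoter = upstream + "ATGC"                            # +1 is the "A" after it
tss = len(upstream)                                     # 0-based index of +1

box35 = extract(promoter, tss, MINUS_35_SPAN)
box10 = extract(promoter, tss, MINUS_10_SPAN)
print(box35, mismatches(box35, CONSENSUS_MINUS_35))  # TTGACA 0
print(box10, mismatches(box10, CONSENSUS_MINUS_10))  # TATGAT 1
```

The -10 box in this made-up promoter differs from the consensus at one position, which is typical: as discussed next, few real promoters match the consensus exactly.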
It is important to understand that each nucleotide in a consensus sequence is simply the one that appeared most often at that position in the promoters examined; this does not mean that the entire consensus sequence is found in all promoters. In fact, few promoters have -10 and -35 sequences that exactly match the consensus. The accompanying box shows, for each position of the -10 and -35 sequences, the percentage of promoters examined that have the consensus base.
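Deriving a consensus from aligned sequences amounts to a per-position majority vote, as this Python sketch shows. The five -10 sequences are made up for illustration; notice that none of them matches the resulting consensus, TATAAT, exactly, which is the situation described above. Ties are broken in favor of whichever base appears first in the column, an arbitrary choice made here for simplicity.

```python
from collections import Counter


def consensus(aligned: list) -> tuple:
    """Return (consensus string, percent of sequences with that base at each position)."""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("all sequences must be aligned to the same length")
    bases, percents = [], []
    for column in zip(*aligned):  # one column per position
        base, count = Counter(column).most_common(1)[0]
        bases.append(base)
        percents.append(100 * count / len(aligned))
    return "".join(bases), percents


# Hypothetical -10 regions from five promoters, already aligned.
minus_10_regions = ["TATGAT", "TACAAT", "GATAAT", "TATACT", "TTTAAT"]

seq, pct = consensus(minus_10_regions)
print(seq)  # TATAAT
print(pct)  # [80.0, 80.0, 80.0, 80.0, 80.0, 100.0]
print(any(s == seq for s in minus_10_regions))  # False
```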
What is the significance of these sequences? It turns out that the sequences at -10 and -35 are necessary for recognition of the promoter region by RNA polymerase (Figure 7.58). The sequences at -10 and -35 may vary a little in individual promoters, as mentioned above, but the extent to which they are different is limited. It is only when the RNA polymerase has stably bound at the promoter that transcription can begin. The process by which the promoter is recognized and bound stably has been well studied for the RNA polymerase of E. coli.
Core polymerase and holoenzyme
The E. coli RNA polymerase is made up of a core enzyme of five subunits (two α subunits, β, β′ and ω; Figure 7.59) and an additional subunit called the σ (sigma) subunit. Together, the σ subunit and the core polymerase make up what is termed the RNA polymerase holoenzyme. The core polymerase is the part of the RNA polymerase responsible for the actual synthesis of RNA, while the σ subunit is necessary for binding of the enzyme at promoters to initiate transcription.
The core polymerase and σ subunit are not always associated with each other. On its own, the core polymerase binds DNA loosely and does not discriminate between promoters and other sequences; in this state, the DNA strands are not opened up to allow transcription. The role of the σ subunit is to reduce the affinity of the core polymerase for non-specific DNA sequences and to help the enzyme bind specifically to promoter sequences.
Only when the σ subunit associates with the core polymerase can the holoenzyme bind specifically to promoter sequences. The initial binding of the holoenzyme at the promoter results in what is called a “closed” complex, meaning that the DNA template is still double-stranded and has not opened up to allow transcription. This closed complex is then converted to an “open” complex by separation of the DNA strands to create a transcription bubble about 12-14 base-pairs long (Figure 7.60). The conversion of the closed complex to the open complex also requires the σ subunit.
Open complex & initiation
Once the open complex has formed, the DNA template can be copied, and the core polymerase adds nucleotides complementary to one strand of the DNA. At this stage, known as initiation, the polymerase adds several nucleotides while still bound to the promoter and without moving along the DNA template. At first, short pieces of RNA a few nucleotides long may be made and released without the polymerase leaving the promoter (a process called abortive initiation). Eventually, once an RNA of 8-9 bases has been made, the enzyme makes the transition to the next stage, elongation, and moves beyond the promoter region.
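The sequence of events from promoter binding to elongation can be traced with a toy model in Python. It is a hypothetical sketch, not a kinetic model: each number in `attempt_lengths` stands for the length of RNA made in one attempt, and the escape threshold of 9 nucleotides is simply taken from the 8-9 nucleotides given in the text.

```python
ESCAPE_LENGTH = 9  # hypothetical threshold, based on the 8-9 nucleotides in the text


def initiation_trace(attempt_lengths):
    """List the events from promoter binding through the start of elongation.

    Each entry in attempt_lengths is the length of RNA made in one attempt.
    Attempts shorter than ESCAPE_LENGTH are abortive: the short RNA is
    released and the polymerase stays at the promoter.
    """
    events = [
        "holoenzyme binds promoter: closed complex",
        "DNA strands separate: open complex",
    ]
    for n in attempt_lengths:
        if n < ESCAPE_LENGTH:
            events.append(f"abortive: {n}-nt RNA released, polymerase stays at promoter")
        else:
            events.append(f"{n}-nt RNA made: polymerase leaves promoter, elongation begins")
            events.append("sigma subunit may dissociate from the core enzyme")
            return events
    events.append("still at promoter: no transition to elongation yet")
    return events


for event in initiation_trace([3, 5, 9]):
    print(event)
```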
Once elongation commences and the RNA polymerase is moving down the DNA template, the σ subunit is no longer necessary and may dissociate from the core enzyme. The core polymerase can move along the template, unwinding the DNA ahead of it to maintain a transcription bubble of 12-15 base-pairs and synthesizing RNA complementary to one of the strands of the DNA. As already mentioned, an RNA chain, complementary to the DNA template, is built by the RNA polymerase by the joining of the 5' phosphate of an incoming ribonucleotide to the 3'OH on the last nucleotide of the growing RNA strand. Behind the RNA polymerase, the DNA template is rewound, displacing the newly made RNA from its template strand.
As mentioned earlier, a sequence of nucleotides called the terminator is the signal to the RNA polymerase to stop transcription and dissociate from the template. Some terminator sequences, known as intrinsic terminators, allow termination by RNA polymerase without the help of any additional factors, while others, called rho-dependent terminators, require the assistance of a protein factor called rho (ρ).
How does the sequence of the terminator cause the RNA polymerase to stop adding nucleotides and release the transcript?
To understand this, it is useful to know that the terminator sequence precedes the last nucleotide of the transcript. In other words, the terminator is part of the end of the sequence that is transcribed (Figure 7.61).
In intrinsic terminators, this sequence in the RNA has self-complementary regions that can base-pair with each other to form a hairpin structure with a GC-rich “stem”. This hairpin is followed by a single-stranded region that is rich in U’s (Figure 7.62). The secondary structure formed by the folding of the end of the RNA into the hairpin causes the RNA polymerase to pause. Meanwhile, the run of U’s following the hairpin allows the RNA-DNA hybrid in this region to come apart, because base-pairing between the A’s in the DNA template and the U’s in the RNA is relatively weak. This allows the transcript to be released from the DNA template and from the RNA polymerase.
In the case of rho-dependent termination, an additional protein factor, rho, is necessary. Rho is a helicase that can separate the transcript from the template it is paired with. As in intrinsic termination, rho-dependent termination requires the formation of a hairpin structure in the RNA that causes pausing of the RNA polymerase. Meanwhile, rho binds to a region of the transcript called the rho utilization site (rut) and moves along the RNA until it reaches the paused RNA polymerase. It then acts on the RNA-DNA hybrid, releasing the transcript from the template.
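The two features of an intrinsic terminator, a GC-rich hairpin followed by a run of U's, suggest a simple search, sketched here in Python. This is a simplification made for illustration: it requires perfect Watson-Crick pairing in the stem (real hairpins can contain G-U pairs and mismatches), and the stem, loop, GC and U-run thresholds are hypothetical parameters rather than established values.

```python
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """The RNA strand that would base-pair with rna, written 5'->3'."""
    return "".join(PAIR[b] for b in reversed(rna))


def find_intrinsic_terminator(rna, min_stem=5, max_stem=10, min_loop=3,
                              max_loop=8, min_gc=0.6, min_u=4):
    """Return (hairpin start, U-run start) for the first candidate, else None.

    A candidate is a stem segment with a GC fraction of at least min_gc whose
    exact reverse complement follows after a loop of min_loop..max_loop
    nucleotides, immediately followed by at least min_u U's.
    """
    rna = rna.upper()
    for i in range(len(rna)):
        for stem in range(max_stem, min_stem - 1, -1):  # try longer stems first
            left = rna[i:i + stem]
            if len(left) < stem:
                continue
            if (left.count("G") + left.count("C")) / stem < min_gc:
                continue
            for loop in range(min_loop, max_loop + 1):
                j = i + stem + loop  # start of the 3' half of the stem
                right = rna[j:j + stem]
                if len(right) < stem:
                    break
                u_run = rna[j + stem:j + stem + min_u]
                if right == reverse_complement(left) and u_run == "U" * min_u:
                    return i, j + stem
    return None


# Hypothetical 3' end of a transcript: stem, loop, paired stem, U-run.
terminator_rna = "GCCCGC" + "GAAA" + "GCGGGC" + "UUUUUUA"
print(find_intrinsic_terminator(terminator_rna))                    # (0, 16)
print(find_intrinsic_terminator(terminator_rna.replace("U", "A")))  # None
```

Without the run of U's, the same hairpin is not reported, reflecting the text's point that both features are needed for intrinsic termination.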
Coupled transcription and translation
In prokaryotes, which lack a nucleus, the DNA is not enclosed in a separate compartment, so the mRNA is available to the translation machinery as soon as the transcript comes off the template DNA. Indeed, in prokaryotic cells, translation of an mRNA can begin before the entire gene has been transcribed: ribosomes can assemble at the 5’ end of the transcript as it is displaced from the template, while the 3’ end of the gene is still being copied. The lag time between transcription and translation is thus very short in prokaryotes.
Transcription in eukaryotes
Although the process of RNA synthesis is the same in eukaryotes as in prokaryotes, there are some additional considerations in eukaryotes. One is that in eukaryotes, the DNA template exists as chromatin, where the DNA is tightly associated with histones and other proteins. The "packaging" of the DNA must therefore be opened up to allow the RNA polymerase access to the template in the region to be transcribed (Figure 7.63). The restructuring of chromatin to allow access to regions of DNA is thus an important factor in determining which genes are expressed.
Multiple RNA polymerases
A second difference is that eukaryotes have multiple RNA polymerases, not just one as in bacterial cells. The different eukaryotic polymerases transcribe different classes of genes. For example, RNA polymerase I transcribes the ribosomal RNA genes, while RNA polymerase III copies tRNA genes. The RNA polymerase we will focus on most is RNA polymerase II, which transcribes protein-coding genes to make mRNAs.
All three eukaryotic RNA polymerases need additional proteins to help them get transcription started. In prokaryotes, RNA polymerase by itself can initiate transcription (the σ subunit is a subunit of the RNA polymerase, not an entirely separate protein). The additional proteins needed by eukaryotic RNA polymerases are referred to as transcription factors. We will see below that there are various categories of transcription factors.
Transcription and translation are de-coupled
Finally, in eukaryotic cells, transcription is separated in space and time from translation. Transcription happens in the nucleus, and the RNAs produced are processed further before they are sent into the cytoplasm.
Protein synthesis (translation) happens in the cytoplasm. By contrast, as noted earlier, in prokaryotic cells mRNAs can be translated as they come off the DNA template; because there is no nuclear envelope, transcription and protein synthesis occur in a single cellular compartment. A representative eukaryotic gene is depicted in Figure 7.64. Transcription starts some 25 bp downstream of the TATA box and creates a transcript that begins with a 5’ untranslated region (5’UTR), continues with the coding region, which may be interrupted by multiple introns, and ends in a 3’ untranslated region (3’UTR). As detailed below, the initial transcript is further processed before it is used.
Like genes in prokaryotes, eukaryotic genes also have promoters that determine where transcription will begin. As with prokaryotes, there are specific sequences in the promoter regions that are recognized and bound by proteins involved in the initiation of transcription. We will focus primarily on the genes encoding proteins, which are transcribed by RNA polymerase II. Such promoters commonly have a TATA box, a sequence similar to the -10 sequence in prokaryotic promoters, located about 25-35 base-pairs upstream of the start of transcription (+1). (Some eukaryotic promoters lack a TATA box and instead have other recognition sequences, such as downstream promoter elements, or DPEs.) Interestingly, the TATA box is not directly recognized and bound by RNA polymerase II. Instead, it is bound by other proteins that, together with the RNA polymerase, form the transcription initiation complex.
In addition, eukaryotic promoters have several other short sequences that affect transcription, located within about 100 to 200 base-pairs upstream of the transcription start site. These sequences, sometimes called upstream elements or promoter-proximal upstream elements, are bound by activator proteins that interact with the transcription complex that forms at the TATA box. Examples of such upstream elements are the CAAT box and the GC box (Figure 7.64).
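A search for a TATA box can be sketched in the same way as the prokaryotic -10 search. Because the text gives no TATA box sequence, this Python example assumes the commonly cited consensus TATAWAW (W = A or T), written as a regular expression, and reports only matches that begin between -35 and -25. The promoter sequence and the function name are hypothetical.

```python
import re

# Assumed consensus TATAWAW (W = A or T); the lookahead reports overlapping hits.
TATA_BOX = re.compile(r"(?=(TATA[AT]A[AT]))")


def find_tata_boxes(seq: str, tss_index: int, window=(-35, -25)):
    """Return (position, sequence) for TATA-like motifs starting inside window.

    seq is the non-template strand written 5'->3'; tss_index is the 0-based
    index of +1. Upstream positions are negative (-1 is just before +1).
    """
    hits = []
    for match in TATA_BOX.finditer(seq.upper()):
        position = match.start() - tss_index  # correct for bases upstream of +1
        if window[0] <= position <= window[1]:
            hits.append((position, match.group(1)))
    return hits


# Hypothetical promoter starting at position -35; +1 is the first base after it.
upstream = "G" * 5 + "TATAAAA" + "C" * 23
promoter = upstream + "AGCT"
print(find_tata_boxes(promoter, tss_index=len(upstream)))  # [(-30, 'TATAAAA')]
```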
Making transcripts in eukaryotes
We noted earlier that all eukaryotic RNA polymerases need additional proteins to bind promoters and start transcription. The proteins that help eukaryotic RNA polymerases find promoter sites and initiate RNA synthesis are termed general transcription factors. We will focus on the transcription factors that assist RNA polymerase II, the enzyme that transcribes protein-coding genes. These transcription factors are named TFIIA, TFIIB and so on (TF = transcription factor, II = RNA polymerase II, and the letters distinguish individual transcription factors).
Transcription by RNA polymerase II requires the general transcription factors and the RNA polymerase to assemble at the TATA box into a complex called the basal transcription complex or transcription initiation complex (Figure 7.65).
This complex is the minimum requirement for any gene to be transcribed by RNA polymerase II. The first step in its formation is the binding of transcription factor TFIID to the TATA box. TFIID is made up of several proteins, one of which is called the TATA Binding Protein, or TBP. Binding of the TBP causes the DNA to bend at this spot and take on a structure that is suitable for the binding of additional transcription factors and RNA polymerase.
Interestingly, the binding of the TBP is a necessary step in forming a transcription initiation complex even when the promoter lacks a TATA box. The order of binding of additional proteins after binding of the TBP, as determined through in vitro experiments, appears to be TFIIB, followed by TFIIF and RNA polymerase II, then TFIIE. The final step in the assembly of the basal transcription complex is the binding of a general transcription factor called TFIIH. Some evidence suggests that following the binding of the TBP to DNA, the rest of the proteins in the initiation complex may assemble as a very large complex that then binds directly to the DNA. In any case, the presence of all of these general transcription factors and RNA polymerase II bound at the promoter is necessary for the initiation of transcription.
As in prokaryotic transcription, once the RNA polymerase binds, it can begin to synthesize a short stretch of RNA. This must be followed by promoter clearance, in which the polymerase moves down the template and elongates the transcript. Promoter clearance requires the action of TFIIH, a multifunctional factor that has helicase activity (i.e., it can open up a DNA double helix) as well as kinase activity. The kinase activity of TFIIH adds phosphates to the C-terminal domain (CTD) of RNA polymerase II. This phosphorylation appears to be the signal that releases the RNA polymerase from the basal transcription complex and allows it to move forward on the template, building the new RNA as it goes (Figure 7.66).
Termination of transcription in eukaryotes is not as well understood as it is in prokaryotes. Termination does not occur at a fixed distance from the 3’ end of mature RNAs; rather, it seems to occur hand in hand with the processing of the 3’ end of the primary transcript. The polyadenylation signal in the 3’ untranslated region of the transcript appears to play a role in RNA polymerase pausing and in the subsequent release of the completed primary transcript. Recognition of the polyadenylation signal triggers the binding of proteins involved in 3’ end processing and termination.
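The in vitro assembly order described above can be captured as an ordered list, together with a check for whether the complete basal complex is present. This Python sketch represents the stepwise model only; as the text notes, the factors may instead arrive as a preassembled complex after the TBP binds. The function names are hypothetical.

```python
from typing import List, Optional

# In vitro order of assembly from the text (TFIIF and Pol II join together).
ASSEMBLY_ORDER = ["TFIID (TBP)", "TFIIB", "TFIIF + RNA polymerase II", "TFIIE", "TFIIH"]


def next_to_bind(bound: List[str]) -> Optional[str]:
    """Return the next component expected to join, or None if assembly is complete."""
    if bound != ASSEMBLY_ORDER[:len(bound)]:
        raise ValueError("components listed out of the in vitro order")
    if len(bound) == len(ASSEMBLY_ORDER):
        return None
    return ASSEMBLY_ORDER[len(bound)]


def can_initiate(bound: List[str]) -> bool:
    """Initiation needs every general factor plus RNA polymerase II at the promoter."""
    return set(ASSEMBLY_ORDER) <= set(bound)


bound = []
while (component := next_to_bind(bound)) is not None:
    bound.append(component)
    print(f"{component} joins; ready to initiate: {can_initiate(bound)}")
```

Only the last line printed reports True, once TFIIH has joined.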
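Locating the polyadenylation signal in a transcript is a simple motif search. The text does not give the signal's sequence, so this Python sketch assumes the canonical hexamer AAUAAA and uses a regular-expression lookahead so that overlapping copies are all reported. The transcript fragment is invented for illustration, and recognition in cells also depends on additional sequences not modeled here.

```python
import re

POLYA_SIGNAL = "AAUAAA"  # assumed canonical hexamer; not given in the text


def find_polya_signals(transcript: str) -> list:
    """Return 0-based start positions of every AAUAAA, including overlapping copies."""
    return [m.start() for m in re.finditer(f"(?={POLYA_SIGNAL})", transcript.upper())]


utr3 = "CUGAAUAAAUAAAGC"  # hypothetical 3' UTR fragment
print(find_polya_signals(utr3))  # [3, 7]
```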
Information Processing: Transcription
Figure 7.52 - Overview of eukaryotic transcription
Figure 7.53 - Transcripts may code for protein or may be functional as RNAs
Figure 7.55 - RNA (green) being synthesized from DNA template (blue strand) by T7 RNA polymerase (purple). The non-template DNA strand is in red.
Figure 7.54 - The four ribonucleotides for making RNA
Figure 7.56 - Central dogma - DNA to RNA to protein
Figure 7.57 - Sequences upstream of transcription start site in several prokaryotic genes
Image by Martha Baker
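Percentage of promoters with the consensus base at each position:
-10 sequence: T 77%, A 76%, T 60%, A 61%, A 56%, T 82%
-35 sequence: T 69%, T 70%, G 61%, A 56%, C 54%, A 54%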
Figure 7.58 - RNA polymerase promoter binding
Figure 7.59 - A bacterial RNA polymerase (α2, β, β′ and ω)
Figure 7.60 - Synthesis of RNA in the transcription bubble
Figure 7.61 - Promoter and Terminator sequences determine where transcription starts and ends.
Figure 7.62 Transcription termination by intrinsic (top) and rho-dependent (bottom) mechanisms
Figure 7.63 - Eukaryotic DNA is complexed with proteins in chromatin
Figure 7.64 - Region surrounding the transcriptional start site in eukaryotic DNA
Figure 7.65 - Transcription pre-initiation complex in eukaryotes