Chemistry LibreTexts

Protein-RNA Recognition

RNA-protein interactions underlie a number of vital processes in the cell. Without the ability of particular proteins to bind RNA, RNA could not carry out its important functions as a component of the ribosome1,2 and the spliceosome.3 Other important RNA-protein interactions include the binding of tRNA to aminoacyl-tRNA synthetases, which is vital for translating genetic information into proteins,4 and the binding of RNA by ribonucleoprotein (RNP) proteins, which regulates gene expression at the post-transcriptional level.5

Although RNA-protein binding is not as well characterized as DNA-protein binding, the field has grown considerably in recent years. It was originally expected that RNA-binding motifs might fall neatly into categories, as DNA-binding motifs do. However, proteins recognize a wide range of RNA secondary and tertiary structures. This requires a greater variety of protein binding motifs, and the rules used to categorize them are correspondingly more complex.6 All major families of RNA-binding proteins have now been structurally characterized, and these structures have led to a much better understanding of RNA recognition.7

Major Families of RNA-Binding Proteins

Arginine-Rich Motif

Positively charged amino acid side chains are important in the binding of many proteins to both DNA and RNA. Certain viral and bacterial proteins contain regions densely populated by Arg and Lys residues. These regions contribute to recognition of the RNA major groove and are commonly called arginine-rich motifs.7 The term is slightly misleading: the positions of the arginines are not conserved between proteins, so these regions do not form a true sequence motif.8 Furthermore, these arginine-rich sequences are necessary for binding within the intact protein. When removed from the context of the larger protein, however, the fragments tend to become unstructured and are not sufficient for binding.7
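An arginine-rich motif is defined by composition rather than by conserved positions. One simple way to illustrate the idea is to scan a sequence with a sliding window and report stretches where Arg and Lys dominate. The sketch below is hypothetical: the function names, the 10-residue default window and the 0.5 cutoff are illustrative assumptions, not a published criterion for identifying arginine-rich motifs.

```python
"""Sketch: flag Arg/Lys-rich stretches in a protein sequence.

Hypothetical helpers; the default window width and cutoff are illustrative
assumptions, not an established definition of an arginine-rich motif.
"""


def basic_fraction(window: str) -> float:
    """Fraction of residues in `window` that are Arg (R) or Lys (K)."""
    return sum(aa in "RK" for aa in window) / len(window)


def find_basic_windows(seq: str, width: int = 10,
                       cutoff: float = 0.5) -> list[tuple[int, float]]:
    """Return (0-based start, R+K fraction) for each window at or above cutoff.

    The test looks only at composition, so it ignores where in the window
    each Arg sits, mirroring the lack of positional conservation.
    """
    seq = seq.upper()
    hits = []
    for start in range(len(seq) - width + 1):
        frac = basic_fraction(seq[start:start + width])
        if frac >= cutoff:
            hits.append((start, frac))
    return hits


# Usage with a made-up sequence (not a real protein):
print(find_basic_windows("AARRAA", width=2))
# → [(1, 0.5), (2, 1.0), (3, 0.5)]
```

A composition-based window is used here because, as noted above, the arginine positions themselves are not conserved. A fixed-position pattern would therefore miss the point.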

One example is Rev, a protein encoded by human immunodeficiency virus type 1 (HIV-1) that is involved in regulating gene expression. Binding of Rev to the Rev Responsive Element (RRE) of the viral RNA is an essential step in the HIV life cycle and involves widening of the RNA major groove to facilitate binding.7,9,10 After binding to the RNA hairpin, Rev adopts an α-helical conformation. Such conformational changes upon major-groove recognition are typical of arginine-rich motifs.7

 
Figure 1. RNA major groove recognition by the arginine-rich α-helix in the HIV-1 Rev peptide-RRE RNA complex. (PDBID:1ETG)
 

Other proteins with arginine-rich motifs include the Tat protein of bovine immunodeficiency virus (BIV)11 and the N protein of bacteriophage λ.12 In both cases the arginine-rich region binds in the major groove of the RNA, but the bound conformations differ. The BIV Tat peptide forms a β-hairpin, whereas the N peptide forms an α-helix that is distorted.11,12

Figure 2. β-hairpin domain of BIV Tat protein complexed with TAR region RNA. (PDBID:1MNB)

αβ protein domains

There are three major families of αβ domains that bind RNA:7 the RNA recognition motif (RRM, also called RNP), the K-homology (KH) domain, and the double-stranded RNA-binding motif (dsRBM). In addition to these three families, a variety of less common αβ motifs bind RNA, such as the Piwi/Argonaute/Zwille (PAZ) domain.7

RNA Recognition Motif

Proteins with the RRM make up the largest family of RNA-binding proteins, and the RRM itself is the most thoroughly characterized RNA-binding domain.7 It is about 90 amino acids long and contains a conserved eight-residue sequence.13 The domain adopts a β1α1β2β3α2β4 fold in which the β-strands form an antiparallel β-sheet, with β1 and β3 at its center.8 The conserved residues, which are mostly aromatic and positively charged, are located on the central β1 and β3 strands.14 Most RRMs are highly sequence-specific and bind with high affinity.13

One of the RRM's major functions is to provide an RNA-binding platform that directs RNA-processing factors to specific RNAs. The RRM is also involved in post-transcriptional events in eukaryotes.15 RRM function is highly varied because the domain can adopt more than 30 structural variants by altering its loops, the positions of its secondary structures, and the secondary structures themselves.13 Function can be diversified further by combining multiple RRM domains, other types of domains, N- and C-terminal extensions, and protein cofactors.13

The human U1A protein contains an RRM that recognizes the sequence AUUGCAC in RNA hairpin and internal loops and hydrogen bonds to its bases.16,17 In U1A, the β2-β3 loop of the protein inserts into the hairpin loop of the RNA. Before binding, both the β2-β3 loop of the protein and the hairpin loop of the RNA are disordered.14,18 The interaction can therefore be understood as a mutual adaptation of the two structures, stabilized by important inter- and intramolecular interactions.

Figure 3. Structure of the human U1A protein with bound RNA. The β2-β3 loop is inserted into the hairpin loop of the RNA. (PDBID:1URN)

K-Homology Domain

The K-homology (KH) domain is approximately 70 amino acids long. Proteins typically contain more than one KH domain, although exceptions exist.19 A single KH domain binds RNA with low (micromolar) affinity, and multiple copies increase both affinity and specificity.20 Conserved features of the domain include a GXXG loop, a variable loop 3 to 60 residues long, and a hydrophobic residue (Ile or Leu).21 Also conserved are a three-stranded β-sheet and three α-helices packed against it.
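The GXXG loop is a sequence pattern: glycine, any two residues, then glycine. Candidate occurrences can therefore be located with a regular expression. The sketch below uses Python's standard `re` module, and the helper name is hypothetical. A match is only a candidate, because GXXG tetrapeptides also occur by chance outside KH domains. Identifying a real KH domain depends on the whole fold, not on this loop alone.

```python
import re

# Lookahead so that overlapping candidates (e.g. in "GGGGG") are all reported.
GXXG = re.compile(r"(?=(G..G))")


def find_gxxg(seq: str) -> list[tuple[int, str]]:
    """Return (0-based start, tetrapeptide) for every GXXG occurrence in seq."""
    return [(m.start(), m.group(1)) for m in GXXG.finditer(seq.upper())]


# Made-up sequence, for illustration only.
print(find_gxxg("VIGKGGSNIK"))  # → [(2, 'GKGG')]
```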

Variation in the β-strands gives rise to two types of KH fold.22 The three β-strands are designated β1, β2, and β', where β1 and β2 are parallel and β' is antiparallel. Type I folds, usually found in eukaryotes, have the strands aligned in the order β1, β', β2. Type II folds, usually found in prokaryotes, have the order β1, β2, β'. A second difference is that proteins with type I folds generally have multiple copies of the KH domain, whereas those with type II folds usually have only one.21

KH domains bind single-stranded RNA or DNA. The strand binds between the α1 and α2 helices, and between the GXXG loop and the variable loop.21 The KH domain is found primarily in proteins that regulate transcription and translation by binding single-stranded RNA. A single KH domain binds only four bases, although multiple KH domains can work together to bind longer sequences.21
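A back-of-the-envelope calculation makes concrete why one KH domain reading four bases gives limited specificity, while several domains together give more. Assume, purely for illustration, a random RNA in which all four bases are equally likely and independent. A fixed site of k nucleotides then matches any given window with probability 4^-k. The helper name and the 1,000-nucleotide length below are hypothetical. Treating two KH domains as reading one contiguous 8-nucleotide site is also a simplifying assumption.

```python
def expected_matches(seq_len: int, site_len: int) -> float:
    """Expected number of exact matches to one fixed site in a random RNA.

    Assumes equiprobable, independent bases, so each of the
    (seq_len - site_len + 1) windows matches with probability 4**-site_len.
    """
    if site_len > seq_len:
        return 0.0
    return (seq_len - site_len + 1) / 4 ** site_len


# One KH domain reading 4 nt vs. two domains jointly reading 8 nt
# (simplified here as a single contiguous site) in a 1,000-nt RNA:
print(expected_matches(1000, 4))  # → 3.89453125
print(expected_matches(1000, 8))  # → about 0.0152
```

Under these assumptions, a given 4-nt site is expected a few times per 1,000 nucleotides by chance. An 8-nt site is expected far less than once, which fits the statement that combining KH domains increases specificity.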

Figure 4. Crystal structure of Nova-2 protein KH3 domain recognizing single-stranded SELEX RNA sequence 5'-UCAC. (PDBID: 1EC6)

Double Stranded RNA Binding Motif 

The double-stranded RNA-binding motif (dsRBM) has an αββα fold that binds RNA molecules.23 Proteins contain a variable number of copies of this domain, up to a maximum of five.24 Studies show that dsRBMs are not sequence-specific but do require RNA in an A-form helical conformation.25

Proteins containing dsRBMs can be divided into two categories, depending on whether they also have a catalytic domain.24 RNase III, Dicer, and Drosha are involved in dsRNA processing in the RNAi/miRNA pathway, and all three contain a dsRBM as well as a catalytic domain.25 RNA helicase A (RHA), a transcriptional cofactor, also contains dsRBMs together with a catalytic domain.26 The Staufen protein and the Xenopus laevis RNA-binding protein Xlrbpa have the motif without a catalytic domain, and both are involved in RNP localization.27,28

The dsRBM recognizes A-form helix geometry through residues in loops 2 and 4 of its αββα fold. These loops contact 2'-OH and phosphate groups in the adjacent minor and major grooves.25 In another mode of interaction, the first helix of the dsRBM recognizes helical or non-helical secondary-structure elements that present a minor-groove-like surface.29

The dsRBM is unusual in that it binds specifically in vivo,30 but non-specifically in vitro.31,32 What determines its binding specificity remains an area of ongoing study.

Figure 5. dsRBM of Rnt1p RNase III complexed with 5' terminal RNA hairpin of snR47 precursor. (PDBID:1T4L)

Zinc Fingers

Zinc fingers are a common structural motif in which conserved cysteine and histidine residues coordinate a zinc ion to form small, independently folded domains.33 Proteins typically contain multiple copies of these domains.7,33 Proteins containing the classic Cys-Cys-His-His (CCHH) motif are encoded by about 3% of human genes.7 Zinc fingers are found most often in DNA-binding proteins, but some types, such as the CCCH and CCHC types, can bind RNA.
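The spacing of the zinc-coordinating residues in the classic CCHH finger is often summarized as C-X(2-4)-C-X(12)-H-X(3-5)-H, where X is any amino acid. This sketch takes that spacing as an assumption, although real fingers vary. The hypothetical helper below scans a protein sequence for candidate classic zinc fingers with Python's `re` module. A match shows only that the spacing fits, not that the segment folds, binds zinc, or binds RNA.

```python
import re

# Assumed consensus spacing for a classic CCHH (C2H2) finger:
# C, 2-4 residues, C, 12 residues, H, 3-5 residues, H.
# The lookahead lets overlapping candidates be reported.
C2H2 = re.compile(r"(?=(C.{2,4}C.{12}H.{3,5}H))")


def find_c2h2(seq: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched segment) for each candidate CCHH finger."""
    return [(m.start(), m.group(1)) for m in C2H2.finditer(seq.upper())]


# Made-up segment built to fit the assumed spacing exactly.
toy = "C" + "A" * 2 + "C" + "A" * 12 + "H" + "A" * 3 + "H"
print(find_c2h2(toy))  # one hit, starting at position 0
```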

The zinc finger fold is composed of two distorted β-hairpins that surround a tryptophan residue and a zinc ion.34,35 Zinc fingers bind double-stranded RNA through contacts with phosphates and through hydrophobic stacking.33 When binding single-stranded RNA, aromatic side chains of the motif insert between the bases of a dinucleotide in an AU-rich region.33 Binding also involves direct and water-mediated hydrogen bonds to protein atoms, but no interaction with the phosphate backbone.36

Zinc finger domains can recognize many RNA secondary and tertiary structures, including internal loops and helical elements closed by hairpin loops.7 Zinc fingers bind RNA with micromolar affinity and are sequence-specific.36 Sequence-specific recognition is achieved through contacts with bases exposed in loops and through interactions mediated by the backbone.7

Transcription factor IIIA (TFIIIA) is the main component of the initiation complex needed to transcribe the genes encoding eukaryotic 5S ribosomal RNA.33 It contains nine zinc fingers and can bind both DNA and 5S ribosomal RNA. The central fingers, 4 through 6, bind the RNA, making only a few contacts with nucleotide bases.37

 

Figure 6. Structure of three-finger TFIIIA-5S RNA complex demonstrating zinc-finger binding. Zinc ions are shown in magenta. (PDBID:1UN6)

Multimeric Proteins

Multimeric proteins form heteromeric or homomeric structures, either by assembling multiple proteins or by repeating the same structural motif.7 In many cases, the details of how these proteins recognize RNA are not well understood.7,38

The human Pumilio protein, a translational regulator, builds its RNA-binding domain from multiple repeats of the same structural motif. The RNA binds along the inner surface of the resulting modular structure.39

Figure 7. Structure of modular protein Human Pumilio bound with noncognate RNA. (PDBID:3BSX)

RNA-Targeting Enzymes

Given the wide variety of enzymes that act on RNA, it would not be meaningful to treat them as a single protein family, even though they all share the function of modifying RNA.7 Modes of recognition therefore vary greatly depending on the enzyme involved. The enzymatic domain of the protein binds the RNA, often increasing enzymatic activity.7

One example of enzymes that recognize and act on RNA is the aminoacyl-tRNA synthetases (aaRSs). These attach amino acids to their cognate tRNAs and thereby prevent mistranslation.40 Many aaRSs have been studied, often with a focus on key residues.41 Both the interaction points and the nucleotides responsible for specificity vary considerably. A common feature of all aaRSs studied thus far is contact between the anticodon stem of the tRNA and the synthetase.41 The orientation of this interaction differs depending on the specific aaRS.42 Another feature is the motif 2 loop. It is located in the active site and is involved in binding ATP, the amino acid, and the acceptor end of the tRNA.43 This loop requires glycine for flexibility.44

RNA Structural Motifs

Like the proteins that bind to it, RNA has structural motifs that have roles in interaction and recognition, though many of these roles have not been thoroughly explored.45

In addition to helices, RNA contains various loops that fold in characteristic ways and may also show characteristic sequence patterns.46 Aside from helices, the two main RNA structural elements are the hairpin loop, which connects the two antiparallel strands of a helix, and the internal loop, which joins two helices.45 Some of the known RNA motifs are described below.

The most common motif is the tetraloop, a hairpin loop of four nucleotides.45 Tetraloop sequences fall into specific consensus patterns: GNRA (where N is any base and R is G or A), UNCG, and CUUG.46 Proposed functions of tetraloops include initiating the folding of RNA molecules, stabilizing helical stems, and providing recognition elements for protein binding.47 Because tetraloops are always exposed, they likely play a large role in protein interactions.46 Tetraloops are examples of 1-loops, which stay within a single helix.46
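The consensus patterns above use standard IUPAC nucleotide codes (N for any base, R for a purine), so they translate directly into regular expressions. The sketch below assigns a four-nucleotide hairpin loop sequence to the GNRA, UNCG, or CUUG family, and its helper names are hypothetical. It checks sequence only. Whether a given loop actually adopts the tetraloop structure is a separate question that the code does not address.

```python
import re
from typing import Optional

# IUPAC codes needed for these consensus patterns.
IUPAC = {"A": "A", "C": "C", "G": "G", "U": "U", "N": "[ACGU]", "R": "[AG]"}
TETRALOOP_FAMILIES = ("GNRA", "UNCG", "CUUG")


def consensus_to_regex(consensus: str) -> "re.Pattern[str]":
    """Compile a consensus such as 'GNRA' into the regex G[ACGU][AG]A."""
    return re.compile("".join(IUPAC[code] for code in consensus))


PATTERNS = {family: consensus_to_regex(family) for family in TETRALOOP_FAMILIES}


def classify_tetraloop(loop: str) -> Optional[str]:
    """Return the consensus family matched by a 4-nt loop, or None."""
    loop = loop.upper().replace("T", "U")  # tolerate DNA-style input
    for family, pattern in PATTERNS.items():
        if pattern.fullmatch(loop):  # fullmatch also enforces length 4
            return family
    return None


for loop in ("GAAA", "UUCG", "CUUG", "GCUA"):
    print(loop, classify_tetraloop(loop))
# GAAA GNRA / UUCG UNCG / CUUG CUUG / GCUA None
```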

Another common motif is the kink-turn (K-turn). It is a 15-nucleotide motif formed by two separate RNA segments that base pair to create an internal loop and two helices.45 The kink-turn is an example of a 2-loop, because it joins two helices.46 E-loops are similar to kink-turns, but they join asymmetrical helices.46 Like tetraloops, kink-turns have been found to be frequently associated with bound proteins.46 There is a wide variety of 2-loops. They can be double-stranded, like kink-turns and E-loops, or single-stranded, like Π-turns and Ω-turns.45,46

References

  1. Moore, P.B. The three-dimensional structure of the ribosome and its components. Annu. Rev. Biophys. Biomol. Struct. 1998, 27, 35–58.
  2. Ramakrishnan, V.; White, S.W. Ribosomal protein structures: insights into the architecture, machinery and evolution of the ribosome. Trends Biochem. Sci. 1998, 23, 208–212.
  3. Luhrmann, R.; Kastner, B.; Bach, M. Structure of spliceosomal snRNP’s and their role in pre-mRNA splicing. Biochim. Biophys. Acta. 1990, 1087, 265–292.
  4. Moras, D. Aminoacyl-tRNA synthetases. Curr. Opin. Struct. Biol. 1992, 2, 138–142.
  5. Varani, G.; Nagai, K. RNA recognition by RNP proteins during RNA processing. Annu. Rev. Biophys. Biomol. Struct. 1998, 27, 407–445.
  6. Tan, R.; Frankel, A.D. Costabilization of Peptide and RNA Structure in an HIV Rev Peptide-RRE Complex. Biochem. 1994, 33, 14579-14585.
  7. Chen, Y.; Varani, G. Protein families and RNA recognition. FEBS J. 2005, 272, 2088–2097.
  8. Draper, D.E. Themes in RNA-Protein Recognition. J. Mol. Biol. 1999, 293, 255-270.
  9. Heaphy, S.; Finch, J.T.; Gait, M.J.; Karn, J.; Singh, M. Human immunodeficiency virus type 1 regulator of virion expression, Rev, forms nucleoprotein filaments after binding to a purine-rich ‘bubble’ located within the Rev-response region of viral mRNAs. Proc. Natl. Acad. Sci. USA. 1991, 88, 7366–7370.
  10. Battiste, J.L.; Mao, H.; Rao, N.S.; Tan, R.; Muhandiram, D.R.; Kay, L.E.; Frankel, A.D.; Williamson, J.R. α-Helix-RNA major groove recognition in an HIV-1 Rev peptide-RRE RNA complex. Science. 1996, 273, 1547–1551.
  11. Puglisi, J.D.; Chen, L.; Blanchard, S.; Frankel, A.D. Solution structure of a bovine immunodeficiency virus Tat-TAR peptide-RNA complex. Science. 1995, 270, 1200–1203. 
  12. Legault, P.; Li, J.; Mogridge, J.; Greenblatt, J.; Kay, L.E. NMR structure of the bacteriophage λ N peptide/boxB RNA complex: recognition of a GNRA fold by an arginine-rich motif. Cell. 1998, 93, 289–299.
  13. Maris, C.; Dominguez, C.; Allain, F.H.T. The RNA recognition motif, a plastic RNA-binding platform to regulate post-transcriptional gene expression. FEBS J. 2005, 272, 2118-2131.
  14. Nagai, K.; Oubridge, C.; Jessen, T.H.; Li, J.; Evans, P.R. Crystal structure of the RNA binding domain of the U1 small nuclear ribonucleoprotein A. Nature. 1990, 348, 515-520.
  15. Birney, E.; Kumar, S.; Krainer, A.R. Analysis of the RNA-recognition motif and RS and RGG domains: conservation in metazoan pre-mRNA splicing factors. Nucleic Acids Res. 1993, 21, 5803–5816.
  16. Scherly, D.; Boelens, W.; van Venrooij, W.J.; Dathan, N.A.; Hamm, J.; Mattaj, I.W. Identification of the RNA binding segment of human U1 A protein and definition of its binding site on U1 snRNA. EMBO J. 1989, 8, 4163-4170.
  17. van Gelder, C.W.G.; Gunderson, S.I.; Jansen, E.J.R.; Boelens, W.C.; Polycarpou-Schwartz, M.; Mattaj, I.W.; van Venrooij, W.J. A complex secondary structure in U1A pre-mRNA that binds two molecules of U1A protein is required for regulation of polyadenylation. EMBO J. 1993, 12, 5191-5200.
  18. Oubridge, C.; Ito, N.; Evans, P.R.; Teo, C-H.; Nagai, K. Crystal structure at 1.92 Å resolution of the RNA-binding domain of the U1A spliceosomal protein complexed with an RNA hairpin. Nature. 1994, 372, 432–438.
  19. Siomi, H.; Matunis, M.J.; Michael, W.M.; Dreyfuss, G. The pre-mRNA binding K protein contains a novel evolutionarily conserved motif. Nucleic Acids Res. 1993, 21, 1193–1198.
  20. Lunde, B.M.; Moore, C.; Varani, G. RNA-binding proteins: modular design for efficient function. Nat. Rev. Mol. Cell Biol. 2007, 8, 479–490.
  21. Valverde, R.; Edwards, L.; Regan, L. Structure and function of KH domains. FEBS J. 2008, 275, 2712-2726.
  22. Grishin, N.V. KH domain: one motif, two folds. Nucleic Acids Res. 2001, 29, 638–643.
  23. Liao, H.J.; Kobayashi, R.; Mathews, M.B. Activities of adenovirus virus-associated RNAs: Purification and characterization of RNA-binding proteins. Proc. Natl. Acad. Sci. USA. 1998, 95, 8514–8519.
  24. Fierro-Monti, I.; Mathews, M.B. Proteins binding to duplexed RNA: one motif, multiple functions. Trends Biochem. Sci. 2001, 25, 241–246.
  25. Chang, K.; Ramos, A. The double-stranded RNA-binding motif, a versatile macromolecular docking platform. FEBS J. 2005, 272, 2109–2117.
  26. Zhang, S.; Grosse, F. Domain structure of human nuclear DNA helicase II (RNA helicase A). J. Biol. Chem. 1997, 272, 11487–11494.
  27. St Johnston, D.; Beuchle, D.; Nusslein-Volhard, C. Staufen, a gene required to localize maternal RNAs in the Drosophila egg. Cell. 1991, 66, 51–63.
  28. Eckmann, C.R.; Jantsch, M.F. XlrbpA, a double-stranded RNA-binding protein associated with ribosomes and heterogeneous nuclear RNPs. J. Cell Biol. 1997, 138, 239–253.
  29. Wu, H.; Henras, A.; Chanfreau, G.; Feigon, J. Structural basis for recognition of the AGNN tetraloop RNA fold by the double-stranded RNA-binding domain of Rnt1p RNase III. Proc. Natl. Acad. Sci. USA. 2004, 101, 8307–8312.
  30. Ferrandon, D.; Elphick, L.; Nusslein-Volhard, C.; St Johnston, D. Staufen protein associates with the 3'UTR of bicoid mRNA to form particles that move in a microtubule-dependent manner. Cell. 1994, 79, 1221–1232.
  31. Bevilacqua, P.C.; Cech, T.R. Minor-groove recognition of the double-stranded RNA-binding domain from the RNA-activated protein kinase PKR. Biochem. 1996, 35, 9983–9994. 
  32. Bevilacqua, P.C.; George, C.X.; Samuel, C.E.; Cech, T.R. Binding of the protein kinase PKR to RNAs with secondary structure defects: role of the tandem A-G mismatch and noncontiguous helixes. Biochem. 1998, 37, 6303–6316.
  33. Brown, R.S. Zinc finger proteins: getting a grip on RNA. Curr. Opin. Struct. Biol. 2005, 15, 94–98.
  34. Plambeck, C.A.; Kwan, A.H.; Adams, D.J.; Westman, B.J.; van der Weyden, L.; Medcalf, R.L.; Morris, B.J.; Mackay, J.P. The structure of the zinc finger domain from human splicing factor ZNF265 fold. J. Biol. Chem. 2003, 278, 22805–22811.
  35. Wang, B.; Alam, S.L.; Meyer, H.H.; Payne, M.; Stemmler, T.L.; Davis, D.R.; Sundquist, W.I. Structure and ubiquitin interactions of the conserved zinc finger domain of Npl4. J. Biol. Chem. 2003, 278, 20225–20234.
  36. Loughlin, F. E.; Mansfield, R. E.; Vaz, P. M.; McGrath, A. P.; Setiyaputra, S.; Gamsjaeger, R.; Chen, E. S.; Morris, B. J.; Guss, J. M.; Mackay, J. P. The zinc fingers of the SR-like protein ZRANB2 are single-stranded RNA-binding domains that recognize 5′ splice site-like sequences. Proc. Natl. Acad. Sci. USA. 2009, 106, 5581–5586.
  37. Searles, M.A.; Lu, D.; Klug, A. The role of the central zinc fingers of transcription factor IIIA in binding to 5S RNA. J. Mol. Biol. 2000, 301, 47–60.
  38. Thore, S.; Mayer, C.; Sauter, C.; Weeks, S.; Suck, D. Crystal structures of the Pyrococcus abyssi Sm core and its complex with RNA. J. Biol. Chem. 2003, 278, 1239–1247.
  39. Wang, X.; McLachlan, J.; Zamore, P.D.; Tanaka-Hall, T.M. Modular recognition of RNA by a human Pumilio-homology domain. Cell. 2002, 110, 501–512.
  40. Beebe, K.; Mock, M.; Merriman, E.; Schimmel, P. Distinct domains of tRNA synthetase recognize the same base pair. Nature. 2008, 451, 90–93.
  41. Theobald, A.; Springer, M.; Grunberg-Manago, M.; Ebel, J.; Giegé, R. Tertiary structure of Escherichia coli tRNA3Thr in solution and interaction of the tRNA with the cognate threonyl-tRNA synthetase. Eur. J. Biochem. 1988, 175, 511–524.
  42. Moras, D.; Lorber, B.; Romby, P.; Ebel, J. P.; Giegé, R.; Lewit-Bentley, A.; Roth, M. J. Biomol. Struct. Dyn. 1983, 1, 209–233.
  43. Gruic-Sovulj, I.; Landeka, I.; Söll, D.; Weygand-Durasevic, I. tRNA-dependent amino acid discrimination by yeast seryl-tRNA synthetase. Eur. J. Biochem. 2002, 269, 5271–5279.
  44. Cusack, S.; Yaremchuk, A.; Tukalo, M. The crystal structure of the ternary complex of T. thermophilus seryl-tRNA synthetase with tRNASer and a seryl-adenylate analogue reveals a conformational switch in the active site. EMBO J. 1996, 15, 2834–2842.
  45. Ciriello, G.; Gallina, C.; Guerra, C. Analysis of interactions between ribosomal proteins and RNA structural motifs. BMC Bioinformatics. 2010, 11, Suppl 1:S4.
  46. Apostolico, A.; Ciriello, G.; Guerra, C.; Heitsch, C.E.; Hsiao, C.; Williams, L.D. Finding 3D motifs in ribosomal RNA structures. Nucleic Acids Res. 2009, 37.
  47. Hsiao, C.; Mohan, S.; Hershkovitz, E.; Tannenbaum, A.; Williams, L.D. Single nucleotide RNA choreography. Nucleic Acids Res. 2006, 34, 1481–1491.

Contributors

  • Therese Gerbich, Truman State University, Kirksville, MO
  • Ashley Hoaglin, Truman State University, Kirksville, MO