Streptococcus pyogenes CRISPR-Cas9

in Complex with Guide RNA and DNA

  Trevor Manz '17 and Laura Duncan '17


I. Introduction

Clustered regularly interspaced short palindromic repeats (CRISPR) are sections of prokaryotic genomic DNA.  These sections contain repetitions of a DNA sequence separated by small fragments of unique "spacer DNAs."  The DNA spacers are derived from viral (phage) DNA and integrated into the genome between repeats.  Repeat-spacer arrays are located adjacent to Cas (CRISPR-associated) genes.  The CRISPR/Cas system functions as an acquired bacterial immune defense system in which Cas proteins associate with RNA transcribed from the spacer DNA to target foreign viral DNA.  If viral DNA is identified via hybridization with the CRISPR RNA, the nuclease activity of the Cas proteins cleaves the DNA, inducing a double-stranded break (DSB) and rendering the phage DNA inactive.

The sequence-specific manner in which Cas proteins target and cleave viral DNA led to the discovery and development of CRISPR/Cas systems as a tool for gene editing [1].  The targeting CRISPR RNA (sgRNA) can be synthesized to match a specific eukaryotic genomic locus, enabling the CRISPR/Cas system to cleave the locus of interest.  This effectively provides researchers with a relatively inexpensive and readily programmable tool to generate transgenic cell and animal lines.  The goal of this tutorial is to elucidate how Streptococcus pyogenes Cas9, one of the most widely used Cas proteins in gene engineering, binds target DNA via sgRNA and executes the chemical reactions required for cleavage.

II. General Structure

Color Scheme:

Cas9's crystal structure consists of two lobes: the recognition (REC) lobe, which binds sgRNA and DNA, and the nuclease (NUC) lobe, which cuts target DNA. A positively charged groove between the REC and NUC lobes [2] provides a location for the negatively charged sgRNA:DNA heteroduplex to form.  Three regions make up the REC lobe: the bridge helix, the REC1 domain, and the REC2 domain.  The NUC lobe is also composed of three regions: the RuvC domain, the HNH domain, and the PAM-interacting (PI) domain.
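The lobe and domain organization described above can be written down as a small lookup table. This is only an illustrative sketch: the lobe and domain names come from the text, while the dictionary layout and the helper name `lobe_of` are assumptions made for this example.

```python
# Lobe and domain names as listed in the text above; the dictionary layout
# is an illustrative choice, not part of the published structure.
CAS9_LOBES = {
    "REC": ["bridge helix", "REC1", "REC2"],
    "NUC": ["RuvC", "HNH", "PAM-interacting"],
}


def lobe_of(domain: str) -> str:
    """Return the lobe that contains the named domain (hypothetical helper)."""
    for lobe, domains in CAS9_LOBES.items():
        if domain in domains:
            return lobe
    raise KeyError(f"unknown domain: {domain}")


print(lobe_of("HNH"))
```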

III. Nuclease Activity

The RuvC domain within the NUC lobe contains the catalytic residues Asp10*, Glu762, His983, and Asp986, which allow RuvC to cleave DNA or RNA via a two-metal mechanism.  This two-metal mechanism uses Mg2+ ions to stabilize the breaking of a phosphodiester bond, causing a single-strand break in DNA or RNA, consistent with the mechanism used by other retroviral integrase superfamily nucleases, as illustrated below.

*mutated to alanine to prevent DNA cleavage during crystallization.


Putative two-step mechanism used by RuvC [3]:

The HNH domain, also within the NUC lobe, contains the catalytic residues Asp839, His840**, and Asn863.  Cleavage of DNA or RNA is made possible through a single-metal mechanism, which uses a Mg2+ ion to stabilize the breaking of a phosphodiester bond, causing a single-strand break in DNA or RNA.  The putative mechanism for HNH superfamily nucleases is illustrated below.

**mutated to alanine to prevent DNA cleavage during crystallization.
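The catalytic residues of the two nuclease domains, and the two alanine substitutions marked with asterisks above, can be expressed in the usual one-letter point-mutation notation. The sketch below is illustrative only: the residue lists are taken from the text, while the dictionary and function names are hypothetical.

```python
# Three-letter to one-letter codes for the residues named in this section.
THREE_TO_ONE = {"Asp": "D", "Glu": "E", "His": "H", "Asn": "N", "Ala": "A"}

# Catalytic residues as listed in the text for each nuclease domain.
CATALYTIC_RESIDUES = {
    "RuvC": ["Asp10", "Glu762", "His983", "Asp986"],
    "HNH": ["Asp839", "His840", "Asn863"],
}


def alanine_mutation(residue: str) -> str:
    """Convert e.g. 'Asp10' into point-mutation notation 'D10A'."""
    name, position = residue[:3], residue[3:]
    return f"{THREE_TO_ONE[name]}{position}{THREE_TO_ONE['Ala']}"


# The two substitutions used to prevent DNA cleavage during crystallization.
crystallization_mutations = [alanine_mutation(r) for r in ("Asp10", "His840")]
print(crystallization_mutations)  # ['D10A', 'H840A']
```

Each substitution removes one catalytic residue from one domain, which is why a single mutation such as D10A leaves the other nuclease domain able to cut its strand.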

Putative two-step mechanism used by HNH nucleases [4]:

IV. sgRNA:DNA Heteroduplex

The sgRNA:DNA heteroduplex is composed of a folded 98-nt sgRNA in complex with a complementary 23-nt DNA sequence via Watson-Crick pairing. The sgRNA sequence is synthetic and single stranded. In S. pyogenes, two separate RNAs, crRNA and tracrRNA, combine to form this complex. The synthetic sequence contains derivatives of crRNA and tracrRNA linked by an artificial tetraloop. The bound sgRNA:DNA complex comprises the repeat:anti-repeat duplex, the guide:target heteroduplex, and stem loops 1, 2, and 3.
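Watson-Crick pairing between the RNA guide and its DNA target can be sketched in a few lines. The example assumes standard antiparallel pairing (A:T, U:A, G:C, C:G); the function name and the short 8-nt guide are hypothetical and chosen only to keep the example readable.

```python
# RNA base -> complementary DNA base under Watson-Crick pairing.
RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}


def target_strand_for_guide(guide_rna: str) -> str:
    """Return the DNA strand (written 5'->3') that pairs with a guide RNA (5'->3')."""
    # The two strands of a duplex run antiparallel, so complement each base
    # and reverse the order to keep the result in 5'->3' orientation.
    return "".join(RNA_TO_DNA_COMPLEMENT[base] for base in reversed(guide_rna.upper()))


# Hypothetical 8-nt guide, far shorter than a real guide, for illustration.
print(target_strand_for_guide("GACUUAGC"))  # GCTAAGTC
```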

V. sgRNA Recognition

Sequence Independent Recognition:

The guide region of the sgRNA is recognized by the REC and NUC lobes. Arg66, Arg70, Arg74, and Arg78 of the bridge helix and Arg165 of REC1 form salt bridges with the sgRNA phosphate backbone. Hydrogen bonds are formed between the 2'-hydroxyl groups of G1, C15, U16, and G19 and Val1009, Tyr450, Arg447/Ile448, and Thr404, respectively. These interactions expose the Watson-Crick faces of the sgRNA guide to allow for binding to the target DNA. This recognition of the sgRNA guide by its secondary structure, rather than its base sequence, is key for gene engineering: sgRNAs can be designed to target various genomic loci by changing the guide sequence, and Cas9 cleavage will not be hindered.  The structure of stem loops 1, 2, and 3 also provides support for this mechanism.

Sequence Specific Recognition:

The sgRNA is also recognized in a primary sequence-specific manner by the REC and NUC lobes. The nucleobases of U23/A49 and A42/G43 of the repeat:anti-repeat sgRNA duplex hydrogen bond with Arg1122 and Phe351, respectively; U44 hydrogen bonds with Tyr325 and His328; and G43 hydrogen bonds with Asp364. The specific contacts that Cas9 makes with these bases underscore their importance in the sequence. Mutations to bases in the repeat:anti-repeat duplex significantly inhibit Cas9 cleavage efficiency.

VI. RNA-Guided DNA Targeting Mechanism

Cas9 recognizes the guide:target heteroduplex in a primary sequence-independent manner. The REC1 (Asn497, Trp659, Arg661, and Gln695), RuvC (Gln926), and PI (Glu1108) domains contact the phosphate backbone of the target DNA, and van der Waals interactions form between the C2' atoms of the target DNA and REC1 (Leu169, Tyr450, Met495, Met694, His698) and RuvC (Ala728). These interactions are likely what allow Cas9 to discriminate between RNA and DNA, since DNA does not contain a 2'-hydroxyl group. Additional recognition of the terminal base pair of the guide:target duplex (G1:C20') by the RuvC domain forms a unique end cap. The sgRNA G1 and DNA C20' nucleobases form stacking interactions with Tyr1013 and Val1015, and the phosphate backbone and 2'-hydroxyl group of G1 form a salt bridge and a hydrogen bond with Gln926 and Val1009, respectively. This unique terminal "capping" is likely what limits the size of the sgRNA guide, and subsequently the DNA target, to 17-20 bp [5]. Therefore, synthetic sgRNA guide sequences engineered to be longer than 20 bp do not improve the targeting specificity of Cas9 and are degraded in the cell.

VII.  Implications

The CRISPR/Cas9 system has proven to be a successful tool for gene engineering. Other useful gene editing systems exist, such as transcription activator-like effector nucleases (TALENs) and zinc finger nucleases (ZFNs), but these systems target and perform a DSB at a genomic locus via multiple DNA-interacting proteins fused with non-specific nucleases [6].

TALEN and ZFN systems can be difficult to engineer and require fusion protein/nucleases to be designed on opposite strands to ensure a DSB. In contrast, the CRISPR/Cas9 system is guided by an sgRNA with homology to either strand of the DNA. Therefore, as long as the genomic sequence is known, a single sgRNA can likely be designed to program CRISPR/Cas9 to target and cleave a locus of interest. The relative simplicity and low cost (Cas9 plasmids ~$65) [7] of the CRISPR/Cas9 system provide a more accessible and powerful tool to the scientific community.

Effective use of a new tool, however, requires an understanding of the constraints of the technology. For example, different Cas proteins recognize different protospacer adjacent motif (PAM) sequences in the DNA (the PAM was omitted from this crystal structure). The PAM sequence that CRISPR/Cas9 recognizes is 5'-NGG-3', meaning that the sgRNA guide must be designed to match the 20 bp immediately adjacent to this sequence. This somewhat limits where Cas9 can cut DNA, but typically there is a 5'-NGG-3' sequence within a locus of interest. The limit of a 20 bp guide is a larger constraint. As seen in the RNA-Guided DNA Targeting Mechanism section, Cas9's end-capping interaction with the guide:target heteroduplex limits the length of the target DNA region to ~20 bp. With approximately 6 billion DNA bp in a human diploid cell, there are many non-unique 20 bp segments throughout the genome. This greatly undermines the targeting specificity of CRISPR/Cas9, as it introduces the potential for multiple off-target mutations. To combat this, several online resources (e.g., crispr.mit.edu) have been developed to identify guide regions within a locus of interest with the lowest probability of off-targeting. In addition, other modified CRISPR/Cas9 systems, such as CRISPR/Cas9 D10A nickase, have been engineered by mutating one residue involved in cleavage to an alanine. This restricts the Cas9 nickase to cutting a single strand of DNA. Two sgRNAs are then designed to target opposite strands, so that paired cleavage by the Cas9 nickase is required to produce a DSB.
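The PAM constraint described above can be illustrated with a short scan for candidate guide sites. This is a minimal sketch, not the method used by any particular design tool. It assumes the protospacer is the 20 nt immediately 5' of an NGG PAM on the scanned strand, it does no off-target scoring, and the function names and the test sequence are hypothetical.

```python
import re

DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(DNA_COMPLEMENT)[::-1]


def find_guide_sites(seq: str, guide_len: int = 20):
    """List (strand, start, protospacer, pam) for every NGG PAM site.

    'start' is the 0-based index of the protospacer on the scanned strand;
    for the '-' strand it refers to the reverse-complemented sequence.
    """
    sites = []
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile(rf"(?=([ACGT]{{{guide_len}}})([ACGT]GG))")
    for strand, strand_seq in (("+", seq.upper()), ("-", reverse_complement(seq))):
        for match in pattern.finditer(strand_seq):
            sites.append((strand, match.start(), match.group(1), match.group(2)))
    return sites


# Hypothetical 25-nt sequence containing a single AGG PAM on the + strand.
example = "ATGATTACAGATTACAGATTAGGAT"
for site in find_guide_sites(example):
    print(site)
```

Scanning both strands reflects the point made earlier that the sgRNA can have homology to either strand. For a paired-nickase design, one would look for one "+" site and one "-" site close to each other.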

VIII. References

1. Jinek, Chylinski, Fonfara, Hauer, Doudna, Charpentier. 2012. A Programmable Dual-RNA–Guided DNA Endonuclease in Adaptive Bacterial Immunity. Science. 337: 816-821.

2. Nishimasu, Hiroshi, F. Ann Ran, Patrick D. Hsu, Silvana Konermann, Soraya Shehata, Naoshi Dohmae, Ryuichiro Ishitani, Feng Zhang, Osamu Nureki.  2014.  Crystal Structure of Cas9 in Complex with Guide RNA and Target DNA.  Cell.  156(5): 935-949. PDB 4OO8.

3. Cavanagh, Peter, and Anthony Garrity. "RuvC Nuclease." CRISPRCas9. N.p., n.d. Web. 18 Dec. 2015.

4. Cavanagh, Peter, and Anthony Garrity. "HNH Nuclease." CRISPRCas9. N.p., n.d. Web. 18 Dec. 2015.

5. Ran FA, Hsu PD, Lin CY, Gootenberg JS, Konermann S, Trevino AE, Scott DA, Inoue A, Matoba S, Zhang Y, et al. 2013. Double nicking by RNA-guided CRISPR Cas9 for enhanced genome editing specificity. Cell. 154: 1380-1389.

6. Gaj T, Gersbach, Barbas. 2013. ZFN, TALEN, and CRISPR/Cas-based methods for genome engineering. Trends in Biotechnology. 31: 397-405. 

7. Cong L, Ran FA, Cox D, Lin S, Barretto R, Habib N, Hsu PD, Wu X, Jiang W, Marraffini LA, Zhang F. 2013. Multiplex genome engineering using CRISPR/Cas systems. Science. 339: 819-823.
