Streptococcus pyogenes CRISPR-Cas9
in Complex with Guide RNA and DNA
Trevor Manz '17 and Laura Duncan '17
I. Introduction
Clustered regularly interspaced short palindromic repeats (CRISPR)
are sections of prokaryotic genomic DNA. These sections contain
repetitions of a DNA sequence separated by short, unique fragments
called "spacer DNAs." The spacers are derived from phage (viral) DNA
and integrated into the genome between repeats. Repeat-spacer arrays
are located adjacent to Cas (CRISPR-associated) genes.
The CRISPR/Cas system functions as an acquired bacterial immune
defense system: Cas proteins associate with RNA transcribed from the
spacer DNA (CRISPR RNA) to target foreign viral DNA. If viral DNA is
identified via hybridization with the CRISPR RNA, the nuclease
activity of the Cas proteins cleaves the DNA, inducing a
double-stranded break (DSB) and rendering the phage DNA inactive.
The sequence-specific manner
in which Cas proteins target and cleave viral DNA led to the
discovery and development of CRISPR/Cas systems as a tool for gene
editing1.
The targeting RNA, a single guide RNA (sgRNA), can be synthesized to
match a specific eukaryotic genomic locus, enabling the CRISPR/Cas
system to cleave the locus of interest. This provides researchers
with a relatively inexpensive and readily programmable tool to
generate transgenic cell and animal lines. The goal of this tutorial
is to elucidate how Streptococcus pyogenes Cas9, one of the most
widely used Cas proteins in gene engineering, binds target DNA via
sgRNA and executes the chemical reactions required for cleavage.
II. General Structure
Cas9's crystal structure consists of two lobes: the recognition
(REC) lobe, which binds sgRNA and DNA, and the nuclease (NUC) lobe,
which cuts target DNA. A positively charged groove between the REC
and NUC lobes2 provides a location for the negatively charged
sgRNA:DNA heteroduplex to form. Three regions make up the REC lobe:
the bridge helix, the REC1 domain, and the REC2 domain. The NUC lobe
also comprises three regions: the RuvC domain, the HNH domain, and
the PAM-interacting (PI) domain.
III. Nuclease Activity
The RuvC domain within the NUC lobe contains the catalytic residues
Asp10*, Glu762, His983, and Asp986, which allow RuvC to cleave DNA
or RNA via a two-metal mechanism. This mechanism uses Mg2+ ions to
stabilize the breaking of a phosphodiester bond, causing a
single-strand break in DNA or RNA, consistent with the method used
by other retroviral integrase superfamily nucleases, as illustrated
below.
*Mutated to alanine to prevent DNA cleavage during crystallization.
Putative two-step mechanism used by RuvC3:
The HNH domain, also within the NUC lobe, contains the catalytic
residues Asp839, His840**, and Asn863. Cleavage of DNA or RNA is
made possible through a single-metal mechanism, which uses a Mg2+
ion to stabilize the breaking of a phosphodiester bond, causing a
single-strand break in DNA or RNA. The putative mechanism for HNH
superfamily nucleases is illustrated below.
**Mutated to alanine to prevent DNA cleavage during crystallization.
Putative two-step mechanism used by HNH nucleases4:
IV. sgRNA:DNA Heteroduplex
The sgRNA:DNA complex is composed of a folded 98-nt sgRNA bound to a
complementary 23-nt DNA sequence via Watson-Crick pairing. The sgRNA
sequence is synthetic and single stranded. In S. pyogenes, two
separate RNAs, crRNA and tracrRNA, combine to form this complex. The
synthetic sequence contains derivatives of crRNA and tracrRNA joined
by an artificial linker. The bound sgRNA:DNA complex forms a
structure composed of the repeat:anti-repeat duplex, the
guide:target heteroduplex, and stem loops 1, 2, and 3.
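The Watson-Crick pairing between the guide and its target can be sketched in a few lines of Python. This is only an illustration: the function names and the 20-nt example guide are hypothetical and are not taken from the crystal structure. Given an RNA guide written 5' to 3', it derives the DNA strand that would pair with it.

```python
# Watson-Crick partners when an RNA guide pairs with a DNA strand.
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}


def target_strand_for_guide(guide_rna: str) -> str:
    """Return the DNA strand (5'->3') that base-pairs with guide_rna (5'->3')."""
    guide = guide_rna.upper()
    unknown = set(guide) - set(RNA_TO_DNA)
    if unknown:
        raise ValueError(f"not an RNA sequence: {sorted(unknown)}")
    # The two strands are antiparallel: complement each base, then
    # reverse so that the result is also written 5'->3'.
    return "".join(RNA_TO_DNA[base] for base in reversed(guide))


def is_heteroduplex(guide_rna: str, target_dna: str) -> bool:
    """True if target_dna is fully complementary to guide_rna."""
    return target_strand_for_guide(guide_rna) == target_dna.upper()


# Hypothetical 20-nt guide, used only for illustration.
example_guide = "GGAUCCAUGCAAGCUUACGU"
print(target_strand_for_guide(example_guide))
```

Only the guide region is modeled here; the repeat:anti-repeat duplex and stem loops pair within the sgRNA itself and are not part of this sketch.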
V. sgRNA Recognition
Sequence Independent Recognition:
The guide region of the sgRNA is recognized by the REC and NUC
lobes. Arg66, Arg70, Arg74, and Arg78 of the bridge helix and Arg165
of REC1 form salt bridges with the sgRNA phosphate backbone.
Hydrogen bonds are formed between the 2'-hydroxyl groups of G1, C15,
U16, and G19 and Val1009, Tyr450, Arg447/Ile448, and Thr404,
respectively. These interactions expose the Watson-Crick faces of
the guide sgRNA to allow for binding to the target DNA.
This structure-based, sequence-independent recognition of the sgRNA
guide region by Cas9 is key for gene engineering: sgRNAs can be
designed to target different genomic loci by changing the guide
sequence without hindering Cas9 cleavage. The structure of stem
loops 1, 2, and 3 also provides support for this mechanism.
Sequence Specific Recognition:
The sgRNA is
also recognized in a primary sequence-specific manner by the
REC and NUC lobes. The nucleobases of U23/A49
and A42/G43
of the repeat:anti-repeat sgRNA duplex hydrogen bond with
Arg1122 and Phe351,
respectively; U44
hydrogen bonds with
Tyr325 and His328;
and G43
hydrogen bonds with Asp364.
The specific coordination of these nucleotides by Cas9 underscores
their importance in the sequence. Mutations to bases in the
repeat:anti-repeat duplex significantly inhibit Cas9 cleavage
efficiency.
VI. RNA-Guided DNA Targeting Mechanism
Cas9 recognizes the guide:target heteroduplex in a primary
sequence-independent manner. The REC1 (Asn497, Trp659, Arg661, and
Gln695), RuvC (Gln926), and PI (Glu1108) domains form contacts with
the phosphate backbone of the target DNA, and further contacts form
between the C2' atoms of the target DNA and REC1 (Leu169, Tyr450,
Met495, Met694, His698) and RuvC (Ala728). These interactions are
likely what allow Cas9 to discriminate between RNA and DNA, since
DNA does not contain a 2'-hydroxyl group. Additional recognition of
the terminal base pair of the guide:target duplex (G1:C20') by the
RuvC domain forms a unique end-capping interaction. The sgRNA G1 and
DNA C20' nucleobases form stacking interactions with Tyr1013 and
Val1015, and the phosphate backbone and 2'-hydroxyl group of G1 form
a salt bridge and a hydrogen bond with Gln926 and Val1009,
respectively. This unique terminal "capping" is likely what limits
the size of the guide sgRNA, and thus the DNA target, to 17-20 bp.5
Therefore, synthetic sgRNA guide sequences engineered to be longer
than 20 bp do not improve the targeting specificity of Cas9 and are
degraded in the cell.
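To give a sense of scale for the 17-20 bp window, the short calculation below counts how many distinct sequences a guide of a given length can specify and compares that with the roughly 6 billion bp of a diploid human cell (the figure quoted in the Implications section). It is a back-of-the-envelope sketch under a strong assumption that is not part of the original tutorial: the genome is treated as a uniformly random sequence in which both strands can be targeted. Real genomes contain many repeated sequences, so actual match counts are higher than this model suggests.

```python
# Diploid human genome size as quoted in this tutorial (approximate).
GENOME_BP = 6_000_000_000


def possible_guides(length: int) -> int:
    """Number of distinct sequences of the given length (4 bases per position)."""
    return 4 ** length


def expected_chance_matches(length: int, genome_bp: int = GENOME_BP) -> float:
    """Expected exact matches to one guide in a random genome of genome_bp.

    Assumption (illustrative only): bases are independent and equally
    likely, and both strands are counted, giving about 2 * genome_bp
    possible start positions.
    """
    start_positions = 2 * genome_bp
    return start_positions / possible_guides(length)


for n in (17, 18, 19, 20):
    print(n, possible_guides(n), round(expected_chance_matches(n), 3))
```

Each added nucleotide multiplies the number of possible guides by four, which is why the short end of the window is noticeably less discriminating than the long end in this simplified model.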
VII. Implications
The CRISPR/Cas9 system has proven to be a successful tool for gene
engineering. Other useful gene editing systems exist, such as
transcription activator-like effector nucleases (TALENs) and zinc
finger nucleases (ZFNs), but these systems target and perform a DSB
at a genomic locus via multiple DNA-interacting proteins fused with
non-specific nucleases.6 TALEN and ZFN systems can be difficult to
engineer and require fusion protein/nucleases to be designed on
opposite strands to ensure a DSB. In contrast, the CRISPR/Cas9
system is guided by an sgRNA with homology to either strand of the
DNA. Therefore, as long as the genomic sequence is known, a single
sgRNA can likely be designed to program CRISPR/Cas9 to target and
cleave a locus of interest. The relative simplicity and low cost
(Cas9 plasmids ~$65)7 of the CRISPR/Cas9 system provide a more
accessible and powerful tool to the scientific community.
The efficacy of a new tool, however, depends on understanding the
constraints of the technology. For example, different Cas proteins
recognize different protospacer adjacent motif (PAM) sequences in
the DNA (omitted from this crystallization). The PAM sequence that
S. pyogenes Cas9 recognizes is 5'-NGG-3', meaning that the guide
sgRNA must be designed to match the 20 bp adjacent to this sequence.
This somewhat limits where Cas9 can cut DNA, but typically there is
a 5'-NGG-3' sequence within a locus of interest. The limit of a
20 bp guide is a larger constraint. As seen in the RNA-Guided DNA
Targeting section, Cas9's end-capping interaction with the
guide:target heteroduplex limits the length of the target DNA region
to ~20 bp. With approximately 6 billion bp of DNA in a diploid human
cell, there are many non-unique 20 bp segments throughout the
genome. This greatly undermines the targeting specificity of
CRISPR/Cas9, as it introduces the potential for multiple off-target
mutations. To combat this, several online resources (e.g.,
crispr.mit.edu) have been developed to identify guide regions within
a locus of interest with the lowest probability of off-targeting. In
addition, modified CRISPR/Cas9 systems, such as the CRISPR/Cas9 D10A
nickase, have been engineered by mutating one residue involved in
cleavage to an alanine. This restricts the Cas9 nickase to cutting a
single strand of DNA. Two sgRNAs are then designed to target
opposite strands, so that a DSB requires paired cleavage by the Cas9
nickase.
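The PAM constraint described above can be illustrated with a small Python sketch that lists every candidate site in a DNA sequence, on either strand, where a 20 bp stretch is followed directly by 5'-NGG-3'. The function names and the test sequence are hypothetical. The sketch assumes the 20 bp lie immediately 5' of the PAM on the strand being read, and it does no off-target scoring of the kind the online design tools perform.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the opposite strand, written 5'->3'."""
    return dna.translate(COMPLEMENT)[::-1]


def find_ngg_sites(dna: str, guide_len: int = 20):
    """Yield (strand, start, protospacer, pam) for each NGG-adjacent site.

    `start` is the 0-based index on the strand being scanned; for the
    "-" strand that means an index into the reverse complement.
    """
    dna = dna.upper()
    # A lookahead is used so that overlapping candidate sites are all found.
    pattern = re.compile(rf"(?=([ACGT]{{{guide_len}}})([ACGT]GG))")
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for match in pattern.finditer(seq):
            yield strand, match.start(), match.group(1), match.group(2)


# Hypothetical 27 bp locus, used only for illustration.
locus = "TTACGTAAGCTTGCATGGATCCAGGTT"
for site in find_ngg_sites(locus):
    print(site)
```

Scanning both strands reflects the point made earlier that the sgRNA can have homology to either strand of the DNA. A paired-nickase design would need one site from each strand, which this sketch reports but does not pair up.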
VIII. References
1. Jinek M, Chylinski K, Fonfara I, Hauer M, Doudna JA,
Charpentier E. 2012. A programmable dual-RNA-guided DNA
endonuclease in adaptive bacterial immunity. Science.
337: 816-821.
2. Nishimasu H, Ran FA, Hsu PD, Konermann S, Shehata S, Dohmae N,
Ishitani R, Zhang F, Nureki O. 2014. Crystal structure of Cas9 in
complex with guide RNA and target DNA. Cell. 156(5): 935-949.
PDB 4OO8.
3. Cavanagh, Peter, and Anthony Garrity. "RuvC Nuclease."
CRISPRCas9. N.p., n.d. Web. 18 Dec. 2015.
4. Cavanagh, Peter, and Anthony Garrity. "HNH Nuclease."
CRISPRCas9. N.p., n.d. Web. 18 Dec. 2015.
5. Ran FA, Hsu PD, Lin CY, Gootenberg JS, Konermann S, Trevino AE,
Scott DA, Inoue A, Matoba S, Zhang Y, et al. 2013. Double nicking
by RNA-guided CRISPR Cas9 for enhanced genome editing specificity.
Cell. 154: 1380-1389.
6. Gaj T, Gersbach CA, Barbas CF. 2013. ZFN, TALEN, and
CRISPR/Cas-based methods for genome engineering. Trends in
Biotechnology. 31: 397-405.
7. Cong L, Ran FA, Cox D, Lin S, Barretto R, Habib N, Hsu PD, Wu X,
Jiang W, Marraffini LA, Zhang F. 2013. Multiplex genome
engineering using CRISPR/Cas systems. Science. 339: 819-823.