The information within the DNA ‘blueprint’ is coded by the sequence of the
four different nitrogenous bases, adenine, guanine, thymine and cytosine, on the
sugar–phosphate backbone (Figure 2.2a).
DNA normally exists as a double-stranded molecule that adopts a helical
arrangement – first described by Watson and Crick in 1953 [1]. Each base pairs with
its complementary base on the opposite strand: adenine always pairs with thymine and cytosine
always pairs with guanine (Figure 2.2b).
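Because the pairing rules are fixed, one strand fully determines the other. The short Python sketch below applies the A–T and C–G rules from the text to derive a complementary strand. The text does not discuss strand direction; the `reverse_complement` helper assumes the standard antiparallel orientation of the two strands, and both function names are illustrative only.

```python
# Watson-Crick pairing rules described in the text: A pairs with T, C pairs with G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}

def complement(strand: str) -> str:
    """Return the base-by-base complement of a DNA strand.

    The result is aligned position-for-position with the input, so it reads
    3'->5' when the input is written 5'->3'.
    """
    return "".join(PAIRS[base] for base in strand.upper())

def reverse_complement(strand: str) -> str:
    """Return the complementary strand written 5'->3' (assumes antiparallel strands)."""
    return complement(strand)[::-1]

if __name__ == "__main__":
    top = "ATGCGTTA"
    print(complement(top))          # TACGCAAT
    print(reverse_complement(top))  # TAACGCAT
```

Applying `complement` twice returns the original sequence, which reflects the fact that each strand is a template for the other.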
Organization of DNA into chromosomes
Within each nucleated human cell there are two complete copies of the genome. The
genome is ‘the haploid genetic complement of a living organism’ and in humans
contains approximately 3 200 000 000 bp of information, which is organized into 23
chromosomes. Humans contain two sets of chromosomes, one version of each
chromosome inherited from each parent, giving a total of 46 chromosomes [2]
(Figure 2.3).
Each chromosome contains one continuous double-stranded DNA molecule. The largest,
chromosome 1, is approximately 250 000 000 bp long, whereas the smallest, chromosome
22, is approximately 50 000 000 bp [3–5]. In physical terms, the DNA molecules of the
chromosomes range in length from 73 mm to 14 mm. The chromosomes shown in Figure 2.3 are in
the metaphase stage of the cell cycle and are highly condensed; when the cell
is not undergoing division, the chromosomes are less highly ordered and more
diffuse within the nucleus. To achieve this highly ordered structure, the
DNA molecule is associated with histone proteins, which help to package and
organize the DNA.
Great advances have been made in our understanding of the human genome in
recent years, in particular through the work of the Human Genome Project, which
was officially started in 1990 with the central aim of decoding the entire genome.
It was a collaborative effort involving 20 centres in China, France, Germany,
Great Britain, Japan and the United States. Draft sequences were produced in 2001,
Coding and regulatory sequence
The regions of DNA that encode and regulate the synthesis of proteins are called
genes; at the latest estimate the human genome contains only 20 000–25 000 genes
and only around 1.5% of the genome directly encodes proteins [3–5]. Gene structure, sequence and activity are a focus of medical genetics
because of the interest in genetic defects and the expression of genes within cells.
Approximately 23.5% of the genome is classified as genic sequence but does not
encode proteins. This non-coding genic sequence contains several elements that are
involved in the regulation of genes, including promoters, enhancers, repressors and
polyadenylation signals; most of it, around 23% of the genome, is made up
of introns, pseudogenes and gene fragments.
Extragenic DNA
Most of the genome, approximately 75%, is extragenic. Around 20% of the genome is
single-copy DNA, which in most cases has no known function, although
some regions appear to be under evolutionary pressure and presumably play an
important, but as yet unknown, role [12].
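The percentages quoted so far can be cross-checked as a simple budget. The Python sketch below assumes the approximate 3.2 Gb haploid genome size given earlier and converts each quoted share into megabases; the constant and helper names (`COMPOSITION`, `to_mb`) are illustrative only, and the figures are the rounded values from the text rather than new data.

```python
GENOME_BP = 3_200_000_000  # approximate haploid genome size quoted in the text

# Approximate percentages of the genome quoted in the text.
COMPOSITION = {
    "protein-coding": 1.5,
    "non-coding genic": 23.5,
    "extragenic": 75.0,
}
SINGLE_COPY_EXTRAGENIC = 20.0  # percent of the genome that is single-copy extragenic DNA

def to_mb(percent: float, genome_bp: int = GENOME_BP) -> float:
    """Convert a percentage of the genome into an approximate length in megabases."""
    return genome_bp * percent / 100 / 1e6

if __name__ == "__main__":
    for name, pct in COMPOSITION.items():
        print(f"{name:>16}: {pct:5.1f}% ~ {to_mb(pct):,.0f} Mb")
    print(f"{'total':>16}: {sum(COMPOSITION.values()):5.1f}%")
    # Extragenic DNA that is not single copy is what remains for repetitive sequence.
    leftover = COMPOSITION["extragenic"] - SINGLE_COPY_EXTRAGENIC
    print(f"extragenic but not single copy: {leftover:.0f}% of the genome")
```

The three categories sum to 100%, and the 55% of extragenic DNA that is not single copy fits with the statement that over half of the genome is repetitive.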
The largest portion of the genome – over 50% – is composed of repetitive DNA.
Most of this is interspersed repetitive DNA, with the repeat elements dispersed
throughout the genome. The four most common types of interspersed repetitive
element – short interspersed elements (SINEs), long interspersed elements (LINEs),
long terminal repeats (LTRs) and DNA transposons – together account for 45% of the genome
[4, 13]. These repeat sequences are all derived through transposition. The most common interspersed repeat element is the Alu SINE, which is approximately 300 bp
long; with over 1 million copies, it makes up around 10% of the genome.
There is a similar number of LINE elements within the genome. The most common
is LINE1, which is between 6 kb and 8 kb long in its full-length form and is represented
in the genome around 900 000 times; LINEs make up around 21% of the genome [4, 13]. The other
class of repetitive element is tandemly repeated DNA, which can be separated into
three different types: satellite DNA, minisatellites and microsatellites.
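Copy number, element length and genome share are linked by simple arithmetic, which makes it possible to check the repeat figures against each other. The sketch below assumes the 3.2 Gb genome size from the text; the function names are hypothetical.

```python
GENOME_BP = 3_200_000_000  # approximate haploid genome size from the text

def genome_fraction(copies: int, length_bp: float, genome_bp: int = GENOME_BP) -> float:
    """Fraction of the genome occupied by `copies` elements of `length_bp` each."""
    return copies * length_bp / genome_bp

def mean_copy_length(fraction: float, copies: int, genome_bp: int = GENOME_BP) -> float:
    """Average element length implied by a genome fraction and a copy number."""
    return fraction * genome_bp / copies

if __name__ == "__main__":
    # Alu: about 1 million copies of about 300 bp each
    print(f"Alu share of genome: {genome_fraction(1_000_000, 300):.1%}")
    # LINEs: about 21% of the genome spread over about 900 000 copies
    print(f"Mean LINE copy length: {mean_copy_length(0.21, 900_000):.0f} bp")
```

The Alu figures are self-consistent (about 9.4%, in line with "around 10%"). The LINE figures imply an average copy of only about 750 bp, far shorter than the 6–8 kb full-length element, which is consistent with most LINE1 copies in the genome being truncated rather than full length.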
Genetic diversity of modern humans
The aim of using genetic analysis for forensic casework is to produce a DNA profile
that is highly discriminating; the ideal would be to generate a DNA profile that is
unique to each individual. This allows biological evidence from the scene of a crime
to be matched to an individual with a high degree of confidence and can be very
powerful forensic evidence.
The ability to produce highly discriminating profiles is dependent on individuals
being different at the genetic level and, with the exception of identical twins, no
two individuals have been found to have the same DNA. However, individuals,
even ones who appear very different, are actually very similar at the genetic level.
Indeed, if we compare the human genome to that of our closest animal cousin, the
chimpanzee, with whom we shared a common ancestor around 6 million years ago,
we find that our genomes have diverged by only around 5%: single-base substitutions
account for only 1.2% [14], and insertions and deletions in the human and
chimpanzee genomes account for a further 3.5% [14, 15]. This means
that we share around 95% of our DNA with chimps! Modern humans have a much more
recent common history, which has been dated using genetic and fossil data to around
150 000 years ago [16, 17]. In this limited time, nucleotide substitutions have led to
an average of roughly one difference every 1000–1250 bp between any two human
chromosomes [5, 18], which means that we share around 99.9% of our genetic code
with each other. Some additional variation is caused by insertions, deletions,
length polymorphisms and segmental duplications of the genome [6–10, 19].
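The step from "one difference every 1000–1250 bp" to "99.9% shared" is simple arithmetic. The sketch below assumes a 3.2 Gb haploid genome and counts only single-base differences (indels, length polymorphisms and duplications are ignored); it converts a difference spacing into percent identity and an approximate number of differences between two haploid genomes, and also adds the two chimpanzee divergence components quoted above. Function names are illustrative.

```python
GENOME_BP = 3_200_000_000  # approximate haploid genome size from the text

def identity_from_spacing(bp_per_difference: float) -> float:
    """Fraction of identical sites if one difference occurs every `bp_per_difference` bases."""
    return 1 - 1 / bp_per_difference

def expected_differences(bp_per_difference: float, genome_bp: int = GENOME_BP) -> float:
    """Approximate number of single-base differences between two haploid genomes."""
    return genome_bp / bp_per_difference

if __name__ == "__main__":
    # Human vs chimpanzee: substitutions plus indels (percentages from the text)
    print(f"human-chimpanzee divergence ~{1.2 + 3.5:.1f}%")
    for spacing in (1000, 1250):
        print(f"1 per {spacing} bp: {identity_from_spacing(spacing):.2%} identical, "
              f"~{expected_differences(spacing) / 1e6:.2f} million differences")
```

Even at 99.9% identity, a few million single-base differences separate any two haploid genomes, which is why a small, well-chosen set of variable loci can still discriminate between individuals.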
There have been attempts to define populations genetically based on their racial
identity or geographical location, and while it has been possible to classify individuals genetically into broad racial/geographic groupings, it has been shown that most
genetic variation, around 85%, can be attributed to differences between individuals
within a population [20, 21]. Differences between regions tend to be geographic
gradients (clines), with gradual changes in allele frequencies [22–27].
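One simple way to see how a within-population share of diversity is calculated is Nei's decomposition of gene diversity, comparing the average diversity within populations (H_S) with the diversity of the pooled populations (H_T). This is a swapped-in illustration, not the method of the cited studies, which used more elaborate analyses across many loci. The allele frequencies below are hypothetical and the function names are illustrative.

```python
from statistics import mean

def heterozygosity(freqs: list[float]) -> float:
    """Expected heterozygosity (gene diversity): 1 - sum of squared allele frequencies."""
    return 1 - sum(p * p for p in freqs)

def within_population_share(populations: list[list[float]]) -> float:
    """Share of total gene diversity found within populations (H_S / H_T).

    Each inner list holds allele frequencies for one population, in the same
    allele order; populations are weighted equally.
    """
    h_s = mean(heterozygosity(pop) for pop in populations)
    pooled = [mean(allele) for allele in zip(*populations)]
    h_t = heterozygosity(pooled)
    return h_s / h_t

# Hypothetical allele frequencies at one three-allele locus in three populations
EXAMPLE_POPS = [
    [0.7, 0.2, 0.1],
    [0.2, 0.5, 0.3],
    [0.3, 0.3, 0.4],
]

if __name__ == "__main__":
    print(f"within-population share: {within_population_share(EXAMPLE_POPS):.1%}")
```

Even with these fairly different frequencies, most of the diversity lies within populations, because most of the variation is already present inside each one.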
From a forensic point of view there is very little rationale in analysing the 99.9%
of human DNA that is common between individuals. Fortunately, there are well-characterized regions within the genome that are variable between individuals and
these have become the focus of forensic genetics.
The genome and forensic genetics
With advances in molecular biology techniques it is now possible to analyse any
region within the 3.2 billion bases that make up the human genome. DNA loci
that are to be used for forensic genetics should have some key properties; they
should ideally:
• be highly polymorphic (varying widely between individuals);
• be easy and cheap to characterize;
• give profiles that are simple to interpret and easy to compare between
laboratories;
• not be under any selective pressure; and
• have a low mutation rate.
Tandem repeats
Two important categories of tandem repeat have been used widely in forensic genetics: minisatellites, also referred to as variable number tandem repeats (VNTRs),
and microsatellites, also referred to as short tandem repeats (STRs). The general structure of mini- and
microsatellites is the same (Figures 2.5 and 2.6). Variation between different alleles
is caused by differences in the number of repeat units, which in turn results in alleles
that are of different lengths; it is for this reason that tandem repeat polymorphisms
are also known as length polymorphisms.
Figure 2.5 The structure of two minisatellite alleles found at the D1S7 locus [28]. The alleles are
both relatively short, containing 104 and 134 repeats; alleles at this locus can contain over 2000
repeats. The alleles are composed of several different variants of the 9 bp core repeat; this is a
common feature of minisatellite alleles
Figure 2.6 The structure of a short tandem repeat. This example shows the structure of two alleles
from the locus D8S1179. The DNA either side of the core repeats is called flanking DNA. The alleles
are named according to the number of repeats that they contain – hence alleles 8 and 10
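Because STR alleles are named by their repeat count, naming an allele amounts to counting tandem copies of the core motif between the flanking sequences. The Python sketch below builds toy alleles and counts the longest uninterrupted run of the motif. The flanking sequences and the `TCTA` motif are hypothetical placeholders, not the real D8S1179 sequence, and real loci can contain variant or interrupted repeats that this simple counter would not handle.

```python
import re

# Hypothetical sequences for illustration only; not the real D8S1179 flanks.
LEFT_FLANK = "GGACTTAGC"
RIGHT_FLANK = "GATCCAGTC"
MOTIF = "TCTA"  # hypothetical 4 bp (tetranucleotide) repeat unit

def make_allele(n_repeats: int) -> str:
    """Build a toy STR allele: flanking DNA either side of n tandem copies of MOTIF."""
    return LEFT_FLANK + MOTIF * n_repeats + RIGHT_FLANK

def longest_repeat_run(seq: str, motif: str) -> int:
    """Largest number of consecutive, uninterrupted copies of `motif` in `seq`."""
    pattern = re.compile(f"(?:{re.escape(motif.upper())})+")
    runs = [len(m.group(0)) // len(motif) for m in pattern.finditer(seq.upper())]
    return max(runs, default=0)

if __name__ == "__main__":
    for n in (8, 10):
        allele = make_allele(n)
        print(f"allele {longest_repeat_run(allele, MOTIF)}: {len(allele)} bp")
```

The two toy alleles differ in length by exactly two repeat units (8 bp), which is the length polymorphism that STR typing measures.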
Minisatellites
Minisatellites are located predominantly in the subtelomeric regions of chromosomes
and have a core repeat sequence that ranges in size from 6 bp to 100 bp [30, 31].
In some alleles the core repeat occurs thousands of times; the variation in
repeat number creates alleles that range in size from 500 bp to over 30 kb (Figure 2.5).
The number of potential alleles can be very large: the D1S7 locus, for example, has
a relatively short and simple core repeat unit of 9 bp with alleles that range from
approximately 1 kb to over 20 kb – which means that there are potentially over 2000
different alleles at this locus [28].
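The "over 2000 different alleles" figure follows from dividing the allele size range by the repeat length. The sketch below assumes that every repeat number in the range can occur and ignores variant repeats; the function name is illustrative.

```python
def possible_alleles(min_bp: int, max_bp: int, repeat_bp: int) -> int:
    """Count allele sizes between min_bp and max_bp that differ by whole repeat units.

    Assumes every repeat number in the range can occur and ignores variant repeats.
    """
    return (max_bp - min_bp) // repeat_bp + 1

if __name__ == "__main__":
    # D1S7 figures from the text: 9 bp repeat, alleles from ~1 kb to over 20 kb
    print(possible_alleles(1_000, 20_000, 9))  # 2112
    print(20_000 // 9)  # repeats in a 20 kb allele, ignoring flanking DNA: 2222
```

Both results are consistent with the figure caption's statement that D1S7 alleles can contain over 2000 repeats.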
Minisatellites were the first polymorphisms used in DNA profiling [32, 33]
and they were successfully used in forensic casework for several years. The
use of minisatellites was, however, limited by the type of sample that could be
successfully analysed, because a large amount of high molecular weight DNA was
required. Interpreting minisatellite profiles could also be problematic. In forensic
genetics, minisatellites have now been replaced by microsatellites, which are also
known as STRs.
Short tandem repeats
STRs are currently the most commonly analysed genetic polymorphism in forensic
genetics. They were introduced into casework in the mid-1990s and are now the
main tool of almost every forensic laboratory in the world – the vast majority
of forensic genetic casework involves the analysis of STR polymorphisms.
There are thousands of STRs that can potentially be used for forensic analysis. STR
loci are spread throughout the genome, including the 22 autosomal chromosomes and
the X and Y sex chromosomes. They have a core unit of between 1 bp and 6 bp
and the alleles typically range from 50 bp to 300 bp. The majority of the loci that are
used in forensic genetics are tetranucleotide repeats, which have a 4 bp repeat motif
(Figure 2.6).
Figure 2.7 A single nucleotide polymorphism (SNP). Two alleles are shown which differ at one
position indicated by the star: the fourth position in allele G is a guanine while in allele A it is an
adenine. In most cases, the mutation event at the specific locus that creates a SNP is a unique
event and only two different alleles (biallelic) are normally found
STRs satisfy all the requirements for a forensic marker: they are robust, leading
to successful analysis of a wide range of biological material; the results generated in
different laboratories are easily compared; they are highly discriminatory, especially
when analysing a large number of loci simultaneously (multiplexing); they are very
sensitive, requiring only a few cells for a successful analysis; it is relatively cheap
and easy to generate STR profiles; and there is a large number of STRs throughout
the genome that do not appear to be under any selective pressure.
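The gain from multiplexing comes from multiplying the genotype frequencies of independent loci (the product rule). The sketch below shows this in its simplest form, assuming Hardy–Weinberg equilibrium and independence between loci; the allele frequencies are hypothetical, the function names are illustrative, and casework calculations are more involved than this.

```python
from math import prod
from typing import Optional

def genotype_frequency(p: float, q: Optional[float] = None) -> float:
    """Hardy-Weinberg genotype frequency: p^2 for a homozygote, 2pq for a heterozygote."""
    return p * p if q is None else 2 * p * q

def random_match_probability(genotype_freqs: list[float]) -> float:
    """Combine per-locus genotype frequencies with the product rule (independent loci)."""
    return prod(genotype_freqs)

if __name__ == "__main__":
    # Hypothetical 10-locus profile: every locus heterozygous for alleles at 10% and 20%
    profile = [genotype_frequency(0.10, 0.20) for _ in range(10)]
    print(f"single locus: {profile[0]:.2f}")
    print(f"ten loci:     {random_match_probability(profile):.2e}")
```

A genotype shared by 4% of people at one locus becomes a profile expected in roughly one in 10^14 people across ten independent loci, which is why analysing many STRs together is so discriminating.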
Single nucleotide polymorphisms
The simplest type of polymorphism is the single nucleotide polymorphism (SNP):
single-base differences in the DNA sequence. The structure of a typical SNP
is illustrated in Figure 2.7.
SNPs are formed when errors (mutations) occur as the cell undergoes DNA replication during meiosis. Some regions of the genome are richer in SNPs than others [34].
SNPs normally have just two alleles, for example one allele with a guanine and
one with an adenine. This is the substitution of one purine for another; other common
changes are between cytosine and thymine, both of which are pyrimidines. SNPs are therefore
not highly polymorphic and do not fit the ideal properties of DNA polymorphisms for forensic analysis. However, SNPs are so abundant throughout the genome
that it is theoretically possible to type hundreds of them. This can result in very high
combined power of discrimination. It is estimated that 50–80 SNPs would have to be
analysed to achieve the discriminatory power of 10 STRs [35, 36]. With current technology,
this is much more difficult than analysing 10 to 15 STR loci.
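The reason many more SNPs than STRs are needed can be sketched numerically. For a biallelic SNP in Hardy–Weinberg equilibrium, the probability that two unrelated people share a genotype (one minus the power of discrimination) is the sum of the squared genotype frequencies, which is at best 0.375 when both alleles are equally common. The sketch below counts how many independent SNPs are needed to match a target set by ten STRs. The per-STR value of 0.06 and the SNP allele frequencies are hypothetical assumptions, the results depend heavily on them, and they are not expected to reproduce the published 50–80 estimate, which rests on its own assumptions.

```python
from math import ceil, log

def biallelic_match_probability(p: float) -> float:
    """Probability that two unrelated people share a SNP genotype (Hardy-Weinberg)."""
    q = 1 - p
    genotypes = (p * p, 2 * p * q, q * q)
    return sum(g * g for g in genotypes)

def snps_needed(target: float, p: float) -> int:
    """Independent SNPs, all with allele frequency p, needed to reach `target`."""
    return ceil(log(target) / log(biallelic_match_probability(p)))

if __name__ == "__main__":
    str_match = 0.06            # hypothetical per-STR match probability
    target = str_match ** 10    # ten independent STRs
    for p in (0.5, 0.3, 0.2):
        print(f"allele frequency {p}: {snps_needed(target, p)} SNPs")
```

Because even the best SNP leaves a 37.5% chance of a coincidental genotype match, several SNPs are needed to replace each STR, and the number rises as the minor allele becomes rarer.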
With the exception of the analysis of mitochondrial DNA (see Chapter 13), SNPs
have not been used widely in forensic science to date, and the dominance of tandem
repeated DNA will continue for the foreseeable future [37]. However, SNPs are finding
a number of niche applications in forensic science (see Chapter 12).
WWW resources
The Human Genome Project Information: a website funded by the U.S. Department of Energy,
which, along with the National Institutes of Health, coordinated the project. Contains
resources on all aspects of the Human Genome Project. http://www.ornl.gov/sci/techresources/HumanGenome/home.shtml
References
1. Watson, J. and Crick, F. (1953) A structure for deoxyribose nucleic acid. Nature, 171,
737–738.
2. Tjio, J.H. and Levan, A. (1956) The chromosome number of man. Hereditas, 42, 1–6.
3. Collins, F.S., Lander, E.S., Rogers, J. and Waterston, R.H. (2004) Finishing the euchromatic
sequence of the human genome. Nature, 431, 931–945.
4. Lander, E.S., Linton, L.M., Birren, B., Nusbaum, C., Zody, M.C., Baldwin, J. et al. (2001)
Initial sequencing and analysis of the human genome. Nature, 409, 860–921.
5. Venter, J.C., Adams, M.D., Myers, E.W., Li, P.W., Mural, R.J. and Sutton, G.G. (2001) The
sequence of the human genome. Science, 291, 1304–1351.
6. Levy, S., Sutton, G., Ng, P.C., Feuk, L., Halpern, A.L. and Walenz, B.P. (2007) The diploid
genome sequence of an individual human. PLoS Biology, 5, 2113–2144.
7. Schuster, S.C., Miller, W., Ratan, A., Tomsho, L.P., Giardine, B., Kasson, L.R. et al. (2010)
Complete Khoisan and Bantu genomes from southern Africa. Nature, 463, 943–947.
8. Wang, J., Wang, W., Li, R.Q., Li, Y.R., Tian, G., Goodman, L. et al. (2008) The diploid
genome sequence of an Asian individual. Nature, 456, 60–61.
9. Wheeler, D.A., Srinivasan, M., Egholm, M., Shen, Y., Chen, L., McGuire, A. et al. (2008)
The complete genome of an individual by massively parallel DNA sequencing. Nature, 452,
872–875.
10. Bentley, D.R., Balasubramanian, S., Swerdlow, H.P., Smith, G.P., Milton, J., Brown, C.G.
et al. (2008) Accurate whole human genome sequencing using reversible terminator chemistry.
Nature, 456, 53–59.
11. Jasinska, A. and Krzyzosiak, W.J. (2004) Repetitive sequences that shape the human transcriptome. FEBS Letters, 567, 136–141.
12. Waterston, R.H., Lindblad-Toh, K., Birney, E., Rogers, J., Abril, J.F., Agarwal, P. et al. (2002)
Initial sequencing and comparative analysis of the mouse genome. Nature, 420, 520–562.
13. Li, W.H., Gu, Z.L., Wang, H.D. and Nekrutenko, A. (2001) Evolutionary analyses of the
human genome. Nature, 409, 847–849.
14. Mikkelsen, T.S., Hillier, L.W., Eichler, E.E., Zody, M.C., Jaffe, D.B., Yang, S.P. et al. (2005)
Initial sequence of the chimpanzee genome and comparison with the human genome. Nature,
437, 69–87.
15. Britten, R.J. (2002) Divergence between samples of chimpanzee and human DNA sequences
is 5%, counting indels. Proceedings of the National Academy of Sciences of the United States
of America, 99, 13633–13635.
16. Cann, R.L., Stoneking, M., Wilson, A.C. et al. (1987) Mitochondrial DNA and human evolution. Nature, 325, 31–36.
17. Stringer, C.B. and Andrews, P. (1988) Genetic and fossil evidence for the origin of modern
humans. Science, 239, 1263–1268.
18. Sachidanandam, R., Weissman, D., Schmidt, S.C., Kakol, J.M., Stein, L.D., Marth, G. et al.
(2001) A map of human genome sequence variation containing 1.42 million single nucleotide
polymorphisms. Nature, 409, 928–933.
19. Ahn, S.M., Kim, T.H., Lee, S., Kim, D., Ghang, H., Kim, D.S. et al. (2009) The first Korean
genome sequence and analysis: full genome sequencing for a socio-ethnic group. Genome
Research, 19, 1622–1629.
20. Barbujani, G., Magagni, A., Minch, E. and Cavalli-Sforza, L.L. (1997) An apportionment of
human DNA diversity. Proceedings of the National Academy of Sciences of the United States
of America, 94, 4516–4519.