
 

Chapter 2 DNA structure and the genome


Each person’s genome contains a large amount of DNA that is a potential target for DNA profiling. The selection of the particular region of polymorphic DNA to analyse can change with the individual case and also the technology that is available. In this chapter a brief description of the primary structure of the DNA molecule is provided along with an overview of the different categories of DNA that make up the human genome. The criteria that the forensic geneticist uses to select which loci to analyse are also discussed. 

 

The information within the DNA ‘blueprint’ is coded by the sequence of the four different nitrogenous bases, adenine, guanine, thymine and cytosine, on the sugar–phosphate backbone (Figure 2.2a). DNA normally exists as a double-stranded molecule that adopts a helical arrangement, first described by Watson and Crick in 1953 [1]. Each base is attracted to its complementary base: adenine always pairs with thymine and cytosine always pairs with guanine (Figure 2.2b).

Organization of DNA into chromosomes

Within each nucleated human cell there are two complete copies of the genome.

 

The genome is ‘the haploid genetic complement of a living organism’ and in humans contains approximately 3 200 000 000 bp of information, which is organized into 23 chromosomes. Humans contain two sets of chromosomes, with one version of each chromosome inherited from each parent, giving a total of 46 chromosomes [2] (Figure 2.3).

Figure 2.1 The DNA molecule is built up of deoxynucleotide 5’ triphosphates (a). The sugar (b) contains five carbon atoms (labelled C1–C5); one of four different types of nitrogenous base (c) is attached to the 1’ carbon, a hydroxyl group to the 3’ carbon and the phosphate group to the 5’ carbon. Adenine and guanine both have a double ring and are purines, whereas cytosine and thymine have a single ring and are pyrimidines

Figure 2.2 Nucleotides are joined together by phosphodiester bonds to form a single-stranded molecule (a). The DNA molecule in the cell is double-stranded (b) with two complementary single-stranded molecules held together by hydrogen bonds. Adenine and thymine form two hydrogen bonds while guanine and cytosine form three hydrogen bonds
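The pairing rules summarized in Figure 2.2 can be sketched in a few lines of Python. This is an illustrative sketch only: the function names and the example sequence are hypothetical, and the hydrogen bond counts are the ones given in the caption (two for an adenine–thymine pair, three for a guanine–cytosine pair).

```python
# Watson-Crick pairing: adenine with thymine, cytosine with guanine.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

# Hydrogen bonds formed by the base pair that each base takes part in.
H_BONDS = {"A": 2, "T": 2, "C": 3, "G": 3}


def complement_strand(strand: str) -> str:
    """Return the base that pairs with each position of the strand."""
    return "".join(COMPLEMENT[base] for base in strand.upper())


def hydrogen_bonds(strand: str) -> int:
    """Count the hydrogen bonds holding the strand to its complement."""
    return sum(H_BONDS[base] for base in strand.upper())


example = "ATGC"  # hypothetical sequence, for illustration only
print(complement_strand(example))  # TACG
print(hydrogen_bonds(example))     # 10
```

The complement is listed position by position; because the two strands run in opposite directions, it would be reversed before being written in the conventional 5’ to 3’ orientation.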

Figure 2.3 The male human karyotype pictured contains 46 chromosomes, 22 autosomes and the X and Y sex chromosomes – the female karyotype has two X chromosomes. The chromosomes have been labelled with fluorescent probes allowing them to be identified. (Image provided by Duncan Holdsworth, Westlakes Research Institute, University of Central Lancashire, UK)

Each chromosome contains one continuous molecule of DNA; the largest, chromosome 1, is approximately 250 000 000 bp long, whereas the smallest, chromosome 22, is approximately 50 000 000 bp [3–5]. In physical terms the chromosomes range in length from 73 mm to 14 mm. The chromosomes shown in Figure 2.3 are in the metaphase stage of the cell cycle and are highly condensed; when the cell is not undergoing division the chromosomes are less highly ordered and are more diffuse within the nucleus. To achieve the highly ordered chromosome structure, the DNA molecule is associated with histone proteins, which help to package and organize the DNA.

 

The structure of the human genome

Great advances have been made in our understanding of the human genome in recent years, in particular through the work of the Human Genome Project, which was officially started in 1990 with the central aim of decoding the entire genome. It was a collaborative effort involving 20 centres in China, France, Germany, Great Britain, Japan and the United States. Draft sequences were produced in 2001, one by the Public Consortium and one by the private organization Celera Genomics, that covered 90% of the euchromatic DNA [4, 5]. This was followed by later versions that described the sequence of 99% of the euchromatic DNA with an accuracy of 99.99% [3]. The first genomes were composites made up from sequence data from different individuals; genomes of several individuals have now been decoded [6–10]. The genome can be divided into different categories of DNA based on the structure and function of the sequence (Figure 2.4).

Figure 2.4 The human genome can be classified into different types of DNA based on its structure and function. (Based on Jasinska and Krzyzosiak [11])

 


Coding and regulatory sequence

The regions of DNA that encode and regulate the synthesis of proteins are called genes; at the latest estimate the human genome contains only 20 000–25 000 genes and only around 1.5% of the genome is directly involved in encoding proteins [3–5]. Gene structure, sequence and activity are a focus of medical genetics because of the interest in genetic defects and the expression of genes within cells. Approximately 23.5% of the genome is classified as genic sequence, but does not encode proteins. This non-coding genic sequence contains several elements that are involved in the regulation of genes, including promoters, enhancers, repressors and polyadenylation signals; most of it, around 23% of the genome, is made up of introns, pseudogenes and gene fragments.

 


Extragenic DNA

    Most of the genome, approximately 75%, is extragenic. Around 20% of the genome is single copy DNA, which in most cases does not have any known function, although some regions appear to be under evolutionary pressure and presumably play an important, but as yet unknown, role [12].

 

    The largest portion of the genome, over 50%, is composed of repetitive DNA. Most of this is interspersed, with the repeat elements dispersed throughout the genome: the four most common types of interspersed repetitive element (short interspersed elements (SINEs), long interspersed elements (LINEs), long terminal repeats (LTRs) and DNA transposons) together account for around 45% of the genome [4, 13]. These repeat sequences are all derived through transposition. The most common interspersed repeat element is the Alu SINE; the repeat is approximately 300 bp long and, with over 1 million copies, makes up around 10% of the genome. There is a similar number of LINE elements within the genome; the most common is LINE1, which is between 6 kb and 8 kb long, and is represented in the genome around 900 000 times; LINEs make up around 21% of the genome [4, 13]. The other class of repetitive element is tandemly repeated DNA. This can be separated into three different types: satellite DNA, minisatellites and microsatellites.

 


Genetic diversity of modern humans

    The aim of using genetic analysis for forensic casework is to produce a DNA profile that is highly discriminating; the ideal would be to generate a DNA profile that is unique to each individual. This allows biological evidence from the scene of a crime to be matched to an individual with a high degree of confidence and can be very powerful forensic evidence.

 

   The ability to produce highly discriminating profiles is dependent on individuals being different at the genetic level and, with the exception of identical twins, no two individuals have been found to have the same DNA. However, individuals, even ones who appear very different, are actually very similar at the genetic level. Indeed, if we compare the human genome to that of our closest animal cousin, the chimpanzee, with whom we shared a common ancestor around 6 million years ago, we find that our genomes have diverged by only around 5%; the DNA sequence has diverged by only 1.2% [14] and insertions and deletions in both human and chimpanzee genomes account for another 3.5% divergence [14, 15]. This means that we share 95% of our DNA with chimps! Modern humans have a much more recent common history, which has been dated using genetic and fossil data to around 150 000 years ago [16, 17]. In this limited time, nucleotide substitutions have led to roughly one difference every 1000 bases between any two copies of a human chromosome (one estimate puts the average at one difference every 1250 bp) [5, 18], which means that we share around 99.9% of our genetic code with each other. Some additional variation is caused by insertions, deletions, length polymorphisms and segmental duplications of the genome [6–10, 19].
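The 99.9% figure follows directly from the spacing between differences, and the same arithmetic gives a feel for how many single-base differences separate two copies of the genome. The sketch below is illustrative only: the function names are hypothetical, the genome size is the 3.2 billion bp quoted earlier in the chapter, and differences are assumed to be spread evenly.

```python
GENOME_BP = 3_200_000_000  # haploid genome size quoted in this chapter


def shared_fraction(bp_per_difference: int) -> float:
    """Fraction of positions that are identical between two genome copies."""
    return 1 - 1 / bp_per_difference


def expected_differences(bp_per_difference: int, genome_bp: int = GENOME_BP) -> float:
    """Number of single-base differences expected across the whole genome."""
    return genome_bp / bp_per_difference


for spacing in (1000, 1250):
    shared = shared_fraction(spacing)
    differences = expected_differences(spacing)
    print(f"1 difference per {spacing} bp: {shared:.2%} shared, "
          f"about {differences:,.0f} differences")
```

Even at 99.9% identity, two copies of the genome are therefore expected to differ at a few million positions, which is the variation that forensic genetics draws on.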

 

    There have been attempts to define populations genetically based on their racial identity or geographical location, and while it has been possible to classify individuals genetically into broad racial/geographic groupings, it has been shown that most genetic variation, around 85%, can be attributed to differences between individuals within a population [20, 21]. Differences between regions tend to be geographic gradients (clines), with gradual changes in allele frequencies [22–27].
    From a forensic point of view there is very little rationale in analysing the 99.9% of human DNA that is common between individuals. Fortunately, there are well-characterized regions within the genome that are variable between individuals and these have become the focus of forensic genetics.

 


The genome and forensic genetics 

    With advances in molecular biology techniques it is now possible to analyse any region within the 3.2 billion bases that make up the human genome. DNA loci that are to be used for forensic genetics should have some key properties; they should ideally:

• be highly polymorphic (varying widely between individuals); 
• be easy and cheap to characterize; 
• give profiles that are simple to interpret and easy to compare between laboratories; 
• not be under any selective pressure; and 
• have a low mutation rate.

 


Tandem repeats

    Two important categories of tandem repeat have been used widely in forensic genetics: minisatellites, also referred to as variable number tandem repeats (VNTRs); and microsatellites, also referred to as STRs. The general structure of mini- and microsatellites is the same (Figures 2.5 and 2.6). Variation between different alleles is caused by a different number of the repeat unit, which in turn results in alleles that are of different lengths; it is for this reason that tandem repeat polymorphisms are also known as length polymorphisms.

Figure 2.5 The structure of two minisatellite alleles found at the D1S7 locus [28]. The alleles are both relatively short containing 104 and 134 repeats; alleles at this locus can contain over 2000 repeats. The alleles are composed of several different variants of the 9 bp core repeat; this is a common feature of minisatellite alleles 


Figure 2.6 The structure of a short tandem repeat. This example shows the structure of two alleles from the locus D8S1179. The DNA either side of the core repeats is called flanking DNA. The alleles are named according to the number of repeats that they contain – hence alleles 8 and 10


Minisatellites

    Minisatellites are located predominantly in the subtelomeric regions of chromosomes and have a core repeat sequence that ranges in size from 6 bp to 100 bp [30, 31]. In some alleles the core repeat is present thousands of times; the variation in repeat number creates alleles that range in size from 500 bp to over 30 kb (Figure 2.5). The number of potential alleles can be very large: the D1S7 locus, for example, has a relatively short and simple core repeat unit of 9 bp with alleles that range from approximately 1 kb to over 20 kb, which means that there are potentially over 2000 different alleles at this locus [28].
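The figure of over 2000 potential alleles at D1S7 can be checked with simple arithmetic. The sketch below is a rough estimate under stated assumptions: the function name is hypothetical, the length range is taken as exactly 1 kb to 20 kb, every allele is assumed to differ from the next by one whole 9 bp repeat, and sequence variation within the repeats is ignored.

```python
def potential_alleles(min_bp: int, max_bp: int, repeat_bp: int) -> int:
    """Count the distinct allele lengths between min_bp and max_bp
    when alleles differ by whole copies of the repeat unit."""
    return (max_bp - min_bp) // repeat_bp + 1


# D1S7 figures quoted in the text: 9 bp repeat, alleles of about 1 kb to 20 kb.
print(potential_alleles(1_000, 20_000, 9))  # 2112
```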

 

    Minisatellites were the first polymorphisms used in DNA profiling [32, 33] and they were successfully used in forensic casework for several years. The use of minisatellites was, however, limited by the type of sample that could be successfully analysed, because a large amount of high molecular weight DNA was required. Interpreting minisatellite profiles could also be problematic. Their use in forensic genetics has now been replaced by microsatellites, which are also known as STRs.

 


Short tandem repeats

    STRs are currently the most commonly analysed genetic polymorphism in forensic genetics. They were introduced into casework in the mid-1990s and are now the main tool for just about every forensic laboratory in the world – the vast majority of forensic genetic casework involves the analysis of STR polymorphisms.

 

    There are thousands of STRs that can potentially be used for forensic analysis. STR loci are spread throughout the genome, including the 22 autosomal chromosomes and the X and Y sex chromosomes. They have a core unit of between 1 bp and 6 bp and the alleles typically range from 50 bp to 300 bp. The majority of the loci that are used in forensic genetics are tetranucleotide repeats, which have a 4 bp repeat motif (Figure 2.6).
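The allele naming convention (an allele is named after the number of repeats it contains) can be illustrated with a short Python sketch. Everything specific here is an assumption made for illustration: the function name, the flanking sequences and the use of a simple TCTA motif are hypothetical and are not taken from the text. Casework typing is normally based on fragment length rather than on scanning a sequence in this way.

```python
import re


def longest_repeat_run(sequence: str, motif: str) -> int:
    """Return the largest number of back-to-back copies of motif in sequence."""
    runs = re.findall(f"(?:{re.escape(motif)})+", sequence.upper())
    return max((len(run) // len(motif) for run in runs), default=0)


# Hypothetical flanking DNA and repeat motif, for illustration only.
FLANK_5 = "GGCATTA"
FLANK_3 = "CCGTGAT"
MOTIF = "TCTA"

allele_a = FLANK_5 + MOTIF * 8 + FLANK_3
allele_b = FLANK_5 + MOTIF * 10 + FLANK_3

print(longest_repeat_run(allele_a, MOTIF))  # 8
print(longest_repeat_run(allele_b, MOTIF))  # 10
print(len(allele_b) - len(allele_a))        # 8 (two extra 4 bp repeats)
```

The last line shows why STRs are length polymorphisms: two extra copies of a 4 bp motif make the allele 8 bp longer, while the flanking DNA is unchanged.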

Figure 2.7 A single nucleotide polymorphism (SNP). Two alleles are shown which differ at one position indicated by the star: the fourth position in allele G is a guanine while in allele A it is an adenine. In most cases, the mutation event at the specific locus that creates a SNP is a unique event and only two different alleles (biallelic) are normally found

    STRs satisfy all the requirements for a forensic marker: they are robust, leading to successful analysis of a wide range of biological material; the results generated in different laboratories are easily compared; they are highly discriminatory, especially when analysing a large number of loci simultaneously (multiplexing); they are very sensitive, requiring only a few cells for a successful analysis; it is relatively cheap and easy to generate STR profiles; and there is a large number of STRs throughout the genome that do not appear to be under any selective pressure.

 


Single nucleotide polymorphisms

    The simplest type of polymorphism is the single nucleotide polymorphism (SNP): single base differences in the sequence of the DNA. The structure of a typical SNP polymorphism is illustrated in Figure 2.7.

 

    SNPs are formed when errors (mutations) occur as the cell undergoes DNA replication during meiosis. Some regions of the genome are richer in SNPs than others [34].
    SNPs normally have just two alleles, for example one allele with a guanine and one with an adenine. This is a substitution of one purine for another; the other common changes are between cytosine and thymine, both of which are pyrimidines. SNPs therefore are not highly polymorphic and do not fit with the ideal properties of DNA polymorphisms for forensic analysis. However, SNPs are so abundant throughout the genome that it is theoretically possible to type hundreds of them. This can result in very high combined power of discrimination. It is estimated that to achieve the same discriminatory power that is achieved using 10 STRs, 50–80 SNPs would have to be analysed [35, 36]. With current technology, this is much more difficult than analysing 10 to 15 STR loci.
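The reason many SNPs are needed to match a handful of STRs can be sketched with the product rule. The example below is a simplified illustration, not the calculation behind the 50–80 figure cited above: it assumes independent loci in Hardy–Weinberg equilibrium, the function names are hypothetical, and the per-locus STR match probability of 0.05 is a hypothetical value chosen only to set a target.

```python
import math


def snp_match_probability(p: float) -> float:
    """Probability that two unrelated people share a genotype at a
    biallelic SNP whose allele frequencies are p and 1 - p."""
    q = 1 - p
    genotype_freqs = (p * p, 2 * p * q, q * q)
    return sum(freq * freq for freq in genotype_freqs)


def loci_needed(per_locus_match: float, target_match: float) -> int:
    """Smallest number of independent loci whose combined match
    probability is at or below the target."""
    return math.ceil(math.log(target_match) / math.log(per_locus_match))


STR_MATCH_PER_LOCUS = 0.05               # hypothetical value for one STR locus
target = STR_MATCH_PER_LOCUS ** 10       # combined value for 10 such loci

for allele_freq in (0.5, 0.2):
    per_snp = snp_match_probability(allele_freq)
    print(allele_freq, round(per_snp, 4), loci_needed(per_snp, target))
```

Even in the most informative case (both alleles at a frequency of 0.5) a single SNP has a match probability of 0.375, so several SNPs are needed for each STR, and the number rises as the allele frequencies become less balanced.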

 

    With the exception of the analysis of mitochondrial DNA (see Chapter 13), SNPs have not been used widely in forensic science to date, and the dominance of tandemly repeated DNA will continue for the foreseeable future [37]. SNPs are, however, finding a number of niche applications in forensic science (see Chapter 12).

 


WWW resources

The Human Genome Project Information: a website funded by the U.S. Department of Energy, which along with the National Institutes of Health coordinated the project. Contains resources on all aspects of the Human Genome Project. http://www.ornl.gov/sci/techresources/HumanGenome/home.shtml

 


References

1. Watson, J. and Crick, F. (1953) A structure for deoxyribose nucleic acid. Nature, 171, 737–738.
2. Tjio, J.H. and Levan, A. (1956) The chromosome number of man. Hereditas, 42, 1–6.
3. Collins, F.S., Lander, E.S., Rogers, J. and Waterston, R.H. (2004) Finishing the euchromatic sequence of the human genome. Nature, 431, 931–945.
4. Lander, E.S., Linton, L.M., Birren, B., Nusbaum, C., Zody, M.C., Baldwin, J. et al. (2001) Initial sequencing and analysis of the human genome. Nature, 409, 860–921.
5. Venter, J.C., Adams, M.D., Myers, E.W., Li, P.W., Mural, R.J. and Sutton, G.G. (2001) The sequence of the human genome. Science, 291, 1304–1351.
6. Levy, S., Sutton, G., Ng, P.C., Feuk, L., Halpern, A.L. and Walenz, B.P. (2007) The diploid genome sequence of an individual human. PLoS Biology, 5, 2113–2144.
7. Schuster, S.C., Miller, W., Ratan, A., Tomsho, L.P., Giardine, B., Kasson, L.R. et al. (2010) Complete Khoisan and Bantu genomes from southern Africa. Nature, 463, 943–947.
8. Wang, J., Wang, W., Li, R.Q., Li, Y.R., Tian, G., Goodman, L. et al. (2008) The diploid genome sequence of an Asian individual. Nature, 456, 60–61.
9. Wheeler, D.A., Srinivasan, M., Egholm, M., Shen, Y., Chen, L., McGuire, A. et al. (2008) The complete genome of an individual by massively parallel DNA sequencing. Nature, 452, 872–875.
10. Bentley, D.R., Balasubramanian, S., Swerdlow, H.P., Smith, G.P., Milton, J., Brown, C.G. et al. (2008) Accurate whole human genome sequencing using reversible terminator chemistry. Nature, 456, 53–59.
11. Jasinska, A. and Krzyzosiak, W.J. (2004) Repetitive sequences that shape the human transcriptome. FEBS Letters, 567, 136–141.
12. Waterston, R.H., Lindblad-Toh, K., Birney, E., Rogers, J., Abril, J.F., Agarwal, P. et al. (2002) Initial sequencing and comparative analysis of the mouse genome. Nature, 420, 520–562.
13. Li, W.H., Gu, Z.L., Wang, H.D. and Nekrutenko, A. (2001) Evolutionary analyses of the human genome. Nature, 409, 847–849.
14. Mikkelsen, T.S., Hillier, L.W., Eichler, E.E., Zody, M.C., Jaffe, D.B., Yang, S.P. et al. (2005) Initial sequence of the chimpanzee genome and comparison with the human genome. Nature, 437, 69–87.
15. Britten, R.J. (2002) Divergence between samples of chimpanzee and human DNA sequences is 5%, counting indels. Proceedings of the National Academy of Sciences of the United States of America, 99, 13633–13635.
16. Cann, R.L., Stoneking, M. and Wilson, A.C. (1987) Mitochondrial-DNA and human evolution. Nature, 325, 31–36.
17. Stringer, C.B. and Andrews, P. (1988) Genetic and fossil evidence for the origin of modern humans. Science, 239, 1263–1268.
18. Sachidanandam, R., Weissman, D., Schmidt, S.C., Kakol, J.M., Stein, L.D., Marth, G. et al. (2001) A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms. Nature, 409, 928–933.
19. Ahn, S.M., Kim, T.H., Lee, S., Kim, D., Ghang, H., Kim, D.S. et al. (2009) The first Korean genome sequence and analysis: full genome sequencing for a socio-ethnic group. Genome Research, 19, 1622–1629.
20. Barbujani, G., Magagni, A., Minch, E. and Cavalli-Sforza, L.L. (1997) An apportionment of human DNA diversity. Proceedings of the National Academy of Sciences of the United States of America, 94, 4516–4519.

 

 

