Saturday, April 5, 2008

Gene Future: The Promise and Perils of the New Biology

Rapid advancement in an aspect of science usually results in an array of books for the general public that range from clear explanations with a balanced viewpoint to the sensational. This book is a model, as the general reader will find clear explanations of the scientific principles, laid out as needed, in a discussion of the ethical and social issues raised by the actual and potential applications of genetic technology. The discussion of agricultural applications was most informative to me, as a reader who is much more involved with the medical applications of genetics. This area does not receive as much coverage in general scientific journals, as the human genome project seems to be the major attraction for both the general and scientific presses. It is clear that the author has a comprehensive view of biology. There are a few points that could be handled better. The decrease in mortality from infectious disease, resulting in increased attention to genetic disorders, is ascribed to the use of antibiotics. This cannot be correct, because the death rates were decreasing before the introduction of vaccines and antibiotics. Most epidemiologists believe that this change is a result of other social changes and better sanitation. The human karyotype illustration is not reflective of the state of the art. This book is written for a general audience, but could be used as collateral reading for advanced high school biology or college courses. It might also be useful for courses for people majoring in humanities.

--Reviewed by David J. Harris in Science Books and Films, 30/1 (January/February 1994), p. 9.

http://www.project2061.org/publications/rsl/online/Tradebks/REVS/GENEFUTU.HTM

Mitochondrial genome

The human mitochondrial genome, while usually not included when referring to the "human genome", is of tremendous interest to geneticists, since it undoubtedly plays a role in mitochondrial disease. It also sheds light on human evolution; for example, analysis of variation in the human mitochondrial genome has led to the postulation of a recent common ancestor for all humans on the maternal line of descent. (see Mitochondrial Eve)

Because mitochondria lack a system for checking for copying errors, mitochondrial DNA (mtDNA) has a more rapid rate of variation than nuclear DNA. This 20-fold increase in the mutation rate allows mtDNA to be used for more accurate tracing of maternal ancestry. Studies of mtDNA in populations have allowed ancient migration paths to be traced, such as the migration of Native Americans from Siberia or Polynesians from southeastern Asia. It has also been used to show that there is no trace of Neanderthal DNA in the European gene mixture inherited through purely maternal lineage.

http://en.wikipedia.org/wiki/Human_genome




Google gives new gene mapping service a bit of spit and polish

Just in time for Christmas, the Silicon Valley startup 23andMe - the name refers to the number of pairs of chromosomes in human DNA - has begun offering a personal genotyping service. For a $1,000 (£483) fee, 23andMe will run a sample of your DNA through a specialised gene-reading microchip that is able to identify, the company says, "nearly 600,000 data points on your genome".

It then uploads your genetic information on to its website, where you can use various "web-based interactive tools" to explore your ancestral origins and your likelihood of contracting hereditary diseases. To provide 23andMe with a DNA sample, you have to send the firm a small test tube filled with saliva. That requirement at first struck me as distasteful, maybe even demeaning. To have one's earthly origin and destiny reduced to a vial of spit seems like an affront to human dignity.

But maybe I'm being too fussy. After all, primitive cultures have long viewed bodily fluids, and saliva in particular, as carriers of secrets about personal identity and vitality. The Cherokee tribe of what is now the southeastern United States considered saliva "a vital element", according to the anthropologist Charles Hudson. Spittle "was to the individual as water in creeks and rivers was to the world", he wrote in a book about Native American religion. "If one's saliva were spoiled, it was a serious matter."

All that 23andMe is doing is giving an information-age twist to this mystical tradition. It's using automated laboratories, supercomputers and the internet to translate ancient symbolism into a practical digital service. Once a spiritual totem, spit is now just another informational medium.

23andMe isn't the only company creating personal gene maps; deCODE Genetics, a biopharmaceutical business in Reykjavik, Iceland, offers a similar service called deCODEme. But 23andMe is worth special note because it has the might of Google behind it. The search engine giant was an early investor in the startup, and it has a personal connection as well: Google co-founder Sergey Brin is married to 23andMe co-founder Anne Wojcicki.

Google's involvement suggests that 23andMe probably has larger ambitions than just providing individuals with gene maps. As its online store of genetic information grows and as customers add personal information, the company could end up with a database of extraordinary value to pharmaceutical firms, medical researchers and insurance companies.

Sorted and analysed with Google's sophisticated data-crunching tools, the database could disclose hidden connections between genes, aptitudes and diseases. In a privacy statement on its site, the company acknowledges that it plans to grant outside groups access to its database, allowing them to search, "without knowing the identities of the individuals involved", for correlations between genetic variations and health conditions. That could well turn into a major business.

The company also says that it will give users "the ability to connect with other 23andMe customers through sharing features". 23andMe could evolve into a social network, a biotech version of MySpace or Facebook where people make connections not with friends but with people who share similar genetic traits. This, too, could provide the basis for a lucrative business. Given that 23andMe tracks its customers' movements with cookies, it may not be long before we see genetically targeted advertising.

If all this seems a little Brave New World, that shouldn't be a surprise. Breakthroughs in genetics, combined with advances in computing and networking, promise to reshape many of our assumptions about our health, our identity and even our fate. We'll face a series of difficult questions.

How much do we really want to know about the path of our eventual decay? Services such as those from 23andMe can provide many benefits, but they also promise to create anxiety and fear.

http://www.guardian.co.uk/technology/2007/nov/22/comment.google

Human Genetics Evolution


Comparative genomics studies of mammalian genomes suggest that approximately 5% of the human genome has been conserved by evolution since the divergence of those species approximately 200 million years ago, containing the vast majority of genes.[8][9] Intriguingly, since genes and known regulatory sequences probably comprise less than 2% of the genome, this suggests that there may be more unknown functional sequence than known functional sequence. A smaller, yet large, fraction of human genes seem to be shared among most known vertebrates. The chimpanzee genome is 95% identical to the human genome. On average, a typical human protein-coding gene differs from its chimpanzee ortholog by only two amino acid substitutions; nearly one third of human genes have exactly the same protein translation as their chimpanzee orthologs. A major difference between the two genomes is human chromosome 2, which is equivalent to a fusion product of chimpanzee chromosomes 12 and 13.[13]

Humans have undergone an extraordinary loss of olfactory receptor genes during our recent evolution, which explains our relatively crude sense of smell compared to most other mammals. Evolutionary evidence suggests that the emergence of color vision in humans and several other primate species has diminished the need for the sense of smell.[14]

http://en.wikipedia.org/wiki/Human_genome

Finding a Gene on the Chromosome Map

As a scientist, you've noticed that a genetic disorder runs in families, and you want to find the gene responsible for it.

First, you identify a large family, in which some individuals have the disorder, and others don't. After enlisting the family's support and collecting DNA samples from all family members, you're ready to begin looking for the gene. Where do you go from here?

Here's one way to think about genes

What if the genetic information in each family member were like a jigsaw puzzle? Each puzzle piece would represent a set of genes organized in a specific way, similar to a chromosome. Because all humans have the same set of genes, arranged in the same order, every family member would have the same basic set of puzzle pieces.

But the information carried in genes differs slightly from person to person. This is what makes each of us unique. As a result, the colors of the puzzle pieces would be different between family members. While some relatives might share puzzle pieces of a certain color, other pieces would be different. Only identical twins share the exact same combination of colors and shapes.

What might a family's puzzles look like?

Look at the family of jigsaw puzzles below. Can you see how some of the child's genes are derived from one parent and some from the other parent?

The child receives exactly half of its genetic information from the mother and exactly half from the father.

Looking at things this way, can you see how you might identify a genetic link in paternity suits, where a genetic connection is sought between a child and a possible father? Half of the child's puzzle pieces must be the same as the father's.
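The puzzle-piece idea can be sketched in a few lines of Python. This is only an illustration, not how a testing laboratory works: the marker names, the alleles and the function `excluded_markers` are all invented here. It assumes each person has two alleles per marker and that a true father must share at least one allele with the child at every marker.

```python
def excluded_markers(child, candidate):
    """Return the markers at which the child shares no allele with the candidate.

    Both arguments are dicts mapping a marker name to a pair of alleles.
    An empty result means the candidate cannot be ruled out by these markers.
    """
    return [
        marker
        for marker, child_alleles in child.items()
        if not set(child_alleles) & set(candidate[marker])
    ]


# Hypothetical genotypes, invented for illustration.
child = {"M1": ("A", "B"), "M2": ("C", "C"), "M3": ("D", "E")}
man_1 = {"M1": ("A", "F"), "M2": ("C", "G"), "M3": ("E", "H")}
man_2 = {"M1": ("A", "F"), "M2": ("G", "H"), "M3": ("E", "H")}

print(excluded_markers(child, man_1))  # [] : not excluded
print(excluded_markers(child, man_2))  # ['M2'] : no shared allele at M2
```

A match at every marker does not prove paternity on its own; it only means the man is not excluded, which is why real tests look at many markers.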

http://learn.genetics.utah.edu/units/disorders/pedigree/

Genetic disorders

Most aspects of human biology involve both genetic (inherited) and non-genetic (environmental) factors. Some inherited variation influences aspects of our biology that are not medical in nature (height, eye color, ability to taste or smell certain compounds, etc.). Moreover, some genetic disorders only cause disease in combination with the appropriate environmental factors (such as diet). With these caveats, genetic disorders may be described as clinically defined diseases caused by genomic DNA sequence variation. In the most straightforward cases, the disorder can be associated with variation in a single gene. For example, cystic fibrosis is caused by mutations in the CFTR gene, and is the most common recessive disorder in Caucasian populations, with over 1300 different mutations known. Disease-causing mutations in specific genes are usually severe in terms of gene function, and are fortunately rare; thus genetic disorders are similarly individually rare. However, since there are many genes that can vary to cause genetic disorders, in aggregate they comprise a significant component of known medical conditions, especially in pediatric medicine. Molecularly characterized genetic disorders are those for which the underlying causal gene has been identified; currently there are approximately 2200 such disorders annotated in the OMIM database.[12]

Studies of genetic disorders are often performed by means of family-based studies. In some instances population based approaches are employed, particularly in the case of so-called founder populations such as those in Finland, French-Canada, Utah, Sardinia, etc. Diagnosis and treatment of genetic disorders are usually performed by a geneticist-physician trained in clinical/medical genetics. The results of the Human Genome Project are likely to provide increased availability of genetic testing for gene-related disorders, and eventually improved treatment. Parents can be screened for hereditary conditions and counselled on the consequences, the probability it will be inherited, and how to avoid or ameliorate it in their offspring.

As noted above, there are many different kinds of DNA sequence variation, ranging from complete extra or missing chromosomes down to single nucleotide changes. It is generally presumed that much naturally occurring genetic variation in human populations is phenotypically neutral, i.e. has little or no detectable effect on the physiology of the individual (although there may be fractional differences in fitness defined over evolutionary time frames). Genetic disorders can be caused by any or all known types of sequence variation. To molecularly characterize a new genetic disorder, it is necessary to establish a causal link between a particular genomic sequence variant and the clinical disease under investigation. Such studies constitute the realm of human molecular genetics.

With the advent of the Human Genome and International HapMap Project, it has become feasible to explore subtle genetic influences on many common disease conditions such as diabetes, asthma, migraine, schizophrenia, etc. Although some causal links have been made between genomic sequence variants in particular genes and some of these diseases, often with much publicity in the general media, these are usually not considered to be genetic disorders per se as their causes are complex, involving many different genetic and environmental factors. Thus there may be disagreement in particular cases whether a specific medical condition should be termed a genetic disorder.

http://en.wikipedia.org/wiki/Human_genome

What are genetic markers?

Markers themselves usually consist of DNA that does not contain a gene. However, they can tell a researcher the identity of the person a DNA sample came from. This makes markers extremely valuable for tracking inheritance of traits through generations of a family, and markers have also proven useful in criminal investigations and other forensic applications.

Although there are several different types of genetic markers, the type most used on genetic maps today is the microsatellite. However, maps of even higher resolution are being constructed using single-nucleotide polymorphisms, or SNPs (pronounced "snips"). Both types of markers are easy to use with automated laboratory equipment, so researchers can rapidly map a disease or trait in a large number of family members.

The development of high-resolution, easy-to-use genetic maps, coupled with the HGP's successful sequencing and physical mapping of the entire human genome, has revolutionized genetics research. The improved quality of genetic data has reduced the time required to identify a gene from a period of years to, in many cases, a matter of months or even weeks. Genetic mapping data generated by the HGP's laboratories is freely accessible to scientists through databases maintained by the National Institutes of Health and the National Library of Medicine's National Center for Biotechnology Information (NCBI) [ncbi.nih.gov].

http://www.genome.gov/10000715

human genetic variation

Most studies of human genetic variation have focused on single nucleotide polymorphisms (SNPs), which are substitutions in individual bases along a chromosome. Most analyses estimate that SNPs occur on average once every 100 to 1,000 base pairs in the euchromatic human genome, although they do not occur at a uniform density. Thus follows the popular statement that "we are all, regardless of race, genetically 99.9% the same",[10] although this would be somewhat qualified by most geneticists. For example, a much larger fraction of the genome is now thought to be involved in copy number variation.[11] A large-scale collaborative effort to catalog SNP variations in the human genome is being undertaken by the International HapMap Project.
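The arithmetic behind the "99.9% the same" statement can be checked with a short Python sketch. The genome length of 3.2 billion base pairs is an assumption taken as a round figure for the whole genome rather than the euchromatic portion alone, and the function names are invented for illustration.

```python
GENOME_LENGTH = 3_200_000_000  # assumed round figure, in base pairs


def expected_snps(genome_length, bases_per_snp):
    """Rough SNP count if one SNP occurs every `bases_per_snp` bases."""
    return genome_length // bases_per_snp


def percent_identical(bases_per_snp):
    """Percentage of positions that match when one base in `bases_per_snp` differs."""
    return 100 * (1 - 1 / bases_per_snp)


print(expected_snps(GENOME_LENGTH, 1_000))  # 3200000
print(expected_snps(GENOME_LENGTH, 100))    # 32000000
print(round(percent_identical(1_000), 1))   # 99.9
```

At the sparse end of the range (one SNP per 1,000 bases) the figure comes out at 99.9%; at the dense end it would be 99%, and neither number accounts for copy number variation.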

The genomic loci and length of certain types of small repetitive sequences are highly variable from person to person, which is the basis of DNA fingerprinting and DNA paternity testing technologies. The heterochromatic portions of the human genome, which total several hundred million base pairs, are also thought to be quite variable within the human population (they are so repetitive and so long that they cannot be accurately sequenced with current technology). These regions contain few genes, and it is unclear whether any significant phenotypic effect results from typical variation in repeats or heterochromatin.
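As a rough illustration of why repeat length works as a fingerprint, the Python sketch below counts how many back-to-back copies of a short motif appear in a sequence. The sequences, the motif `GATA` and the function `longest_tandem_run` are made up for this example; real fingerprinting compares many such loci and measures fragment lengths in the laboratory rather than reading sequence text.

```python
def longest_tandem_run(sequence, motif):
    """Return the largest number of consecutive copies of `motif` in `sequence`."""
    if not motif:
        raise ValueError("motif must not be empty")
    best = 0
    for start in range(len(sequence)):
        count = 0
        position = start
        # Walk forward one motif at a time while the repeat continues.
        while sequence.startswith(motif, position):
            count += 1
            position += len(motif)
        best = max(best, count)
    return best


# Two invented individuals who differ only in repeat count at one locus.
person_a = "TTAG" + "GATA" * 5 + "CCT"
person_b = "TTAG" + "GATA" * 8 + "CCT"

print(longest_tandem_run(person_a, "GATA"))  # 5
print(longest_tandem_run(person_b, "GATA"))  # 8
```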

Most gross genomic mutations in germ cells probably result in inviable embryos; however, a number of human diseases are related to large-scale genomic abnormalities. Down syndrome, Turner Syndrome, and a number of other diseases result from nondisjunction of entire chromosomes. Cancer cells frequently have aneuploidy of chromosomes and chromosome arms, although a cause and effect relationship between aneuploidy and cancer has not been established.

http://en.wikipedia.org/wiki/Human_genome


How do researchers create a genetic map?

To produce a genetic map, researchers collect blood or tissue samples from family members where a certain disease or trait is prevalent. Using various laboratory techniques, the scientists isolate DNA from these samples and examine it for the unique patterns of bases seen only in family members who have the disease or trait. These characteristic molecular patterns are referred to as polymorphisms, or markers.

Before researchers identify the gene responsible for the disease or trait, DNA markers can tell them roughly where the gene is on the chromosome. This is possible because of a genetic process known as recombination. As eggs or sperm develop within a person's body, the 23 pairs of chromosomes within those cells exchange - or recombine - genetic material. If a particular gene is close to a DNA marker, the gene and marker will likely stay together during the recombination process, and be passed on together from parent to child. So, if each family member with a particular disease or trait also inherits a particular DNA marker, chances are high that the gene responsible for the disease lies near that marker.
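The reasoning above can be sketched as a simple count in Python. This is a toy model under strong assumptions that are not from the source: the affected parent is known to carry the disease gene on the same chromosome as marker allele `M1`, every child who inherits the disease gene shows the disease, and the family data are invented. The function name `recombination_fraction` is also just illustrative.

```python
def recombination_fraction(offspring, linked_marker):
    """Estimate how often the marker and the disease gene were separated.

    `offspring` is a list of (marker_allele, affected) pairs describing what
    each child inherited from the affected parent. A child is counted as a
    recombinant when the marker and the disease status disagree.
    """
    if not offspring:
        raise ValueError("need at least one offspring")
    recombinants = sum(
        1
        for marker_allele, affected in offspring
        if (marker_allele == linked_marker) != affected
    )
    return recombinants / len(offspring)


# Invented family: 4 affected children with M1, 5 unaffected with M2,
# and 1 unaffected child who nevertheless inherited M1 (a recombinant).
family = [("M1", True)] * 4 + [("M2", False)] * 5 + [("M1", False)]

print(recombination_fraction(family, "M1"))  # 0.1
```

A fraction near zero suggests the gene sits close to the marker, while a fraction near 0.5 is what unlinked loci would give. Real linkage analysis uses statistical scores across many families rather than a single raw count.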

The more DNA markers there are on a genetic map, the more likely it is that one will be closely linked to a disease gene - and the easier it will be for researchers to zero in on that gene. One of the first major achievements of the HGP was to develop dense maps of markers spaced evenly across the entire collection of human DNA.

http://www.genome.gov/10000715

Other DNA

Protein-coding sequences (specifically, coding exons) comprise less than 1.5% of the human genome.[3] Aside from genes and known regulatory sequences, the human genome contains vast regions of DNA the function of which, if any, remains unknown. These regions in fact comprise the vast majority, by some estimates 97%, of the human genome size. Much of this is composed of:

repeat elements


transposons


pseudogenes

However, there is also a large amount of sequence that does not fall under any known classification.

Much of this sequence may be an evolutionary artifact that serves no present-day purpose, and these regions are sometimes collectively referred to as "junk" DNA. There are, however, a variety of emerging indications that many sequences within are likely to function in ways that are not fully understood. Recent experiments using microarrays have revealed that a substantial fraction of non-genic DNA is in fact transcribed into RNA,[7] which leads to the possibility that the resulting transcripts may have some unknown function. Also, the evolutionary conservation across the mammalian genomes of much more sequence than can be explained by protein-coding regions indicates that many, and perhaps most, functional elements in the genome remain unknown.[8] The investigation of the vast quantity of sequence information in the human genome whose function remains unknown is currently a major avenue of scientific inquiry.[9]

http://en.wikipedia.org/wiki/Human_genome

What is genetic mapping?

Developing new and better tools to make gene hunts faster, cheaper and practical for any scientist was a primary goal of the Human Genome Project (HGP).

One of these tools is genetic mapping, the first step in isolating a gene. Genetic mapping - also called linkage mapping - can offer firm evidence that a disease transmitted from parent to child is linked to one or more genes. It also provides clues about which chromosome contains the gene and precisely where it lies on that chromosome.

Genetic maps have been used successfully to find the single gene responsible for relatively rare inherited disorders, like cystic fibrosis and muscular dystrophy. Maps have also become useful in guiding scientists to the many genes that are believed to interact to bring about more common disorders, such as asthma, heart disease, diabetes, cancer and psychiatric conditions.

http://www.genome.gov/10000715

Chromosomes and Genes

Chromosomes

The human genome is composed of 23 pairs of chromosomes (46 in total), each of which contains hundreds of genes separated by intergenic regions. Intergenic regions may contain regulatory sequences and non-coding DNA.

There are 24 distinct human chromosomes: 22 autosomal chromosomes, plus the sex-determining X and Y chromosomes. Chromosomes 1–22 are numbered roughly in order of decreasing size. Somatic cells usually have 23 chromosome pairs: one copy of chromosomes 1–22 from each parent, plus an X chromosome from the mother, and either an X or Y chromosome from the father, for a total of 46.

Genes

There are an estimated 20,000–25,000 human protein-coding genes.[1]

Surprisingly, the number of human genes seems to be less than a factor of two greater than that of many much simpler organisms, such as the roundworm and the fruit fly. However, human cells make extensive use of alternative splicing to produce several different proteins from a single gene, and the human proteome is thought to be much larger than those of the aforementioned organisms.

Most human genes have multiple exons, and human introns are frequently much longer than the flanking exons.

Human genes are distributed unevenly across the chromosomes. Each chromosome contains various gene-rich and gene-poor regions, which seem to be correlated with chromosome bands and GC-content. The significance of these nonrandom patterns of gene density is not well understood. In addition to protein coding genes, the human genome contains thousands of RNA genes, including tRNA, ribosomal RNA, microRNA, and other non-coding RNA genes.

Regulatory sequences

The human genome has many different regulatory sequences which are crucial to controlling gene expression. These are typically short sequences that appear near or within genes. A systematic understanding of these regulatory sequences and how they together act as a gene regulatory network is only beginning to emerge from computational, high-throughput expression and comparative genomics studies.

Identification of regulatory sequences relies in part on evolutionary conservation. The evolutionary branch between the human and mouse, for example, occurred 70–90 million years ago.[4] So computer comparisons of gene sequences that identify conserved non-coding sequences will be an indication of their importance in duties such as gene regulation.[5]
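A minimal Python sketch of this kind of comparison is given below. It assumes the two sequences have already been aligned so that position i in one corresponds to position i in the other, with no gaps; real comparative genomics tools handle alignment and scoring in far more sophisticated ways. The sequences and the function names `percent_identity` and `conserved_windows` are invented for illustration.

```python
def percent_identity(seq_a, seq_b):
    """Percentage of positions at which two equal-length sequences match."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


def conserved_windows(seq_a, seq_b, window, threshold):
    """Start positions of windows whose identity is at least `threshold` percent."""
    return [
        start
        for start in range(len(seq_a) - window + 1)
        if percent_identity(seq_a[start:start + window],
                            seq_b[start:start + window]) >= threshold
    ]


# Invented, pre-aligned fragments: the first 8 bases match, the last 4 differ.
human = "ACGTACGGTTCA"
mouse = "ACGTACGGAAGT"

print(round(percent_identity(human, mouse), 1))      # 66.7
print(conserved_windows(human, mouse, 4, 100.0))     # [0, 1, 2, 3, 4]
```

Windows that stay highly similar across distantly related species, yet lie outside known genes, are the candidates for regulatory function that the text describes.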

Another comparative genomic approach to locating regulatory sequences in humans is the gene sequencing of the puffer fish. These vertebrates have essentially the same genes and regulatory gene sequences as humans, but with only one-eighth the "junk" DNA. The compact DNA sequence of the puffer fish makes it much easier to locate the regulatory genes.[6]

http://en.wikipedia.org/wiki/Human_genome

Human genome

The human genome is the genome of Homo sapiens, which is stored on 24 distinct chromosomes (22 autosomal + X + Y) containing an estimated 20,000–25,000 genes[1]. The entire human genome occupies a total of just over 3 billion DNA base pairs, and has a data size of approximately 750 Megabytes[2], which is slightly larger than the capacity of a standard Compact Disc.
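The 750 Megabyte figure can be reproduced with a short calculation, sketched here in Python. It assumes each base is stored in 2 bits (enough to distinguish A, C, G and T), rounds the genome to exactly 3 billion base pairs, and takes a megabyte as one million bytes; the function name is invented for illustration.

```python
BASE_PAIRS = 3_000_000_000  # "just over 3 billion", rounded for the estimate
BITS_PER_BASE = 2           # four possible bases need 2 bits each


def genome_size_megabytes(base_pairs, bits_per_base=BITS_PER_BASE):
    """Storage needed for one copy of the sequence, in millions of bytes."""
    total_bytes = base_pairs * bits_per_base / 8
    return total_bytes / 1_000_000


print(genome_size_megabytes(BASE_PAIRS))  # 750.0
```

Storing each base as a one-byte text character instead would take about four times as much space.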

The Human Genome Project has produced a reference sequence of the euchromatic human genome, which is used worldwide in biomedical sciences. The human genome turned out to have fewer genes than expected, with only about 1.5% of it coding for proteins and the rest made up of RNA genes, regulatory sequences, introns and, controversially, so-called junk DNA.[3]

A graphical representation of the normal human karyotype.

http://en.wikipedia.org/wiki/Human_genome

Gene Mapping

Gene mapping involves determining the locations of genes within specific chromosomes. Two methods of gene mapping are used:

  • genetic mapping
  • physical mapping

Genetic mapping is used to determine the relative position of genes within a chromosome. This is measured by whether or not two genes are "linked". If both genes are inherited together they are considered linked. By determining which genes are linked, the relative positions of genes can be worked out.
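One common way to turn linkage into relative positions is to compare how often pairs of genes are separated by recombination: the two genes separated most often are assumed to lie at the ends, with the third between them. The Python sketch below illustrates that idea for three genes. The gene names and frequencies are invented, and the function `order_three_genes` is hypothetical, not a method described by the source.

```python
def order_three_genes(recombination_frequency):
    """Infer the linear order of three linked genes.

    `recombination_frequency` maps a frozenset of two gene names to the
    fraction of offspring in which that pair was separated.
    """
    # The pair that recombines most often is taken to be the outer pair.
    outer = max(recombination_frequency, key=recombination_frequency.get)
    all_genes = set().union(*recombination_frequency)
    middle = (all_genes - outer).pop()
    first, last = sorted(outer)
    return [first, middle, last]


# Invented frequencies for three linked genes A, B and C.
frequencies = {
    frozenset({"A", "B"}): 0.05,
    frozenset({"B", "C"}): 0.10,
    frozenset({"A", "C"}): 0.14,
}

print(order_three_genes(frequencies))  # ['A', 'B', 'C']
```

The result gives only relative order; which end counts as the "start" of the chromosome cannot be decided from these numbers alone.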

Physical mapping involves determining the exact position of a specific gene within a chromosome. There are multiple techniques for accomplishing this, including creating cell hybrids for mapping out DNA in specific chromosomes.

http://www.genesolutions.com/page6.html

Genome

In biology the genome of an organism is its whole hereditary information and is encoded in the DNA (or, for some viruses, RNA). This includes both the genes and the non-coding sequences of the DNA. The term was coined in 1920 by Hans Winkler, Professor of Botany at the University of Hamburg, Germany. The Oxford English Dictionary suggests the name to be a portmanteau of the words gene and chromosome; however, many related -ome words already existed, such as biome and rhizome, forming a vocabulary into which genome fit all too well.[1]

More precisely, the genome of an organism is a complete genetic sequence on one set of chromosomes; for example, one of the two sets that a diploid individual carries in every somatic cell. The term genome can be applied specifically to mean that stored on a complete set of nuclear DNA (i.e., the "nuclear genome") but can also be applied to that stored within organelles that contain their own DNA, as with the mitochondrial genome or the chloroplast genome. When people say that the genome of a sexually reproducing species has been "sequenced," typically they are referring to a determination of the sequences of one set of autosomes and one of each type of sex chromosome, which together represent both of the possible sexes. Even in species that exist in only one sex, what is described as "a genome sequence" may be a composite read from the chromosomes of various individuals. In general use, the phrase "genetic makeup" is sometimes used conversationally to mean the genome of a particular individual or organism. The study of the global properties of genomes of related organisms is usually referred to as genomics, which distinguishes it from genetics which generally studies the properties of single genes or groups of genes.

Both the number of base pairs and the number of genes vary widely from one species to another, and there is little connection between the two. At present, the highest known number of genes is around 60,000, for the protozoan causing trichomoniasis (see List of sequenced eukaryotic genomes), almost three times as many as in the human genome.

An analogy to the human genome stored on DNA is that of instructions stored in a book:

  • The book is over one billion words long.
  • The book is bound in 5000 volumes, each 300 pages long.
  • The book fits into a cell nucleus the size of a pinpoint.
  • A copy of the book (all 5000 volumes) is contained in every cell (except red blood cells) as a strand of DNA over two metres in length.

Types

Most biological entities that are more complex than a virus sometimes or always carry additional genetic material besides that which resides in their chromosomes. In some contexts, such as sequencing the genome of a pathogenic microbe, "genome" is meant to include information stored on this auxiliary material, which is carried in plasmids. In such circumstances then, "genome" describes all of the genes and information on non-coding DNA that have the potential to be present.

In eukaryotes such as plants, protozoa and animals, however, "genome" carries the typical connotation of only information on chromosomal DNA. So although these organisms contain mitochondria that have their own DNA, the genes in this mitochondrial DNA are not considered part of the genome. In fact, mitochondria are sometimes said to have their own genome, often referred to as the "mitochondrial genome".


Genomes and genetic variation

Note that a genome does not capture the genetic diversity or the genetic polymorphism of a species. For example, the human genome sequence in principle could be determined from just half the information on the DNA of one cell from one individual. To learn what variations in genetic information underlie particular traits or diseases requires comparisons across individuals. This point explains the common usage of "genome" (which parallels a common usage of "gene") to refer not to the information in any particular DNA sequence, but to a whole family of sequences that share a biological context.

Although this concept may seem counterintuitive, it is the same concept that says there is no particular shape that is the shape of a cheetah. Cheetahs vary, and so do the sequences of their genomes. Yet both the individual animals and their sequences share commonalities, so one can learn something about cheetahs and "cheetah-ness" from a single example of either.

Genome projects


The Human Genome Project was organized to map and to sequence the human genome. Other genome projects include mouse, rice, the plant Arabidopsis thaliana, the puffer fish, bacteria like E. coli, etc. In 1976, Walter Fiers at the University of Ghent (Belgium) was the first to establish the complete nucleotide sequence of a viral RNA-genome (bacteriophage MS2). The first DNA-genome project to be completed was that of phage Φ-X174, with only 5386 base pairs, which was sequenced by Fred Sanger in 1977. The first bacterial genome to be completed was that of Haemophilus influenzae, completed by a team at The Institute for Genomic Research in 1995.

In May 2007, the New York Times announced that the full genome of DNA pioneer James D. Watson had been recorded.[1] The article noted that some scientists believe this to be the gateway to upcoming personalized genomic medicine.

Many genomes have been sequenced by various genome projects. The cost of sequencing continues to drop.

Comparison of different genome sizes

Organism | Genome size (base pairs) | Note
Virus, Bacteriophage MS2 | 3569 b | First sequenced RNA-genome[2]
Virus, SV40 | 5224 b | [3]
Virus, Phage Φ-X174 | 5386 b | First sequenced DNA-genome[4]
Virus, Phage λ | 50 kb |
Bacterium, Haemophilus influenzae | 1.83 Mb | First genome of living organism, July 1995[5]
Bacterium, Carsonella ruddii | 160 kb | Smallest non-viral genome, Feb 2007
Bacterium, Buchnera aphidicola | 600 kb |
Bacterium, Wigglesworthia glossinidia | 700 kb |
Bacterium, Escherichia coli | 4 Mb | [6]
Amoeba, Amoeba dubia | 670 Gb | Largest known genome, Dec 2005
Plant, Arabidopsis thaliana | 157 Mb | First plant genome sequenced, Dec 2000.[7]
Plant, Genlisea margaretae | 63.4 Mb | Smallest recorded flowering plant genome, 2006.[7]
Plant, Fritillaria assyriaca | 130 Gb |
Plant, Populus trichocarpa | 480 Mb | First tree genome, Sept 2006
Yeast, Saccharomyces cerevisiae | 20 Mb | [8]
Fungus, Aspergillus nidulans | 30 Mb |
Nematode, Caenorhabditis elegans | 98 Mb | First multicellular animal genome, December 1998[9]
Insect, Drosophila melanogaster aka Fruit Fly | 130 Mb | [10]
Insect, Bombyx mori aka Silk Moth | 530 Mb |
Insect, Apis mellifera aka Honey Bee | 1.77 Gb |
Fish, Tetraodon nigroviridis, type of Puffer fish | 385 Mb | Smallest vertebrate genome known
Mammal, Homo sapiens | 3.2 Gb |
Fish, Protopterus aethiopicus aka Marbled lungfish | 130 Gb | Largest vertebrate genome known

Note: The DNA from a single human cell has a length of ~1.8 m (but at a width of ~2.4 nanometers).

Since genomes and their organisms are very complex, one research strategy is to reduce the number of genes in a genome to the bare minimum and still have the organism in question survive. There is experimental work being done on minimal genomes for single cell organisms as well as minimal genomes for multicellular organisms (see Developmental biology). The work is both in vivo and in silico.

Genome evolution

Genomes are more than the sum of an organism's genes and have traits that may be measured and studied without reference to the details of any particular genes and their products. Researchers compare traits such as chromosome number (karyotype), genome size, gene order, codon usage bias, and GC-content to determine what mechanisms could have produced the great variety of genomes that exist today (for recent overviews, see Brown 2002; Saccone and Pesole 2003; Benfey and Protopapas 2004; Gibson and Muse 2004; Reese 2004; Gregory 2005).

Duplications play a major role in shaping the genome. Duplications may range from extension of short tandem repeats, to duplication of a cluster of genes, and all the way to duplications of entire chromosomes or even entire genomes. Such duplications are probably fundamental to the creation of genetic novelty.

Horizontal gene transfer is invoked to explain how there is often extreme similarity between small portions of the genomes of two organisms that are otherwise very distantly related. Horizontal gene transfer seems to be common among many microbes. Also, eukaryotic cells seem to have experienced a transfer of some genetic material from their chloroplast and mitochondrial genomes to their nuclear chromosomes.

http://en.wikipedia.org/wiki/Genome

A Gene Map of the Human Genome

The Human Genome Project is expected to produce a sequence of DNA representing the functional blueprint and evolutionary history of the human species. However, only about 3% of this sequence is thought to specify the portions of our 50,000 to 100,000 genes that encode proteins. Thus an important part of basic and applied genomics is to identify and localize these genes in a process known as transcript mapping. When genes are expressed, their sequences are first converted into messenger RNA transcripts, which can be isolated in the form of complementary DNAs (cDNAs). Approximately half of all human genes had been sampled as of 15 June, 1996.

A small portion of each cDNA sequence is all that is needed to develop unique gene markers, known as sequence tagged sites or STSs, which can be detected in chromosomal DNA by assays based on the polymerase chain reaction (PCR). To construct a transcript map, cDNA sequences from a master catalog of human genes were distributed to mapping laboratories in North America, Europe, and Japan. These cDNAs were converted to STSs and their physical locations on chromosomes determined on one of two radiation hybrid (RH) panels or a yeast artificial chromosome (YAC) library containing human genomic DNA. This mapping data was integrated relative to the human genetic map and then cross-referenced to cytogenetic band maps of the chromosomes. (Further details are available in the accompanying article in the 25 October issue of SCIENCE).

The histograms reflect the distributions and densities of genes along the chromosomes. Because the individual genes (>16,000) are too numerous to represent, images have been chosen to illustrate the myriad aspects of human biology, pathology, and relationships with other organisms that can be revealed by analysis of genes and their protein products.

http://www.ncbi.nlm.nih.gov/SCIENCE96/

Genomics

Genomics is the study of an organism's entire genome. The field includes intensive efforts to determine the entire DNA sequence of organisms and fine-scale genetic mapping efforts. The field also includes studies of intragenomic phenomena such as heterosis, epistasis, pleiotropy and other interactions between loci and alleles within the genome. In contrast, the investigation of single genes, their functions and roles, something very common in today's medical and biological research, and a primary focus of molecular biology, does not fall into the definition of genomics, unless the aim of this genetic, pathway, and functional information analysis is to elucidate its effect on, place in, and response to the entire genome's networks.

History of the field

Genomics was established by Fred Sanger when he first sequenced the complete genomes of a virus and a mitochondrion. His group established techniques of sequencing, genome mapping, data storage, and bioinformatic analysis in the 1970s and 1980s. A major branch of genomics is still concerned with sequencing the genomes of various organisms, but the knowledge of full genomes has created the possibility for the field of functional genomics, mainly concerned with patterns of gene expression during various conditions. The most important tools here are microarrays and bioinformatics. Study of the full set of proteins in a cell type or tissue, and the changes during various conditions, is called proteomics.

In 1972, Walter Fiers and his team at the Laboratory of Molecular Biology of the University of Ghent (Ghent, Belgium) were the first to determine the sequence of a gene: the gene for Bacteriophage MS2 coat protein.[1] In 1976, the team determined the complete nucleotide sequence of bacteriophage MS2 RNA.[2] The first DNA-based genome to be sequenced in its entirety was that of bacteriophage Φ-X174 (5,386 bp), sequenced by Frederick Sanger in 1977.[3] The first free-living organism to have its genome sequenced was Haemophilus influenzae (1.8 Mb) in 1995, and since then genomes have been sequenced at a rapid pace. A rough draft of the human genome was completed by the Human Genome Project in early 2001, creating much fanfare.

As of September 2007, the complete sequence was known of about 1879 viruses [4], 577 bacterial species and roughly 23 eukaryote organisms, of which about half are fungi. [5] Most of the bacteria whose genomes have been completely sequenced are problematic disease-causing agents, such as Haemophilus influenzae. Of the other sequenced species, most were chosen because they were well-studied model organisms or promised to become good models. Yeast (Saccharomyces cerevisiae) has long been an important model organism for the eukaryotic cell, while the fruit fly Drosophila melanogaster has been a very important tool (notably in early pre-molecular genetics). The worm Caenorhabditis elegans is an often used simple model for multicellular organisms. The zebrafish Brachydanio rerio is used for many developmental studies on the molecular level and the flower Arabidopsis thaliana is a model organism for flowering plants. The Japanese pufferfish (Takifugu rubripes) and the spotted green pufferfish (Tetraodon nigroviridis) are interesting because of their small and compact genomes, containing very little non-coding DNA compared to most species. [6] [7] The mammals dog (Canis familiaris), [8] brown rat (Rattus norvegicus), mouse (Mus musculus), and chimpanzee (Pan troglodytes) are all important model animals in medical research.


Bacteriophage Genomics

Bacteriophages have played and continue to play a key role in bacterial genetics and molecular biology. Historically, they were used to define gene structure and gene regulation. The first genome to be sequenced was also that of a bacteriophage. However, bacteriophage research did not lead the genomics revolution, which is clearly dominated by bacterial genomics. Only very recently has the study of bacteriophage genomes become prominent, thereby enabling researchers to understand the mechanisms underlying phage evolution. Bacteriophage genome sequences can be obtained through direct sequencing of isolated bacteriophages, but can also be derived as part of microbial genomes. Analysis of bacterial genomes has shown that a substantial amount of microbial DNA consists of prophage sequences and prophage-like elements. Detailed database mining of these sequences offers insights into the role of prophages in shaping the bacterial genome.[9]

Cyanobacteria Genomics

At present there are 24 cyanobacteria for which a total genome sequence is available. Fifteen of these cyanobacteria come from the marine environment. These are six Prochlorococcus strains, seven marine Synechococcus strains, Trichodesmium erythraeum IMS101 and Crocosphaera watsonii WH8501. Several studies have demonstrated how these sequences could be used very successfully to infer important ecological and physiological characteristics of marine cyanobacteria. However, there are many more genome projects currently in progress. Among them are further Prochlorococcus and marine Synechococcus isolates, Acaryochloris and Prochloron, the N2-fixing filamentous cyanobacteria Nodularia spumigena, Lyngbya aestuarii and Lyngbya majuscula, as well as bacteriophages infecting marine cyanobacteria. Thus, the growing body of genome information can also be tapped in a more general way to address global problems by applying a comparative approach. Some new and exciting examples of progress in this field are the identification of genes for regulatory RNAs, insights into the evolutionary origin of photosynthesis, and estimation of the contribution of horizontal gene transfer to the genomes that have been analyzed.[10]

http://en.wikipedia.org/wiki/Genomics

Gene Mapping

"Gene mapping" refers to the mapping of genes to specific locations on chromosomes. It is a critical step in the understanding of genetic diseases. There are two types of gene mapping:

Genetic Mapping - using linkage analysis to determine the relative position between two genes on a chromosome.

Physical Mapping - using all available techniques or information to determine the absolute position of a gene on a chromosome.

The ultimate goal of gene mapping is to clone genes, especially disease genes. Once a gene is cloned, we can determine its DNA sequence and study its protein product. For example, cystic fibrosis (CF) is the most common lethal inherited disease in the United States. About 1 in 2500 Americans of Northern European descent is born with CF. In 1985, the gene was mapped to chromosome 7q31-q32 by linkage analysis. Four years later, it was cloned by Francis Collins and his co-workers. We now know that the disease is caused by a defect in a chloride channel, the protein product of this disease gene.

Linkage analysis

Genetic mapping is based on the linkage between "loci" (locations of genes). If two loci are usually inherited together, they are said to be "linked". Two loci on different chromosomes are not linked, because they are separated by independent assortment.

A locus (singular of loci) may have different sequences, referred to as alleles. Consider two loci A and B, each having two alleles (one from the mother, the other from the father). A1 and A2 are the two alleles of locus A; B1 and B2 are the two alleles of locus B. Initially, A1 and B1 are located on the same chromosome, and A2 and B2 are located on the other chromosome of the pair.

Figure 10-A-1. Illustration of recombination between two loci A and B. (a) Two pairs of sister chromatids align during meiosis. A1 and B1 are located on the same chromosome. A2 and B2 are located on a different chromosome. (b) DNA crossover leads to recombination if the chiasma is located between the two loci. (c) DNA crossover does not lead to recombination if the chiasma is not located between the two loci.

DNA crossover may cause recombination of loci A and B. That is, after the crossover A1 and B2 (or A2 and B1) end up on the same chromosome. The recombination frequency depends on the distance between the two loci and the position of the crossover (the chiasma). The closer the two loci are, the less likely recombination is to occur, because recombination occurs only when the chiasma is located between the two loci.

To apply this basic principle to map a disease gene, we need to analyze the pedigree and estimate recombination frequency.

http://www.web-books.com/MoBio/Free/Ch10A.htm

Genetic linkage

Genetic linkage occurs when particular genetic loci or alleles for genes are inherited jointly. Genetic loci on the same chromosome are physically connected and tend to segregate together during meiosis, and are thus genetically linked. Alleles for genes on different chromosomes are usually not linked, due to independent assortment of chromosomes during meiosis.

Because there is some crossing over of DNA when the chromosomes segregate, alleles on the same chromosome can be separated and go to different daughter cells. There is a greater probability of this happening if the alleles are far apart on the chromosome, as it is more likely that a cross-over will occur between them.

The relative distance between two genes can be calculated using the offspring of an organism showing two linked genetic traits, and finding the percentage of the offspring where the two traits do not run together. The higher the percentage of descendants that does not show both traits, the further apart on the chromosome they are.
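The percentage described above can be sketched in a few lines of Python. This is only an illustration: the function name and the offspring counts are hypothetical and do not come from any experiment cited here.

```python
def recombination_frequency(parental: int, recombinant: int) -> float:
    """Fraction of offspring in which the two traits did not stay together."""
    total = parental + recombinant
    if total == 0:
        raise ValueError("need at least one offspring")
    return recombinant / total


# Hypothetical counts, invented for illustration only.
rf = recombination_frequency(parental=830, recombinant=170)
print(f"recombination frequency: {rf:.1%}")
```

A larger value suggests that the two genes lie further apart on the chromosome.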

Among individuals of an experimental population or species, some phenotypes or traits occur randomly with respect to one another in a manner known as independent assortment. Today scientists understand that independent assortment occurs when the genes affecting the phenotypes are found on different chromosomes or separated by a great enough distance on the same chromosome that recombination occurs at least half of the time.

An exception to independent assortment develops when genes appear near one another on the same chromosome. When genes occur on the same chromosome, they are usually inherited as a single unit. Genes inherited in this way are said to be linked, and are referred to as "linkage groups." For example, in fruit flies the genes affecting eye color and wing length are inherited together because they appear on the same chromosome.

But in many cases, even genes on the same chromosome that are inherited together produce offspring with unexpected allele combinations. This results from a process called crossing over. At the beginning of normal meiosis, a chromosome pair (made up of a chromosome from the mother and a chromosome from the father) intertwine and exchange sections or fragments of chromosome. The pair then breaks apart to form two chromosomes with a new combination of genes that differs from the combination supplied by the parents. Through this process of recombining genes, organisms can produce offspring with new combinations of maternal and paternal traits that may contribute to or enhance survival.

Genetic linkage was first discovered by the British geneticists William Bateson and Reginald Punnett shortly after Mendel's laws were rediscovered.

Linkage mapping

The observations by Thomas Hunt Morgan that the amount of crossing over between linked genes differs led to the idea that crossover frequency might indicate the distance separating genes on the chromosome. Morgan's student Alfred Sturtevant developed the first genetic map, also called a linkage map.

Sturtevant proposed that the greater the distance between linked genes, the greater the chance that non-sister chromatids would cross over in the region between the genes. By working out the number of recombinants it is possible to obtain a measure for the distance between the genes. This distance is called a genetic map unit (m.u.), or a centimorgan and is defined as the distance between genes for which one product of meiosis in 100 is recombinant. A recombinant frequency (RF) of 1 % is equivalent to 1 m.u. A linkage map is created by finding the map distances between a number of traits that are present on the same chromosome, ideally avoiding having significant gaps between traits to avoid the inaccuracies that will occur due to the possibility of multiple recombination events.
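As a rough sketch of how map distances can be assembled into a small linkage map, the Python below orders three linked loci from their pairwise recombination frequencies. It assumes the pair with the largest frequency marks the two ends of the map, and it sums the two adjacent intervals to place the far end. The locus names and percentages are hypothetical.

```python
def order_three_loci(rf_percent: dict[frozenset, float]) -> tuple[str, str, str]:
    """Order three linked loci; the pair with the largest RF forms the ends."""
    loci = set().union(*rf_percent)
    if len(loci) != 3:
        raise ValueError("expected pairwise values for exactly three loci")
    ends = max(rf_percent, key=rf_percent.get)
    middle = (loci - ends).pop()
    left, right = sorted(ends)
    return left, middle, right


# Hypothetical pairwise recombination frequencies in percent (1% = 1 m.u.).
pairwise = {
    frozenset({"a", "b"}): 5.0,
    frozenset({"b", "c"}): 12.0,
    frozenset({"a", "c"}): 16.0,  # less than 5 + 12: double crossovers go unseen
}

left, middle, right = order_three_loci(pairwise)
# Place loci by adding up the adjacent (shorter) intervals.
positions = {
    left: 0.0,
    middle: pairwise[frozenset({left, middle})],
    right: pairwise[frozenset({left, middle})] + pairwise[frozenset({middle, right})],
}
print(positions)
```

Summing the short adjacent intervals, rather than using the directly measured end-to-end value, is one simple way to reduce the underestimate caused by multiple recombination events over long distances.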

Linkage mapping is critical for identifying the location of genes that cause genetic diseases. In an ideal population, genetic traits and markers will occur in all possible combinations with the frequencies of combinations determined by the frequencies of the individual genes. For example, if alleles A and a occur with frequency 90% and 10%, and alleles B and b at a different genetic locus occur with frequencies 70% and 30%, the frequency of individuals having the combination AB would be 63%, the product of the frequencies of A and B, regardless of how close together the genes are. However, if a mutation in gene B that causes some disease happened recently in a particular subpopulation, it almost always occurs with a particular allele of gene A if the individual in which the mutation occurred had that variant of gene A and there have not been sufficient generations for recombination to happen between them (presumably due to tight linkage on the genetic map). In this case, called linkage disequilibrium, it is possible to search potential markers in the subpopulation and identify which marker the mutation is close to, thus determining the mutation's location on the map and identifying the gene at which the mutation occurred. Once the gene has been identified, it can be targeted to identify ways to mitigate the disease.
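The arithmetic in the example above can be checked with a short Python sketch. The expected frequency uses the 90% and 70% figures from the text. The observed frequency and the coefficient D (observed minus expected, a standard population-genetics measure that the text does not name) are added here as assumptions for illustration.

```python
def expected_combination_freq(p_a: float, p_b: float) -> float:
    """Expected frequency of the A-B combination when the loci are independent."""
    return p_a * p_b


def disequilibrium(observed_ab: float, p_a: float, p_b: float) -> float:
    """D = observed minus expected; zero means no association."""
    return observed_ab - expected_combination_freq(p_a, p_b)


p_a, p_b = 0.90, 0.70                 # allele frequencies from the example
expected = expected_combination_freq(p_a, p_b)
print(f"expected AB frequency: {expected:.2f}")

observed_ab = 0.69                    # hypothetical value for a subpopulation
d = disequilibrium(observed_ab, p_a, p_b)
print(f"D = {d:.2f}")
```

A value of D away from zero is the kind of signal that a search for markers near a recent mutation would look for.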

Linkage map

A linkage map is a chromosome map of a species or experimental population that shows the position of its known genes and/or markers relative to each other in terms of recombination frequency, rather than as specific physical distance along each chromosome.

A genetic map is a map based on the frequencies of recombination between markers during crossover of homologous chromosomes. The greater the frequency of recombination (segregation) between two genetic markers, the farther apart they are assumed to be. Conversely, the higher the frequency of association between the markers, the smaller the physical distance between them. Historically, the markers originally used were detectable phenotypes (enzyme production, eye color) derived from coding DNA sequences; eventually, confirmed or assumed noncoding DNA sequences such as microsatellites or those generating restriction fragment length polymorphisms (RFLPs) have been used.

Genetic maps help researchers to locate other markers, such as other genes, by testing for genetic linkage to the already known markers.

A genetic map is not a gene map.

LOD score method for estimating recombination frequency

The lod score (logarithm (base 10) of odds) is a statistical test often used for linkage analysis in human populations, and also in animal and plant populations. The test was developed by Newton E. Morton. Computerized lod score analysis is a simple way to analyze complex family pedigrees in order to determine the linkage between Mendelian traits (or between a trait and a marker, or two markers).

The method is described in greater detail by Strachan and Read [1]. Briefly, it works as follows:

  1. Establish a pedigree
  2. Make a number of estimates of recombination frequency
  3. Calculate a lod score for each estimate
  4. The estimate with the highest lod score will be considered the best estimate

The lod score is calculated as follows:

  \begin{align} LOD = Z & = \log_{10} \frac{ \mbox{probability of birth sequence with a given linkage value} }{ \mbox{probability of birth sequence with no linkage} } \\  & = \log_{10} \frac{(1-\theta)^{NR} \times \theta^{R}}{ 0.5^{(NR + R)} } \end{align}

NR denotes the number of non-recombinant offspring, and R denotes the number of recombinant offspring. The reason 0.5 is used in the denominator is that any alleles that are completely unlinked (e.g. alleles on separate chromosomes) have a 50% chance of recombination, due to independent assortment.
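The formula and the four steps listed above can be sketched in Python as follows. This is a minimal illustration, not the tabulated procedure used in practice: the offspring counts are hypothetical, and the grid of candidate θ values (0.01 to 0.50 in steps of 0.01) is an arbitrary choice.

```python
import math


def lod_score(theta: float, non_recombinants: int, recombinants: int) -> float:
    """log10 of [(1 - theta)^NR * theta^R] / 0.5^(NR + R)."""
    if not 0 < theta <= 0.5:
        raise ValueError("theta must be in (0, 0.5]")
    n = non_recombinants + recombinants
    log_linked = (non_recombinants * math.log10(1 - theta)
                  + recombinants * math.log10(theta))
    log_unlinked = n * math.log10(0.5)
    return log_linked - log_unlinked


def best_theta(non_recombinants: int, recombinants: int) -> float:
    """Try a grid of theta estimates and keep the one with the highest lod score."""
    grid = [i / 100 for i in range(1, 51)]
    return max(grid, key=lambda t: lod_score(t, non_recombinants, recombinants))


# Hypothetical pedigree: 18 non-recombinant and 2 recombinant offspring.
theta_hat = best_theta(18, 2)
z = lod_score(theta_hat, 18, 2)
print(f"best theta = {theta_hat}, lod = {z:.2f}")
```

Working with logarithms throughout avoids multiplying many small probabilities together, which matters for larger pedigrees.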

In practice, lod scores are looked up in a table which lists lod scores for various standard pedigrees and various values of recombination frequency.

By convention, a lod score greater than 3.0 is considered evidence for linkage. (A score of 3.0 means the observed pedigree data are 1000 times more likely if the two loci are linked than if they are not.) On the other hand, a lod score less than -2.0 is considered evidence to exclude linkage. Although it is very unlikely that a lod score of 3 would be obtained from a single pedigree, the mathematical properties of the test allow data from a number of pedigrees to be combined by summing the lod scores.
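A small Python sketch of the convention described above: scores from several pedigrees are summed and the total is compared with the 3.0 and -2.0 thresholds. The per-family scores are hypothetical, and the sketch assumes they were all computed at the same recombination frequency, since only then is the sum meaningful.

```python
def interpret_lod(total: float) -> str:
    """Apply the conventional thresholds for a combined lod score."""
    if total > 3.0:
        return "evidence for linkage"
    if total < -2.0:
        return "linkage excluded"
    return "inconclusive"


# Hypothetical lod scores from three families, all at the same theta.
family_scores = [1.2, 0.9, 1.4]
combined = sum(family_scores)
print(f"combined lod = {combined:.1f}: {interpret_lod(combined)}")
```

No single family here reaches the threshold on its own, but the combined score does.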

Recombination frequency

Recombination frequency (θ) is the frequency with which crossing-over takes place between two loci (or genes) during meiosis. Recombination frequency is a measure of genetic linkage and is used in the creation of a genetic linkage map. A centimorgan (cM) is a unit that describes a recombination frequency of 1%.

During meiosis, chromosomes assort randomly into gametes, such that the segregation of alleles of one gene is independent of alleles of another gene. This is stated in Mendel's Second Law and is known as the law of independent assortment. The law of independent assortment always holds true for genes that are located on different chromosomes, but for genes that are on the same chromosome, it does not always hold true.

As an example of independent assortment, consider the crossing of the pure-bred homozygote parental strain with genotype AABB with a different pure-bred strain with genotype aabb. A and a and B and b represent the alleles of genes A and B. Crossing these homozygous parental strains will result in F1 generation offspring with genotype AaBb. The F1 offspring AaBb produces gametes that are AB, Ab, aB, and ab with equal frequencies (25%) due to the law of independent assortment. Note that 2 of the 4 gametes (50 %)—Ab and aB—were not present in the parental generation. These gametes represent recombinant gametes. Recombinant gametes are those gametes that differ from both of the haploid gametes that made up the diploid cell. In this example, the recombination frequency is 50% since 2 of the 4 gametes were recombinant gametes.
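The gamete counting in this example can be reproduced with a few lines of Python. The sketch simply enumerates one allele from each gene, which is what independent assortment amounts to for two unlinked genes; the variable names are illustrative.

```python
from itertools import product

# Gametes that formed the F1 individual AaBb (from the AABB and aabb parents).
parental_gametes = {"AB", "ab"}

# Independent assortment: one allele of gene A combined with one allele of gene B.
f1_gametes = ["".join(pair) for pair in product("Aa", "Bb")]

recombinant_gametes = [g for g in f1_gametes if g not in parental_gametes]
frequency = len(recombinant_gametes) / len(f1_gametes)

print(f1_gametes)
print(recombinant_gametes)
print(f"recombination frequency: {frequency:.0%}")
```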

The recombination frequency will be 50% when two genes are located on different chromosomes or when they are widely separated on the same chromosome. This is a consequence of independent assortment.

When two genes are close together on the same chromosome, they do not assort independently and are said to be linked. Whereas genes located on different chromosomes assort independently and have a recombination frequency of 50%, linked genes have a recombination frequency that is less than 50%.

As an example of linkage, consider the classic experiment by William Bateson and Reginald Punnett. They were interested in trait inheritance in the sweet pea and were studying two genes—the gene for flower color (P, purple, and p, red) and the gene affecting the shape of pollen grains (L, long, and l, round). They crossed the pure lines PPLL and ppll and then self-crossed the resulting PpLl lines. According to Mendelian genetics, the expected phenotypes would occur in a 9:3:3:1 ratio of PL:Pl:pL:pl. To their surprise, they observed an increased frequency of PL and pl and a decreased frequency of Pl and pL (see chart below).

Bateson and Punnett experiment
Phenotype and genotype | Observed | Expected from 9:3:3:1 ratio
Purple, long (P_L_) | 284 | 216
Purple, round (P_ll) | 21 | 72
Red, long (ppL_) | 21 | 72
Red, round (ppll) | 55 | 24

Their experiment revealed linkage (or coupling) between the P and L alleles and the p and l alleles. The frequency of P occurring together with L and with p occurring together with l is greater than that of the recombinant Pl and pL. The recombination frequency cannot be computed directly from this experiment, but intuitively it is less than 50%.
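One way to put a number on how far the observed counts depart from the 9:3:3:1 expectation is a chi-square goodness-of-fit statistic, sketched below in Python. This test is an addition for illustration and is not part of the original account. The expected counts are taken exactly as printed in the table (they sum to 384, slightly more than the 381 plants observed), and 7.815 is the usual 5% critical value for 3 degrees of freedom.

```python
observed = {
    "purple, long": 284,
    "purple, round": 21,
    "red, long": 21,
    "red, round": 55,
}
# Expected counts as printed in the table above.
expected = {
    "purple, long": 216,
    "purple, round": 72,
    "red, long": 72,
    "red, round": 24,
}

# Sum of (observed - expected)^2 / expected over the four phenotype classes.
chi_square = sum(
    (observed[k] - expected[k]) ** 2 / expected[k] for k in observed
)

CRITICAL_VALUE = 7.815  # 5% level, 3 degrees of freedom
print(f"chi-square = {chi_square:.1f}")
if chi_square > CRITICAL_VALUE:
    print("counts depart from the 9:3:3:1 ratio")
else:
    print("counts are consistent with the 9:3:3:1 ratio")
```

The statistic is far above the critical value, which agrees with the conclusion that the two genes do not assort independently.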

The progeny in this case received two dominant alleles linked on one chromosome (referred to as coupling or cis arrangement). However, after crossover, some progeny could have received one parental chromosome with a dominant allele for one trait (e.g. purple) linked to a recessive allele for a second trait (e.g. round), with the opposite being true for the other parental chromosome (e.g. red and long). This is referred to as repulsion or a trans arrangement. The phenotype here would still be purple and long, but a test cross of this individual with the recessive parent would produce progeny with a much greater proportion of the two crossover phenotypes. While such a problem may not seem likely from this example, unfavorable repulsion linkages do appear when breeding for disease resistance in some crops.

When two genes are located on the same chromosome, the chance of a crossover producing recombination between the genes is directly related to the distance between the two genes. Thus, recombination frequencies have been used to develop linkage maps or genetic maps.

http://en.wikipedia.org/wiki/Genetic_linkage