Reference genomes provide anchors for biological information

The availability of the human genome, along with numerous completed model organism genomes, presents challenges in the areas of efficient data storage, retrieval, and update when newer versions become available. On the other hand, genomes are natural data mining entry points, where genome assemblies can serve as common scaffolds upon which various biological datasets can be anchored and, thus, easily cross-referenced to each other. As a result, systematic chromosomal views of genomewide biology, including gene expression, chromosomal amplifications and deletions, SNPs, and evolutionary relationships to other species, become possible.
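
As a minimal sketch of what anchoring means in practice, the Python below places two hypothetical datasets (gene annotations and SNP positions) on the same chromosome coordinates and cross-references them by interval overlap. The names, coordinates, and records are invented for illustration and are not drawn from any real assembly; a production genome browser would use an indexed structure rather than the linear scan shown here.

```python
from collections import namedtuple

# Hypothetical records; coordinates are illustrative, not real annotations.
Feature = namedtuple("Feature", ["chrom", "start", "end", "name"])


def overlaps(a, b):
    """True if two features lie on the same chromosome and share at least
    one base (both intervals are treated as closed)."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def cross_reference(genes, others):
    """Map each gene name to the names of features from a second dataset
    that overlap it on the shared coordinate system."""
    return {g.name: [o.name for o in others if overlaps(g, o)] for g in genes}


genes = [
    Feature("chr1", 1000, 5000, "GENE_A"),
    Feature("chr1", 8000, 9000, "GENE_B"),
    Feature("chr2", 1000, 5000, "GENE_C"),
]
snps = [
    Feature("chr1", 1200, 1200, "snp_1"),
    Feature("chr1", 8500, 8500, "snp_2"),
    Feature("chr1", 8999, 8999, "snp_3"),
    Feature("chr2", 7000, 7000, "snp_4"),
]

hits = cross_reference(genes, snps)
print(hits)
```

Because both datasets are expressed in the same coordinates, no dataset-specific identifiers need to match; position alone links them.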

With the completion of the genome, all human genes can be accurately positioned on their chromosomes, enabling a high-resolution map in which the chromosomal positions of human expressed sequence tags (ESTs) and mRNAs belonging to UniGene clusters ( db=unigene) are readily identified. Human transcriptome maps, which integrate gene maps with genomewide messenger RNA expression profiles, can provide whole-genome views of gene expression to aid in identifying overexpressed or silenced chromosomal loci in cancer samples (30,31). These expression maps include those based on EST abundance (dbEST) and on serial analysis of gene expression (SAGE) tags (SAGE database), measured in various normal or diseased tissues.

Because genomic changes are believed to be the primary causes of cancer, characterizing gene amplification and deletion by measuring DNA copy-number changes in tumors is important for the basic understanding of cancer (32,33), the identification of therapeutic targets (34-37), and cancer diagnosis (38). Array-based comparative genomic hybridization (CGH) (38) has been developed for genomewide detection of chromosomal imbalances in tumor samples. The human genome sequence and genome-based high-resolution gene maps have greatly enhanced our ability to map DNA copy-number changes. In addition, the public CGH database ( sky/) has been set up to serve as a research platform for investigators to share and compare their datasets.
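
The text does not say how array CGH signals are turned into amplification or deletion calls. One common convention, assumed here, is to take the log2 ratio of tumor to reference signal at each mapped probe and compare it with fixed thresholds. The sketch below illustrates that idea; the probe positions, intensities, and the ±0.3 cutoffs are hypothetical values chosen for the example, not recommended settings.

```python
import math


def log2_ratio(tumor, reference):
    """Log2 of tumor signal over reference signal for one probe."""
    return math.log2(tumor / reference)


def call_copy_number(ratio, gain=0.3, loss=-0.3):
    """Classify a log2 ratio as gain, loss, or neutral.
    The default thresholds are illustrative assumptions."""
    if ratio >= gain:
        return "gain"
    if ratio <= loss:
        return "loss"
    return "neutral"


# Hypothetical probes: (chromosome, position, tumor signal, reference signal)
probes = [
    ("chr1", 1_000_000, 2400.0, 600.0),
    ("chr1", 2_000_000, 600.0, 600.0),
    ("chr1", 3_000_000, 300.0, 600.0),
]

calls = [
    (chrom, pos, call_copy_number(log2_ratio(tumor, ref)))
    for chrom, pos, tumor, ref in probes
]
print(calls)
```

Keeping the chromosome and position with each call is what allows the result to be laid onto a genome-based gene map alongside other datasets.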

A key aspect of research in genetics is associating sequence variations with heritable phenotypes (39). The most common variations are SNPs, which occur approximately once every 100-300 bases in the human genome. Comprehensive SNP maps (The Cold Spring Harbor SNP collection; dbSNP, SNP/) can facilitate the cataloging and profiling of the unique sets of changes in different diseases. The availability of high-quality, high-density SNP maps has enabled genome-scale correlation studies between SNPs and precancerous conditions (40,41), drug resistance in chemotherapy (42,43), cancer susceptibility (44,45), and drug response (46-49). This approach promises to significantly advance our ability to understand and treat cancer. A comprehensive review of the current SNP-related resources can be found at the Human Genome Variation Database website.

Thanks to technological advances, large-scale sequencing projects can efficiently generate high-quality, high-volume genome sequences, from yeast to chimpanzees, at reasonable cost. Comparative analysis between human and model organism genomes can reveal regions of similarity and difference, helping scientists to better understand the structure and function of human genes. All of the genome browsers (12,50,51), including Ensembl, the UCSC Genome Browser, and the NCBI Map Viewer, are organizing and integrating multispecies datasets of fine-grained DNA-DNA alignments, orthologous protein information, and large-scale synteny. Because the systematic comparison of genomic sequences from different organisms has become a central focus of current genome analysis, automatic comparative genome analysis and visualization will be a major area of development for these genome browser projects.
