From SAGE to LongSAGE: Incorporating the Genomic Level

Serial analysis of gene expression (SAGE) was devised to allow the analysis of mRNA expression without prior sequence information about the genes under study (30; reviewed in ref. 31). A number of genes and pathways relevant to tumor biology were identified using SAGE (7,31-35). The 14-bp tag used in conventional SAGE includes four fixed positions (CATG), which leaves 10 variable positions and is therefore sufficient to distinguish 4^10 = 1,048,576 different mRNAs. Because human or murine mRNA populations contain 30,000-50,000 different mRNAs, the 14-bp SAGE tag allows one to identify and quantify the correct, corresponding cDNA, as shown by numerous validations of differential SAGE-tag expression using independent methods such as Northern blotting and quantitative real-time polymerase chain reaction (PCR) (e.g., ref. 36). Nonetheless, some ambiguities occurred when using SAGE because in some cases a single SAGE tag matches several different mRNAs. Furthermore, the 14-bp tag is not sufficient to map the SAGE tag to the genome and thereby determine the position and exons of previously unknown cDNAs (37). cDNAs representing genes expressed at low levels, which might have important regulatory functions, are often not represented in the expressed sequence tag (EST) databases and would be missed in a SAGE screen. Therefore, several techniques that use the SAGE tag as a primer in an anchored PCR reaction to identify the cDNA in a gene-by-gene manner were proposed (34,38). It was estimated that approximately 15,000 exons have not been confirmed through EST sequencing projects to date (37). Furthermore, calculations showed that a SAGE tag of 21-bp length would provide sufficient information to allow the direct mapping of the SAGE tag to the genome with a certainty of 99.83% (37). Accordingly, the SAGE protocol was modified by changing the type IIS restriction enzyme, which is used to release the tag from the 3' ends of cDNAs, from BsmFI to MmeI. This allowed one to retrieve SAGE tags of 21-bp size (37).
In a pilot experiment, a LongSAGE library of 27,737 tags was generated from a colorectal cancer cell line (37). This library represented 3336 genes annotated in the Human Genome Project. However, an additional 1503 tags matched exons that had not been previously annotated, with 583 tags matching internal exons and 920 matching novel genes. To validate these results, the expression of 129 candidate genes was determined by reverse transcription (RT)-PCR, which confirmed the expression of 123 predicted genes (37). These results show that LongSAGE is a useful tool for the identification of novel genes overexpressed in cancer, which could include tumor markers or drug targets. An overview of the LongSAGE approach is depicted in Fig. 1.
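The tag arithmetic above can be illustrated with a minimal Python sketch. It is not part of the published protocol: the function names `tag_space` and `extract_tag` and the toy cDNA sequence are hypothetical, and the sketch assumes that the tag starts at the 3'-most CATG site of the cDNA and extends downstream, which simplifies the enzymatic steps considerably.

```python
ANCHOR = "CATG"  # the four fixed positions of every tag


def tag_space(tag_length, anchor=ANCHOR):
    """Number of distinct tags of a given total length.

    Only the positions after the fixed anchor vary, and each
    variable position can hold one of four bases.
    """
    variable_positions = tag_length - len(anchor)
    return 4 ** variable_positions


def extract_tag(cdna, tag_length, anchor=ANCHOR):
    """Return the tag starting at the 3'-most anchor site (assumption).

    Returns None if the cDNA has no anchor site or if too few bases
    follow the site to fill a complete tag.
    """
    pos = cdna.rfind(anchor)  # last occurrence = 3'-most site
    if pos == -1 or pos + tag_length > len(cdna):
        return None
    return cdna[pos:pos + tag_length]


if __name__ == "__main__":
    # Hypothetical toy sequence, written 5' to 3'; not a real transcript.
    toy_cdna = "GGCATGAAACCCGGGTTTCATGACGTACGTACGTACGTACGTAAAA"

    print(tag_space(14))  # conventional SAGE: 10 variable positions
    print(tag_space(21))  # LongSAGE: 17 variable positions
    print(extract_tag(toy_cdna, 14))
    print(extract_tag(toy_cdna, 21))
```

With 17 variable positions the 21-bp tag has 4^17 possible sequences, compared with 4^10 for the 14-bp tag. The additional positions make a chance match elsewhere in the genome far less likely, which is the basis for mapping LongSAGE tags directly to genomic sequence.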

Fig. 1. Schematic of the LongSAGE method. Comparison of tag localizations to previously annotated genes can provide expression evidence for predicted genes and identify novel internal exons and previously uncharacterized genes. See text for details. (From ref. 37 with permission from Dr. Victor Velculescu, Nature Publishing Group: www.nature.com.)
