Plant Physiology 134:890-897 (2004)
© 2004 American Society of Plant Biologists
Robust-LongSAGE (RL-SAGE): A Substantially Improved LongSAGE Method for Gene Discovery and Transcriptome Analysis1,[w]
Department of Plant Pathology, Ohio State University, Columbus, Ohio 43210 (M.G., C.J., G.-L.W.); and Fungal Genomics Laboratory, Department of Plant Pathology, North Carolina State University, Raleigh, North Carolina 27695 (R.A.D.)
Serial analysis of gene expression (SAGE) is a widely used technique for large-scale transcriptome analysis in mammalian systems. Recently, a modified version called LongSAGE (S. Saha, A.B. Sparks, C. Rago, V. Akmaev, C.J. Wang, B. Vogelstein, K.W. Kinzler [2002] Nat Biotechnol 20: 508-512) was reported, which increases the tag length to 21 bp. Although the procedures for the two methods are similar, a detailed protocol for LongSAGE library construction has not yet been reported, and several technical difficulties associated with concatemer cloning and purification have not been solved. In this study, we report a substantially improved LongSAGE method called Robust-LongSAGE (RL-SAGE), which has four major improvements over previously reported protocols. First, a small amount of mRNA (50 ng) was enough for library construction. Second, cDNA-adapter and ditag formation were enhanced by an extended (overnight) ligation period. Third, only 20 ditag polymerase chain reactions were needed to obtain a complete library (a reduction of over 90% compared with the original protocols). Fourth, concatemers were partially digested with NlaIII before cloning into the vector (pZEro-1), which greatly improved cloning efficiency. The significant contribution of Robust-LongSAGE is that it solves the major technical difficulties, such as low cloning efficiency and small insert sizes, associated with existing SAGE and LongSAGE protocols. Using this protocol, one can generate two to three libraries, each containing about 4.5 million tags, within a month. We have recently constructed five libraries from rice (Oryza sativa), one from maize (Zea mays), and one from the rice blast fungus (Magnaporthe grisea).
Genome sequencing is becoming an emerging technology for large-scale gene discovery, and many prokaryotic and eukaryotic genomes have been completely sequenced in the last few years. Two model plant species have been sequenced recently: Arabidopsis for dicots (Arabidopsis Genome Initiative, 2000
Exhaustive sequencing of expressed sequence tags (ESTs) was the first method used for rapid identification of expressed genes and gene expression profiling (Adams et al., 1995
Compared with microarrays, serial analysis of gene expression (SAGE) allows both qualitative and quantitative evaluation of thousands of genes without any prior information (Velculescu et al., 1995
LongSAGE, a modified version of the conventional SAGE, was developed recently for both gene expression and genome annotation studies (Saha et al., 2002
Improvement in Initial mRNA Quantity and Ditag Formation
The major modifications of the RL-SAGE protocol are presented in Table I and briefly described as follows. The RL-SAGE libraries were constructed using a small amount of mRNA (50 ng), compared with 2 to 5 µg mRNA used in conventional SAGE and LongSAGE protocols. The synthesized cDNA was digested with NlaIII for 2.5 h at 37°C as compared with 1 h in conventional SAGE and LongSAGE. We used PCR primers specific for the rice actin (Act1) gene (McElroy et al., 1990
During the initial optimization stage, over 300 ditag PCR amplifications were performed, pooled, and precipitated according to the instructions in the I-SAGE kit (Invitrogen). The precipitated ditag PCR products were electrophoresed on a 12% (w/v) polyacrylamide gel as reported in conventional SAGE papers and suggested in the I-SAGE kit protocol. Unexpectedly, the ditag (136 bp) and linker (100 bp) bands were not separated clearly enough for gel excision of the ditags (data not shown). However, when ditag PCR products from each reaction were loaded directly on a 12% (w/v) polyacrylamide gel without pooling and precipitation, the ditag and linker bands were separated clearly (Fig. 2A). The increased concentration of acrylamide also made excision of the ditag band easier. Ditags were digested to completion with NlaIII for 3 h, as compared with 1.5 h in conventional SAGE. Digested ditags were initially resolved on a 12% (w/v) polyacrylamide gel as recommended by the I-SAGE kit and conventional SAGE protocols. Again, the linker and ditag bands were not separated well enough for purification. Increasing the acrylamide concentration to 16% (w/v) yielded clear separation of the ditag and linker bands (Fig. 2B). This higher acrylamide concentration might also have helped to decrease linker contamination and increase the cloning efficiency of concatemers in the subsequent ligation step. Purified ditags (40 bp) were further purified using half of the amount of streptavidin beads by vigorous mixing for 30 min without performing any other steps to remove contaminating linkers from the ditags, as recommended by Powell (1998
In the RL-SAGE procedure, ditags (40 bp) with NlaIII CATG overhangs were purified and self-ligated for 3 h to produce longer molecules called "concatemers." Initially, we performed 300 ditag PCR amplifications and obtained a library of only 100 to 200 clones with an average insert size of 300 to 400 bp. The concatemer ligation mixture was then heated for 15 min at 65°C and quickly chilled on ice for 10 min, as recommended by Kenzelmann and Muhlemann (1999 We suspected that most of the concatemers became circular during the concatenation process and, thus, were not clonable in the pZEro-1 vector. To release the circularized concatemers, we partially digested them with NlaIII (37°C, 1 min), purified the digested concatemers on a 6% (w/v) polyacrylamide gel (Fig. 2C), and cloned them into the SphI site of pZEro-1. Interestingly, the partial digestion significantly improved both ligation and transformation efficiency. Subsequently, we scaled down to 20 ditag PCR reactions, as compared with the over 300 ditag PCR reactions indicated in most SAGE publications. We obtained inserts averaging 1.0 kb (approximately 50 tags) from the >0.5-kb concatemer fraction (Fig. 2D) and 400 bp from the 0.3- to 0.5-kb concatemer fraction (data not shown). In total, we obtained 2.5 million tags (50,000 clones) from the >0.5-kb fraction and 2 million tags (100,000 clones) from the 0.3- to 0.5-kb fraction. Therefore, about 4.5 million tags could be captured from the library if all clones were sequenced. We usually sequenced 5,000 to 7,000 individual clones per library because of the high cost of sequencing. Using the RL-SAGE protocol, one can construct two to three libraries simultaneously within a month from only 20 ditag PCR reactions, in comparison with the 2 to 3 months required to construct just one conventional SAGE or LongSAGE library from over 300 ditag PCR reactions.
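The library-size figures above follow from simple arithmetic, sketched below in Python. The clone counts and average insert sizes are the ones quoted in the text; the value of about 20 bp of insert per tag is an assumption inferred from "1.0 kb (approximately 50 tags)", and the function names are illustrative only.

```python
# Back-of-envelope estimate of RL-SAGE library capacity.
# Clone counts and insert sizes are the figures quoted in the text.
# ASSUMPTION: each tag occupies about 20 bp of insert, as implied by
# "1.0 kb (approximately 50 tags)".

BP_PER_TAG = 20


def tags_per_clone(insert_bp: int) -> int:
    """Approximate number of tags carried by one cloned concatemer."""
    return insert_bp // BP_PER_TAG


def library_tags(clones: int, insert_bp: int) -> int:
    """Total tags in a library fraction with a given average insert size."""
    return clones * tags_per_clone(insert_bp)


# >0.5-kb concatemer fraction: 50,000 clones, 1.0-kb average insert
large_fraction = library_tags(clones=50_000, insert_bp=1_000)
# 0.3- to 0.5-kb concatemer fraction: 100,000 clones, 400-bp average insert
small_fraction = library_tags(clones=100_000, insert_bp=400)
total_tags = large_fraction + small_fraction

print(large_fraction, small_fraction, total_tags)
```

Under these assumptions the two fractions give 2.5 million and 2 million tags, which sum to the 4.5 million tags stated for a complete library.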
Three randomly selected RL-SAGE clones, one each from the rice, maize, and blast fungus libraries (Fig. 2, E1-E3), were sequenced and analyzed. From the high-quality sequence of each clone, 40, 32, and 38 tags were extracted from the rice, maize, and blast fungus clones, respectively (Supplemental Table I). Except for one tag from the rice clone (5'-CATGTAACAGCGAGCAGGGCC-3', matched to Ramy1, accession no. AY072712) and one tag from the blast fungus clone (5'-CATGGGATGGCCGGTTGTTAT-3', matched to EST accession no. CA408239), each of which was present in two identical copies, all tags were unique. A BLAST search of GenBank showed that most of the tags matched ESTs, genomic sequences, or both (Supplemental Table I). About 34 of 40 tags from rice and 29 of 38 tags from blast fungus matched ESTs or genomic sequences in GenBank. In contrast, only 19 of 32 tags derived from the maize library matched sequences in the NCBI database because fewer genomic and EST sequences from maize are available. About 26 of 40, 18 of 32, and 16 of 38 of the rice, maize, and blast fungus tags, respectively, matched corresponding ESTs in GenBank (Table II; Supplemental Table I), suggesting that at least 35% to 55% of the RL-SAGE tags from these libraries could represent novel genes that have not been identified in the existing EST collections.
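The share of potentially novel tags can be recomputed from the EST match counts quoted above. The following Python sketch uses only those counts; the dictionary and function names are hypothetical. Because the counts are given as approximate ("about"), the results only roughly reproduce the stated 35% to 55% range, and the blast fungus value comes out slightly higher.

```python
# Fraction of tags in each clone with no EST match, from the counts in the text.
# Each entry is (tags matching an EST, total tags extracted from the clone).
EST_MATCHES = {
    "rice": (26, 40),
    "maize": (18, 32),
    "blast fungus": (16, 38),
}


def unmatched_fraction(matched: int, total: int) -> float:
    """Return the fraction of tags without an EST match."""
    if total <= 0 or not 0 <= matched <= total:
        raise ValueError("counts must satisfy 0 <= matched <= total and total > 0")
    return (total - matched) / total


novel_candidates = {
    organism: unmatched_fraction(matched, total)
    for organism, (matched, total) in EST_MATCHES.items()
}

for organism, fraction in novel_candidates.items():
    print(f"{organism}: {fraction:.1%} of tags without an EST match")
```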
The SAGE transcript profiling method has enhanced the depth of transcriptome analysis 25- to 50-fold and reduced sequencing costs tremendously in comparison with the EST approach. In the last several years, it has been used widely in the biomedical community but has been underutilized in the plant community. Only a few conventional SAGE reports have been published for plants to date (Matsumura et al., 1999
The first change we made was to reduce the initial amount of mRNA required for cDNA synthesis. To overcome the high input requirement for initial RNA, several groups reported an alternative way to solve this problem such as SAGE-Lite (Peters et al., 1999
We found that a large quantity of mRNA used for cDNA synthesis may lead to an incomplete digestion of cDNA with both NlaIII and MmeI, which can generate multiple tags from the same transcript. If this occurs, it is difficult to distinguish these false tags generated by incomplete digestion from transcript variants such as splicing, antisense, etc. (Patankar et al., 2001
We discovered that concatemers became circular during the concatenation process, a problem that has not been reported or addressed previously. The present study resolved this problem by partial digestion of concatemers with NlaIII. The incubation period and the amount of NlaIII were critical factors in the partial digestion. After partial digestion, we obtained RL-SAGE libraries with an average insert (concatemer) size of 1.0 kb (approximately 50 tags), which is equivalent to 70 tags per concatemer in conventional SAGE. Most conventional SAGE publications reported an average of 22 tags per concatemer (Powell, 1998
Another major problem in SAGE library construction is the high percentage of clones with small inserts (<200 bp) or empty clones. In many conventional SAGE publications, a tedious method of colony PCR screening of clones was followed to remove undesirable clones for sequencing. For example, Fujii and Amrein (2002
The RL-SAGE strategy (Figs. 1 and 2) is not only superior to conventional SAGE and LongSAGE, but also has some advantages over a novel transcriptome profiling method called massively parallel signature sequencing (MPSS; Brenner et al., 2000a
At present, RL-SAGE has two significant limitations. One is the high cost of sequencing RL-SAGE clones, which prevents large-scale sequencing of an entire library. For example, sequencing a library of 20,000 clones (about 1 million tags) will cost at least $120,000 (assuming $6 per clone). This limitation could be solved in the near future with improvements in DNA sequencing technology. Hopefully, novel technologies like sequencing by hybridization (Halperin et al., 2003
In summary, we have made several useful modifications that improved the efficiency of PCR amplification, ditag formation, and concatemer cloning. These modifications have greatly accelerated RL-SAGE library construction. Twenty PCR reactions (50 µL) are sufficient to generate 4.5 million transcript tags from each RL-SAGE library. The partial digestion of concatemers has reduced the number of ditag PCR reactions by over 90% as compared with the original protocol. Using this protocol, we generated five libraries from rice, one from maize, and one from blast fungus. Preliminary sequence analysis of three randomly selected clones from these libraries indicated that at least 35% to 55% of SAGE tags are novel. We believe that our RL-SAGE protocol will facilitate plant transcriptome analysis and will also accelerate the discovery of novel genes and the annotation of sequenced plant genomes such as rice and Arabidopsis.
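The sequencing-cost estimate given for the first limitation can be reproduced with the short Python sketch below. The $6 per clone and the 20,000-clone example are taken from the text; the figure of 50 tags per clone is an assumption based on the 1.0-kb average insert reported for the >0.5-kb fraction, and the function names are illustrative.

```python
# Rough sequencing budget for an RL-SAGE library.
COST_PER_CLONE_USD = 6   # per-clone sequencing cost assumed in the text
TAGS_PER_CLONE = 50      # ASSUMPTION: 1.0-kb inserts, about 50 tags each


def sequencing_cost(clones: int) -> int:
    """Total sequencing cost in US dollars for a given number of clones."""
    return clones * COST_PER_CLONE_USD


def tags_recovered(clones: int) -> int:
    """Approximate number of tags obtained from the sequenced clones."""
    return clones * TAGS_PER_CLONE


clones = 20_000
cost = sequencing_cost(clones)
tags = tags_recovered(clones)
cost_per_tag = cost / tags

print(f"{clones} clones: ${cost:,} for about {tags:,} tags (${cost_per_tag:.2f} per tag)")
```

The same functions applied to the 5,000 to 7,000 clones typically sequenced per library give $30,000 to $42,000 under these assumptions.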
Tissue and RNA Isolation
The rice (Oryza sativa) cv Nipponbare, whose genome has been sequenced, was used for RL-SAGE library construction (Goff et al., 2002
Because no detailed protocol has been published for LongSAGE library construction, we adopted procedures from conventional SAGE (Velculescu et al., 1995 Purified ditag cassettes were ligated together (16°C, 3 h) to generate longer molecules (concatemers). In addition, concatemers were partially digested with 10 units of NlaIII (37°C, 1 min), followed by immediate inactivation of the enzyme (75°C, 20 min). Digested concatemers were resolved on a 6% (w/v) polyacrylamide gel, and concatemer fractions ranging from 0.3 to 0.5 kb and over 0.5 kb were purified separately. To avoid DNA damage by UV, marker lanes were cut out from the gel and stained separately with ethidium bromide. The marker lanes were then photographed under UV and aligned to their original positions to check the size of the concatemers in the unstained lanes. The purified concatemers were cloned into the SphI site of the pZEro-1 plasmid (Invitrogen). The ligation mixture was transformed into TOP10F' electrocompetent cells (Invitrogen). Positive transformants were selected by plating on low-salt Luria-Bertani plates supplemented with Zeocin (50 µg mL-1; overnight, 37°C). The average concatemer size was determined by PCR using M13 forward and reverse primers.
The quality of each RL-SAGE library was checked by sequencing randomly selected clones at the Plant and Microbe Genome Facility (Ohio State University, Columbus). The sequence chromatograms were processed with Sequencher 4.1 software (Gene Codes, Ann Arbor, MI). Ditags (40 bp) were extracted from high-quality concatemer sequences. Tags (21 bp) were extracted manually from the ditags, and a database homology search was performed against NCBI EST and genomic DNA sequences.
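The tag extraction described above was done manually. As an illustration only, and not the authors' procedure, the same logic could be sketched in Python as follows. The sketch assumes that ditags in a concatemer read are delimited by the NlaIII site CATG, that each 21-bp tag consists of CATG plus the adjacent 17 bp, and that the second tag of each ditag is read from the opposite strand. The accepted ditag length window and all function names are hypothetical.

```python
# Illustrative sketch of tag extraction from a concatemer read.
ANCHOR = "CATG"                  # NlaIII recognition site
TAG_LEN = 21                     # CATG plus 17 bp
BODY_LEN = TAG_LEN - len(ANCHOR)

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def split_ditags(concatemer: str, min_len: int = 32, max_len: int = 38) -> list:
    """Return the sequences lying between consecutive CATG sites.

    The first and last pieces are treated as flanking sequence and dropped.
    ASSUMPTION: pieces outside min_len..max_len are not valid ditags.
    """
    pieces = concatemer.upper().split(ANCHOR)
    return [p for p in pieces[1:-1] if min_len <= len(p) <= max_len]


def tags_from_ditag(inner: str) -> tuple:
    """Return the two 21-bp tags carried by one ditag (anchors excluded from inner)."""
    left = ANCHOR + inner[:BODY_LEN]
    # CATG is palindromic, so the right-hand tag read from the opposite
    # strand is CATG followed by the reverse complement of the last 17 bp.
    right = ANCHOR + reverse_complement(inner[-BODY_LEN:])
    return left, right


def extract_tags(concatemer: str) -> list:
    """Return all tags found in a concatemer read, in order of appearance."""
    tags = []
    for inner in split_ditags(concatemer):
        tags.extend(tags_from_ditag(inner))
    return tags


# Synthetic one-ditag example assembled from the two tag sequences quoted
# in the Results; the GG/CC flanks stand in for vector sequence.
inner = "TAACAGCGAGCAGGGCC" + reverse_complement("GGATGGCCGGTTGTTAT")
example_read = "GG" + ANCHOR + inner + ANCHOR + "CC"
example_tags = extract_tags(example_read)
print(example_tags)
```

A production pipeline would also need to handle sequencing errors, ambiguous bases, and duplicate ditags, none of which are addressed in this sketch.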
We thank John J. Dunn (Biology Department, Brookhaven National Laboratory, Upton, NY) for his valuable suggestions during the cloning of concatemers. We also thank Kenneth W. Kinzler and Victor E. Velculescu (Howard Hughes Medical Institute and the Sidney Kimmel Comprehensive Cancer Center, Baltimore) for valuable discussions during construction of RL-SAGE libraries. We are thankful to all members of our laboratory for their valuable help and discussion during this work. Critical reading of the manuscript by Rebecca Nelson (Cornell University, Ithaca, NY) is highly appreciated.
Received October 6, 2003; returned for revision October 23, 2003; accepted November 6, 2003.
http://www.plantphysiol.org/cgi/doi/10.1104/pp.103.034496.
1 This work was supported by the National Science Foundation-Plant Genome Research Program (grant no. 115642).
[w] The online version of this article contains Web-only data.
* Corresponding author; e-mail wang.620@osu.edu; fax 614-292-4455.
Adams MD, Kerlavage AR, Fleischmann RD, Fuldner RA, Bult CJ, Lee NH, Kirkness EF, Weinstock KG, Gocayne JD, White O et al. (1995) Initial assessment of human gene diversity and expression patterns based upon 83 million nucleotides of cDNA sequence. Nature 377: 3-174
Aldaz CM (2003) Serial analysis of gene expression (SAGE) in cancer research. In M Ladanyi, WL Gerald, eds, Expression Profiling of Human Tumors: Diagnostic and Research Applications. Humana Press, Totowa, NJ, pp 47-60
Arabidopsis Genome Initiative (2000) Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 408: 796-815
Boon K, Osorio EC, Greenhut SF, Schaefer CF, Shoemaker J, Polyak K, Morin PJ, Buetow KH, Strausberg RL, De Souza SJ et al. (2002) An anatomy of normal and malignant gene expression. Proc Natl Acad Sci USA 99: 11287-11292
Brenner S, Johnson M, Bridgham J, Golda G, Lloyd DH, Johnson D, Luo S, McCurdy S, Foy M, Ewan M et al. (2000a) Gene expression analysis by massively parallel signature sequencing (MPSS) on microbead arrays. Nat Biotechnol 18: 630-634
Brenner S, Williams SR, Vermaas EH, Storck T, Moon K, McCollum C, Mao JI, Luo S, Kirchner JJ, Eletr S et al. (2000b) In vitro cloning of complex mixtures of DNA on microbeads: physical separation of differentially expressed cDNAs. Proc Natl Acad Sci USA 97: 1665-1670
Chen J, Sun M, Lee S, Zhou G, Rowley JD, Wang SM (2002) Identifying novel transcripts and novel genes in the human genome by using novel SAGE tags. Proc Natl Acad Sci USA 99: 12257-12262
Cho Y, Walbot V (2001) Computational methods for gene annotation: the Arabidopsis genome. Curr Opin Biotechnol 12: 126-130
Datson NA, van der Perk-de Jong J, van den Berg MP, de Kloet ER, Vreugdenhil E (1999) MicroSAGE: a modified procedure for serial analysis of gene expression in limited amounts of tissue. Nucleic Acids Res 27: 1300-1307
Duggan DJ, Bittner M, Chen Y, Meltzer P, Trent JM (1999) Expression profiling using cDNA microarrays. Nat Genet 21: 10-14
Fujii S, Amrein H (2002) Genes expressed in the Drosophila head reveal a role for fat cells in sex-specific physiology. EMBO J 21: 5353-5363
Gibbings JG, Cook BP, Dufault MR, Madden SL, Khuri S, Turnbull CJ, Dunwell JM (2003) Global transcript analysis of rice leaf and seed using SAGE technology. Plant Biotechnol J 1: 271-285
Goff SA, Ricke D, Lan TH, Presting G, Wang R, Dunn M, Glazebrook J, Sessions A, Oeller P, Varma H et al. (2002) A draft sequence of the rice genome (Oryza sativa L. ssp. japonica). Science 296: 92-100
Halperin E, Halperin S, Hartman T, Shamir R (2003) Handling long targets and errors in sequencing by hybridization. J Comput Biol 10: 483-497
Jones SJ, Riddle DL, Pouzyrev AT, Velculescu VE, Hillier L, Eddy SR, Stricklin SL, Baillie DL, Waterston R, Marra MA (2002) Changes in gene expression associated with developmental arrest and longevity in Caenorhabditis elegans. Genome Res 11: 1346-1352
Jung SH, Lee JY, Lee DH (2003) Use of SAGE technology to reveal changes in gene expression in Arabidopsis leaves undergoing cold stress. Plant Mol Biol 52: 553-567
Kenzelmann M, Muhlemann K (1999) Substantially enhanced cloning efficiency of SAGE (serial analysis of gene expression) by adding a heating step to the original protocol. Nucleic Acids Res 27: 917-918
Lorenz WW, Dean JF (2002) SAGE Profiling and demonstration of differential gene expression along the axial developmental gradient of lignifying xylem in loblolly pine (Pinus taeda). Tree Physiol 22: 301-310
Matsumura H, Nirasawa S, Terauchi R (1999) Technical advance: transcript profiling in rice (Oryza sativa L.) seedlings using serial analysis of gene expression (SAGE). Plant J 20: 719-726
McElroy D, Zhang W, Cao J, Wu R (1990) Isolation of an efficient actin promoter for use in rice transformation. Plant Cell 2: 163-171
Mitchell TK, Thon MR, Jeong J-S, Brown D, Deng J, Dean RA (2003) The rice blast pathosystem as a case study for the development of new tools and raw materials for genome analysis of fungal plant pathogens. New Phytol 159: 53-61
Neilson L, Andalibi A, Kang D, Coutifaris C, Strauss JF 3rd, Stanton JA, Green DP (2000) Molecular phenotype of the human oocyte by PCR-SAGE. Genomics 63: 13-24
Patankar S, Munasinghe A, Shoaibi A, Cummings LM, Wirth DF (2001) Serial analysis of gene expression in Plasmodium falciparum reveals the global expression profile of erythrocytic stages and the presence of anti-sense transcripts in the malarial parasite. Mol Biol Cell 12: 3114-3125
Peters DG, Kassam AB, Yonas H, O'Hare EH, Ferrell RE, Brufsky AM (1999) Comprehensive transcript analysis in small quantities of mRNA by SAGE-Lite. Nucleic Acids Res 27: e39
Powell J (1998) Enhanced concatemer cloning: a modification to the SAGE (serial analysis of gene expression) technique. Nucleic Acids Res 26: 3445-3446
Saha S, Sparks AB, Rago C, Akmaev V, Wang CJ, Vogelstein B, Kinzler KW (2002) Using the transcriptome to annotate the genome. Nat Biotechnol 20: 508-512
Tomkins JP, Davis G, Main D, Yim Y, Duru N, Musket T, Goicoechea JL, Frisch DA, Coe EH Jr, Wing RA (2002) Construction and characterization of a deep-coverage bacterial artificial chromosome library for maize. Crop Sci 42: 928-933
Velculescu VE, Zhang L, Vogelstein B, Kinzler KW (1995) Serial analysis of gene expression. Science 270: 484-487
Vilain C, Libert F, Venet D, Costagliola S, Vassart GR (2003) Small amplified RNA-SAGE: an alternative approach to study transcriptome from limiting amount of mRNA. Nucleic Acids Res 31: e24
Virlon B, Cheval L, Buhler JM, Billon E, Doucet A, Elalouf JM (1999) Serial microanalysis of renal transcriptomes. Proc Natl Acad Sci USA 96: 15286-15291
Yu J, Hu S, Wang J, Wong GK, Li S, Liu B, Deng Y, Dai L, Zhou Y, Zhang et al. (2002) A draft sequence of the rice genome (Oryza sativa L. ssp. indica). Science 296: 79-92
Yuan Q, Quackenbush J, Sultana R, Pertea M, Salzberg SL, Buell CR (2001) Rice bioinformatics: analysis of rice sequence data and leveraging the data to other plant species. Plant Physiol 125: 1166-1174
Zhang L, Zhou W, Velculescu VE, Kern SE, Hruban RH, Hamilton SR, Vogelstein B, Kinzler KW (1997) Gene expression profiles in normal and cancer cells. Science 276: 1268-1272