Plant Physiol, October 2001, Vol. 127, pp. 386-389
SCIENTIFIC CORRESPONDENCE
Development and Characterization of Genome-Wide Single Nucleotide Polymorphism Markers in the Green Alga Chlamydomonas reinhardtii¹
Valentina S. Vysotskaia,* Damian E. Curtis, Alexander V. Voinov, Pushpa Kathir,² Carolyn D. Silflow, and Paul A. Lefebvre
Exelixis, Inc., 170 Harbor Way, P.O. Box 511, South San Francisco,
California 94083-0511 (V.S.V., D.E.C., A.V.V.); and Departments of
Genetics, Cell Biology and Development and Plant Biology, University of
Minnesota, 1445 Gortner Avenue, St. Paul, Minnesota 55108 (P.K.,
C.D.S., P.A.L.)
INTRODUCTION
Chlamydomonas reinhardtii is a unicellular green alga with a genome estimated at around 100 Mbp (Harris, 1989). Like yeast, C. reinhardtii has well-understood haploid genetics, but unlike yeast it has both a chloroplast and flagella. These properties make it a powerful model system for studying fundamental questions in cellular and molecular biology concerning photosynthesis, flagellar motility, and basal body function (for review, see Lefebvre and Silflow, 1999; Grossman, 2000). The photosynthetic mechanisms for CO2 fixation in C. reinhardtii are very similar to those used in vascular plants. However, unlike higher plants, C. reinhardtii can be grown heterotrophically on acetate as its sole carbon source. As a result, mutants with defective photosynthesis are readily isolated and characterized. Cells grown heterotrophically in the dark retain normal photosynthetic capability and chloroplast development. In addition, C. reinhardtii can be used as a model system to identify the molecular targets of herbicides and metabolic inhibitors (Harris, 1989).
Currently, genomic and genetic resources for C. reinhardtii include around 65,000 expressed sequence tags (ESTs; http://www.kazusa.or.jp/en/plant/chlamy/EST/; http://www.biology.duke.edu/chlamygenome/EST.html), a bacterial artificial chromosome library (Lefebvre et al., unpublished data), and an RFLP map aligned with the genetic map (Silflow, 1998; Kathir et al., unpublished data). The complete genome sequence is not yet available, and construction of a physical map linked to the genetic map is in progress (Lefebvre and Silflow, 1999). The RFLP linkage map was constructed from a mapping population generated by crossing the standard laboratory strain of C. reinhardtii (Smith 137C, isolated in Massachusetts in 1945) with the Minnesota isolate S1D2 (Gross et al., 1988). This map includes 250 markers distributed across all 17 linkage groups (Silflow, 1998). However, genotyping with RFLP markers is labor-intensive and time-consuming. To develop a high-throughput, inexpensive system for genetic mapping, we converted RFLP markers to single nucleotide polymorphism (SNP) markers. SNP markers can be assayed rapidly and easily using a wide variety of techniques, including template-directed dye-terminator incorporation with fluorescence polarization detection (Chen et al., 1999), pyrosequencing (Alderborn et al., 2000), oligonucleotide-specific ligation (Tobe et al., 1996), molecular beacons (Marras et al., 1999), dynamic allele-specific hybridization (Prince et al., 2001), the TaqMan system (Livak, 1999), mass spectrometry (Stoerker et al., 2000), and oligonucleotide arrays (Hirschhorn et al., 2000; Pastinen et al., 2000).
Here, we report the development of a collection of 186 SNP markers in C. reinhardtii. Sequence information and further details concerning these markers are available on the web site of Duke University/the Chlamydomonas Genetics Center (http://www.biology.duke.edu/chlamy/). We also characterized DNA polymorphisms between the mapping strains 137C and S1D2 and evaluated C. reinhardtii EST data as a source of additional SNP markers.
ASSESSMENT OF POLYMORPHISMS BETWEEN C. reinhardtii STRAINS
The 137C and S1D2 strains of C. reinhardtii used to generate the RFLP map are widely used by the C. reinhardtii research community. We evaluated the level of sequence polymorphism between these strains by comparing short stretches of DNA sequence. We used 137C genomic and cDNA sequence information from GenBank to design 34 sequence-tagged sites (STSs), ranging from 300 to 500 bp and representing exons, introns, and 3' untranslated regions (UTRs). Using genomic DNA from both strains as templates, STSs were amplified, sequenced using the PCR primers, and analyzed by BLAST (Altschul et al., 1990) or the phred/phrap/consed package (Ewing et al., 1998; Gordon et al., 1998). For each STS, Table I lists the gene name, accession number, gene region, and STS size, as well as the number of polymorphic sites, single-base changes (SNPs), larger substitutions (affecting more than 1 base), and small (6 bp) deletions/insertions. To ensure high accuracy, sequence variations were confirmed by visual inspection of the traces from both strains. The results shown in Table I indicate that the S1D2 strain of C. reinhardtii is highly polymorphic relative to the 137C strain. In total, we detected 248 polymorphic loci in 11,651 bases examined, an average of one sequence variation per 47 bp. More differences were observed in non-coding regions (introns and 3'UTRs) than in coding regions. However, because most STSs were designed from 3'UTRs and only a few from exons, a statistical comparison of the polymorphism distribution between coding and non-coding regions cannot be performed. The majority of polymorphic loci were single-nucleotide substitutions, with a transition:transversion ratio of roughly 2:1. Only two (6%) of the 34 STSs analyzed showed large deletions/insertions (43 and 140 bp). SNPs were not randomly distributed across the STS loci: the number of SNPs per STS ranged from none at three loci to 33 at the gene encoding gamete lytic enzyme, a zinc metalloprotease that mediates digestion of the cell walls during mating (Kinoshita et al., 1992). Such local variation in polymorphism rate may arise because some loci are inherently more mutable than others; similar locus-to-locus variation has been described in Caenorhabditis elegans (Koch et al., 2000), Mus musculus (Lindblad-Toh et al., 2000), and Drosophila melanogaster (Hoskins et al., 2001). To develop SNP markers from all available RFLP probes, we focused only on single-nucleotide substitutions and preferentially designed STSs from the 3'UTR to avoid possible cross-amplification from closely related genes.
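The polymorphism figures above rest on two simple calculations: the average spacing between variants (bases examined divided by polymorphic loci) and the classification of single-base changes as transitions (purine to purine, or pyrimidine to pyrimidine) or transversions (purine to pyrimidine, or the reverse). A minimal Python sketch of both follows; the function names are hypothetical, and the substitution pairs in the usage line are toy inputs, not data from Table I.

```python
# Purines and pyrimidines: a transition stays within a class,
# a transversion crosses between classes.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_type(ref: str, alt: str) -> str:
    """Classify a single-base substitution as 'transition' or 'transversion'."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt or {ref, alt} - (PURINES | PYRIMIDINES):
        raise ValueError(f"not a valid single-base substitution: {ref}>{alt}")
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def ts_tv_ratio(pairs) -> float:
    """Transitions divided by transversions over (ref, alt) pairs."""
    kinds = [substitution_type(r, a) for r, a in pairs]
    ts = kinds.count("transition")
    tv = kinds.count("transversion")
    return ts / tv if tv else float("inf")


def bases_per_variant(bases_examined: int, variant_loci: int) -> float:
    """Average spacing between sequence variations, in bp."""
    return bases_examined / variant_loci


# Figures from the text: 248 polymorphic loci in 11,651 bp examined.
print(round(bases_per_variant(11_651, 248)))  # 47

# Toy pairs only: two transitions and one transversion.
print(ts_tv_ratio([("A", "G"), ("C", "T"), ("G", "T")]))  # 2.0
```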
DEVELOPING SNP MARKERS FROM RFLP PROBES
The molecular map of the C. reinhardtii genome includes 250 RFLP markers, developed using specific genes and random genomic and cDNA clones as probes. At the time this project was initiated, sequence information for 68 of the 250 RFLP probes was available from GenBank. These sequences represent genomic clones and cDNAs of known genes from C. reinhardtii strain 137C. To identify SNPs from these 68 RFLP markers, we designed short STSs (250 bp on average) from the available sequence. We also sequenced 100 RFLP probes representing 137C random genomic DNA or cDNA clones and designed STSs from the resulting sequence. Approximately 60% of the primer pairs designed from the 137C sequences yielded robust, specific products from the S1D2 strain. Presumably, PCR success was lowered by the high level of polymorphism between C. reinhardtii strains, and we often had to redesign primers to obtain an amplified product from S1D2. Such extensive sequence polymorphism makes a locus-specific amplification approach to SNP discovery difficult: this approach requires synthesizing oligonucleotide primers for each locus from laboratory-strain sequence, and many primers will fail on a polymorphic strain with such a high level of sequence variation. Alternative approaches, such as reduced representation shotgun sequencing (Altshuler et al., 2000) and EST analysis (Picoult-Newberg et al., 1999), require neither prior sequence knowledge nor PCR. However, SNPs developed by these approaches must subsequently be mapped genetically, whereas the RFLP probes have already been mapped.
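A back-of-envelope estimate helps show why primers designed from 137C sequence often fail on S1D2 template. Assuming, purely for illustration, that each base differs between the strains independently at the observed average rate r (about 1/47), and that each primer is L = 20 nt long (a hypothetical length; primer lengths are not stated here), the chance that neither binding site of a primer pair overlaps a polymorphism is (1 - r)^(2L). This is a rough sketch, not a model of PCR success: a single mismatch, especially away from the primer 3' end, often still allows amplification.

```python
def prob_sites_conserved(primer_len: int, rate: float, n_primers: int = 2) -> float:
    """Probability that no primer-binding site of a pair overlaps a polymorphism.

    Assumes each base differs between strains independently with probability
    `rate` (a simplifying assumption, not a measured property of these loci).
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0 and 1")
    return (1.0 - rate) ** (primer_len * n_primers)


# Hypothetical 20-nt primers; rate from the ~1 variant per 47 bp average.
print(f"{prob_sites_conserved(primer_len=20, rate=1 / 47):.2f}")  # 0.42
```

Under these assumptions, more than half of primer pairs would overlap at least one strain difference, which is consistent in direction with the frequent need to redesign primers for S1D2.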
We initially used the phred/phrap/consed package (Ewing et al., 1998; Gordon et al., 1998) to identify sequence variations, but in many cases phrap could not assemble the 137C and S1D2 reads from the same locus because of substantial nucleotide sequence differences between the strains. To solve this problem, we replaced phrap with ClustalW (Thompson et al., 1994). A multiple alignment of the 137C and S1D2 reads was scanned by a custom script for sequence variations, and a fuzzy set theory approach (Zadeh, 1975) was used to decide whether each variation represented an SNP or a sequencing error. Potential SNPs were confirmed by visual inspection of the traces from both strains. A total of 156 RFLP markers were converted to SNP markers, giving an average marker density of one SNP marker per 500 kb.
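The custom script and its fuzzy rules are not described here, so the following is only a minimal sketch of how such a scan could work. It assumes each alignment row carries per-column phred qualities, and its membership functions (a linear ramp from Q10 to Q30 for "high quality", the within-strain agreement fraction for "consistent"), the use of the minimum as fuzzy AND, and the 0.5 cutoff are all hypothetical choices. Columns where one strain has only gaps are skipped, matching the focus on single-nucleotide substitutions.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterator, List, Optional, Tuple

STRAINS = ("137C", "S1D2")


@dataclass
class AlignedRead:
    strain: str       # "137C" or "S1D2"
    seq: str          # gapped row of the multiple alignment ('-' = gap)
    quals: List[int]  # phred quality per alignment column (ignored at gaps)


def ramp(x: float, lo: float, hi: float) -> float:
    """Linear fuzzy membership: 0 at or below lo, 1 at or above hi."""
    if x <= lo:
        return 0.0
    if x >= hi:
        return 1.0
    return (x - lo) / (hi - lo)


def strain_call(reads: List[AlignedRead], col: int) -> Optional[Tuple[str, float, float]]:
    """Majority base, fraction of reads agreeing, mean quality of agreeing reads."""
    calls = [(r.seq[col], r.quals[col]) for r in reads if r.seq[col] != "-"]
    if not calls:
        return None
    base, n = Counter(b for b, _ in calls).most_common(1)[0]
    mean_q = sum(q for b, q in calls if b == base) / n
    return base, n / len(calls), mean_q


def scan_for_snps(reads: List[AlignedRead], q_lo: float = 10.0, q_hi: float = 30.0,
                  threshold: float = 0.5) -> Iterator[Tuple[int, str, str, float]]:
    """Yield (column, 137C base, S1D2 base, fuzzy score) for candidate SNPs."""
    if not reads:
        return
    by_strain = {s: [r for r in reads if r.strain == s] for s in STRAINS}
    for col in range(len(reads[0].seq)):
        calls = [strain_call(by_strain[s], col) for s in STRAINS]
        if calls[0] is None or calls[1] is None:
            continue  # a strain has only gaps here: indel or no coverage
        (b1, agree1, q1), (b2, agree2, q2) = calls
        if b1 == b2:
            continue  # no base difference between strains in this column
        # Fuzzy AND (minimum) of "high quality" and "consistent within strain".
        score = min(ramp(q1, q_lo, q_hi), ramp(q2, q_lo, q_hi), agree1, agree2)
        if score >= threshold:
            yield col, b1, b2, round(score, 2)


reads = [
    AlignedRead("137C", "ACGTAC", [40] * 6),
    AlignedRead("137C", "ACGTAC", [35] * 6),
    AlignedRead("S1D2", "ACGCAC", [38] * 6),
    AlignedRead("S1D2", "ACGCA-", [12] * 6),
]
print(list(scan_for_snps(reads)))  # [(3, 'T', 'C', 0.75)]
```

Taking the minimum is Zadeh's standard fuzzy intersection, so a single weak criterion in either strain lowers the score; candidates that pass would still go to trace inspection, as described above.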
SNP IDENTIFICATION AND VALIDATION FROM PUBLIC EST DATA
To increase the density of SNP markers, we scanned publicly available EST data for sequence variations. At the time this project was initiated, the C. reinhardtii EST database contained 21,971 reads from strain 137C and 1,550 reads from strain S1D2. Clustering the EST reads yielded 539 contigs containing at least one read from each strain, and 170 of these contained more than one S1D2 read. We focused SNP identification on these 539 contigs and identified approximately 200 with potential SNPs. Because traces were not available, we could not distinguish true SNPs from false positives caused by sequencing errors. To assess the accuracy of SNP discovery without read quality information, we randomly selected 48 SNP-containing contigs for experimental confirmation. Regions surrounding the putative SNPs were PCR-amplified from 137C and S1D2 genomic DNA and resequenced. Of the 48 primer pairs, 35 (73%) yielded PCR products, which were evaluated by direct sequencing. Because the primers were designed from EST sequences, introns or sequencing errors may have prevented the remaining 13 primer pairs from producing a product. Of the 35 successful PCR products, 30 (86%) contained SNPs at the predicted positions, and in many cases more than one SNP per PCR product was detected. These 30 confirmed SNPs correspond to an overall yield of 62% (30 of the 48 contigs tested). These results indicate that ESTs currently available in GenBank could provide more than 125 additional SNP markers that can be mapped genetically. Information on the 30 EST markers is available at http://www.biology.duke.edu/chlamy/. We conclude that the growing EST database for C. reinhardtii will be very useful for identifying new SNPs.
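The validation figures above chain together as stage-wise rates: PCR success (35 of 48), confirmation among amplified products (30 of 35), and overall yield (30 of 48). A minimal sketch of that arithmetic follows; the function name is hypothetical, and the projection assumes, as the text implicitly does, that the roughly 200 candidate contigs validate at the same rate as the random sample of 48.

```python
def est_snp_yield(tested: int, amplified: int, confirmed: int, candidates: int) -> dict:
    """Stage-wise validation rates and a projected count of confirmed SNP markers."""
    overall = confirmed / tested  # confirmed SNP markers per contig tested
    return {
        "pcr_success": amplified / tested,      # ~0.73
        "confirmation": confirmed / amplified,  # ~0.86
        "overall": overall,                     # ~0.62
        "projected_snps": candidates * overall,
    }


# Counts from the text; "candidates" is the approximate figure of ~200 contigs.
r = est_snp_yield(tested=48, amplified=35, confirmed=30, candidates=200)
print(f"{r['projected_snps']:.0f}")  # 125

# Adding the projection to the 156 RFLP-derived markers.
print(f"{(156 + r['projected_snps']) / 156:.1f}")  # 1.8
```

The roughly 1.8-fold figure is the arithmetic behind "nearly doubled" in the concluding remarks, under the same equal-yield assumption.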
CONCLUDING REMARKS
Genome-wide SNP markers are now being developed in model organisms such as M. musculus (Lindblad-Toh et al., 2000), C. elegans (Koch et al., 2000), Arabidopsis (Cho et al., 1999; http://www.Arabidopsis.org/Cereon/index.html), and D. melanogaster (Hoskins et al., 2001). SNPs are an important tool for modern genetic analysis in any organism and can significantly increase the efficiency of map-based cloning of genes of interest. In this study, 156 genome-wide SNP markers were developed in C. reinhardtii by analyzing RFLP markers with known map positions, an approach that automatically provides map positions for the identified SNPs. This collection will be of immediate value to the C. reinhardtii research community and is an important first step toward a denser map. Increasing SNP density 2- to 3-fold would provide dense coverage throughout the genome and fill existing gaps in the map. To develop additional SNP markers, we evaluated publicly available EST data as a potential source for SNP discovery. Based on our results, the current set of SNP markers could be nearly doubled with minimal effort, and sequencing more S1D2 ESTs would identify further SNPs. The C. reinhardtii community has shown strong enthusiasm for sequencing the entire genome; once a genome sequence is available, mapping newly discovered SNPs should be straightforward.
ACKNOWLEDGMENTS
We would like to thank the Exelixis sequencing group and Plant
Genetics group for their support. We are also grateful to Drs. John
Davies, Andreas Gnirke, Karin Schmitt, and Nancy Federspiel for
helpful comments on this manuscript.
FOOTNOTES
Received May 31, 2001; accepted June 15, 2001.
¹ This work was supported by the National Institutes of Health (grant no. GM34437 to P.A.L.) and the National Science Foundation (grant no. NSF/MCB-9975765 to P.A.L. and C.D.S.).
² Present address: 10704 Dundas Oak Court, Burke, VA 22015.
* Corresponding author; e-mail vvs@exelixis.com; fax 650-837-8204.
www.plantphysiol.org/cgi/doi/10.1104/pp.010485.
LITERATURE CITED
- Alderborn A, Kristofferson A, Hammerling U (2000) Genome Res 10: 1249-1258
- Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990) J Mol Biol 215: 403-410
- Altshuler D, Pollara VJ, Cowles CR, Van Etten WJ, Baldwin J, Linton L, Lander ES (2000) Nature 407: 513-516
- Chen X, Levine L, Kwok P-Y (1999) Genome Res 9: 492-498
- Cho RJ, Mindrinos M, Richards DR, Sapolsky RJ, Anderson M, Drenkard E, Dewdney J, Reuber TL, Stammers M, Federspiel N (1999) Nat Genet 23: 203-207
- Ewing B, Hillier L, Wendl MC, Green P (1998) Genome Res 8: 175-185
- Gordon D, Abajian C, Green P (1998) Genome Res 8: 195-202
- Gross CH, Ranum LP, Lefebvre PA (1988) Curr Genet 13: 503-508
- Grossman A (2000) Curr Opin Plant Biol 3: 132-137
- Harris E (1989) The Chlamydomonas Sourcebook. Academic Press, New York
- Hirschhorn JN, Sklar P, Lindblad-Toh K, Lim YM, Ruiz-Gutierrez M, Bolk S, Langhorst B, Schaffner S, Winchester E, Lander ES (2000) Proc Natl Acad Sci USA 97: 12164-12169
- Hoskins RA, Phan AC, Naeemuddin M, Mapa FA, Ruddy DA, Ryan JJ, Young LM, Wells T, Kopczynski C, Ellis MC (2001) Genome Res 11: 1100-1113
- Kinoshita T, Fukuzawa H, Shimada T, Saito T, Matsuda Y (1992) Proc Natl Acad Sci USA 89: 4693-4697
- Koch R, van Luenen HG, van Der Horst M, Thijssen KL, Plasterk RH (2000) Genome Res 10: 1690-1696
- Lefebvre P, Silflow C (1999) Genetics 151: 9-14
- Lindblad-Toh K, Winchester E, Daly MJ, Wang DG, Hirschhorn JN, Laviolette JP, Ardlie K, Reich DE, Robinson E, Sklar P (2000) Nat Genet 24: 381-386
- Livak KJ (1999) Genet Anal 14: 143-149
- Marras SA, Kramer FR, Tyagi S (1999) Genet Anal 14: 151-156
- Pastinen T, Raitio M, Lindroos K, Tainola P, Peltonen L, Syvanen AC (2000) Genome Res 10: 1031-1042
- Picoult-Newberg L, Ideker TE, Pohl MG, Taylor SL, Donaldson MA, Nickerson DA, Boyce-Jacino M (1999) Genome Res 9: 167-174
- Prince JA, Feuk L, Howell WM, Jobs M, Emahazion T, Blennow K, Brookes AJ (2001) Genome Res 11: 152-162
- Silflow CD (1998) Organization of the nuclear genome. In J-D Rochaix, M Goldschmidt-Clermont, S Merchant, eds, The Molecular Biology of Chloroplasts and Mitochondria in Chlamydomonas. Kluwer Academic Publishers, Dordrecht, The Netherlands, pp 25-40
- Stoerker J, Mayo JD, Tetzlaff CN, Sarracino DA, Schwope I, Richert C (2000) Nat Biotechnol 18: 1213-1216
- Thompson JD, Higgins DG, Gibson TJ (1994) Nucleic Acids Res 22: 4673-4680
- Tobe VO, Taylor SL, Nickerson DA (1996) Nucleic Acids Res 24: 3728-3732
- Zadeh L (1975) Info Sci 8: 199-249
© 2001 American Society of Plant Physiologists