Plant Physiology 136:3009-3022 (2004)
© 2004 American Society of Plant Biologists
Transcriptional Similarities, Dissimilarities, and Conservation of cis-Elements in Duplicated Genes of Arabidopsis1,[w]
Munich Information Center for Protein Sequences, Institute for Bioinformatics, GSF Research Center for Environment and Health, 85758 Neuherberg, Germany (G.H., T.H., K.M.); and Department of Plant and Soil Sciences, Delaware Biotechnology Institute, Newark, Delaware 19711 (B.C.M.)
In plants, duplication of individual genes, long chromosomal regions, and complete genomes provides a major source for evolutionary innovation. We investigated two different types of duplications, tandem and segmental duplications, in Arabidopsis for correlation, conservation, and differences of expression characteristics by making use of large genome-wide expression data as measured by the massively parallel signature sequencing method. Our analysis indicates that large fractions of duplicated gene pairs still share transcriptional characteristics. However, our results also indicate that expression divergence occurs frequently between duplicated gene pairs, a process which frequently might be employed for the retention of sequence redundant gene pairs. Preserved overall similarity between promoters of duplicated genes as well as preservation of individual cis-elements within the respective promoters indicates that the process of transcriptional neo- and subfunctionalization is restricted to only a fraction of cis-elements. We show that sequence similarities and shared regulatory properties within duplicated promoters provide a powerful means to undertake large-scale cis-regulatory element identification by applying an intragenomic phylogenetic footprinting approach. Our work lays a foundation for future comparative studies to elucidate the molecular manifestation of regulatory similarities and dissimilarities of duplicated genes.
Plant genomes are rich in duplicated genes. Various mechanisms can lead to duplication of individual genes or longer chromosomal regions. In addition to the duplication of individual chromosomal regions, polyploidization, and subsequent reorganization of the genome, tandem duplication and the generation of dispersed, duplicated genes account for the generation of sequence redundant copies. Within Arabidopsis, the evolutionary history of the modern genome structure has been exhaustively analyzed. Large fractions of the genome are derived from ancient polyploidization events, and as much as 17% of the genome reportedly is composed of tandemly repeated genes (The Arabidopsis Genome Initiative, 2000
Gene duplications are regarded as a major source for the generation of evolutionary novelties (Ohno, 1970
Effects and consequences of mutations within the coding portion of genes have been exhaustively studied (Lynch and Conery, 2000
The retention rate of duplicates within many eukaryotic genomes is significantly higher than expected from predictions by classical models (Prince and Pickett, 2002
Mutable subfunctions of a duplicated pair potentially affect various sites governing the functional characteristics of the respective gene. Besides protein domains, splice sites or cis-regulatory elements can be affected. Evolution of transcriptional regulation as a crucial contributor to evolutionary changes and speciation has long been hypothesized (Britten and Davidson, 1969
In contrast to exhaustive studies on the divergence of duplicated genes and their encoded proteins, little is known about how changes in gene expression affect the evolutionary fates of duplicated pairs (Wray et al., 2003
At least partial redundancy has been reported for several double or triple mutants of homologous genes, like the floral meristem identity genes AP1/CAULIFLOWER/FRUITFUL and the SHATTERPROOF genes which regulate fruit dehiscence (Ferrandiz et al., 2000
Using microarray expression data, studies in yeast have estimated the extent to which changes in gene expression contribute to the evolutionary fates of duplications (Wagner, 2000
In higher eukaryotes (including plants), expression divergence of duplicated genes has thus far been studied only in selected examples, and genome-scale analyses have not been reported to date. In this study, we investigate the correlation of expression of duplicated gene pairs in Arabidopsis, with an emphasis on the transcriptional characteristics of tandem and segmental duplications. We analyzed the transcriptional fate of duplicated genes by making use of genome-wide expression data derived by massively parallel signature sequencing (MPSS; Brenner et al., 2000b
Identification of Tandemly and Segmentally Duplicated Genes in Arabidopsis
To analyze similarities and dissimilarities in transcriptional characteristics of duplicated genes in Arabidopsis, we selected appropriate duplicated gene groups from the Arabidopsis genome (see "Materials and Methods"). These groups were either within segmental duplications stemming from an ancient polyploidization event that took place approximately 38 to 70 million years ago (Simillion et al., 2002
We selected 2,399 groups of genes comprising 7,425 genes from previously identified segmentally duplicated regions (The Arabidopsis Genome Initiative, 2000
To gain insight into the evolutionary relationship and the duplication age of the selected duplicated genes, the synonymous substitution rate (KS) was calculated for the corresponding gene pairs (Fig. 1). For the selected segmentally duplicated genes, the frequency distribution showed a clear peak for KS values of 0.7 to 0.8 as well as a Gaussian distribution. This confirmed that our filters successfully identified duplicated gene pairs originating from the most recent polyploidization event during the evolution of Arabidopsis (Vision et al., 2000
For tandemly duplicated genes, we computed KS for all pairwise combinations of each group. For groups containing more than two members, this might impose a bias toward an overrepresentation of older pairs. To address this problem, we separately analyzed the distribution of tandem groups consisting of two genes. However, there was no striking difference between the distributions of KS between these datasets (data not shown). In both analyses, the average age of duplicates is significantly lower than that of segmentally duplicated genes. We found a pronounced peak for KS values between 0.3 and 0.4. Thus, the average age of tandem duplications is approximately one-half the age of the segmental duplications.
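The "approximately one-half" estimate can be checked with a line of arithmetic. The sketch below takes the midpoints of the two reported KS peaks and assumes that KS accumulates roughly linearly with time at the same rate in both gene classes; this molecular-clock assumption and the function name `relative_age` are illustrative and not part of the original analysis.

```python
# Midpoints of the KS peaks reported in the text.
tandem_peak = (0.3 + 0.4) / 2      # tandem duplicates
segmental_peak = (0.7 + 0.8) / 2   # segmental duplicates


def relative_age(ks_a, ks_b):
    """Ratio of two divergence times.

    Assumption (not from the source): KS grows linearly with time
    at the same rate for both gene pairs.
    """
    if ks_b <= 0:
        raise ValueError("ks_b must be positive")
    return ks_a / ks_b


ratio = relative_age(tandem_peak, segmental_peak)
print(f"tandem / segmental age ratio: {ratio:.2f}")
```

Under these assumptions the ratio comes out a little below 0.5, in line with the statement above.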
For expression measurements, we made use of data generated by MPSS (Brenner et al., 2000a
The positions of all MPSS tags matching within the Arabidopsis genome were determined, and these tags were associated with annotated genes from the MAtDB (Schoof et al., 2004
For each pair of tandemly and segmentally duplicated genes with diagnostic and informative tags, the Pearson correlation r of their expression was computed. Application of our quality criteria restricted the analysis to 849 pairwise comparisons for tandemly duplicated and 777 pairwise comparisons for the segmentally duplicated genes. Negative control experiments consisted of gene pairs obtained by randomly shuffling sets of evolutionarily unrelated genes containing appropriate tags. Ten random shuffles were generated, each equal in size to the duplicated datasets. These sets represent the background level of correlated expression expected to be observed by chance. The distribution of the two duplicate classes is significantly different from the random shuffles as tested by the
Tandemly and Segmentally Duplicated Genes Exhibit Significant Similarities within Their Promoters
Gene duplication most likely is not restricted to the coding or transcribed portion of genes but also comprises the respective promoters. Our observations on the significant transcriptional similarities between large portions of duplicated gene pairs prompted us to undertake a systematic analysis of promoters associated with duplicated genes.
Therefore, we determined the average similarity of promoters associated with duplicated genes. This analysis was performed by aligning the promoter sequences and comparing these alignments to the expected background, obtained by the analysis of a negative control set. To avoid alignments between 5' untranslated regions (UTRs), which frequently exhibit a high degree of similarity for duplications (G. Haberer and K.F.X. Mayer, unpublished data), we restricted our analysis to promoter pairs for which both genes have associated full-length cDNA information. Alignments of promoter sets of variable lengths were computed using DiAlign2-1 (Morgenstern, 1999
Promoter and Protein Divergence Correlate within Tandem Duplications
To test whether promoter evolution is coupled with coding sequence divergence, we correlated the promoter similarities and the KS. To exclude potential pseudogenes and 5' UTRs from our analysis, we restricted the dataset to pairs of genes with associated full-length cDNA information. We tested for potential correlations between the age of a duplicated pair measured as KS and the divergence of the respective promoters as defined by the promoter similarities (Fig. 4). As segmental duplications likely arose from an ancient polyploidization event, a time-dependent correlation cannot be expected. This is confirmed by our findings of only a weak positive correlation of low significance (r = 0.095;
In contrast to segmental duplications, divergence times of tandemly repeated genes are distributed over a larger time range. Reflective of this, we found a highly significant, strong negative correlation between protein and promoter divergence (r = -0.45, P < 10^-6) for tandemly repeated gene pairs. These findings indicate that functional selection constraints on coding sequences are more pronounced than on promoter regions and that on average more mutations within coding sequences are eliminated by negative selection than is the case for promoter regions. This is consistent with the specific structural features of cis-elements and promoters as a whole, e.g. the degenerate nature of cis-elements and the small fraction of nucleotides constituting cis-elements within the entire promoter region.
As shown above, promoters associated with duplicated genes still share significant similarity. In addition, Pearson correlation of tandemly and segmentally duplicated genes revealed gene pairs that showed pronounced similarities in their expression characteristics. To test whether promoter relationships (as measured by the extent of alignable regions) correlate with expression characteristics, we plotted promoter similarity versus the Pearson coefficient r for the respective duplicated gene pairs (Fig. 5). Correlation coefficients are r = 0.17 (P < 0.01) for tandem and r = 0.12 for segmental genes (P < 0.05; Fig. 5). While the probability for segmentally duplicated genes implies only a nonsignificant correlation, we find a weak, yet significant positive correlation for tandemly duplicated pairs. This indicates that for tandemly duplicated genes, similarities in expression characteristics correlate with the similarities detected within the promoter regions, an observation which is not supported for segmentally duplicated genes. This might be attributable to the on average shorter evolutionary distance and point toward an ongoing degeneration process within promoters of tandemly duplicated genes.
Divergence Time and Expression Characteristics Are Not Correlated in Tandemly and Segmentally Duplicated Genes
Similarity within promoters of tandemly repeated genes was found to be correlated with both divergence time and expression similarity. This suggests a continuous divergence of expression characteristics between duplicated genes. To verify this, we plotted KS against the Pearson correlation (Fig. 6). For both datasets, pronounced scattering was observed, and no significant correlation was found for either tandemly (r = 0.09, P < 0.05) or segmentally (r = 0.05, P < 0.2) duplicated pairs. Thus, conclusions on coherent expression characteristics between duplicated gene pairs cannot be derived from the divergence time of the duplicated genes.
cis-Regulatory Element Detection by Intragenomic Footprinting
cis-Regulatory elements are the major constituents driving gene expression. Although not restricted to these regions, these elements are predominantly located within the 5' upstream sequence, and, consequently, analytical approaches predominantly focus on surveying these regions.
Our analysis demonstrates that duplicated promoters show both significant similarity and significant expression correlation between duplicated genes. To analyze this in more detail, we undertook several case studies, selecting duplicated pairs for which experimentally verified cis-regulatory elements have been reported. We carried out a phylogenetic footprinting analysis for the respective gene pairs. The selected promoters were subjected to an analysis by (1) detecting conserved (e.g. alignable) regions and (2) searching for statistically overrepresented sequence elements by a Gibbs sampling-based method to identify potential cis-regulatory elements (Thijs et al., 2001
We next asked the question of whether the Pearson correlation of transcriptional characteristics influences the performance of the intragenomic phylogenetic footprinting analysis. We selected duplicated gene pairs with high (KIN1 and COR6.6; r = 0.96), moderate (COR15a and COR15b; r = 0.56), and low (SCARECROW-like; r = 0.48) transcriptional correlation and subjected the respective promoters to a phylogenetic footprinting analysis (Fig. 8). Again, duplicated pairs were organized either in tandem arrays (KIN1 and COR6.6; COR15a and COR15b) or within segmental duplications (SCARECROW-like). KIN1 and COR6.6 are both up-regulated by cold treatment and abscisic acid (ABA) application. Promoters of both genes harbor sites that match consensus sequences of known cis-regulatory elements conferring ABA- and cold-responsiveness (CRT; Baker et al., 1994
Duplication of individual genes, large genomic segments, or complete genomes provides a mechanism to introduce functional innovation and novelty into genomes. However, fixation of duplicated loci within the respective genome requires sub- or neofunctionalization of the respective loci. Besides evolution of the protein sequence and change of the biochemical properties of the protein, changes in the transcriptional regulation of duplicated genes have long been hypothesized to play an important role for the fixation of duplicated genes (Britten and Davidson, 1969
Most duplicated gene pairs located within segmental duplications originated from a duplication event at a defined time point during the evolution of the Arabidopsis genome. Consistent with findings by other research groups, the age distribution within the segmentally duplicated genes shows a Gaussian distribution with a pronounced peak at KS in the range of 0.7 to 0.8 (Vision et al., 2000
The tandemly repeated genes we analyzed span a broad range of divergence times. We made use of MPSS data generated from five different libraries representing different plant tissues. The MPSS method measures relative expression values on a genome scale (Brenner et al., 2000a
The analysis of MPSS expression data shows that a significant portion of duplicated genes retained highly similar expression characteristics. The correlations differ substantially from those of a negative control set: 26.3% (20.1%) of the tandem repeats and 19.4% (12.4%) of genes located within segmental duplications revealed a Pearson correlation of above 0.8 (above 0.9). It is noteworthy that the proportion of duplicated genes with highly unrelated expression, i.e. negative Pearson coefficients (close to -1.0), is significantly lower than expected from a random distribution (Fig. 2), indicating that highly dissimilar expression patterns are underrepresented between duplicated genes. These observations suggest a significant conservation of expression characteristics between duplicated pairs. However, our findings also indicate that the majority of duplicated genes have already experienced a significant divergence in their expression characteristics. This is consistent with expectations implied by the DDC model (Force et al., 1999
Our study was carried out using MPSS data derived from particular tissues. These data describe only a few dimensions of regulatory complexity; for example, expression differences at the cellular level or in response to internal and external stimuli are not captured within the dataset. Thus, any subfunctionalization caused by such cues would necessarily escape detection.
The DDC model proposes partitioning of ancient subfunctions as a major mechanism to retain duplicated genes in a genome. Although not limited to the subfunctionalization of regulatory regions, due to their short sequences and modular organization, cis-regulatory elements are particularly well-suited candidates for mechanisms predicted by the DDC model. Thus, expression divergence potentially is a major constituent driving a degeneration and complementation process. Repeated rounds of this process potentially generate increasingly divergent expression patterns, and these would ultimately lead to neofunctionalization on a transcriptional level. Consequently, sub-, neo-, and nonfunctionalization of duplicated gene pairs are time-dependent processes. The average divergence time or evolutionary distance among tandemly repeated genes is lower compared to duplicate gene pairs residing within segmental duplications. Supportive of a time-dependent increase in expression divergence, the proportion of highly correlated pairs is significantly larger in tandemly repeated genes in comparison to segmental duplications. However, correlation analysis between divergence time KS and expression similarity measured by the Pearson coefficient revealed no correlation for duplicated genes at a significance level
Several observations suggest continuous degeneration within regulatory regions and are consistent with the DDC model. Repeated cycles of degeneration and complementation should lead to an increasing divergence within regulatory regions and consequently to an increase in expression divergence. On the other hand, recent duplicate promoter pairs will still share significant similarity within their regulatory regions. Both aspects are reflected in our analysis results. We analyzed promoters from duplicated genes with respect to the degree of similarity that is still retained. We found significantly higher similarities between promoters from both tandemly and segmentally duplicated gene pairs in comparison to random expectations. This supports the hypothesis that duplicated genes share common regions within their regulatory sequences. In addition, promoter similarities between tandemly duplicated genes showed a highly significant negative correlation against the divergence time of the proteins. This suggests a continuous degeneration within regulatory regions. A very weak positive correlation has been detected between promoter similarities of tandemly duplicated genes and their expression characteristics. These findings are not unexpected as many genes within the dataset exhibit divergence times above KS
In Arabidopsis, partial redundancy and overlapping functionality for duplicated genes (e.g. SHATTERPROOF 1 and 2, AP1/CAULIFLOWER, PHOT1 and 2) have been reported (Ferrandiz et al., 2000
We subjected duplicated gene pairs to a phylogenetic footprinting analysis. Both colinearity and heuristic criteria were used for their detection (Morgenstern, 1999
To test the applicability of the approach to duplicated gene pairs with similar transcriptional characteristics (i.e. high Pearson correlation) and duplicated gene pairs with dissimilar transcriptional characteristics (low Pearson coefficient), we selected duplicated gene pairs over a range of Pearson correlation coefficients. For all examples, clear and pronounced signals that depict potential conserved cis-elements were found. For these examples, individual cis-elements have, at least in part, already been described. The respective elements proved to be conserved, along with numerous additional conserved cis-elements. High-scoring candidate cis-elements are not restricted to housekeeping genes like histone H4 or Rubisco but are also apparent in duplicated genes responding to specific stimuli like cold-inducible COR15a/b and KIN1/COR6.6.
Phylogenetic footprinting of orthologous genes has been proven to be a powerful tool for the detection of cis-regulatory elements (Wasserman et al., 2000
Computational Methods
Computations were performed on an IBM workstation with a 2.4 GHz Pentium 4 processor running a Linux operating system. Python scripts (http://www.python.org) were used for all analyses and can be obtained on request. Images for the motif detection were generated with the Python Imaging Library, the remaining figures with the Python module PyChart.
Genome annotation data stored within the Munich Information Center for Protein Sequence (MIPS) Arabidopsis database (MAtDB) were used for the analysis (Schoof et al., 2004
The synonymous (KS) and nonsynonymous substitution (KN) rates for the selected tandem and segmental pairs were computed using the PAML program package (Yang, 1997
We used MPSS data from five different untreated organs or tissues, including untreated silique, callus, leaves, roots, and inflorescence. All plant material was from Arabidopsis, ecotype Col-0. Callus was initiated from seeds grown on media containing 0.5x Murashige and Skoog salts, 3% Suc in presence of 2,4-dichlorophenoxyacetic acid (0.5 mg/L), indole-3-acetic acid (2 mg/L), and kinetin (0.1 mg/L). Floral buds (up to Stage 11/12) were harvested from plants grown under 16 h of light for 5 weeks. Developing siliques (Stage 16/17) were harvested from plants grown under conditions identical to the floral library. For the leaf and root libraries, plants were grown in 16 h of light for 21 d under sterile conditions in vermiculite and perlite. For each library, total RNA was isolated using TRIzol (Invitrogen, Carlsbad, CA). For tissues derived from whole plants, samples were taken approximately 2 h after dark.
MPSS was performed as described by Brenner et al. (2000b)
MPSS tags were anchored to the Arabidopsis genome and mapped to the transcribed part of the genome. In addition to the coding region, 5' and 3' UTRs were included into the assignment. Based on full-length cDNA information for 46.6% (12,314 out of 26,444) of all annotated genes, a 3' UTR was determined. For genes with no or only incomplete cDNA information, a default length of 200 bp has been used. For genes lacking cDNA information for the 5' UTR, the default length used was 100 bp. Tags matching more than one gene were treated as ambiguous, while uniquely occurring tags were regarded as diagnostic. According to technical aspects of the MPSS method (Brenner et al., 2000b
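To determine the similarities and dissimilarities between expression characteristics of duplicated genes, only pairs for which both members had an informative, diagnostic tag were considered. The correlation of expression between the respective gene pairs was determined using Pearson's coefficient r:
$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}$$
Here the standard definition is given, with x_i and y_i denoting the expression values of the two genes in library i, \bar{x} and \bar{y} their means, and n the number of libraries.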
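As an illustration of this step, the following minimal sketch computes Pearson's r for two expression profiles measured across the same libraries. It uses the standard textbook definition; the function name `pearson_r` and the example abundances are hypothetical and are not taken from the authors' scripts or data.

```python
import math


def pearson_r(x, y):
    """Pearson correlation of two equally long expression profiles.

    Returns None when either profile is constant, because r is
    undefined if one of the variances is zero.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)


# Hypothetical tag abundances in five libraries
# (e.g. silique, callus, leaf, root, inflorescence).
gene_a = [10, 0, 25, 5, 40]
gene_b = [20, 0, 50, 10, 80]   # same pattern at twice the level
print(pearson_r(gene_a, gene_b))
```

Because r is insensitive to a uniform scaling of one profile, two duplicates expressed in the same tissues at different absolute levels still score as highly correlated.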
To compare the results against the background expectation, we analyzed negative control sets consisting of pairwise comparisons of randomly selected expression profiles from all Arabidopsis genes associated with an informative tag. Random shuffles with gene pairs overlapping with the datasets of tandemly and segmentally duplicated genes were excluded. Expression correlations were analyzed in analogous manner for 10 different random sets of equal size to the datasets of tandemly and segmentally duplicated genes.
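The construction of such negative control sets might look like the sketch below. It draws random gene pairs from a list of genes with informative tags and rejects any pair that also occurs among the duplicated pairs. The function name, its parameters, the fixed seed, and the gene identifiers are assumptions made for illustration; any further filtering for evolutionary relatedness used in the original analysis is not reproduced here.

```python
import random


def random_control_sets(genes, duplicated_pairs, set_size, n_sets=10, seed=0):
    """Draw n_sets control sets, each holding set_size random gene pairs.

    Pairs that also occur in duplicated_pairs are rejected. Within one
    control set every pair is unique.
    """
    excluded = {frozenset(p) for p in duplicated_pairs}
    # Conservative guard against an endless loop when too few pairs exist.
    max_pairs = len(genes) * (len(genes) - 1) // 2 - len(excluded)
    if set_size > max_pairs:
        raise ValueError("not enough gene pairs for the requested set size")
    rng = random.Random(seed)
    control_sets = []
    for _ in range(n_sets):
        pairs = set()
        while len(pairs) < set_size:
            a, b = rng.sample(genes, 2)   # two distinct genes
            pair = frozenset((a, b))
            if pair not in excluded:
                pairs.add(pair)
        control_sets.append([tuple(sorted(p)) for p in pairs])
    return control_sets


# Hypothetical gene identifiers; real input would be all genes
# associated with an informative tag.
genes = [f"gene{i}" for i in range(20)]
duplicated = [("gene0", "gene1")]
controls = random_control_sets(genes, duplicated, set_size=15)
print(len(controls), len(controls[0]))
```

Each control set can then be scored with the same correlation measure as the duplicated pairs to obtain the background distribution.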
Significance was tested using the
Pairwise alignments of promoters from tandem and segmental duplications were computed using DiAlign2 (Morgenstern, 1999
As it cannot be assumed that the promoter similarities follow a normal distribution, we carried out a nonparametric ranking test (Mann-Whitney-Wilcoxon test) to evaluate differences between means. Ties between ranks were resolved as midranks. For large data sizes (as analyzed in this study), the distribution of the ranks is approximately normally distributed (Bickel and Doksum, 1977
Two-dimensional samples were tested for correlations by determining the Spearman rank correlation coefficient. Significance levels for each tested correlation are given in the text. Details about the tests are provided in the supplemental material.
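The rank-based procedures named here can be sketched in a few lines of Python. The code below assigns midranks to ties, computes a Mann-Whitney z-score using the large-sample normal approximation, and obtains Spearman's coefficient as the Pearson correlation of the ranks. All function names are hypothetical, and the variance formula omits the tie correction for brevity, so this is a simplified illustration rather than the exact test described in the supplemental material.

```python
import math


def midranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mid = (i + j) / 2 + 1          # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mid
        i = j + 1
    return ranks


def mann_whitney_z(sample_a, sample_b):
    """z-score of the Mann-Whitney U statistic (normal approximation).

    Simplification: no tie correction is applied to the variance.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    ranks = midranks(list(sample_a) + list(sample_b))
    u = sum(ranks[:n_a]) - n_a * (n_a + 1) / 2
    mean_u = n_a * n_b / 2
    sd_u = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    return (u - mean_u) / sd_u


def spearman_rho(x, y):
    """Spearman coefficient: Pearson correlation of the midranks."""
    rx, ry = midranks(x), midranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        raise ValueError("rho is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


print(midranks([10, 20, 20, 30]))
```

A positive z indicates that the first sample tends to rank higher than the second; the corresponding P value follows from the standard normal distribution.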
A literature survey identified several tandemly and segmentally duplicated genes for which functional regulatory elements have been reported. Consensus sequences or the respective sites were extracted from these reports, and their positions were determined within the respective promoters. DiAlign2-1 (Morgenstern, 1999
We thank Shin-Han Shiu and Michael Mader for comments and critical reading of the manuscript.
Received May 14, 2004; returned for revision July 28, 2004; accepted July 29, 2004.
1 This work was supported by the Deutsche Forschungsgemeinschaft (grant no. MA 2522/11 to K.F.X.M.) and by the National Science Foundation Plant Genome Research Program (DBI0110528 to B.C.M.).
[w] The online version of this article contains Web-only data. www.plantphysiol.org/cgi/doi/10.1104/pp.104.046466. * Corresponding author; e-mail k.mayer{at}gsf.de; fax 498931873585.
The Arabidopsis Genome Initiative (2000) Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 408: 796-815
Baker SS, Wilhelm KS, Thomashow MF (1994) The 5'-region of Arabidopsis thaliana cor15a has cis-acting elements that confer cold-, drought- and ABA-regulated gene expression. Plant Mol Biol 24: 701-713
Bickel PJ, Doksum KA (1977) Mathematical Statistics. Holden-Day, Oakland, CA
Biesgen C, Weiler EW (1999) Structure and regulation of OPR1 and OPR2, two closely related genes encoding 12-oxophytodienoic acid-10,11-reductases from Arabidopsis thaliana. Planta 208: 155-165
Blanc G, Hokamp K, Wolfe KH (2003) A recent polyploidy superimposed on older large-scale duplications in the Arabidopsis genome. Genome Res 13: 137-144
Bowers JE, Chapman BA, Rong J, Paterson AH (2003) Unravelling angiosperm genome evolution by phylogenetic analysis of chromosomal duplication events. Nature 422: 433-438
Brenner S, Johnson M, Bridgham J, Golda G, Lloyd DH, Johnson D, Luo S, McCurdy S, Foy M, Ewan M, et al (2000b) Gene expression analysis by massively parallel signature sequencing (MPSS) on microbead arrays. Nat Biotechnol 18: 630-634
Brenner S, Williams SR, Vermaas EH, Storck T, Moon K, McCollum C, Mao JI, Luo S, Kirchner JJ, Eletr S, et al (2000a) In vitro cloning of complex mixtures of DNA on microbeads: physical separation of differentially expressed cDNAs. Proc Natl Acad Sci USA 97: 1665-1670
Britten RJ, Davidson EH (1969) Gene regulation for higher cells: a theory. Science 165: 349-357
Carroll SB (2000) Endless forms: the evolution of gene regulation and morphological diversity. Cell 101: 577-580
Chaubet N, Flenet M, Clement B, Brignon P, Gigot C (1996) Identification of cis-elements regulating the expression of an Arabidopsis histone H4 gene. Plant J 10: 425-435
Doebley J, Lukens L (1998) Transcriptional regulators and the evolution of plant form. Plant Cell 10: 1075-1082
Donald RG, Cashmore AR (1990) Mutation of either G box or I box sequences profoundly affects expression from the Arabidopsis rbcS-1A promoter. EMBO J 9: 1717-1726
Ferrandiz C, Gu Q, Martienssen R, Yanofsky MF (2000) Redundant regulation of meristem identity and plant architecture by FRUITFULL, APETALA1 and CAULIFLOWER. Development 127: 725-734
Force A, Lynch M, Pickett FB, Amores A, Yan YL, Postlethwait J (1999) Preservation of duplicate genes by complementary, degenerative mutations. Genetics 151: 1531-1545
Green PJ, Yong MH, Cuozzo M, Kano-Murakami Y, Silverstein P, Chua NH (1988) Binding site requirements for pea nuclear protein factor GT-1 correlate with sequences required for light-dependent transcriptional activation of the rbcS-3A gene. EMBO J 7: 4035-4044
Gu Z, Nicolae D, Lu HH, Li WH (2002) Rapid divergence in expression between duplicate genes inferred from microarray data. Trends Genet 18: 609-613
Gu Z, Steinmetz LM, Gu X, Scharfe C, Davis RW, Li WH (2003) Role of duplicate genes in genetic robustness against null mutations. Nature 421: 63-66
Guo H, Moose SP (2003) Conserved noncoding sequences among cultivated cereal genomes identify candidate regulatory sequence elements and patterns of promoter evolution. Plant Cell 15: 1143-1158
He Y, Gan S (2001) Identical promoter elements are involved in regulation of the OPR1 gene by senescence and jasmonic acid in Arabidopsis. Plant Mol Biol 47: 595-605
Kinoshita T, Doi M, Suetsugu N, Kagawa T, Wada M, Shimazaki K (2001) Phot1 and phot2 mediate blue light regulation of stomatal opening. Nature 414: 656-660
Kofuji R, Sumikawa N, Yamasaki M, Kondo K, Ueda K, Ito M, Hasebe M (2003) Evolution and divergence of the MADS-box gene family based on genome-wide expression analyses. Mol Biol Evol 20: 1963-1977
Liljegren SJ, Ditta GS, Eshed Y, Savidge B, Bowman JL, Yanofsky MF (2000) SHATTERPROOF MADS-box genes control seed dispersal in Arabidopsis. Nature 404: 766-770
Long M, Thornton K (2001) Gene duplication and evolution. Science 293: 1551
Lynch M, Conery JS (2000) The evolutionary fate and consequences of duplicate genes. Science 290: 1151-1155
Makova KD, Li WH (2003) Divergence in the spatial pattern of gene expression between human duplicate genes. Genome Res 13: 1638-1645
Meyers BC, Tej SS, Vu TH, Haudenschild CD, Agrawal V, Edberg SB, Ghazal H, Decola S (2004) The use of MPSS for whole-genome transcriptional analysis in Arabidopsis. Genome Res 14: 1641-1653
Moore RC, Purugganan MD (2003) The early stages of duplicate gene evolution. Proc Natl Acad Sci USA 100: 15682-15687
Morgenstern B (1999) DIALIGN 2: improvement of the segment-to-segment approach to multiple sequence alignment. Bioinformatics 15: 211-218
Ohno S (1970) Evolution by Gene Duplication. Springer-Verlag, New York
Ohno S (1971) An argument for the genetic simplicity of man and other mammals. J Hum Evol 1: 651-662
Parenicova L, de Folter S, Kieffer M, Horner DS, Favalli C, Busscher J, Cook HE, Ingram RM, Kater MM, Davies B, et al (2003) Molecular and phylogenetic analyses of the complete MADS-box transcription factor family in Arabidopsis: new openings to the MADS world. Plant Cell 15: 1538-1551
Pearson WR, Lipman DJ (1988) Improved tools for biological sequence comparison. Proc Natl Acad Sci USA 85: 2444-2448
Pinyopich A, Ditta GS, Savidge B, Liljegren SJ, Baumann E, Wisman E, Yanofsky MF (2003) Assessing the redundancy of MADS-box genes during carpel and ovule development. Nature 424: 85-88
Prince VE, Pickett FB (2002) Splitting pairs: the diverging fates of duplicated genes. Nat Rev Genet 3: 827-837
Schoof H, Ernst R, Nazarov V, Pfeifer L, Mewes HW, Mayer KF (2004) MIPS Arabidopsis thaliana Database (MAtDB): an integrated biological knowledge resource for plant genomics. Nucleic Acids Res 32 (Database issue): D373-D376
Simillion C, Vandepoele K, Van Montagu MC, Zabeau M, Van de Peer Y (2002) The hidden duplication past of Arabidopsis thaliana. Proc Natl Acad Sci USA 99: 13627-13632
Thijs G, Lescot M, Marchal K, Rombauts S, De Moor B, Rouze P, Moreau Y (2001) A higher-order background model improves the detection of promoter regulatory elements by Gibbs sampling. Bioinformatics 17: 1113-1122
Thijs G, Marchal K, Lescot M, Rombauts S, De Moor B, Rouze P, Moreau Y (2002) A Gibbs sampling method to detect overrepresented motifs in the upstream regions of coexpressed genes. J Comput Biol 9: 447-464
Thomashow MF (1999) Plant cold acclimation: freezing tolerance genes and regulatory mechanisms. Annu Rev Plant Physiol Plant Mol Biol 50: 571-599