First published online December 7, 2007; doi:10.1104/pp.107.109603. Plant Physiology 146: 377–386 (2008). © 2008 American Society of Plant Biologists. Open Access article.
AffyTrees: Facilitating Comparative Analysis of Affymetrix Plant Microarray Chips 1,[C],[OA]
Australian Research Council Centre of Excellence for Integrative Legume Research and Bioinformatics Laboratory, Genomic Interactions Group, Research School of Biological Sciences, Australian National University, Canberra, Australian Capital Territory 2601, Australia (T.F., G.W.); and The Samuel Roberts Noble Foundation, Ardmore, Oklahoma 73401 (V.A.B., M.U.)
Microarrays measure the expression of large numbers of genes simultaneously and can be used to delve into interaction networks involving many genes at a time. However, it is often difficult to decide to what extent knowledge about the expression of genes gleaned in one model organism can be transferred to other species. This can be examined either by measuring the expression of genes of interest under comparable experimental conditions in other species, or by gathering the necessary data from comparable microarray experiments. However, it is essential to know which genes to compare between the organisms. To facilitate comparison of expression data across different species, we have implemented a Web-based software tool that provides information about sequence orthologs across a range of Affymetrix microarray chips. AffyTrees provides a quick and easy way of assigning which probe sets on different Affymetrix chips measure the expression of orthologous genes. Even in cases where gene or genome duplications have complicated the assignment, groups of comparable probe sets can be identified. The phylogenetic trees provide a resource that can be used to improve sequence annotation and detect biases in the sequence complement of Affymetrix chips. Being able to identify sequence orthologs and recognize biases in the sequence complement of chips is necessary for reliable cross-species microarray comparison. As the amount of work required to generate a single phylogeny in a nonautomated manner is considerable, AffyTrees can greatly reduce the workload for scientists interested in large-scale cross-species comparisons.
Microarray experiments have made it possible to rapidly quantify the expression of large numbers of genes for a given experimental condition. The rapidity and ease of use of this technology has enabled research into complex aspects of growth and development involving multiple genes at a time. However, it remains difficult to extend findings from one organism to another, as it is often not known which of the spots on different microarray chips measure the expression of comparable (i.e. orthologous) genes.
The basic idea of using model organisms is that the knowledge gained from studying such an organism will, to a large extent, be transferable to other species. Taking the regulatory feedback loop controlling branching in Arabidopsis (Arabidopsis thaliana) as an example, validating analyses had to be performed in a range of other species to determine to what extent this mechanism was conserved and how far the knowledge gained in Arabidopsis could be applied to other plants (Johnson et al., 2006).
Approaches to validate such regulatory networks range from crudely determining whether the necessary genes are present in another genome and then assuming the complete network of gene interactions to be conserved, to quantifying the expression of the corresponding genes under comparable experimental conditions and verifying that the genes actually behave in a similar manner. The former approach is crude but quick, cheap, and easy, while the latter is more refined but labor intensive, expensive, and complicated. Data-mining available microarray data may provide an intermediate solution to the problem. Microarray data repositories such as the Gene Expression Omnibus (Edgar et al., 2002) …
Regardless of the approach used, it is necessary to know which genes can be compared between organisms. In many cases, available gene annotation or best BLAST (Altschul et al., 1997
A number of tools and databases exist that attempt to determine which genes are orthologous and therefore comparable across organisms (e.g. COG [Tatusov et al., 1997
Our Web-based software tool provides a quick and easy way of assessing the orthology of protein-coding genes for a variety of plant microarray chips, irrespective of whether the genome of the organism has been completely sequenced. We focused on Affymetrix chips, as the overwhelming majority of microarray data in public repositories is based on these (Gene Expression Omnibus; Edgar et al., 2002) …
The National Center for Biotechnology Information (NCBI) nonredundant protein database "nr" and 6-frame translations of the plant microarray chip consensus sequences provided by Affymetrix form the set of sequences on which we base our predictions. The 6-frame translations of the consensus sequences show which proteins are represented on the various microarray chips. The "nr" database contains sequences from a wide variety of species suitable as outgroups for the phylogenies, and it supplies sequences that may not have been included on the microarray chips of the various organisms. The latter are especially important, as they provide critical data when assessing whether two sequences are orthologous or paralogous (Fig. 1).
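Because Affymetrix supplies nucleotide consensus sequences, each one has to be translated in all six reading frames before it can be compared with the protein sequences in "nr". The following is a minimal Python sketch of that step, assuming the standard genetic code (NCBI translation table 1) and translating codons that contain ambiguous bases as X. The function names are hypothetical, and the text does not say which program performed the translation for AffyTrees.

```python
from itertools import product

# Standard genetic code (NCBI table 1); codons enumerated in TCAG x TCAG x TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate complete codons only; stop codons become '*', unknown codons 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frame_translations(seq: str) -> dict:
    """Return the three forward (+1..+3) and three reverse (-1..-3) frame translations."""
    seq = seq.upper().replace("U", "T")
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


print(six_frame_translations("ATGGCCTAA")["+1"])  # MA*
```

Each of the six resulting strings would then be used as a protein query; the stop symbols are kept here so that frames can still be inspected.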
PhyloGenie (Frickey and Lupas, 2004) is used to automatically search for sequence homologs and infer phylogenetic trees for all consensus sequences on a chip. This tool was originally developed to generate and analyze phylomes with respect to gene duplications and lateral gene transfers; its workflow can be summarized as follows. Each microarray consensus sequence is compared against the above-mentioned databases using BLAST, and the results of these sequence similarity searches are used to identify potential sequence homologs. BLAST high-scoring segment pairs (HSPs) covering more than 70% of the query with E values better than 1e-5 are extracted and aligned to one another. These parameters were chosen to be lax enough to detect nontrivial sequence similarities, yet stringent enough to exclude high-scoring local similarities that would not, by themselves, justify assigning two sequences as orthologous. The resulting alignment contains the sequence regions we regard as homologous to the query. HMMER (http://hmmer.janelia.org/) is used to derive a hidden Markov model (HMM) from this alignment and to search the full-length sequences of all BLAST HSPs with E values better than 1. An HMM derived from the alignment represents the sequence family better than the query alone, and searching the full-length sequences of even marginal BLAST hits with this HMM detects more distant sequence homologs and defines the start and end of homologous sequence regions more precisely than a single BLAST search could. Sequence regions matching the full-length HMM with E values better than 1e-5 are combined into a multiple sequence alignment, from which a phylogenetic tree with 100 bootstrap replicates is inferred. Due to limited computational resources, we use neighbor-joining (Saitou and Nei, 1987). The set of trees generated by PhyloGenie provides the basis of our prediction of sequence orthologs.
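The two filtering thresholds that feed the HMM step can be expressed compactly. This Python sketch assumes that HSPs have already been parsed from BLAST output into a hypothetical `HSP` record holding query coordinates and an E value. It illustrates the cutoffs described above (more than 70% query coverage and E < 1e-5 for the seed alignment; E < 1 for selecting full-length sequences to search with the HMM) and is not PhyloGenie's actual code.

```python
from dataclasses import dataclass


@dataclass
class HSP:
    """Hypothetical parsed BLAST high-scoring segment pair."""
    subject_id: str
    query_start: int  # 1-based, inclusive, in query coordinates
    query_end: int
    evalue: float


def hsps_for_seed_alignment(hsps, query_length, min_coverage=0.70, max_evalue=1e-5):
    """HSPs covering more than 70% of the query with E < 1e-5 seed the alignment."""
    return [
        h for h in hsps
        if (h.query_end - h.query_start + 1) / query_length > min_coverage
        and h.evalue < max_evalue
    ]


def subjects_for_hmm_search(hsps, max_evalue=1.0):
    """Full-length subject sequences of all HSPs with E < 1 are searched with the HMM."""
    return sorted({h.subject_id for h in hsps if h.evalue < max_evalue})


hsps = [HSP("a", 1, 80, 1e-10), HSP("b", 1, 60, 1e-10), HSP("c", 5, 94, 1e-3)]
print([h.subject_id for h in hsps_for_seed_alignment(hsps, query_length=100)])  # ['a']
print(subjects_for_hmm_search(hsps))  # ['a', 'b', 'c']
```

The marginal hits `b` and `c` never enter the seed alignment, but they are still searched with the HMM, which is how more distant homologs can be recovered.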
The actual prediction requires a number of user-specified parameters and is performed on the fly, allowing a high degree of flexibility. Detection of sequence orthologs is based on the number of nodes separating the query sequence (i.e. the sequence for which a tree was derived) from sequences of any given species in the tree. In the following example, we assume that the user has selected the Arabidopsis ATH1-121501 chip and wants to find sequence orthologs in Medicago truncatula. Sequence orthologs are determined as follows (Fig. 2). First, the number of nodes separating each M. truncatula sequence (yellow) from the query (purple) is determined (minimum, 4; SD, 2.87). An additional scaling factor (default, 0.5) lets the user specify the range within which M. truncatula sequences are accepted as potential sequence orthologs. Increasing this value causes the program to consider more distant relatives as potential orthologs, whereas decreasing it restricts the program to the most closely related sequences. In the presented analysis, we used a value of 0.5, as this allowed us to determine orthologs for most of the chip sequences without assigning multiple orthologs in the other species to too many query sequences. In this manuscript, the distance within which sequences are accepted as potential sequence orthologs is referred to as the permissive range. The permissive range is calculated as the minimum number of nodes separating the query sequence from an M. truncatula homolog in the tree, plus the SD multiplied by the scaling factor. The SD reflects how M. truncatula sequences are dispersed throughout the tree: the more clades in a tree contain M. truncatula sequences, the greater the uncertainty about which of these clades contains the sequences orthologous to the query. We therefore use the SD of the number of nodes separating M. truncatula sequences from the query as a measure of how uncertain we are that the sequences closest to the query, in number of nodes, really are its sequence orthologs. For the tree shown in Figure 2, the permissive range (4 + 0.5 × 2.87 ≈ 5.4 nodes) is highlighted in green and encompasses all sequences fewer than six nodes removed from the query. Affymetrix Arabidopsis ATH1-121501 sequences fewer than six nodes removed from the query are regarded as sequence paralogs of the query (260439_at). M. truncatula sequences within the permissive range are regarded as potential sequence orthologs (Mtr.28509.1.S1_at, Mtr.17370.1.S1_at, and Mtr.21922.1.S1_at).
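The forward step follows directly from the definition of the permissive range: the minimum node distance plus the scaling factor times the SD. With the values quoted for Figure 2 (minimum 4, SD 2.87, scaling factor 0.5), the range is 4 + 0.5 × 2.87 = 5.435 nodes, so sequences up to five nodes away are accepted. In this Python sketch, the node distances are assumed to be precomputed from the tree, and the use of the population SD (`statistics.pstdev`) is an assumption, since the text does not name the estimator; `pstdev` also returns 0 when only one target-species sequence is present, where the sample SD would be undefined.

```python
import statistics


def permissive_range(node_distances, scaling_factor=0.5):
    """Minimum node distance plus scaling_factor * SD (population SD is an assumption)."""
    return min(node_distances) + scaling_factor * statistics.pstdev(node_distances)


def forward_candidates(distances_to_query, scaling_factor=0.5):
    """distances_to_query: target-species sequence -> number of nodes from the query.

    Returns the sequences that fall within the permissive range (potential orthologs).
    """
    limit = permissive_range(list(distances_to_query.values()), scaling_factor)
    return {seq for seq, d in distances_to_query.items() if d <= limit}


# Hypothetical node distances of four target-species sequences from one query
print(sorted(forward_candidates({"a": 4, "b": 5, "c": 10, "d": 12})))  # ['a', 'b']
```

Raising `scaling_factor` widens the range and admits more distant relatives, which matches the trade-off described above.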
For each of the potential orthologs, we subsequently perform a reverse lookup: we calculate the minimum and SD of the number of nodes separating the potential ortholog from the Affymetrix Arabidopsis ATH1-121501 sequences present in the tree. Because the minimum and SD depend strongly on the position in the tree of the sequence for which they are calculated, the permissive ranges of the potential orthologs may differ considerably from one another. A red and a blue line show the permissive ranges of two of the three potential orthologs. The query sequence does not lie within the permissive range of Mtr.21922.1.S1_at (blue line), so this sequence is removed from the set of potential orthologs; it appears much more closely related to the Affymetrix Arabidopsis sequence 257728_at than to the query. Mtr.28509.1.S1_at (red line) and Mtr.17370.1.S1_at (not shown) both recover the query sequence within their permissive ranges and are retained as sequence orthologs of the query. Analysis of this tree therefore tells us that our query sequence 245641_at has a sequence paralog (260439_at) on the Affymetrix Arabidopsis ATH1-121501 chip and two sequence orthologs (or co-orthologs) on the Affymetrix M. truncatula chip. The aim of this tool is twofold: it offers a fully automated way of retrieving sequence orthologs for microarray consensus sequences from a wide variety of species, and it provides the results of the BLAST search, multiple sequence alignment, and phylogenetic inference for every consensus sequence on a chip. This allows dubious orthology predictions to be validated manually by inspecting the intermediate results that led to the corresponding phylogenetic trees and alignments. In addition, the many alignments generated while constructing the phylogenies are a useful resource for further analyses, as they provide sets of aligned sequence homologs for every consensus sequence on a chip.
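The reverse lookup reuses the same rule with the roles swapped: each candidate's permissive range is computed against the source-chip sequences in the tree, and the candidate survives only if the query falls inside it. In this Python sketch, `node_distance` is a hypothetical callable returning the number of nodes between two leaves, and the population SD is again an assumption.

```python
import statistics


def permissive_range(node_distances, scaling_factor=0.5):
    """Minimum node distance plus scaling_factor * SD (population SD is an assumption)."""
    return min(node_distances) + scaling_factor * statistics.pstdev(node_distances)


def reverse_lookup(candidates, query, node_distance, source_chip_seqs, scaling_factor=0.5):
    """Keep candidates whose own permissive range, measured against the source-chip
    sequences present in the tree (query included), still contains the query."""
    confirmed = set()
    for cand in candidates:
        dists = {s: node_distance(cand, s) for s in source_chip_seqs}
        limit = permissive_range(list(dists.values()), scaling_factor)
        if dists[query] <= limit:
            confirmed.add(cand)
    return confirmed


# Hypothetical node distances: m1 sits near the query q, m3 near another chip sequence x.
distances = {("m1", "q"): 3, ("m1", "p"): 4, ("m1", "x"): 9,
             ("m3", "q"): 8, ("m3", "p"): 9, ("m3", "x"): 2}
kept = reverse_lookup({"m1", "m3"}, "q", lambda a, b: distances[(a, b)], ["q", "p", "x"])
print(kept)  # {'m1'}
```

Candidate `m3` is dropped for the same reason as Mtr.21922.1.S1_at in the example above: it lies much closer to another source-chip sequence than to the query.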
The user interface has five Web pages. The home page allows querying of individual genes and links to the remaining pages, to help, and to supplemental data. The other four pages deal with batch requests, analysis of chip phylomes, generation of phylogenies for sequences provided by the user, and prediction of sequence orthologs between the consensus sequences represented on a chip and other species. The results of an individual query are shown in Figure 3. Tabs at the top of the page allow navigation between the results of the BLAST search (BLAST), the alignment of HSPs (CLN), the derived HMM (HMM), the results of the HMM search (HMS), the alignment of high-scoring HMM hits (HLN), and either a textual or an applet-based representation of the neighbor-joining tree (TRE). The tabs let the user retrace every step from query sequence to phylogeny and help explain why two genes were regarded as homologous, included in the same tree, or predicted to be sequence orthologs. To facilitate interpretation of batch requests and complete phylome analyses, intermediate pages can be generated that gather and order the results and link to the results pages of the individual genes. Prediction of sequence orthologs between microarray chip consensus sequences and a species of choice generates a tab-delimited list indicating which sequences on the chip could be assigned sequence orthologs in the other species, which sequences should be regarded as co-orthologous or paralogous, and which other homologous sequences were present in the phylogenies but could not be assigned a more precise relationship.
Supplemental data, providing further information about the programs used, the individual steps performed to generate the data, and the parameters the user can adjust, are available at http://bioinfoserver.rsbs.anu.edu.au/utils/affytrees/help.php. Results of phylome analyses, custom phylogenetic trees, and orthology predictions are stored for one week and can be accessed using the job identifier provided with the results. This tool differs from other databases and programs in a number of ways. It provides the data on which tree inference and orthology prediction are based, thereby allowing the user to retrace each step of the decision process. Our trees include sequences from the "nr" database, which greatly facilitate correct rooting and interpretation. In addition, this allows us to potentially detect sequence orthologs for any species represented in "nr", instead of being limited to species for which complete genomes or proteomes are available. The use of a user-defined scaling factor avoids the problems that co-orthologous genes cause for approaches relying solely on reciprocal best hits between genomes. If, for example, one species has a gene of interest, gene A, that was duplicated in another species, giving rise to genes B and B', reciprocal best hit approaches may identify either A and B or A and B' as reciprocal best hits and assign them as sequence orthologs. However, if A appears most similar to B but B' appears most similar to A, a possible scenario when nonsymmetric scoring schemes such as that employed by BLAST are used, then no reciprocal best hits can be determined and no sequence orthologs are assigned. All of these cases produce an incorrect assignment of gene orthology, as B and B' are co-orthologous to A (i.e. duplicates derived from a gene that was orthologous to A) and should be treated as such.
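A toy example makes the co-ortholog problem concrete. The Python sketch below implements a plain reciprocal-best-hit rule on hypothetical similarity scores (not real BLAST output): gene A has two co-orthologs, B and B', yet the rule pairs A with B only and leaves B' without an ortholog assignment.

```python
def best_hits(scores):
    """scores[query][subject] -> similarity score; returns query -> set of top-scoring subjects."""
    best = {}
    for query, hits in scores.items():
        if hits:
            top = max(hits.values())
            best[query] = {s for s, v in hits.items() if v == top}
    return best


def reciprocal_best_hits(forward, reverse):
    """Pairs (a, b) where b is a best hit of a and a is a best hit of b."""
    fwd, rev = best_hits(forward), best_hits(reverse)
    return {(a, b) for a, subjects in fwd.items() for b in subjects if a in rev.get(b, set())}


# Hypothetical scores: gene A in species 1; its duplicated co-orthologs B and B' in species 2.
forward = {"A": {"B": 120.0, "B'": 115.0}}
reverse = {"B": {"A": 118.0}, "B'": {"A": 116.0}}
print(reciprocal_best_hits(forward, reverse))  # {('A', 'B')}
```

Although B' finds A as its best hit, A's best hit is B, so the reciprocity test silently discards the second co-ortholog.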
Another part of this tool allows the user to search through the trees of a given species or chip for those matching specific topological selection criteria. For example, to find all trees in which a clade contains at least one M. truncatula sequence and at least one Arabidopsis sequence, but no sequence from the Arabidopsis ATH1-121501 chip, the selection string "((Medicago truncatula & Arabidopsis) & !Arabidopsis ATH1-121501)" could be used. Trees containing such clades could identify M. truncatula sequences whose orthologs cannot be measured using the Affymetrix Arabidopsis ATH1-121501 chip, because no sequence orthologs are present on that chip. As an example of such a case (Fig. 4), we show a tree derived for a hypothetical protein from M. truncatula whose ortholog was not included on the ATH1-121501 chip, even though orthologous sequences are present in the Arabidopsis genome as well as throughout the plant, fungal, and animal kingdoms.
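One way to evaluate a selection string such as the one above is a small recursive-descent parser applied to the set of species and chip labels found in each clade. This is a hypothetical Python sketch: it supports only the `&`, `!`, and parenthesis operators that appear in the example, treats any other text between operators as a label, and assumes that each clade has already been reduced to the set of labels of its leaves. The actual AffyTrees syntax and matching rules may differ.

```python
import re


def tokenize(expression):
    """Split into '(', ')', '&', '!' and label tokens (labels may contain spaces)."""
    return [t.strip() for t in re.findall(r"[()&!]|[^()&!]+", expression) if t.strip()]


def clade_matches(expression, clade_labels):
    """Evaluate the selection string against the set of labels present in one clade."""
    tokens = tokenize(expression)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take(expected=None):
        nonlocal pos
        token = peek()
        if token is None or (expected is not None and token != expected):
            raise ValueError(f"expected {expected!r}, found {token!r}")
        pos += 1
        return token

    def parse_and():
        # conjunction: operand ('&' operand)*; the right side is always parsed
        value = parse_operand()
        while peek() == "&":
            take("&")
            rhs = parse_operand()
            value = value and rhs
        return value

    def parse_operand():
        token = peek()
        if token == "!":
            take("!")
            return not parse_operand()
        if token == "(":
            take("(")
            value = parse_and()
            take(")")
            return value
        if token in (None, "&", ")"):
            raise ValueError(f"unexpected token {token!r}")
        return take() in clade_labels  # a label matches if the clade contains it

    result = parse_and()
    if peek() is not None:
        raise ValueError(f"unexpected trailing token {peek()!r}")
    return result


def tree_matches(expression, clades):
    """A tree is selected if at least one of its clades satisfies the expression."""
    return any(clade_matches(expression, clade) for clade in clades)


query = "((Medicago truncatula & Arabidopsis) & !Arabidopsis ATH1-121501)"
print(clade_matches(query, {"Medicago truncatula", "Arabidopsis"}))  # True
print(clade_matches(query, {"Medicago truncatula", "Arabidopsis", "Arabidopsis ATH1-121501"}))  # False
```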
Future developments include, as a first step, extending this tool beyond the currently available seven chips to include all publicly available Affymetrix plant microarray chips. Because this system is not limited as to what species can be analyzed, provided some sequence information for the species is available, it is conceivable that the system may be extended to cover all available Affymetrix microarray chips. Beyond that, the aim will be to develop and implement methods that further facilitate comparative analysis of microarray expression data across species.
To determine whether the AffyTrees orthology predictions were as accurate as, less accurate than, or more accurate than reciprocal best BLAST hits, the most widely used method to identify sequence orthologs, we compared the orthology predictions generated by both methods. Phylogenetically orthologous sequences are generally expected to fulfill the same function in different species, and functionally orthologous sequences are expected to be similarly expressed across species. Therefore, phylogenetic orthologs can be expected to show a certain degree of similarity in their expression across species. We based our comparison on the prediction of sequence orthologs between the Arabidopsis ATH1-121501 and M. truncatula Affymetrix chips. These species were chosen because sets of comparable microarray experiments were available for both, which allowed us to test whether, and how well, sequence orthology as predicted by reciprocal best BLAST hits and by AffyTrees was reflected in similarity of expression. The results of comparing the orthology predictions for these two microarray chips are shown in Figure 5A. BLAST produced many more reciprocal best hits (7,025) than AffyTrees predicted orthologs (5,793). Of these, 2,926 orthology predictions coincided, 4,099 were unique to the reciprocal best BLAST hits, and 2,867 were unique to AffyTrees. Even though BLAST produced about 21% more orthology predictions, fewer individual sequences were assigned an ortholog by BLAST than by AffyTrees, because BLAST assigned multiple orthologs to many sequences. On average, each M. truncatula chip sequence was assigned 1.78 Arabidopsis chip sequences as reciprocal best BLAST hits, and each Arabidopsis chip sequence was assigned 1.57 M. truncatula chip sequences. This artificially inflated the number of "orthology" predictions provided by BLAST.
Dividing the number of exclusively BLAST-based predictions (4,099) by the average number of orthologs assigned per sequence in each species gives the number of individual genes in each species that were assigned at least one ortholog in the other species: the exclusively BLAST-based predictions assigned one or more Arabidopsis orthologs to 2,303 Medicago sequences and one or more Medicago orthologs to 2,611 Arabidopsis sequences. The exclusively AffyTrees-based predictions assigned Arabidopsis orthologs to 2,515 Medicago sequences and Medicago orthologs to 2,537 Arabidopsis sequences, 138 more sequences in total than were assigned by reciprocal best BLAST hits.
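The counts in the two preceding paragraphs are related by simple arithmetic, which the following Python snippet reproduces. Rounding to the nearest integer is our assumption about how the per-gene counts were obtained.

```python
shared_pairs = 2_926
blast_only_pairs = 4_099
affytrees_only_pairs = 2_867

# Pairwise predictions per method (Fig. 5A)
assert shared_pairs + blast_only_pairs == 7_025
assert shared_pairs + affytrees_only_pairs == 5_793
excess = 100 * (7_025 - 5_793) / 5_793  # relative excess of BLAST predictions, in percent

# Average number of orthologs per sequence among the reciprocal best BLAST hits
per_medicago_seq = 1.78
per_arabidopsis_seq = 1.57

blast_medicago = round(blast_only_pairs / per_medicago_seq)
blast_arabidopsis = round(blast_only_pairs / per_arabidopsis_seq)
affytrees_medicago, affytrees_arabidopsis = 2_515, 2_537

blast_total = blast_medicago + blast_arabidopsis
affytrees_total = affytrees_medicago + affytrees_arabidopsis
print(f"{excess:.1f}%")  # 21.3%
print(blast_medicago, blast_arabidopsis, blast_total, affytrees_total, affytrees_total - blast_total)
# 2303 2611 4914 5052 138
```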
To determine which method provided more accurate orthology predictions, we compared the expression of predicted sequence orthologs in two sets of microarray experiments, one for Arabidopsis (Schmid et al., 2005) … Accepting the 2,926 orthology assignments on which BLAST and AffyTrees agreed as "true" orthologs, we used the Pearson (linear) correlation coefficient of the expression values to measure the coexpression of all predicted ortholog pairs. The histogram in Figure 5B shows the number of predicted ortholog pairs for each correlation coefficient, together with a fitted, scaled extreme value distribution (EVD). Most of the predicted ortholog pairs had positive correlation coefficients, supporting our expectation that sequence orthologs, in general, should show similar expression across different organisms. In addition, this distribution provides a means of testing the accuracy of the reciprocal best BLAST hit and AffyTrees orthology predictions (Fig. 5C). Rather than comparing histograms directly, we approximated each histogram by a distribution with a small number of parameters to facilitate the comparison of multiple datasets; the EVD approximates the various histograms depicted in Figure 5 quite well. The more accurate the set of orthologs predicted by a method, the better its fitted EVD should approximate the EVD derived from our set of 2,926 true orthologs. We then compared the sets of genes for which sequence orthologs could be predicted only by BLAST or only by AffyTrees. Whenever a gene was assigned multiple sequence orthologs, we averaged their correlation coefficients, reflecting the fact that the method generating the prediction could not decide which of the predicted orthologs should be used. A total of 4,914 genes were assigned sequence orthologs only by reciprocal best BLAST hits, and 5,052 genes only by AffyTrees.
The histograms and fitted EVDs for these sets of genes are shown in Figure 5C. BLAST and AffyTrees predicted orthologs for similar numbers of genes; however, the maximum of the BLAST EVD lies at 0.47, whereas the maximum of the AffyTrees EVD lies at 0.66. The AffyTrees EVD also approximates the EVD of the set of true orthologs more closely. Using the median of the correlation coefficients as the comparison metric leads to similar results (Fig. 5, B–D). Bootstrap sampling of the BLAST and AffyTrees distributions (10,000 samples, 1,000 replicates) showed the medians of the distributions to be very robust. In both cases, the probability of generating a randomly sampled distribution with the median observed for the other method was very low (BLAST, 2.1e-36; AffyTrees, 6.2e-26). Both the medians of the distributions and the maxima of the fitted EVDs show that the histogram of the AffyTrees predictions (blue) is more similar to the histogram of the true orthologs (green) than the histogram of the reciprocal best BLAST hit predictions (yellow) is. This suggests that the AffyTrees predictions are more reliable than predictions based on reciprocal best BLAST hits.
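The correlation measure and the bootstrap on medians can be sketched as follows. The sketch assumes that each gene's expression profile and those of its predicted orthologs are equal-length vectors over matched experimental conditions (how conditions were matched between the Arabidopsis and M. truncatula experiments is not described here), and it reads "10,000 samples, 1,000 replicates" as 1,000 resamples of 10,000 values each, which is an interpretation. Fitting the EVD is not shown.

```python
import math
import random
import statistics


def pearson(x, y):
    """Pearson (linear) correlation coefficient of two equal-length expression profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2 values)")
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return sxy / math.sqrt(sxx * syy)


def gene_score(profile, ortholog_profiles):
    """Average r over all orthologs predicted for one gene, as described for multiple assignments."""
    return statistics.fmean([pearson(profile, other) for other in ortholog_profiles])


def bootstrap_medians(values, n_replicates=1_000, sample_size=10_000, seed=0):
    """Median of each of n_replicates resamples (with replacement) of sample_size values."""
    rng = random.Random(seed)
    return [statistics.median(rng.choices(values, k=sample_size)) for _ in range(n_replicates)]


def fraction_reaching(medians, other_median):
    """Share of bootstrap medians at or above the other method's median (upper tail)."""
    return sum(m >= other_median for m in medians) / len(medians)


print(gene_score([1, 2, 3], [[2, 4, 6], [3, 2, 1]]))  # 0.0 (r = 1 and r = -1 averaged)
```

With 1,000 replicates, an empirical fraction such as `fraction_reaching` cannot resolve probabilities smaller than 1/1,000, so on its own it can only bound values as small as those reported above. For the method with the higher median, the lower tail (`m <= other_median`) would be the relevant comparison.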
However, it was recently shown that GCRMA (Wu et al., 2004) … In an attempt to determine why the BLAST-based predictions fared poorly, we examined how various modes of orthology assignment influence the fitted EVD. Figure 5D shows the histograms and fitted EVDs for two further datasets. The first was generated by randomly pairing sequences from within our set of true orthologs (black), and the second by accepting all sequence homologs present in the AffyTrees phylogenies as sequence orthologs (pink). These phylogenies provide a large number of groupings of homologous sequences. We know many of the trees to contain paralogous sequences, and misassigning sequence paralogs as orthologs is one of the key difficulties in accurately detecting sequence orthologs. The EVD fitted to the random orthology assignments (black) has its maximum close to zero. Indiscriminately assigning all sequence homologs present in a tree as sequence orthologs generates many more orthology predictions, as shown by the increased amplitude of the EVD. However, the maximum of this fitted EVD is close to 0.5, well below the maximum of 0.68 determined for the EVD of the set of true orthologs (green). We therefore expect the maxima of EVDs fitted to the results of various orthology assignment methods, for this dataset, to lie between 0 and 0.7; the closer the maximum lies to 0.7 (or above), the better the prediction method is likely to be. Not differentiating between orthology and homology, and thereby assigning too many sequences as sequence orthologs, shifts the maximum of the fitted EVD to around 0.5. BLAST-based predictions assigned multiple sequence orthologs to genes more frequently than AffyTrees predictions did, which might explain why the maximum of the BLAST EVD lies at 0.47. The reciprocal best BLAST hit approach, while well suited to detecting sequence homologs, therefore does not appear very accurate at distinguishing sequence orthologs from other homologs.
The AffyTrees method, in contrast, appears far better at reliably determining orthologous sequences.
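The random-pairing control described above amounts to shuffling the partners of the true ortholog pairs. A minimal Python sketch, assuming the pairs are (Arabidopsis, M. truncatula) identifier tuples; note that a plain shuffle can occasionally recreate a true pair, and the text does not say whether such pairs were excluded.

```python
import random


def random_pairs(true_pairs, seed=0):
    """Null model: keep the first member of each true ortholog pair and shuffle the partners."""
    rng = random.Random(seed)
    firsts = [a for a, _ in true_pairs]
    partners = [b for _, b in true_pairs]
    rng.shuffle(partners)
    return list(zip(firsts, partners))


# Hypothetical identifiers standing in for the 2,926 true ortholog pairs
pairs = [(f"At{i}", f"Mt{i}") for i in range(5)]
print(random_pairs(pairs))
```

Correlating the expression profiles of these shuffled pairs yields the baseline distribution whose fitted EVD peaks near zero.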
AffyTrees provides a repository of phylogenetic trees inferred from every consensus sequence represented on a variety of Affymetrix plant microarray chips. This repository can be used to gain insights into the relationship of sequence homologs, improve annotation data, or automatically generate a list of sequence orthologs between a species and the consensus sequences represented on a specific microarray chip. The inclusion of sequences from the "nr" database and our method of detecting sequence orthologs circumvent the problems reciprocal best hit approaches have when dealing with co-orthologous genes. For sequences represented on Affymetrix plant microarray chips, AffyTrees can identify sequence orthologs present on other Affymetrix plant microarray chips, as well as sequence orthologs present in the "nr" database. The ability to filter chip phylomes for specific selection criteria allows discrepancies or systematic biases between the sequence complements of chips and the corresponding genomes to be detected. Affymetrix chips were designed to measure the transcription of genes and therefore are biased toward highly expressed and protein-coding genes. This is a known and useful bias of these chips. However, other biases, for example, systematic preference for long or short sequences, differences in the EST libraries on which the chips were based, or differences in the ability to successfully predict short genes in different species, will have affected which sequences were included on a chip and thereby influence the results. We provide a means of comparing the sequence complement of microarray chips to the publicly available sequence data of the corresponding organism as well as to the microarrays of other species. Robust ways of assessing sequence orthologs and knowledge about systematic differences in the sequence complement of various chips are prerequisites to making cross-species analyses of microarray expression data feasible. 
Without knowledge of the sequence orthologs present on other microarray chips, there is no way of determining which probe sets are comparable across chips. Similarly, without a way of estimating sequence biases or genes missing on a chip, the conclusions drawn from the presence or absence of groups of genes derived from expression data are likely to be flawed. We show, to the extent that the limitations of the available experimental data permitted, that the majority of genes predicted to be orthologous show a similar expression across the two examined species. We also show that AffyTrees is able to assign sequence orthologs to more genes than a comparable approach relying on reciprocal best BLAST hits and, by comparing the expression of predicted sequence orthologs, that the AffyTrees orthologs appear more reliable than the BLAST-based predictions. AffyTrees provides prediction of sequence orthologs for a wide variety of species at greater accuracy than reciprocal best BLAST hits. Combined with the available phylogenetic trees, sequence alignments, and additional utilities, AffyTrees should provide a useful resource for comparative analyses of transcriptomes and proteomes.
The sequences on which we based our sequence-similarity searches originated either from the "nr" database, downloaded from NCBI (ftp://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/nr.gz), or from 6-frame translations of exemplar sequences for a variety of Affymetrix chips. The nucleotide exemplar sequences were downloaded, after registration, from the Affymetrix Web site by following the links to the various species (http://www.affymetrix.com/support/technical/byproduct.affx?cat=exparrays). BLAST searches were performed against the NCBI nonredundant protein database "nr" and against 6-frame translations of the consensus sequences for the Affymetrix microarray chips ATH1-121501, AtGenome1, Barley1, Citrus, Cotton, Grape, Maize, Medicago, Poplar, Rice, Soybean, Sugar Cane, Tomato, and Wheat. The BLAST results for sequences represented on the Arabidopsis (Arabidopsis thaliana) ATH1-121501 and Medicago truncatula chips were retrieved via the AffyTrees Web interface. Putative sequence orthologs between M. truncatula and Arabidopsis sequences were predicted as described above (scaling factor = 0.5) based on the phylogenies provided by AffyTrees. To keep the results as comparable as possible, the same cutoffs used to generate the phylogenies (i.e. >70% coverage of the query and E values better than 1e-5) were applied as lower limits in the analysis of the reciprocal best BLAST hits; BLAST hits that did not satisfy these cutoffs were not taken into account. Where multiple BLAST hits shared the same best E value, all of these best hits were taken into account, so some genes could be assigned multiple reciprocal best BLAST hits. Likewise, the orthology prediction method we describe allows a gene in one species to be assigned multiple orthologs in another; in such cases, all of the predicted sequence orthologs were taken into account. There was a noticeable discrepancy between the number of predicted sequence orthologs and the number of reciprocal best BLAST hits.
To keep both approaches of detecting sequence orthologs as comparable as possible, we compared reciprocal AffyTrees orthologs to the reciprocal best BLAST hits. This allowed both methods to use "reciprocality" as a further criterion to reduce the number of false positive orthology predictions.
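The reciprocal-best-hit baseline, with the same cutoffs as the phylogenies and with tied best hits retained, could look like this in Python. The input layout, tuples of (query, subject, E value, fraction of the query covered) for each search direction, is hypothetical. The final set comprehension is the same reciprocity filter that the preceding paragraph applies to the AffyTrees predictions.

```python
from collections import defaultdict


def best_hits(hits, min_coverage=0.70, max_evalue=1e-5):
    """hits: iterable of (query, subject, evalue, query_coverage) for one search direction.

    Applies the cutoffs, then returns query -> set of subjects sharing the lowest E value.
    """
    passing = defaultdict(list)
    for query, subject, evalue, coverage in hits:
        if coverage > min_coverage and evalue < max_evalue:
            passing[query].append((evalue, subject))
    best = {}
    for query, scored in passing.items():
        top = min(e for e, _ in scored)
        best[query] = {s for e, s in scored if e == top}  # ties are all retained
    return best


def reciprocal_best_hits(forward_hits, reverse_hits, **cutoffs):
    fwd = best_hits(forward_hits, **cutoffs)
    rev = best_hits(reverse_hits, **cutoffs)
    return {(q, s) for q, subjects in fwd.items() for s in subjects if q in rev.get(s, set())}


# Hypothetical hits: At1 ties between Mt1 and Mt2; Mt3 fails the coverage cutoff.
forward = [("At1", "Mt1", 1e-50, 0.90), ("At1", "Mt2", 1e-50, 0.85), ("At1", "Mt3", 1e-60, 0.50)]
reverse = [("Mt1", "At1", 1e-48, 0.90), ("Mt2", "At1", 1e-49, 0.80), ("Mt2", "At2", 1e-30, 0.90)]
print(sorted(reciprocal_best_hits(forward, reverse)))  # [('At1', 'Mt1'), ('At1', 'Mt2')]
```

Keeping ties is what allows one gene to receive several reciprocal best hits, which in turn inflates the pair counts discussed in the Results.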
For each plant species, the Affymetrix CEL files of the experiments we wanted to compare were normalized using both GCRMA (Wu et al., 2004
The tool is freely accessible at http://bioinfoserver.rsbs.anu.edu.au/utils/affytrees/. Further information and help are available at http://bioinfoserver.rsbs.anu.edu.au/utils/affytrees/help.php. JavaScript should be enabled in the browser, and a Java browser plug-in (version 1.5 or later) should be installed for visualization of phylogenetic trees. Received September 23, 2007; accepted December 3, 2007; published December 7, 2007.
1 This work was supported by the Australian Research Council Centre of Excellence. Funding to pay for the publication charges was provided by the same grant. The author responsible for distribution of materials integral to the findings presented in this article in accordance with the policy described in the Instructions for Authors (www.plantphysiol.org) is: Georg Weiller (georg.weiller{at}anu.edu.au).
[C] Some figures in this article are displayed in color online but in black and white in the print edition.
[OA] Open Access articles can be viewed online without a subscription. www.plantphysiol.org/cgi/doi/10.1104/pp.107.109603
* Corresponding author; e-mail georg.weiller{at}anu.edu.au.
Alexeyenko A, Tamas I, Liu G, Sonnhammer EL (2006) Automatic clustering of orthologs and inparalogs shared by multiple proteomes. Bioinformatics 22: e9–e15
Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25: 3389–3402
Chiu JC, Lee EK, Egan MG, Sarkar IN, Coruzzi GM, DeSalle R (2006) OrthologID: automation of genome-scale ortholog identification within a parsimony framework. Bioinformatics 22: 699–707
Edgar R, Domrachev M, Lash AE (2002) Gene Expression Omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res 30: 207–210
Frickey T, Lupas AN (2004) PhyloGenie: automated phylome generation and analysis. Nucleic Acids Res 32: 5231–5238
Horan K, Lauricha J, Bailey-Serres J, Raikhel N, Girke T (2005) Genome cluster database. A sequence family analysis platform for Arabidopsis and rice. Plant Physiol 138: 47–54
Hubbell E, Liu WM, Mei R (2002) Robust estimators for expression analysis. Bioinformatics 18: 1585–1592
Johnson X, Brcich T, Dun EA, Goussot M, Haurogne K, Beveridge CA, Rameau C (2006) Branching genes are conserved across species. Genes controlling a novel signal in pea are coregulated by other long-distance signals. Plant Physiol 142: 1014–1026
Koski LB, Golding GB (2001) The closest BLAST hit is often not the nearest neighbor. J Mol Evol 52: 540–542
Li L, Stoeckert CJ Jr, Roos DS (2003) OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res 13: 2178–2189
Lim WK, Wang K, Lefebvre C, Califano A (2007) Comparative analysis of microarray normalization procedures: effects on reverse engineering gene networks. Bioinformatics 23: 282–288
O'Brien KP, Remm M, Sonnhammer EL (2005) Inparanoid: a comprehensive database of eukaryotic orthologs. Nucleic Acids Res 33: D476–D480
Saitou N, Nei M (1987) The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol Biol Evol 4: 406–425
Schmid M, Davison TS, Henz SR, Pape UJ, Demar M, Vingron M, Scholkopf B, Weigel D, Lohmann JU (2005) A gene expression map of Arabidopsis thaliana development. Nat Genet 37: 501–506
Tatusov RL, Koonin EV, Lipman DJ (1997) A genomic perspective on protein families. Science 278: 631–637
Tatusov RL, Fedorova ND, Jackson JD, Jacobs AR, Kiryutin B, Koonin EV, Krylov DM, Mazumder R, Mekhedov SL, Nikolskaya AN, et al (2003) The COG database: an updated version includes eukaryotes. BMC Bioinformatics 4: 41
Wu Z, Irizarry RA, Gentleman R, Murillo FM, Spencer F (2004) A Model Based Background Adjustment for Oligonucleotide Expression Arrays. Technical Report. Department of Biostatistics Working Papers. Johns Hopkins University, Baltimore, MD