Plant Physiology 144:562-574 (2007)
© 2007 American Society of Plant Biologists
Update on Legume Seed Development
Using Genomics to Study Legume Seed Development
Brandon H. Le,
Javier A. Wagmaister,
Tomokazu Kawashima,
Anhthu Q. Bui,
John J. Harada and
Robert B. Goldberg*
Department of Molecular, Cell, and Developmental Biology, University of California, Los Angeles, California 90095 (B.H.L., J.A.W., T.K., A.Q.B., R.B.G.); and Section of Plant Biology, Division of Biological Sciences, University of California, Davis, California 95616 (J.J.H.)
Seeds are essential for flowering plant reproduction because they protect, nourish, and contain the developing embryo that represents the next sporophytic generation. In addition, seeds contain energy resources that sustain the young sporophyte during germination before photosynthesis begins. In legumes, food reserves stored in embryonic cotyledons make seeds important as a food source for both human and animal consumption. For example, soybean (Glycine max) is now one of the most important seed crops in the world (Wilcox, 2004 ). Research on legume seed development has led to direct applications, such as seeds with more nutrients (Kinney, 1998 ; Wang et al., 2003 ; Krishnan, 2005 ), reduced allergens (Herman et al., 2003 ), and novel constituents, such as edible vaccines (Moravec et al., 2007 ). In the current genomic era, it is now possible to begin to understand what genes are required to make a legume seed and how regulatory networks are interconnected in legume genomes to program seed formation. In the future, this information should permit novel approaches to breed and engineer legume seeds with new agronomic traits and, most importantly, help provide a sustainable food supply for a growing human population. This Update outlines how our laboratories have been using legumes and functional genomics to identify genes that program legume seed development.
Seed development is triggered by a unique double-fertilization process that leads to the differentiation of the embryo, endosperm, and seed coat, which are the major compartments of the seed (Fig. 1, A to C; Goldberg et al., 1994 ; Miller et al., 1999 ; Gehring et al., 2004 ; Laux et al., 2004 ; Moise et al., 2005 ). These compartments have different origins and play distinct roles in seed formation. The maternally derived seed coat differentiates from the ovule integuments that surround the embryo sac and plays a major role in protecting the embryo and transferring nutrients from the maternal plant to the developing embryo (Fig. 1, A and C; Murray, 1987 ; Borisjuk et al., 2004 ; Moise et al., 2005 ). By contrast, the embryo and endosperm are direct descendants of the fertilized egg and central cell, respectively. The endosperm proliferates to occupy most of the postfertilization embryo sac and nourishes the embryo early in development (Gehring et al., 2004 ). In many flowering plants, such as legumes, the endosperm is absorbed by the embryo during development and is not present in the mature seed (Fig. 1, A to C; Goldberg et al., 1994 ). After fertilization, the zygote divides asymmetrically, giving rise to a small apical cell that develops into the embryo proper and a large basal cell that forms the suspensor. The suspensor is a terminally differentiated structure that supports and nourishes the embryo proper and degenerates later in development (Yeung and Meinke, 1993 ). The embryo proper, on the other hand, represents the new sporophytic generation and contains the shoot and root meristems that are responsible for generating organ systems of the mature plant after seed germination (Fig. 1C; Goldberg et al., 1994 ; Laux et al., 2004 ).

Figure 1. Soybean seed development. A, Cartoon depicting soybean life cycle. B, Schematic representation of soybean seed development. Embryo morphologies and developmental events were adapted and modified from Goldberg et al. (1989) . C, Paraffin transverse 10-µm sections of soybean globular, heart, cotyledon, and early maturation seeds. Inset contains a magnified view (40x) of the seed coat. Axis longitudinal section was obtained from an early maturation seed. D, Major unanswered questions in seed development. a, Axis; al, aleurone; c, cotyledon; cu, cuticle; ep, embryo proper; es, endosperm; hg, hourglass cells; ii, inner integument; oi, outer integument; pa, palisade layer; pl, plumule; py, parenchyma; rm, root meristem; s, suspensor; sc, seed coat; sm, shoot meristem; v, vascular tissues; vb, vascular bundle.
MAJOR QUESTIONS REMAIN UNANSWERED IN SEED DEVELOPMENT
Many developmental and physiological events occur within each seed compartment during development (Fig. 1B) and are programmed, in part, by the activity of different genes (Goldberg et al., 1989 , 1994 ; Stangeland et al., 2003 ; Gehring et al., 2004 ; Haughn and Chaudhury, 2005 ). Seed development, therefore, is the result of a mosaic of distinct gene expression programs occurring in parallel in different seed compartments (e.g. embryo, endosperm, seed coat) as well as within specific regions and tissues (e.g. embryo proper, suspensor, epidermis). What these programs are and how they are integrated into unique regulatory networks within the plant genome remain major unanswered questions (Fig. 1D). Specifically, it is not yet known which genes in different seed compartments play important roles in cell fate specification, differentiation, and morphogenesis during early seed and embryo development. Molecular identification and characterization of these genes will help identify regulatory networks that program and coordinate the development of each seed compartment. In addition, the functions of many genes expressed in different seed compartments are unknown. Identifying the function of compartment-specific genes should provide new insight into their roles in seed development. At present, new genomic resources allow seed biologists to use global gene expression profiling and comparative genomics to answer many questions that only a short time ago seemed out of reach. These questions, and others (Fig. 1D), are challenging the field of seed biology, and their answers should provide new insights into the process of seed development and lead to improved seeds for human and animal consumption.
LEGUMES ARE AN EXCELLENT MODEL SYSTEM TO STUDY SEED DEVELOPMENT
Legumes represent one of the largest and most diverse families of flowering plants, with approximately 20,000 species classified (Doyle and Luckow, 2003 ). Legumes comprise three subfamilies, and the largest, Papilionoideae, contains most of the model species in which different aspects of plant biology have been studied. The most common legume models are peanut (Arachis hypogaea), Lotus (Lotus japonicus), Medicago (Medicago truncatula), soybean (Glycine max), scarlet runner bean (SRB; Phaseolus coccineus), common bean (Phaseolus vulgaris), pea (Pisum sativum), and broad bean (Vicia faba). The latter five species have been used historically to study seed and embryo development (Goldberg et al., 1989 ; Johnson et al., 1994 ; Coste et al., 2001 ; Weterings et al., 2001 ; Weber et al., 2005 ).
Several features make legumes an excellent model system to study seed and embryo development. For example, many legumes, such as soybean and peanut, are food crops of major economic importance. The mature seeds of these legumes are rich in proteins, carbohydrates, and oils, and have high nutritional value. These stored food reserves make legumes, such as soybean, the second most important crop family for human nutrition and animal feed (Rubel et al., 1972 ; Duranti and Gius, 1997 ; Graham and Vance, 2003 ). One advantage of using crop models to study seed biology is the ability to modify traits of agronomic importance, for example to improve seed nutritional composition, reduce allergen levels, or increase seed number and size (Kinney, 1998 ; Herman et al., 2003 ; Wang et al., 2003 ; Gupta et al., 2006 ). In addition, legume seed biology has been studied for more than 150 years using descriptive, physiological, biochemical, molecular, and genetic approaches (see below). These studies have provided a solid intellectual framework for using legume models to study and dissect seed development in our current genomic era. The recent development of genomic tools, such as genome sequences, ESTs, oligonucleotide and cDNA microarrays, and comprehensive databases, such as the Legume Information System (http://www.comparative-legumes.org), makes legumes an excellent model to study seed development at a global scale (VandenBosch and Stacey, 2003 ; Gepts et al., 2005 ; Gonzales et al., 2005 ). These genomic tools allow comparative genomic analyses in closely related species (Zhu et al., 2005 ) and should facilitate the identification and investigation of genes important for seed development.
One of the most fascinating characteristics of legumes is that collectively they produce a large range of seed sizes (Fig. 2A). For example, some legume seeds are giants and are excellent models for developmental studies, particularly during early stages of seed development. The large size of SRB globular-stage seed and embryo allows manipulation and isolation of embryonic regions, such as the embryo proper and suspensor, using hand-dissection techniques (Walbot et al., 1972 ; Sussex et al., 1973 ; Weterings et al., 2001 ). Due to their size, large quantities of cells and tissues from these SRB embryonic regions can be obtained, facilitating molecular and biochemical studies. Manipulation of seeds and embryos at early stages of development is difficult in other model plant species with smaller seeds, such as Arabidopsis (Arabidopsis thaliana), making many legumes particularly useful to study early developmental seed biology.
A second novel feature of legumes is that their embryos show a wide range of morphological forms (Fig. 2B). For example, two closely related species, soybean and SRB, have morphologically distinct suspensors. The soybean suspensor is small, consisting of a few cells, whereas the SRB suspensor is much larger and contains several hundred cells (Fig. 2B). The variety in size and shape of legume seeds and embryos makes them excellent models for comparative morphological studies using a functional genomics approach. This strategy can lead to a better understanding of the function, evolution, and diversity of legume seeds and their corresponding compartments.
LEGUMES HAVE BEEN USED TO STUDY SEED DEVELOPMENT FOR MORE THAN 150 YEARS
Historically, legumes have been used to address important questions of seed and embryo development. In fact, early work with legumes contributed to the development of major ideas in biology. For example, during the early 1800s, Matthias Schleiden used several legumes, including Medicago and Vicia, to investigate the endosperm and describe the process of seed development (Schleiden and Vogel, 1838 , 1842 ). These studies contributed to his role in establishing the cell theory. In the mid-1800s, Gregor Mendel used peas to study the inheritance of phenotypic variation, including seed color and shape, leading to his Laws of Inheritance and the establishment of modern-day genetics (Mendel, 1865 ).
From the late 1800s to the middle of the 1900s, legumes were used to describe the processes of seed and embryo development, including the cellular events that occur before and after fertilization, early embryo cell cleavages, and endosperm differentiation. For example, Guignard's compendium of more than 40 legume species described the rich diversity of legume embryo and suspensor morphologies (Fig. 2B; Guignard, 1882 ). These studies, and others, contributed to our overall understanding of seed and embryo development at the descriptive level (Martin, 1914 ; Brown, 1917 ; Cooper, 1938 ).
Studies on legume seed formation transitioned from descriptive anatomy to experiments at the molecular, biochemical, and physiological levels during the 1970s (Dure, 1975 ), although ultrastructural and histochemical studies of legume seed development continued (Johansson and Walles, 1993 ; Nishizawa et al., 1994 ; Duval et al., 1995 ). Work in many legumes provided some of the earliest measurements of RNA, DNA, carbohydrate, lipid, and protein levels in seeds (Rubel et al., 1972 ; Clutter et al., 1974 ; Hill and Breidenbach, 1974 ; Davies, 1976 ; Pattee et al., 1981 ; Singh et al., 1981 ; Adams et al., 1982 ; Dhillon and Miksche, 1983 ). These studies provided new insights into the processes by which food reserves accumulate and are stored in seeds and demonstrated that genome endoreduplication occurs in specific seed compartments (e.g. cotyledon, suspensor). In addition, legumes such as SRB were used to dissect and manipulate the embryo proper and suspensor experimentally to study the role of growth hormones (e.g. GA) in early embryo development (Sussex et al., 1973 ; Cionini et al., 1976 ; Alpi et al., 1979 ).
During this same period, our laboratory used RNA-excess DNA-RNA hybridization experiments to show that approximately 14,000 to 18,000 diverse mRNAs are present in soybean embryos at different developmental stages (Goldberg et al., 1981b , 1989 ). We also demonstrated that most diverse mRNA species are present throughout seed development, but that small numbers of mRNAs, including those encoding storage proteins, are regulated quantitatively at specific developmental stages (Goldberg et al., 1981a ). Seed protein genes were chosen as models to investigate gene regulation during legume seed development because they encode superprevalent mRNAs that could be easily identified and isolated and because of their importance as a food source for human and animal consumption.
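The estimate of diverse mRNA number from an RNA-excess saturation experiment comes down to a short calculation. The sketch below is a generic version of that arithmetic, assuming a single-copy DNA tracer, an optional correction for tracer reactivity, and a mean mRNA length. The function name and all input values are hypothetical placeholders chosen to make the arithmetic easy to follow; they are not the measured values from Goldberg et al. (1981b, 1989).

```python
def diverse_mrna_count(saturation_fraction: float,
                       single_copy_complexity_nt: float,
                       mean_mrna_length_nt: float,
                       tracer_reactivity: float = 1.0) -> float:
    """Estimate the number of diverse mRNAs from an RNA-excess saturation value.

    saturation_fraction: fraction of single-copy DNA tracer hybridized at saturation.
    single_copy_complexity_nt: sequence complexity of the single-copy DNA (nucleotides).
    mean_mrna_length_nt: assumed average mRNA length (nucleotides).
    tracer_reactivity: fraction of tracer able to react at all; the raw
        saturation value is divided by it (1.0 means no correction).
    """
    if not 0 < tracer_reactivity <= 1:
        raise ValueError("tracer_reactivity must be in (0, 1]")
    corrected = saturation_fraction / tracer_reactivity
    # Factor of 2: mRNA is complementary to only one strand of each gene,
    # so at most half of the denatured (two-stranded) tracer can hybridize.
    mrna_complexity_nt = corrected * 2 * single_copy_complexity_nt
    # Dividing total mRNA sequence complexity by mean length gives a count.
    return mrna_complexity_nt / mean_mrna_length_nt


# Hypothetical inputs: 2% saturation, 6.0e8 nt single-copy DNA, 1,500-nt mRNAs.
print(round(diverse_mrna_count(0.02, 6.0e8, 1500)))  # 16000
```

The reactivity correction is kept as an explicit argument so that a raw saturation value and a corrected one are not accidentally mixed.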
Research on genes active in legume seed development exploded during the late 1970s and 1980s when it became possible to clone and study individual mRNAs and genes and reinsert them into plants using newly developed transformation techniques (Bevan et al., 1983 ; Estrella-Herrera et al., 1983 ; Fraley et al., 1983 ). In addition, Murai et al. (1983) demonstrated that the common bean phaseolin seed storage protein gene could be transferred to sunflower (Helianthus annuus) cells and expressed. This sunbean experiment showed that gene cloning and Agrobacterium tumefaciens transformation techniques could be combined to transfer foreign genes into plant cells and study their function. Research in several laboratories with legume seed protein genes, such as β-conglycinin, glycinin, Kunitz trypsin inhibitor, and lectin, showed that their mRNA accumulation patterns are regulated temporally and spatially (Goldberg et al., 1981a , 1983 ; Meinke et al., 1981 ; Rerie et al., 1992 ) and controlled by both transcriptional and posttranscriptional processes (Evans et al., 1984 ; Beach et al., 1985 ; Chappell and Chrispeels, 1986 ; Walling et al., 1986 ). Subsequent work determined that cis-regulatory sequences flanking legume seed protein genes could confer embryo-specific expression patterns in heterologous plants, such as tobacco (Nicotiana tabacum) and petunia (Petunia hybrida; Chen et al., 1986 ; Okamuro et al., 1986 ; Jofuku et al., 1987 ; Higgins et al., 1988 ; Naito et al., 1988 ; Baumlein et al., 1992 ; Wohlfarth et al., 1998 ; Chandrasekharan et al., 2003 ). This work provided insights into the mechanisms controlling gene activity during seed development and showed that the cis-regulatory sequences and transcription factors controlling legume seed protein gene expression are highly conserved in other plant species.
At present, the remarkable development of new genomic resources makes it possible to study legume gene expression during seed and embryo development at a global level. Currently, Medicago, Lotus, and soybean cDNA and oligonucleotide microarrays are available (Endo et al., 2002 ; Vodkin et al., 2004 ; Firnhaber et al., 2005 ). An increasing collection of legume seed transcriptome and proteomic data (Thibaud-Nissen et al., 2003 ; Firnhaber et al., 2005 ; Hajduch et al., 2005 ; Buitink et al., 2006 ; Dhaubhadel et al., 2007 ), cDNA library collections, and public EST databases (Journet et al., 2002 ; Shoemaker et al., 2002 ; Asamizu et al., 2004 ; Firnhaber et al., 2005 ; Ramirez et al., 2005 ) have helped to provide a global view of gene activity at specific seed developmental stages. Sequences of the Medicago, Lotus, soybean, common bean, and peanut genomes (Broughton et al., 2003 ; Gepts et al., 2005 ; Young et al., 2005 ; Jackson et al., 2006 ) should provide an invaluable resource for identifying and characterizing genes that play critical roles during seed and embryo development in the near future.
A NOVEL STRATEGY TO DISSECT REGULATORY NETWORKS PROGRAMMING EARLY SEED AND EMBRYO DEVELOPMENT
Our laboratory has developed a genomics strategy to begin to identify the regulatory networks that program legume seed and embryo development (Fig. 3). One aspect of this strategy is to use the giant SRB embryo, pioneered by Sussex and colleagues (Walbot et al., 1972 ; Sussex et al., 1973 ), as an entry point to dissect the molecular events that occur during early embryogenesis (Fig. 4). More recently, we incorporated soybean into our strategy (Fig. 5) because it contrasts with SRB in terms of early embryo morphology (Fig. 2B) and because our laboratory has worked on soybean embryo development for more than 25 years (Goldberg et al., 1989 ). Recent development of soybean and SRB genomic resources allows both of these legumes to be used as excellent systems to identify genes that play important roles in the differentiation of unique seed and embryo compartments and to build compartment-specific regulatory networks that are crucial for seed formation (Figs. 4 to 6).

Figure 4. Using SRB as a genomics engine to uncover genes active early in embryogenesis. A, Model for the specification of the embryo proper and suspensor, adapted and modified from Weterings et al. (2001) . ac, Apical cell; bc, basal cell; ep, embryo proper; s, suspensor. B, SRB plant with a pod indicated by the arrow. C, Hand-dissected SRB globular-stage embryos before and after separating the embryo proper and suspensor. D, Functional category distribution of SRB suspensor ESTs. E, Real-time qRT-PCR validation of PcL1L mRNA accumulation pattern. Inset, In situ hybridization of SRB globular-stage embryo mRNA using a PcL1L antisense probe. In situ data were taken from Kwong et al. (2003) . F, PcL1L overexpression in transgenic Arabidopsis seedlings. G, SRB globular-stage seed paraffin sections before and after capturing the embryo proper and suspensor by LCM. H, Cross-species hybridization of SRB embryo-proper and suspensor mRNA captured in G with an Affymetrix Soybean GeneChip containing 37,593 soybean probe sets. Only SRB sequences with a high similarity to soybean EST sequences on the GeneChip will hybridize. Therefore, the proportion of heterologous SRB suspensor and embryo-proper transcripts detected on the GeneChip is lower than that detected for homologous soybean RNAs (Fig. 5C). Venn diagram of transcripts detected in the suspensor and embryo proper is shown. Numbers in parentheses refer to the number of transcription factor gene transcripts detected. Data are available at http://estdb.biology.ucla.edu/seed.

Figure 5. Using LCM and transcriptional profiling to identify genes required to make a soybean seed. A, Globular-stage soybean seed showing the approximate number of transcripts detected in the entire seed using the Affymetrix Soybean GeneChip (Whole-Mount Seed). B and C, Globular-stage soybean seed paraffin sections before (B) and after (C) capturing the highlighted seed compartments by LCM. The approximate total number of diverse transcripts detected collectively from LCM seed compartments is shown (LCM Seed) in addition to the total number of transcripts detected in the suspensor (arrow). The number in parentheses refers to suspensor transcripts not detected in other seed compartments at the level of the GeneChip (i.e. suspensor-specific transcripts). Raw data were deposited in the Gene Expression Omnibus (GEO) as data series GSE6414 (http://www.ncbi.nlm.nih.gov/geo) and can also be accessed at http://estdb.biology.ucla.edu/seed. D, Functional category distribution of soybean suspensor transcripts detected by GeneChip analysis. E, Unsupervised hierarchical clustering of the top 2,000 most varying transcripts detected in all globular-stage seed compartments using dChip version 1.3 (Li and Wong, 2001 ). F, Supervised cluster analysis of suspensor developmentally regulated transcripts. G, Real-time qRT-PCR validation of suspensor-specific transcripts. Values represent the mean and SD of the threshold cycle (Ct) for two biological replicates with two technical replicates each. Ct values were adjusted to an 18S rRNA internal control. One Ct cycle represents a 2-fold difference in RNA prevalence. Lower Ct values indicate higher RNA levels. ent, Endothelium; ep, embryo proper; epd, epidermis; es, endosperm; hi, hilum; ii, inner integument; oi, outer integument; s, suspensor.

Figure 6. Identifying DNA sequences important for suspensor transcription. A to D, In situ hybridization of SRB globular-stage embryos using antisense probes from G564 (A), C541 (B), PCS1511 (C), and PCEP3567 (D) cDNAs. G564 and C541 in situ hybridization data were taken from Weterings et al. (2001) . E and F, GUS enzymatic activity in transgenic tobacco (E) and Arabidopsis (F) embryos carrying a G564::GUS chimeric gene. G, G564 5'-deletion and gain-of-function analyses in transgenic tobacco plants. The number indicates position relative to the transcription start site (+1). Yellow blocks indicate approximately 150-bp tandem duplications. Red arrows indicate a 10-bp motif (GAAAAGC/TGAA) identified to be conserved in the upstream sequences of G564 and C541 (Weterings et al., 2001 ). Deletion data were taken from Weterings et al. (2001) . The −913 to −764 gain-of-function construct was made by fusing this G564 upstream region with a cauliflower mosaic virus 35S::GUS chimeric gene and transforming tobacco plants according to Koltunow et al. (1990) . H, Legume phylogenetic tree, including SRB, soybean, Lotus, and Medicago. Soybean and Lotus/Medicago diverged from SRB approximately 19 and 54 million years ago, respectively (Lavin et al., 2005 ). I, Conserved regions among legume G564 genes. Red lines indicate closely related sequences displayed by FamilyRelationsII (20-bp window size, 75% similarity or greater; Brown et al., 2005 ). Blue arrows indicate motifs identified by MEME (20-bp window size; Bailey and Elkan, 1994 ) to be significantly enriched in the G564 upstream regions. ep, Embryo proper; Gm, G. max; Lj, L. japonicus; Mt, M. truncatula; MYA, million years ago; Pc, P. coccineus; s, suspensor.
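The motif and window parameters in panels G and I can be illustrated with two small functions. The sketch below assumes an upstream sequence that ends at position −1 (immediately 5' of the +1 transcription start site), scans only that strand for the degenerate 10-bp motif GAAAAG(C/T)GAA, and compares two sequences in ungapped 20-bp windows at a 75% identity cutoff. It is a simplified stand-in, not the FamilyRelationsII or MEME algorithm; the function names and test sequences are hypothetical.

```python
import re

# GAAAAGC/TGAA: position 7 may be C or T. The lookahead lets
# overlapping occurrences be reported as separate hits.
MOTIF = re.compile(r"(?=(GAAAAG[CT]GAA))")


def motif_positions(upstream: str) -> list[int]:
    """Return motif start coordinates relative to the transcription start site.

    Assumes `upstream` ends with the nucleotide at -1, so string index i
    corresponds to coordinate i - len(upstream). Only this strand is scanned.
    """
    seq = upstream.upper()
    return [m.start() - len(seq) for m in MOTIF.finditer(seq)]


def similar_windows(a: str, b: str, window: int = 20,
                    min_identity: float = 0.75) -> list[tuple[int, int, float]]:
    """Compare every window of `a` with every window of `b` without gaps.

    Returns (start_in_a, start_in_b, identity) for pairs whose fraction of
    identical positions is at least `min_identity`.
    """
    hits = []
    for i in range(len(a) - window + 1):
        wa = a[i:i + window]
        for j in range(len(b) - window + 1):
            wb = b[j:j + window]
            identity = sum(x == y for x, y in zip(wa, wb)) / window
            if identity >= min_identity:
                hits.append((i, j, identity))
    return hits


print(motif_positions("TTGAAAAGTGAACC"))  # [-12]
```

The all-against-all window comparison is quadratic in sequence length, which is acceptable for promoter-sized sequences of a few kilobases.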
USING SRB AS A GENOMICS ENGINE TO DISSECT EARLY EMBRYOGENESIS
Our laboratory has utilized SRB as a model system (Weterings et al., 2001 ) to identify genes and regulatory networks programming early embryo developmental events using a genomics approach (Figs. 3 and 4). We have focused on the question of how the embryo-proper and suspensor regions are specified from the apical and basal cells of a two-cell embryo, respectively (Fig. 4A; Weterings et al., 2001 ). SRB is unique in this regard because of its giant seed and large embryo (Fig. 2A), permitting hand dissection of embryo-proper and suspensor regions at early developmental stages (Fig. 4, B and C; Walbot et al., 1972 ; Sussex et al., 1973 ; Weterings et al., 2001 ). We prepared cDNA libraries from hand-dissected embryo-proper and suspensor regions of SRB globular-stage embryos (Fig. 3; Weterings et al., 2001 ) and sequenced ESTs to determine which genes are active in the embryo proper and suspensor. Our SRB early embryo EST database is available at both http://estdb.biology.ucla.edu/PcEST and GenBank (accession nos. CA896559 to CA916678). Figure 4D shows the distribution of >16,000 SRB suspensor ESTs grouped by functional categories, illustrating the large diversity of genes that are active in a terminally differentiated suspensor. Surprisingly, more than 300 suspensor transcription factor ESTs were identified that are distributed into a variety of transcription factor gene families (A.Q. Bui, B.H. Le, and R.B. Goldberg, unpublished data), providing a glimpse into the spectrum of regulatory genes active in one region of a legume embryo shortly after fertilization. What roles these transcription factors play in suspensor differentiation and function remains to be determined.
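A functional category distribution such as the one in Figure 4D is, at its core, a tally of annotated ESTs. The sketch below assumes each EST has already been assigned one functional category; the function name, EST identifiers, and category labels are hypothetical. Because several ESTs can derive from the same gene, the percentages describe EST representation, not numbers of distinct genes.

```python
from collections import Counter


def category_distribution(est_annotations: dict[str, str]) -> dict[str, float]:
    """Percentage of ESTs falling into each functional category.

    est_annotations maps an EST identifier to a single category label.
    Categories are returned from most to least frequent.
    """
    counts = Counter(est_annotations.values())
    total = sum(counts.values())
    return {category: 100.0 * n / total for category, n in counts.most_common()}


# Hypothetical annotations for four ESTs.
ests = {
    "est1": "metabolism",
    "est2": "transcription",
    "est3": "metabolism",
    "est4": "unknown",
}
print(category_distribution(ests))
```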
As part of our strategy, we carried out real-time quantitative reverse transcription (qRT)-PCR and in situ hybridization experiments (Fig. 3) to quantify and localize the accumulation of SRB mRNAs during early embryo development. One advantage of using giant SRB embryos is that qRT-PCR can be used to quantitate mRNA levels in different regions of a single embryo. For example, PcL1L mRNA uncovered in our EST studies is localized throughout the globular-stage SRB embryo (Fig. 4E, inset; Kwong et al., 2003 ). PcL1L is a relative of the Arabidopsis LEAFY COTYLEDON1 (LEC1) CAAT-box-binding transcription factor gene that is a critical regulator of embryo development (Lotan et al., 1998 ). Real-time qRT-PCR studies on regions dissected from a single SRB embryo at the globular stage show similar PcL1L mRNA prevalences in the embryo-proper and suspensor regions (Fig. 4E). Remarkably, Arabidopsis plants transformed with a full-length PcL1L cDNA under the control of the cauliflower mosaic virus 35S gene promoter (R.W. Kwong, J. Pelletier, and J.J. Harada, unpublished data) formed ectopic embryo-like structures on seedlings (Fig. 4F), a phenotype similar to that obtained with a 35S::AtLEC1 chimeric gene (Lotan et al., 1998 ). These results suggest that PcL1L is an important regulator of SRB embryo development, illustrating the power of the SRB system as a gene discovery engine to uncover regulatory genes that play essential roles in seed development.
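Comparing qRT-PCR results between regions of a single embryo relies on normalizing each threshold cycle (Ct) to an internal control and then converting Ct differences into fold differences, the comparative Ct (ΔΔCt) method. The sketch below assumes perfect doubling per cycle, so one Ct corresponds to a 2-fold difference in RNA prevalence and lower Ct means more RNA, as stated for Figure 5G. The function names and Ct values are hypothetical.

```python
from statistics import mean


def delta_ct(target_cts: list[float], reference_cts: list[float]) -> float:
    """Target Ct normalized to an internal control (e.g. 18S rRNA) in the same sample.

    Each argument is a list of replicate Ct values; replicates are averaged.
    """
    return mean(target_cts) - mean(reference_cts)


def relative_prevalence(dct_a: float, dct_b: float, amplification: float = 2.0) -> float:
    """Prevalence of the target in sample A relative to sample B.

    amplification=2.0 assumes the product doubles every cycle. A lower
    normalized Ct means more template, so the exponent is (dct_b - dct_a).
    """
    return amplification ** (dct_b - dct_a)


# Hypothetical Ct values for two regions dissected from one embryo.
ep = delta_ct([24.0, 24.0], [10.0, 10.0])  # embryo proper: normalized Ct 14.0
s = delta_ct([25.0, 25.0], [10.0, 10.0])   # suspensor: normalized Ct 15.0
print(relative_prevalence(ep, s))          # 2.0, i.e. 2-fold higher in the embryo proper
```

Similar normalized Ct values in the two regions would give a ratio near 1, which is how similar mRNA prevalences would appear in this calculation.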
USING SOYBEAN TO IDENTIFY GENES REQUIRED TO MAKE A SEED
Even though the giant SRB embryo is a novel system for studying the specification of embryo regions early in development, this legume does have some limitations for global studies of seed formation. For example, although we have produced an EST dataset for SRB globular-stage embryo regions (http://estdb.biology.ucla.edu/PcEST), few other genomic resources are available. In addition, transformation procedures have not yet been developed for SRB. Finally, because SRB is not a major food crop, it is unlikely that a genome project will be carried out to sequence the SRB genome. To complement our use of SRB as a gene discovery engine and overcome these deficiencies, we decided to go back to the future (Goldberg et al., 1989 ) and use soybean to dissect genes important for making a seed. At the present time, a large number of genomic resources have been developed for soybean, including microarrays, EST databases, and genome sequences (Shoemaker et al., 2002 ; Vodkin et al., 2004 ; Jackson et al., 2006 ). In addition, transformation procedures are well established for soybean, making it feasible to address questions of gene function (Ko et al., 2006 ; Olhoft et al., 2006 ). Finally, the unique morphological differences between soybean and SRB embryos (e.g. suspensor size and shape) should permit structure-function questions of legume embryo diversity to be studied (Fig. 2B).
Although we have been able to take advantage of the giant SRB embryo to dissect by hand embryo-proper and suspensor regions (Figs. 3 and 4C), this approach is time consuming and not practical with legumes that have smaller embryos, such as soybean (Fig. 2B). One way to overcome the limitations of hand dissection and to be able to isolate different regions from any legume seed and embryo regardless of size (Fig. 2) is to make use of spectacular progress in laser capture microdissection (LCM) technology (Day et al., 2005 ; Nelson et al., 2006 ). LCM technology makes it possible to study gene activity in the entire seed because any seed compartment, region, or tissue can be isolated easily throughout development (Fig. 5). LCM technology has been used successfully in rice (Oryza sativa), maize (Zea mays), tobacco, soybean, and Arabidopsis in which a variety of plant tissues and cell types have been isolated and studied, including those in an early Arabidopsis embryo (Asano et al., 2002 ; Kerk et al., 2003 ; Nakazono et al., 2003 ; Casson et al., 2005 ; Klink et al., 2005 ; Sanders et al., 2005 ; Spencer et al., 2007 ).
We have been using LCM with soybean seeds to identify all the genes required to make a seed (Figs. 3 and 5). In combination with GeneChip technology, we can investigate the global gene activity profiles in different compartments of the entire seed. For example, we used LCM to isolate the endosperm, suspensor, embryo proper, endothelium, inner integument, outer integument, epidermis, and hilum from a globular-stage soybean seed (Fig. 5, B and C). We hybridized RNAs isolated from each of these seed regions, as well as from intact globular-stage seeds (Fig. 5, A to C), with soybean Affymetrix GeneChips (J.A. Wagmaister, X. Wang, A.Q. Bui, B.H. Le, and R.B. Goldberg, unpublished data). We then compared the spectrum of diverse transcripts present in the entire globular-stage soybean seed (Fig. 5A) to those obtained with the eight laser-captured seed regions (Fig. 5, B and C). These data are available at http://estdb.biology.ucla.edu/seed as part of our National Science Foundation (NSF) Plant Genome Research Project.
Approximately 20,000 diverse transcripts were found to be present in the whole-mount globular-stage soybean seed (Fig. 5A), a value close to that which we obtained more than a quarter of a century ago using Rot curve hybridization technology (Goldberg et al., 1981b , 1989 ). Fewer diverse mRNAs were found in individual soybean globular-stage seed regions. For example, approximately 14,000 diverse transcripts were detected in the suspensor, including those that encode about 700 transcription factors (Fig. 5C). These values are comparable to those obtained with each of the other globular-stage seed regions (i.e. 14,000 to 17,000 diverse transcripts and 600 to 800 transcription factor mRNAs; J.A. Wagmaister, X. Wang, A.Q. Bui, and R.B. Goldberg, unpublished data). Soybean suspensor transcripts are distributed into functional categories (Fig. 5D) similar to those we observed in the analysis of giant SRB suspensor ESTs (Fig. 4D). These findings indicate that there is a large diversity of biological functions in the small soybean suspensor (Fig. 5D) and that there do not appear to be any differences in the functional groupings of soybean and SRB suspensor mRNAs, despite large differences in size and morphology (Fig. 2B).
We estimated independently that there are approximately 22,000 diverse transcripts present in a soybean globular-stage seed (Fig. 5, B and C) by taking the union of the mRNA sets from the individual seed regions captured by LCM (Fig. 5C), in close agreement with the result obtained with intact globular-stage seeds (Fig. 5A). These data indicate that (1) at least 20,000 to 22,000 diverse mRNAs are required to make a globular-stage soybean seed; (2) the majority of the diverse mRNAs present in each globular-stage seed region are shared with other regions; and (3) there are small sets of seed region-specific mRNAs.
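The three conclusions above correspond to three set operations on per-region detection calls: a union (total diverse transcripts in the seed), an intersection (transcripts shared by every region), and set differences (region-specific transcripts). The sketch below assumes each region has already been reduced to the set of probe-set IDs called present; the function name, region names, and IDs are hypothetical. "Specific" here means not called present elsewhere, so the result depends on the detection threshold, as with "at the level of the GeneChip."

```python
def summarize_regions(detected: dict[str, set[str]]):
    """Summarize presence calls from laser-captured seed regions.

    detected maps a region name to the set of probe-set IDs called present.
    Returns (union, shared_by_all, specific), where specific maps each region
    to the IDs detected in that region and in no other region.
    """
    union = set().union(*detected.values())
    shared_by_all = set.intersection(*detected.values())
    specific = {}
    for region, ids in detected.items():
        others = set().union(*(v for r, v in detected.items() if r != region))
        specific[region] = ids - others
    return union, shared_by_all, specific


# Hypothetical presence calls for three regions.
calls = {
    "suspensor": {"g1", "g2", "g3", "g9"},
    "embryo_proper": {"g1", "g2", "g4"},
    "endosperm": {"g1", "g2", "g3", "g5"},
}
union, shared, specific = summarize_regions(calls)
print(len(union), sorted(shared), sorted(specific["suspensor"]))  # 6 ['g1', 'g2'] ['g9']
```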
We used hierarchical clustering to determine whether mRNAs shared among different soybean globular-stage seed regions are quantitatively coregulated (Fig. 5, E and F). These analyses identified groups of shared, coregulated mRNAs that accumulate to higher levels in a particular seed region (Fig. 5E). For example, approximately 65 transcripts accumulate to levels at least 2-fold higher in the suspensor than in other seed regions (Fig. 5F).
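The 2-fold criterion can be written as a simple filter. This sketch covers only that threshold step, not the hierarchical clustering itself. As a labeled simplification, it assumes that "2-fold or higher" means at least twice the highest signal among the other regions. The signal values and the `enriched_in` helper are hypothetical.

```python
# Hypothetical normalized GeneChip signals (arbitrary units) per transcript
# and seed region; the numbers are invented for illustration.
signals = {
    "t1": {"suspensor": 800.0, "embryo_proper": 300.0, "endosperm": 350.0},
    "t2": {"suspensor": 500.0, "embryo_proper": 450.0, "endosperm": 100.0},
    "t3": {"suspensor": 90.0, "embryo_proper": 40.0, "endosperm": 45.0},
}


def enriched_in(signals, region, min_fold=2.0):
    """Return transcripts whose signal in `region` is at least `min_fold`
    times the highest signal in any other region (an assumed definition)."""
    hits = []
    for transcript, by_region in signals.items():
        target = by_region[region]
        others = [value for name, value in by_region.items() if name != region]
        if others and target >= min_fold * max(others):
            hits.append(transcript)
    return sorted(hits)


print(enriched_in(signals, "suspensor"))  # ['t1', 't3']
```

Comparing against the maximum of the other regions is the strictest reading. A looser variant could compare against their mean, which would admit more transcripts.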
Finally, comparing the diverse mRNAs present in each region of a soybean globular-stage seed identified small sets of region-specific transcripts. For example, 74 mRNAs detected in the suspensor were undetectable in other seed regions at the sensitivity of the GeneChip (Fig. 5C; J.A. Wagmaister, X. Wang, A.Q. Bui, B.H. Le, and R.B. Goldberg, unpublished data). Real-time qRT-PCR experiments validated the GeneChip specificity of two of these suspensor transcripts: an osmotin-like 34 mRNA and an mRNA encoding a member of the NAM transcription factor family (Fig. 5G; C. Cheng, J.A. Wagmaister, A.Q. Bui, and R.B. Goldberg, unpublished data). Taken together, this example shows that LCM and GeneChip technologies can be combined to profile the spectrum of mRNAs present in any legume seed compartment and region throughout development (Fig. 3). The challenge will be to identify which mRNAs play critical roles in the differentiation of each seed region and how their corresponding genes are organized into regulatory networks in the soybean genome (Fig. 3).
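Region-specific transcripts correspond to a set difference: mRNAs called present in one region and absent from every other region. The following is a minimal sketch with invented present/absent calls. `region_specific` is a hypothetical helper. Here, "absent" only means "below the GeneChip detection limit," which is why candidates such as the two in Fig. 5G still need independent validation.

```python
# Hypothetical present/absent calls per seed region (toy data).
region_transcripts = {
    "suspensor": {"t1", "t2", "t3", "t4"},
    "embryo_proper": {"t1", "t2", "t5"},
    "endosperm": {"t1", "t3", "t6"},
}


def region_specific(regions, name):
    """Transcripts detected in `name` and in no other region.

    "Not detected" only means below the array's detection limit, so the
    result is a list of candidates, not proof of region specificity."""
    others = set().union(*(t for r, t in regions.items() if r != name))
    return regions[name] - others


for region in region_transcripts:
    print(region, sorted(region_specific(region_transcripts, region)))
# suspensor ['t4']
# embryo_proper ['t5']
# endosperm ['t6']
```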
USING COMPARATIVE GENOMICS TO IDENTIFY SEED mRNAS IN DIVERSE LEGUME SPECIES
Legumes exhibit a wide diversity of seed sizes and embryo morphologies (Fig. 2), providing an outstanding opportunity to use LCM and functional genomics to compare the mRNA sets present in different legume seeds. For example, it should be possible to isolate and compare the RNA sets present in legume suspensors that vary greatly in size and form (e.g. SRB, Medicago, Lotus, and soybean; Fig. 2B). Such comparisons could address what role, if any, variation in suspensor size plays in legume embryo development. GeneChips representing the diverse mRNAs of every legume species are unlikely to be constructed. However, the close relationship among legumes at the DNA and RNA levels makes it possible to use cross-species hybridization to compare gene activity within the seed regions of any legume. Cross-species hybridization has been applied successfully in plants, animals, and fungi, including species that diverged from a common ancestor more than 75 million years ago, such as pig and human (Chismar et al., 2002; Moody et al., 2002; Adjaye et al., 2004; Becher et al., 2004; Ji et al., 2004; Wang et al., 2004; Chalmers et al., 2005; Nowrousian et al., 2005). This evolutionary distance is greater than the one separating SRB, Medicago, Lotus, and soybean, which diverged from a common ancestor approximately 54 million years ago (Fig. 6; Lavin et al., 2005).
To test this approach, we hybridized RNAs from laser-captured SRB suspensors and embryo propers with soybean GeneChips (Fig. 4, G and H). The SRB embryo-proper and suspensor regions shared most diverse SRB embryo mRNAs that were conserved enough to be detected by the soybean GeneChip (Fig. 4H). By contrast, small sets of mRNAs, including those encoding transcription factors, were specific to each region of the globular-stage SRB embryo at the level of GeneChip detection (Fig. 4H). The results from these LCM cross-species hybridization studies complement those generated by sequencing ESTs from hand-dissected SRB embryo propers and suspensors (Figs. 3 and 4D). In addition, these studies identified 1,000 new embryo proper- and suspensor-specific mRNAs. These include mRNAs encoding approximately 60 transcription factors that might play key roles in specifying embryo regions during early embryogenesis (Fig. 4A). Our data indicate that cross-species hybridization using mRNAs from diverse legumes can successfully identify genes active during seed development at a global level. Coupling LCM with cross-species hybridization to available legume GeneChips and microarrays (e.g. soybean) should provide an entry point for identifying genes that play important roles in seed development. This applies to model legumes, such as soybean, Medicago, and Lotus, and also to nonmodel legumes for which few genomic resources are available (Fig. 2B).
IDENTIFYING REGULATORY NETWORKS REQUIRED TO PROGRAM A LEGUME SEED
Using the genomics strategy that we developed to study the early stages of legume seed and embryo development (Fig. 3), we identified genes, including those encoding transcription factors, that are active specifically in the embryo proper and suspensor of SRB and soybean globular-stage embryos (Figs. 4, D and H, and 5, D and G). We also identified genes that are active specifically in other compartments of the seed (e.g. endosperm, integuments, hilum; Fig. 5C). What DNA sequences and transcription factors regulate compartment-specific genes within a seed and how compartment-specific genes are organized into regulatory networks within a plant genome are important questions of seed biology (Fig. 1D).
As a first step toward uncovering regulatory networks that operate within legume seeds, we used in situ hybridization to identify mRNAs from our EST database that accumulate specifically in the SRB suspensor (Figs. 3 and 4). For example, G564, C541, PCS1511, and PCEP3567 mRNAs accumulate at high levels in the suspensor of SRB globular-stage embryos (Fig. 6, A–D; Weterings et al., 2001; A.Q. Bui, Y. Bi, and R.B. Goldberg, unpublished data). G564 and C541 mRNAs encode proteins of unknown function (Weterings et al., 2001). By contrast, PCS1511 and PCEP3567 mRNAs encode GA 3β-hydroxylase and a homeodomain transcription factor related to WOX9 (Haecker et al., 2004), respectively. Other mRNAs show a similar accumulation pattern in SRB globular-stage embryos (A.Q. Bui, Y. Bi, and R.B. Goldberg, unpublished data). These include mRNAs encoding additional enzymes of the GA biosynthetic pathway (e.g. ent-kaurene synthase, ent-kaurene oxidase, GA 20-oxidase). This shared pattern suggests that the corresponding genes might be organized into a suspensor regulatory network. To begin dissecting suspensor regulatory networks, we analyzed the G564 5'-upstream region in detail (Fig. 3; Weterings et al., 2001).
We showed that approximately 4.2 kb of the G564 upstream region activates transcription in the suspensor of transgenic tobacco globular-stage embryos. This demonstrates that suspensor-specific expression is controlled primarily at the transcriptional level (Fig. 6E; Weterings et al., 2001). The G564 upstream region also activates suspensor transcription in transgenic Arabidopsis embryos (Fig. 6F; X. Wang, T. Kawashima, and R.B. Goldberg, unpublished data), suggesting that the machinery regulating suspensor-specific transcription is conserved in flowering plants. G564 5'-deletion and gain-of-function analyses identified regions important for suspensor transcription (Fig. 6G; Weterings et al., 2001). The G564 upstream region contains five tandem duplications of approximately 150 bp each (Fig. 6G). Each duplication can activate suspensor transcription, indicating that the cis-regulatory sequences within each duplicated fragment are sufficient to direct transcription in the suspensor (Weterings et al., 2001; T. Kawashima, Y. Bi, and R.B. Goldberg, unpublished data). Computational analysis uncovered a conserved 10-bp sequence (GAAAAGC/TGAA) in the upstream regions of the SRB G564 and C541 genes (Fig. 6G, red arrows; Weterings et al., 2001). This suggests that the motif might play an important role in regulating suspensor-specific transcription during early embryogenesis. If so, the 10-bp motif might be conserved in the upstream regions of other SRB suspensor-specific genes and of their orthologs in closely related legumes.
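A regular-expression scan of both DNA strands can illustrate how to search an upstream region for the degenerate 10-bp motif GAAAAGC/TGAA. This is a hypothetical sketch, not the computational analysis of Weterings et al. (2001) and not MEME. It reports only exact matches to the consensus, allowing C or T at the seventh position, and the example sequences are short toy strings rather than G564 upstream DNA. The function names are illustrative.

```python
import re

# The 10-bp motif from the text, GAAAAGC/TGAA: position 7 may be C or T.
# The zero-width lookahead lets overlapping sites be reported.
MOTIF_LENGTH = 10
MOTIF = re.compile(r"(?=(GAAAAG[CT]GAA))")
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_motif(seq):
    """Return (0-based start, strand) for each motif site on either strand.

    Starts are reported in forward-strand coordinates."""
    seq = seq.upper()
    hits = [(m.start(), "+") for m in MOTIF.finditer(seq)]
    rc = reverse_complement(seq)
    for m in MOTIF.finditer(rc):
        # Map the site's start on the reverse complement back to the forward strand.
        hits.append((len(seq) - m.start() - MOTIF_LENGTH, "-"))
    return sorted(hits)


# Toy sequences (not G564 DNA): one site on each strand.
print(find_motif("CCGAAAAGCGAATT"))  # [(2, '+')]
print(find_motif("AATTCACTTTTCA"))   # [(2, '-')]
```

A scan like this, applied to real upstream sequences such as the G564 sequence under accession AF325187, only lists candidate sites. Whether a site functions as a cis-regulatory element still requires deletion and gain-of-function tests like those described above.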
The spectacular increase in legume genome sequences makes it possible to use comparative approaches to identify conserved cis-regulatory sequences among related legume species. For example, we uncovered G564 orthologs in soybean, Lotus, and Medicago (Fig. 6, H and I; T. Kawashima and R.B. Goldberg, unpublished data). Soybean separated from SRB approximately 19 million years ago and from Lotus and Medicago approximately 54 million years ago (Fig. 6H; Lavin et al., 2005). Two different computational analyses uncovered short conserved regions in the 5'-upstream DNA sequences of the G564 genes of SRB, Lotus, and Medicago (Fig. 6I). The first approach used FamilyRelationsII to identify blocks of similar DNA sequence shared between G564 upstream regions (Fig. 6I, red lines; Brown et al., 2005). This program also showed that G564 gene structure (two exons and one intron) is conserved in these three legumes (Fig. 6I), strongly suggesting that the G564 genes are orthologous. Blocks of similar DNA sequence in the upstream regions of orthologous genes have been shown to contain cis-regulatory sequences (Yuh et al., 2002). The closely related sequence blocks found in the legume G564 upstream regions might likewise contain cis-regulatory sequences important for suspensor-specific transcription. We also used Multiple Em for Motif Elicitation (MEME; Bailey and Elkan, 1994) to identify sequences significantly enriched in the G564 upstream regions (Fig. 6I, blue arrows; T. Kawashima and R.B. Goldberg, unpublished data). Notably, the regions identified by MEME in SRB include the conserved 10-bp motif (Weterings et al., 2001). It remains to be determined whether the 10-bp motif is an important suspensor cis-regulatory sequence and what trans-acting factors regulate transcription in the suspensor.
One or more of the suspensor-specific transcription factors that we identified using EST and LCM-GeneChip analyses (Figs. 4 and 5) might interact with the conserved 10-bp motif and other cis-regulatory sequences to control transcription in the suspensor of SRB and other legumes. Similarly, transcription factors specific to other seed compartments (Fig. 5) might play an important role in controlling transcription in different parts of the seed. To date, the molecular mechanisms by which region-specific transcription factors are interconnected to form seed regulatory networks remain unknown. Studying the function of region-specific transcription factors is essential for understanding the importance of these proteins in seed development and for uncovering downstream target genes to construct seed gene regulatory networks (Fig. 3).
Advances in soybean transformation procedures (Ko et al., 2006; Olhoft et al., 2006) have made it possible to use loss-of-function and gain-of-function strategies to study gene function directly in soybean (Fig. 3). T-DNA is commonly used to generate loss-of-function alleles in Arabidopsis (Alonso et al., 2003), but this technique might not be appropriate in soybean for several reasons. The soybean genome is 8 times larger than that of Arabidopsis (Arumuganathan and Earle, 1991) and consists mostly of repetitive sequences (Goldberg, 1978), so generating T-DNA insertions at saturation would require a large effort. Soybean transformation procedures are also not as efficient as the seed transformation technique used in Arabidopsis (Clough and Bent, 1998), so it would be challenging to produce large numbers of independent transgenic lines. In addition, soybean is a polyploid (Shoemaker et al., 1996; Hymowitz, 2004), and gene redundancy among homeologous genes may complicate the interpretation of knockout results. A more productive approach is to use RNA interference (RNAi) knockdown strategies, which have proven useful for studying gene function in a variety of eukaryotes, including soybean (Fig. 3; Subramanian et al., 2005; Amore and Davidson, 2006; Nunes et al., 2006). For example, Herman et al. (2003) used RNAi to eliminate allergenic proteins from soybean seeds. The advantage of RNAi is that it can target specific genes and can potentially knock down sets of closely related genes (Miki et al., 2005; Kaur et al., 2006). An analogous approach uses chimeric repressors to knock down related genes (Fig. 3, CRES-T; Hiratsu et al., 2003). Both strategies should be feasible for studying the functions of transcription factors identified by LCM and GeneChip analysis in different compartments of a soybean seed, including those that are redundant in the soybean genome (Fig. 3).
The sequence of the soybean genome (Jackson et al., 2006 ) combined with RNAi studies should make it possible to identify downstream target genes that are regulated by region-specific transcription factors at the global level, facilitating identification of seed and embryo regulatory networks (Fig. 3).
FUTURE PERSPECTIVES
The study of legume seed development has become exciting due to the availability of new genomic resources and sophisticated techniques, such as LCM and RNA profiling using GeneChip arrays. We have identified genes that are unique to a particular seed compartment and that are coregulated within the context of the soybean globular-stage seed (Fig. 5, C, E, and F). The completion of the soybean genome sequence (Jackson et al., 2006 ) should allow us to identify conserved motifs among the upstream regions of these unique and coregulated genes, thus facilitating the identification of compartment-specific cis-regulatory sequences that connect seed genes into regulatory networks. In addition to the soybean genome sequence, other legume genome sequences will be available soon (Broughton et al., 2003 ; Gepts et al., 2005 ; Young et al., 2005 ). Genome sequences from diverse legume species will provide an invaluable resource for comparative analysis to identify conserved cis-regulatory sequences that connect seed genes into regulatory networks, an approach that has been successful in other eukaryotes, such as the sea urchin (Yuh et al., 2002 ; Bolouri and Davidson, 2003 ). Comparative analysis between legume genome sequences, combined with the GeneChip and EST data obtained from seed and embryo compartments of SRB and soybean (Figs. 4 and 5) and other plants (e.g. Arabidopsis; Casson et al., 2005 ), should facilitate the discovery of genes essential for seed and embryo development, including those important for specific legume traits, such as seed size and embryo morphology (Fig. 2). In addition, comparative analysis of legume genomes with other nonlegume genomes, such as Arabidopsis, rice, and poplar (Populus spp.), will advance the discovery of genes important for seed development in flowering plants (Graham et al., 2004 ; Zhu et al., 2005 ). 
Finally, once a soybean whole-genome GeneChip becomes available, the entire mRNA profiles of all seed and embryo compartments can be determined, completing the identification of seed- and embryo-specific genes. Taken together, remarkable advances in genomic resources will allow us to answer questions regarding seed and embryo development (Fig. 1D) that were not possible only a few years ago. It is now becoming realistic in this genomic era to understand what genes and regulatory networks are required to make a legume seed.
Sequence data from this article can be found in the GenBank/EMBL data libraries under accession numbers CA896559 to CA916678 (ESTs) and AF325187 (G564).
ACKNOWLEDGMENTS
We are grateful to all the members of our laboratory, past and present, who have helped to establish soybean and SRB as powerful systems to investigate seed development. We particularly acknowledge Dr. Koen Weterings, Dr. Yuping Bi, and Dr. Xingjun Wang for contributing to many of the experiments summarized in this Update. In addition, we thank Ms. Chen Cheng for carrying out the real-time qRT-PCR experiment presented in Figure 5G. We would also like to acknowledge Ian Sussex, Roger Beachy, Tim Hall, Maarten Chrispeels, Niels Nielsen, Lila Vodkin, Don Boulter, T.J. Higgins, Klaus Muntz, and Uli Wobus whose laboratories helped to provide a foundation for understanding gene activity during legume seed development.
Received March 28, 2007;
accepted April 18, 2007;
published June 6, 2007.
FOOTNOTES
1 This work was supported by the National Science Foundation Plant Genome Program (grant no. DBI-0501720), the Department of Energy (grant no. DE-FG03-97ER20263), and Ceres Inc. T.K. is a recipient of a Nakajima Foundation predoctoral fellowship.
2 These authors contributed equally to the article. 
The author responsible for distribution of materials integral to the findings presented in this article in accordance with the policy described in the Instructions for Authors (www.plantphysiol.org) is: Robert B. Goldberg (bobg@ucla.edu).
www.plantphysiol.org/cgi/doi/10.1104/pp.107.100362
* Corresponding author; e-mail bobg@ucla.edu; fax 310-825-8201.
LITERATURE CITED
Adams CA, Norby SW, Rinne RW (1982) Protein modification and utilization of starch in soybean (Glycine max (L.) Merr.) seed saturation. J Exp Bot 33: 279–287
Adjaye J, Herwig R, Herrmann D, Wruck W, Benkahla A, Brink TC, Nowak M, Carnwath JW, Hultschig C, Niemann H, et al (2004) Cross-species hybridisation of human and bovine orthologous genes on high density cDNA microarrays. BMC Genomics 5: 83
Alonso JM, Stepanova AN, Leisse TJ, Kim CJ, Chen HM, Shinn P, Stevenson DK, Zimmerman J, Barajas P, Cheuk R, et al (2003) Genome-wide insertional mutagenesis of Arabidopsis thaliana. Science 301: 653–657
Alpi A, Lorenzi R, Cionini PG, Bennici A, D'Amato F (1979) Identification of gibberellin A1 in the embryo suspensor of Phaseolus coccineus. Planta 147: 225–228
Amore G, Davidson EH (2006) cis-Regulatory control of cyclophilin, a member of the ETS-DRI skeletogenic gene battery in the sea urchin embryo. Dev Biol 293: 555–564