Science Journals — AAAS

Transcript

1 | SCIENCE ADVANCES RESEARCH ARTICLE AGRICULTURE Copyright © 2019 The Authors, some rights reserved; Pervasive hybridizations in the history exclusive licensee American Association of wheat relatives for the Advancement of Science. No claim to † 3 † 1,2 4,5 6,7 , Celine Scornavacca * , Concetta Burgarella Sylvain Glémin , , Jacques Dainat original U.S. Government 6 6,8 6 6 Véronique Viader , Gautier Sarah , Morgane Ardisson , Sylvain Santoni , Works. Distributed 6 6 under a Creative Jacques David , Vincent Ranwez Commons Attribution NonCommercial Cultivated wheats are derived from an intricate history of three genomes, A, B, and D, present in both diploid License 4.0 (CC BY-NC). and polyploid species. It was recently proposed that the D genome originated from an ancient hybridization between the A and B lineages. However, this result has been questioned, and a robust phylogeny of wheat relatives is still lacking. Using transcriptome data from all diploid species and a new methodological approach, our comprehensive phylogenomic analysis revealed that more than half of the species descend from an ancient — hybridization event but with a more complex scenario involving a different parent than previously thought Aegilops mutica , an overlooked wild species — instead of the B genome. We also detected other extensive gene flow events that could explain long-standing controversies in the classification of wheat relatives. Downloaded from INTRODUCTION stitute the tetraploid durum wheat, and A, B, and D genomes comprise Reconstructing phylogenetic relationships between domesticated the hexaploid bread wheat. s of central interest for agriculture plant species and their wild relatives i .( et al Recently, Marcussen ) proposed the hypothesis that the 7 and breeding. Gene flow and hybridization between related species are D genome lineage arose 5 to 6 Ma ago through a homoploid hybrid relatively common in plants and make phylogeny reconstruction dif- speciation between the A genome and B genome lineages (A, B, and D http://advances.sciencemag.org/ ficult because of numerous conflicts among individual gene genealogies lineages hereafter), explaining the difficult resolution of a consensual ). During rapid species divergence, incomplete lineage sorting (ILS), 1 ( tree-like history among these three groups. This result has been which occurs when ancestral polymorphisms are still shared by two questioned, and more complex scenarios with several rounds of hy- ). The 2 species or more, is another source of phylogenetic conflicts ( bridization have been proposed since then ( 5 , 11 , 12 ). However, none Aegilops Triticum / genus, which includes cultivated wheat species, of the previous large-scale studies included all diploid species. For combines these challenging problems, and despite its high practical ) built their large multi-gene analysis 7 .( et al example, Marcussen and economical importance, the phylogenetic relationships among rs (plus one outgroup species) and only on the three diploid progenito species are still poorly resolved. These species form a group of annual he hexaploid wheat, whereas the the three corresponding genomes of t Mediterranean and Middle East grasses comprising 13 diploid and 13 diploid species were analyzed usingonlysixgenes[seefig.S6in( )]. 7 18 polyploidy species (including durum wheat and bread wheat). This A genome-wide analysis including all diploid species is still lacking. We genus belongs to the Triticeae tribe that is already known for its propose to re-evaluate the scenario of the homoploid speciation of the on May 5, 2019 3 complex reticulated history ( 4 ), and the occurrence of many alloploid , D genome and to position it in the complex phylogeny of the diploid ) shows that hybridization is possible and has promoted spe- 5 species ( relatives of cultivated wheats. To do so, we obtained and analyzed a cies formation. Moreover, species diversification likely occurred rather comprehensive genomic dataset including all extant diploid species )], and some species are rapidly [around 4 to 7 million years (Ma) ( 6 , 7 and developed a new framework to test intricate hybridization scenar- ), gener- 8 highly polymorphic, with a large effective population size ( ios. Our results shed a new light on the history of wheat relatives, and we ating a potentially high level of ILS. Both hybridization and ILS could proposed a complex but robust scenario that resolves long-standing explain why many conflicting results have been obtained for single- controversies on the hi story of these species. ). In particular, it has proven difficult 9 gene phylogenies so far ( 10 , to resolve the relationships among the diploid parental donors of the RESULTS polyploid domesticated wheats, Triticum urartu (A genome), Aegilops A phylogenomic view of the history of wheat relatives (S genome, considered to be the closest current genome of the speltoides We produced a transcriptome-based dataset of orthologous coding B genome), and (D genome): A and B genomes con- Aegilops tauschii sequences (CDSs) including at least two (and up to four) individuals for 1 UMR 6553, – CNRS, Univ Rennes, ECOBIO (Ecosystèmes, biodiversité, évolution) species plus one individual Triticum each of all the 13 diploid Aegilops / 2 Department of Ecology and Genetics, Evolutionary Biolo- F-35042 Rennes, France. of three close outgroups belonging to the Triticeae tribe: Taeniatherum 3 gy Center, Uppsala University, Norbyvägen 18D, 752 36 Uppsala, Sweden. Institut (table caput-medusae Eremopyrum bonaepartis ,and Secale vavilovii , ’ des Sciences de l Evolution Université de Montpellier, CNRS, IRD, EPHE CC 064, 4 S1). In addition, we used the published sequence of the Hordeum National Bio- Place Eugène Bataillon, 34095 Montpellier, cedex 05, France. informatics Infrastructure Sweden (NBIS), SciLifeLab, Uppsala Biomedicinska Cen- vulgare genome (Genome Assembly ASM32608v1) as the most dis- 5 IMBIM – Department of trum (BMC), Husargatan 3, S-751 23 Uppsala, Sweden. tant outgroup. We separately assembled the transcriptome of each Medical Biochemistry and Microbiology, Uppsala University, Uppsala Biomedicinska 6 individual and stringently clustered and aligned the annotated CDSs. Centrum (BMC), Husargatan 3, Box 582, S-751 23 Uppsala, Sweden. AGAP, Univ 7 CIRAD, UMR Montpellier, CIRAD, INRA, Montpellier SupAgro, Montpellier, France. After cleaning and processing (see Ma terials and Methods), we retained 8 AGAP, F-34398 Montpellier, France. South Green Bioinformatics Platform, BIOVERSITY, 11,033 alignments for the supertree a nalysis. Among them, we used the CIRAD, INRA, IRD, Montpellier SupAgro, Montpellier, France. 8739 genes containing at most one sequence per individual for the *Corresponding author. Email: [email protected] ction. The 11,033 individual gene supermatrix analysis and hybrid dete These authors contributed equally to this work. † ., 2019; 5 :eaav9188 1 May 2019 1of10 et al Glémin Sci. Adv.

2 | RESEARCH ARTICLE SCIENCE ADVANCES Ae. umbellulata Tr257 Ae. umbellulata Tr266 Ae. umbellulata Tr268 58 Ae. caudata Tr276 A B Ae. caudata Tr275 38 Ae. caudata Tr139 Comopyrum Ae. comosa Tr272 Ae. comosa Tr271 82 Ae. uniaristata Tr357 Ae. uniaristata Tr402 Ae. uniaristata Tr403 Ae. uniaristata Tr404 Ae. bicornis Tr407 83 Ae. bicornis Tr408 Ae. bicornis Tr406 Sitopsis 91 Ae. longissima Tr241 Ae. longissima Tr242 Ae. longissima Tr355 85 88 Ae. sharonensis Tr265 Ae. sharonensis Tr264 Ae. searsii Tr165 Ae. searsii Tr164 47 44 Ae. searsii Tr161 Ae. tauschii Tr125 Ae. tauschii Tr180 Ae. tauschii Tr352 Ae. tauschii Tr351 T. boeoticum TS10 T. boeoticum TS3 T. boeoticum TS8 82 99 T. boeoticum TS4 T. urartu Tr315 T. urartu Tr232 Downloaded from T. urartu Tr309 T. urartu Tr317 Ae. speltoides Tr251 68 Ae. speltoides Tr320 Ae. speltoides Tr323 49 Ae. speltoides Tr223 Ae. mutica Tr332 78 Ae. mutica Tr244 Ae. mutica Tr329 Ae. mutica Tr237 100 http://advances.sciencemag.org/ Ta. caput-medusae TB2 S. vavilovii Tr279 Er. bonaepartis TB1 H. vulgare HVens23 Aegilops Fig. 1. Reconstructed phylogeny of the Triticum genus. ( A / Aegilops / Triticum genus. This same topology was obtained by both the ) Phylogenetic tree of the ML analysis of 8739 gene alignments concatenation (supermatrix) and the supertree combination of 11,033 individual gene trees. All bootstrap values of the super- matrix analysis are 100 except those designated by an asterisk (* = 98). Support values for the supertree analysis are given for each interspecies node [percentage of triplets supporting a given node ( ” 7 “ B Cloudogram 13 ). ( ) )]. Time scale was obtained by making the ML tree ultrametric and assuming a divergence of 15 Ma with Hordeum ( of 248 trees (in gray) inferred from non-overlapping 10-Mb genomic window concatenations. The global phylogeny is superposed in black. on May 5, 2019 trees used to construct the supertree with SuperTriplets ( )wereob- 13 species within the D lineage clearer, where no consensus had emerged tained by maximum likelihood (ML) using RAxML v8 ( 14 ). The total- so far. The species clustering is in agreement with their geographic evidence species tree was also obtained by ML from the concatenation proximity, roughly following an east-west distribution (fig. S1). s. Both the supertree and the super- of all 8739 one-copy gene alignment matrix approaches gave the same topology (Fig. 1A), distinguishing A new approach for analyzing multispecies coalescent three main clades that we called the A lineage (the two species, Triticum with hybridization , the wild ancestor of the domes- Triticum boeoticum and T. urartu However, while the two phylogenomic approaches were fully congruent ticated einkorn wheat ), the B lineage ( Ae. T. b. ssp. monococcum and the supermatrix tree was strongly supported (bootstrap = 100 for ), and the D lineage (all other species), Aegilops mutica speltoides + all but one node), the supertree support values were low (<60) for 5 following the simplified terminology of Marcussen ). This 7 .( et al of 11 intragenus nodes (Fig. 1). This could be due to both ILS and hy- partly contradict the traditional topology reveals new insights that bridization. Scenarios with one or more hybridization events have view of wheat relative evolution. First, while the Sitopsis clade is re- alreadybeenproposed,butitwasdifficulttodirectlytestthembecause , Aegilops longissima , Aegilops searsii trieved (including Aegilops bicornis , they assumed ancestral events withou t considering all extant species. In Ae. speltoides ), is excluded from this clade and Aegilops sharonensis and addition, current methods to infer reticulated evolution with ILS are not appears to be the sister species of Ae. mutica . While this latter species yet able to deal with these large datas ets (43 ingroup individuals here), has been excluded from the Aegilops genus for a long time and its phy- especially with potential nested rounds of hybridization (text S1). For logenetic position is debated ( 10 ), our results show that it is central in example, PhyloNetworks does not consider nested hybridizations . As an ) 15 pology clarifies what the D lineage the history of wheats. Second, this to (more formally, only level 1 networks are considered) ( alternative strategy to disentangle hybridization events from ILS, we corresponds to by showing th at all nine other diploid species Aegilops proceeded in three steps. First, we searched for all potential hybrids Aegilops comosa Ae. tauschii , , , belong to this clade ( Aegilops caudata among triplets of species. Under pure ILS, one major topology and Aegilops umbelullata , Aegilops uniaristata ,andthefour Sitopsis species). two equivalent minor topologies are expected, while two topologies . based on six genes only et al This contradicts the result of Marcussen 16 can predominate over the third one under hybridization ( , 17 ). This [see fig. S6 in ( )]; indeed, they claimed that the D lineage only included 7 was the rationale used to propose the hybrid origin of the D genome Sitopsis species, whereas the four other species were and the Ae. tauschii 7 ). We thus counted the number of sites supporting the three possible ( grouped within the B lineage. Third, it makes the relationships among ., Sci. Adv. 2019; 5 :eaav9188 1 May 2019 2of10 Glémin et al

3 | RESEARCH ARTICLE SCIENCE ADVANCES topologies, from which we computed a hybridization index and its , , ( 5 7 ). Thus, we searched to determine the parental species of the 18 P 7 17 , ). Second, using the phylogeny we obtained value ( associated 16 , D lineage and whether all species of the D clade descended from the as a reference (Fig. 1), we identified the possible hybridization scenarios et al .( 7 ). To do so, same hybridization event proposed by Marcussen compatible with triplets of species showing a significant departure from we considered the indices for which an individual from the D clade the null model with pure ILS. To do so, we analyzed the hybridization could be a hybrid between parents from the A and B clades. The nine indices of all triplets of species in a systematic way (text S2): We hier- species of the D clade showed a clear signature of hybridization with etipstotherootofthephylogeny. archically parsed the indices from th om 30 to 70% (Fig. 3A), suggesting a proportion of B species varying fr We started from triplets including two individuals from the same spe- that all D species are issued from hybridizations between the A and B cies and a third individual from a sister species (recent hybridizations) clades. However, the distribution of indices was highly heterogeneous, to triplets of species belonging to the three main A, B, and D clades both across potential hybrids and for potential parents, indicating a (ancient hybridizations). Third, we developed a new composite-likelihood complex scenario for the formation of the D clade. The distribution method based on quartets of species to discriminate among complex of hybridization indices is similar regarding the A parents. In contrast, scenarios: Only one hybridization can be detected with three taxa, Ae. mutica Ae. speltoides contributed much less than Sitopsis to non- whereaswithfourtaxa,ninedegreeso f freedom are available, allowing species of the D clade, while its c Sitopsis ontribution appears similar for us to infer scenarios with two hybridizations (Fig. 2 and texts S2 and S3). species. In addition, Ae. mutica showed the unexpected and contradic- We applied the quartet method successively to the groups of species tory pattern of being both a potential parent of the D clade and a hybrid where we identified possible hybridizations. Ae. speltoides between and A or D species (fig. S2.2E). From a simple graphical reasoning and applying more formally our new quartet method, Downloaded from Reconstruction of hybridization events we showed that this could be explained by at least two interwoven hy- Hybridization appeared widespread, with 40% of triplets showing an bridization events (text S2). In the most likely scenario, Ae. mutica ,but index higher than 10%. However, the analysis of triplets composed of , hybridized with the ancestor of the A clade to give Ae. speltoides not two individuals of a same species and a third individual from a second rise to the ancestor of the whole D clade, with a proportion of the A species revealed very low indices, suggesting nearly an absence of recent clade ranging from 0.35 to 0.58, sugge sting a rather symmetrical hybrid- hybridization (text S2). Ae. mutica ancestor ization (Fig. 4 and text S4). Before this event, the http://advances.sciencemag.org/ was not considered as a member of In previous studies, Ae. mutica was partly introgressed by the ancestor of the A clade, with propor- “ and the definition of the ” ” Dlineage Blineage, “ the remained elusive tions ranging from 0.11 to 0.18 (Fig. 4 and text S4). We also computed α α γ γ γ γ (1 ) γ γ (1 ) γ γ (1 (1 ) ) Τ 2 1 1 2 2 1 1 2 1 γ γ 1 1 1 α α Τ 2 γ γ 1 2 2 on May 5, 2019 Fig. 2. Rationale of the quartet method. The dataset is composed of the counts of the 10 informative site patterns associated with four taxa (so nine degrees of freedom): 0 and 1 are the ancestral and derived states, respectively (polarization with an outgroup). A scenario corresponds to a network with four taxa and up to two ). The model also includes the times of hybrid- g hybridizations. It can be decomposed into components (i) with probabilities given by the hybridization proportions ( i T ization ( ) (eight parameters in total). For each component, (ii) the probabilities of embedded coalescent trees and (iii) the a ) and the coalescent rates on each branch ( i i T probabilities of site patterns given a coalescent tree are computed. They are function of a . Together, they give the expected frequencies of each site pattern for a and i i given scenario. The likelihood of a scenario is given by the multinomial distribution of observing the count vector { v } given the expected frequencies { p ... , , , p v }. , ... 1 1 10 10 Likelihood comparison was used to choose the best scenario. Sci. Adv. ., 5 :eaav9188 1 May 2019 3of10 et al Glémin 2019;

4 | RESEARCH ARTICLE SCIENCE ADVANCES Sitopsis hybridization indices along chromosomes for 10-Mb windows. The clade showed a distinctive hybrid- Among D species, the distribution of indices did not indicate any large and contiguous in- ization signature compared to other species (Fig. 3A and text S2), like- ly due to a secondary introgression by Ae. speltoides trogression block (Fig. 3B and text S5) as would be expected under a (text S2). This single hybridization event followed by rapid speciation ( , 20 ). Sim- 19 scenario reconciles the morphological and cytological classifications ulations showed that introgression blocks should be smaller than of Ae. speltoides in the Sitopsis clade and some molecular-based phylo- 10 Mb (the window size) to explain the observed patterns, even if genies excluding Ae. speltoides from this clade ( 10 , 21 ). Last, we searched for other possible hybridization events within the D lineage. We found chromosome rearrangements are included in the simulations (text and no signature of hybridization after the divergence of the S5). This is hardly compatible with a single and simple homoploid Sitopsis hybrid speciation event, so recurrent gene flow may have occurred dur- clades, in agreement with the strong supertree supports Comopyrum for these clades (Fig. 1) and with their ancient recognition as taxonomic ing species divergence. A B D species as hybrids between A and B lineages ) along chromosome 3 γ Hybridization index ( Ae. speltoides Ae. mutica 1.00 Sitopsis 0.7 T. boeoticum 0.6 Comopyrum 0.5 in D 0.75 Downloaded from 0.4 ) = % of B in D 0.3 γ 0.50 Ae. mutica 0.7 T. urartu 0.6 0.25 0.5 Proportion of http://advances.sciencemag.org/ 0.4 Hybridization index ( 0.3 0.00 300 200 400 500 100 0 Position along chromosome (Mb) Ae. searsii Ae. searsii Ae. tauschii Ae. tauschii Ae. bicornis Ae. bicornis Ae. comosa Ae. comosa Ae. caudata Ae. caudata Ae. uniaristata Ae. uniaristata Ae. longissima Ae. longissima Ae. umbellulata Ae. umbellulata Ae. sharonensis Ae. sharonensis D focal species ( ) Violin plots of the hybridization index for the nine species of the D lineage as a A Fig. 3. Distribution of the hybridization index for the origin of the D clade. or function of the A ( ) parents. The dotted lines correspond to a perfect 50/50 hybridization. All indices are T. urartu or T. boeoticum ) and B ( Ae. speltoides Ae. mutica 6 − <10 P significantly different from 0 ( B ) Distribution of the mean hybridization index [and 95% confidence interval (CI)] calculated on 10-Mb after Bonferroni correction). ( windows, along chromosome 3. Red dashed line, chromosome mean; blue line, loess regression with 95% CI in dark gray. The Sitopsis section and Ae. speltoides were on May 5, 2019 excluded because of additional introgression (event 3 on Fig. 4). A B γ γ 1 γ γ 1 γ γ 1 1 1 1 γ γ 1 γ 2 2 1 γ 2 2 1 γ γ 1 γ γ 1 1 1 1 γ γ 1 γ 2 2 1 γ 2 2 γ γ γ γ γ γ A ) Schematic representation of the two-hybridization tested scenarios Fig. 4. The best scenario for the origin of the D clade determined by the quartet method. ( (A, species from the A clade; D, species from the D clade; M, ;S, Ae. speltoides ). ( B ) Akaike Information Criterion (AIC) of the saturated model and the Ae. mutica four tested scenarios. Models were run with the 10 different combinations of species from the A and D clades. The best AIC are in bold. In two cases, two mo dels th close AIC) in one combination. Po int have close AIC (the second one is in italics). Scenario 4 is the best model in ni ne combinations and the second one (wi are given for scenario 4: D is the result of two successive hybridizations A + S MthenA+M → → D. For the three first combinations, there and g estimates of g 1 2 g is a second best model with a very close AIC with a much lower , in agreement with other values. Scenarios with no or only one reticulation were also tested, 1 and all have much higher AIC (text S4). et al Sci. Adv. 2019; 5 :eaav9188 1 May 2019 4of10 Glémin .,

5 | SCIENCE ADVANCES RESEARCH ARTICLE Ae. umbellulata and entities. However, we found complex patterns for vious methods using quartet of species. For example, Pease and Hahn especially for Ae. caudata , suggesting pervasive gene flows before and ( 25 ) only used the symmetry in site patterns to detect and polarize y supported clade (Fig. 1). Although during the divergence of this poorl departure from pure ILS by defining a statistics with null expectation we could not identify the complete scenario, at least three hybridization under the null hypothesis. Here, we obtain a full analytical expression events are required to explain the results (Fig. 5 and text S3). This sheds for the expectation of site patterns, allowing writing likelihood functions, ysis of the genome of light on the recent anal Ae. caudata (syn Aegilops hence to test competitive models an d to estimate their parameters. The markgrafii ) that showed major structural rearrangements compatible detailed statistics properties of the model remained to be fully explored, 22 ). with hybridization events ( and as for other methods based on site patterns ( , , 25 26 ), bias can 17 occur because of misspecifications due to, for example, polarization errors or unbalanced missing data. These improvements still need to be DISCUSSION developed. Furthermore, extending themethodtofivespeciesormore Examples of nonbifurcating speciation histories are accumulating would still allow more elaborated sce narios, but the exponentially grow- ). Reconstructing species trees despite ILS and detecting in- ( 23 , 24 ing number of parameters prevent s any simple development for now. ), but inferring the detailed history 1 trogression events are now feasible ( Alternatively, quartet statistics c ould be used as elementary blocks for of multiple and successive events with more than a few species remains , 27 , 28 )]. The findings presented here will be iterative methods [e.g., ( 15 challenging. We proposed a new meth odological framework to tackle instrumental for these developments. this issue. First, we showed that a hierarchical analysis of hybridization Owing to these new developments, we were able to propose a core indices helps identify the main potential events, hence simplifying / Aegilops reference scenario for the history of diploid species Triticum Downloaded from further analyses. Where systematic methods that can deal with large that should be pivotal for future research on wheats and their relatives manual datasets and complex scenar ios are not available, such a “ ” step (Fig. 5). We confirmed the occurrence of an ancient hybridization event canbeusefultoreducetheanalysistoasmallernumberoftaxa.Then, that gave rise to the D lineage, but we showed (i) that this lineage using quartets of species instead of tripletsallowsforahigherdegreeof includes 9, not only 5, of the 13 diploid species of the genus and (ii) that )]. Com- freedom, hence fitting more complex scenarios [see also ( 25 the hybridization scenario is a more complex scenario than previously ) or ABBA-BABA pared to triplet-based methods such as HyDe ( 17 Ae. mutica instead of proposed and involves a different parental species, http://advances.sciencemag.org/ ( ), this method does not require to assume a constant effective pop- 26 has been an overlooked spe- Ae. mutica . For a long time, Ae. speltoides ulation size across species divergence to properly estimate hybridization cies with a debated phylogenetic position. Our results plead for recon- or introgression proportions. In addition, our method goes beyond pre- sideration and extensive study of this key species in the history of Triticum A Introgression from the ancestor Triticum 1 ancestor Ae. mutica into the Origin of the D clade by hybridization 2 Ae. between the A clade ancestor and the Comopyrum on May 5, 2019 mutica ancestor Introgression of the Sitopsis ancestor by 3 the ancestor Ae. speltoides Ae. caudata 4 Complex gene flows during the divergence 4 Ae. caudata Ae. umbellulata of and ? probably involving three events (or more) Ae. umbellulata D 2 Ae. tauschii 1 Sitopsis 3 Ae. mutica B Ae. speltoides 2 6 4 0 Ma Fig. 5. The proposed scenario for the history of diploid Aegilops / Triticum species. The proposed events obtained from the analysis of hybridization indices and from the quartet method have been added to the global phylogeny. Well-supported clades have been collapsed. The length of triangles corresponds to the divergence age as in Fig. 1. For event 4, the question mark indicates the uncertainties of this complex event. Solid arrows correspond to the most likely detected events, and as we could not formally test this hypothesis (text S3). Sitopsis dashed arrows correspond to possible additional ones: We could not exclude hybridization from Sci. Adv. ., 5 :eaav9188 1 May 2019 5of10 et al Glémin 2019;

6 | SCIENCE ADVANCES RESEARCH ARTICLE wheat relatives. Ae. mutica is a self-incompatible species with poten- ance with the protocols supplied by the manufacturer. Purified cDNA tially large genetic diversity, and its direct implication in the history of templates were enriched by 15 cycles of polymerase chain reaction the D genome makes it a strong candidate as a new reservoir of genetic (PCR) for 10 s at 98°C, 30 s at 65°C, and 30 s at 72°C using PE1.0 diversity for wheat breeding programs. In addition, its potential inter- and PE2.0 primers and the Phusion High-Fidelity PCR Master Mix , the closest extant Ae. speltoides est is increased by its proximity with (Thermo Fisher Scientific, USA). Each indexed cDNA library was ver- species of the progenitor of the B genome. ified and quantified using a DNA 100 chip on a Bioanalyzer 2100 to Our results also pointed to other introgression events to various qually represented, genotypes. build pooled libraries made of 12, e Ae. speltoides in the ancestor extents, especially the introgression of The final pooled library was quanti fied by quantitative PCR with the of the clade, which can explain long-term controversies in Sitopsis KAPA Library Quantification Kit (KAPA Biosystems, USA) and the classification of wheat relatives. has been considered Ae. speltoides provided to the Get-PlaGe core facility (GenoToul platform, INRA ) or excluded from it 29 section ( Sitopsis alternatively as a member of the Toulouse, France; www.genotoul.fr) for sequencing. Each final pooled 10 ). Our results explain these contradictory results. The scenario we ( 9 , library (12 genotypes) was sequenced using the Illumina paired-end proposed also suggests that chromosome similarities of repetitive protocol on a single lane of a HiSeq 3000 sequencer for 2 × 150 cycles. 29 )mayhave elements between Ae. speltoides and the Sitopsis clade ( resulted from transposable element exchanges following hybridization, Transcriptome assembly and annotation d within a clear phylogenetic ahypothesisthatcannowbeteste Reads were cleaned and assembled fo llowing the pipeline described in framework. While our analysis pointed to other hybridization events, et al 30 ) and recalled here for comprehensiveness. Reads were .( Sarah the signature is much less clear, likely because at least three events seem 32 preprocessed with cutadapt ( ) using the TruSeq index sequence Downloaded from to be involved whereas the method we developed can only consider a corresponding to the sample, searching within the whole sequence. maximum of two events. − The end of the reads with low-quality scores (parameter, q 20) was Overall, these results suggest that this genus is especially prone to trimmed, and we only kept trimmed reads with a minimum length of hybridization. A high hybridization potential could contribute to ex- 35 bp and a mean quality higher than 30. Orphan reads were then species are polyploids (especially plainthatmorethanhalfof Aegilops discarded using a homemade script. Remaining paired reads were allopolyploids). However, despite such a high ability to hybridize, we assembled using ABySS ( 33 ) followed by one step of Cap3 ( 34 ). Reads http://advances.sciencemag.org/ did not detect any recent hybridization events, in contrast to the returned as singletons by the first assembly run were discarded. ABySS many ancient events we found. The reason for this pattern still needs was launched using the paired-end option with a kmer value of 60. Cap3 to be understood. was launched with the default parameters, including 40 bases of overlap, and the percentage of identity was set at 90%. We slightly modified the RAPSearch program ( 35 )tomakeits MATERIALS AND METHODS blast formatted output compatible with the expected input format Data acquisition of prot4est. We used this modified version of RAPSearch to identify Data were obtained following the same procedure as in Sarah et al .( 30 ) protein sequences similar to our contigs either in plant species of , and redescribed here for comprehensiveness. Sequences of T. boeoticum UniProt SwissProt (www.uniprot.org) or in the Monocotyledon spe- T. caput-medusae ,and E. bonaepartis were already obtained by cies of GreenPhyl (www.greenphyl.org/cgi-bin/index.cgi). We then on May 5, 2019 et al Clément ). Sequences of all other species were newly obtained. 31 .( used the prot4est program ( ) to predict the CDS embedded in 36 our contigs based on the following input: RAPSearch similarity All samples were constituted by a combination of leaves (20%) and Oryza matrix model for de novo – based predictions, and output, inflorescence tissues (80%). RNAs were extracted and prepared sepa- . the codon usage bias observed in T. boeoticum rately for each organ and then mixed a ccording to the given proportions. Short sequences are often difficult to cluster into reliable orthologous Samples were ground in liquid nitrogen, and total cellular RNA was groups and are not very informative for phylogeny inference; hence, extracted using a Spectrum Plant Total RNA kit (Sigma-Aldrich, we discarded predicted CDS with less than 250 bp as done in a similar USA) with a deoxyribonuclease treatment. RNA concentration was context to populate the OrthoMaM database ( 37 ). The total numbers of first measured using the NanoDrop ND-1000 Spectrophotometer and contigs per species are given in table S2. then with the Quant-iT RiboGreen (Invitrogen, USA) protocol on a Tecan Genios spectrofluorometer (Tecan Ltd., Switzerland). RNA quality Orthologous search m was assessed by running 1 lofeachRNAsampleonanRNA6000Pico We relied on USEARCH v7 ( ) to cluster the predicted CDSs. We de- 38 chip on a Bioanalyzer 2100 , USA). Samples with (Agilent Technologies signed a four-step approach that limits the impact of taxon sampling an RNA Integrity Number value greater than eight were deemed and sequence ordering during cluster creation, avoids assigning se- lumina TruSeq mRNA protocol. acceptable according to the Il quences to an arbitrary cluster in case of tile, and can easily handle The TruSeq Stranded mRNA Library Prep Kit (Illumina, USA) our large dataset (both in terms of required memory and computation s protocol with the following ’ was used according to the manufacturer time). First, for each species of the ingroup, we selected the accession containing mRNA molecules were puri- modifications. Polyadenylate – with the highest number of CDSs to represent this species during the fied from 2 m g of total RNA using poly-T oligo-attached magnetic beads. first step of cluster creation. Second, we used UCLUST to cluster these The purified mRNA was fragmented by the addition of the fragmen- sequences and to output the median sequence of each cluster, which will tation buffer and was heated at 94°C in a thermocycler for 4 min. A be used as cluster bait. Third, we used USEARCH to identify, for each fragmentation time of 4 min was used to yield library fragments of predicted CDS, the set of clusters for which the considered CDS and the 250 to 300 base pairs (bp). First-strand complementary DNA (cDNA) cluster bait had a similarity above 85% along at least 50% of their length. was synthesized using random primers to eliminate the general bias Last, all predicted CDSs having such a similarity with one single cluster toward the 3 end of the transcript. Second-strand cDNA synthesis, ′ bait were assigned to this cluster; all others were discarded. end repair, A-tailing, and adapter ligation were performed in accord- Sci. Adv. 2019; 5 :eaav9188 1 May 2019 6of10 et al Glémin .,

7 | SCIENCE ADVANCES RESEARCH ARTICLE Alignment and cleaning fast-bootstrap option. The resulting phylogeny has the same topology Following the strategy used to po pulate the OrthoMaM database ( 37 ), than the supertree shown in Fig. 1, an d all nodes but one have bootstrap CDSs were aligned at the nucleotide level based on their amino acid values equal to 100. Using the Hordeum genome as a reference, we also translation, combining the speed of MUSCLE ( 39 ) and the ability of concatenated genes in 10-Mb windows along chromosomes, obtaining 40 ) to handle sequence errors in predicted CDSs resulting MACSE ( 298 alignments with at least three ge nesperwindow.(Forthisanalysis, s amino acid translations. In more in apparent frameshift and erroneou 5976 genes were kept since the others either could not be assigned to llowing: First, CDSs were translated detail, for each cluster, we did the fo one or two a position on the Hordeum genome or were isolated — into amino acids, these amino acid sequences were then aligned using in their 10-Mb window.) We reconstructed the phylogeny sequences — MUSCLE, and the obtained protein alignment was used for deriving the ofeachalignmentusingthesamemethod.Theglobaltreeandthe298 nucleotide one using MACSE reportGapsAA2NT routine. Second, this 10-Mb trees were made ultrametric using the chronos function of the nucleotide alignment was refined using MACSE refineAlignment rou- ). Among the 298 10-Mb trees, only 248 contained all 45 ape R package ( tine. Last, the resulting amino acid alignment was cleaned with individuals. We used them to draw the cloudogram presented in Fig. 2B 41 ), and a homemade script (that will be part of the next HMMcleaner ( using Densitree ( ). 46 MACSE release) was used to report th e obtained amino acid masking at the nucleotide level. Detection of hybridization events We used the same supermatrix alignme nts to detect possible hybridiza- Phylogeny reconstructions tion events by applying the rationa le developed by Meng and Kubatko ) using the General Time 14 Gene trees were inferred with RAxML v8 ( ( 16 47 )andKubatkoandChiffman( ). Note that this was also the ratio- Downloaded from Reversible (GTR) model with a four-category gamma distribution 7 ). In broad nale used to propose the hybrid origin of the D genome ( (GTR+ G 4) to accommodate for evolution rate heterogeneity among strokes, if we consider a triplet of lineages A, B, and C, with B being a sites and using RAxML fast-bootstrap option ( f). − hybridbetweenA(inproportion1 − g ) and C (in proportion g ), then 42 BppReroot of the BppSuite ( ) was used to reroot the 43 , the probabilities of the three rooted topologies are given by 13,288 gene trees, using as outgroups the following ordered list of S_vavilovii , Er_bonaepartis , H_vulgare species: . Ta_caputMedusae ,and http://advances.sciencemag.org/ 3 = Þ t ð exp Þ g  1 Þþð 3 = Þ t ð 2exp  1 ð g ފ¼ C ; B ð ; A ½ P In more detail, for each of the gene tree, we considered each species of exp ð t Þ = 3 t 1  g Þð 1  2exp ð C Þ = 3 Þþ g P ; ð A ; B ފ¼ð ½ the outgroup list one after the other u ntil finding the first one present in ފ C A ð ; B ½ P ; ¼ exp ð t Þ = 3 the current gene tree (if none was fo und, we discarded the tree). Having identified the most relevant outgroup species for this gene tree, we then of this outgroup species formed a checked whether all the individuals where t represents the time between speciation events on the parental monophyletic clade; if yes, we rooted the tree on this clade, otherwise we trees measured in 2 N generations. It can be easily shown that e n a forest of 12,959 rooted gene trees, discarded the tree. This resulted i . F which we denoted by i B ފ ; A ½ P C ; A ð ; B ½ P ފ C ; ð ¼ g Since our aim here was to build a phylogeny of species and not of ފ 2 ½ A ; ð B ; C ފþ P ½ C ; ð A ; ފ C ; A ð ; B ½ P B P individuals, we focused on the identification of reliable species clades on May 5, 2019 from the information contained in the gene trees. Therefore, we derived [B,(A,C)] = 2exp( − )/3 directly gives the P t In addition, note that 2 y renaming each sequence by the two forests of multilabel trees b from F i ). Thus, 2 probability of incongruence due to ILS ( g can be estimated ) and keeping only clades species to which it belongs to (forest F m 95 ) positions supporting each j and i by counting the number of two-state ( F with a bootstrap value greater than 95 (forest ). Almost all trees m topology using an outgroup (O) to polarize mutations ( 47 ). Considering (99.99%) in these forests are multilab eled as alignments include several the order O/A/B/C, we have )topro- 44 individuals for at least some species. We thus used SSIMUL ( 95 by turning F and without losing — cess the multilabel of trees of F m m i Þ B C ; ; j A ; # → ; ð ¼ i ; j x its multilabeled trees into single- — phylogenetic signal when possible labeled trees. This was done by removing a copy of each pair of isomorphic Þ B ; ¼ j ; ; y # A ð ; C → i ; j i 95 F the new forests ob- and ). We denoted by 44 sibling subtrees ( F ; Þ z ¼ # i C j ; i ; j → B ; ð A ; s s 95 , respectively. We F and tained by pruning isomorphic trees of F m m used SuperTriplets ( 13 ) to construct a supertree from the 11,033 trees So as g we can define a hybridization index that is an estimator of 95 . The resulting supertree is depicted in Fig. 2. The support values F in s z  x given by SuperTriplets to the clades are very low (only three clades have a ^ ¼ g þ x z 2  y support greater than 90); this shows that, even if we only keep clades with 95 contains a high level of contradiction. a support greater than 95, F s To test the significance of this e stimator, that is, to identify values g not due to random sampling under pure ILS, Kubatko and Chiffman Supermatrix analysis 47 , 17 ( )thatisnormally ” Hils statistics “ ) proposed a statistics (called the , some trees are also multilabeled at the individual level In the forest F i distributed with mean zero and variance one. It allows rapidly detecting the two allelic copies were split. either because of paralogy or because l possible triplets in a large phylog- significant potential hybrids among al , we extracted the set of 8739 trees containing at most one From F i eny. We used this test to filter out the g estimates and only consider sig- sequence per individual. We built the concatenation of all the 8739 ) , nificant ones. Because of the high rate of false positive of this test ( 17 47 alignments corresponding to these trees, giving a supermatrix with and of the large number of sites in the alignment, we used the very strin- one sequence for all individuals. W e inferred the phylogeny from this − 6 (instead of 0.05) after Bonferroni correction for gent threshold of 10 14 )usingtheGTR+ G supermatrix with RAxML v8 ( 4modelandthe Sci. Adv. 2019; 5 :eaav9188 1 May 2019 7of10 ., et al Glémin

8 | SCIENCE ADVANCES RESEARCH ARTICLE multiple testing. In addition, we f > ocusedonmajoreventsforwhich g ( 48 , 49 ). We first obtained the vectors of expected branch lengths 10%. Notably, the above rationale implicitly assumes that the effective , with — l leading to the 10 site patterns for these four trees denoted by i size, N ranging from 1 to 4. Note that the longer a branch, the higher the i , remained the same in the two diverging A and C lineages. e ^ probability for a mutation to occur, so that branch lengths directly is still Relaxing this assumption biases the estimation of g ,but g affect observed pattern frequencie s. We hence enumerated all possible expected to be null only without hybridization, so that the detection ^ gene trees embedded in a given four-taxon species tree and computed of hybridization is conservative. However, a single value can be g difficult to interpret when multiple hybridization events occurred. both the probabilities of the compatible topologies and the mean length Thus, we first computed the statistics for all triplets to list all possible of the branches where the occurrence of a mutation leads to a given site hybridization events. Then, we forma lly tested the proposed scenarios pattern. Probabilities and branch lengths are function of divergent times within an ML framework (see below). and coalescent rates (text S3). Then, a full scenario with hybridization ^ g To compute the values of for each triplet of in dividuals, we applied can be obtained by combining the corresponding trees with their re- a modified version of the HyDe program ( 47 , 17 )toallowretrievingnot spective weights. Consider two non-nested hybridization events with for the first g − and 1 z ,and only the Hils statistics but also the counts of each patterns ( x ). , y g proportions of the parental lineages being 1 1 and 1 for the second one. The vector of probabilities g − g event and As outgroup, we used the consensus sequence of the four outgroup species 2 2 forthefullnetworkisthus to limit homoplasy, which can bias statistics. For each triplet, we ordered ^ g topologies and species such that > z > . We applied it x and computed y 1 298 10-Mb window alignments. to the full alignment and to the ¼ p g ð g Þ g  g ð g þ l l g  þð Þ 1 Þ l l Þ þð 1  g Þð 1  g 1 3 1 4 2 2 2 1 1 2 1 2 1 With 43 ingroup individuals, 74,046 triplets are possible, making the K Downloaded from analysis of individual triplets useless. Instead, we parsed the results hi- 10 ∑ where K is a normalization constant such that p ¼ 1. erarchically based on the clades previously obtained with phyloge- 1 ¼ i i For scenarios with two nested reticulations, hybridization and co- netic analyses: We started from triplets of species belonging to the alescent processes cannot be fully decoupled ( 49 ), and some embedded same species and sister species to triplets of species belonging to coalescent trees must be computed directly on a network component thethreemainclades(A,B,andD).Fromthisanalysis(detailed instead on a tree component. If only one species is issued from two in text S2), we proposed a series of hybridization scenarios. To detect http://advances.sciencemag.org/ nested hybridization events (the only case considered here), then the possible heterogeneity of ILS and hybridization events across the ge- initial network can be decomposed into two trees in proportions nome, we also analyzed the variation of the two statistics along chromo- − g ) g and one one-reticulation network in proportion ,(1 g g somes and performed simulations to evaluatethesizeofhybridization 1 2 1 2 and the vectors of branch lengths for the two trees l l ). Noting g (1 − blocks across the genome (see text S5). 1 2 2 and l for the one-reticulation network, the vector of probabilities for the full network is thus Test of multiple hybridization scenarios With three taxa, only three rooted topologies are possible, leaving only 1 two degrees of freedom to estimate scenario parameters, which is not ¼ p ð g g l þð 1  Þ Þ g Þ g l g  þð l 1 2 1 2 1 2 2 1 sufficient if multiple hybridization events occurred. Using four taxa, K 10 informative biallelic site patterns are possible, leaving nine degrees where K is the normalization constant. on May 5, 2019 of freedom to infer scenarios (text S3). Noting 0 the ancestral and v Noting the vector of the number of positions corresponding to 1 the derived allele, the 10 informative site patterns are 0|0111, the 10 biallelic patterns, the likelihood of a network is given by the mul- 0|1011, 0|1101, 0|1110, 0|0011, 0|0101, 0|0110, 0|1001, 0|1010, tinomial sampling and 0|1100. Scenarios with four taxa and up to two hybridization ! events can be described with eight parameters (see below and fig. S2.3). v 10 10 i p obabilities of the 10 site patterns In text S3, we show how to write the pr i ! ∑ ∏ v ¼ L i under a four-taxon multispecies coalescent model with up to two hy- v ! ¼ 1 1 i i ¼ i bridization events. To do so, we need to compute both the probabilities By fixing either g g to 0 or 1, we obtained a scenario with only or of the compatible gene tree topologies and the expected length of 1 2 one reticulation, and by fixing both parameters to 0 or 1, we achieved f a mutation leads to the given pat- branches for which the occurrence o a tree-like scenario without any reticulation. A scenario with one re- tern. Then, to obtain the probabilities for a full scenario, we need to take ticulation has six free parameters and that without any reticulation has the weighted sum of all possible gene trees embedded in the four-taxon only four. As all scenarios cannot be nested in each other, we used Akaike within the probability of site pattern hybridization network. Formally, i Information Criterion (AIC) to compare them, where AIC = 2 ). 2ln( L k − canbewrittenas ascenario S vectors. Likelihood maximization Below, we show how to compute the p was made with a Mathematica script provided in the Supplementary j T j S Þ ∑ ℂ Þ d P ∑ j T P ð ℂ ð S ℂ i ∈ T ∈ ℂ S ð Þ¼ p Materials. The FindMaximum function was used with 100 random i 10 ∑ P d ð ∑ Þ P j ∑ j Þ ℂ ℂ j T T ð S T ∈ S ∈ ℂ ℂ i starting points. ¼ 1 i In the following, we excluded the Sitopsis clade from the analyses (for the C is a component of the decomposition of the scenario where S because of the additional hybridization with Ae. speltoides .Wefirstap- species tree or one-reticulation network, see below), isagenetreeem- T ,and Ae. Ae. mutica plied the model to the four taxa: A clade, D clade, j is the expected length of the branch T ,and bedded in component d C speltoides to elucidate the origin of the D clade. Because the triplet anal- i i where a mutation leads to site pattern . for a given gene tree, T ysis showed heterogeneity among species, we successively run the Scenarios with two non-nested reticulations can be decomposed into model for the 10 combinations of the two species from the A clade the four trees displayed by the corresponding phylogenetic network and T. urartu ( T. boeoticum )andthefivespeciesoftheDclade ., Sci. Adv. 2019; 5 :eaav9188 1 May 2019 8of10 et al Glémin

9 | SCIENCE ADVANCES RESEARCH ARTICLE 2. P. Pamilo, M. Nei, Relationships between gene trees and species trees. Mol. Biol. Evol. 5 , ,and Ae. uniaristata ). ( Ae. comosa , , Ae. caudata Ae. tauschii , Ae. umbellulata 583 (1988). 568 – Asonlyfoursequencesarerequiredforthisanalysis,weusedthestrict 3. N. Bernhardt, J. Brassac, B. Kilian, F. R. Blattner, Dated tribe-wide whole chloroplast consensus of the different sequences of the same species. As for the , BMC Evol. Biol. genome phylogeny indicates recurrent hybridizations within Triticeae. 17 triplet analysis, we used the consensus sequence of the four outgroup 141 (2017). 4. J. S. Escobar, C. Scornavacca, A. Cenci, C. Guilhaumon, S. Santoni, E. J. P. Douzery, to polarized mutations. We only tested scenarios where the D clade V. Ranwez, S. Glémin, J. David, Multigenic phylogeny and analysis of tree incongruences could be potential hybrids as there was no signature that and Ae. mutica 11 , 181 (2011). in Triticeae (Poaceae). BMC Evol. Biol. Ae. speltoides neither species could be potential nor the two Triticum 5. L.-F. Li, B. Liu, K. M. Olsen, J. F. Wendel, A re-evaluation of the homoploid hybrid origin of hybrids according to the distribution of hybridization indices. We then New Phytol. , the donor of the wheat D-subgenome. 8 (2015). – ,4 208 Aegilops tauschii , applied the method to ,and , Ae. tauschii Ae. caudata Ae. umbellulata 6. S. Huang, A. Sirikhachornkit, X. Su, J. Faris, B. Gill, R. Haselkorn, P. Gornicki, Genes / Triticum encoding plastid acetyl-CoA carboxylase and 3-phosphoglycerate kinase of the from the Comopyrum clade. either Ae. uniaristata Ae. comosa or Aegilops complex and the evolutionary history of polyploid wheat. Proc. Natl. Acad. Sci. U.S.A. 99 , 8133 8138 (2002). – 7. T. Marcussen, S. R. Sandve, L. Heier, M. Spannagl, M. Pfeifer; The International Wheat SUPPLEMENTARY MATERIALS Genome Sequencing Consortium, K. S. Jakobsen, B. B. H. Wulff, B. Steuernagel, Supplementary material for this article is available at http://advances.sciencemag.org/cgi/ K. F. X. Mayer, O.-A. Olsen, Ancient hybridizations among the ancestral genomes of bread content/full/5/5/eaav9188/DC1 345 Science wheat. , 1250092 (2014). Text S1. Jointly inferring phylogenetic relationships and hybridization events 8. J. Dvor ák, M.-C. Luo, Z.-L. Yang, Restriction fragment length polymorphism and ř Text S2. Determination of possible hybridization scenarios divergence in the genomic regions of high and low recombination in self-fertilizing and Text S3. Inferring multiple hybridizations in four-taxon scenarios (quartet method) – , 423 cross-fertilizing aegilops species. Genetics 148 434 (1998). Text S4. Application of the quartet method 9. K. Yamane, T. Kawahara, Intra- and interspecific phylogenetic relationships among Text S5. Pattern of hybridization indices along chromosomes Downloaded from species (Poaceae) based on base-pair substitutions, indels, and Triticum-Aegilops diploid Aegilops / Fig. S1. Geographic distribution of the 13 diploid Triticum species. Am. J. Bot. microsatellites in chloroplast noncoding sequences. 1898 (2005). – , 1887 92 Fig. S2.1. Distribution of hybridization indices for triplets composed of two individuals of the 10. G. Petersen, O. Seberg, M. Yde, K. Berthelsen, Phylogenetic relationships of and Triticum same focal species and a third individual from another species. Aegilops and evidence for the origin of the A, B, and D genomes of common wheat Fig. S2.2. Distribution of hybridization indices for triplets composed of two individuals from Triticum aestivum ,70 ( ). Mol. Phylogenet. Evol. 39 – 82 (2006). two sister species and a third one. 11. L.-F. Li, B. Liu, K. M. Olsen, J. F. Wendel, Multiple rounds of ancient and recent Fig. S2.3. Distribution of hybridization indices for triplets composed of two species belonging hybridizations have occurred within the Aegilops complex. New Phytol. 208 - , Triticum to sister clades within the D lineage and a third species. http://advances.sciencemag.org/ – 12 (2015). 11 Fig. S2.4. Sitopsis species as potential hybrids between species of the D clade and either 12. M. El Baidouri, F. Murat, M. Veyssiere, M. Molinier, R. Flores, L. Burlot, M. Alaux, or Ae. speltoides . Ae. mutica H. Quesneville, C. Pont, J. Salse, Reconciling the evolutionary origin of bread wheat Fig. S2.5. An example where successive hybridization events lead to apparent contradiction in (2017). – 1486 Triticum aestivum ( 213 New Phytol. , 1477 ). hybrid triplets. 13. V. Ranwez, A. Criscuolo, E. J. P. Douzery, SuperTriplets: A triplet-based supertree approach Fig. S3.1. The 10 informative site patterns with four species and mutations polarized with an 26 to phylogenomics. Bioinformatics i123 (2010). , i115 – outgroup. 14. A. Stamatakis, RAxML version 8: A tool for phylogenetic analysis and post-analysis of Fig. S3.2. Notations for symmetric and asymmetric coalescent gene trees with four sequences. 1313 (2014). – , 1312 30 large phylogenies. Bioinformatics Fig. S3.3. Decomposition of a hybridization network with two reticulations. 15. C. Solís-Lemus, P. Bastide, C. Ané, PhyloNetworks: A package for phylogenetic networks. Fig. S3.4. Example of scenario parameterization. Mol. Biol. Evol. 34 , 3292 – 3298 (2017). Fig. S3.5. Possible gene trees embedded in a four-taxon species tree. 16. C. Meng, L. S. Kubatko, Detecting hybrid speciation in the presence of incomplete Fig. S3.6. Possible gene trees embedded in a four-taxon network tree as drawn on fig. S3.3C. lineage sorting using gene tree incongruence: A model. ,35 75 Theor. Popul. Biol. 45 – Fig. S4.1. Tested scenarios for the origin of the D clade. (2009). Fig. S4.2. Tested scenarios for hybridization within the D clade. on May 5, 2019 17. P. D. Blischak, J. Chifman, A. D. Wolfe, L. S. Kubatko, HyDe: A Python package for Ae Fig. S5.1. Distribution of the mean hybridization index for (A, . mutica , and D) triplets 67 Syst. Biol. , 821 genome-scale hybridization detection. – 829 (2018). . mutica (proportion of Ae in D) across 10-Mb concatenations. 18. S. R. Sandve, T. Marcussen, K. Mayer, K. S. Jakobsen, L. Heier, B. Steuernagel, B. B. H. Wulff, Fig. S5.2. Distribution of the mean hybridization index and CI for (A, Ae. mutica , and D) triplets / Triticum O. A. Olsen, Chloroplast phylogeny of species is not incongruent Aegilops (proportion of Ae. mutica in D) along chromosomes. with an ancient homoploid hybrid origin of the ancestor of the bread wheat D-genome. Fig. S5.3. Simulation of the distribution of the hybridization index with mean size of genomic – 10 (2015). 208 New Phytol. ,9 blocs from 5 to 50 Mb. 19. M. C. Ungerer, S. J. E. Baird, J. Pan, L. H. Rieseberg, Rapid hybrid speciation in wild Fig. S5.4. Simulation of the distribution of the hybridization index with mean size of genomic Proc. Natl. Acad. Sci. U.S.A. sunflowers. 11762 (1998). – , 11757 95 blocs of 10 Mb and with 10 to 200 rearrangements per chromosome (500 Mb). 20. C. A. Buerkle, L. H. Rieseberg, The rate of genome stabilization in homoploid hybrid Table S1. List of accessions used in the study. , 266 62 Evolution species. – 275 (2008). Table S2. Number of contigs (size, >250 bp) per individual trancriptome. 21. P. Gornicki, H. Zhu, J. Wang, G. S. Challa, Z. Zhang, B. S. Gill, W. Li, The chloroplast view of Table S3.1. Expected numbers of site patterns for symmetric taxon trees. 204 714 (2014). – , 704 New Phytol. the evolution of polyploid wheat. Table S3.2. Expected numbers of site patterns for asymmetric taxon trees. 22. T. V. Danilova, A. R. Akhunova, E. D. Akhunov, B. Friebe, B. S. Gill, Major structural genomic Table S3.3. Expected numbers of site patterns for one-reticulation networks. (Triticeae). Aegilops markgrafii alterations can be associated with hybrid speciation in Table S4.1. Count numbers for the 10 informative site patterns with the different combinations 330 (2017). 92 , 317 – Plant J. of species from the A and D clade. 23. P. Y. Novikova, N. Hohmann, V. Nizhynska, T. Tsuchimatsu, J. Ali, G. Muir, A. Guggisberg, Table S4.2. AIC of the different scenarios with zero, one, or two reticulations. T. Paape, K. Schmid, O. M. Fedorenko, S. Holm, T. Säll, C. Schlötterer, K. Marhold, Table S4.3. ML estimates of the parameters for the best model (M 4) for the 10 combinations A. Widmer, J. Sese, K. K. Shimizu, D. Weigel, U. Krämer, M. A. Koch, M. Nordborg, of A and D species. Arabidopsis Sequencing of the genus identifies a complex history of nonbifurcating Table S4.4. Count numbers for the 10 informative site patterns with either or Ae. comosa 48 speciation and abundant trans-specific polymorphism. 1082 (2016). Nat. Genet. – , 1077 as a potential parent. Ae. uniaristata 24. J. B. Pease, D. C. Haak, M. W. Hahn, L. C. Moyle, Phylogenomics reveals three sources of Table S4.5. AIC of the different scenarios with zero, one, or two reticulations. , e1002379 (2016). adaptive variation during a rapid radiation. PLOS Biol. 14 Data file S1. This Mathematica notebook file contains two main parts: (i) the detailed 25. J. B. Pease, M. W. Hahn, Detection and polarization of introgression in a five-taxon derivation of the different equations presented in text S3 and (ii) the implementation of the phylogeny. Syst. Biol. 64 , 651 – 662 (2015). likelihood method presented in text S3 (quartet method) and its application to the dataset. E. Y. Durand, N. Patterson, D. Reich, M. Slatkin, Testing for ancient admixture between 26. 2252 (2011). – closely related populations. Mol. Biol. Evol. 28 , 2239 27. V. Ranwez, O. Gascuel, Quartet-based phylogenetic inference: Improvements and limits. REFERENCES AND NOTES Mol. Biol. Evol. 18 , 1103 – 1116 (2001). 1. L. Nakhleh, Computational approaches to species phylogeny inference and gene tree 28. E. Sayyari, S. Mirarab, Fast coalescent-based computation of local branch support from , 719 728 (2013). 28 Trends Ecol. Evol. reconciliation. – – 1668 (2016). 33 quartet frequencies. Mol. Biol. Evol. , 1654 Sci. Adv. ., 5 :eaav9188 1 May 2019 9of10 et al Glémin 2019;

10 | SCIENCE ADVANCES RESEARCH ARTICLE 43. J. Dutheil, S. Gaillard, E. Bazin, S. Glémin, V. Ranwez, N. Galtier, K. Belkhir, Bio++: A set . 1. Distribution of 29. E. D. Badaeva, B. Friebe, B. S. Gill, Genome differentiation in Aegilops of C++ libraries for sequence analysis, phylogenetics, molecular evolution and highly repetitive DNA sequences on chromosomes of diploid species. Genome 39 , 7 BMC Bioinformatics population genetics. , 188 (2006). – 306 (1996). 293 44. C. Scornavacca, V. Berry, V. Ranwez, Building species trees from larger parts of 30. G. Sarah, F. Homa, S. Pointet, S. Contreras, F. Sabot, B. Nabholz, S. Santoni, L. Sauné, , 590 Inf. Comput. phylogenomic databases. – 209 605 (2011). M. Ardisson, N. Chantret, C. Sauvage, J. Tregear, C. Jourda, D. Pot, Y. Vigouroux, H. Chair, 45. E. Paradis, J. Claude, K. Strimmer, APE: Analyses of Phylogenetics and Evolution in N. Scarcelli, C. Billot, N. Yahiaoui, R. Bacilieri, B. Khadari, M. Boccara, A. Barnaud, J.-P. Péros, – Bioinformatics , 289 20 290 (2004). R language. J.-P. Labouisse, J.-L. Pham, J. David, S. Glémin, M. Ruiz, A large set of 26 new reference 10.1101/ bioRxiv 46. R. R. Bouckaert, J. Heled, DensiTree 2: Seeing trees through the forest. transcriptomes dedicated to comparative population genomics in crops and wild relatives. 012401 (2014). Mol. Ecol. Resour. – 17 580 (2017). ,565 47. L. S. Kubatko, J. Chifman, An invariants-based method for efficient identification of hybrid 31. Y. Clément, G. Sarah, Y. Holtz, F. Homa, S. Pointet, S. Contreras, B. Nabholz, F. Sabot, species from large-scale genomic data. bioRxiv 10.1101/034348 (2015). L. Sauné, M. Ardisson, R. Bacilieri, G. Besnard, A. Berger, C. Cardi, F. De Bellis, O. Fouet, 48. D. H. Huson, R. Rupp, C. Scornavacca, Phylogenetic Networks: Concepts, Algorithms and C. Jourda, B. Khadari, C. Lanaud, T. Leroy, D. Pot, C. Sauvage, N. Scarcelli, J. Tregear, Applications (Cambridge Univ. Press, 2010). Y. Vigouroux, N. Yahiaoui, M. Ruiz, S. Santoni, J.-P. Labouisse, J.-L. Pham, J. David, 49. S. Zhu, J. H. Degnan, Displayed trees do not determine distinguishability under the S. Glémin, Evolutionary forces affecting synonymous variations in plant genomes. 298 (2017). network multispecies coalescent. – , 283 66 Syst. Biol. , e1006799 (2017). 13 PLOS Genet. 32. M. Martin, Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J. 17 ,10 – 12 (2011). Acknowledgments: Analyses were performed on the computing cluster of the Montpellier 33. J. T. Simpson, K. Wong, S. D. Jackman, J. E. Schein, S. J. M. Jones, İ . Birol, ABySS: A parallel Bioinformatics Biodiversity (MBB) platform, the South Green Bioinformatics platform, and – , 1117 1123 (2009). Genome Res. 19 assembler for short read sequence data. the UPPMAX platform in Uppsala. Funding: This work was supported by the French National 9 Genome Res. 34. X. Huang, A. Madan, CAP3: A DNA sequence assembly program. 877 – , 868 Author contributions: Research Agency (ANR-11-BSV7-013-03). Conception: J. Dav., S.G., (1999). and V.R.; funding acquisition: J. Dav. and S.G.; biological data acquisition and management: 35. Y. Ye, J.-H. Choi, H. Tang, RAPSearch: A fast protein similarity search tool for short reads. C.B. and V.V.; sequence data acquisition: G.S., M.A., and S.S.; method development: S.G.; Downloaded from 12 , 159 (2011). BMC Bioinformatics data analysis: C.S., J. Dai., S.G., and V.R.; interpretation of the results: C.B., C.S., J. Dav., S.G., 36. J. D. Wasmuth, M. L. Blaxter, prot4EST: Translating expressed sequence tags from and V.R.; drafting of the manuscript: S.G.; reviewing and editing of the manuscript: C.B., C.S., 5 , 187 (2004). BMC Bioinformatics neglected genomes. J. Dav., S.G., S.S., and V.R. The authors declare that they have no Competing interests: 37. E. J. P. Douzery, C. Scornavacca, J. Romiguier, K. Belkhir, N. Galtier, F. Delsuc, V. Ranwez, competing interests. Data and materials availability: All data needed to evaluate the OrthoMaM v8: A database of orthologous exons and coding sequences for comparative conclusions in the paper are present in the paper and/or the Supplementary Materials. – genomics in mammals. Mol. Biol. Evol. 31 , 1923 1928 (2014). Sequence alignments and phylogenetic trees generated and used to produce the results are Bioinformatics 38. R. C. Edgar, Search and clustering orders of magnitude faster than BLAST. available at https://bioweb.supagro.inra.fr/WheatRelativeHistory/. Additional data related to http://advances.sciencemag.org/ – 2461 (2010). , 2460 26 this paper may be requested from the authors. 39. R. C. Edgar, MUSCLE: A multiple sequence alignment method with reduced time and 5 BMC Bioinformatics space complexity. , 113 (2004). Submitted 31 October 2018 40. V. Ranwez, S. Harispe, F. Delsuc, E. J. P. Douzery, MACSE: Multiple Alignment of Coding Accepted 20 March 2019 6 PLOS ONE SEquences accounting for frameshifts and stop codons. , e22594 (2011). Published 1 May 2019 41. H. Philippe, D. M. de Vienne, V. Ranwez, B. Roure, D. Baurain, F. Delsuc, Pitfalls in 10.1126/sciadv.aav9188 Eur. J. Taxon. 25 (2017). – ,1 283 supermatrix phylogenomics. 42. L. Gueguen, S. Gaillard, B. Boussau, M. Gouy, M. Groussin, N. C. Rochette, T. Bigot, Citation: S. Glémin, C. Scornavacca, J. Dainat, C. Burgarella, V. Viader, M. Ardisson, G. Sarah, D. Fournier, F. Pouyet, V. Cahais, A. Bernard, C. Scornavacca, B. Nabholz, A. Haudry, Sci. Adv. S. Santoni, J. David, V. Ranwez, Pervasive hybridizations in the history of wheat relatives. L. Dachary, N. Galtier, K. Belkhir, J. Y. Dutheil, Bio++: Efficient extensible libraries and tools , eaav9188 (2019). 5 – for computational molecular evolution. Mol. Biol. Evol. 30 , 1745 1750 (2013). on May 5, 2019 et al Sci. Adv. 2019; 5 :eaav9188 1 May 2019 10 of 10 Glémin .,

11 Pervasive hybridizations in the history of wheat relatives Sylvain Glémin, Celine Scornavacca, Jacques Dainat, Concetta Burgarella, Véronique Viader, Morgane Ardisson, Gautier Sarah, Sylvain Santoni, Jacques David and Vincent Ranwez (5), eaav9188. 5 Sci Adv DOI: 10.1126/sciadv.aav9188 Downloaded from ARTICLE TOOLS http://advances.sciencemag.org/content/5/5/eaav9188 SUPPLEMENTARY http://advances.sciencemag.org/content/suppl/2019/04/29/5.5.eaav9188.DC1 MATERIALS http://advances.sciencemag.org/ REFERENCES This article cites 46 articles, 7 of which you can access for free http://advances.sciencemag.org/content/5/5/eaav9188#BIBL PERMISSIONS http://www.sciencemag.org/help/reprints-and-permissions on May 5, 2019 Use of this article is subject to the Terms of Service (ISSN 2375-2548) is published by the American Association for the Advancement of Science, 1200 New Science Advances York Avenue NW, Washington, DC 20005. 2017 © The Authors, some rights reserved; exclusive licensee American Science Advances is a Association for the Advancement of Science. No claim to original U.S. Government Works. The title registered trademark of AAAS.

Related documents

Science Journals — AAAS

Science Journals — AAAS

| SCIENCE ADVANCES RESEARCH ARTICLE AGRICULTURE Copyright © 2019 The Authors, some rights reserved; Pervasive hybridizations in the history exclusive licensee American Association of wheat relatives f...

More info »
Science Journals — AAAS

Science Journals — AAAS

| SCIENCE ADVANCES RESEARCH ARTICLE AGRICULTURE Copyright © 2019 The Authors, some rights reserved; Pervasive hybridizations in the history exclusive licensee American Association of wheat relatives f...

More info »
CityNT2019TentRoll 1

CityNT2019TentRoll 1

STATE OF NEW YORK 2 0 1 9 T E N T A T I V E A S S E S S M E N T R O L L PAGE 1 VALUATION DATE-JUL 01, 2018 COUNTY - Niagara T A X A B L E SECTION OF THE ROLL - 1 CITY - North Tonawanda TAX MAP NUMBER ...

More info »
Out of Reach 2018

Out of Reach 2018

2018 of OUT REACH THE HIGH COST OF HOUSING MADE POSSIBLE BY THE GENEROSITY OF:

More info »
Out of Reach 2016

Out of Reach 2016

No Refuge for Low Income Renters MADE POSSIBLE BY THE GENEROSITY OF:

More info »
DER Directory

DER Directory

FAA CONSULTANT DER DIRECTORY May 9, 2019 AIR-6F0, Delegation & Organizational Procedures Branch This directory is generated from information in the FAA Designee Information Network (DIN). If you are a...

More info »
time is the enemy

time is the enemy

is TIME the ENEMY The surprising truth about why today’s college students graduating ... and wha T needs To change aren’t n 1 Time Is the Enemy

More info »
50 Year Trends expanded version

50 Year Trends expanded version

Expanded Edition THE AMERICAN HERI A R CIRP T B I E N L G E C 50 F S I F R T A Y E Y FRESHMAN FIFTY-YEAR TRENDS | 1966-2015 KEVIN EAGAN | ELLEN BARA STOLZENBERG | JOSEPH J. RAMIREZ | MELISSA C. ARAGON...

More info »
Implementation Handbook For The Convention On The Rights Of The Child

Implementation Handbook For The Convention On The Rights Of The Child

IMPLEMENTATION HANDBOOK FOR THE CONVENTION ON THE RIGHTS OF THE CHILD FULLY REVISED THIRD EDITION IMPLEMENTATION HANDBOOK IMPLEMENTATION HANDBOOK FOR THE CONVENTION ON THE FOR THE CONVENTION ON THE RI...

More info »
Abstract and Concrete Categories   The Joy of Cats

Abstract and Concrete Categories The Joy of Cats

Jiˇr ́ı Ad ́amek Horst Herrlich George E. Strecker Abstract and Concrete Categories The Joy of Cats Dedicated to Bernhard Banaschewski

More info »
IWMF Global Report

IWMF Global Report

Global Report on the Status of Women in the News Media INTERNATIONAL WOMEN’S MEDIA FOUNDATION

More info »
What's It Worth? The Economic Value of College Majors

What's It Worth? The Economic Value of College Majors

Anthony P. Carnevale Jeff Strohl Michelle Melton

More info »
S:\FULLCO~1\HEARIN~1\Committee print 2018\Henry\Jan. 9 report

S:\FULLCO~1\HEARIN~1\Committee print 2018\Henry\Jan. 9 report

1 C . RT S. P 115 TH ONGRESS " ! COMMITTEE PRINT 2d Session 115–21 PUTIN’S ASYMMETRIC ASSAULT ON DEMOCRACY IN RUSSIA AND EUROPE: IMPLICATIONS FOR U.S. NATIONAL SECURITY A MINORITY STAFF REPORT PREPARE...

More info »
AndersBehringBreivikManifesto

AndersBehringBreivikManifesto

2011 , London – By Andrew Berwick

More info »
DoD7045.7H

DoD7045.7H

DoD 7045.7-H EPARTMENT OF D EFENSE D F UTURE Y EARS D EFENSE P ROGRAM (FYDP) S TRUCTURE Codes and Definitions for All DoD Components Office of the Director, Program Analysis and Evaluation A pril 2004

More info »
Web Tables—Profile of Undergraduate Students: 2011–12

Web Tables—Profile of Undergraduate Students: 2011–12

Profile of WEB Undergraduate 12 Students: 2011– TABLES EDUCATION OF DEPARTMENT U.S. NCES 2015-1 OCTOBER 2014 67 These Web Tables provide comprehen- distance from home; and participation carried a bala...

More info »
Microsoft Word   cust subm Book 251 Cover.doc

Microsoft Word cust subm Book 251 Cover.doc

Summary of Reciprocity Agreements Between Texas and Other Jurisdictions Vehicle Titles and Registration Division Austin, Texas Issued November 2002

More info »
summer schedule full

summer schedule full

Summer Semester 2019 Class Schedule May 9, 2019 as of Subj Crse CRN Course Type Title Days Time Bldg/Rm Cred Instructor Course Fee Start Date - End Date Campus Part Term BB 321 Financial Accounting I ...

More info »
An Introduction to Computer Networks

An Introduction to Computer Networks

An Introduction to Computer Networks Release 1.9.18 Peter L Dordal Mar 31, 2019

More info »