Unnoticed in the tropics: phylogenomic resolution of the poorly known arachnid order Ricinulei (Arachnida)

Ricinulei are among the most obscure and cryptic arachnid orders, constituting a micro-diverse group with extreme endemism. The 76 extant species described to date are grouped in three genera: Ricinoides, from tropical Western and Central Africa, and the two Neotropical genera Cryptocellus and Pseudocellus. Until now, a single molecular phylogeny of Ricinulei has been published, recovering the African Ricinoides as the sister group of the American Pseudocellus and providing evidence for the diversification of the order pre-dating the fragmentation of Gondwana. Here, we present, to our knowledge, the first phylogenomic study of this neglected arachnid order based on data from five transcriptomes obtained from the five major mitochondrial lineages of Ricinulei. Our results, based on up to more than 2000 genes, strongly support a clade containing Pseudocellus and Cryptocellus, constituting the American group of Ricinulei, with the African Ricinoides nesting outside. Our dating of the diversification of the African and American clades using a 76 gene data matrix with 90% gene occupancy indicates that this arachnid lineage was distributed in the South American, North American and African plates of Gondwana and that its diversification is concordant with a biogeographic scenario (both for pattern and tempo) of Gondwanan vicariance.


Introduction
Ricinulei (originally known as Cryptostemmatoidae [1]) are among the most obscure and cryptic of the arachnid orders. They are characterized by having in the anterior region of the prosoma a hinged plate, the cucullus, that acts as a hood covering the mouthparts, by a locking mechanism between the prosoma and the opisthosoma (a trait shared with trigonotarbids, an extinct lineage) that can be uncoupled during mating and egglaying, and by a modified third leg in males for sperm transfer,  among other characters. A total of 76 living Ricinulei species are currently accepted [2,3] in three genera: Ricinoides Ewing, 1929 from tropical West Africa (from The Gambia to Gabon), Cryptocellus Westwood, 1874 from tropical South America and Central America (Guyana to Peru to Honduras), and Pseudocellus Platnick, 1980 from North and Central America (southern USA to Panama) [4] (figure 1). Despite abundant recent taxonomic work (e.g. [2,3,[5][6][7][8][9][10]), and some phylogenetic and biogeographic studies [11], Ricinulei remains an obscure group, as it was in 1964 when Savory [12] stated that 'the discovery of each new specimen is still something of a zoological triumph'. Seventy-six years ago, Gertsch et al. [13] found the first North American Ricinulei and reported that only ca 30 specimens were known for the Americas at the time. Ricinulei have remained a neglected and undersampled group of arthropods until the present, and only a few species are known from more than a handful of specimens. In Cryptocellus, three species are still only known by males, six by females only and two only by nymphs [5,6,8,[14][15][16][17][18][19][20][21].
With an important fossil record dating back to the Carboniferous [22,23], the phylogenetic position of Ricinulei remains contentious [24]. While virtually all studies have recovered the monophyly of Euchelicerata (=Xiphosura + Arachnida), the monophyly of Arachnida is more controversial and the position of Ricinulei is still unclear, having been recovered as sister group to Acariformes + Parasitiformes [25], Parasitiformes [26], Solifugae [27] or Xiphosura (the later two hypothesis recovered in the same study but with different gene matrices [24]), or recovered as a basal group of Arachnida, excluding Acariformes [28].
As for the phylogenetic relationships within Ricinulei, their internal relationships are virtually restricted to a recent study focusing on the African species belonging to the genus Ricinoides [11]. Murienne et al. [11] explored the evolutionary relationships between the three currently recognized genera, finding that the African Ricinoides was sister group to the American Pseudocellus, therefore

Material and methods
Seventy-nine Ricinulei specimens belonging to the three described genera were collected by sifting leaf litter, or with a Winkler apparatus (table 1). Newly sequenced specimens were collected under permit no. 17 from ARAP (Panama, 27 February 2013), no. 032 from Ministry of Scientific Research and Innovation (Republic of Cameroon, 11 March 2009) and no. 369419 from IBAMA (Brazil, 5 June 2012). We sequenced the mitochondrial marker cytochrome c oxidase subunit I (COI) to check the main mitochondrial groups in order to direct transcriptome sequencing efforts, as preliminary results suggested the existence of a high genetic variability within two of the three genera (table 1). Total DNA was extracted from a single leg of each animal using Qiagen's DNEasy tissue kit. The COI gene was sequenced as described in Murienne et al. [11]. The sequence-editing software GENEIOUS v. 6.1.3 [31] was used to read the chromatograms obtained from the automatic sequencer, to assemble both strands for each overlapping fragment and to edit the sequence data. Although alignment was trivial, sequences were aligned in MUSCLE through the online server of EMBL-EBI [32].
Uncorrected p-distances between each specimen were calculated and plotted in a heatmap, and maximum-likelihood (ML) and Bayesian inference (BI) phylogenetic hypotheses were generated with RAXML v. 8.0.24 [33] and MRBAYES v. 3.2.3 [34] as implemented in the CIPRES Science Gateway [35]. These analyses highlighted five mitochondrial clades: Pseudocellus specimens formed a single clade with less genetic variability than Cryptocellus or Ricinoides, while the other two genera were subdivided into two clades each, exhibiting high genetic variability (see Results and discussion).
Based on these analyses, five Ricinulei specimens representing the three currently recognized genera and the phylogenetic span of the two more diverse genera (Cryptocellus becki, Cryptocellus sp. nov., Pseudocellus pearsei, Ricinoides atewa and Ricinoides karschii) were selected for transcriptomic analysis. The transcriptomes of P. pearsei and R. atewa were recently published by our laboratory [24]. Additional arachnid transcriptomes were used as outgroups [24,36] (see Data accessibility; table 2). Note that Cryptocellus sp. nov. was collected twice and therefore appears with a different MCZ catalogue numbers in the COI tree (IZ-128904) and the phylogenomic tree (IZ-30913), but they correspond to the same species. Further details can be found in MCZbase, the database of the Museum of Comparative Zoology (http://mcz.harvard.edu/collections/searchcollections.html).
Total RNA was extracted with a standard trizol-based method using TRIzol (Life Sciences). After total RNA precipitation, mRNA purification was done with the Dynabeads mRNA Purification Kit (Invitrogen) following manufacturer's instructions. Quality of mRNA was assessed with a pico RNA assay in an Agilent 2100 Bioanalyzer (Agilent Technologies), and quantity was measured with a RNA assay in a Qubit fluorometer (Life Technologies). cDNA libraries were constructed in the Apollo 324 automated system using the PrepX mRNA kit (Wafergen). Concentration of the cDNA libraries was measured through a dsDNA high-sensitivity (HS) assay in a Qubit fluorometer (Invitrogen). Library quality and size selection were checked in an Agilent 2100 Bioanalyzer (Agilent Technologies) with the HS DNA assay. All samples were sequenced in an Illumina HiSeq 2500 platform with paired-end reads of 150 bp at the FAS Center for Systems Biology, Harvard University.
Demultiplexed Illumina HiSeq 2500 sequencing results, in FASTQ format, were retrieved, each sample being quality-filtered according to a threshold average quality score of 30 based on a Phred scale and adaptor trimmed using TRIMGALORE! 0.3.3 [37]. Ribosomal RNA and mitochondrial DNA were filtered out via BOWTIE V. 1.0.0 [38]. Strand specific de novo assemblies were done individually in TRINITY [39] using paired read files, a strand specificity flag and path reinforcement distance enforced to 75. Raw reads have been deposited in the National Center for Biotechnology Information Sequence Read Archive database (table 2). Redundancy reduction was done with CD-HIT-EST [40] in the raw assemblies (95% global similarity). Resulting assemblies were processed in TRANSDECODER [39] to identify candidate open-reading frames (ORFs) within the transcripts. In order to remove the variation in the coding regions of our assemblies due to alternative splicing, closely related paralogs and allelic diversity, predicted peptides were then processed with a further filter to select only one peptide per putative unigene, by     choosing the longest ORF per TRINITY subcomponent with a Python script. Peptide sequences with all final candidate ORFs were retained as multifasta files. We assigned predicted ORFs into orthologous groups across all samples using OMA stand-alone v0.99y (orthologous matrix [41]). All-by-all local alignments were parallelized across 100 cores of a single compute node, implementing a custom Bash script allowing for execution of independent threads with at least 3 s between each instance of OMA to minimize risk of collisions. Further details and protocols are described elsewhere [36]. Three different amino acid supermatrices were constructed. First, a large matrix was obtained by concatenating the set of orthogroups containing eight or more taxa, yielding a supermatrix with 2177 genes (supermatrix 1: 50% gene occupancy; 568 293 amino acids). To increase gene occupancy and to reduce the percentage of missing data, a second matrix was created by selecting the orthologues contained in 13 or more taxa (supermatrix 2: 476 genes; 75% gene occupancy, 98 933 amino acids), and a third matrix was built choosing the orthologues present in 16 or more taxa (supermatrix 3: 76 genes; 90% gene occupancy; 12 919 amino acids). ML inference was conducted with PhyML-PCMA (supermatrices 2 and 3) [42] and PhyML implementing the integrated branch length option (supermatrix 3) [43]. Bootstrap support values were based on 100 replicates. We selected 20 PCs in the PhyML-PCMA analyses and empirical amino acid frequencies. Bayesian analysis was conducted with EXABAYES [44] (two runs, three independent Markov chain Monte Carlo, MCMC chains per run) in the three supermatrices. A 50% majority-rule consensus tree was computed from the combined remaining trees from the independent runs. For practical reasons and due to the similar results obtained for the different phylogenetic analysis (see the Results and discussion), in the big supermatrix only a ML analysis was explored (PhyML-PCMA).

Ricinoides feae
To discern whether compositional heterogeneity among taxa and/or within each individual orthologue alignment was affecting phylogenetic results, we further analysed supermatrices 2 and 3 (76 and 476 genes) in BACOCA v. 1.1 [45]. The relative composition frequency variability (RCFV) values (that measures the absolute deviation from the mean for each amino acid for each taxon) was plotted in a heatmap using the R package gplots with an R script modified from [45].
To investigate potential incongruence between individual gene trees, best-scoring ML trees were inferred for each gene included in each supermatrix under the PROTGAMMALG4 with RAXML v. 8.0.1 [33]. Gene trees were decomposed into quartettes with SUPERQ v. 1.1 [46] and a supernetwork assigning edge lengths based on quartette frequencies was inferred selecting the 'balanced' edge-weight optimization function, applying no filter; the supernetworks were visualized in SPLITSTREE v. 4.13.1 [47].  A key aspect of ricinuleid systematics is their tempo of evolution and whether it is consistent with a biogeographic scenario of Gondwanan vicariance, so we used the 76 gene dataset for dating. The fossil record of Ricinulei is impressive considering the current low diversity and restricted distribution, confined to the tropical regions of both sides of the Atlantic. Selden [23] revised the fossil ricinuleids and erected the clade Palaeoricinulei for the extinct species, limiting Neoricinulei to the extant ones. At the time, Palaeoricinulei included several Carboniferous species, the oldest being Curculioides adompha, from rocks of the upper Namurian B stage of the Ruhr area, Germany, while the remaining species were Westphalian in age, from the USA and the UK [23]. Subsequently, a species from fossiliferous Cretaceous    Figure 2. (a) Top, maximum-likelihood tree of the COI sequence data (supermatrix 1). Black dots indicate a bootstrap support value higher than 90% (see the electronic supplementary material, figure S1, for further details). Bottom, heatmap of genetic distances between the main lineages of Ricinulei. Ricinulei taxa colour-coded as in (b). (b) Phylogenomic hypothesis of the evolutionary relationships of Ricinulei. Node support for the different analyses is indicated in each case, as described in the legend. Grey stars indicate fossil calibration points. (c) Palaeogeographical reconstruction according to Seton et al. [56] at the maximum and minimum ages of the split of the Ricinulei main lineages, as recovered by the molecular dating analysis. Colour bar indicates the age of oceanic lithosphere. The distribution area of the three genera described to date is shown.
of Cryptocellus and Pseudocellus indicates a possible vicariant model of cladogenesis between these two genera, the former predominantly South American, the latter predominantly Caribbean, Meso-American and North American. Future studies should determine the age of the diversification of Pseudocellus and its potential for understanding the palaeogeography of the Caribbean region [59]. Ricinulei constitute a poorly studied arachnid order which once had a broader distribution, including species in southeast Asia [30], but is now restricted to the tropical regions of West Africa and the Americas. Our data however show that this arachnid order has persisted largely unchanged for over 100 Myr, with a conservative phylogenetic pattern able to trace not only old continental movements, but also preserving regional information about the persistence of forests through time [11]. Similar patterns of vicariant diversification are common in other soil-dwelling and saproxylic animal groups originating in Gondwana, including velvet worms [60], centipedes [61] and caecilians [62]. Ricinulei is thus more than just another obscure animal group, and should be studied as a relictual arachnid order with the potential of providing a modern explanation to recalcitrant questions such as ancient Caribbean biogeography.