Recent studies have reported multiple cases of molecular adaptation in cetaceans related to their aquatic abilities. However, none of these has included the hippopotamus, precluding an understanding of whether molecular adaptations in cetaceans occurred before or after they split from their semi-aquatic sister taxa. Here, we obtained new transcriptomes from the hippopotamus and humpback whale, and analysed these together with available data from eight other cetaceans. We identified more than 11 000 orthologous genes and compiled a genome-wide dataset of 6845 coding DNA sequences among 23 mammals, to our knowledge the largest phylogenomic dataset to date for cetaceans. We found positive selection in nine genes on the branch leading to the common ancestor of hippopotamus and whales, and 461 genes in cetaceans compared to 64 in hippopotamus. Functional annotation revealed adaptations in diverse processes, including lipid metabolism, hypoxia, muscle and brain function. By combining these findings with data on protein–protein interactions, we found evidence suggesting clustering among gene products relating to nervous and muscular systems in cetaceans. We found little support for shared ancestral adaptations in the two taxa; most molecular adaptations in extant cetaceans occurred after their split with hippopotamids.
Cetaceans are arguably the most specialized of all mammals, having evolved from a terrestrial ancestor to occupy an obligate aquatic niche. Modern cetaceans show numerous phenotypic adaptations for life in the water; aside from the radical reorganization of their forelimbs into fins and loss of hindlimbs, they are able to dive and tolerate low oxygen, and possess modified circulatory and respiratory systems, large brains, hairlessness and transformations in sensory perception. Some cetaceans show extreme longevity, as well as resistance to cancer, accelerated wound healing and insulin resistance [2,3]. Other major adaptations pertain to feeding ecology; indeed, modern cetaceans diverged approximately 34 Ma [4,5] into the toothed whales (suborder Odontoceti), which evolved echolocation to hunt using ultrasonic pulses and possess a highly specialized inner ear, and the baleen whales (suborder Mysticeti), which lost their teeth and instead evolved a novel keratinous material for filtering smaller prey.
Molecular evidence has revealed that the closest living relatives of the cetaceans are the two extant members of the family Hippopotamidae: the common and pygmy hippopotamus (‘hippo’) (e.g. [6,7]). Members of the Cetacea and Hippopotamidae are grouped together in the monophyletic clade Whippomorpha, which in turn is nested within the otherwise terrestrial mammalian order Cetartiodactyla that also includes the even-toed ungulates. Recent molecular evidence suggests that the Whippomorpha diverged from other cetartiodactyls approximately 59 Ma and that the cetaceans and hippopotamids split approximately 55 Ma.
Both extant hippo species are adapted for spending long periods of time in water. Like some early cetaceans, they can walk on the bottom of bodies of freshwater due to their thick, dense (pachyosteosclerotic) limb bones, and they are predominantly hairless, with thickened, lipid-rich skin that lacks sebaceous glands [12,13]. Hippo skin contains subepidermal capillaries with thickened walls to withstand high blood pressure, an adaptation for heat exchange that has also been reported in highly active species, such as cetaceans. Hippos and cetaceans have some behavioural traits in common, such as nursing underwater and subaquatic communication [1,14]. Furthermore, fossil cetaceans and hippos both possess a hyperinflated tegmen tympani of the petrosal bone, which may aid in interpreting the directionality of hearing. Yet despite some shared specializations for life in the water, it is currently unclear to what extent cetaceans and hippos evolved these adaptations independently or whether they are ancestral traits, although some fossil evidence suggests the former [1,10,16]. Indeed, the question of whether the last common ancestor of the Whippomorpha exhibited a terrestrial, semi-aquatic or aquatic lifestyle remains unresolved, and there is particular interest in determining when in their evolutionary history cetaceans gained their specialized traits for living in water [1,10].
Genome-wide scans of selection can offer powerful insights into the evolutionary history of adaptive traits. However, all genome-scale studies of molecular adaptation in the Whippomorpha to date have been restricted to a few cetaceans [18–24]. Similarly, inferences of selection associated with the evolutionary transition from land to water have relied heavily on data from toothed whales, used in comparative evolutionary analyses together with the cow (Bos taurus) and other more distantly related terrestrial mammals [21,22,24].
To gain a better understanding of the timing and role of natural selection in the transition of cetartiodactyl mammals to a semi-aquatic/aquatic environment, we generated transcriptome data from the common hippopotamus (referred to as ‘hippo’ below) as well as from the humpback whale Megaptera novaeangliae. By analysing these together with existing data from two mysticetes and six odontocetes, we assembled what is, to our knowledge, the most comprehensive genome-scale dataset of the group to date [18,23–26]. We reasoned that if adaptation to a semi-aquatic environment preceded the split between hippos and whales, then we would expect to see signatures of positive selection in multiple genes linked to an aquatic lifestyle on the ancestral branch of Whippomorpha. If, on the other hand, adaptation to a more aquatic way of life followed the split between these groups, then we might expect independent changes on each lineage. Under both scenarios, we also predicted a greater molecular signature of aquatic adaptation in cetaceans than in hippos, reflecting the more derived body plan in the former.
2. Material and methods
2.1 Taxon sampling, sequencing and RNA sequencing de novo assembly
New RNA sequencing data for the common hippo Hippopotamus amphibius and humpback whale M. novaeangliae were generated by paired-end Illumina HiSeq sequencing at BGI (electronic supplementary material, table S1), and combined with published sequence data from genomes or transcriptomes of eight other cetacean species: sperm whale (Physeter macrocephalus), Indo-Pacific humpback dolphin (Sousa chinensis), minke whale (Balaenoptera acutorostrata), fin whale (Balaenoptera physalus), finless porpoise (Neophocaena phocaenoides), bottlenose dolphin (Tursiops truncatus), killer whale (Orcinus orca) and Yangtze River dolphin (Lipotes vexillifer) [18,23–26].
2.2 Orthologue identification and dataset assembly
We obtained orthologous coding DNA sequences (CDSs) across cetaceans and the hippo using reciprocal blastx and tblastn searches, with T. truncatus and human as references. Orthologous sequences of 13 other laurasiatherian mammals were obtained from Ensembl. CDSs were aligned using PRANK v. 130820 and filtered based on Guidance default parameters. Sequences were further edited and trimmed to avoid problems with missing data and erroneous indels (electronic supplementary material, methods).
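The reciprocal search logic can be illustrated with a minimal sketch. It assumes that BLAST results have already been parsed into (query, subject, bit score) tuples, and it keeps only pairs that are each other's top hit; the function names, the input layout and the choice of bit score as the ranking criterion are illustrative assumptions, not the pipeline used in this study.

```python
def best_hits(hits):
    """Map each query to its highest-scoring subject.

    `hits` is an iterable of (query, subject, bitscore) tuples, for example
    parsed from tabular BLAST output (hypothetical pre-parsed input).
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(forward_hits, reverse_hits):
    """Return sorted (reference, target) pairs that are each other's best hit.

    forward_hits: reference protein searched against target transcripts
                  (a tblastn-style search).
    reverse_hits: target transcript searched against reference proteins
                  (a blastx-style search).
    """
    forward = best_hits(forward_hits)
    reverse = best_hits(reverse_hits)
    return sorted(
        (ref, target)
        for ref, target in forward.items()
        if reverse.get(target) == ref
    )
```

A reference gene whose best transcript points back to a different reference gene is dropped, which is what makes the resulting pairs candidate one-to-one orthologues rather than members of a larger gene family.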
2.3 Natural selection analyses
To identify episodes of positive selection, we used codon models in codeml of PAML v. 4.4. We first implemented branch-site model MA to identify sites under selection [30,31] on five focal branches: (i) Whippomorpha (Hippopotamidae + Cetacea); (ii) Cetacea; (iii) Mysticeti; (iv) Odontoceti; and (v) the terminal branch of H. amphibius (figure 1). Each branch-site model was compared to a null model using the likelihood ratio test (LRT) with 1 d.f., and sites with Bayes Empirical Bayes posterior probabilities of more than 0.50 were considered significant. To ensure that elevated ω-values (ω>1) represented genuine selection acting on genes, rather than alignment errors, we filtered out genes in which positively selected sites (PSSs) were found to be highly aggregated within the CDS alignment (median interval distance of PSSs ≤10 codons). The exact numbers of datasets that remained in each clade or branch after filtering are given in table 1. Following this step, the associated p-values from the LRTs were corrected for multiple testing according to the Benjamini & Hochberg procedure, which controls the false discovery rate (FDR; Q-value), at Q<0.10. For further analyses using codon models, see the electronic supplementary material, methods, results and discussion, and table S3.
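The three statistical steps described above (the 1 d.f. LRT, the filter on aggregated PSSs and the Benjamini & Hochberg correction) can be sketched as follows. This is a dependency-free illustration rather than the scripts used in the study: the function names are hypothetical, log-likelihoods are assumed to have been read from codeml output beforehand, and treating a gene with a single PSS as not aggregated is an assumption.

```python
import math
from statistics import median


def lrt_pvalue(lnl_null, lnl_alt):
    """P-value of a likelihood ratio test with 1 d.f.

    For a chi-square variable with one degree of freedom the survival
    function equals erfc(sqrt(x / 2)), so no external library is needed.
    """
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    return math.erfc(math.sqrt(stat / 2.0))


def pss_highly_aggregated(sites, max_median_gap=10):
    """True if the median distance between consecutive PSSs is <= 10 codons."""
    ordered = sorted(sites)
    if len(ordered) < 2:
        return False  # assumption: a single site cannot be aggregated
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    return median(gaps) <= max_median_gap


def benjamini_hochberg(pvalues):
    """Return Benjamini & Hochberg adjusted Q-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues
```

Under this sketch a gene would be reported only if its PSSs are not highly aggregated and its adjusted Q-value falls below 0.10, with the correction applied after the aggregation filter, as in the order given above.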
2.4 Network analysis of protein–protein interactions
For genes under positive selection, we built protein–protein interaction networks using Igraph v. 0.7.0 . We interrogated the STRING database  with the human and dolphin Ensembl gene identifiers for combined interaction scores. To visualize proteins with common functions, we obtained gene ontology (GO) terms under the biological processes domain using topGO . Corresponding GO term names were exported from Ensembl and AmiGO 2 v. 2.1.4, and GO terms were grouped into functional categories that we predicted to be important during the evolution of cetaceans and hippos (electronic supplementary material, methods). These grouped GO terms were then mapped onto each network.
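As a rough illustration of how such a network is assembled and how highly connected proteins are ranked, the sketch below builds an undirected graph from scored protein pairs using plain Python rather than igraph. The score cut-off of 400, the function names and the placeholder protein labels in the usage example are assumptions for illustration only; they are not taken from the study.

```python
from collections import defaultdict


def build_network(edges, min_score=400):
    """Build an undirected adjacency map from (protein_a, protein_b, score).

    `min_score` is a hypothetical cut-off on the combined interaction score;
    pairs scoring below it, and self-interactions, are ignored.
    """
    adjacency = defaultdict(set)
    for a, b, score in edges:
        if a != b and score >= min_score:
            adjacency[a].add(b)
            adjacency[b].add(a)
    return dict(adjacency)


def hubs(adjacency, top_n=3):
    """Return the top_n proteins by number of connections (degree)."""
    ranked = sorted(adjacency, key=lambda p: (-len(adjacency[p]), p))
    return [(p, len(adjacency[p])) for p in ranked[:top_n]]


# Toy input with placeholder labels, not real interaction data.
toy_edges = [
    ("P1", "P2", 700),
    ("P1", "P3", 650),
    ("P1", "P4", 500),
    ("P2", "P3", 450),
    ("P5", "P6", 300),  # below the cut-off, so excluded
]
network = build_network(toy_edges)
print(hubs(network, top_n=1))
```

Only proteins with at least one retained interaction appear in the adjacency map, which mirrors counting proteins "with at least one interaction" when describing a network.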
2.5 Test of functional enrichment
For all genes under selection in our focal taxa, regardless of protein–protein interactions, we also tested for functional enrichment using the topGO package  (electronic supplementary material, methods, and results and discussion).
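The idea behind a GO enrichment test can be shown with a one-sided hypergeometric test, which asks whether a GO term is carried by more of the selected genes than expected by chance given the background set. This is a simplified stand-in: topGO offers several algorithms that also account for the GO graph structure, and the function name and arguments here are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(n_universe, n_annotated, n_selected, n_overlap):
    """Return P(X >= n_overlap) under the hypergeometric distribution.

    n_universe:  number of genes tested (the background set)
    n_annotated: background genes carrying the GO term
    n_selected:  genes under positive selection
    n_overlap:   selected genes carrying the GO term
    """
    upper = min(n_annotated, n_selected)
    total = comb(n_universe, n_selected)
    tail = sum(
        comb(n_annotated, k) * comb(n_universe - n_annotated, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total
```

Because one such test is run per GO term, the resulting p-values would themselves need correcting for multiple testing before interpretation.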
3. Results and discussion
3.1 RNA sequencing, assembly and orthologue identification
The number of candidate one-to-one orthologues across eight whippomorph species ranged from 6249 to 16 047 genes (electronic supplementary material, table S2). We combined annotated CDSs from bottlenose dolphin- and human-anchored blast searches (more than or equal to 50% coverage of the full gene length in the human genome), yielding 9267 and 14 234 one-to-one orthologues, respectively (electronic supplementary material, table S2). By sampling these sequences across additional laurasiatherian mammals, we built 11 925 gene alignments, each of which contained at least one member of each of the two extant cetacean suborders (Mysticeti and Odontoceti). Moreover, over half of these (n=6845) also contained the hippo sequence.
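The taxon-coverage criteria described above can be expressed as a short filter. The sketch below assumes each alignment is represented simply by the list of species it contains; the species sets follow the taxon sampling in section 2.1, while the function names and the dictionary layout are illustrative assumptions.

```python
MYSTICETI = {
    "Megaptera novaeangliae",
    "Balaenoptera acutorostrata",
    "Balaenoptera physalus",
}
ODONTOCETI = {
    "Physeter macrocephalus",
    "Sousa chinensis",
    "Neophocaena phocaenoides",
    "Tursiops truncatus",
    "Orcinus orca",
    "Lipotes vexillifer",
}
HIPPO = "Hippopotamus amphibius"


def usable_for_cetacean_tests(species):
    """True if at least one mysticete and one odontocete are present."""
    present = set(species)
    return bool(present & MYSTICETI) and bool(present & ODONTOCETI)


def usable_for_hippo_tests(species):
    """True if the alignment also contains the hippo sequence."""
    return usable_for_cetacean_tests(species) and HIPPO in set(species)


def summarise(alignments):
    """Return (n alignments with both suborders, n of those with hippo)."""
    cetacean = [g for g, sp in alignments.items() if usable_for_cetacean_tests(sp)]
    with_hippo = [g for g in cetacean if usable_for_hippo_tests(alignments[g])]
    return len(cetacean), len(with_hippo)
```

Applied to the full dataset, the two counts returned by `summarise` correspond to the 11 925 alignments containing both suborders and the 6845 that also include the hippo.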
3.2 Scans for signatures of natural selection
To identify loci under positive selection at a genomic scale, we used branch-site codon models to estimate the ratio of non-synonymous to synonymous substitution rates (dN/dS) or omega (ω) on the ancestral branches of Whippomorpha, Cetacea, Mysticeti and Odontoceti, as well as the terminal hippo branch ((i), (ii), (iii), (iv) and (v), respectively; figure 1). Of 6845 genes containing the hippo sequence, we found 106 genes showing PSSs on the ancestral branch of Whippomorpha, as well as 201 genes on the hippo terminal branch (table 1). Across all 11 925 genes tested, signatures of molecular adaptation were detected in 391 genes on the ancestral branch of Cetacea, compared to 439 and 335 genes on the ancestral branches of Mysticeti and Odontoceti, respectively (table 1). Because signals of positive selection can sometimes arise from alignment errors, for each gene we inspected the distribution of sites with high omega values (ω>1) and filtered out genes (n=3323) in which such signals were highly aggregated (see Material and methods; table 1; electronic supplementary material, table S3). As a result, the numbers of genes retained as under positive selection decreased from 106 to 43 for the ancestral branch of Whippomorpha, and from 201 to 64 for the hippo branch (table 1; electronic supplementary material, table S5). Similarly, excluding genes with highly aggregated PSSs in the three ancestral cetacean branches, we retained 200 genes in the last common ancestor of Cetacea, compared to 173 and 125 genes on the ancestral branches of Mysticeti and Odontoceti, respectively (table 1; electronic supplementary material, table S5). Finally, we applied the FDR correction (Q<0.10) for multiple testing, further reducing the numbers of genes to between one and 15 (table 1), although these numbers are likely to be underestimates given our large sample sizes and strict filtering regime.
3.3 Functional annotation of genes under selection
To compare whether sets of genes under selection in the hippo versus cetaceans show broad differences in both functional role and degree of interactions, we plotted interactions among protein products of genes found to be under positive selection (figure 2; electronic supplementary material, methods and figure S1). The hippo network consisted of 20 proteins with at least one interaction, and a key cluster was centred around serum albumin, ALB (see insets figure 2). In comparison, the network constructed for cetaceans comprised more proteins with a greater degree of connectedness; 105 proteins were connected to at least one other protein (figure 2), while several (e.g. GAPDH, EP300, CDH1) had high numbers of connections suggesting that they are important ‘hubs’. The network with the greatest concentration of connections centred around GMPS, a protein involved in guanine synthesis. By mapping GO terms onto these networks, we identified several instances in both taxa in which proteins associated with common tissue types and/or functions were clustered together. For example, in the hippo network, the linked proteins HMGCR and ALB are both associated with circulation (figure 2a), whereas in the cetacean network neighbouring proteins with functions related to the nervous system were centred around the hub of CDH1 (figure 2b). We also found clustering among proteins involved in other functions; for example, related to cell cycle and ageing in cetaceans (electronic supplementary material, figure S1A), and to lipids in both networks (electronic supplementary material, figure S1B). With one exception, proteins related to hypoxia and DNA repair occurred only in the cetacean network (RAD52, ERCC5 and SMC6), although clustering was limited (electronic supplementary material, figure S1C). No clustering was seen in proteins related to fluid, kidneys, lungs or sensory perception in either network (electronic supplementary material, figure S1D–G). 
Additional evidence of GO enrichment was restricted to cetaceans, and affected genes involved in brain, blood clotting and sensory perception (electronic supplementary material, results and discussion, tables S7–S9).
3.4 Molecular adaptations in the hippo and whippomorph ancestors
Among the 64 genes that were found to have undergone positive selection along the hippo branch (electronic supplementary material, table S5), we found several associated with lipid metabolism, including those with key roles in the biosynthesis and absorption of cholesterol (i.e. HMGCR, CYP2J2 and CYP8B1; [36–38]) as well as genes linked to metabolic disorders and/or obesity, including CPXM1 and PON3 [39–41]. Other genes seen to be under positive selection in the hippo branch alone are known to function in glucose regulation (e.g. PDK4; [42–44]). Interestingly, PDK4 has also been found to be under selection in another aquatic mammal, the walrus. Another gene with a related metabolic function is AGL, involved in glycogen degradation [45,46]. Genes potentially linked to obesity may be related to the hippo's comparatively large size and capacity for fat storage, although the physiology of hippos in general remains little explored. Other genes showing molecular adaptation in the hippo encode proteins that might relate to the unusual demands placed on its circulatory system; notably albumin, a constituent of blood plasma that helps regulate osmotic pressure. Indeed, aside from the need to rapidly cool their skin, hippos appear to experience several circulatory changes during dives, including bradycardia (low heart rate), while at the same time maintaining their arterial blood pressure. We also found positive selection in PER1, which encodes an essential component of the circadian clock, as well as genes associated with muscle function (e.g. CKMT2).
To identify genes important in the early evolution of the Whippomorpha, we examined the ancestral branch and recovered 43 genes under positive selection (electronic supplementary material, tables S5–S6). Of these, we were able to verify PSSs in nine genes for both hippo and cetaceans; in the remaining genes, the hippo sequences had missing data at the amino acid sites identified as being under positive selection. Positively selected genes included CPT1A, a gene associated with type II diabetes and involved in fatty acid oxidation, and XRCC6, which codes for a DNA repair protein.
3.5 Molecular adaptations during cetacean evolution
As in the hippo, genes underpinning metabolism were also found to be under selection in the cetaceans. Indeed, our results indicate that as many as 25 positively selected genes in cetaceans are involved in sugar metabolism, insulin availability or lipid metabolism. For example, two solute-carrier genes (SLC5A10 and SLC9B2) together with LMF1 and MARCH6 have all been implicated in aspects of diabetes, obesity and/or body mass index (electronic supplementary material, table S9). In light of our results, it is noteworthy that the bottlenose dolphin has been proposed as an emerging model for studying type II diabetes based on reports that fasting individuals retain comparatively high glucose levels; this diabetic state may be related to the demand to provide glucose to the brain while diving.
Many of the other amino acid changes that we found in the cetacean branches also appear to correspond to their ability to dive and resist oxidative stress. Indeed, some species dive to extraordinary depths; for example, Cuvier's beaked whale (Ziphius cavirostris) can reach more than 1000 m . To do this, cetaceans collapse lungs, sequester blood in retia mirabilia and maintain higher haemoglobin and myoglobin concentrations than terrestrial mammals . Our genome-wide scans of nine cetacean species revealed selection in key hypoxia-related genes, including DDIT4, EP300 and MGEA5. MGEA5 interacts directly with the product of OGT, a hypoxia gene that has undergone massive gene copy number expansion in cetaceans . We also found evidence for molecular adaptation in at least 13 genes involved in muscle and/or heart development and contraction in cetaceans (electronic supplementary material, table S9). Other studies have reported evidence that cetaceans have developed molecular adaptations to compensate for the lack of oxygen during dives. For example, in many diving aquatic mammals including cetaceans, myoglobin (a protein that stores oxygen in muscles) has evolved to have a greater charge, decreasing the tendency of molecules to clump and therefore increasing oxygen storage capacity in muscle cells .
Strikingly, we also found selection in eight genes related to blood clotting or platelet formation. For example, SERPINC1 produces the protein antithrombin, which interrupts the formation of blood clots . It is notable that SERPINC1 was identified as being under positive selection in addition to containing a convergent amino acid change in two cetaceans (T. truncatus and O. orca), the walrus and manatee, in a recent study of marine mammal genomes . Clotting in cetaceans differs from terrestrial mammals in that there is a relative lack of scab formation after wounding [55,56]. This reduced clotting has been attributed to the contact of blood with water rather than air, as well as the need to sequester blood in stagnant reservoirs while diving . Moreover, cetaceans show accelerated wound healing, reducing infection and accelerating tissue repair .
We identified signatures of selection in at least 46 genes in the common ancestor of cetaceans associated with the nervous system and brain development, which is interesting given that cetaceans are characterized by large absolute brain sizes, large brain-to-body mass ratios, high numbers of neocortical neurons and high cognitive capacity. Indeed, many of the genes we found have been implicated in neurological disorders in humans, such as microcephaly, mental retardation, major depressive disorder and Alzheimer's disease (electronic supplementary material). PSSs were also detected in genes involved in myelination, neural connectedness, axonal guidance, cognition, neuronal development and neural progenitor cell proliferation. Previous studies have also reported some nervous system genes to be under selection in the Tursiops genome [20–22]; however, owing to the number of cetacean taxa included, our study was able to localize positive selection to distinct branches within the tree. In addition, we found that 15 nervous system-related genes showed evidence of positive selection on the mysticete ancestral branch, while six genes contained PSSs on the odontocete ancestral branch. These results are contrary to expectations, as mysticetes, while possessing large absolute brain size in some species, have smaller brain to body size ratios than odontocetes and might be expected to have fewer nervous system genes under selection.
The transition from a terrestrial to a wholly aquatic environment means that cetaceans must depend on the properties of water for the transmission of light and sound. Adaptations for living in low light include a thickened cornea, spherical lens and reduced numbers of cones . We determined whether cetaceans show positive selection in loci related to visual perception and found evidence in eight genes, some of which are known to be expressed in the cornea and/or retina, and are otherwise implicated in visual diseases (electronic supplementary material). These results complement earlier findings that cetaceans show several functional molecular changes in (or loss of) their opsin genes [63,64]. We also found molecular adaptation in five genes underpinning hearing; however, despite the fact that the toothed whales have evolved extremely high-frequency sound perception, none of these genes were exclusive to this group. Instead, the gene TNC was found to be under selection in the mysticetes only, while TECTA, JAG2 and USH1C were under selection in the ancestral branch of all cetaceans (although the latter also showed selection in odontocetes). These hearing genes add to the growing number that has been reported to be of potential importance in whales and dolphins [65–69].
In general, we find a much wider range of molecular adaptations in the cetaceans than hippos, probably reflecting their more derived body plan. For example, apart from those loci already discussed, molecular changes were also found in genes related to the kidneys (n=8) and skin/hair (n=8) (electronic supplementary material, table S9). Compared to terrestrial mammals, cetacean epidermis typically grows more quickly with fewer layers and is also less keratinized with increased cellular production and lifespan (e.g. [70,71]). Although other studies have identified the integument as a target of molecular evolution in whales and dolphins, these were unable to rule out the possibility that such changes occurred prior to the split with the hippos [20–22,24,72].
When not diving, cetaceans are exposed to the potentially harmful effects of solar radiation. We found many genes under selection that are involved in DNA repair (n=5) and/or cancer suppression (n=13) in cetaceans, particularly in the mysticete lineage, where we discovered at least 12 of these genes under positive selection, including RAD52, which is essential for double-strand DNA break repair and genome maintenance, both relevant to cancer prevention (electronic supplementary material, table S9). Aside from sun exposure, adaptive modification of genes involved in DNA damage repair and tumour suppression may serve to overcome the predicted 1000-fold increase in cancer risk thought to arise as a result of the increased number of cell divisions in these exceptionally large and long-lived mammals. Indeed, Mysticeti contains both the largest (Balaenoptera musculus) and oldest recorded (Balaena mysticetus; more than 200 years) mammalian species, yet these species do not show elevated rates of cancer. Our detection of positive selection in DNA damage-related genes in cetaceans augments the result of GO analyses (also see [23,24]).
Overall, we find little support for shared ancestral aquatic adaptations in hippos and cetaceans. In particular, while many molecular adaptations thought to be important for the aquatic environment were recorded on ancestral cetacean branches (mysticetes, odontocetes or both), a comparison of coding sequences that were available for all focal members of the Whippomorpha revealed only a few cases of positive selection along the ancestral branch of the entire clade. Explanations for these findings, apart from the greater degree of morphological adaptation for aquatic existence in cetaceans, include the fragmentary nature of the hippo RNA-sequencing data as well as the relatively short evolutionary time separating the split of Whippomorpha from Ruminantia and the subsequent divergence of hippos from cetaceans. Consequently, our results suggest that cetaceans and hippos evolved most aquatic adaptations separately. On the other hand, we found similar selection pressures acting on genes implicated in lipids in both groups, and more work is needed to determine whether these signatures are related to the specialized lipid-rich integuments that characterize semi-aquatic and aquatic animals.
The new short read data for the hippo and humpback whale have been deposited in the SRA of GenBank under the accession nos. SRR2183469 and SRR2183423. Alignments have been submitted to Dryad: http://dx.doi.org/10.5061/dryad.4cp98.
S.J.R. conceived the project, and together with G.T. designed the study. S.A. and A.P. collected the whale samples and performed the RNA extractions. M.F.B. provided the hippo samples. G.T. assembled the data and performed the molecular evolution analyses, with input from K.J.T.D., M.R.M. and S.J.R. K.J.T.D. performed the protein–protein interaction and network analyses, with input from G.T., M.R.M. and S.J.R. G.T., M.R.M. and S.J.R. drafted the manuscript. All other authors assisted in revising the manuscript. All authors read and approved the final manuscript.
We declare we have no competing interests.
This work was funded by the European Research Council (ERC Starting grant no. 310482) awarded to S.J.R. and a Newton International Fellowship awarded to M.R.M.
We thank S. Bailey, J. Parker, K. Warren and H. Oliveira for helpful advice and discussions. We are also grateful to C. Walker (QMUL GridPP High Throughput Cluster) for providing access to computing facilities. Analyses were performed with the assistance of SBCS-Informatics (http://informatics.sbcs.qmul.ac.uk) and the EPSRC-funded MidPlus cluster at Queen Mary University of London. Illustrations were by C. Buell and provided by J. Gatesy.
- Received April 21, 2015.
- Accepted September 2, 2015.
© 2015 The Authors. Published by the Royal Society under the terms of the Creative Commons Attribution License http://creativecommons.org/licenses/by/4.0/, which permits unrestricted use, provided the original author and source are credited.