## Abstract

Same-sex sexual behaviour is ubiquitous in the animal kingdom, but its adaptive origins remain a prominent puzzle. Here, I suggest the possibility that same-sex sexual behaviour arises as a consequence of the competition between an evolutionary drive for a wide diversity in traits, which improves the adaptability of a population, and a drive for sexual dichotomization of traits, which promotes opposite-sex attraction and increases the rate of reproduction. This trade-off is explored via a simple mathematical ‘toy model’. The model exhibits a number of interesting features and suggests a simple mathematical form for describing the sexual orientation continuum.

## 1. Introduction

When a particular behaviour or trait is widespread across a group of animals, its origin is usually explained in terms of the fitness advantage that it confers. Such explanations attempt first to understand how the fitness of the animal population has a dependence on the degree to which it exhibits a given trait. It is then assumed that the processes of evolution and natural selection bring the population close to the point of maximal fitness. (See, for example, references [1,2] for a review.)

Given this paradigm, the prevalence of same-sex sexual behaviour in the animal kingdom has presented something of a puzzle. Same-sex sexual behaviour is ubiquitous across the animal kingdom, and has been catalogued in hundreds of animal species in ways that range from same-sex courtship and copulation to long-term pair bonding and parenting. (See, for example, reference [3] for an extensive review.) This ubiquity suggests the possibility that same-sex behaviour is associated with some kind of fitness advantage. The nature of this advantage, however, remains poorly understood, and is a source of considerable scientific debate. The puzzle is particularly pronounced, because same-sex attraction ostensibly has a significant cost, in the sense that it can reduce the frequency of mating between opposite-sex pairs, and thereby lower the rate of reproduction.

A number of hypotheses have been proposed to explain the origin of same-sex sexual behaviour in animals. These are reviewed, for example, in references [4,5], but a few of the more prominent hypotheses are briefly listed as follows. One hypothesis is that such behaviours arise primarily because of their role in maintaining social bonds, alliances and dominance hierarchies among members of the same sex. Another possible mechanism is that same-sex courting or mating provides ‘practice’ that improves the odds of success in later mating attempts with the opposite sex. Some studies have also considered the ‘kin selection’ hypothesis, which posits that same-sex sexual behaviour in one individual provides a genetic advantage to the individual's siblings, and on the whole provides an advantage to the family genetic line. Finally, there are genetically motivated hypotheses, such as the idea that genes promoting same-sex sexual behaviour in a homozygous state may confer a fitness advantage when in a heterozygous state, or the idea that an allele promoting same-sex sexual behaviour in one sex may increase the fitness of the opposite sex. (Table 2 of reference [5] provides a summary of these and other hypotheses, along with further references.)

The purpose of this paper is to define and consider an interesting mathematical problem that can be said to describe a different potential mechanism for the adaptive origins of same-sex sexual behaviour. Central to this proposed mechanism are two ideas: first, that having a diversity of traits among a given group confers a fitness advantage, and second, that the sexual attraction of one individual to another is determined by the traits of the other, rather than by their genetic sex. These two ideas together imply that the breadth of traits present within a given sex is pulled in opposite directions by two competing factors. On the one hand, the unpredictable environment favours a wide distribution of traits. On the other hand, the sexual nature of reproduction favours a dichotomizing of traits according to each individual's biological sex. Such a dichotomy promotes opposite-sex attraction, thereby increasing the number of offspring. The purpose of this paper is to explore the idea that a balance exists between these two factors that naturally leads to a finite degree of same-sex sexual attraction.

It should be stated up front that this paper is not intended to be taken as a realistic model for explaining the sexual behaviours or sexual orientations of any particular animal group.^{1} Instead, I focus only on a simple mathematical problem, which represents a minimal description of a possible trade-off between diversity of traits and sexual dichotomization. Further, this analysis considers only the distribution of traits and preferences that maximize the expected fitness of the population as a whole. Whether this kind of optimum can be expected to be produced by evolution and natural selection is a delicate question, and depends on the mechanisms by which sexual traits and sexual preferences are (or are not) inherited [1,2]. These questions are not considered here, and as such this paper is best read as merely an interesting mathematical problem that is *inspired* by the question of the origins of diversity in animal sexual behaviour. The hope is that this problem, and its solution, can inspire future discussion and more accurate models.

The remainder of this paper is devoted to proposing and exploring a simple ‘toy model’, which considers the optimal distribution of a single trait among the population. The distribution is completely determined by a single parameter *t* that describes the relative importance of phenotypic variation for the species fitness. The value of the parameter *t* determines both the distribution of traits among the population and the prevalence of same-sex pairing, both of which can be described analytically. The model exhibits a number of interesting mathematical features, including a series of bifurcations in the trait distribution and in the distribution of sexual orientations as a function of *t*. At small *t*, both distributions acquire a simple mathematical form. Results from the model are discussed in the context of data on human sexual orientation.

## 2. Model

In the toy model that is the subject of this paper, it is imagined that all individuals are characterized by a single trait whose value *x* ranges from 0 to 1. Suppose, for concreteness, that females tend to have values of *x* that are closer to 1, whereas males tend to have values of *x* that are closer to 0. Under this description, each sex is characterized by two probability density functions: one describing the probability of possessing a certain trait value *x*, and the other describing the probability of desiring a trait value *x*_{c} in a mate. The distributions of the trait value *x* are denoted *p*(*x*) and *q*(*x*) for males and females, respectively. The distribution of the desired trait value *x*_{c} is denoted *p*_{c}(*x*_{c}) for males and *q*_{c}(*x*_{c}) for females. The four distributions are summarized graphically in figure 1. It is assumed that *x* and *x*_{c} are independent variables, so that the trait value *x*_{c} that an individual desires in a mate is independent of the trait value *x* possessed by the individual itself.

In principle, these four distributions can be completely distinct from each other. However, in order to simplify the model, I introduce the following two assumptions. The first assumption is that there is a symmetry between the two sexes, such that each sex is equivalent to the other under a redefinition of the value of the trait *q*(*x*)=*p*(1−*x*) and *q*_{c}(*x*_{c})=*p*_{c}(1−*x*_{c}). The second assumption is that the number of individuals possessing trait value *x* is equal to the number of individuals desiring the trait value *x* in a partner. This assumption guarantees that there is ‘someone for everyone’, and is equivalent to the conditions that *p*_{c}(*x*)=*q*(*x*) and *q*_{c}(*x*)=*p*(*x*). These two assumptions together imply that there is only one relevant distribution *p*(*x*) for describing the two sexes, and that all others can be related to it by *p*_{c}(*x*)=*q*(*x*)=*p*(1−*x*) and *q*_{c}(*x*)=*p*(*x*) (figure 1).

Now, consider a population consisting of a very large number *N* of individuals, and suppose that the individuals all become paired with each other in such a way that every individual's desire for the trait value of their partner is satisfied. The proportion of heterosexual pairings that result from this process can be calculated as follows.

Consider two different trait values *x*_{1} and *x*_{2}. One can now define two groups of individuals: (i) those who possess trait value in the infinitesimal interval (*x*_{1},*x*_{1}+d*x*_{1}) and desire trait value (*x*_{2},*x*_{2}+d*x*_{2}) in a partner, and (ii) those who similarly possess *x*_{2} and desire *x*_{1}. These two groups are referred to as ‘group 1’ and ‘group 2’, respectively. The number of males in group 1 is given by *M*_{1}=*N*⋅*p*(*x*_{1}) d*x*_{1}⋅*p*_{c}(*x*_{2}) d*x*_{2}. Similarly, the number of females in group 1 is *F*_{1}=*N*⋅*q*(*x*_{1}) d*x*_{1}⋅*q*_{c}(*x*_{2}) d*x*_{2}. For group 2, one can likewise define the number of males and females as *M*_{2}=*N*⋅*p*(*x*_{2}) d*x*_{2}⋅*p*_{c}(*x*_{1}) d*x*_{1}, and *F*_{2}=*N*⋅*q*(*x*_{2}) d*x*_{2}⋅*q*_{c}(*x*_{1}) d*x*_{1}, respectively. Because of the symmetry of the distributions *p*(*x*) and *q*(*x*), the total number of individuals *M*+*F* is the same in both groups. One can, therefore, pair the two groups in such a way that each individual in group 1 is paired with an individual in group 2. If these pairings are selected at random, then the proportion of heterosexual pairings is (*M*_{1}*F*_{2}+*F*_{1}*M*_{2})/(*M*+*F*)^{2}, and the number of heterosexual pairings between the two groups is *dN*_{het}(*x*_{1},*x*_{2})=(*M*_{1}*F*_{2}+*F*_{1}*M*_{2})/(*M*+*F*). To find the total number of heterosexual pairings across the entire population, one can integrate *dN*_{het}(*x*_{1},*x*_{2}) over all values of *x*_{1},*x*_{2}. Inserting the expressions for *M*_{1,2} and *F*_{1,2} gives
*N*_{het} is maximized when the distributions of possessed traits and desired traits, *p*(*x*) and *p*(1−*x*), have zero overlap (i.e. when *p*(*x*)*p*(1−*x*)=0 everywhere). In this case, all pairings are heterosexual, *N*_{het}=*N*. If each heterosexual pairing produces *b* offspring on average, then the number of individuals in the next generation is *bN*_{het}.

On the other hand, one may expect finite overlap between *p*(*x*) and *p*(1−*x*) in situations where there is a fitness advantage conferred by each sex having a wide diversity in traits. In particular, one can define the trait *entropy* of the next generation as
*s* of the distribution *p*(*x*), multiplied by the number of individuals in the population. The entropy *S* is maximized when *p*(*x*)≡1, i.e. when every trait value is equally likely for each individual, regardless of sex. Presumably, when the environment is such that there is pressure to produce offspring and also pressure to maintain a diversity of traits, the distribution *p*(*x*) will reach a steady-state that involves a trade-off between maximizing the number of offspring and maximizing the entropy of the trait distribution (figure 1*b*,*c*).

To model that trade-off, I introduce a generic fitness function *F* that consists of a term proportional to the total offspring number plus a term proportional to the trait entropy. In other words, the proposed fitness function is
*u*_{0} and *T*_{0} are constants that arise from environmental pressures and are independent of the distribution *p*(*x*). Dividing both sides of this equation by *u*_{0}*Nb* one arrives at a renormalized fitness function *f*=*F*/(*u*_{0}*Nb*) that is a function of only a single parameter *t*
*n*=*N*_{het}/*N* (see equation (2.1)) and *s*=*S*/(*bN*_{het}) (see equation (2.2)) are functionals of the trait distribution *p*(*x*), and *t*=*T*_{0}/*u*_{0} is a dimensionless ‘entropy parameter’ that characterizes the relative importance of trait diversity. (In this sense, *t* plays a role similar to that of the temperature in the Helmholtz free energy of statistical physics.) When *t*=0, the optimum distributions have no overlap between male and female traits, and all pairings are heterosexual (*n*=1). When *p*(*x*)≡1, and heterosexual and homosexual pairings are equally likely (*n*=1/2).

In the remainder of this paper, results are presented for the distribution *p*(*x*) at different values of the parameter *t*. The primary tool used for finding the optimal *p*(*x*) is a numerical Monte Carlo algorithm, which is described in appendix A. Briefly, this algorithm divides the trait interval [0, 1] into discrete points *x*_{i}, and makes an initial guess for the function *p*(*x*_{i}). The values of *p*(*x*_{i}) are then optimized by making random deviations from the initial guess, and then evaluating the corresponding change to the population fitness *f*. Changes are kept or discarded according to the Metropolis algorithm, and the procedure is iterated until a convergent solution is found.

Once the distribution *p*(*x*) is known, one can also examine the corresponding distributions of ‘sexual orientation’ *θ*, which is defined as the probability of a given individual pairing with a same-sex rather than with an opposite-sex partner. In particular, for an individual (say, a male) that prefers a trait value *x*_{c} in a partner, one can define the orientation *ϑ*(*x*_{c}) of the individual as the proportion
*θ* as
*δ*(*x*) is the Dirac delta function. In the following section, results are presented for both the trait distribution *p*(*x*) and the orientation distribution *P*(*θ*) as a function of the entropy parameter *t*.

## 3. Results

When the entropy parameter is large, *t*≫1, the trait distribution becomes flat, *p*(*x*)≡1, which maximizes the trait entropy at the cost of reducing the total number of offspring by 50%. In fact, the optimal distribution is precisely equal to *p*(*x*)≡1 for all values of *t*≥4. Only at *t*<4 do traits begin to specialize according to sex. At *t* slightly smaller than 4, the distribution *p*(*x*) acquires a step-like shape, with traits corresponding to *x*<1/2 being more prevalent in males, and traits with *x*>1/2 being more prevalent in females. This transition is depicted in figure 2*a*.

One can describe the transition at *t*=4 analytically by writing the distribution *p*(*x*) as
*c* is a parameter to be determined. Inserting this distribution into equations (2.1) and (2.2), one can evaluate the frequency *n* of opposite-sex pairing as *n*=3/2−1/(1+*c*^{2}), and the trait entropy as *c* gives a fitness function *f*=1/2+*c*^{2}(1−*t*/4)−*c*^{4}*t*/2, which is minimized when
*t*≥4, the optimal fitness is provided when *c*=0, and the trait distribution is uniform. At *t*<4, on the other hand, there emerges a difference in trait distributions between the two sexes, with a magnitude *c* that grows as

This splitting also has an implication for the distribution of sexual orientations, *P*(*θ*). At *t*>4, when the trait distribution is uniform, all individuals have orientation *θ*=1/2, because there is no sexualization of traits. When *t* is lowered below 4, on the other hand, there emerge two classes of orientation: *θ*=(1±*c*)/2. The former class (with a majority preference for same-sex partners) comprises a smaller proportion (1−*c*)/2 of the population. The latter class (with a majority preference for opposite-sex partners) comprises a larger proportion (1+*c*)/2. In other words, the distribution of orientation *P*(*θ*) is such that *P*(*θ*)=*δ*(*θ*−1/2) at *t*>4, whereas at *t* slightly less than 4, one has *P*(*θ*)=((1+*c*)/2)*δ*(*θ*−(1−*c*)/2)+((1−*c*)/2)*δ*(*θ*−(1+*c*)/2). This bifurcation of the orientation distribution is depicted in figure 2*b*.

As *t* is reduced even further, the trait distribution undergoes a sequence of additional splittings, as illustrated in figure 2*a*. At *θ*=1/2 emerges in between the other two, and *P*(*θ*) is a sum of three Dirac delta functions. At *t* is reduced an increasingly large number of classes emerge.

When *t* becomes small, *t*≪1, the distribution *p*(*x*) has so many steps that it closely approximates a continuous function. As shown in figure 3, in this limit this function closely matches the form,
*t* at small *t*.

To derive the relation between *t*, one can insert equation (3.2) into equations (2.1) and (2.2). Evaluating the corresponding integrals at small *f*=*n*(1+*ts*) is then minimized when
*t*^{3}, suggesting that equation (3.2) is exact in the limit

Equation (3.2) also implies a specific, continuous form for the distribution of sexual orientations, *P*(*θ*). In particular, evaluating equation (2.5) gives
*t*, the distributions of male and female traits always have finite overlap, and consequently, there are no individuals with strictly heterosexual or homosexual orientation, *θ*=0 or *θ*=1. Consequently, the distribution *P*(*θ*) should be considered to be defined only over the interval [*θ*_{min},*θ*_{max}], where *P*(*θ*) is properly normalized, because

## 4. Discussion

In this paper, I have considered a simple mathematical toy model for the trade-off between sexual dichotomy of traits and trait diversity. Among the more interesting features of the model are the series of sharp transitions in the trait distribution as the parameter *t* is varied, and the ‘Fermi function’ shape of the distribution at small values of *t*. Of course, the model has employed a number of fairly artificial assumptions, most notably the assumption of a single relevant trait that is defined on the interval [0, 1]. Because this assumption is unlikely to be applicable to a real biological population, it may be difficult to find direct empirical comparisons with the trait distribution *p*(*x*).

On the other hand, the model also makes specific predictions about the distribution of sexual orientation, which can in principle be observed. For example, the model suggests that when the relative importance of trait diversity is high (or, equivalently, when the relative importance of producing a large number of offspring is low), the population can be divided into a small number of well-defined groups with similar sexual orientation. As the environment is changed in such a way that trait diversity becomes less important, these groups split into a larger number of groups through a sequence of sharp transitions. Finally, when the value of trait diversity is low, the distribution of sexual orientation becomes continuous and takes the form *P*(*θ*)∝1/*θ*.

In principle, some of these results can be tested empirically by measuring the frequency of same-sex versus opposite-sex mating or pairing for a large number of individuals across an animal population. (Of course, one should be cautious about conflating the observed frequency of same-sex behaviours with the internal preference of an individual for same-sex partners.) Unfortunately, I am unaware of any studies that present sufficient data to construct an empirical version of the distribution *P*(*θ*).

To date, the vast majority of quantitative research about same-sex sexual behaviour focuses on humans. Some studies, beginning with the Kinsey reports, [10,11], have made an effort to assess the relative abundance of different sexual orientations. One can ask, then, how the results from such studies compare with the derived results from the model of this paper.

Such a comparison should, of course, be considered to be extremely speculative in nature. It is unlikely that the diverse range of human sexual behaviours can be described using the simplistic toy model outlined in this paper. What is more, data on sexual orientation in humans usually divides individuals into discrete categories and relies on self-reporting of same-sex sexual behaviour or sexual attraction. All of this makes it difficult to say anything quantitative about the distribution *P*(*θ*).

With these caveats, one can nonetheless make a speculative comparison between the distribution *P*(*θ*) and interview/survey data about human sexual orientation. Such data often categorize individuals according to their position on the Kinsey scale, which describes sexual orientation on a seven-point scale [10]. If this seven-point scale is (dubiously) considered to correspond to evenly distributed intervals of the orientation *θ* in the range [0, 1], then one can compare it directly to the theoretical distribution *P*(*θ*) from the model. Such a comparison is presented in figure 4.

Figure 4 suggests that a very approximate fit to equation (3.4) is possible. This fit gives *t*≈0.4. This relatively small value of *t* is within the regime where the theoretical optimum distribution *p*(*t*) is well approximated by the continuous function of equation (3.2). One notable failure of the model is that it is unable to capture the relatively large proportion of individuals at either extreme of the distribution, *θ*≈0 and *θ*≈1. These extremes correspond to individuals who identify as either ‘completely heterosexual’ or ‘completely homosexual’, and their abundance is apparently greater than can be explained by the simple model proposed here. It remains an interesting question whether such extremization of sexual orientation can arise from optimization of the population fitness, or whether its appearance in the data is better ascribed to other (perhaps psychological or sociological) factors.

Future and ongoing studies may allow us to adjudicate between different proposed mechanisms for the appearance of same-sex sexual behaviour in the animal kingdom. In particular, the mechanism proposed here can be refined or refuted by collecting data on the proportion *θ* of same-sex versus opposite-sex sexual encounters for many individuals across a large animal population, and then checking whether it obeys the characteristic 1/*θ* distribution (as at

Alternatively, one could look for correlations between the rate of same-sex sexual behaviour in an animal species and the diversity of expression of a particular trait. If any such evidence is absent, it would suggest that the origins of same-sex sexual behaviour cannot be described as a simple competition between increased trait diversity and increased sexual dichotomization of traits. It could also suggest that the traits desired by a particular individual (*x*_{c}) are not statistically independent of the traits possessed by the individual (*x*); such non-independence would fundamentally alter the tension between the two terms in equation (2.3). Either way, finding a clever way to measure and study the distribution of biological traits, *p*(*x*), or the distribution of sexual orientations, *P*(*θ*), may prove to be a powerful tool for unravelling the mystery of same-sex sexual behaviour.

Finally, it is worth emphasizing that many of the results presented here are specific to the quantitative form of the assumed fitness function, equation (2.3). For example, one might consider that the degree of trait diversity is better characterized using the variance *s*. (Here, *x*.) Substituting *σ*^{2} for *s* in the fitness function *f* would then give a different mathematical optimization problem, and thus a different trait distribution *p*(*x*) for each value of the parameter *t*. Indeed, for the specific choice *f*=*n*(1+*tσ*^{2}), I find that the optimal distribution *p*(*x*) assumes only one of two extremes: at all *t* larger than some critical value, *t*_{c}≈25, the optimal distribution is *p*(*x*)=1, which corresponds to the *t*<*t*_{c}, the optimal distribution is 2*Θ*(*x*−1/2), where *Θ*(*x*) is the Heaviside step function, which corresponds to the *θ*=1/2) or heterosexual individuals *θ*=1), depending on whether *t* is above or below a critical value.

## Data accessibility

Numerical code corresponding to the analysis performed in §2 and appendix A is available as a part of this article's electronic supplementary material.

## Competing interests

I have no competing interests.

## Funding

This work is unfunded.

## Acknowledgements

I am grateful to several anonymous reviewers, whose comments and criticisms led to significant improvements in this manuscript. I am also grateful to multiple friends (who probably prefer to remain anonymous), who encouraged me in this work even when I felt like its value was highly dubious.

## Appendix A. Numerical optimization of *p*(*x*)

In §2, a model is introduced that relates the fitness *f* of the population to the trait distribution *p*(*x*). Written out explicitly, this relation is
*t*, there is a specific distribution *p*(*x*) that maximizes equation (4.1). This distribution can be found numerically using the following method.

First, the interval [0, 1] is divided into a set of *M* regularly spaced points, {*x*_{i}}. The results presented here use *M*=60. An initial guess is then made for the values of the distribution, *p*(*x*_{i}), consistent with the normalization constraint
*p*(*x*_{i})≡1. A Metropolis-type algorithm is then used to incrementally update the values of *p*(*x*_{i}) in such a way that the maximum of *f* is increasingly approached. Specifically, the algorithm consists of repeatedly choosing random pairs of points *x*_{i} and *x*_{j}, and then updating the values *p*(*x*_{i}) and *p*(*x*_{j}) such that *δ* is chosen at random from a small interval; results presented here use *δ*∈(0,0.01). After each update, the change *δ*_{f} in the fitness is evaluated. If *δ*_{f} is positive, then the update is kept. If *δ*_{f}<0, on the other hand, then the update is reverted with probability *β* is an ‘inverse temperature’ parameter that determines the rate of convergence of the solution and the final numerical accuracy.

Results presented in §3 use a process of successively increasing values of *β*, starting at *β*=10^{5} and gradually increasing to *β*=10^{11}. At each value of *β*, a large number, 10^{4} *M*, of updates is attempted to ensure convergence of the solution. Care was taken to ensure that the solution converged to the same result for different random realizations of the numerical procedure.

Finally, one can note that equation (4.1) has no explicit dependence on the value of *x*, and therefore, the numerical procedure does not, in general, find a set of values {*p*(*x*_{i})} that is meaningfully ordered as a function of *x*_{i}. One can therefore arrange the numerical values {*p*(*x*_{i})} in order of decreasing value, and the resulting solution produces the same value of the fitness *f*.

## Footnotes

- Received June 21, 2016.
- Accepted August 17, 2016.

- © 2016 The Authors.

Published by the Royal Society under the terms of the Creative Commons Attribution License http://creativecommons.org/licenses/by/4.0/, which permits unrestricted use, provided the original author and source are credited.