Social living ultimately depends on coordination between group members, and communication is necessary to make this possible. We suggest that this might have been the key selection pressure acting on the evolution of language in humans, and we use a behavioural coordination model to explore the impact of communication efficiency on social group coordination. We show that when language production is expensive but there is an individual benefit to the efficiency with which individuals coordinate their behaviour, the evolution of efficient communication is selected for. Contrary to some views of language evolution, the evolution of language is necessarily slow, because there is no advantage in some individuals evolving communication abilities that greatly exceed those of the community at large. However, once a threshold competence has been achieved, the evolution of higher order language skills may indeed be rapid.
Explaining the evolution of the large, complex, and in some cases technologically advanced societies of mostly unrelated individuals, such as those characteristic of the more advanced mammals like primates and humans, has been a challenge to biologists. All apes live in groups with relatively complex structure, and at least some level of technological culture. However, great ape communities cannot grow much beyond 40–50 individuals [1,2]. Ape sociality based on intensive, expensive, stable and long-lasting dyads is a limiting factor for the size of these societies [2,3]. By contrast, human groups are potentially limitless in size and complexity. Maintaining coordination in such large groups requires both a large brain and the ability to communicate third-party information [4,5]. Third-party information, however, cannot be passed on without some form of language, because this requires both time and place to be marked as well as the ability to identify third parties and comment on their actions.
Language of any kind has appeared only a few times in evolution, and arguably complex language evolved only once. This is a conundrum: given how useful humans find language for a host of different functions, its rarity is surprising. One possible explanation is that language might have evolved as a side effect of some other unrelated trait, and thus might be an evolutionary accident. If this explanation is true, then we would expect to see complex language emerging in a relatively short time once the simple form appears as a side effect. An alternative explanation might be that the circumstances in which complex human language evolved were unique, resulting in the phenomenon of language being unique. This explanation would allow a slower process in which the unique circumstances rather than the lucky ‘mutation’ would be driving the appearance of complex language.
While there is a problem of cheating in all forms of communication [6–9], the use of language to pass on third-party information opens up two new forms of collective action problem. First, individuals can lie about the information that they received from others. Such indirect lies differ from primary lies in that they are more difficult to detect. The usual solution is network-embedded reputation, rather than repeated games [11,12].
However, even if all the dishonest signalling problems are resolved, a second form of collective action problem arises. If producing and processing linguistic information is costly, then it might be in the interests of the individual to freeride on others' (costly) ability to produce efficient coordination. This condition is clearly present in humans. The production of human language requires an apparatus that is particularly costly: it required a set of changes to human breathing that increase mortality risk, substantially increased cognitive processing power and, hence, an energetically more costly brain [13,14]. Why has the production of this public good not collapsed, with poorer linguists freeriding on the collective benefits provided by others while saving on their individual costs?
One possible reason is that human language might be functionally tuned for mate choice, or directly useful for other reasons. While we do not doubt that, once it has evolved, language provides a capacity that can be used in many different contexts, where it might be exploited by, for example, sexual selection, the fact that language's capacity to pass on third-party information has evolved extremely rarely does suggest that these features are secondary [16,17]. Instead, we argue that the primary origin of language lies in coordinating relatively large-scale collective action: language exists to enable cooperation on a large scale, either as an ecological signalling system or as a means of social coordination [19,20]. For a group to perform a complex task, its members need to coordinate their actions using a pre-agreed set of signs and signals.
To investigate this problem, we use an agent-based behavioural synchrony model with two novel distinguishing characteristics. First, we use a primate-like two-level social network in which each individual interacts only with a small subset of its social group [22–24]. Second, we use the group's behavioural synchrony as a necessary condition for group action in what is, in effect, a two-step process: in order to solve an important ecological problem, it is first necessary to create a cohesive social group of some specified size, because it is the group that allows the problem to be solved. In other words, it is behavioural coordination that makes social action (such as defence against predators or coordinated foraging routes) possible. The fitness payoff to the individual members of a community depends both on the size of the group and on how well synchronized their behaviour is with each other. Although we cast our model in terms of language, we argue that it applies more generally to all communication systems whenever communication is targeted at facilitating coordination and/or social cohesion through coordinated or synchronized action.
We use compass direction as a simple exemplar of synchronized action, but it represents any kind of behavioural synchrony (e.g. agreeing on a social rule or a cultural icon) that ensures collective action. In this model, each member of a group of agents is assigned a random vector on a dial (i.e. a compass direction). This is their initial information about the social action of the group. Agents go through a set of one-on-one meetings with those to whom they are socially linked. At each meeting, they exchange information about their respective vectors. However, each agent receives this information with noise (a trait of the recipient rather than the sender of the information). Using this new (but noisy) information, the agents reset their own vector values midway on the dial between their own previous value and the (noisy) one they received from the agent they have just met. This set-up allows us to treat the noise element as an inverse proxy for the efficiency of communication (in effect, the efficiency of language): the more precise the information transfer during an encounter, the more effective the process of synchronization will be. The cost of reducing the error term, and thereby improving the effectiveness of communication, then represents the cost of evolving language (or, indeed, any other communication system).
Once synchronization is achieved, the group performs the communal action. The better they manage to coordinate, the more the group gets out of this action (and the higher the individual fitness of the ingroup members). As information transmission noise slows down the rate of coordination (or convergence), the higher the average noise level, the further apart the agents end up, and thus the lower the benefit from the communal action. Put more explicitly, as coordination efficiency declines, the group evades predators less effectively and forages less successfully. While the benefit each individual receives from the group action is the same for everyone, each agent's payoff is modified by two costs: her vector value's distance from everyone else's at the end of the synchronization process, and the cost she is willing to invest to reduce the noise she faces when receiving information. Thus, this model poses a classic public goods problem: a low noise level is a public good in that it creates a benefit for all (faster, better synchronization), which can only be achieved via the individuals' costly contributions (reducing noise). The model of linguistic coordination, however, also supplies a solution to the public goods problem: the cost that agents incur from poor individual-level coordination is intrinsic to coordination and thus does not require third-party policing. In this way, language, which all else being equal should pose a public goods conundrum, bypasses the problem if it emerges as a coordination tool.
First, we build a model of linguistic coordination of collective action in a group.
We assume that the agents form a network such that each agent is connected to the same number of others as every other group member. Thus, to introduce a group of agents mathematically, let G(n) denote the set of all connected k-regular graphs of size n:

$$G(n) = \big\{ g \;:\; |g| = n,\ \ |e_i| = k \ \ \forall i,\ \ \rho(i,j) < \infty \ \ \forall i, j \big\}, \tag{2.1}$$

where $e_i$ denotes the set of agents that agent i is connected to and $\rho(i,j)$ denotes the network distance between nodes i and j.
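As an illustration of membership in G(n), the Python sketch below builds one particular connected k-regular graph (a ring lattice, in which each agent is linked to its k/2 nearest neighbours on either side) and checks the three conditions in (2.1) by breadth-first search. The ring lattice and all function names are illustrative choices of ours; G(n) contains many other graphs, and nothing here implies that this is how the networks in the model were generated.

```python
from collections import deque

def circulant_k_regular(n, k):
    """Ring lattice: node i is linked to i +/- 1, ..., i +/- k/2 (mod n).

    Requires an even k with 2 <= k < n, which guarantees exactly k distinct
    neighbours per node and (via the step-1 ring) a connected graph.
    """
    if k % 2 or not 2 <= k < n:
        raise ValueError("this construction needs an even k with 2 <= k < n")
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for step in range(1, k // 2 + 1):
            j = (i + step) % n
            adj[i].add(j)
            adj[j].add(i)
    return adj

def network_distance(adj, i):
    """Breadth-first search: rho(i, j) for every node j reachable from i."""
    dist = {i: 0}
    queue = deque([i])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def in_G(adj, n, k):
    """Membership test for G(n): n nodes, every |e_i| = k, all rho(i, j) finite."""
    if len(adj) != n or any(len(nbrs) != k for nbrs in adj.values()):
        return False
    # In a connected graph every node is reachable from node 0.
    return len(network_distance(adj, 0)) == n
```

For example, `circulant_k_regular(20, 4)` passes `in_G(..., 20, 4)`, and in that graph the agent opposite node 0 (node 10) lies at network distance 5.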
Note that the definition in equation (2.1) implies that all the elements of the G set are connected graphs. We use connected graphs, as we are interested in how a group of agents that form one social network can coordinate collective action. If they are not connected, then the problem becomes trivial as no group-level coordination becomes possible. In other words, we assume that the agents form a social network that can be defined as a group.
Our definition of the set G also implies that we are only interested in k-regular graphs, i.e. networks in which each agent is connected to the same number of others. This is based on the observation that if maintaining relationships is costly, then the social network is unlikely to be fully connected, a claim supported by the fact that the variation in the number of meaningful social relationships is limited in both non-human primates and humans [27,28].
We assume that the agents are coordinating a collective action problem, represented in the model by coordination on a unit circle. Thus, at time t = 0, each agent starts with a random vector on a dial, a standard representation of group synchrony problems:

$$\phi_i(0) \in [0, 2\pi), \qquad i = 1, \dots, n, \tag{2.2}$$

where the $\phi_i(0)$ are independently and identically distributed.
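When the agents receive new information, an agent updates her direction to midway between her current direction and the perceived direction of the interactant:

$$\phi_i(t+1) = \phi_i(t) + \tfrac{1}{2}\,\delta_{ij}(t) \pmod{2\pi}, \tag{2.3}$$

where

$$\delta_{ij}(t) = \big[(\phi_j(t) - \phi_i(t) + \pi) \bmod 2\pi\big] - \pi, \tag{2.4}$$

and $\phi_i$ and $\phi_j$ are the information vectors of agents i and j. Equation (2.4) is the signed difference measured along the shorter arc of the circle.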
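The "midway" rule has to respect the circular geometry: the arithmetic mean of two angles points the wrong way when they straddle 0. A minimal Python sketch of the update, assuming angles in radians; the function names are illustrative.

```python
import math

TWO_PI = 2 * math.pi

def circ_diff(target, current):
    """Signed shortest-arc difference target - current, mapped into [-pi, pi)."""
    return (target - current + math.pi) % TWO_PI - math.pi

def midway_update(phi_i, perceived_j):
    """Move phi_i halfway along the shorter arc towards perceived_j (mod 2*pi)."""
    return (phi_i + 0.5 * circ_diff(perceived_j, phi_i)) % TWO_PI

# Two agents just either side of 0 rad: the naive average points the opposite way.
a, b = 0.1, TWO_PI - 0.1
naive = (a + b) / 2             # approximately pi ("south")
circular = midway_update(a, b)  # equivalent to 0 mod 2*pi ("north")
```

Taking the difference modulo 2π before halving is what makes the update well defined on the circle. It is also why, as the text notes, no single exchange reveals where the group will converge.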
We place the coordination problem on a unit circle rather than on a linear range, as in the former no single exchange of information can provide any guidance as to where the group's coordination converges, and hence the agents cannot shortcut the coordination process.
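A synchronization event on the graph $g(n) \in G(n)$ is a series of T meetings among the agents, where a ‘meeting’ between two agents, i and j, is an interaction between connected agents, with each pair of connected agents equally likely to meet:

$$P\big(\text{meeting}(i,j)\big) = \begin{cases} \dfrac{2}{nk} & \text{if } j \in e_i, \\[4pt] 0 & \text{otherwise,} \end{cases} \tag{2.5}$$

that is, the agents meet only agents that they themselves are connected to (a k-regular graph on n nodes has nk/2 links).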
So far, the set-up is the same as in the behavioural synchrony model on which we build. However, in the present model, we assume that the agents' communication includes an error factor on the side of the information recipient. Thus, when agent i receives information from a partner, she perceives it perturbed by noise characteristic of her, with noise level $\sigma_i$. In other words, each agent is coupled with a noise variable that is specific to her reception of information.
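Thus, in a meeting between agents i and j, after exchanging their information, both agents update their respective information as follows:

$$\begin{aligned} \phi_i(t+1) &= \phi_i(t) + \tfrac{1}{2}\Big(\big[(\phi_j(t) + \varepsilon_i(t) - \phi_i(t) + \pi) \bmod 2\pi\big] - \pi\Big), \\ \phi_j(t+1) &= \phi_j(t) + \tfrac{1}{2}\Big(\big[(\phi_i(t) + \varepsilon_j(t) - \phi_j(t) + \pi) \bmod 2\pi\big] - \pi\Big), \end{aligned} \tag{2.6}$$

where $\varepsilon_i(t)$ is a random variable that is either a positive or a negative deviation from the information value, with a scale governed by agent i's noise level $\sigma_i$.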
Meetings are repeated until t = T, where T = τn/2, i.e. until on average each agent participates in τ information exchanges.
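Let $d_i$ denote individual i's average distance from the others at T, the endpoint of the synchronization event:

$$d_i = \frac{1}{n-1} \sum_{j \neq i} \Big| \big[(\phi_j(T) - \phi_i(T) + \pi) \bmod 2\pi\big] - \pi \Big|, \tag{2.7}$$

where we applied the same convention as in (2.4).

Let $\bar{d}$ denote the average distance among the agents:

$$\bar{d} = \frac{1}{n} \sum_{i=1}^{n} d_i. \tag{2.8}$$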
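The meeting process, the noisy update and the two distance measures can be combined into a small simulation of one synchronization event. The Python sketch below is illustrative only: the Gaussian form of the perception noise (standard deviation $\sigma_i$), the uniform initial directions and all function names are assumptions rather than details taken from the model specification.

```python
import math
import random

TWO_PI = 2 * math.pi

def circ_diff(target, current):
    """Signed shortest-arc difference target - current, in [-pi, pi) (as in 2.4)."""
    return (target - current + math.pi) % TWO_PI - math.pi

def synchronization_event(adj, sigma, tau, rng, phi0=None):
    """Run T = tau*n/2 noisy meetings on graph adj (dict: node 0..n-1 -> neighbour set).

    sigma[i] is agent i's noise level; Gaussian noise is an assumption of this sketch.
    """
    n = len(adj)
    edges = sorted({(min(i, j), max(i, j)) for i in adj for j in adj[i]})
    if phi0 is None:
        phi = [rng.uniform(0.0, TWO_PI) for _ in range(n)]  # i.i.d. start, uniform assumed
    else:
        phi = list(phi0)
    for _ in range(tau * n // 2):
        i, j = rng.choice(edges)  # every linked pair equally likely (2.5)
        old_i, old_j = phi[i], phi[j]
        # Each recipient perceives the partner's direction through her own noise (2.6).
        perceived_by_i = old_j + rng.gauss(0.0, sigma[i])
        perceived_by_j = old_i + rng.gauss(0.0, sigma[j])
        phi[i] = (old_i + 0.5 * circ_diff(perceived_by_i, old_i)) % TWO_PI
        phi[j] = (old_j + 0.5 * circ_diff(perceived_by_j, old_j)) % TWO_PI
    return phi

def individual_distances(phi):
    """d_i: mean shortest-arc distance from agent i to all others at time T (2.7)."""
    n = len(phi)
    return [sum(abs(circ_diff(phi[j], phi[i])) for j in range(n) if j != i) / (n - 1)
            for i in range(n)]

def mean_distance(phi):
    """d-bar: average of the individual distances d_i (2.8)."""
    d = individual_distances(phi)
    return sum(d) / len(d)
```

On a complete graph of 10 agents with zero noise and τ = 200, the agents should collapse onto a common direction. Setting every $\sigma_i$ to 1 should instead leave them measurably spread out. This is why $\bar d$ works as an inverse measure of coordination efficiency.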
We assume that the group's payoff from the collective action depends on the efficiency of coordination, such that the less efficient the coordination, the lower the group-level payoff. Thus, let C denote the group-level payoff, and let H denote the net group payoff:

$$H = C - D\,\bar{d}, \tag{2.9}$$

where $\bar d$ is the average distance among the agents and D is the parameter of payoff lost due to inefficient coordination. We assume that D > 0.
Let us assume that the agent's individual payoff is made up of the following elements: her share of the group's payoff, reduced by her distance to the other agents and by the cost of reducing her noise level. Thus her payoff, $H_i$, is determined as follows:

$$H_i = \frac{H}{n} - A\,d_i - B\,c(\sigma_i), \tag{2.10}$$

where A is the cost parameter associated with the individual's information being distant from that of other individuals, B is the cost of reducing the noise of the incoming information, and $c(\sigma_i)$ is the amount of noise reduction agent i must pay for to hold noise level $\sigma_i$ (decreasing in $\sigma_i$).
Combining equations (2.9) and (2.10):

$$H_i = \frac{C - D\,\bar{d}}{n} - A\,d_i - B\,c(\sigma_i). \tag{2.11}$$
As we assumed above that D > 0, we can normalize the payoff function as follows:

$$h_i = \frac{n}{D}\,H_i, \tag{2.12}$$

thus

$$h_i = \frac{C}{D} - \bar{d} - \alpha\,d_i - \beta\,c(\sigma_i), \tag{2.13}$$

where $\alpha = nA/D$ and $\beta = nB/D$. The parameter α can be interpreted as the agent's individual cost of being inefficient at updating the information, and the parameter β can be interpreted as the cost of communication precision.
Note that, as C and D are fixed parameters and the group size n is fixed, agent i's maximization of the payoff $h_i$ with respect to $\sigma_i$ is equivalent to maximizing $H_i$. Hence, the individual's optimization problem can be simplified to

$$\max_{\sigma_i} \; \big[ -\bar{d} - \alpha\,d_i - \beta\,c(\sigma_i) \big]. \tag{2.14}$$
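The normalization from $H_i$ to $h_i$ can be checked numerically. In the Python sketch below, the noise-reduction cost is passed in as a function. The linear form `linear_cost`, its ceiling `sigma_max` and the parameter values are hypothetical, chosen only to show that $h_i = nH_i/D$. Since n/D is a positive constant, the two payoffs therefore rank noise levels identically.

```python
def raw_payoff(d_i, dbar, sigma_i, n, A, B, C, D, cost):
    """H_i = (C - D*dbar)/n - A*d_i - B*cost(sigma_i)."""
    return (C - D * dbar) / n - A * d_i - B * cost(sigma_i)

def normalized_payoff(d_i, dbar, sigma_i, n, A, B, C, D, cost):
    """h_i = C/D - dbar - alpha*d_i - beta*cost(sigma_i)."""
    if D <= 0:
        raise ValueError("the normalization assumes D > 0")
    alpha = n * A / D  # individual cost of coordinating inefficiently
    beta = n * B / D   # cost of communication precision
    return C / D - dbar - alpha * d_i - beta * cost(sigma_i)

def linear_cost(sigma, sigma_max=1.0):
    """Hypothetical cost: proportional to how far noise is pushed below sigma_max."""
    return sigma_max - sigma

params = dict(d_i=0.3, dbar=0.25, sigma_i=0.4, n=10, A=0.2, B=0.5, C=100.0, D=4.0,
              cost=linear_cost)
H_i = raw_payoff(**params)         # approximately 9.54
h_i = normalized_payoff(**params)  # approximately 23.85, i.e. n * H_i / D
```

Any other decreasing cost function can be substituted for `linear_cost` without affecting the equivalence, because the cost enters both payoffs through the same term.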
In other words, the optimization problem involves choosing the right noise level, given both the cost of being inefficient at coordination and the cost of increasing communication precision.
Note that this set-up implies two important characteristics of this coordination problem. First, the agents are naive in the primary coordination problem: they communicate information to others in a non-strategic way. Second, however, there is a secondary element, the noise level, which is the proxy for linguistic ability and in which there is room for agents to alter their individual payoffs. Assuming that β is positive (which is equivalent to assuming that both B and D are positive), if α is small enough there is a conflict of interest between the group and the individual over the noise level. In this case, the model reduces to the classic tragedy of the commons. The inclusion of the individual distance term, $\alpha d_i$, in (2.13) thus allows a solution to the standard free rider problem in which the cost term arises as an intrinsic consequence of coordination rather than from third-party policing.
To examine the incentives for the individual, let us fix the noise level of all agents except for the focal agent i (figure 1). As expected, the results show that the optimal noise level depends on both parameters α (the agent's individual cost of being inefficient at updating the information) and β (the general cost of communication precision). More importantly, however, the optimal noise level is above that of everyone else in the group for some parameter values, and below it for others. Note that in the former case the agent can be seen as freeriding on others, while in the latter case the agent is a net contributor to the collective action.
Note that the tendency towards freeriding is moderated by the cost of being far from the others at the end of the synchronization process. In effect, this creates a trade-off. If the cost of communication precision is relatively high, it is beneficial for the agent to freeride on others; but if the cost of being individually inefficient at coordination is relatively high, then it is beneficial for the agent to pay for a lower noise level than others.
Although this structure is mathematically trivial, when applied to language it raises some interesting questions. Although it has been claimed that linguistic information is strategic, to our knowledge it has not been shown that the ability to communicate in complex language is also open to free riders, and thus could easily result in a tragedy of the commons. Perhaps the reason behind the rarity of complex language lies exactly here: it may simply be that β exceeds α in almost all natural settings.
Assuming that communication precision, i.e. the set of noise levels $\{\sigma_i\}$, is inherited by agents, we introduce an evolutionary mechanism based on the payoffs. To do this, we assume simple evolutionary dynamics such that, at each evolutionary step, the noise level of the agent with the smallest payoff is replaced by that of the agent with the highest payoff, thereby mimicking differential selection favouring the more successful individuals: in each step, the agent i with the lowest payoff is eliminated and replaced with a new agent, still labelled i. The new agent i's value of $\sigma_i$ is chosen using that of the highest-payoff agent from the previous round (say, j), with an error:

$$\sigma_i(s+1) = \sigma_j(s) + \xi(s), \tag{3.1}$$

where

$$i = \arg\min_k h_k(s), \qquad j = \arg\max_k h_k(s), \tag{3.2}$$

s is an evolutionary time step, and $\xi(s)$ is a random copying error whose magnitude is governed by θ, the parameter of copying error. (Note that the evolutionary mechanism in equations (3.1) and (3.2) assumes that the payoff from the collective action, captured in parameter C, is high enough that all payoffs are positive, i.e. $h_i > 0$ for all i. We return to this assumption below.)
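One step of the selection dynamics can be sketched as follows. The uniform copying error on $[-\theta, \theta]$, the clipping of noise levels to $[0, \sigma_{\max}]$ and the function name are assumptions for illustration. The text specifies only that the copy is made with an error governed by θ.

```python
import random

def evolutionary_step(sigma, payoffs, theta, rng, sigma_max=1.0):
    """Return new noise levels after one step of the selection dynamics.

    The lowest-payoff agent i is replaced by a newcomer (still labelled i) whose
    noise level copies that of the highest-payoff agent j, with a copying error.
    """
    i = min(range(len(payoffs)), key=payoffs.__getitem__)  # argmin_k h_k(s)
    j = max(range(len(payoffs)), key=payoffs.__getitem__)  # argmax_k h_k(s)
    new_sigma = list(sigma)
    copied = sigma[j] + rng.uniform(-theta, theta)   # assumed uniform copying error
    new_sigma[i] = min(max(copied, 0.0), sigma_max)  # assumed bounds on noise levels
    return new_sigma

rng = random.Random(0)
print(evolutionary_step([0.1, 0.5, 0.9], [3.0, 1.0, 2.0], theta=0.0, rng=rng))
# -> [0.1, 0.1, 0.9]
```

In a full run, each evolutionary step would follow a synchronization event. The payoffs $h_i$ would be computed from the resulting distances and noise costs before being passed to this function.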
Not surprisingly, the ratio of the cost parameters α and β determines the way the group's noise level evolves. If the cost of precision is high, then freeriding is evolutionarily advantageous, and noise levels quickly converge to the maximum (figure 2a). If this cost is less forbidding, then the group ends up with a low level of noise in communication for every member of the group, and thus a more efficient coordination of their communal action. The final evolutionary state depends entirely on the ratio of the two costs, with a rapid phase transition between the two states (figure 2b).
Thus, in some parameter ranges the classic collapse of public goods emerges: high communication costs trigger freeriding among the agents, leading to the collapse of coordination efficiency and of the group's ability to perform the collective action that the coordination is designed to achieve. However, elsewhere within the parameter range, all agents choose to make costly investments towards a shared communication efficiency, as the opportunity cost incurred by losing precision in their individual coordination decisions is even greater.
The evolution of low-noise communication lends itself to being interpreted in terms of language evolution. Language is a communication tool that facilitates the coordination of group action: language is costly for the individual to produce, but failure to invest in linguistic ability results in reduced benefits from group action.
This suggests that language could have evolved as a tool for coordinating collective action within a group, provided the following conditions hold:
(a) The cost of being an inefficient coordinator individually is relatively high compared with the cost of increasing communication precision. That is, the agent's lost share of the group payoff from collective action due to their inability to coordinate well is larger than the cost of producing efficient language.
(b) The payoff from the group's collective action is high enough to exceed the cost of the agent's communication precision at all points during the evolutionary process. Even if the cost of communication efficiency is smaller than the payoff at the very end of the evolution of language, the process would not get going if it is high at the very beginning. Hence, this account of language evolution is as much about the underlying collective action as about the individual costs.
This implies that the size of the group plays a role. To examine whether this is so, let us first ignore the requirement that the payoff be positive (so that the agents actually survive). In other words, we assume that the parameter C in equation (2.9) is always high enough that all payoffs are positive. However, let us assume that for this payoff to materialize, that is, for the group to survive, the average distance among the agents has to be below a threshold:

$$\bar{d} < \lambda, \tag{3.3}$$

where λ is an arbitrary parameter. This is equivalent to changing equation (2.9) to the following:

$$H = \chi\,C - D\,\bar{d}, \tag{3.4}$$

where

$$\chi = \begin{cases} 1 & \text{if } \bar{d} < \lambda, \\ 0 & \text{otherwise,} \end{cases} \tag{3.5}$$

and we assume that the group can only survive if H > 0.
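The threshold version of the group payoff can be written as a small function. This Python sketch assumes that the threshold acts as an indicator that switches the collective-action payoff C on only when $\bar d < \lambda$. The function names and example values are illustrative.

```python
def group_payoff_with_threshold(dbar, C, D, lam):
    """Net group payoff: C is earned only if the agents end up closer than lam on average."""
    reached_threshold = dbar < lam
    return (C if reached_threshold else 0.0) - D * dbar

def group_survives(dbar, C, D, lam):
    """The group survives only if its net payoff is positive."""
    return group_payoff_with_threshold(dbar, C, D, lam) > 0

print(group_survives(0.2, C=10.0, D=2.0, lam=0.5))  # True: 10 - 0.4 > 0
print(group_survives(0.6, C=10.0, D=2.0, lam=0.5))  # False: only the -1.2 cost remains
```

Above the threshold, the payoff is negative whatever the size of C, so a large collective-action benefit cannot rescue a group that fails to coordinate.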
The rationale behind introducing the limit λ lies in the assumption that groups that are unable to reach a minimum level of efficiency in coordinating their collective action get nothing out of trying to do so. To see why this is important, note that the average distance among the agents at the end of a synchronization event, $\bar{d}$, increases with group size, i.e. $\partial \bar{d} / \partial n > 0$, a property established elsewhere. Thus, it is meaningful to ask how large a group can be and still be able to perform a collective action given the limit λ in (3.3).
Let $n_{\max}$ denote the maximum size of group that can achieve behavioural synchrony given a particular error level σ. That is,

$$n_{\max}(\sigma) = \max \big\{ n : \bar{d}(n, \sigma) < \lambda \big\}. \tag{3.6}$$
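Given any estimate of $\bar d$ as a function of group size at a fixed noise level, $n_{\max}$ can be found by scanning upwards. This relies on the stated property that $\bar d$ increases with n. In practice, the estimate would come from averaging many simulated synchronization events. The Python sketch below takes it as a function argument, and the toy linear function in the usage is purely hypothetical. When k-regular graphs are used, `n_min` must also exceed k.

```python
def n_max(mean_distance_of_n, lam, n_min=2, n_limit=10_000):
    """Largest group size n with mean_distance_of_n(n) < lam.

    Scans n = n_min, n_min + 1, ... and stops at the first failure, which is
    valid because the average distance is assumed to increase with n.
    Returns None if even n_min fails.
    """
    best = None
    for n in range(n_min, n_limit + 1):
        if mean_distance_of_n(n) < lam:
            best = n
        else:
            break
    return best

# Hypothetical toy estimate: average distance grows linearly with group size.
print(n_max(lambda n: n / 100, lam=0.5))  # 49
```

Lowering the noise level shifts the whole $\bar d(n)$ curve downwards, so the scan stops later and $n_{\max}$ rises, in line with the relationship described in the text.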
The results suggest that, as the noise level falls, maximum group size increases (figure 3a). In other words, the evolution of more precise communication allows larger group sizes, in line with the role that language is thought to have played in human evolution [20,29]. Many primate and human groups can draw a benefit from coordinating a larger group, for instance, from a disproportionately increased efficiency in exploiting the environment, or from maintaining higher technological complexity, or from defence against predators or neighbouring conspecific groups. If that is the case, the increased coordination efficiency allowed by the fall of noise in our model, i.e. the rise of complex language in our interpretation, would lead to increasing group size.
Note that in the two alternative paths of evolution (figure 2a) there is a remarkable difference in the speed of reaching the long-term steady state: while the collapse is fast, the evolution of low noise levels is slow. The precision of communication thus emerges relatively slowly at first (figure 3b): even if it is optimal for the individual to improve precision, it is suboptimal to be much better than the rest of the group. In other words, being a relatively good communicator might provide individual benefits, driving the slow evolution of language forward. However, if the individual is much better in communication ability than the rest of the group, then the advantage is limited, while the cost is high. As far as the efficiency of group coordination is concerned, there is no benefit in being a significantly more efficient communicator than other members of your community. This makes intuitive sense: if a hypothetical group of modern-day humans were learning a new language together, there would be no point in any one individual jumping ahead to become a fluent speaker while everyone else in the group was still a beginner. If such a jump is very costly, as would have been the case with the evolution of linguistic abilities, then it would be outright disadvantageous to move much faster than the rest of the group. This suggests that if language evolved primarily as a coordination tool, its evolution was a slow process that required a pressure towards larger groups with more efficient coordination for a protracted period of evolutionary time.
We have presented a model of behavioural synchrony in which we have shown that the ability to communicate with language can be perceived as a free rider problem when the primary purpose of language use is the coordination of the group. We have shown it is possible to portray the ‘punishment term’ of the collective action problem as intrinsic to coordination, and hence the presence of a third-party punisher is not needed. This may explain how language could have emerged despite the fact that it seems to pose a classic free rider problem.
Humans are unique among animals in their propensity to cooperate with non-kin in large groups. Hitherto, research on this topic has been concerned mainly with how the collective action paradox is resolved, either spontaneously, via peer [32–34] or pool [35,36] punishment, via reputational mechanisms [11,12], or via costly communication signals [37,38]. A problem, however, has been that it is not clear why or how the human ability to form groups that produce public goods from the costly contributions of individuals evolved in the first place.
As a possible solution to this problem, we introduced a model of collective action defined in terms of synchrony [4,21,40], which uses behavioural coordination on structured social networks. In our model, all agents are naive, and any incentive incompatibility in the dyadic interactions is resolved by the stability of dyads maintained through frequently repeated interactions. Our focus lies elsewhere: we observe that, even if the individual's objective is to cooperate, linguistic cooperation rests on a crucial prerequisite, namely the ability to produce and process language.
Our model shows that if the essence of cooperation is to achieve behavioural synchrony using a language-like signalling process, then the collective action paradox disappears as long as small improvements in the individual's language ability are not forbiddingly expensive. This is in line with recent findings on the effect of group synchrony on human physiology [43,44], and suggests that the evolution of language, sensitivity towards physical synchrony and the propensity for cooperation could well have emerged in a coevolutionary process.
Why is language so rare, when it is so useful? Our model suggests that the answer may be that it has to provide an evolutionary benefit right at the outset of the emergence of communication, and continue to do so all the way through the evolutionary process. This seemingly trivial point translates into the observation that for complex language to emerge, there must be an ecological advantage of increased group size that outweighs the cost of further precision of communication. This suggests that human language must have evolved stepwise with progressively increasing complexity. This is in line with evidence from primates that both gestural and vocal communication channels respond in just this way to increasing group size. Among anthropoid primate species, the size and complexity of the vocal repertoire correlate with social group size, and the same appears to be true of gestural communication, although this may be subject to a lower asymptotic limit. Similarly, among chickadees (a North American passerine), vocal complexity increases facultatively with social group size both in the wild and in captivity.
The results allow us to draw four important novel conclusions, all of which alter the conventional wisdom on language evolution. First, our model relies on language as a mechanism for coordinating behaviour, not for the exchange of factual information: doing so allows language to evolve explicitly as a social function in a way that builds on the natural functions of gestural and vocal communication in primates. This does not mean that language cannot be used for the exchange of information, but it does suggest that this was a secondary function.
Second, at the same time, this approach circumvents the collective action paradox in the sense that a ‘punishment term’, which in conventional models implies a third-party punisher, can, in the case of language, be seen as intrinsic to language itself.
Third, the acquisition of efficient language facilitates the evolution of progressively larger social groups (which becomes critical when community size determines the benefits of sociality). The fact that language explicitly allows agents to coordinate larger groups has not previously been considered, either in terms of language evolution or for the evolution of cooperation.
Fourth, the process of language evolution is, at least initially, slow and cumulative, requiring language communities to evolve increasing competences stepwise. Hitherto, the consensus has been that language must have evolved rapidly. Our results suggest that this is not necessarily so; indeed, our model suggests that evolving capacities that are significantly better than those of the rest of the community may actually be disadvantageous.
These results have important implications for how we view both human sociality and the evolution of cooperation in the human context, both of which have been of considerable interest to the social and evolutionary sciences. The conventional view of human sociality, in particular, is that members of a group exchange resources or information of some kind for which they pay a cost, with dyadic exchanges being the typical model for all such interactions. In such cases, they must recover that cost either by an equivalent trade (I give you apples in exchange for your oranges) or by the reciprocation of the benefit over time (I help you out now, but I expect you to help me out in return at some point in the near future). This is essentially an economist's view of sociality (cooperation as a trading relationship), and we suggest that it has been the main cause of the collective action paradox which has bedevilled attempts to explain the evolution of large-scale cooperation. By contrast, our model is based on the view that, in the anthropoid primate and human cases (and perhaps those of a number of other intensely social taxa), cooperative solutions to key ecological problems are an emergent property of behavioural synchrony at group level [4,49]. In primates, the principal selection factor favouring group-living is protection against predators [50–52], and it continues to be so in humans, where predators are to be understood as being either conventional carnivores or conspecific raiders [54–56]. However, solving that ecological problem requires that the individual animals can form and maintain a coordinated, cohesive social group.
In such cases, the effectiveness of defence is provided by the size of the group, and the size of the group is simply a function of how well individuals can coordinate their behaviour so as to remain together. The cost that is paid for doing so is minimal, because individuals are not explicitly involved in active defence—it is the simple presence of the group that deters predators and raiders, not the actions of individual group members [51,57]. More importantly, they are not engaged in exchanging any kind of commodity (including information), and therefore individual animals cannot face the prospect that others might contribute less than they do. As a result, the free rider problem that gives rise to the collective action paradox is not so intrusive: an individual cannot ‘pay’ more or less than anyone else for the benefit gained from defence—it simply joins the club or it does not, and it benefits accordingly (and may pay a heavier cost if it doesn't join by incurring a higher predation risk if it forages alone or in a smaller group [51,57]). Taken together, these results imply that human sociality is less about dyadic exchange and more about achieving social coordination at the community level.
All results were generated by the model; no data were used.
T.D.-B. and R.I.M.D. conceived the paper together. T.D.-B. designed the mathematical model, coded and ran the simulations, and analysed the results. T.D.-B. and R.I.M.D. drafted and edited the paper together.
The authors have no competing interests.
T.D.-B. and R.I.M.D. were supported by the European Research Council Advanced Grant to R.I.M.D. under grant no. 295663. In addition, T.D.-B. was supported by the European Union FP7 under grant number 288021, and by Trinity College Oxford.
The authors thank Balázs Szendröi, Anna Rotkirch, Max van Duijn, Isabel Behncke Izquierdo and James Carney for their helpful comments.
- Received April 13, 2016.
- Accepted November 18, 2016.
- © 2016 The Authors.
Published by the Royal Society under the terms of the Creative Commons Attribution License http://creativecommons.org/licenses/by/4.0/, which permits unrestricted use, provided the original author and source are credited.