## Abstract

Vast numbers of scientific articles are published each year, some of which attract considerable attention, and some of which go almost unnoticed. Here, we investigate whether any of this variance can be explained by a simple metric of one aspect of the paper's presentation: the length of its title. Our analysis provides evidence that journals which publish papers with shorter titles receive more citations per paper. These results are consistent with the intriguing hypothesis that papers with shorter titles may be easier to understand, and hence attract more citations.

## 1. Introduction

Written communication is now being recorded online on a massive scale [1–5]. Colossal amounts of data on collective information gathering and distribution via online services such as *Twitter* [6–9], *Wikipedia* [10–13], *Google* [14–17], news services [18] and even large digitized collections of books [19–21] can now be analysed, widening our understanding of economic decision-making [11,14,16], human conflict [7,12] and natural disasters [22,23].

Scientific endeavours also generate extensive written communication, in the form of papers. We define a paper to be more successful than others if it has received a greater number of citations. The online database *Scopus* contains citation records of papers, offering remarkable insights into academic conversation. Recently, advances have been made in quantifying scientific output based on publication statistics [24–28]. A number of studies have provided evidence that the long-term success of scientists depends on their early publications [29,30]. Further analyses have indicated that a paper's success can be partially predicted by its early success [31–33] as well as the reputation of the authors [34]. In addition, papers in particular academic domains gain more citations than others [35].

Here, we consider whether we can find any evidence that the style in which a paper is written may relate to its success. Specifically, we consider the length of the article title chosen by the authors and investigate whether the length bears any relation to the number of citations. Previous studies have explored different characteristics of scientific paper titles [36–41]. A subset of these studies have focused on identifying stylistic attributes of academic writing and the use of a colon or question in a paper's title [36–39]. Those which have investigated the relationship between the length of an article's title and the number of citations it receives have been limited to relatively small samples, up to a maximum of 2200 papers [40,41]. These analyses have reported conflicting results, with one study suggesting that papers with longer titles might receive more citations [41] and another finding no evidence of a relationship [40]. Here, we exploit data on a much larger sample of 140 000 papers in order to investigate whether a paper's title length bears any relation to the number of citations it receives.

## 2. Results

We analyse data provided by *Scopus*, one of the leading bibliometric platforms. A *Scopus* user can search and export data on journal articles in batches of 20 000 records, including data on how often each article has been cited since publication. We download data on the 20 000 most cited papers in each year between 2007 and 2013.

We determine the number of characters in each paper's title, including spaces and punctuation. Using the year 2010 as an example, we rank the papers' title length and citations (figure 1*a*). Upon visual inspection, there appears to be a high concentration of papers with short titles and many citations, as well as a high concentration of papers with long titles and few citations. We find that for the top 20 000 most highly cited papers published in 2010, papers with shorter titles receive more citations (Kendall's *τ*=−0.07, *N*=15 395, *p*<0.001). We apply the same analysis to each year in our sample and find that papers from all years exhibit this relationship between their title length and citations (figure 1*b*; all *τ*s <−0.042, all *p*s <0.001, *α*=0.05, Kendall's *τ* correlation with false discovery rate (FDR) correction).
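The rank correlation used here can be illustrated with a minimal sketch. It assumes the tie-corrected tau-b variant of Kendall's *τ*, which the text does not specify, and uses a simple quadratic-time loop that suits small examples only. The function name `kendall_tau_b` and the example titles and citation counts are hypothetical.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall's tau-b for two equal-length sequences (O(n^2) pair loop)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    concordant = discordant = ties_x = ties_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied on both variables: excluded from both factors
        if dx == 0:
            ties_x += 1  # tied on x only
        elif dy == 0:
            ties_y += 1  # tied on y only
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    untied = concordant + discordant
    denom = sqrt((untied + ties_x) * (untied + ties_y))
    return (concordant - discordant) / denom if denom else float("nan")


# Hypothetical example: title length counts every character,
# including spaces and punctuation, so len() is sufficient.
titles = [
    "Tiny",
    "Short title",
    "A somewhat longer paper title",
    "An extremely long and detailed title about one narrow result",
]
citations = [300, 120, 60, 15]
title_lengths = [len(t) for t in titles]
tau = kendall_tau_b(title_lengths, citations)
```

In this toy example every longer title has fewer citations, so the statistic takes its minimum value of −1. For samples of the size analysed here, a library routine with an O(n log n) algorithm would be the practical choice.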

Some journals may attract a greater number of citations for their papers owing to their reputation. To remove any potential influence of the journal in which a paper is published on the relationship between citations received and paper title length, we rank the papers within each journal in terms of the number of citations received and transform these ranks into percentiles. We calculate percentiles in terms of the length of papers' titles in the same fashion. In this transformed dataset, for papers published in 2010, we find that papers with shorter titles receive more citations (figure 1*c*; *τ*=−0.020, *N*=15 395, *p*<0.001, Kendall's *τ* correlation). Again, we run parallel analyses for the 20 000 most cited papers in each year between 2007 and 2013. For years 2007–2010, we find that papers with shorter titles receive more citations, whereas papers published during 2011–2013 do not (figure 1*d*; for years 2007–2010, all *τ*s <−0.016, all *N*s >14 791, all *p*s <0.01; for years 2011–2013, all |*τ*|s <0.01, all *N*s >15 396, all *p*s >0.05; Kendall's *τ* with FDR correction). These smaller *τ*s suggest that the journal in which a paper is published may help explain the relationship between paper title length and the number of citations the paper receives.
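One reading of the journal adjustment is that ranks are computed separately within each journal before being converted to percentiles. The sketch below follows that reading. It assumes average ranks for ties and a percentile defined as 100 × rank / group size, neither of which is stated in the text. The function names and record layout are hypothetical.

```python
from collections import defaultdict


def percentile_ranks(values):
    """Map each value to a percentile in (0, 100], averaging ranks over ties."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return [100.0 * r / n for r in ranks]


def within_group_percentiles(records, group_key, value_key):
    """Percentile of record[value_key] among records sharing record[group_key]."""
    groups = defaultdict(list)
    for idx, rec in enumerate(records):
        groups[rec[group_key]].append(idx)
    out = [None] * len(records)
    for indices in groups.values():
        pcts = percentile_ranks([records[i][value_key] for i in indices])
        for i, p in zip(indices, pcts):
            out[i] = p
    return out


# Hypothetical records: journal "b" is cited far more than journal "a",
# yet both journals map onto the same percentile scale.
records = [
    {"journal": "a", "citations": 5},
    {"journal": "b", "citations": 500},
    {"journal": "a", "citations": 9},
    {"journal": "b", "citations": 700},
]
citation_pct = within_group_percentiles(records, "journal", "citations")
```

Applying the same function with a title-length field in place of `"citations"` would give the second set of percentiles, after which the two can be correlated across all papers.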

To investigate this hypothesis further, we group papers by their journal. Again, using 2010 as an example, we calculate the median number of citations and median title length for each journal. We find that journals which published papers with shorter titles also tend to receive more citations per paper (figure 2*a*; Kendall's *τ*=−0.19, *N*=361, *p*<0.001). Parallel analyses for papers published in each year between 2007 and 2013 show that this relationship holds for papers published in all 7 years in our sample (figure 2*b*; 2012: *τ*=−0.1, *N*=320, *p*<0.05; 2013: *τ*=−0.11, *N*=352, *p*<0.01; all other years: all *τ*s≤−0.14, all *p*s <0.001, *α*=0.05; Kendall's *τ* correlation with FDR correction). Finally, we carry out a complementary aggregated analysis across all years of data in our sample. We rank all papers published in a given year by citations received and by title length, and transform these ranks into percentiles for that year. Again, we find that journals which publish papers with shorter titles also tend to receive more citations per paper (figure 3; *τ*=−0.19, *N*=625, *p*<0.001, Kendall's *τ* correlation).
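The journal-level aggregation can be sketched as follows: for each journal, take the median title length and the median citation count of its papers, giving one data point per journal. The function name `journal_medians` and the tuple layout `(journal, title, citations)` are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import median


def journal_medians(papers):
    """Return {journal: (median title length, median citations)}."""
    lengths = defaultdict(list)
    cites = defaultdict(list)
    for journal, title, citations in papers:
        lengths[journal].append(len(title))  # characters incl. spaces
        cites[journal].append(citations)
    return {j: (median(lengths[j]), median(cites[j])) for j in lengths}


# Hypothetical input for a single publication year.
papers = [
    ("journal a", "Abc", 100),
    ("journal a", "Abcde", 300),
    ("journal b", "Abcdefghij", 10),
]
medians = journal_medians(papers)
```

The resulting pairs are the quantities that a rank correlation across journals would then compare.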

Our primary analysis is based on rank-based statistics. To complement it, we fit a mixed-effects model to the log of the number of citations a paper receives as a function of its title length. A mixed-effects model allows us to control for the journal in which each paper is published. We define our model as

$$\log(c_{j,p}) = I + I_{j} + (L + L_{j})\,l_{j,p} + \epsilon,$$

where *c*_{j,p} is the number of citations received by paper *p* published in journal *j*. The distribution of citations received by a paper is highly positively skewed. For this reason, we log these citation counts, so that the distribution of the residuals of our model, *ϵ*, is closer to a Gaussian distribution. The grand intercept is *I*, whereas *I*_{j} is an intercept for each journal. There is a fixed slope *L* for the number of characters in the title *l*_{j,p} for paper *p* published in journal *j*. There is also a journal-level random effects slope for the title length *L*_{j}. We fit the model for each year using maximum likelihood. We find that papers published during 2007–2011 with shorter titles tend to receive more citations while those published during 2012 and 2013 do not (for years 2007–2010: all *t*s <−3.832, all *p*s <0.001; 2011: *t*=−3.314, *N*=345, *p*<0.01; 2012–2013: both *t*s <−0.251, both *p*s >0.05; *t*-test on slope *L* with FDR correction). The values of the slope *L* are given for all years in table 1.
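The per-year tests in this section are reported with FDR correction, but the specific procedure is not named. Assuming the widely used Benjamini–Hochberg step-up procedure, adjusted *p*-values for a family of tests (for example, one test per publication year) could be computed as in the following sketch. The function name and the example *p*-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # never increase as the raw p-value decreases.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values from four separate tests.
raw = [0.01, 0.04, 0.03, 0.20]
adj = benjamini_hochberg(raw)
significant = [p < 0.05 for p in adj]
```

A test is then declared significant when its adjusted *p*-value falls below the chosen level *α*.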

Again, we investigate whether this relationship exists when aggregating papers by the journal in which they are published. We fit a linear regression model to the log of the median number of citations papers receive per journal as a function of the median title length. We define our model as

$$\log(c_{j}) = I + L\,l_{j} + \epsilon,$$

where *c*_{j} is the median number of citations received by papers published in journal *j*. The intercept is *I*, and there is a slope *L* for the median number of characters in the titles of papers *l*_{j} published in journal *j*. Again, we log the citation counts, so that the distribution of the residuals of our model, *ϵ*, is closer to a Gaussian distribution. We fit the model for each year. We find that journals which publish papers with shorter titles also tend to receive more citations per paper (for years 2007–2011: all *t*s <−4.215, all *p*s <0.001; 2012–2013: both *t*s <−2.022, both *p*s <0.05; *t*-test of slope *L* with FDR correction). The values of the slope *L* are given for all years in table 1.
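The journal-level regression regresses the log of each journal's median citation count on its median title length. A minimal ordinary least-squares sketch is given below. It assumes the natural logarithm (the base is not stated in the text) and reports the usual *t* statistic for the slope with *n* − 2 degrees of freedom. The function name and the example values are hypothetical.

```python
from math import log, sqrt


def fit_log_linear(median_lengths, median_citations):
    """OLS fit of log(c_j) = I + L * l_j + eps.

    Returns (intercept I, slope L, t statistic of the slope).
    """
    n = len(median_lengths)
    if n < 3:
        raise ValueError("need at least three journals")
    x = list(median_lengths)
    y = [log(c) for c in median_citations]  # natural log: an assumption
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se_slope = sqrt(rss / (n - 2) / sxx)
    if se_slope == 0:
        return intercept, slope, float("nan")  # perfect fit: t undefined
    return intercept, slope, slope / se_slope


# Hypothetical journal-level medians for one publication year.
lengths = [40, 60, 80, 100, 120]
citations = [90, 70, 40, 35, 20]
I_hat, L_hat, t_L = fit_log_linear(lengths, citations)
```

A negative slope with a large negative *t* statistic corresponds to the pattern reported here, in which journals with shorter median titles have higher median citation counts.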

## 3. Discussion

In this study, we investigate whether the length of a scientific paper's title is related to the number of citations it receives. We analyse the 20 000 most highly cited papers for the years 2007–2013, representing a sample size between 1.12% and 1.53% of all papers published in each of these years. Previous studies analysing much smaller sets of papers have reported conflicting evidence, suggesting either that the length of a paper's title bears no relation to its scientific impact [40], or that longer titles can be linked to greater citation counts [41].

Our analysis suggests that papers with shorter titles do receive greater numbers of citations. However, it is well known that papers published in certain journals attract more citations than papers published in others. When citation counts are adjusted for the journal in which the paper is published, we find that the strength of the evidence for the relationship between title length and citations received is reduced. Our results do however reveal that journals which publish papers with shorter titles tend to receive more citations per paper.

We propose three possible explanations for these results. One potential explanation is that high-impact journals might restrict the length of their papers' titles. A second is that incremental research might be published under longer titles in less prestigious journals. A third possible explanation is that shorter titles may be easier to understand, enabling wider readership and increasing the influence of a paper.

Our findings provide evidence that elements of the style in which a paper is written may relate to the number of times it is cited. Future analysis will investigate whether further stylistic attributes of the language used in a paper can be related to the number of citations it receives.

## 4. Methods

We retrieve bibliometric data from *Scopus* (http://www.scopus.com) between 21 October 2014 and 14 November 2014. To obtain data on the 20 000 most cited papers published in each of the 7 years from 2007 to 2013, we search for any papers that are marked by *Scopus* as an ‘article’ with the following search query:

DOCTYPE(ar) AND PUBYEAR = {year},

where {year} is replaced by each of the years 2007–2013. In total, we retrieve 140 000 records. In 2007, *Scopus* reports 1 302 973 published papers which increases to 1 788 065 papers in 2013. The top 20 000 most cited papers published in each year represent a sample of 1.53% in 2007, decreasing to 1.12% in 2013.
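The sample sizes and proportions quoted above follow from simple arithmetic, which the short sketch below reproduces from the figures given in the text. The variable names are hypothetical.

```python
SAMPLE_PER_YEAR = 20_000
YEARS = range(2007, 2014)  # 2007 to 2013 inclusive

# Total papers reported by Scopus, as quoted in the text.
papers_reported = {2007: 1_302_973, 2013: 1_788_065}

total_records = SAMPLE_PER_YEAR * len(YEARS)
sample_share_pct = {
    year: round(100 * SAMPLE_PER_YEAR / total, 2)
    for year, total in papers_reported.items()
}
```

The total comes to 140 000 records, and the sampled share comes to 1.53% for 2007 and 1.12% for 2013, matching the values stated.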

Some journals are referred to with multiple variations of their name (for example, ‘*Analyst*’ and ‘*The Analyst*’). For this reason, we clean the dataset from *Scopus* by deleting leading ‘The’s from each journal's title, and converting the title to lower case. We also identify all journals which have fewer than 10 papers in the most cited 20 000 papers for a given year, and remove the papers in such journals for that year. The basic characteristics of our dataset before and after cleaning are depicted in the electronic supplementary material, figure S1.
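The cleaning steps can be sketched as follows, assuming each record is a `(journal, title, citations)` tuple for a single publication year. The function names, the tuple layout, and the whitespace trimming are assumptions made for illustration and are not taken from the original analysis.

```python
from collections import Counter

MIN_PAPERS_PER_JOURNAL = 10


def normalise_journal(name):
    """Lower-case a journal name and drop a leading 'The'."""
    cleaned = name.strip().lower()
    if cleaned.startswith("the "):
        cleaned = cleaned[len("the "):]
    return cleaned


def clean_year(records, min_papers=MIN_PAPERS_PER_JOURNAL):
    """Normalise journal names, then drop journals with too few papers."""
    normalised = [(normalise_journal(j), t, c) for j, t, c in records]
    counts = Counter(j for j, _, _ in normalised)
    return [rec for rec in normalised if counts[rec[0]] >= min_papers]


# Hypothetical records: the two spellings of one journal merge and
# together reach the threshold; the second journal falls short of it.
records_2010 = (
    [("The Analyst", "Title", 1)] * 6
    + [("Analyst", "Title", 2)] * 4
    + [("Rare Journal", "Title", 3)] * 9
)
cleaned_2010 = clean_year(records_2010)
```

Normalising names before counting matters here: counted separately, neither spelling of the first journal would reach 10 papers.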

## Data accessibility

Datasets used in this study are available via the Dryad Repository (doi:10.5061/dryad.hg3j0).

## Authors' contributions

A.L., H.S.M. and T.P. performed analyses, discussed the results and contributed to the text of the manuscript.

## Competing interests

The authors declare no competing financial interests.

## Funding

The authors acknowledge the support of Research Councils UK Digital Economy via grant no. EP/K039830/1.

- Received June 26, 2015.
- Accepted July 27, 2015.

© 2015 The Authors. Published by the Royal Society under the terms of the Creative Commons Attribution License http://creativecommons.org/licenses/by/4.0/, which permits unrestricted use, provided the original author and source are credited.