## Abstract

Railway suicide is a significant public health problem. In addition to the loss of lives, these suicides occur in public space, causing traumatization among train drivers and passengers, and significant public transport delays. Prevention efforts depend upon accurate knowledge of clustering phenomena across the railway network, and spatial risk factors. Factors such as proximity to psychiatric institutions have been discussed to impact on railway suicides, but analytic evaluations are scarce and limited. We identify 15 hotspots on the Austrian railway system while taking case location uncertainties into account. These hotspots represent 0.9% of the total track length (5916 km/3676 miles) that account for up to 17% of all railway suicides (*N*=1130). We model suicide locations on the network using a smoothed inhomogeneous Poisson process and validate it using randomization tests. We find that the density of psychiatric beds is a significant predictor of railway suicide. Further predictors are population density, multitrack structure and—less consistently—spatial socio-economic factors including total suicide rates. We evaluate the model for the identified hotspots and show that the actual influence of these variables differs across individual hotspots. This analysis provides important information for suicide prevention research and practice. We recommend structural separation of railway tracks from nearby psychiatric institutions to prevent railway suicide.

## 1. Background

Suicide is a considerable public health problem. According the World Health Organization [1], more than 1 million individuals die by suicide globally each year. In Western countries, suicide ranks consistently among the top three causes of death among young individuals, resulting in a tremendous social, psychological and economic burden to individuals, families and society. Railway suicide is a suicide method that involves the jumping or lying in front of a train in order to die by suicide. These suicides typically occur in public locations, thereby affecting and traumatizing a large number of individuals including train drivers, passengers and clean-up personnel. Moreover, railway suicides result in significant delays in public transport. For these reasons, these suicides have received a specific focus in suicide prevention initiatives [2–5]. Individuals dying by railway suicide have been found to be of younger age as compared to the average suicide decedents [6–8], more often involve individuals with severe mental illness [6,9,10], and suicide decedents typically reside close to the railway tracks [6,11].

Density-based clustering of railway suicides has been reported in the literature [7,12]. In particular, suicides seem to be more frequent close to psychiatric institutions [9,10,12], and also railway parameters including high train frequency have been found to be associated with railway suicides in some studies [13,14]. Some [11], but not all [7] studies suggest that railway suicides increase with increasing population density close to the railway. Media and social contagion effects may play a role in the formation of locations with high density of railway suicide events [15,16], so-called hotspots.

Studies on the clustering of suicides, in general, and on area-related factors that impact on railway suicides specifically have been hampered by a number of limitations. First, correlations between variables such as population density and railway suicides have often been assessed on an aggregate level over time, without looking at spatial variations of densities across the whole length of the railway track. Second, probable measurement errors regarding the specific locations of individual railway suicides have not been taken into account. Finally, models to estimate the relative influence of different factors have typically not controlled for both area-related population and railway factors, and have not been validated with adequate simulation techniques.

When it comes to the assessment of clustering on railway networks, an additional challenge in the field of suicide prevention research has been that any simulation needs to take into account that cases can only occur along the railroad network. Here, a careful consideration of distance measurement between cases and objects such as psychiatric institutions needs to be made: distance can be measured using Euclidean (air-line) distance or along the railroad network, e.g. by measuring the length of the shortest path from one point on the network to another. The choice depends on data availability and research question. We decided to use network-based shortest path distances to measure distances between cases of suicide, because this approach takes the railroad layout and crossings into account.

Using a single, fixed grid introduces a problem known as MAUP (modifiable areal unit problem), i.e. bias introduced by choosing a specific tessellation of the study area or the network [17]. As we also use a grid-based algorithm to estimate the locations of suicide hotspots, we mitigate this problem by averaging over multiple grid definitions with varying origins. For the model simulations, we choose to use a point pattern model that, by definition, is free of MAUP.

We use a density-based clustering approach which (i) can identify spatially connected regions of arbitrary shapes as clusters and (ii) also takes account of the spatial uncertainty of case locations.

*What is a suicide cluster/hotspot?* We aimed to find a definition of a railway suicide hotspot that allows us to identify locations with high case densities while incorporating the spatial uncertainty of case coordinates. As railway suicide hotspot, we define a spatially connected region where the mean density of cases per area is above a threshold. The mean density is calculated separately for each pixel from simulations of a Gaussian noise model created to incorporate the spatial uncertainty of case coordinates. The per-pixel threshold is chosen as the lower 10% quantile from the simulation envelope of the Gaussian noise model. We did not consider background characteristics such as population density (i.e. cases per area unit) for defining clusters, because our interest was on locations with higher overall case numbers, which would be of greatest relevance to suicide prevention efforts.

## 2. Material and methods

### 2.1. Hotspots of railway suicides

*Data on locations of suicides* on the Austrian railroad network from 1998 to 2009 have been obtained from the main Austrian railroad operator ÖBB (*N*=1170).

After removal of cases from Lichtenstein (*N*=4), cases inside trains (*N*=6) or not directly located on the railroad network (*N*=30), 1130 cases of suicide by train impact remain.

Locations are recorded as kilometre-number on the railroad network together with the name of the closest station. This information has been used together with an ÖBB Railway Map [18] and Google Maps (maps.google.com) to obtain the spatial coordinates for each case. We estimate that this procedure has a (Gaussian) measurement uncertainty of approximately 1 km.

*Data on the railroad network* have been obtained from the publicly available Euroglobalmap [19] from 2014. We use the associated metadata to retain only the operational part of the network. We expect the railroad network to remain static over the study period. From 1998 some railroad segments have been closed, but for all suicide cases from 1998 to 2009, a proper network segment exists in the network data from 2014. The total network length is 5916 km (3676 miles).

*Hotspots.* To identify the spatial locations with highest densities of suicides per railroad kilometre, we use box-counting on multiple fixed pixel grids: we divide the study region into a regular grid of quadratic pixels with 1 km side-length and count the number of cases per pixel. To mitigate the introduced discretization error (MAUP) [20], we repeat the procedure 16 times using slightly displaced grids while keeping the pixel size constant (quadrates with 1 km side-length). The displacement is such that the original pixel of 1 km side-length is divided symmetrically into sub-pixels of 250 m side-length.

To take the case location uncertainty of about 1 km into account, we generate 1000 realizations of a Gaussian noise model, i.e. the case locations are displaced along the railroad network using a normally distributed random displacement with *σ*=1 *km*.

For each realization, the box-counting is performed for each of the 16 displaced grids, resulting in a final 250 m wide grid with 16 000 counts per pixel. From this ensemble, we pixel-wisely find the lower 10% quantile of the number of cases. This results in a conservative estimate for the spatial case density distribution.

Finally, we identify the hotspots from the resulting image as eight-connected groups of pixels, i.e. neighbouring pixels with value also above threshold and located sideways, above/below and also diagonally are grouped. Hotspots with 3 pixel minimum border-to-border distance are merged.

We validate the stability of this procedure by using various larger values of grid width. We consistently identify the same hotspots as presented, corresponding to the order of the respective grid width.

### 2.2. Psychiatric institutions and railway suicides

*Data on hospitals* and medical institutions of Austria from 2014 with total bed counts, types of departments and street addresses have been obtained from the Austrian Federal Ministry of Health [21].

The geographical locations of the facilities have been obtained by looking up their street addresses using QGIS [22] with overlay data from Google Maps (maps.google.com), OpenStreetMap (www.openstreetmap.org) or Geoland Basemap (www.basemap.at).

For each institution with a psychiatric department or unit, we calculated the average department size by dividing their total bed counts by the respective number of department types. The resulting psychiatric bed counts are, therefore, proportional to the total number of beds and inversely proportional to the number of department types in the respective facility. This serves as a proxy for psychiatric institution size.

*Testing the possible influence of psychiatric institutions* on railway suicides is done using a threefold approach. First, we test the null hypothesis *H*_{0}: *the locations of psychiatric institutions have no influence on locations of railway suicides*, using constrained-realization Monte Carlo (MC) simulations [23] of *H*_{0} (§(c)). Second, we fit an inhomogeneous Poisson process (IPP) to the point pattern of suicides using maximum-likelihood estimation and then generate random samples from the fitted model to estimate the effect size of the influence of psychiatric institutions to nearby suicides (§(d)). Third, we evaluate the model at the identified hotspots to quantify the excess risk associated with the variables at these locations (§(e)).

### 2.3. Testing the null hypothesis

*Testing H _{0}* is done by randomly reshuffling the locations of hospitals such that the statistical distribution of psychiatric beds along the railroad network remains close to the original statistical distribution (i.e. the distribution calculated from the original geographical locations). This means, we create random configurations of hospital locations while retaining the property that more institutions are usually located in more densely populated areas.

Let *k* be the bin number, *c*(*k*) the bin centre value (of psychiatric bed density) for bin *k*. Let *n*(*k*) and *n*_{0}(*k*) be the frequencies in the randomized and in the original data, respectively.

To measure the deviation of a random configuration (with histogram *n*(*k*)) from the original distribution (with histogram *n*_{0}(*k*)), we use a *weighted* *χ*^{2}-statistic of the histograms, i.e. we calculate
*c*(*k*). The weighting ensures that larger psychiatric institutions carry more weight.

To obtain a test statistic allowing us to decide on the rejection of *H*_{0}, we fit an inhomogeneous Poisson model using the psychiatric bed density, population density and single versus multitrack railway as explaining variables and the actual suicide locations as dependent variable (the exact procedure is explained in §(d)). As test statistic we use the parameter estimate of the psychiatric bed density. This statistic is calculated for 999 random hospital location configurations and compared with the test statistic obtained for the actual hospital locations.

It is required to select an acceptance/rejection threshold for the *χ*^{2}-statistic of the random configurations. A high threshold leads to efficient simulation (fewer rejections), but might also result in inconsistency with the to-be simulated hypothesis *H*_{0}. A low threshold ensures this consistency, but is more inefficient to simulate.

After experimenting with different thresholds, we decide on a threshold of 5000, i.e. accept the random configuration if its *χ*^{2}≤5000, otherwise we reject it. In order to assess whether this threshold still allows for a consistent simulation of *H*_{0}, we determine the sensitivity of the test statistic with regard to *χ*^{2}: we group the surrogates into 12 equally sized groups according to their *χ*^{2}-value. We then compare the influence of each group on the test statistic. A Kruskal–Wallis test did not reject that the groups result in the same values for the test statistic (*p*=0.62; electronic supplementary material, figure S1). Assuming that the group with smallest *χ*^{2} sufficiently constrains the simulation in order to produce correct simulations of *H*_{0}, we conclude that the groups with higher *χ*^{2} values do this as well and that the selected threshold value was adequate.

### 2.4. Fitting an inhomogeneous Poisson model

We model the suicide locations as a spatial point pattern generated by an IPP with a spatially varying log-linear intensity function of the spatial variables:

(a)

*Population density*. We obtained publicly available population data of 2006 from Statistics Austria [24], given in absolute numbers of registered residents on a 1 km wide grid. In order to account for some mobility of the population, we calculate a smoothed population map using Gaussian kernel smoothing with a bandwidth of 1.6 km on a 250 m wide destination grid. As fitting the IPP requires the log-density, we first logarithmize the population data and smooth afterwards.(b)

*Psychiatric bed density*. From the approximate number of psychiatric beds per hospital and the hospital locations (§(b)) we create a map of the spatial density of psychiatric beds using Gaussian kernel smoothing with a bandwidth of 0.8 km. This bandwidth choice accounts for the mobility of psychiatric patients which has been reported before [14]. Again, we need the logarithmic psychiatric bed density when fitting the IPP: as the data are zero-inflated, we first offset them by adding one, then logarithmize and finally apply Gaussian kernel smoothing.(c)

*Single track versus multitrack*. We use a binary variable encoding for single versus multitrack segment from the Euroglobalmap data (§(a)). Typically, multitrack segments are associated with a higher amount of railway traffic. The variable can therefore be seen as a proxy for low versus high traffic density.(d)

*Total suicide rate and socio-demographic factors*. We extract two spatial variables as the most important factors from a principal components analysis (PCA) on input data from various sources and spatial resolutions. These are: (i) mean yearly gross income of employees (in EUR, from 2003, on a district level [25]), (ii) fraction of population older than 15 years with low (no completed secondary education), medium (completed secondary) and with high (completed tertiary) eduction (2011, on a community level [26]), (iii) fraction of male/female population, (iv) fraction of young (age ≤19 years), adult (20 ≤ age ≤ 64) and elderly (age ≥65) population, (v) fraction of persons without Austrian or EU citizenship, (vi) fraction of unemployed persons (iii)–(vi) on a 10 km grid, from 2008 [27]), (vii) the number of suicides in the years 2001–2008 per mean population in the same time span (district level).

To perform the PCA, we sample these variables—together with the population density from above—at 20 000 random points on the railroad network. On these data we perform a population-weighted, centred PCA. The first two components are then used as variables in the analyses.

*Fitting the IPP* is done using the practical maximum-likelihood approach of Baddeley *et al.* [28]. We adapt the procedure to work with a network tessellation instead of a two-dimensional one. The procedure works as follows:

Displace the original cases randomly (normally distributed with

*σ*=1*km*) along the railroad network to take the measurement uncertainty of the case locations (§(a)) into account. These are the*data points*.Select 20 000 random

*dummy points*on the network. The higher the number of dummy points, the better the accuracy of the maximum-likelihood estimates. This number has been chosen high enough to give stable results when repeating this procedure.Compute the network weights

*w*_{i}for data and dummy points, i.e. the length of its network share (*i*numbering the points).Data and dummy points together define a (non-unique) network tessellation, i.e. each point gets assigned its respective share of the network. The tessellating procedure we use is similar to the Voronoi tessellation of a network [29], p. 84, but we also handle

*critical nodes*. A critical node is a point that prohibits mutually exclusive Voronoi segments, i.e. leads to overlapping Voronoi segments. We handle this by dividing the lengths of overlapping segments to their adjacent points proportionally to the inverse shortest-path distance between this overlapping segment and its respective adjacent point.Set the dependent variable

*y*_{i}=1/*w*_{i}for each data point and*y*_{i}=0 for each dummy point.Find the vector of explaining variables

*v*_{j}(*x*_{i}) at each data and dummy point (*j*numbering the variables).Fit a Poisson generalized linear model to find the coefficient estimates

*θ*_{j}using variables*v*_{j}(*x*_{i}), weights*w*_{i}and responses*y*_{i}.

*Incorporating case location uncertainty*. To account for the case location uncertainty, we simulate a Gaussian noise model on the network with *σ*≈1 km 39 times and each time perform the IPP fit described above. The resulting mean coefficient estimates are then used to create 199 simulated realizations of the IPP: we use a regular grid with 250 m wide pixels to simulate the fitted model. From the mean coefficient estimates, the explaining variables evaluated at pixel centres, and the network length per pixel, we create a two-dimensional IPP intensity image. Each realization creates 1130 cases distributed on the map proportional to this intensity image.

As the outcomes of this simulation procedure strongly depend on the grid width (i.e. smaller is more accurate), we use a *Z*-test based on the simulation variance to compare the simulation input coefficients with their corresponding mean simulation coefficients.

*Goodness of fit*. The goodness of fit is estimated by calculating the pair distribution function together with simulation envelopes from the simulated model. The pair distribution function, also called pair correlation function, is the histogram of all pairwise distances of the points of a point pattern, normalized with respect to complete spatial randomness (CSR) [30]. We defined CSR as a homogeneous Poisson process on the railroad network, which has been estimated by simulation of 199 realizations. Note that the point-wise 95%-simulation envelope for CSR was at any point within 1±0.5 (electronic supplementary material, figure S2). The distances were measured as shortest-path distances on the railroad network.

*Temporal pair correlation*. We also take a look at the purely temporal pair correlation functions, i.e. not considering spatial locations of suicides and baselining to a homogeneous Poisson process in time. There is no temporal clustering visible, neither on short nor on long time scales (electronic supplementary material, figure S3).

### 2.5. Evaluating the model at hotspots

Above (§(a)) we identified (groups of) pixels as hotspots. We calculate the excess risk ER_{j} associated with a variable *j* at a given hotspot as the *mean value* of excess risk across the pixels of the respective hotspot.

To calculate the excess risk er_{j}(*x*_{i}) associated with variable *j* at pixel *x*_{j}, we use the log-linear model to find that the intensity

## 3. Results

### 3.1. Hotspots

Table 1 shows the 15 identified hotspots on the Austrian railway network. For each hotspot we provide the number of pixels included, the approximate number of suicide cases, and the distance to the closest psychiatric institution. We also show the evaluations of the population and railway variables for each hotspot, i.e. the excess risks for population density, psychiatric bed density, multitrack versus single-track railway and socio-economic components. We used the first two components of a PCA, explaining 89.8% of the variance, with component 1 loading high on high education and component 2 loading high on total suicide rates.

Figure 1 shows their locations represented as dots. The size of the dots corresponds to the approximate number of railway suicide cases.

### 3.2. Inhomogeneous Poisson model

Table 2 shows the maximum-likelihood estimates (means and standard deviations) based on 199 simulations. The *Z*-scores of the simulation input parameters in relation to the statistical distribution resulting from the simulated model indicate that the discretization error of the simulation is negligible, i.e. the grid width is sufficiently small.

Figure 2 compares the pair distributions from the original point pattern with pair distribution of the simulated model. The figure illustrates (i) the clear difference of the suicide pattern from CSR, (ii) that the suicide point pattern seems generally well represented by the fitted model, and (iii) a possible non-modelled over-clustering at very small distances of up to a few kilometres.

### 3.3. Psychiatric institutions

We create surrogate data from the null hypothesis using constrained-realization MC 999 times and calculate the test statistic. The result is approximately normally distributed with *μ*=−0.58±0.02 and *σ*=0.57±0.02. The test statistic for the original data is 0.74 which leads to rejection of *H*_{0} with *p*<0.011.

## 4. Discussion

### 4.1. Main findings

In this study, we identified several hotspots of railway suicides along the railway network. The 15 hotspots identified in this study comprised between 111 and 190 railway suicides, which corresponds to 9.8%–16.8% of all railway suicides in the observation period. The specific locations were spread across the entire country, with eight of nine federal states having at least one hotspot, covering approximately 54 km of the railway track in total (0.9% of the entire track length). A previous Dutch study on railway suicides similarly found that 16% of all railway suicides occurred on just 33 km of the Dutch trail track, which was about 1.2% of the total track length [7].

In order to model railway suicides we assumed a smoothened IPP. We tested the influence of psychiatric bed density, single versus multitrack railway; as well as population density on railway suicides. We controlled the model for two principal components of a wide range of spatial socio-economic factors, including population income level, education, proportion of males and females, age groups, proportion of residents without EU citizenship, unemployment rates, as well as total suicide rates. The two components used loaded high on education (component 1) and total suicide rates (component 2), as well as moderately on income level (component 2).

The findings indicated that railway suicides were associated with proximity of psychiatric institutions in walking distance to the railway network. The closer and higher the number of psychiatric beds in the proximity of the railway network was, the more suicides occurred on the railway track. An evaluation of the model for the specific hotspots indicated that in 5 of 15 identified hotspots, psychiatric institutions were not more than 1 km away from the railway. The excess risk associated with proximity to psychiatric institutions was larger than 1.5 in 6 of 15 hotspots, and ranged from 1.5 to 2.9 depending on the hotspot, indicating a small to medium effect size. A validation of the model with a randomization test which shuffled psychiatric institutions on the railway network taking account of the population density in the specific areas, confirmed this result. Several previous studies have described that proximity of railway tracks to psychiatric institutions may increase the number of railway suicides in that area [7,9,12]. However, none of these studies have applied the present three-step methodological approach. In a Dutch study, it turned out that all locations on the network with the highest frequency of suicides per kilometre had psychiatric institutions within 800 m walking distance [7]. Similarly, a study from Germany indicated that 75% of the German railway locations with highest suicide density had psychiatric institutions nearby [12]. These studies were descriptive, and did not control for any demographic or railway factors. A survey study from Brisbane, Australia indicated that 48% of railway suicides occurred close to a psychiatric institution, and 57% of decedents had been treated for mental health problems [9].

Also consistent with earlier research, local population density was associated with railway suicides along the railway track [4]. In the UK, railway suicides were typically found around major cities with high population density [11]. By contrast, the Dutch evaluation of railway suicides found similar rates of train suicides in three regions with different population densities, with a lower rate in the most densely populated region [7]. The authors of this study concluded that tracks may be less accessible in Dutch areas with high population density. However, similar to the other previous analyses, this analysis did not take small-area variations in population densities and railway suicides into account. In the present analysis, the individual evaluation of 15 hotspots indicated that population density was relevant to all of these locations, with excess risks ranging from 4.2 to 14, which corresponds to medium to large effect sizes.

We also found an independent association between multitrack lines on the Austrian railway network and railway suicides, with excess risks ranging from 1.0 to 1.4 for the individual hotspots. Previous studies analysed if the availability of trains, or passenger numbers were associated with railway suicide and showed mixed results [7]. The previously mentioned analysis of Dutch railway suicides did not identify a correlation between railway suicide and railway density or intensity of railway transportation as indicated by the variations in railway suicides across four areas with different rail density [7]. By contrast, railway suicides in England and Wales were correlated with the growth of the railway system over time [35].

Total spatial suicide rates, which had a very high loading in the second principal component used in the analysis, were associated with railway suicides at some but not all of the identified hotspots. Excess risks were negative or positive, depending on the individual hotspot, and overall sizes of risk estimates were small. A previous study identified decreasing total suicide rates but increasing railway suicides in a 10-year period of 1991 to 2000 in Germany [36]. Similarly, Dutch railway suicides did not show the same longitudinal time patterns as total suicide rates [7]. The small and inconsistent relationship of the spatial distributions of total suicides and railway suicides in the present study highlights that the impact of total suicides on railway suicides is low, inconsistent, and, if there is any at all, only relevant at selected hotspots.

Spatial socio-economic factors, particularly high level of education, which loaded strongly in the first principal component used in this analysis, were related to railway suicide at several hotspots, but again, the associations were not consistent. At 13 of the 15 hotspots, there was a small direct association between high level of education and railway suicides, suggesting that areas with higher level of education of the population are more affected by railway suicide. Interestingly, these 13 hotspots were all located close to urban centres, whereas the two hotspots with divergent findings were located in rural areas. Education may reflect an uncontrolled effect of urbanity. We are not aware of any study assessing spatial associations of socio-economic variables with railway suicides, but a meta-analysis investigating the association of spatial socio-economic conditions with total suicide rates indicated substantial heterogeneity of findings, with 30% of studies with significant findings reporting direct associations and 70% reporting inverse associations.

A crucial step in any analysis on potential clustering is the definition of a cluster. A recent review of studies on suicide clustering identified that only a third of 82 included studies defined what they meant by clusters, and only 27 studies provided data as to how many suicides may have occurred in a cluster [37]. In this study, we defined railway suicide hotspots (or clusters) as a spatially connected region where the median density of cases per area was above a threshold calculated from simulations of a Gaussian noise model created to incorporate the spatial uncertainty of case coordinates. The threshold was chosen as the lower 10 per cent quantile from the simulation envelope of the Gaussian noise model. This is different from the Centers for Disease Control’s definition of suicide clusters as ‘a group of suicides that occur closer together in time and space than would normally be expected in a given community’ [38]. We neither considered background characteristics such as population density (i.e. cases per area unit), nor did we use a time component in our present definition, because our interest was on locations with the overall highest case densities, which would be of greatest relevance to suicide prevention efforts on the railway network.

The present model, while having a good model fit, indicated some potential over-clustering of railway suicides at small distances of up to 20 km. This unexplained over-clustering at small distances may be due to specific environmental characteristics not controlled for in the model or due to imitative behaviour, which has been shown to be typically limited to areas of up to 50 km radius maximum [39]. Imitative behaviour can occur due to social contagion effects [39]. For example, media reporting on a railway suicide may trigger further suicides in the same location, which may again increase media reporting, resulting in a vicious circle of heightened media attention and more suicides [15]. Contagion effects typically show not only a spatial clustering but also a time clustering, i.e. their duration is limited. A review of studies on suicide clustering indicated that the median duration of clusters was 11 months, with a range of 0–24 months [40]. Owing to the absence of any visible time clustering in this study, our analysis does rather point to structural, more permanent factors that impact on the clustering of railway suicides at small distances in specific areas. These factors may either be population- or area-related.

### 4.2. Implications for research and prevention

This study has several implications for future research in the area of suicide clustering on networks such as on railway tracks. It shows that standard methods are applicable in this area which has been characterized by a lack of standardized statistical approaches to data analysis. Future studies may use this study as a basis for further in-depth analysis of spatial distributions of suicides on networks.

There are several important implications for suicide prevention. First, this study shows that a significant proportion of suicides occurs on a very small proportion of the network. Barriers and other structural measures may be applicable to prevent railway suicides at these hotspots [4]. In this context, particularly psychiatric institutions need to be structurally separated from railway tracks. However, this study also highlights that not all locations with relatively high railway suicide densities show the same characteristics. While population density seems to play a major role at several locations on the Austrian railway network, vicinity to psychiatric institutions seems to play a stronger role at other hotspots. A careful site inspection of the identified hotspots seems warranted in order to assess specific conditions and develop adequate prevention techniques tailored to the specific hotspots. Beside fences and spatial separation, media collaborations to prevent imitative effects, as well as speed control have been proposed as preventive measures in the literature [4]. Further, front modifications of trains as well as wheel and rail designs may contribute to the prevention of lethal consequences of train–person collisions [4]. Most of these measures have not been evaluated with regard to their effectiveness for suicide prevention. In Switzerland, barriers on bridges have been shown to significantly reduce suicidal behaviour [41]. Suicides at nearby bridges within 5 min walking distance did not increase subsequently. Similarly, a minimal structural intervention of installing guard rails at hospital windows significantly reduced the number of patients dying by jumping out of windows [42]. A meta-analysis of structural interventions at suicide hotspots confirmed that there was an overall preventive effect of these interventions on suicides by jumping [43].

### 4.3. Strengths and limitations

This study has several strengths. With regard to the identification of hotspots, using a single, fixed grid introduces a problem known as MAUP, i.e. a bias introduced by choosing a specific tessellation of the area or the network. Therefore, during the hotspot estimation, we averaged over multiple grid definitions with varying origins. For the model simulations, we used a point pattern model that, by definition, is free of MAUP [17]. In recent studies on suicide clustering [37,39,44], statistical software such as SatScan has been used, which basically works by moving windows of elliptical shape and varying size over the study region and identifies clusters using a statistical criterion [20]. Because the railway represents a network and suicides can only occur on that network, the present density-based clustering approach was superior in that (i) it allowed the identification of spatially connected regions of arbitrary shapes as clusters and (ii) it also allowed to take into account the spatial uncertainty of case locations.

Also the modelling technique used here has several strengths. While earlier studies typically assessed correlations between parameters such as population density and railway suicides on an aggregate level over time, we were able to investigate spatial variations of densities across the whole length of the railway track. The present statistical modelling allowed us to take probable measurement errors regarding the specific locations of individual railway suicides into account. The adjustment of the model for socio-demographic variables as well as for the total suicide rate is a strength, as well as the validation of the model using randomization tests.

Despite many strengths, this study also has some limitations. Not all socio-demographic variables were available at the level of communities; therefore, we needed to use some district data, which may introduce bias [45]. Further, the density of psychiatric beds was an approximate measure based on the average number of beds per department type. This measure reflects the average department type size of a given institution. Actual psychiatric bed numbers may deviate from the present densities. Future studies should use more accurate data for the size of psychiatric institutions. However, also average bed numbers per department type as calculated here might be related to suicide hotspots independent of the specific size of the psychiatric departments.

Importantly, no information on socio-demographic and morbidity-related characteristics of the suicide decedents was available. We do not know if the suicide decedents had a mental illness or if they were patients at the respective institutions. Finally, the measurement of distances between the rail track and psychiatric institutions in terms of Euclidian distance (air-line) may not be entirely accurate, because this assumes that individuals can move freely in all directions. Future studies may use other network information, e.g. the sidewalk structures, to determine distances between objects and the railway network.

Finally, the model might have been stronger if some measure of existing suicide barriers (e.g. walls and fences) could have been added. For example, some psychiatric institutions might be physically separated from tracks. These barriers may effectively reduce the risk of suicide at these locations. Unfortunately, data on existing barriers were not available for this study.

## 5. Conclusion

For the first time, we show here that spatial clustering of suicide hotspots can be understood with unprecedented precision through models for point processes. This has immediate consequences for suicide prevention. The study indicates that a considerable number of railway suicides occur on a very short fraction of the total network. Vicinity to psychiatric institutions, population densities, as well as multitrack parts of the network are particularly relevant and warrant attention in tailored prevention efforts. Spatial socio-economic factors and total suicide rates are not consistently associated with railway suicides.

## Ethics

This study was approved by the Ethics Committee at the Medical University of Vienna, Austria (EC no. 1050/2016). Informed consent was not required for this study.

## Data accessibility

Owing to the sensitive nature of the data used in this analysis, we received an exception from the Journal’s data sharing policy regarding the sharing of primary data. We requested this waiver particularly for our primary data on the specific locations and dates of individual railway suicides. These data are sensitive in nature. They allow one to identify individual cases of suicides. In the paper, we do not provide specific locations for individual acts and only approximate locations for the spots with the highest number of cases. Individuals interested in the data can contact the Austrian Federal Railways (ÖBB) who is maintaining the dataset. In order to allow other researchers to apply and test our approach, we publish the code used in the three analytic sections of the paper, along with surrogate data, and with supporting documents at: https://github.com/mstrauss/hotspots/. This includes the code for all three parts of the analysis, i.e.

the identification of hotspots, taking account of the case location uncertainty;

the simulation of the null hypothesis (i.e. spatial vicinity to psychiatric institutions is not associated with railway suicides); and

the evaluation of the model for the individual hotspots.

## Authors' contributions

M.J.S. planned and carried out the statistical analysis, researched literature, drafted the manuscript. P.K. supported the statistical analysis/methodological approach. G.S. collected the railway suicide data. T.N. drafted the manuscript, collected literature, coordinated the study. All authors revised the manuscript for important intellectual content and approved the final version of the manuscript for submission.

## Competing interests

We declare we have no competing interests.

## Funding

We received no funding for this study.

## Acknowledgements

We thank Dr Johanna Gerngross and Ing. Peter Kleinschuster from the Austrian Federal Railways (ÖBB) for their assistance in early phase of developing this study. We thank Statistics Austria for providing spatial socio-economic data and the Ministry of Health (BMG) for data on medical institutions. Railway network data were obtained from EuroGeographics. Original product is freely available at www.eurogeographics.org. Terms of the licence available at http://www.eurogeographics.org/form/topographic-data-eurogeographics.

## Footnotes

Electronic supplementary material is available online at https://dx.doi.org/10.6084/m9.figshare.c.3699067

- Received September 26, 2016.
- Accepted February 6, 2017.

- © 2017 The Authors.

Published by the Royal Society under the terms of the Creative Commons Attribution License http://creativecommons.org/licenses/by/4.0/, which permits unrestricted use, provided the original author and source are credited.