Lognormal Lorenz and normal receiver operating characteristic curves as mirror images

The Lorenz curve for assessing economic inequality depicts the relation between two cumulative distribution functions (CDFs), one for the distribution of incomes or wealth and the other for their first-moment distribution. By contrast, the receiver operating characteristic (ROC) curve for evaluating diagnostic systems depicts the relation between the complements of two CDFs, one for the distribution of noise and the other for the distribution of signal plus noise. We demonstrate that the lognormal model of the Lorenz curve, which is often adopted to model the distribution of income and wealth, is a mirror image of the equal-variance normal model of the ROC curve, which is a fundamental model for evaluating diagnostic systems. The relationship between these two models extends the potential application of each. For example, the lognormal Lorenz curve can be used to evaluate diagnostic systems derived from equal-variance normal distributions.


Introduction
The Lorenz curve of economics and the receiver operating characteristic (ROC) curve of psychology and medicine are closely related analytic tools. The Lorenz curve is primarily used to assess economic inequality, and the ROC curve to evaluate diagnostic systems. In this paper, we review the relation between the two most common forms of these curves: the lognormal model of the Lorenz curve and the normal model of the ROC curve.
The general relation between curves of this kind has been developed by Bamber [1], who did not consider the specific case of Lorenz curves; rather, he reviewed ordinal dominance curves [2], of which Lorenz curves are an important special case. Given two continuous random variables, X and Y, a point on a graph of an ordinal dominance curve for X and Y is located at P(X ≤ c) on the horizontal axis and P(Y ≤ c) on the vertical axis. By comparison,

The Lorenz curve
A Lorenz curve that represents the distribution of income or wealth in a society may be defined as a curve '. . .whereby the percentages of the population arranged from the poorest to the richest are represented on the horizontal axis and the percentages of income enjoyed by the bottom x% of the population is shown on the vertical axis' [3, pp. 29-30]. Although Lorenz curves are mostly used to measure inequality in the distribution of income, they have proved valuable analytic tools in other fields as well, including measuring diversity in plant populations and assessing inequality in health among individuals and groups [4,5].
Kendall & Stuart [6] introduce the Lorenz curve by means of a pair of parametric equations in x, a positive variable associated with income or other economic quantity. The equations are specified in terms of the probability density function f(t):
$$F(x) = \int_0^x f(t)\,dt$$
and
$$F_1(x) = \frac{1}{\mu}\int_0^x t\,f(t)\,dt,$$
where μ is the mean of f(t).
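For a finite sample, the two cumulative functions reduce to the cumulative share of the population and the cumulative share of total income. The following is a minimal sketch of that empirical construction, not a procedure taken from the paper; the function name `empirical_lorenz` and the income values are hypothetical and serve only as an illustration.

```python
from itertools import accumulate


def empirical_lorenz(incomes):
    """Return (population share, income share) points for positive incomes."""
    xs = sorted(incomes)              # arrange from poorest to richest
    n = len(xs)
    total = sum(xs)
    cumulative = list(accumulate(xs))
    points = [(0.0, 0.0)]
    # Horizontal axis: share of the population; vertical axis: share of income.
    points += [((i + 1) / n, cumulative[i] / total) for i in range(n)]
    return points


# Hypothetical incomes, for illustration only.
sample = [10, 20, 30, 40, 100]
curve = empirical_lorenz(sample)
for p, share in curve:
    print(f"bottom {p:.0%} of the population holds {share:.1%} of income")
```

Every point lies on or below the positive diagonal, which is the line of perfect equality.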

The receiver operating characteristic curve
The ROC curve evaluates how well a diagnostic system can distinguish between two events, such as how well a glucose tolerance test differentiates the presence from the absence of diabetes, or how well a psychological checklist differentiates potential recidivists from law-abiding persons. ROC curves have many other applications, including statistical analysis (e.g. [8]). In view of the provenance of ROC analysis in detection theory [9], the events are often designated 'signal' and 'noise' and abbreviated 's' and 'n'. The vertical axis of an ROC graph is usually called the hit rate, and, as is well known, it is a conditional probability for correctly distinguishing between two events, A and B. It is the probability that, given that Event A occurred, Event A was correctly identified as having occurred. We denote this conditional probability as P_H. Similarly, the horizontal axis of an ROC graph is often called the false-alarm rate. It is the conditional probability that, given that Event B occurred, it was misidentified as Event A. We denote this probability as P_F. (The other two possible outcomes, misses and correct rejections, are complements of hits and false alarms, and so provide no additional information.) Parametric equations for the ROC are
$$P_F = \int_c^{\infty} f_n(x)\,dx$$
and
$$P_H = \int_c^{\infty} f_s(x)\,dx,$$
where f_s(x) and f_n(x) are probability density functions associated with s and n, respectively, and the criterion c is set by the diagnostic system. Note that the integration limits depend on the domain of the distribution assumed.
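The parametric definition can be illustrated by sweeping the criterion c and recording the two upper-tail probabilities. The sketch below assumes normal densities for both events and uses Python's standard-library `statistics.NormalDist`; the signal mean of 1.0 and the list of criteria are hypothetical values chosen for illustration.

```python
from statistics import NormalDist


def roc_point(noise, signal, c):
    """Return (P_F, P_H) for criterion c as upper-tail areas of the two densities."""
    p_f = 1.0 - noise.cdf(c)    # false-alarm rate: noise exceeds the criterion
    p_h = 1.0 - signal.cdf(c)   # hit rate: signal plus noise exceeds the criterion
    return p_f, p_h


noise = NormalDist(mu=0.0, sigma=1.0)
signal = NormalDist(mu=1.0, sigma=1.0)   # hypothetical signal distribution

criteria = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
points = [roc_point(noise, signal, c) for c in criteria]
for c, (p_f, p_h) in zip(criteria, points):
    print(f"c = {c:+.1f}: P_F = {p_f:.3f}, P_H = {p_h:.3f}")
```

A lax (low) criterion gives a point near (1, 1) and a strict (high) criterion gives a point near (0, 0), so moving the criterion traces out the curve.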

An equation for the lognormal Lorenz curve
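The equation for the Lorenz curve, L(p), is
$$L(p) = F_1(x)$$
and
$$p = F(x),$$
where F(x) is the CDF of income, F_1(x) is the CDF of its first-moment distribution, and 0 ≤ p ≤ 1.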
To find a function for the lognormal Lorenz curve, a suitable parametrization of the lognormal distribution is required. The two-parameter version has two properties that commend it for modelling income: it is valid only for positive values, and it has a long upper tail, both of which usually characterize the distribution of income.
The lognormal distribution is so named because it is the distribution of a variable whose logarithm is normally distributed. If Y = ln(X) is a normally distributed random variable with two parameters (mean = μ and variance = σ²), then X is said to be lognormally distributed with two parameters, μ and σ². The probability density function for a two-parameter lognormal distribution [10, eqn (2.5)] is
$$f(x) = \frac{1}{x\sigma\sqrt{2\pi}} \exp\left[-\frac{(\ln x - \mu)^2}{2\sigma^2}\right], \quad x > 0.$$
The lognormal CDF, F(x), with parameters in logarithmic space of μ and σ² can be written as
$$p = F(x) = \Phi\left(\frac{\ln x - \mu}{\sigma}\right), \tag{5.1}$$
where Φ(·) is the standard normal distribution function. The CDF for the first-moment distribution of the lognormal distribution, F_1(x), is given by the theorem [10] that the jth moment distribution function of a lognormal distribution with parameters μ and σ² is itself a lognormal distribution with parameters μ + jσ² and σ², where here j = 1. Thus
$$L(p) = F_1(x) = \Phi\left(\frac{\ln x - \mu - \sigma^2}{\sigma}\right). \tag{5.2}$$
An equation for the lognormal Lorenz curve can be obtained by solving equation (5.1) for ln(x) followed by substitution into equation (5.2), thereby eliminating x. Rearrangement of equation (5.1) yields
$$\ln x = \mu + \sigma\,\Phi^{-1}(p),$$
which, when substituted for ln(x) in equation (5.2), gives
$$L(p) = \Phi\left(\frac{\mu + \sigma\,\Phi^{-1}(p) - \mu - \sigma^2}{\sigma}\right)$$
and simplifies to
$$L(p) = \Phi\left(\Phi^{-1}(p) - \sigma\right). \tag{5.3}$$
This result can be found in, for example, Cowell [11, p. 157].
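The closed form for the lognormal Lorenz curve, L(p) = Φ(Φ⁻¹(p) − σ), can be checked numerically against the definition of the first-moment distribution. The sketch below is an illustration rather than part of the original analysis: it integrates t·f(t) by the midpoint rule in logarithmic space and compares the result with the closed form. The parameter values, the integration settings and the function names are assumptions.

```python
import math
from statistics import NormalDist

STD = NormalDist()  # standard normal: Phi is STD.cdf, its inverse is STD.inv_cdf


def lognormal_lorenz(p, sigma):
    """Closed-form lognormal Lorenz curve; requires 0 < p < 1."""
    return STD.cdf(STD.inv_cdf(p) - sigma)


def first_moment_cdf_numeric(x, mu, sigma, steps=20000):
    """Midpoint-rule estimate of (1/mean) * integral_0^x t f(t) dt, using y = ln t."""
    lo, hi = mu - 10.0 * sigma, math.log(x)   # lower tail below lo is negligible
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        y = lo + (i + 0.5) * h
        z = (y - mu) / sigma
        normal_pdf = math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))
        total += math.exp(y) * normal_pdf      # t * f(t) dt becomes e^y * pdf(y) dy
    mean = math.exp(mu + 0.5 * sigma * sigma)  # mean of the lognormal distribution
    return total * h / mean


mu, sigma, x = 0.0, 0.75, 1.5                  # hypothetical values
p = STD.cdf((math.log(x) - mu) / sigma)        # population share below income x
closed = lognormal_lorenz(p, sigma)
numeric = first_moment_cdf_numeric(x, mu, sigma)
print(f"p = {p:.4f}, closed form = {closed:.6f}, numeric = {numeric:.6f}")
```

The location parameter μ drops out of the closed form, which is why the lognormal Lorenz curve depends on σ alone.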

An equation for the equal-variance normal receiver operating characteristic curve
Perhaps the most common model of the ROC curve is that based on two equal-variance normal distributions, one distribution for the noise and the other for the signal plus noise. The origin of the distributions is usually assigned to the mean of the noise distribution. The scale factor is set by assigning unit value to each variance. As a result, the ROC curve has only one parameter, namely d′, the mean of the signal distribution specified in standard deviation units. Whereas the axes of the Lorenz curve are CDFs, those of the ROC curve are complements of CDFs. Thus, if P_F is computed from a normal distribution with mean 0 and variance 1, then
$$P_F = 1 - \Phi(c). \tag{6.1}$$
Similarly, if P_H is computed from a normal distribution with mean d′ and variance 1, then
$$P_H = 1 - \Phi(c - d'). \tag{6.2}$$
We write the equation for the ROC curve, R(p), as
$$R(p) = P_H,$$
where p = P_F and again 0 ≤ p ≤ 1. Then inverting equation (6.1) gives
$$c = \Phi^{-1}(1 - p) = -\Phi^{-1}(p),$$
and substituting the result into equation (6.2) to eliminate c yields
$$R(p) = 1 - \Phi\left(-\Phi^{-1}(p) - d'\right) = \Phi\left(\Phi^{-1}(p) + d'\right). \tag{6.3}$$
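The elimination of the criterion c can be confirmed numerically. In this sketch, which assumes a hypothetical d′ of 1.0 and a hypothetical criterion of 0.3, the hit rate computed directly from the criterion is compared with the value given by the one-parameter form R(p) = Φ(Φ⁻¹(p) + d′).

```python
from statistics import NormalDist

STD = NormalDist()  # standard normal: Phi is STD.cdf, its inverse is STD.inv_cdf


def roc_equal_variance(p, d_prime):
    """Hit rate as a function of false-alarm rate p; requires 0 < p < 1."""
    return STD.cdf(STD.inv_cdf(p) + d_prime)


d_prime = 1.0   # hypothetical sensitivity
c = 0.3         # hypothetical criterion

p_f = 1.0 - STD.cdf(c)             # false-alarm rate from the noise distribution
p_h = 1.0 - STD.cdf(c - d_prime)   # hit rate from the signal distribution

print(f"P_F = {p_f:.4f}, P_H = {p_h:.4f}, R(P_F) = {roc_equal_variance(p_f, d_prime):.4f}")
```

With d′ = 0 the function returns p itself, which is the positive diagonal corresponding to chance performance.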

Relation between the lognormal Lorenz and the equal-variance normal receiver operating characteristic curves
Observe that equation (6.3) for the equal-variance normal ROC curve is the same as equation (5.3) for the lognormal Lorenz curve but with d′ in place of −σ. Furthermore, if μ is the mean of a lognormal density in logarithmic space, then μ + σ² is the mean of its first-moment density [10]. Recall that d′ is the difference between the means of these two normal densities divided by their common standard deviation, so
$$d' = \frac{(\mu + \sigma^2) - \mu}{\sigma} = \sigma,$$
and equation (5.3), apart from the sign of its parameter, is the same as equation (6.3) for the equal-variance ROC curve. The lognormal Lorenz curve and the equal-variance normal ROC curve are thus congruent. In addition, Aitchison & Brown [10] observed that the lognormal Lorenz curve is symmetrical about the negative diagonal of a unit square, that is, about the line drawn from the point (0, 1) to (1, 0). Similarly, the equal-variance normal ROC curve is symmetrical about the negative diagonal [13, p. 60]. The congruence of the two curves and their symmetry about the negative diagonal mean that, after a rotation through 180° about the point (1/2, 1/2), the rotated and original curves appear to be mirror images in the positive diagonal of the square.
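These geometric claims can be checked point by point. The sketch below is an illustration under the assumption that d′ = σ; it uses the value 0.751 that the text reports for its fitted curve. It tests three statements: reflecting an ROC point in the positive diagonal gives a Lorenz point, rotating an ROC point through 180° about (1/2, 1/2) gives a Lorenz point, and the Lorenz curve is symmetrical about the negative diagonal.

```python
from statistics import NormalDist

STD = NormalDist()


def lorenz(p, sigma):
    """Lognormal Lorenz curve, L(p) = Phi(Phi^-1(p) - sigma)."""
    return STD.cdf(STD.inv_cdf(p) - sigma)


def roc(p, d_prime):
    """Equal-variance normal ROC curve, R(p) = Phi(Phi^-1(p) + d')."""
    return STD.cdf(STD.inv_cdf(p) + d_prime)


sigma = 0.751                          # d' = sigma, as reported for the fitted curve
ps = [i / 20 for i in range(1, 20)]    # interior abscissae 0.05, ..., 0.95

# Reflection in the positive diagonal: (p, R(p)) -> (R(p), p).
reflect_err = max(abs(lorenz(roc(p, sigma), sigma) - p) for p in ps)

# Rotation through 180 degrees about (1/2, 1/2): (p, R(p)) -> (1 - p, 1 - R(p)).
rotate_err = max(abs(lorenz(1 - p, sigma) - (1 - roc(p, sigma))) for p in ps)

# Reflection in the negative diagonal: (p, L(p)) -> (1 - L(p), 1 - p).
symmetry_err = max(abs(lorenz(1 - lorenz(p, sigma), sigma) - (1 - p)) for p in ps)

print(reflect_err, rotate_err, symmetry_err)   # all at rounding-error level
```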
Kakawani [14]  given by the equation
$$A_R = \Phi\left(\frac{d'}{\sqrt{2}}\right),$$
where A_R is the proportion of the area under the curve [15]. For example, the area under the ROC curve with d′ = 0.752, illustrated in figure 1, is 0.702. The area between the ROC curve and the positive diagonal is A_R − 0.5, which for the example in figure 1 is 0.202. This is also the proportion of the square's area between the diagonal and the lognormal Lorenz curve (the hatched area in figure 1), which is an important quantity in economics known as the area of concentration. Kendall & Stuart [6, p. 49] proved that the standard index of inequality, the Gini coefficient of concentration (0 ≤ G ≤ 1), is twice the area of concentration.
The indices d′ and G for the equal-variance normal ROC curves and for lognormal Lorenz curves are related. From geometry it is clear that
$$G = 2(A_R - 0.5) = 2\Phi\left(\frac{d'}{\sqrt{2}}\right) - 1.$$
Recall that for the lognormal Lorenz, d′ = (μ + σ² − μ)/σ = σ. Substituting σ for d′ gives G = 2Φ(σ/√2) − 1, which is a standard equation for computing G for the lognormal Lorenz (e.g. [10]).
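The relation between the area under the ROC curve, the Gini coefficient and the parameter σ is simple to compute. The following sketch uses the value σ = d′ = 0.751 reported in the text for the fitted curve; the function names, including the inverse mapping `sigma_from_gini`, are introduced here for illustration only.

```python
import math
from statistics import NormalDist

STD = NormalDist()


def roc_area(d_prime):
    """Area under the equal-variance normal ROC curve: Phi(d' / sqrt(2))."""
    return STD.cdf(d_prime / math.sqrt(2.0))


def gini_lognormal(sigma):
    """Gini coefficient of the lognormal Lorenz curve: 2 Phi(sigma / sqrt(2)) - 1."""
    return 2.0 * roc_area(sigma) - 1.0


def sigma_from_gini(g):
    """Invert the Gini relation; requires 0 < g < 1."""
    return math.sqrt(2.0) * STD.inv_cdf((g + 1.0) / 2.0)


sigma = 0.751
area = roc_area(sigma)
gini = gini_lognormal(sigma)
print(f"A_R = {area:.4f}, area of concentration = {area - 0.5:.4f}, G = {gini:.4f}")
print(f"sigma recovered from G = {sigma_from_gini(gini):.4f}")
```

The computed values agree, to about three decimal places, with the area of 0.702 and the Gini coefficient of 0.405 quoted in the text.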

An empirical example
Podder & Chatterjee [16] examined changes in the distribution of New Zealand household income from 1984 to 1996. They provided the ordinate values corresponding to each decile of income, so that graphs of Lorenz curves can be constructed from their numerical results. As an example, figure 1 shows their data for 1995/1996 (their table 2). The solid smooth curve in figure 1 is our estimate of the best-fitting lognormal Lorenz curve to those data. The fit was determined using SDT ASSISTANT software [17] to fit the equal-variance normal ROC to confidence ratings. The ROC parameter d′, or equivalently the Lorenz parameter σ, for the fitted curve is 0.751. Podder & Chatterjee [16] reported a Gini coefficient of 0.404 for these data. The Gini coefficient for the fitted curve in figure 1 is 0.405, which attests to the closeness of the fit of the lognormal model to the data. The mirror image ROC curve for d′ = 0.751 is shown in figure 1 for comparison.
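For readers without ROC software, a one-parameter fit of this kind can be approximated in a few lines. The sketch below is not the procedure used by SDT ASSISTANT: it swaps in an ordinary least-squares criterion on the decile ordinates, minimized by a ternary search, and it assumes the squared-error function has a single minimum over the search interval. Because the published decile values are not reproduced here, the ordinates are synthetic, generated from the model itself with a hypothetical σ of 0.75.

```python
from statistics import NormalDist

STD = NormalDist()


def lorenz(p, sigma):
    """Lognormal Lorenz curve, L(p) = Phi(Phi^-1(p) - sigma)."""
    return STD.cdf(STD.inv_cdf(p) - sigma)


def sum_squared_error(sigma, ps, ys):
    """Squared distance between model ordinates and observed ordinates."""
    return sum((lorenz(p, sigma) - y) ** 2 for p, y in zip(ps, ys))


def fit_sigma(ps, ys, lo=0.0, hi=5.0, iterations=100):
    """Ternary search for the sigma minimizing the squared error (assumes one minimum)."""
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if sum_squared_error(m1, ps, ys) < sum_squared_error(m2, ps, ys):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2.0


# Synthetic decile ordinates generated from the model (NOT the published data).
deciles = [i / 10 for i in range(1, 10)]
synthetic_ordinates = [lorenz(p, 0.75) for p in deciles]

sigma_hat = fit_sigma(deciles, synthetic_ordinates)
print(f"fitted sigma = {sigma_hat:.4f}")
```

With real decile data, the fitted σ could then be converted to a Gini coefficient through G = 2Φ(σ/√2) − 1 and compared with a directly computed value.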

Discussion
The preceding comparison of lognormal Lorenz and equal-variance normal ROC curves has described how they are congruent and related by a 180° rotation about the point (1/2, 1/2). That comparison neglects an important feature of ROC analysis that concerns the criterion or threshold for a decision. In many applications of ROC analysis, both the location of the criterion and the accuracy of the system under test are of interest. As is well known, the location of a criterion is shown by a point on the ROC curve, whereas the accuracy of the system is specified by some index that characterizes the curve as a whole.
For the Lorenz curve, by contrast, the location of the criterion is normally not of primary interest because its main purpose is to provide a summary index of inequality. Like ROC analysis, it yields that index on the basis of the curve as a whole, from which the area of concentration and the Gini coefficient of inequality can be determined. To obtain that result, analysts do not need to decide, for example, what level of income constitutes poverty. The Gini index of the Lorenz curve thus measures a dimension analogous to accuracy of the ROC curve and yields a measure that is independent of decision criteria. Of course, analysts may be interested in particular points on a Lorenz curve, such as the point showing what percentage of the population earns 80% of the total income, but these points are not interpreted, as they are in ROC analysis, as indicating the diagnostic system's choice of decision criteria.
Nonetheless, an analysis confined to comparing the shapes of Lorenz curves and the shapes of their 180° rotated ROC curves does not take into account the effect of such a rotation on the location of corresponding points on each curve. The Lorenz point (x, y) and the 180° rotated ROC point (1 − x, 1 − y) are not mirror images, though they may lie on curves that are mirror images, such as the lognormal Lorenz curve and the equal-variance normal ROC curve.
One benefit of this relation is that readily available ROC software is equally applicable to the analysis of Lorenz curves. In addition, the potential applications of each curve are extended. For example, the lognormal Lorenz curve could be used to evaluate diagnostic systems that might otherwise be modelled by the equal-variance normal ROC. And the measurement of economic inequality can be added to the large repertoire of existing applications of equal-variance normal ROC analysis.
Our analysis deals only with equal-variance ROCs. Lee [18] compared Lorenz and unequal-variance ROC curves in the evaluation of diagnostic tests. He observed that unequal-variance ROC curves can be constructed for diagnostic tests that are not monotonic with likelihood ratio, in which case the ROC curves may dip below the positive diagonal of the unit square. Such ROC curves are sometimes called 'improper'. Lorenz curves cannot be improper in this sense because they are by definition convex. In other words, they satisfy the condition, noted above, that L″(p) ≥ 0. Thus, improper ROC curves do not have a corresponding rotated Lorenz curve. To construct such a Lorenz curve, the results of the diagnostic test have to be re-ordered to be monotonic with likelihood ratio. Lee's examples illustrate that the general relation between ROC and Lorenz curves does not extend to improper ROC curves.

Conclusion
Although Lorenz and ROC curves stem from entirely different origins and have been applied to entirely different problems, they are intimately connected. In particular, the equations for two important examples, the lognormal Lorenz curve and the equal-variance ROC curve, are identical except for the sign of their respective parameters. Moreover, these symmetric Lorenz and ROC curves are congruent and are mirror images in the positive diagonal. Because of this close relationship, the potential areas of application of each curve are broadened.
Data accessibility. The data in figure 1