Tree diagram to illustrate the false discovery rate in screening tests. This example is for a prevalence of 1%, specificity 95% and sensitivity 80%. Out of 10 000 people screened, 495+80=575 give positive tests. Of these, 495 are false positives so the false discovery rate is 86%.
Tree diagram to illustrate the false discovery rate in significance tests. This example considers 1000 tests, in which the prevalence of real effects is 10%. The lower limb shows that with the conventional significance level, p=0.05, there will be 45 false positives. The upper limb shows that there will be 80 true positive tests. The false discovery rate is therefore 45/(45+80)=36%, far bigger than 5%.
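The same tree logic applies to significance tests; a sketch of the calculation (variable names are mine):

```python
# Significance-test tree: 1000 tests, 10% of which examine a real effect,
# tested at the conventional level p = 0.05 with power 0.8.
n_tests = 1000
prev_real, alpha, power = 0.10, 0.05, 0.80

real = n_tests * prev_real          # 100 tests of real effects
null_true = n_tests - real          # 900 tests where the null is true

false_pos = null_true * alpha       # 45 false positives (lower limb)
true_pos = real * power             # 80 true positives (upper limb)

fdr = false_pos / (false_pos + true_pos)   # 45/125 = 36%
print(f"false discovery rate = {fdr:.0%}")
```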
Results of 100 000 simulated t-tests, when the null hypothesis is true. Each test looks at the difference between the means of two groups of observations that have identical true means and a standard deviation of 1. (a) The distribution of the 100 000 ‘observed’ differences between means (it is centred on zero and has a standard deviation of 0.354). (b) The distribution of the 100 000 p-values. As expected, 5% of the tests give (false) positives (p≤0.05), and the distribution is flat (uniform).
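A simulation like this one can be sketched as follows, assuming NumPy and SciPy; the seed and names are mine, not those of the original simulations.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_sim, n_obs = 100_000, 16          # 100 000 tests, 16 observations per group

# Both groups drawn from the same N(0, 1): the null hypothesis is true.
group_a = rng.standard_normal((n_sim, n_obs))
group_b = rng.standard_normal((n_sim, n_obs))

diffs = group_a.mean(axis=1) - group_b.mean(axis=1)
pvals = stats.ttest_ind(group_a, group_b, axis=1).pvalue

# sd of a difference between two means of 16: sqrt(1/16 + 1/16) = 0.354
print(f"sd of differences = {diffs.std():.3f}")
print(f"fraction with p <= 0.05: {(pvals <= 0.05).mean():.3f}")  # near 0.05
```

Histograms of `diffs` and `pvals` give panels (a) and (b) respectively; the p-values are uniform on (0, 1) under the null.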
The case where the null hypothesis is not true. Simulated t-tests are based on samples from the postulated true distributions shown: blue, control group; red, treatment group. The observations are supposed to be normally distributed with means that differ by 1 s.d., as shown in (a). The distributions of the means of 16 observations are shown in (b).
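The narrowing from (a) to (b) follows from the standard error of a mean, σ/√n = 1/√16 = 0.25; a minimal sketch (seed and names are mine):

```python
import numpy as np

rng = np.random.default_rng(1)
n_obs = 16

# Control observations from N(0, 1); treatment mean exceeds it by 1 s.d.
control = rng.standard_normal((50_000, n_obs))
treatment = rng.standard_normal((50_000, n_obs)) + 1

# The mean of 16 observations has sd 1/sqrt(16) = 0.25, so the two
# sampling distributions in (b) are much narrower than the curves in (a).
print(control.mean(axis=1).std(), treatment.mean(axis=1).std())
```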
Results of 100 000 simulated t-tests in the case where the null hypothesis is not true, the samples being drawn from the distributions shown in figure 4. (a) The distribution of the 100 000 ‘observed’ values for the differences between means of 16 observations. It has a mean of 1 and a standard deviation of 0.354. (b) The distribution of the 100 000 p-values: 78% of them are equal to or less than 0.05 (as expected from the power of the tests).
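This case differs from the null simulation only in that the treatment samples are shifted by 1 s.d.; a sketch assuming NumPy and SciPy (seed and names are mine):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n_sim, n_obs = 100_000, 16

control = rng.standard_normal((n_sim, n_obs))        # N(0, 1)
treatment = rng.standard_normal((n_sim, n_obs)) + 1  # means differ by 1 s.d.

diffs = treatment.mean(axis=1) - control.mean(axis=1)
pvals = stats.ttest_ind(treatment, control, axis=1).pvalue

print(f"mean difference = {diffs.mean():.3f}, sd = {diffs.std():.3f}")
print(f"fraction with p <= 0.05: {(pvals <= 0.05).mean():.3f}")  # near the power, ~0.78
```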
Distribution of 100 000 p-values from tests like those in figure 5, but with only four observations in each group rather than 16. The calculated power of the tests is only 0.22 in this case and, as expected, 22% of the tests give p≤0.05.
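Reducing the group size to four is a one-line change to the previous sketch (again assuming NumPy and SciPy; seed and names are mine):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)
n_sim, n_obs = 100_000, 4            # only four observations per group now

control = rng.standard_normal((n_sim, n_obs))
treatment = rng.standard_normal((n_sim, n_obs)) + 1  # true difference = 1 s.d.

pvals = stats.ttest_ind(treatment, control, axis=1).pvalue
print(f"fraction with p <= 0.05: {(pvals <= 0.05).mean():.2f}")  # near the power, ~0.22
```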
The average difference between means for all tests that came out with p≤0.05. Each point was found from 100 000 simulated t-tests, with data as in figure 4. The power of the tests was varied by changing the number, n, of ‘observations’ that were averaged for each mean. This varied from n=3 (power=0.157) for the leftmost point, to n=50 (power=0.9986) for the rightmost point. Intermediate points were calculated with n=4, 5, 6, 8, 10, 12, 14, 16 and 20.
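The selection effect behind this plot can be sketched as below: conditioning on p≤0.05 inflates the ‘observed’ difference when power is low, and the inflation disappears as power approaches 1. The helper function, seed and sample counts are my own illustration, not the original code.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

def mean_significant_difference(n_obs, n_sim=20_000):
    """Average 'observed' difference among tests that gave p <= 0.05."""
    control = rng.standard_normal((n_sim, n_obs))
    treatment = rng.standard_normal((n_sim, n_obs)) + 1  # true difference = 1
    diffs = treatment.mean(axis=1) - control.mean(axis=1)
    pvals = stats.ttest_ind(treatment, control, axis=1).pvalue
    return diffs[pvals <= 0.05].mean()

# Low power exaggerates the significant effects; high power does not.
for n in (4, 16, 50):
    print(n, round(mean_significant_difference(n), 2))
```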