Holm rejects H0 for the first and third managers, but Bonferroni only rejects H0 for the first manager
Comparison with m=10 p-values
Aim to control FWER at 0.05
Hypotheses with p-values below the black horizontal line are rejected by Bonferroni
Hypotheses with p-values below the blue line are rejected by Holm
Holm and Bonferroni reach the same conclusion on the black points, but only Holm rejects for the red point
A More Extreme Example
Now five hypotheses are rejected by Holm but not by Bonferroni ...
even though both control FWER at 0.05
Holm or Bonferroni?
Bonferroni is simple : reject any null hypothesis with a p-value below α/m
Holm is slightly more complicated, but it rejects at least as many hypotheses as Bonferroni while still controlling FWER
So, Holm is a better choice
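As a minimal sketch of the two rules, the Python below applies both to the five Fund p-values that appear later in these notes. The function names are illustrative, and Holm is written in its usual step-down form (compare the j-th smallest p-value with α/(m+1−j) and stop at the first one that exceeds it), which is assumed to match the definition used earlier in the slides.

```python
def bonferroni(pvals, alpha=0.05):
    """Reject H0j whenever pj is below alpha / m."""
    m = len(pvals)
    return [p < alpha / m for p in pvals]


def holm(pvals, alpha=0.05):
    """Holm step-down: walk through the sorted p-values, stop at the first failure."""
    m = len(pvals)
    order = sorted(range(m), key=lambda j: pvals[j])  # indices, smallest p first
    reject = [False] * m
    for rank, j in enumerate(order, start=1):
        if pvals[j] > alpha / (m + 1 - rank):
            break  # this hypothesis and every one with a larger p-value is kept
        reject[j] = True
    return reject


fund_pvals = [0.006, 0.918, 0.012, 0.601, 0.756]
bonf_reject = bonferroni(fund_pvals)
holm_reject = holm(fund_pvals)
print("Bonferroni:", bonf_reject)
print("Holm:      ", holm_reject)
```

Holm's thresholds grow from α/m to α as the rank increases, so any hypothesis Bonferroni rejects is also rejected by Holm.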
The False Discovery Rate
Back to this table :
The FWER focuses on controlling Pr(V≥1), i.e., the probability of falsely rejecting at least one null hypothesis
This is a tough ask when m is large. It will cause us to be very conservative (i.e., to reject very rarely)
Instead, we can control the false discovery rate
FDR=E(V/R)
Intuition Behind the False Discovery Rate
FDR=E(V/R)=E(number of false rejections / total number of rejections)
A scientist conducts a hypothesis test on each of m=20,000 drug candidates
She wants to identify a smaller set of promising candidates to investigate further
She wants reassurance that this smaller set is really promising, i.e. not too many falsely rejected H0's
FWER controls Pr(at least one false rejection)
FDR controls the expected fraction of candidates in the smaller set that are really false rejections.
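To make the ratio V/R concrete, the sketch below computes the false discovery proportion for a single experiment in which the truth happens to be known. The data and the function name are hypothetical, and taking V/R to be 0 when R = 0 is a convention assumed here rather than stated in the slides. FDR is the expectation of this quantity over repeated experiments.

```python
def false_discovery_proportion(null_is_true, rejected):
    """Return V / R for one experiment (taken to be 0 when R = 0, an assumed convention)."""
    R = sum(rejected)  # total number of rejections
    V = sum(1 for is_null, rej in zip(null_is_true, rejected) if is_null and rej)
    return V / R if R > 0 else 0.0


# Hypothetical experiment with m = 8 hypotheses, 5 of which are true nulls
null_is_true = [True, True, True, True, True, False, False, False]
rejected = [False, True, False, False, False, True, True, True]

fdp = false_discovery_proportion(null_is_true, rejected)
made_false_rejection = any(n and r for n, r in zip(null_is_true, rejected))
print(fdp)                   # 0.25: one of the four rejections is false
print(made_false_rejection)  # True: the event whose probability FWER bounds
```

From the FWER point of view this experiment simply counts as a failure, whereas the FDR point of view records that three quarters of the rejections were correct.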
Benjamini-Hochberg Procedure to Control FDR
Specify q, the level at which to control the FDR
Compute p-values p1,…,pm for the null hypotheses H01,…,H0m
Order the p-values so that p(1)≤⋯≤p(m)
Define L=max{j:p(j)<qj/m}
Reject all null hypotheses H0j for which pj≤p(L)
Then, FDR≤q
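The steps above can be sketched in a few lines of Python. This is an illustrative implementation rather than a reference one: the function name is made up, the p-values in the usage example are hypothetical, and the strict inequality in the definition of L follows the slide.

```python
def benjamini_hochberg(pvals, q):
    """Return a list of booleans, True where H0j is rejected at FDR level q."""
    m = len(pvals)
    sorted_p = sorted(pvals)
    # L = largest rank j (1-based) with p(j) < q * j / m; stays 0 if no rank qualifies
    L = 0
    for j, p in enumerate(sorted_p, start=1):
        if p < q * j / m:
            L = j
    if L == 0:
        return [False] * m  # nothing is rejected
    cutoff = sorted_p[L - 1]  # this is p(L)
    return [p <= cutoff for p in pvals]


# Hypothetical p-values, chosen so that p(2) misses its own threshold
pvals = [0.9, 0.028, 0.001, 0.035, 0.025]
reject = benjamini_hochberg(pvals, q=0.05)
print(reject)  # [False, True, True, True, True]
```

In this example 0.025 is rejected even though it exceeds its own threshold 2×0.05/5 = 0.02, because L is the largest qualifying rank (here L = 4) and everything at or below p(L) is rejected.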
A Comparison of FDR vs FWER
Here, p-values for m=2,000 null hypotheses are displayed
To control FWER at level α=0.1 with Bonferroni : reject hypotheses whose p-values fall below the green line
To control FDR at level q=0.1 with Benjamini-Hochberg : reject hypotheses shown in blue
Consider m=5 p-values from the Fund data : p1=0.006,p2=0.918,p3=0.012,p4=0.601,p5=0.756
To control FDR at level q=0.05 using Benjamini-Hochberg :
Notice that p(1)<0.05/5, p(2)<2×0.05/5, p(3)>3×0.05/5, p(4)>4×0.05/5, and p(5)>5×0.05/5, so L=2
So, we reject H01 and H03
To control FWER at level α=0.05 using Bonferroni :
We reject any null hypothesis for which the p-value is less than 0.05/5
So, we reject only H01
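The comparisons in this example can be checked rank by rank. The sketch below prints each ordered Fund p-value next to its Benjamini-Hochberg threshold q·j/m and then lists what each method rejects. The variable names are illustrative only.

```python
q = alpha = 0.05
fund_pvals = {"H01": 0.006, "H02": 0.918, "H03": 0.012, "H04": 0.601, "H05": 0.756}
m = len(fund_pvals)

# Benjamini-Hochberg: compare the j-th smallest p-value with q * j / m
ordered = sorted(fund_pvals.items(), key=lambda item: item[1])
L = 0
for j, (name, p) in enumerate(ordered, start=1):
    threshold = q * j / m
    below = p < threshold
    if below:
        L = j  # keep the largest rank that falls below its threshold
    print(f"j={j}  {name}  p={p:.3f}  q*j/m={threshold:.3f}  below={below}")

# With no tied p-values, "pj <= p(L)" is the same as "among the L smallest"
bh_rejected = sorted(name for name, _ in ordered[:L])
bonf_rejected = sorted(name for name, p in fund_pvals.items() if p < alpha / m)
print("BH rejects:        ", bh_rejected)
print("Bonferroni rejects:", bonf_rejected)
```

Only the first two ordered p-values fall below their thresholds, so L = 2. Bonferroni compares every p-value with the single cutoff 0.05/5 = 0.01, which 0.012 misses.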
Re-Sampling Approaches
So far, we have assumed that we want to test some null hypothesis H0 with some test statistic T, and that we know the distribution of T under H0
This allows us to compute the p-value
What if this theoretical null distribution is unknown?
A Re-Sampling Approach for a Two-Sample t-Test
Suppose we want to test H0:E(X)=E(Y) versus Ha:E(X)≠E(Y), using nX independent observations from X and nY independent observations from Y
The two-sample t-statistic takes the form
T = (μ̂X − μ̂Y) / (s·√(1/nX + 1/nY))
If nX and nY are large, then T approximately follows a N(0,1) distribution under H0
If nX and nY are small, then we don't know the theoretical null distribution of T
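For the large-sample case, the statistic and its approximate p-value might be computed as in the sketch below. The slides do not define s, so the pooled standard deviation is assumed here. The function names and the sample data are hypothetical, and the two-sided N(0,1) p-value is written using the identity P(|Z| ≥ |t|) = erfc(|t|/√2).

```python
import math
from statistics import mean, variance


def two_sample_t(x, y):
    """Two-sample t-statistic with a pooled standard deviation (assumed definition of s)."""
    n_x, n_y = len(x), len(y)
    pooled_var = ((n_x - 1) * variance(x) + (n_y - 1) * variance(y)) / (n_x + n_y - 2)
    s = math.sqrt(pooled_var)
    return (mean(x) - mean(y)) / (s * math.sqrt(1 / n_x + 1 / n_y))


def normal_two_sided_pvalue(t):
    """P(|Z| >= |t|) for Z ~ N(0, 1); only appropriate when nX and nY are large."""
    return math.erfc(abs(t) / math.sqrt(2))


# Hypothetical data, kept tiny only to show the call
t = two_sample_t([2, 4, 6], [1, 2, 3])
print(t)
print(normal_two_sided_pvalue(1.96))  # approximately 0.05
```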
Let's take a permutation or re-sampling approach...
1. Compute the two-sample t-statistic T on the original data x1,…,xnX and y1,…,ynY
2. For b=1,…,B (where B is a large number, like 1,000) :
2.1. Randomly shuffle the nX+nY observations
2.2. Call the first nX shuffled observations x1∗,…,xnX∗ and call the remaining observations y1∗,…,ynY∗
2.3. Compute a two-sample t-statistic on the shuffled data, and call it T∗b
3. The p-value is given by
p-value = (1/B) ∑_{b=1}^{B} 1(|T∗b| ≥ |T|), where 1(·) equals 1 if its argument is true and 0 otherwise
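The procedure above can be sketched with the Python standard library. This is an illustration under stated assumptions, not a definitive implementation: s is taken to be the pooled standard deviation, the function names and the sample data are hypothetical, and the random generator is seeded only so that the output is reproducible.

```python
import math
import random
from statistics import mean, variance


def two_sample_t(x, y):
    """Two-sample t-statistic; s is assumed to be the pooled standard deviation."""
    n_x, n_y = len(x), len(y)
    pooled_var = ((n_x - 1) * variance(x) + (n_y - 1) * variance(y)) / (n_x + n_y - 2)
    # Breaks down (division by zero) if both groups are constant
    return (mean(x) - mean(y)) / math.sqrt(pooled_var * (1 / n_x + 1 / n_y))


def permutation_pvalue(x, y, B=1000, seed=0):
    """Fraction of the B shuffles whose |T*b| is at least as large as |T|."""
    rng = random.Random(seed)
    t_obs = two_sample_t(x, y)  # statistic on the original data
    pooled = list(x) + list(y)
    n_x = len(x)
    count = 0
    for _ in range(B):
        rng.shuffle(pooled)                          # 2.1 shuffle all nX + nY observations
        x_star, y_star = pooled[:n_x], pooled[n_x:]  # 2.2 split into pseudo-groups
        t_star = two_sample_t(x_star, y_star)        # 2.3 statistic on shuffled data
        if abs(t_star) >= abs(t_obs):
            count += 1
    return count / B


# Hypothetical, clearly separated samples
x = [10, 11, 12, 13, 14]
y = [1, 2, 3, 4, 5]
p_value = permutation_pvalue(x, y)
print(p_value)
```

Shuffling works because, under H0, the group labels carry no information, so the shuffled statistics T∗1,…,T∗B approximate the null distribution of T without any distributional assumption.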
Theoretical p-value is 0.041. Re-sampling p-value is 0.042
Theoretical p-value is 0.571. Re-sampling p-value is 0.673
More on Re-Sampling Approaches
Re-sampling approaches are useful if the theoretical null distribution is unavailable, or requires stringent assumptions
An extension of the re-sampling approach to compute a p-value can be used to control FDR
This example involved a two-sample t-test, but similar approaches can be developed for other test statistics