Fisher's Exact Test

Understanding Fisher's Exact Test

Fisher's Exact Test is a statistical significance test used to determine whether there is a nonrandom association between two categorical variables in a contingency table. Developed by Sir Ronald A. Fisher in the early 20th century, the test is called exact because its p-value (the probability, under the null hypothesis, of obtaining a result at least as extreme as the one observed) is calculated exactly from the hypergeometric distribution, rather than from an approximation that only becomes accurate with larger sample sizes, as is the case with the chi-squared test.

When to Use Fisher's Exact Test

Fisher's Exact Test is particularly useful in the following scenarios:

  • When sample sizes are small and the assumptions of the chi-squared test are not met.
  • When data are sparse in a 2x2 contingency table (i.e., at least one expected frequency is less than 5).
  • When analyzing the results of experiments with fixed margins, meaning the row or column totals are fixed by the design of the experiment.
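
The sparsity criterion above can be checked directly. The sketch below computes the expected frequency of each cell under independence (row total times column total, divided by the grand total) and flags a table in which any expected frequency falls below 5. The function names, the cell labels a, b, c, d, and the example counts are illustrative assumptions, not part of any library.

```python
def expected_counts(a, b, c, d):
    """Expected cell counts under independence for the 2x2 table [[a, b], [c, d]].

    Assumes the grand total a + b + c + d is greater than zero.
    """
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    # Expected count for a cell = (its row total * its column total) / N
    return [[r * col / n for col in col_totals] for r in row_totals]


def has_sparse_cells(a, b, c, d, threshold=5):
    """True if at least one expected count is below the threshold (5 by default)."""
    return any(e < threshold
               for row in expected_counts(a, b, c, d)
               for e in row)


# Hypothetical counts chosen only for illustration
print(expected_counts(3, 1, 1, 3))   # [[2.0, 2.0], [2.0, 2.0]]
print(has_sparse_cells(3, 1, 1, 3))  # True
```

A result of True suggests that the chi-squared approximation may be unreliable for that table and that an exact test is worth considering.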

How Fisher's Exact Test Works

The test is commonly applied to a 2x2 contingency table, which displays the distribution of two categorical variables. Consider the following example:

            | Success | Failure | Total
---------------------------------------
Treatment A |    a    |    b    | a + b
Treatment B |    c    |    d    | c + d
---------------------------------------
Total       |   a+c   |   b+d   |   N

In this table, 'a' and 'b' are the numbers of successes and failures under Treatment A, while 'c' and 'd' are the numbers of successes and failures under Treatment B. N = a + b + c + d is the total number of observations.

The null hypothesis (H0) of Fisher's Exact Test is that there is no association between the treatment and the outcome. The alternative hypothesis (H1) is that there is an association.

To perform the test, we calculate the probability of observing the given set of values in the table under the null hypothesis, holding the marginal totals fixed. We also calculate the probabilities of all other tables with the same marginal totals that are more extreme than the observed table, that is, tables that depart at least as far from what the null hypothesis predicts. The p-value is the sum of these probabilities. If the p-value is less than or equal to a chosen significance level (commonly 0.05), we reject the null hypothesis and conclude that there is a statistically significant association between the variables.
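
Because the marginal totals are held fixed, a 2x2 table has only one free cell: once 'a' is chosen, 'b', 'c', and 'd' follow from the totals. The set of "all possible tables" is therefore a short list. The following is a minimal sketch of that enumeration; the function name and the example counts are hypothetical.

```python
def tables_with_same_margins(a, b, c, d):
    """List every 2x2 table (a, b, c, d) sharing the margins of the given table."""
    row1, row2 = a + b, c + d   # row totals
    col1 = a + c                # first column total
    lowest = max(0, col1 - row2)   # smallest value the top-left cell can take
    highest = min(row1, col1)      # largest value the top-left cell can take
    tables = []
    for x in range(lowest, highest + 1):
        # The other three cells are determined by x and the fixed margins
        tables.append((x, row1 - x, col1 - x, row2 - (col1 - x)))
    return tables


# Hypothetical observed table: a=3, b=1, c=1, d=3
for t in tables_with_same_margins(3, 1, 1, 3):
    print(t)
```

For the example counts this yields five tables, from (0, 4, 4, 0) to (4, 0, 0, 4), with the observed table (3, 1, 1, 3) among them.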

Calculating Fisher's Exact Test

The probability of any particular set of values in a 2x2 table, given the marginal totals, is calculated using the hypergeometric distribution:

P = [(a+b)!(c+d)!(a+c)!(b+d)!] / [a!b!c!d!N!]

Where '!' denotes factorial, the product of all positive integers up to that number (e.g., 4! = 4 x 3 x 2 x 1 = 24).
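
As a worked illustration, the formula can be evaluated directly with exact integer arithmetic. This is a sketch only: the function name is an assumption, and the counts a=3, b=1, c=1, d=3 are made up for the example.

```python
from fractions import Fraction
from math import factorial


def table_probability(a, b, c, d):
    """Hypergeometric probability of the 2x2 table [[a, b], [c, d]] given its margins."""
    n = a + b + c + d
    numerator = (factorial(a + b) * factorial(c + d)
                 * factorial(a + c) * factorial(b + d))
    denominator = (factorial(a) * factorial(b) * factorial(c)
                   * factorial(d) * factorial(n))
    # Fraction keeps the result exact instead of rounding to a float
    return Fraction(numerator, denominator)


p = table_probability(3, 1, 1, 3)
print(p)         # 8/35
print(float(p))  # roughly 0.2286
```

Here the numerator is 4!4!4!4! = 331776 and the denominator is 3!1!1!3!8! = 1451520, which reduces to 8/35.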

The p-value is then calculated by summing the probabilities of the observed table and all tables with a more extreme distribution of values that could have occurred given the same marginal totals.
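
The summation can be sketched as follows. What counts as "more extreme" depends on the alternative hypothesis; this sketch assumes two common conventions: a one-sided p-value sums the tables whose top-left cell is at least (or at most) the observed 'a', and a two-sided p-value sums every table whose probability does not exceed that of the observed table. The function names and example counts are hypothetical, and the probability is written with binomial coefficients, which is algebraically equivalent to the factorial formula above.

```python
from fractions import Fraction
from math import comb


def table_probability(x, row1, row2, col1):
    """Probability that the top-left cell equals x, given the fixed margins."""
    return Fraction(comb(row1, x) * comb(row2, col1 - x),
                    comb(row1 + row2, col1))


def fisher_p_values(a, b, c, d):
    """One-sided and two-sided exact p-values for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    probs = {x: table_probability(x, row1, row2, col1)
             for x in range(lowest, highest + 1)}
    observed = probs[a]
    return {
        # Tables with at least as many successes under Treatment A
        "greater": sum(p for x, p in probs.items() if x >= a),
        # Tables with at most as many successes under Treatment A
        "less": sum(p for x, p in probs.items() if x <= a),
        # Tables no more probable than the observed one (assumed convention)
        "two_sided": sum(p for p in probs.values() if p <= observed),
    }


result = fisher_p_values(3, 1, 1, 3)
print(result["greater"])    # 17/70
print(result["two_sided"])  # 17/35
```

With these made-up counts the two-sided p-value is 17/35 (about 0.49), so the null hypothesis would not be rejected at the 0.05 level.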

Limitations of Fisher's Exact Test

While Fisher's Exact Test is powerful for small sample sizes, it has limitations:

  • It can be computationally intensive for larger sample sizes or tables larger than 2x2.
  • It conditions on the marginal totals, treating both the row and column totals as fixed, which may not always match how the data were collected.
  • The p-value alone does not provide an estimate of the strength of association or its direction.

Conclusion

Fisher's Exact Test remains a fundamental tool for the analysis of categorical data, especially in cases with small sample sizes or where data do not meet the assumptions required for chi-squared tests. Its exact nature provides a precise p-value, offering a reliable method for testing the independence of two categorical variables.

Despite its limitations, Fisher's Exact Test is a valuable method in the statistical analysis toolkit, particularly in fields such as medical research, biology, and other disciplines where experimental designs often result in small sample sizes and sparse data.
