Analyzing Police Shooting Data with Statistical Testing
The Washington Post's database of fatal police shootings offers a good opportunity to apply statistical testing techniques. In this post, I will demonstrate how p-values, hypothesis testing, and permutation testing can be used to draw insights from it.

The null hypothesis is that there is no difference in the frequency of police shootings across the groups being compared. To test it, we can calculate p-values using simulation-based permutation testing. The steps are (a code sketch follows the list):
1. Calculate the observed difference between groups of interest.
2. Permute (shuffle) the group labels many times, recalculating the difference between groups after each shuffle. This simulates the null hypothesis, under which the labels carry no information.
3. Compare the observed difference to the permutation distribution to calculate a p-value.
4. If p < 0.05, reject the null hypothesis.
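To make the procedure concrete, here is a minimal sketch in Python. It assumes the data have already been loaded into a pandas DataFrame with a two-level grouping column and a numeric or boolean outcome column, and that rows with missing values in those columns have been dropped; the function and argument names are illustrative, not part of the Washington Post data release.

```python
import numpy as np


def permutation_test(df, group_col, outcome_col, n_permutations=10_000, seed=0):
    """Permutation test for the difference in means between two groups.

    Assumes df has no missing values in group_col or outcome_col.
    Returns the observed difference and a two-sided p-value.
    """
    rng = np.random.default_rng(seed)
    groups = df[group_col].unique()
    assert len(groups) == 2, "This sketch handles exactly two groups."

    outcome = df[outcome_col].to_numpy(dtype=float)
    labels = df[group_col].to_numpy()

    def mean_diff(lab):
        return outcome[lab == groups[0]].mean() - outcome[lab == groups[1]].mean()

    # Step 1: the observed difference between the groups of interest.
    observed = mean_diff(labels)

    # Step 2: shuffle the group labels many times to simulate the null hypothesis.
    perm_diffs = np.array(
        [mean_diff(rng.permutation(labels)) for _ in range(n_permutations)]
    )

    # Step 3: compare the observed difference to the permutation distribution.
    p_value = np.mean(np.abs(perm_diffs) >= abs(observed))
    return observed, p_value
```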
For example, a permutation test can be used to evaluate whether race is significantly associated with outcomes in the shooting data. If the p-value is below 0.05, we reject the null hypothesis and conclude there is a statistically significant association with race. We can repeat the same procedure for other variables, such as signs of mental illness, whether the victim was fleeing the scene, and whether the victim was armed; a sketch of that loop appears below.

Permutation testing lets us test relationships in the data rigorously while avoiding the distributional assumptions that trip up many parametric tests. Techniques like hypothesis testing and permutation tests can extract powerful insights from the Washington Post database, and the ability to move beyond simple descriptive statistics to statistically rigorous conclusions is what makes them so valuable.
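Reusing the permutation_test function sketched above, the loop below applies the same test to several binary indicators. The file name and the column names (race, armed, flee, signs_of_mental_illness) are my assumptions about the Washington Post release and should be checked against the actual schema; comparing victims coded "W" and "B" is just one illustrative choice of groups.

```python
import pandas as pd

# Assumed file and column names; verify against the actual dataset schema.
df = pd.read_csv("fatal-police-shootings-data.csv")

# Recode the variables of interest as 0/1 indicators.
df["unarmed"] = (df["armed"] == "unarmed").astype(int)
# Note: missing flee values end up coded as 1 here; a real analysis should handle them explicitly.
df["fled_scene"] = (df["flee"] != "Not fleeing").astype(int)
df["mental_illness"] = df["signs_of_mental_illness"].astype(int)

for var in ["mental_illness", "fled_scene", "unarmed"]:
    # Compare two racial groups at a time, dropping rows with missing values.
    subset = df.dropna(subset=["race", var])
    subset = subset[subset["race"].isin(["W", "B"])]
    observed, p = permutation_test(
        subset, group_col="race", outcome_col=var, n_permutations=5_000
    )
    print(f"{var}: observed difference = {observed:.3f}, p-value = {p:.4f}")
```

Each printed p-value is interpreted exactly as in the steps above: if it falls below 0.05, we reject the null hypothesis of no difference between the groups for that variable.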