The Washington Post’s database on fatal police shootings provides a valuable opportunity to thoroughly explore and summarize a complex dataset using a variety of descriptive statistical techniques. In this post, I’ll demonstrate different methods for analyzing and describing this multidimensional data.
To start, I calculate key summary statistics that describe the central tendency and spread of the data. The mean and median number of shootings per year or month measure the typical shooting count over the period. Comparing the two hints at the shape of the distribution: a mean well above the median typically suggests right skew, while roughly equal values are consistent with symmetry. The variance and its square root, the standard deviation, measure dispersion around the mean; higher values signify more variability in shooting counts.

Next, I assess the shape of the distribution using skewness and kurtosis. Skewness measures asymmetry, while high (excess) kurtosis indicates heavier tails than a normal distribution, that is, more extreme deviations from the mean. For count data like this, rare but very high shooting counts can produce positive skewness and excess kurtosis.

Testing for normality is also informative. Graphical methods like histograms and Q-Q plots show visually how far the data depart from normality. Formal tests like the Shapiro-Wilk test can reject the null hypothesis of normality, flagging non-normal distributions that affect later statistical modeling. For modeling over time, it’s important to test for stationarity using diagnostics like the (augmented) Dickey-Fuller test.

The data can also be visualized using boxplots, dot charts, and scatterplots. Boxplots of shootings by year make it easy to compare years, while scatterplots of shootings by month can reveal seasonality. Dot charts show shootings broken down by variables like victim age, race, and other demographics, which builds intuition about the data.

By exploring and describing the Washington Post database with these techniques, I gain insight into the shape, central tendency, outliers, and patterns in the data. This descriptive foundation supports more advanced analytics like hypothesis testing, modeling, and regression analysis.
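As a rough illustration of the summary statistics, here is a minimal standard-library Python sketch. It assumes each incident comes with an ISO date string ("YYYY-MM-DD"), for example from a `date` column in the CSV; the function names `monthly_counts` and `describe` are hypothetical helpers, not part of any published tooling. Skewness and excess kurtosis are computed from population moments, one of several common conventions.

```python
import statistics
from collections import Counter
from collections.abc import Iterable


def monthly_counts(dates: Iterable[str]) -> list[int]:
    """Count incidents per calendar month from ISO "YYYY-MM-DD" strings.

    Months between the first and last observed month that have no
    incidents are filled in as 0 so they still enter the statistics.
    """
    counts = Counter(d[:7] for d in dates)  # key is "YYYY-MM"
    if not counts:
        return []
    first, last = min(counts), max(counts)
    year, month = int(first[:4]), int(first[5:7])
    series = []
    # Zero-padded "YYYY-MM" keys sort chronologically as plain strings.
    while f"{year:04d}-{month:02d}" <= last:
        series.append(counts.get(f"{year:04d}-{month:02d}", 0))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return series


def describe(xs: list[float]) -> dict[str, float]:
    """Central tendency, spread, and moment-based shape measures."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = statistics.fmean(xs)
    # Population central moments, used for skewness and kurtosis.
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    if m2 == 0:
        raise ValueError("all observations are identical")
    return {
        "mean": mean,
        "median": statistics.median(xs),
        "variance": statistics.variance(xs),  # sample variance (n - 1)
        "stdev": statistics.stdev(xs),        # sample standard deviation
        "skewness": m3 / m2 ** 1.5,
        "excess_kurtosis": m4 / m2 ** 2 - 3.0,  # 0 for a normal distribution
    }


# Toy input for illustration only, not real data from the database.
toy_dates = ["2020-01-05", "2020-01-20", "2020-03-02", "2020-04-11"]
print(monthly_counts(toy_dates))  # [2, 0, 1, 1]
print(describe([1, 2, 3, 4, 10]))
```

Filling empty months with zero matters for count data: dropping them would inflate the mean and understate the spread.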
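Shapiro-Wilk needs tabulated coefficients, so this self-contained sketch swaps in a different formal normality test, Jarque-Bera, which is built directly on skewness and excess kurtosis. It is only an illustration under that substitution; `jarque_bera` is a hypothetical helper name, and in practice `scipy.stats.shapiro` runs the Shapiro-Wilk test itself. The p-value relies on a large-sample chi-square approximation, so it should not be trusted for short series.

```python
import math


def jarque_bera(xs: list[float]) -> tuple[float, float]:
    """Jarque-Bera test of H0: the data come from a normal distribution.

    Returns (JB statistic, asymptotic p-value). The p-value uses the
    large-sample chi-square(2) approximation and is unreliable for small n.
    """
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    if m2 == 0:
        raise ValueError("all observations are identical")
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    skew = m3 / m2 ** 1.5
    excess_kurt = m4 / m2 ** 2 - 3.0
    jb = n / 6.0 * (skew ** 2 + excess_kurt ** 2 / 4.0)
    # The chi-square distribution with 2 degrees of freedom has
    # survival function exp(-x / 2), so no special functions are needed.
    p_value = math.exp(-jb / 2.0)
    return jb, p_value


jb, p = jarque_bera([1, 2, 3, 4, 10])  # toy values, not real data
print(f"JB = {jb:.3f}, p = {p:.3f}")
```

A small p-value is evidence against normality; a large one does not prove the counts are normal, only that this test found no strong departure.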
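The stationarity check can be sketched as the basic, non-augmented Dickey-Fuller regression with a constant: regress the first difference on the lagged level and compare the t-statistic of the slope with Dickey-Fuller (not Student t) critical values. This is an assumption-laden sketch: `dickey_fuller` is a hypothetical name, it adds no lagged-difference terms, and it uses the asymptotic 5% critical value of -2.86. For real work, `adfuller` in `statsmodels.tsa.stattools` implements the augmented test with lag selection and approximate p-values.

```python
import math

# Asymptotic 5% critical value for the Dickey-Fuller test with a constant
# and no trend (small samples have somewhat more negative critical values).
DF_CRITICAL_5PCT = -2.86


def dickey_fuller(y: list[float]) -> tuple[float, float]:
    """Basic (non-augmented) Dickey-Fuller regression with a constant.

    Fits dy_t = alpha + gamma * y_{t-1} + e_t by ordinary least squares and
    returns (gamma, t-statistic of gamma). H0 is a unit root (gamma = 0).
    """
    x = y[:-1]                                  # lagged level y_{t-1}
    z = [b - a for a, b in zip(y[:-1], y[1:])]  # first difference dy_t
    m = len(z)
    if m < 3:
        raise ValueError("need at least four observations")
    x_bar, z_bar = sum(x) / m, sum(z) / m
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxz = sum((xi - x_bar) * (zi - z_bar) for xi, zi in zip(x, z))
    if sxx == 0:
        raise ValueError("series is constant")
    gamma = sxz / sxx
    alpha = z_bar - gamma * x_bar
    sse = sum((zi - alpha - gamma * xi) ** 2 for xi, zi in zip(x, z))
    s2 = sse / (m - 2)  # residual variance, 2 fitted parameters
    if s2 == 0:
        raise ValueError("perfect fit; t-statistic undefined")
    t_stat = gamma / math.sqrt(s2 / sxx)
    return gamma, t_stat


gamma, t = dickey_fuller([1, 3, 2, 5, 3])  # toy series, not real data
print(f"gamma = {gamma:.2f}, t = {t:.2f}")
print("reject unit root at 5%" if t < DF_CRITICAL_5PCT else "cannot reject unit root")
```

Omitting the augmentation lags keeps the arithmetic visible, but if the monthly counts are autocorrelated the augmented version is the safer choice.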
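Before plotting shootings by month, a numeric seasonal profile can show whether some calendar months run consistently higher. The sketch below averages the count for each calendar month across years. It assumes ISO "YYYY-MM-DD" dates, and the helper name `seasonal_profile` is hypothetical. It also treats every year present as complete, so a partial final year should be removed first or it will drag down the later months.

```python
from collections import Counter
from statistics import fmean


def seasonal_profile(dates: list[str]) -> dict[int, float]:
    """Mean number of incidents in each calendar month (1-12) across years.

    Assumes ISO "YYYY-MM-DD" strings and treats every year that appears in
    the data as complete, so a partial final year should be dropped first.
    A month with no incidents in a given year counts as 0 for that year.
    """
    per_month = Counter((int(d[:4]), int(d[5:7])) for d in dates)
    years = sorted({year for year, _ in per_month})
    if not years:
        raise ValueError("no dates supplied")
    return {
        month: fmean(per_month.get((year, month), 0) for year in years)
        for month in range(1, 13)
    }


toy = ["2020-01-03", "2020-01-09", "2021-01-15", "2021-07-04"]  # not real data
profile = seasonal_profile(toy)
print(profile[1], profile[7])  # 1.5 0.5
```

Plotting these twelve averages, or the raw monthly counts colored by calendar month, is one way to build the seasonality scatterplot.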