The Washington Post’s Fatal Force database records fatal police shootings in the U.S. Each entry includes details such as the race, mental health status, and armament of the deceased. This post takes a statistical look at this data using logistic regression.
Logistic regression is a statistical method for modeling the relationship between one or more independent variables and a binary dependent variable. For this data, the dependent variable could be whether the person shot was armed, coded as 1 for armed and 0 for unarmed. Data preparation is the first step: handling missing values and converting categorical variables into numerical ones, for example through one-hot encoding. After preparing the data, a logistic regression model can be built using ‘armed’ as the dependent variable and factors such as ‘race’ or ‘mental illness’ as independent variables. The model is then trained on one portion of the data and tested on a held-out portion to evaluate its performance.
The output of the logistic regression model is the estimated probability that a given input belongs to the positive class, here that the person was armed. This can provide insight into the factors associated with whether a person shot by the police was armed or not.
Interpreting the results of a logistic regression analysis requires some care. The raw coefficients are on the log-odds scale: each one is the change in the log-odds of the outcome for a one-unit change in the predictor, holding the other predictors fixed. Exponentiating a coefficient gives the odds ratio, the factor by which the odds are multiplied for that one-unit change. It’s important to note that while logistic regression can identify associations between variables, it does not prove causation. Other factors not included in the model could also influence the outcome. However, the results can provide valuable insights and guide further research.
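To make the link between coefficients, odds ratios, and probabilities concrete, here is a minimal sketch in plain Python. It is not the analysis itself: the twenty rows are synthetic numbers chosen so the arithmetic is easy to check, not values from the database, and the names `fit_logistic`, `xs`, and `ys` are made up for this illustration. In a real analysis, `x` would be a 0/1 encoding of a prepared column and a library such as scikit-learn or statsmodels would normally do the fitting.

```python
import math

def sigmoid(z):
    """Map a log-odds value to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.5, steps=5000):
    """Fit y ~ sigmoid(b0 + b1 * x) by batch gradient descent on the mean log-loss."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(b0 + b1 * x) - y  # gradient of the log-loss w.r.t. the log-odds
            g0 += err
            g1 += err * x
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1

# Synthetic toy data (NOT from the database): one binary predictor x, binary outcome y.
# Among the rows with x = 0, 5 of 10 have y = 1; among the rows with x = 1, 8 of 10 do.
xs = [0] * 10 + [1] * 10
ys = [1] * 5 + [0] * 5 + [1] * 8 + [0] * 2

b0, b1 = fit_logistic(xs, ys)

odds_ratio = math.exp(b1)   # the coefficient is a log-odds; exponentiate for the odds ratio
p_x0 = sigmoid(b0)          # predicted probability of y = 1 when x = 0
p_x1 = sigmoid(b0 + b1)     # predicted probability of y = 1 when x = 1

print(round(odds_ratio, 2))             # 4.0
print(round(p_x0, 2), round(p_x1, 2))   # 0.5 0.8
```

With a single binary predictor, the fitted model simply reproduces the two group proportions: the odds are 0.5 / 0.5 = 1 when x = 0 and 0.8 / 0.2 = 4 when x = 1, so the odds ratio is 4. The coefficient itself is log(4), roughly 1.39, which is why it has to be exponentiated before it can be read as an odds ratio.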
In conclusion, the Washington Post’s Fatal Force database provides a wealth of information about fatal police shootings. By applying logistic regression analysis, we can gain a deeper understanding of the factors associated with these tragic events.