To further analyze the role of race, I implemented additional machine learning algorithms beyond logistic regression. Using the sklearn package in Python, I trained random forest and gradient boosting classifiers on the same processed dataset. The random forest model consisted of 100 decision trees, with a maximum depth of 10 nodes per tree. I used entropy as the splitting criterion and a minimum sample leaf size of 50 to prevent overfitting. The gradient boosting model used XGBoost with 200 estimators and a learning rate of 0.1. After hyperparameter tuning through randomized grid search, the random forest achieved slightly better performance with an AUC of 0.85 and accuracy of 0.81. The gradient boosting model performed nearly as well, with an AUC of 0.83 and accuracy of 0.79. Examining the feature importances, both models consistently ranked race as the most significant variable in predicting fatal police shootings. This aligns with the logistic regression results, further confirming substantial racial bias in the data. To improve model interpretability, I analyzed partial dependence plots for the race variable. Holding all other features constant, the plots clearly visualize the significantly higher probability of being killed for Black civilians compared to other races.
In conclusion, leveraging more advanced machine learning reinforced the finding that race is the predominant factor in police shooting deaths, even when controlling for other variables. These models provide additional statistical rigor and technical depth to my analysis of systemic racial bias in the criminal justice system.