10/27 – Friday

The extensive police shooting dataset compiled by the Washington Post can provide valuable insights when explored with clustering algorithms like k-means, k-medoids, and DBSCAN. In this post, I’ll give an overview of how these methods could group and analyze this data.

Clustering algorithms identify groups of similar data points when no predefined categories exist. Some applications on this data could include:

– Grouping police departments by shooting patterns over time. Are there clusters of cities with increasing vs. decreasing trends?
– Segmenting victims into clusters based on demographics like race, age, and mental illness to uncover the groups at highest risk.
– Discovering clusters of cities whose per-capita shooting rates are disproportionately high.
– Using location data to cluster shootings by geographic patterns at the city and neighborhood levels.

K-means forms clusters by minimizing within-cluster variance, so the number of clusters has to be chosen in advance. K-medoids uses actual data points as cluster centers, which makes it more robust to outliers. DBSCAN groups points by density without needing the cluster count specified up front, and it labels isolated points as noise.
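
To make the differences between the three methods concrete, here is a minimal sketch in plain Python. It does not touch the Washington Post data: the points are made-up 2-D coordinates (two tight groups plus one far-away point), and the function names `kmeans`, `medoid`, and `dbscan`, along with the `eps` and `min_pts` values, are my own choices for illustration. A real analysis would more likely use a library implementation.

```python
import math

def kmeans(points, init_centroids, iters=100):
    """Lloyd's algorithm: assign each point to its nearest centroid, then recompute means."""
    centroids = list(init_centroids)
    for _ in range(iters):
        labels = [min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        updated = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the old centroid if a cluster ends up empty.
            updated.append(tuple(sum(c) / len(members) for c in zip(*members)) if members else old)
        if updated == centroids:  # converged
            break
        centroids = updated
    return labels, centroids

def medoid(members):
    """The actual data point with the smallest total distance to all other members."""
    return min(members, key=lambda m: sum(math.dist(m, o) for o in members))

def dbscan(points, eps, min_pts):
    """Density-based clustering; returns one label per point, -1 meaning noise."""
    labels = [None] * len(points)  # None = not yet visited

    def neighbors(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = -1  # noise for now; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: reachable but not dense itself
                continue
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs_j = neighbors(j)
            if len(nbrs_j) >= min_pts:  # core point: keep expanding the cluster
                queue.extend(nbrs_j)
    return labels

# Made-up coordinates (NOT real shooting locations): two 4x4 grids and one outlier.
blob_a = [(0.5 * i, 0.5 * j) for i in range(4) for j in range(4)]
blob_b = [(10 + x, 10 + y) for x, y in blob_a]
outlier = (30.0, 30.0)
points = blob_a + blob_b + [outlier]

# K-means needs k up front; here k = 2, seeded with one point from each grid.
km_labels, centroids = kmeans(points, [blob_a[0], blob_b[0]])
cluster_1 = [p for p, lab in zip(points, km_labels) if lab == 1]
med = medoid(cluster_1)
print("k-means centroid of cluster 1:", centroids[1])
print("medoid of the same cluster:   ", med)

# DBSCAN is given a neighborhood radius and a density threshold instead of k.
db_labels = dbscan(points, eps=0.8, min_pts=3)
print("DBSCAN labels:", db_labels)
```

In this toy setup, k-means has to put the far-away point into one of the two clusters, which drags that cluster's mean outside the grid. The medoid of the same cluster is still one of the grid points, and DBSCAN marks the far-away point as noise while finding the two groups on its own.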

These methods could reveal insights that are not apparent from summary statistics alone. Identifying clustered subgroups by victim profile, geography, department patterns, and temporal trends can help target policing reforms and policy efforts. Overall, unsupervised clustering is a valuable approach for discovering hidden patterns, segments, and data-driven groupings within the rich Washington Post police shooting dataset. Moving beyond predefined categories allows the data itself to guide understanding.
