What shapes crime data?
Crime totals reflect more than incidents. They are shaped by reporting, policing patterns, and structural factors that algorithms may ignore.
Policing density
More policing often leads to more recorded incidents.
Population density
Busier areas naturally produce more reports.
Reporting bias
Communities report incidents to police at different rates, so recorded counts reflect reporting behavior as well as what actually happens.
Interactive map
Hover over a district to preview incident totals, or click to explore how those numbers can turn into broader assumptions about place, risk, and community identity. This view shows totals rather than per capita rates.
Incident totals
Colors show relative totals from the cleaned 2015–2022 dataset.
District boundaries come from the Boston Police Districts GeoJSON. This map shows total reported incidents by district, not population-adjusted rates. A per capita view could change how districts compare by accounting for differences in population size and activity density.
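As a rough sketch of what a per capita view involves, the snippet below divides each district's total by a population figure and reports incidents per 1,000 residents. The district totals are the figures used elsewhere on this page, but the population numbers are placeholders for illustration only, not real census counts.

```python
# Minimal sketch of a per capita view, assuming district totals from the
# cleaned 2015-2022 dataset and *hypothetical* district population figures.
incident_totals = {"B2": 99_635, "C11": 86_111, "D4": 84_927, "A7": 27_895, "E5": 29_050}

# Placeholder populations -- illustrative only, not real census counts.
district_population = {"B2": 80_000, "C11": 120_000, "D4": 55_000, "A7": 35_000, "E5": 60_000}

# Incidents per 1,000 residents over the full 2015-2022 period.
rates = {
    district: total / district_population[district] * 1_000
    for district, total in incident_totals.items()
}

# Sorting by rate instead of raw total can reorder how districts compare.
for district, rate in sorted(rates.items(), key=lambda item: item[1], reverse=True):
    print(f"{district}: {rate:.0f} incidents per 1,000 residents (2015-2022)")
```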
Explore Boston Districts
Click a district below to see an example of how a system might interpret the data and why that interpretation can be misleading.
District totals are based on combined Boston crime data (2015–2022) processed in Python.
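As a rough illustration of that processing step, here is a minimal Python sketch of how yearly crime files could be combined and summarized into district totals. The folder, file pattern, and DISTRICT column name are assumptions for illustration, not the project's exact pipeline.

```python
# Minimal sketch, assuming the yearly Boston crime CSVs sit in a data/ folder
# and share a DISTRICT column; paths and column names are illustrative only.
from pathlib import Path
import pandas as pd

# Combine the yearly files (2015-2022) into one DataFrame.
files = sorted(Path("data").glob("boston_crime_*.csv"))
incidents = pd.concat((pd.read_csv(f) for f in files), ignore_index=True)

# Count reported incidents per police district, ignoring rows with no district code.
district_totals = (
    incidents.dropna(subset=["DISTRICT"])
    .groupby("DISTRICT")
    .size()
    .sort_values(ascending=False)
)
print(district_totals)
```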
Select a district
Start by clicking one of the district buttons. This panel will update with sample interpretation text.
What the data shows at a glance
These totals come from the combined Boston crime data files for 2015–2022. High counts may look objective, but numbers alone do not explain context: neighborhood size, policing patterns, or how risk gets interpreted.
Highest recorded totals
- B2 — 99,635 incidents
- C11 — 86,111 incidents
- D4 — 84,927 incidents
Lower recorded totals
- A7 — 27,895 incidents
- E5 — 29,050 incidents
What algorithms might do
A system could turn these totals into rankings, flags, or risk scores. That can make the output seem neutral, even when it hides important social and geographic context.
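To make that concrete, here is a deliberately naive sketch of the kind of logic such a system might apply: raw totals are cut into "risk" labels with arbitrary thresholds, and nothing about population, reporting, or policing patterns enters the calculation. The thresholds are invented for illustration.

```python
# Naive risk-labeling sketch: thresholds are arbitrary placeholders, and the
# labels use raw counts only -- no population, reporting, or policing context.
district_totals = {"B2": 99_635, "C11": 86_111, "D4": 84_927, "A7": 27_895, "E5": 29_050}

def naive_risk_label(total_incidents: int) -> str:
    if total_incidents > 80_000:
        return "high risk"
    if total_incidents > 40_000:
        return "medium risk"
    return "low risk"

for district, total in sorted(district_totals.items(), key=lambda item: item[1], reverse=True):
    print(f"{district}: {total:,} incidents -> {naive_risk_label(total)}")
```

Because the thresholds see only totals, the districts with the largest raw counts are labeled "high risk" by construction, regardless of why those counts are large.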
What a ranking might emphasize
These five districts represent the highest totals in the dataset. Visual rankings like this quickly draw attention to the top values, emphasizing contrast and order while leaving out the context behind why those differences exist.
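A minimal version of that kind of ranking visual might look like the bar-chart sketch below. It plots only the district totals named on this page, so it illustrates the format rather than reproducing the full top five, and the styling choices are placeholders.

```python
# Bar-chart sketch of a ranking view; only the totals named on this page are
# plotted, and the raw counts are not population-adjusted.
import matplotlib.pyplot as plt

named_totals = {"B2": 99_635, "C11": 86_111, "D4": 84_927, "E5": 29_050, "A7": 27_895}

fig, ax = plt.subplots(figsize=(6, 3))
ax.bar(list(named_totals), list(named_totals.values()))
ax.set_ylabel("Reported incidents, 2015–2022")
ax.set_title("Raw district totals (not population-adjusted)")
plt.tight_layout()
plt.show()
```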
How bias happens
- Crime data gets grouped by place
- Algorithms may treat high counts as high “risk”
- Context gets removed
- Entire communities can be judged unfairly
Why this matters
For Boston students and residents, neighborhood reputation already affects where people live, travel, and feel safe. This project asks what happens when algorithms strengthen those assumptions.
Key takeaways
These patterns are not just about numbers. They show how data can shape assumptions about people, place, and safety when context is left out.
Data is not neutral in practice
Even when crime data looks objective, the way it is grouped, ranked, and interpreted can lead to biased conclusions about entire districts.
One number cannot define a place
Incident totals do not capture community history, reporting patterns, structural inequality, or the lived realities of the people who live there.
Algorithms can reinforce stereotypes
When systems turn district data into simplified labels like “high risk,” they can reproduce existing assumptions instead of helping people understand a place more fully.
Bias in action
Match each data point with a possible explanation. This section asks you to pause and consider how quickly numbers can turn into assumptions when context is missing.
It may seem like there is one obvious answer, but crime data can often be interpreted in multiple ways depending on what context is included or ignored.
Data point
Possible explanation
Sources, disclaimer, and further reading
This project combines public crime data with research on algorithmic bias, fairness, and policing. The district totals shown here are meant to illustrate how data can be interpreted by systems, not to define any neighborhood or community. Crime counts alone do not capture structural inequality, over-policing, underreporting, population differences, or lived experience.
Dataset
Boston crime data (2015–2022) from Kaggle, cleaned and summarized in Python for this project.
View the dataset
Academic research
- Berk et al. (2024) — Improving fairness in criminal justice algorithmic risk assessments
- Castro-Toledo & Gómez-Bellvís (2026) — Democratic use of technologies for urban security
- Chen & Dai (2026) — Facial recognition in digital policing
- Chen et al. (2026) — AI-powered robots in policing
- Ferdaus et al. (2026) — Trustworthy AI review
- Hu (2026) — Fairness in machine learning
- Huq (2019) — Racial equity in algorithmic criminal justice
- McElreath et al. (2022) — Pre-crime prediction and bias