About the BestNeighborhood Data Science Team

BestNeighborhood publishes free, neighborhood-level data about the places people live: politics, noise, crime, demographics, schools, weather, and more. This page describes the team behind that work, the standards the work is held to, and why it is published without individual bylines.

Who Works on This Data

The BestNeighborhood data science team is a small group of senior practitioners. Every member of the team has at least two decades of professional experience in fields the published work depends on:

  • Statistical and machine-learning modeling on messy real-world data, including gradient-boosted tree ensembles, calibrated linear models, and held-out cross-validation protocols.
  • Geographic information systems and spatial analysis at the census-block level, including raster math, vector overlay, and energy-domain noise aggregation.
  • Federal datasets from the U.S. Census Bureau, CDC, EPA, USGS, FHWA, FAA, BTS, and NOAA. Several team members have worked directly with agency staff on data products that the site now uses.
  • Production data engineering. The two largest projects on the site each touch billions of rows; between them they cover six million road segments and hundreds of thousands of voting precincts. The pipelines that produce them run end to end, unattended, and are reproducible from raw inputs.
  • Domain expertise in housing, demographics, public health epidemiology, acoustical physics, election science, and built-environment research.

No member of the team is fresh out of school. Junior contributors do not have final say on model design, source selection, or what ships to readers.

How the Work Is Done

The team operates under a written ruleset that governs every model and every page. The rules that matter most to a reader are these:

Sourcing

Every input to every model is traceable to a named primary source. The bulk of those sources are federal agencies. When a non-government source is used, it is named and linked on the relevant page. When a measurement is missing for a place, the row is suppressed rather than interpolated to fill space.

Methodology in public

Every estimate published on the site ships with the method behind it: which features the model uses, what it was trained on, how it was validated, and where it is weakest. A “Limitations” section is mandatory on any methodology page, written before the introduction.

Honest accuracy numbers

Accuracy is reported against data the model has never seen. On the politics work, entire states are held out of training so the model has to reconstruct each one from features alone. The harder number is published alongside the easier one. On the noise work, validation uses roughly one million road segments drawn from four geographically distinct states (Rhode Island, Texas, Florida, California).
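As a rough illustration of what holding out entire states means, the sketch below splits rows by state so that no row from the held-out state appears in training, and scores predictions with R². It is a minimal sketch under stated assumptions, not the team's pipeline: the row layout, the `state` and `lean` field names, and the toy values are all hypothetical.

```python
from collections import defaultdict


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def held_out_state_splits(rows):
    """Yield (state, train_rows, test_rows), holding out one whole state at a time."""
    by_state = defaultdict(list)
    for row in rows:
        by_state[row["state"]].append(row)
    for state in sorted(by_state):
        test_rows = by_state[state]
        # Training data is every row from every OTHER state.
        train_rows = [
            row
            for other, group in by_state.items()
            if other != state
            for row in group
        ]
        yield state, train_rows, test_rows


# Toy rows with made-up state codes and values, for illustration only.
rows = [
    {"state": "AA", "lean": 0.2},
    {"state": "AA", "lean": 0.4},
    {"state": "BB", "lean": -0.1},
    {"state": "CC", "lean": 0.3},
]

splits = list(held_out_state_splits(rows))
```

A random split of precincts can place neighboring precincts from the same state on both sides of the split, which tends to make the score easier to achieve. Splitting by state removes that overlap, which is why the held-out-state number is the harder of the two.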

No partisan editorial

On politically charged topics, the page describes what the data shows. The page does not advocate, does not endorse, and does not characterize either party's positions. Where descriptions of policy are needed, the page links to primary sources rather than paraphrasing them.

No keyword pages

Pages exist to answer a question a real visitor asked. The team does not produce thin pages aimed at search traffic, does not duplicate prose across places, and does not fake content freshness by bumping dates without re-training the underlying model.

Corrections

If a number is wrong, the team would rather hear about it than have it stay up. A link to the contact form is at the bottom of this page, and messages sent through it are read by a person.

Two Recent Projects

The two largest projects the team has shipped illustrate the standard. Both are free to use, both source their inputs from federal data, and both publish methodology and limitations in full.

Block-Level Political Lean

An ensemble of gradient-boosted decision trees that predicts conservative-vs-liberal lean and turnout for every census block in all 50 states plus the District of Columbia. The model was trained on official precinct results from the 45 states that publish them, paired with block-group features from the Census ACS, the CDC PLACES dataset, the CDC Environmental Justice Index, the USGS National Land Cover Database, the EPA National Walkability Index, NOAA daily weather, and the MIT Election Lab.

The headline accuracy is an R² of 0.924 on precinct-level partisan lean for precincts the model never saw during training. Under the stricter test of entirely held-out states, where the model has to reconstruct a state's partisanship from demographics alone, the R² is 0.867.

For context, a widely cited 2013 paper in the Journal of Politics by Tausanovitch and Warshaw (“Measuring Constituent Policy Preferences in Congress, State Legislatures, and Cities”) combined demographics with 275,000 survey responses and reported an R² of about 0.58 at the much coarser city level. The block-level work captures roughly 92% of precinct-to-precinct variation at a far finer geography. Read the model methodology and findings.

National Transportation Noise Map

A 16-feature physics-and-traffic model that predicts road noise for every road in the U.S. Federal Highway Administration's road database, calibrated against the U.S. Department of Transportation's own measured noise rasters and combined with federal aviation and rail noise overlays. Six million road segments are scored. Approximately eight million U.S. Census blocks are summarized for energy-averaged noise, peak noise, and the percent of each block above the EPA's 55 dBA outdoor activity threshold.
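Energy averaging differs from an ordinary mean because decibels are logarithmic: levels are converted to linear energy, averaged, and converted back. The sketch below shows the standard formula alongside the other two block summaries named above. It assumes every sample within a block carries equal weight, which may not match the site's actual weighting, and the function names and sample values are hypothetical.

```python
import math

EPA_THRESHOLD_DBA = 55.0  # the outdoor activity threshold cited in the text


def energy_average_dba(levels_dba):
    """Average sound levels in the energy domain: 10 * log10(mean(10^(L/10)))."""
    energies = [10 ** (level / 10.0) for level in levels_dba]
    return 10.0 * math.log10(sum(energies) / len(energies))


def percent_above(levels_dba, threshold=EPA_THRESHOLD_DBA):
    """Percent of samples strictly above the threshold."""
    above = sum(1 for level in levels_dba if level > threshold)
    return 100.0 * above / len(levels_dba)


def summarize_block(levels_dba):
    """Return the three per-block summaries: energy average, peak, percent above 55 dBA."""
    return {
        "energy_avg_dba": energy_average_dba(levels_dba),
        "peak_dba": max(levels_dba),
        "pct_above_55": percent_above(levels_dba),
    }


# Hypothetical noise samples (dBA) falling inside one census block.
block_samples = [50.0, 50.0, 60.0, 70.0]
summary = summarize_block(block_samples)
```

Because loud samples dominate the energy sum, the energy average of these four samples is well above their arithmetic mean of 57.5 dBA.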

The road noise formula was calibrated on nearly one million ground-truth samples drawn from four geographically distinct states (Rhode Island, Texas, Florida, California) covering urban interstates, suburban arterials, rural highways, desert roads, and coastal routes. A gradient-boosted ceiling model topped out at R² = 0.75; the published linear model reaches R² = 0.68 using only 16 interpretable road features, or about 91% of what the more complex model could learn from the same inputs (0.68 / 0.75 ≈ 0.91). The typical prediction error is 5.5 dBA for the model in isolation. Where the U.S. Department of Transportation has measured a road, its values replace the model. Where it has not, the model fills in.
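The measured-first rule and the 91% figure can both be written out in a few lines. This is a minimal sketch, assuming a missing measurement is represented as `None`; the function name and the returned source labels are hypothetical, not taken from the site's code.

```python
from typing import Optional, Tuple


def published_noise_dba(measured_dba: Optional[float], modeled_dba: float) -> Tuple[float, str]:
    """Prefer the measured value when one exists; otherwise fall back to the model."""
    if measured_dba is not None:
        return measured_dba, "measured"
    return modeled_dba, "modeled"


# Share of the ceiling model's explained variance that the linear model retains,
# using the two R² values quoted in the text.
linear_r2 = 0.68
ceiling_r2 = 0.75
share_of_ceiling = linear_r2 / ceiling_r2  # about 0.907, i.e. roughly 91%
```

Returning the source label next to the value lets a page state whether a given road's number was measured or modeled.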

The team worked directly with DoT staff to confirm acceptable reuse of the measured data. Agency staff requested no personal attribution and that request is respected. Read the noise model methodology and findings.

Why the Team Is Not Named

Anonymous publication is a deliberate choice. There are two reasons.

First, and most importantly, several of the published topics have angered people and triggered harassment. Even earlier, less-detailed political lean pages attracted angry emails when we discussed certain correlations, such as those related to education. These are legitimate correlations that we are discovering and explaining, not inventing. So that the team can build the highest-quality models and maps possible, we leave their names off the work. The same logic applies to writing about crime, health outcomes, and other sensitive topics the site publishes. We describe and visualize the world as it is, not as any member of the team believes it should be. And for the record, no one on the team holds extreme partisan views.

Second, no single person builds any page. Team members past and present have made the site what it is, and future team members will doubtless improve pages, maps, and models in the years to come. The value of the work does not depend on who currently works on the data, and its quality does not depend on who originally wrote the analysis. The work speaks for itself. Every estimate is reproducible from the sources, the formula, and the validation protocol described on the page where it appears. A reader can check whether the methodology is sound without reading the modeler's resume. The open nature of the work allows anyone to understand it and disagree with it, and every member of the team has the authority, and the encouragement, to check for mistakes and make updates as needed.

BestNeighborhood is the publisher of this data, and corrections to any number on any page are taken seriously. The contact form linked below reaches a real person who reads it. The anonymity is at the individual byline level, not at the site level.

In the Team's Own Words

“What we did on the political lean model is incredible. There are published papers with hundreds of citations out there bragging about predicting 60 or 70 percent of the variance. We predict over 92%, and we're publishing it for free. That's insane.”

“Our contact at the Department of Transportation was skeptical of our work. When they saw what we'd already built they were stunned. They didn't believe we'd create the best transit noise map out there and then let the public access it for free, but they agreed to sign off on it. I love building things that I'm proud of and get to show to my friends and family.”

“When we started working we were putting data together in what was basically a closet. Now millions of people have used our maps and relied on them to decide where to buy, rent, or visit. I never want to work anywhere else.”

Primary Sources the Team Relies On

The same set of agencies and datasets appears across projects. They are listed here so a reader can verify any one of them independently:

A Note on Automation and AI

Model training, feature engineering, and data pipelines on this site are heavily automated. They have to be, to score six million road segments or eight million census blocks in a reasonable time. The words are human-written and reviewed, even where tools are used to fill in data and check for errors. We have almost a hundred thousand pages per dimension (noise, income, politics, etc.), and there is simply no way we could hand-write each page and still provide you with the maps and data you want. That said, we rely on team members to be experts in data, not AI.

Found a Mistake?

If a statistic on any BestNeighborhood page looks wrong, the team would like to know. Corrections, missing data flags, and source updates can be sent through the BestNeighborhood contact form. Select “Data inquiry” as the subject and the message will route to the data team. Real questions get real replies.