BestNeighborhood publishes free, neighborhood-level data about the places people live: politics, noise, crime, demographics, schools, weather, and more. This page describes the team behind that work, the standards the work is held to, and why it is published without individual bylines.
Who Works on This Data
The BestNeighborhood data science team is a small group of senior practitioners. Every member of the team has at least two decades of professional experience in fields the published work depends on:
- Statistical and machine-learning modeling on messy real-world data, including gradient-boosted tree ensembles, calibrated linear models, and held-out cross-validation protocols.
- Geographic information systems and spatial analysis at the census-block level, including raster math, vector overlay, and energy-domain noise aggregation.
- Federal datasets from the U.S. Census Bureau, CDC, EPA, USGS, FHWA, FAA, BTS, and NOAA. Several team members have worked directly with agency staff on data products that the site now uses.
- Production data engineering. Between them, the two largest projects on the site touch billions of rows, six million road segments, and hundreds of thousands of voting precincts. The pipelines that produce them run end to end, unattended, and are reproducible from raw inputs.
- Domain expertise in housing, demographics, public health epidemiology, acoustical physics, election science, and built-environment research.
No member of the team is fresh out of school. Junior contributors do not have final say on model design, source selection, or what ships to readers.
How the Work Is Done
The team operates under a written ruleset that governs every model and every page. The rules that matter most to a reader are these:
Two Recent Projects
The two largest projects the team has shipped illustrate the standard. Both are free to use, both source their inputs from federal data, and both publish methodology and limitations in full.
Block-Level Political Lean
An ensemble of gradient-boosted decision trees that predicts conservative-vs-liberal lean and turnout for every census block in all 50 states plus the District of Columbia. The model was trained on official precinct results from the 45 states that publish them, paired with block-group features from the Census ACS, the CDC PLACES dataset, the CDC Environmental Justice Index, the USGS National Land Cover Database, the EPA National Walkability Index, NOAA daily weather, and the MIT Election Lab.
The headline accuracy is an R² of 0.924 on precinct-level partisan lean for precincts the model never saw during training. Under the stricter test of entirely held-out states, where the model has to reconstruct a state's partisanship from demographics alone, the R² is 0.867.
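For readers who want to see what the two tests measure, here is a minimal sketch in Python. It assumes the standard definition of R² (one minus the ratio of residual to total sum of squares) and a simple leave-one-state-out split. The function names and the `state` field are hypothetical, chosen for illustration, and this is not the team's production code.

```python
from collections import defaultdict


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def leave_one_state_out(rows):
    """Yield (held_out_state, train_rows, test_rows), one split per state.

    Each row is a dict with a hypothetical "state" key. No precinct from
    the held-out state ever appears in the matching training set.
    """
    by_state = defaultdict(list)
    for row in rows:
        by_state[row["state"]].append(row)
    for state in sorted(by_state):
        train = [r for s, rs in by_state.items() if s != state for r in rs]
        yield state, train, by_state[state]
```

An R² of 1.0 means predictions match the observed values exactly; an R² of 0.0 means the model does no better than predicting the average everywhere. The held-out-state split is the stricter test because the model cannot borrow information from neighboring precincts in the same state.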
For context, a widely cited 2013 paper in the Journal of Politics by Tausanovitch and Warshaw (“Measuring Constituent Policy Preferences in Congress, State Legislatures, and Cities”) combined demographics with 275,000 survey responses and reported an R² of about 0.58 at the much coarser city level. The block-level work captures roughly 92% of precinct-to-precinct variation at a far finer geography. Read the model methodology and findings.
National Transportation Noise Map
A 16-feature physics-and-traffic model that predicts road noise for every road in the U.S. Federal Highway Administration's road database, calibrated against the U.S. Department of Transportation's own measured noise rasters and combined with federal aviation and rail noise overlays. Six million road segments are scored. Approximately eight million U.S. Census blocks are summarized for energy-averaged noise, peak noise, and the percent of each block above the EPA's 55 dBA outdoor activity threshold.
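Energy averaging differs from an ordinary average because decibels are logarithmic. The sketch below shows one common way to compute the three block summaries. It assumes each block is represented by a list of equal-area noise samples in dBA; the function names and the sample layout are hypothetical, not taken from the published pipeline.

```python
import math


def energy_average_db(levels_db):
    """Average decibel levels in the energy domain.

    Each level is converted to relative energy (10 ** (dB / 10)),
    the energies are averaged, and the mean is converted back to dB.
    """
    mean_energy = sum(10 ** (level / 10) for level in levels_db) / len(levels_db)
    return 10 * math.log10(mean_energy)


def summarize_block(levels_db, threshold_db=55.0):
    """Summarize one block from equal-area noise samples (an assumption)."""
    above = sum(1 for level in levels_db if level > threshold_db)
    return {
        "energy_avg_db": energy_average_db(levels_db),
        "peak_db": max(levels_db),
        "pct_above_threshold": 100.0 * above / len(levels_db),
    }
```

For two samples at 50 dBA and 70 dBA, the energy average is about 67 dBA, not the arithmetic mean of 60, because the louder sample carries a hundred times the acoustic energy.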
The road noise formula was calibrated on nearly one million ground-truth samples drawn from four geographically distinct states (Rhode Island, Texas, Florida, California) covering urban interstates, suburban arterials, rural highways, desert roads, and coastal routes. A gradient-boosted ceiling model topped out at R² = 0.75. The published linear model reaches R² = 0.68 using only 16 interpretable road features, which is about 91% of what the more complex model could learn from the same inputs. The typical prediction error is 5.5 dBA for the model in isolation. Where the U.S. Department of Transportation has measured a road, its values replace the model's prediction. Where it has not, the model fills in.
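The 91% figure and the measured-over-modeled rule can both be written out in a few lines. This is a minimal sketch: the function names are hypothetical, and representing a road with no measurement as `None` is an assumption made for illustration.

```python
def share_of_ceiling(r2_model, r2_ceiling):
    """Fraction of the ceiling model's explained variance that the simpler model retains."""
    return r2_model / r2_ceiling


def final_noise_db(measured_db, modeled_db):
    """Use the measured value when one exists; otherwise fall back to the model.

    A missing measurement is represented as None (an assumption).
    """
    return measured_db if measured_db is not None else modeled_db
```

With the published figures, `share_of_ceiling(0.68, 0.75)` is roughly 0.907, which rounds to the 91% quoted above.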
The team worked directly with DoT staff to confirm acceptable reuse of the measured data. Agency staff requested no personal attribution and that request is respected. Read the noise model methodology and findings.
Why the Team Is Not Named
Anonymous publication is a deliberate choice. There are two reasons.
First, and most importantly, several of the published topics have angered people and triggered harassment. Even earlier and less-detailed political lean pages attracted angry emails when we discussed certain correlations, such as those related to education. These are legitimate correlations that we are discovering and explaining, not inventing. To allow the team to create the highest-quality models and maps possible, we leave their names off the work. We describe and visualize the world as it is, not as any member of the team believes it should be. And for the record, no one on the team holds extreme partisan views. The same logic applies to writing about crime, health outcomes, and other sensitive topics the site publishes.
Second, no single person builds any page. Team members both past and present have made it what it is, and future team members will doubtless improve pages, maps, and models in the years to come. The value of the work does not depend on who currently works on the data, nor does its quality depend on who originally wrote the analysis. The work speaks for itself. Every estimate is reproducible from the sources, the formula, and the validation protocol described on the page where it appears. A reader can check whether the methodology is sound without reading the modeler's resume. The open nature of the work allows anyone to understand it and disagree with it, and every member of the team has the authority, and is encouraged, to check for mistakes and make updates as needed.
BestNeighborhood is the publisher of this data, and corrections to any number on any page are taken seriously. The contact form linked below reaches a real person who reads it. The anonymity is at the individual byline level, not at the site level.
In the Team's Own Words
Primary Sources the Team Relies On
The same set of agencies and datasets appears across projects. They are listed here so a reader can verify any one of them independently:
- American Community Survey and the 2020 Decennial Census, U.S. Census Bureau.
- CDC PLACES and the CDC Environmental Justice Index, U.S. Centers for Disease Control and Prevention.
- EPA National Walkability Index, U.S. Environmental Protection Agency.
- National Land Cover Database, U.S. Geological Survey and the Multi-Resolution Land Characteristics Consortium.
- Highway Performance Monitoring System, Federal Highway Administration.
- National Transportation Noise Map, Bureau of Transportation Statistics, U.S. Department of Transportation.
- Terminal Area Forecast, Federal Aviation Administration.
- PRISM Climate Group, Oregon State University, and the NOAA Global Historical Climatology Network.
- MIT Election Lab Survey of the Performance of American Elections.
A Note on Automation and AI
Model training, feature engineering, and data pipelines on this site are heavily automated. They have to be, to score six million road segments or eight million census blocks in a reasonable time. The words are human-written and human-reviewed, even where tools are used to fill in data and check for errors. We have almost a hundred thousand pages per dimension (noise, income, politics, etc.). There's simply no way we could hand-write each page and still provide you with the maps and data you want. That said, we rely on team members to be experts in data, not AI.
Found a Mistake?
If a statistic on any BestNeighborhood page looks wrong, the team would like to know. Corrections, missing data flags, and source updates can be sent through the BestNeighborhood contact form. Select “Data inquiry” as the subject and the message will route to the data team. Real questions get real replies.