8 Limitations & Uncertainty

The results in Chapter 6 are derived estimates, not direct measurements. Every step of the pipeline, from the source data to the redistribution to the final ratio calculations, introduces uncertainty. This chapter itemises the four most important sources and explains what each implies for interpreting and extending the analysis.

8.1 Area-weighting assumes uniform internal distribution

The core assumption of area-weighted interpolation is that the quantity being redistributed is spread uniformly within each source unit. For ACS income, the source units are census block groups; for Ookla broadband, they are nominally 610-metre speed-test tiles. Wherever that assumption fails, that is, wherever households or speed-test activity are spatially clustered within a unit, the redistribution will misallocate a fraction of the value.

In practice, both income and population density vary within block groups, and that variation matters most where a block group straddles a civic-association boundary. The effect is largest for target associations that are small relative to the source units they overlap, and smallest for large associations that wholly contain many source units. Analysts should treat estimates for the smallest associations with additional caution and, where possible, validate them against parcel-level or point-level data.

This limitation is not unique to this workflow; it is inherent to any area-weighted interpolation approach [1], [2]. The redistribute_direct function used here, like all area-weighting methods, cannot recover sub-unit spatial variation that is not encoded in the source data.
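The uniformity assumption can be made concrete with a toy sketch. The code below is not the redistribute_direct implementation; it is a minimal illustration that uses axis-aligned rectangles in place of real geometries, and every function name and value in it is hypothetical.

```python
# Toy area-weighted interpolation with axis-aligned rectangles.
# Rectangles are (xmin, ymin, xmax, ymax). All names and values are hypothetical.

def area(rect):
    xmin, ymin, xmax, ymax = rect
    return max(0.0, xmax - xmin) * max(0.0, ymax - ymin)

def overlap_area(a, b):
    inter = (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))
    return area(inter)

def area_weighted(source_rects, source_values, target_rects):
    """Allocate each source value to targets in proportion to overlap area."""
    totals = [0.0] * len(target_rects)
    for rect, value in zip(source_rects, source_values):
        for j, target in enumerate(target_rects):
            totals[j] += value * overlap_area(rect, target) / area(rect)
    return totals

# One source unit holding 100 households, split between two targets.
source = [(0.0, 0.0, 4.0, 1.0)]
targets = [(0.0, 0.0, 1.0, 1.0), (1.0, 0.0, 4.0, 1.0)]
estimate = area_weighted(source, [100.0], targets)
print(estimate)
```

The small target covers a quarter of the source unit's area, so it receives a quarter of the households, regardless of where those households actually live. If all 100 were clustered inside the small target, the estimate would be off by a factor of four, and nothing in the source data would reveal it.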

8.2 Ookla data reflects active test-takers, not all households

The Ookla speed-test tiles record the average performance of users who actively ran a speed test during the measurement quarter. This introduces a form of self-selection bias: test-takers are not a random sample of all internet subscribers. People who run speed tests are more likely to be technically engaged, more likely to be troubleshooting a slow connection, and possibly more concentrated in particular building types or demographic groups [3].

The direction of this bias is not straightforwardly predictable. If frustrated users test more frequently, the Ookla data may understate typical speeds. If technically sophisticated users dominate the test pool, the data may overstate typical speeds. For the purposes of a comparative analysis across civic associations within a single county, what matters is whether the bias is spatially systematic: whether the test-taker population differs in structure across associations in ways that would distort the between-association comparisons. That cannot be fully evaluated without ground-truth data.

Users of this analysis should treat the download speeds as indicative of infrastructure capacity at the high end of the user experience distribution, not as estimates of median or typical household experience.

8.3 ACS income carries sampling error

The ACS five-year block-group estimates are not administrative records; they are survey estimates with associated margins of error. At the block-group level, particularly for smaller block groups, the coefficient of variation on income variables can be substantial. The guide uses aggregate household income (B19025_001E) and household count (B11001_001E) rather than median income precisely because both are extensive, summable measures that survive redistribution with interpretable units.

However, the mean income ultimately derived for each civic association is still the ratio of two redistributed survey estimates. Both the numerator (aggregate income) and denominator (household count) carry ACS sampling uncertainty. In well-populated block groups this uncertainty is modest; in smaller block groups it can be large enough to materially affect the redistributed estimate. Arlington’s relatively high response rates and the use of five-year pooled estimates (2017–2021) mitigate but do not eliminate this concern [4].

The redistribution does not propagate formal uncertainty bounds. An analyst who needs confidence intervals on the civic-association estimates would need to request the block-group-level margins of error from the Census API and carry them through the redistribution algebra, a non-trivial extension that is beyond the scope of the current pipeline.
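As a rough indication of what that extension might involve, the sketch below carries margins of error through a weighted sum and a ratio using the standard root-sum-of-squares and ratio approximations. It rests on several simplifying assumptions that the real pipeline may not satisfy: the area weights are treated as fixed constants, the block-group estimates are treated as independent, any covariance between numerator and denominator is ignored, and the rescaling step is left out. The function names and input values are hypothetical; the `_001M` column names assume the usual ACS convention of an `M` suffix for margins of error.

```python
import math

def weighted_sum_moe(weights, moes):
    """MOE of sum(w_i * x_i), with fixed weights and independent estimates."""
    return math.sqrt(sum((w * m) ** 2 for w, m in zip(weights, moes)))

def ratio_moe(num, den, num_moe, den_moe):
    """Approximate MOE of num / den using the common ratio approximation."""
    ratio = num / den
    return math.sqrt(num_moe ** 2 + (ratio ** 2) * den_moe ** 2) / den

# Hypothetical inputs: two block groups overlapping one civic association.
weights = [0.5, 1.0]           # share of each block group inside the association
agg_income = [40e6, 60e6]      # B19025_001E
agg_income_moe = [6e6, 8e6]    # B19025_001M (assumed naming)
households = [400.0, 500.0]    # B11001_001E
households_moe = [60.0, 50.0]  # B11001_001M (assumed naming)

income = sum(w * x for w, x in zip(weights, agg_income))
hh = sum(w * x for w, x in zip(weights, households))
income_moe = weighted_sum_moe(weights, agg_income_moe)
hh_moe = weighted_sum_moe(weights, households_moe)

mean_income = income / hh
mean_income_moe = ratio_moe(income, hh, income_moe, hh_moe)
print(round(mean_income), round(mean_income_moe))
```

Because the uniformity assumption in Section 8.1 adds error that no margin of error captures, bounds produced this way would describe sampling uncertainty only and should be read as a lower limit on total uncertainty.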

8.4 redistribute_direct rescales measures independently

The redistribute_direct function rescales each source column independently to ensure its total is preserved in the target geography. When two columns are redistributed independently and then combined into a ratio, as happens with both income (aggregate income / households) and broadband (speed-product / tests), the rescaling applied to each column need not be identical. Near the county boundary, where tile clipping and block-group edge effects are most pronounced, the two columns may be rescaled by slightly different factors, producing a ratio that is not perfectly equivalent to what a joint redistribution would yield.
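The arithmetic behind this effect can be shown with a small sketch. It assumes a single global rescaling factor per column, which may differ from how redistribute_direct normalises internally; the numbers and the `rescale` helper are hypothetical.

```python
# Hypothetical raw allocations for two associations after boundary clipping.
raw_income = [48e6, 50e6]        # aggregate income
raw_households = [480.0, 510.0]  # household counts
source_income_total = 100e6
source_household_total = 1000.0

def rescale(values, target_total):
    """Scale a column so that it sums to target_total; return values and factor."""
    factor = target_total / sum(values)
    return [v * factor for v in values], factor

income, income_factor = rescale(raw_income, source_income_total)
households, household_factor = rescale(raw_households, source_household_total)

raw_mean = [i / h for i, h in zip(raw_income, raw_households)]
rescaled_mean = [i / h for i, h in zip(income, households)]

# Every ratio is multiplied by income_factor / household_factor, which equals 1
# only when both columns lose the same share of their total to clipping.
ratio_shift = income_factor / household_factor
print(ratio_shift)
```

In this toy case the income column lost 2% of its total and the household column 1%, so every mean income is inflated by roughly 1%. A shift that is uniform across associations leaves rankings intact; the concern in practice is that the discrepancy concentrates in associations near the boundary.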

In the Arlington validation, the household total is preserved within approximately 2% of the source total, well within the ACS sampling error and acceptable for a comparative county-level analysis. For applications where ratio precision at the county boundary is critical, analysts should consider masking associations that have more than a threshold fraction of their area outside the source coverage, or running a sensitivity check that compares estimates from independent and joint redistribution paths.
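A masking step of the kind suggested above might look like the following sketch. The 0.9 coverage threshold, the association names, and the helper functions are all hypothetical; an appropriate threshold would need to be chosen for the application at hand.

```python
def relative_gap(source_total, target_total):
    """Signed relative difference between redistributed and source totals."""
    return (target_total - source_total) / source_total

def mask_low_coverage(estimates, coverage, min_coverage=0.9):
    """Replace estimates with None where too little of the area is covered."""
    return {
        name: (value if coverage[name] >= min_coverage else None)
        for name, value in estimates.items()
    }

# Hypothetical mean-income estimates and source-coverage fractions.
estimates = {"A": 112000.0, "B": 98000.0, "C": 135000.0}
coverage = {"A": 1.00, "B": 0.72, "C": 0.95}

masked = mask_low_coverage(estimates, coverage)
gap = relative_gap(1000.0, 980.0)  # redistributed total 2% below the source
print(masked, gap)
```

Masking with None rather than dropping rows keeps the association list intact, so downstream tables and maps can show explicitly which estimates were withheld.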