convert_2010_to_2020_bounds(data: DataFrame, *, geoid_col: str = 'geoid', val_col: str = 'value', state_fips: str = '51') -> pd.DataFrame
Redistribute a single year/measure of 2010-vintage values onto 2020 boundaries.
The input frame must contain exactly one row per GEOID (one year, one
measure). Each 2010 source distributes its value to the overlapping 2020
tracts by the fraction of the source area in each overlap
(area_part / area10); a source's overlaps tile it, so the fractions sum to
1 and the total is conserved (count-preserving areal interpolation, using the
Census relationship file's land-area overlaps).
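The weighting described above can be illustrated with a toy crosswalk. This is a minimal sketch, not the package's own code path: the column names mirror those in the function body, while the GEOIDs ("A", "B", "X", "Y") and the areas are invented for illustration.

```python
import pandas as pd

# Hypothetical crosswalk: one row per 2010/2020 overlap (all values invented).
crosswalk = pd.DataFrame(
    {
        "geoid10": ["A", "A", "B"],
        "geoid20": ["X", "Y", "Y"],
        "area_part": [25.0, 75.0, 50.0],  # area of the overlap
        "area10": [100.0, 100.0, 50.0],   # full area of the 2010 source
    }
)
values = pd.DataFrame({"geoid": ["A", "B"], "value": [200.0, 40.0]})

# Attach each source value to its overlaps, then weight by the share of the
# source area that falls inside each overlap.
joined = crosswalk.merge(values, left_on="geoid10", right_on="geoid", how="left")
joined["value"] = joined["value"] * (joined["area_part"] / joined["area10"])

# Sum the weighted parts that land in each 2020 unit.
redistributed = joined.groupby("geoid20", as_index=False)["value"].sum()
redistributed = redistributed.rename(columns={"geoid20": "geoid"})
```

Source "A" is split 25/75 between "X" and "Y", and source "B" falls entirely inside "Y". Because each source's fractions sum to 1, the redistributed total equals the input total.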
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | DataFrame | Input frame with at least geoid_col and val_col. | required |
| geoid_col | str | Name of the GEOID column (default "geoid"). | 'geoid' |
| val_col | str | Name of the value column (default "value"). | 'value' |
| state_fips | str | State FIPS for the block-group crosswalk (default Virginia, "51"). | '51' |
Returns:
| Type | Description |
|---|---|
| DataFrame | Two columns: geoid (2020 boundaries) and val_col (redistributed). |
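Since the function expects exactly one row per GEOID, a long-format table usually needs to be narrowed to a single year and measure first. The sketch below shows one way to do that and to pre-check the same conditions the function validates. The "year" and "measure" column names and all values are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical long-format input; column names and values are invented.
long = pd.DataFrame(
    {
        "geoid": [51001090100, 51001090100, 51001090200],
        "year": [2015, 2016, 2015],
        "measure": ["pop", "pop", "pop"],
        "value": [1200.0, 1250.0, 900.0],
    }
)

# Keep one year and one measure so each GEOID appears at most once.
single = long[(long["year"] == 2015) & (long["measure"] == "pop")].copy()

# Check for missing GEOIDs before casting: astype(str) would turn NaN into "nan".
has_missing_geoid = single["geoid"].isna().any()
single["geoid"] = single["geoid"].astype(str)
has_duplicates = single["geoid"].duplicated().any()
```

If either flag is true, the frame would trigger the corresponding `ValueError` in the function.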
Source code in packages/sdc-census10to20/src/sdc_census10to20/convert.py
def convert_2010_to_2020_bounds(
    data: pd.DataFrame,
    *,
    geoid_col: str = "geoid",
    val_col: str = "value",
    state_fips: str = "51",
) -> pd.DataFrame:
    """Redistribute a single year/measure of 2010-vintage values onto 2020 boundaries.

    The input frame must contain exactly one row per GEOID (one year, one
    measure). Each 2010 source distributes its value to the overlapping 2020
    tracts by the fraction of the *source* area in each overlap
    (``area_part / area10``); a source's overlaps tile it, so the fractions sum to
    1 and the total is conserved (count-preserving areal interpolation, using the
    Census relationship file's land-area overlaps).

    Parameters
    ----------
    data : pd.DataFrame
        Input frame with at least ``geoid_col`` and ``val_col``.
    geoid_col : str
        Name of the GEOID column (default ``"geoid"``).
    val_col : str
        Name of the value column (default ``"value"``).
    state_fips : str
        State FIPS for the block-group crosswalk (default Virginia, "51").

    Returns
    -------
    pd.DataFrame
        Two columns: ``geoid`` (2020 boundaries) and ``val_col`` (redistributed).
    """
    if data[geoid_col].isna().any():
        raise ValueError("geoids contain missing values")

    data = data.copy()
    data[geoid_col] = data[geoid_col].astype(str)
    geoids = data[geoid_col].unique()
    if len(data[geoid_col]) > len(geoids):
        raise ValueError(
            "geoids are not unique -- data cannot contain more than one entry per geoid. "
            "Please double check that data only spans one year, measure, etc."
        )

    if data[val_col].isna().any():
        warnings.warn(
            "data contains missing values. the value of any new tract that overlaps "
            "with a NULL value will be coerced to NULL. If this is an issue, "
            "we recommend manual insertion of values based on contextual specifications.",
            stacklevel=2,
        )

    data = data[[geoid_col, val_col]].copy()
    data = data.rename(columns={val_col: "value"})
    crosswalk = create_crosswalk(list(geoids), state_fips=state_fips)
    joined = crosswalk.merge(data, left_on="geoid10", right_on=geoid_col, how="left")

    # Areal interpolation that conserves counts: each 2010 source distributes its
    # value to overlapping 2020 tracts by the fraction of the *source* area in the
    # overlap (area_part / area10). A source's overlaps tile it, so the fractions
    # sum to 1 and the source's full value is distributed. type_change does not
    # affect the math -- the geometry in area_part/area10 already encodes same vs
    # split vs moved.
    joined["value"] = joined["value"] * (joined["area_part"] / joined["area10"])
    redistributed = joined.groupby("geoid20", as_index=False)["value"].sum()
    redistributed = redistributed.rename(columns={"geoid20": "geoid", "value": val_col})
    return redistributed
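The warning in the listing says that a 2020 tract overlapping a NULL source value becomes NULL. In pandas, `groupby(...).sum()` skips missing values by default, so whether a missing part propagates depends on how the parts are aggregated. The sketch below contrasts the two behaviors on invented data. It is not taken from the package, and the `skipna=False` aggregation is shown only as one possible way to get propagation.

```python
import numpy as np
import pandas as pd

# Invented weighted parts: tract "Y" receives one known part and one missing part.
parts = pd.DataFrame(
    {
        "geoid20": ["X", "Y", "Y"],
        "value": [50.0, 150.0, np.nan],
    }
)

# The default groupby sum skips NaN, so "Y" is the sum of its known parts only.
skipping = parts.groupby("geoid20")["value"].sum()

# Summing with skipna=False makes any missing part turn the whole tract missing.
propagating = parts.groupby("geoid20")["value"].agg(lambda s: s.sum(skipna=False))
```

With the default sum, "Y" silently undercounts. With the second form, "Y" becomes missing and "X" is unaffected, so it is worth checking which behavior a given analysis needs.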