
Calculate urban and rural annual VMT fractions #333

Merged
merged 1 commit into from
Jan 24, 2023

Conversation


@rouille rouille commented Jan 20, 2023

Pull Request doc

Purpose

Calculate urban and rural Vehicle Miles Traveled (VMT) fractions. Closes #316 and #317.

What the code is doing

  • Load three datasets (census data for states and urban areas, plus the Transportation and Health Tool data) with dedicated loading functions
  • Calculate the total annual Vehicle Miles Traveled (VMT) in urban areas and states
  • Calculate the percentage of Vehicle Miles Traveled (VMT) in urban and rural areas
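The three steps above can be sketched end-to-end on toy data (the series names and numbers below are invented for illustration; the real module derives them from the census and THT files):

```python
import pandas as pd

# Toy stand-ins for the module's intermediate results (invented values):
# per-UA annual VMT keyed by state, and state-level annual VMT totals.
vmt_ua = {
    "AL": pd.Series({"Birmingham, AL": 20.0, "Mobile, AL": 10.0}),
    "DC": pd.Series({"Washington, DC-VA-MD": 5.0}),
}
vmt_state = pd.Series({"AL": 100.0})  # DC has no state-level entry

vmt_ua_perc, vmt_ra_perc = {}, {}
for s in vmt_ua:
    if s in vmt_state.index:
        # Each UA's share of the state total; rural is the remainder.
        vmt_ua_perc[s] = vmt_ua[s] / vmt_state.loc[s]
        vmt_ra_perc[s] = 1 - vmt_ua_perc[s].sum()
    else:
        # A state absent from the state totals (DC) is treated as all urban.
        vmt_ua_perc[s] = 1
        vmt_ra_perc[s] = 0
```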

Testing

Existing unit tests

Where to look

New module generate_scaling_factors

Usage Example/Visuals

>>> from prereise.gather.demanddata.transportation_electrification.generate_scaling_factors import (
...     census_ua_url,
...     census_state_url,
...     tht_data_url,
...     load_census_ua,
...     load_census_state,
...     load_dot_vmt_per_capita,
...     calculate_vmt_for_ua,
...     calculate_vmt_for_state,
...     calculate_urban_rural_fraction,
... )
>>> census_ua = load_census_ua(census_ua_url)
>>> census_state = load_census_state(census_state_url)
>>> tht_ua, tht_state = load_dot_vmt_per_capita(tht_data_url)
>>> vmt_ua = calculate_vmt_for_ua(census_ua, tht_ua)
>>> vmt_state = calculate_vmt_for_state(census_state, tht_state)
>>> vmt_ua_perc, vmt_ra_perc = calculate_urban_rural_fraction(vmt_ua, vmt_state)
>>> vmt_ra_perc
{'AL': 0.5948209907004509, 'AZ': 0.4072678991599876, 'AR': 0.6782718495006606, 'CA': 0.2522036919615287, 'CO': 0.4365774632878532, 'CT': 0.18269470942707955, 'DE': 0.559915703167249, 'DC': 0, 'FL': 0.2923051798349785, 'GA': 0.42402572834199637, 'ID': 0.6723734528747444, 'IL': 0.28372082436406665, 'IN': 0.5298558463320291, 'IA': 0.7022578956589101, 'KS': 0.5763025206199697, 'KY': 0.8477578601781653, 'LA': 0.5246373942279056, 'ME': 0.7882474490409153, 'MD': 0.3333401978676642, 'MA': 0.11947580340547559, 'MI': 0.3789053832349696, 'MN': 0.5432410752038628, 'MS': 0.7266752412880431, 'MO': 0.5071803635012153, 'MT': 0.8600569689974888, 'NE': 0.6739144929625285, 'NV': 0.399831570902325, 'NH': 0.6009439272679777, 'NJ': 0.33585021596160347, 'NM': 0.6848299106793774, 'NY': 0.19194166956193792, 'NC': 0.5092809432132339, 'ND': 0.8361712780525931, 'OH': 0.40514557017474595, 'OK': 0.5972732841621138, 'OR': 0.5636801026484732, 'PA': 0.3984935379624328, 'RI': 0.05679462197338614, 'SC': 0.588343323457865, 'SD': 0.8225877011609205, 'TN': 0.5049657185947951, 'TX': 0.4028581131849771, 'UT': 0.38149202467572874, 'VT': 0.8391650467810423, 'VA': 0.4227252560298854, 'WA': 0.364071745825326, 'WV': 0.7005357353635477, 'WI': 0.6074652247653839, 'WY': 0.8867296939626887}
>>> vmt_ua_perc["AL"]
Birmingham, AL         0.144348
Florence, AL           0.011524
Mobile, AL             0.060350
Pensacola, FL-AL       0.001210
Gadsden, AL            0.013453
Auburn, AL             0.013323
Montgomery, AL         0.049128
Decatur, AL            0.006910
Anniston-Oxford, AL    0.015453
Huntsville, AL         0.045015
Tuscaloosa, AL         0.021803
Dothan, AL             0.014312
Columbus, GA-AL        0.008350
Name: Annual VMT, dtype: float64

Note

We will need to go over the data intake procedure for the input files

Time estimate

20min

@rouille rouille self-assigned this Jan 20, 2023
Comment on lines 148 to 170
for s in census_ua:
    census_ua_format = census_ua[s].copy()
    for p in pattern:
        census_ua_format.index = census_ua_format.index.str.replace(
            p, "", regex=True
        )
    common = set(tht_ua_format.index).intersection(set(census_ua_format.index))
    vmt_for_ua[s] = (
        pd.DataFrame(
            {
                "Annual VMT": [
                    365 * tht_ua_format.loc[i] * census_ua_format.loc[i]
                    for i in common
                ]
            },
            index=list(common),
        )
        .rename(index=format2original)
        .squeeze()
    )
Collaborator

Maybe we can construct a data frame with UA only given the state abbreviations are suffixes of UA names, i.e. pd.concat(census_ua.values()), then we should be able to do the multiplication directly with tht_ua_format?
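For illustration, the suggested vectorized approach might look roughly like this sketch (toy data; the identifiers `census_ua` and `tht_ua_format` mirror the snippet above, but the values are invented). As the follow-up replies point out, it only works while the flattened UA index stays unique:

```python
import pandas as pd

# Invented stand-ins: per-state census population series and the THT
# VMT-per-capita series keyed by the same formatted UA names.
census_ua = {
    "OH": pd.Series({"Cleveland, OH": 1000.0, "Toledo, OH-MI": 500.0}),
    "PA": pd.Series({"Pittsburgh, PA": 800.0}),
}
tht_ua_format = pd.Series(
    {"Cleveland, OH": 20.0, "Toledo, OH-MI": 25.0, "Pittsburgh, PA": 30.0}
)

# Flatten {state: Series} into one Series and multiply by the aligned
# per-capita figures in a single vectorized step (no per-state loop).
population = pd.concat(census_ua.values())
annual_vmt = 365 * population * tht_ua_format.reindex(population.index)
```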

Collaborator Author

The problem is that the UA names are not unique. For example you have:

>>> vmt_ua = calculate_vmt_for_ua(census_ua, tht_ua)
>>> vmt_ua["OH"]
Cleveland, OH                     1.450011e+10
Toledo, OH-MI                     4.215456e+09
Parkersburg, WV-OH                6.109252e+07
Canton, OH                        2.109291e+09
Middletown, OH                    7.553366e+08
Huntington, WV-KY-OH              2.850411e+08
Springfield, OH                   7.482752e+08
Lorain-Elyria, OH                 2.271439e+09
Wheeling, WV-OH                   3.209512e+08
Youngstown, OH-PA                 3.122444e+09
Dayton, OH                        6.994014e+09
Akron, OH                         5.258336e+09
Columbus, OH                      1.153874e+10
Mansfield, OH                     6.816990e+08
Weirton-Steubenville, WV-OH-PA    3.434335e+08
Lima, OH                          6.162137e+08
Newark, OH                        5.972769e+08
Cincinnati, OH-KY-IN              1.258937e+10
Name: Annual VMT, dtype: float64
>>> vmt_ua["PA"]
Hagerstown, MD-WV-PA              7.916503e+07
Harrisburg, PA                    3.792602e+09
Johnstown, PA                     3.919511e+08
Lebanon, PA                       3.038400e+08
Lancaster, PA                     2.520083e+09
Uniontown-Connellsville, PA       3.988131e+08
York, PA                          1.561355e+09
Cumberland, MD-WV-PA              2.896863e+05
Scranton, PA                      2.841143e+09
Williamsport, PA                  6.703207e+08
Allentown, PA-NJ                  4.718352e+09
Youngstown, OH-PA                 3.541347e+08
Pottstown, PA                     5.967203e+08
Altoona, PA                       5.087245e+08
Reading, PA                       1.739473e+09
Weirton-Steubenville, WV-OH-PA    2.598250e+06
Pittsburgh, PA                    1.207367e+10
Hazleton, PA                      3.577484e+08
State College, PA                 3.945960e+08
Philadelphia, PA-NJ-DE-MD         2.501896e+10
Binghamton, NY-PA                 2.692702e+07
Monessen-California, PA           4.604961e+08
Erie, PA                          1.063796e+09
Name: Annual VMT, dtype: float64
>>> vmt_ua["WV"]
Parkersburg, WV-OH                4.803244e+08
Hagerstown, MD-WV-PA              5.980238e+08
Huntington, WV-KY-OH              9.474756e+08
Wheeling, WV-OH                   5.430394e+08
Morgantown, WV                    4.670725e+08
Charleston, WV                    2.317763e+09
Weirton-Steubenville, WV-OH-PA    2.638600e+08
Cumberland, MD-WV-PA              2.101628e+07
Name: Annual VMT, dtype: float64

For example, Weirton-Steubenville, WV-OH-PA is one UA (belonging to 3 different states, with 3 different values)

Collaborator

I thought we would have Weirton-Steubenville, WV, Weirton-Steubenville, OH and Weirton-Steubenville, PA respectively, so that each (UA, State) pair is unique.


The other challenge here is the shapefiles for the spatial translation. From what Dan Muldrew tested the other day, I believe the UAs in those files still use the whole Weirton-Steubenville, WV-OH-PA name, so we did not want to lose the full string, at least for now.


@BainanXia , that said, you're correct that a cleaner way to do this would be Weirton-Steubenville, WV, Weirton-Steubenville, OH and Weirton-Steubenville, PA
If we can confirm that we won't break other processes, that would certainly be a preference.
We'll indeed have the UA and the specific state in its own column in the csv outputs later, fyi

Collaborator

I was thinking we break such multi-state UAs into UA in each state for both population and VMT per capita (even though the latter is the same among each state). Then we will get unique pairs of UA+state so that we could get rid of the multi-layer structure, {state: UA within the state, ...}. But yeah, it doesn't matter as long as it works.
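A sketch of that splitting idea on toy data (names and values invented): each multi-state UA becomes one (UA, state) row per member state, duplicating its value, which removes the {state: UA, ...} nesting:

```python
import pandas as pd

# Invented per-UA annual VMT; the second entry spans three states.
ua = pd.Series(
    {"Morgantown, WV": 4.6e8, "Weirton-Steubenville, WV-OH-PA": 2.6e8}
)

rows = []
for name, value in ua.items():
    # Split "City, ST1-ST2-..." into the city and its member states.
    city, states = name.rsplit(", ", 1)
    for st in states.split("-"):
        # One row per member state, so every (UA, State) pair is unique.
        rows.append((f"{city}, {st}", st, value))

flat = pd.DataFrame(rows, columns=["UA", "State", "Annual VMT"])
```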

Comment on lines 198 to 209
for s in vmt_for_ua:
    if s in vmt_for_state.index:
        vmt_for_ua_perc[s] = vmt_for_ua[s] / vmt_for_state.loc[s]
        vmt_for_ra_perc[s] = 1 - vmt_for_ua_perc[s].sum()
    else:
        vmt_for_ua_perc[s] = 1
        vmt_for_ra_perc[s] = 0
Collaborator

Maybe we could do pd.Series({s:v.sum() for s, v in vmt_for_ua.items()}) then using series operations to get the results in one shot.
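That one-shot version could look roughly like this (toy data; `vmt_for_ua` and `vmt_for_state` are invented stand-ins), collapsing each state's UAs to an urban total and then letting index alignment do the rest:

```python
import pandas as pd

# Invented stand-ins: per-UA annual VMT keyed by state, and state totals.
vmt_for_ua = {
    "AL": pd.Series({"Birmingham, AL": 20.0, "Mobile, AL": 10.0}),
    "GA": pd.Series({"Atlanta, GA": 50.0}),
}
vmt_for_state = pd.Series({"AL": 100.0, "GA": 80.0})

# Sum each state's UAs to an urban total, then one aligned division gives
# every state's rural fraction without an explicit state loop.
urban_total = pd.Series({s: v.sum() for s, v in vmt_for_ua.items()})
rural_frac = 1 - urban_total / vmt_for_state
```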

Collaborator Author

Not sure I understand why we would want to sum over all urban areas in a state. At this point in the calculation we want to get, for each urban area in a state, the fraction of VMT with respect to the total in the state:

>>> vmt_ua_perc["AL"]
Birmingham, AL         0.144348
Pensacola, FL-AL       0.001210
Mobile, AL             0.060350
Florence, AL           0.011524
Tuscaloosa, AL         0.021803
Decatur, AL            0.006910
Anniston-Oxford, AL    0.015453
Dothan, AL             0.014312
Huntsville, AL         0.045015
Gadsden, AL            0.013453
Columbus, GA-AL        0.008350
Montgomery, AL         0.049128
Auburn, AL             0.013323
Name: Annual VMT, dtype: float64

and the percentage of rural VMT is simply the remainder

Collaborator

Either way, I was thinking rural VMT = state total VMT - sum(each UA VMT within the state), so that the first and second terms on the right-hand side are aligned by state, which can be done without looping through the states. But yeah, it doesn't matter as long as it works;)

Collaborator

@BainanXia BainanXia left a comment

Clean and tidy. Thanks

    if s in vmt_for_state.index:
        vmt_for_ua_perc[s] = vmt_for_ua[s] / vmt_for_state.loc[s]
        vmt_for_ra_perc[s] = 1 - vmt_for_ua_perc[s].sum()
    else:
Collaborator

maybe add a comment that this condition is for DC

Collaborator Author

Done

for s in vmt_for_ua:
    if s in vmt_for_state.index:
        vmt_for_ua_perc[s] = vmt_for_ua[s] / vmt_for_state.loc[s]
        vmt_for_ra_perc[s] = 1 - vmt_for_ua_perc[s].sum()
Collaborator

@dmuldrew dmuldrew Jan 24, 2023

Might check that these values are between 0 and 1 in case there is an issue with data? Or perhaps a warning message?
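A sketch of such a sanity check (the helper name and structure are invented for illustration, not part of the PR):

```python
import warnings

import pandas as pd

def check_fraction_range(vmt_ua_perc):
    """Warn on urban VMT fractions outside [0, 1]; return offending states.

    `vmt_ua_perc` maps state -> per-UA fraction Series (or a scalar, as
    for DC in this PR). Purely a sketch of the suggested data check.
    """
    bad = []
    for s, v in vmt_ua_perc.items():
        values = v if isinstance(v, pd.Series) else pd.Series([float(v)])
        if (values < 0).any() or (values > 1).any():
            warnings.warn(f"urban VMT fraction outside [0, 1] for {s}")
            bad.append(s)
    return bad
```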

Collaborator Author

All good:

>>> for s, v in vmt_ua_perc.items():
...     if s != "DC":
...             print(f"{s}: {v.max()}")
...     else:
...             print(f"{s}: {v}")
... 
AL: 0.14434760346942355
AK: 0.26437207277844105
AZ: 0.44991303311393605
AR: 0.14908502626966288
CA: 0.2982427206720955
CO: 0.36397376132480563
CT: 0.26032021034444786
DE: 0.3423880212011323
DC: 1
FL: 0.22618050635128556
GA: 0.4205194715339222
HI: 0.4555934566561944
ID: 0.15263531331780308
IL: 0.5686896022596508
IN: 0.21845602247508317
IA: 0.11619400114840563
KS: 0.23042077253226498
KY: 0.06590201248930878
LA: 0.1167878455075878
ME: 0.11716040474540913
MD: 0.33409851003528707
MA: 0.5934979073891866
MI: 0.3462470055304457
MN: 0.4069880542887952
MS: 0.14115482249519
MO: 0.2858434202091423
MT: 0.06639716734171276
NE: 0.22775869035927454
NV: 0.48426477080479685
NH: 0.13401929285747738
NJ: 0.47782865005090186
NM: 0.21564355353847547
NY: 0.5389666383119348
NC: 0.10540342948979183
ND: 0.08904652146843647
OH: 0.1287217813988377
OK: 0.2354443740440485
OR: 0.29558155277403003
PA: 0.25133818104564754
RI: 0.9151826365467751
SC: 0.09462195359436316
SD: 0.11024035295479549
TN: 0.1736666305348097
TX: 0.179917688878257
UT: 0.28819452292365755
VT: 0.16083495321895766
VA: 0.20818156599974846
WA: 0.4318657588567066
WV: 0.12309620751263774
WI: 0.18486930899464962
WY: 0.0705196807271032

Collaborator

Looks good
