Calculate urban and rural annual VMT fractions #333
prereise/gather/demanddata/transportation_electrification/generate_scaling_factors.py
for s in census_ua:
    census_ua_format = census_ua[s].copy()
    for p in pattern:
        census_ua_format.index = census_ua_format.index.str.replace(
            p, "", regex=True
        )
    common = set(tht_ua_format.index).intersection(set(census_ua_format.index))
    vmt_for_ua[s] = (
        pd.DataFrame(
            {
                "Annual VMT": [
                    365 * tht_ua_format.loc[i] * census_ua_format.loc[i]
                    for i in common
                ]
            },
            index=list(common),
        )
        .rename(index=format2original)
        .squeeze()
    )
Maybe we can construct a data frame with UA only, given that the state abbreviations are suffixes of the UA names, i.e. pd.concat(census_ua.values()); then we should be able to do the multiplication directly with tht_ua_format?
The problem is that the UA names are not unique. For example you have:
>>> vmt_ua = calculate_vmt_for_ua(census_ua, tht_ua)
>>> vmt_ua["OH"]
Cleveland, OH 1.450011e+10
Toledo, OH-MI 4.215456e+09
Parkersburg, WV-OH 6.109252e+07
Canton, OH 2.109291e+09
Middletown, OH 7.553366e+08
Huntington, WV-KY-OH 2.850411e+08
Springfield, OH 7.482752e+08
Lorain-Elyria, OH 2.271439e+09
Wheeling, WV-OH 3.209512e+08
Youngstown, OH-PA 3.122444e+09
Dayton, OH 6.994014e+09
Akron, OH 5.258336e+09
Columbus, OH 1.153874e+10
Mansfield, OH 6.816990e+08
Weirton-Steubenville, WV-OH-PA 3.434335e+08
Lima, OH 6.162137e+08
Newark, OH 5.972769e+08
Cincinnati, OH-KY-IN 1.258937e+10
Name: Annual VMT, dtype: float64
>>> vmt_ua["PA"]
Hagerstown, MD-WV-PA 7.916503e+07
Harrisburg, PA 3.792602e+09
Johnstown, PA 3.919511e+08
Lebanon, PA 3.038400e+08
Lancaster, PA 2.520083e+09
Uniontown-Connellsville, PA 3.988131e+08
York, PA 1.561355e+09
Cumberland, MD-WV-PA 2.896863e+05
Scranton, PA 2.841143e+09
Williamsport, PA 6.703207e+08
Allentown, PA-NJ 4.718352e+09
Youngstown, OH-PA 3.541347e+08
Pottstown, PA 5.967203e+08
Altoona, PA 5.087245e+08
Reading, PA 1.739473e+09
Weirton-Steubenville, WV-OH-PA 2.598250e+06
Pittsburgh, PA 1.207367e+10
Hazleton, PA 3.577484e+08
State College, PA 3.945960e+08
Philadelphia, PA-NJ-DE-MD 2.501896e+10
Binghamton, NY-PA 2.692702e+07
Monessen-California, PA 4.604961e+08
Erie, PA 1.063796e+09
Name: Annual VMT, dtype: float64
>>> vmt_ua["WV"]
Parkersburg, WV-OH 4.803244e+08
Hagerstown, MD-WV-PA 5.980238e+08
Huntington, WV-KY-OH 9.474756e+08
Wheeling, WV-OH 5.430394e+08
Morgantown, WV 4.670725e+08
Charleston, WV 2.317763e+09
Weirton-Steubenville, WV-OH-PA 2.638600e+08
Cumberland, MD-WV-PA 2.101628e+07
Name: Annual VMT, dtype: float64
For example, Weirton-Steubenville, WV-OH-PA is one UA that spans 3 different states, with 3 different values.
I thought we would have Weirton-Steubenville, WV, Weirton-Steubenville, OH and Weirton-Steubenville, PA respectively, so that each (UA, State) pair is unique.
The other challenge here is the shapefiles for the spatial translation. From what Dan Muldrew tested the other day, I believe the UAs in those files still use the whole Weirton-Steubenville, WV-OH-PA name, so we did not want to lose the full string, at least for now.
@BainanXia, that said, you're correct that a cleaner way to do this would be Weirton-Steubenville, WV, Weirton-Steubenville, OH and Weirton-Steubenville, PA. If we can confirm that we won't break other processes, that would certainly be the preference.
We'll indeed have the UA and the specific state in its own column in the CSV outputs later, FYI.
I was thinking we could break such multi-state UAs into one UA per state for both population and VMT per capita (even though the latter is the same across the states). Then we would get unique UA+state pairs and could get rid of the multi-layer structure, {state: UA within the state, ...}. But yeah, it doesn't matter as long as it works.
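A minimal sketch of the flattened layout discussed in this thread, assuming census_ua maps each state abbreviation to a Series of UA population indexed by the full UA name, and tht_ua is a Series of daily VMT per capita indexed by the same names. The toy inputs and their values are hypothetical; only the pd.concat pattern is the point. It keeps the full UA string (useful for the shapefiles) while making each (state, UA) pair unique.

```python
import pandas as pd

# Hypothetical toy inputs (values are illustrative, not real data)
census_ua = {
    "OH": pd.Series(
        {"Cleveland, OH": 1000.0, "Weirton-Steubenville, WV-OH-PA": 100.0}
    ),
    "WV": pd.Series(
        {"Charleston, WV": 500.0, "Weirton-Steubenville, WV-OH-PA": 50.0}
    ),
}
tht_ua = pd.Series(
    {
        "Cleveland, OH": 20.0,
        "Charleston, WV": 30.0,
        "Weirton-Steubenville, WV-OH-PA": 10.0,
    }
)

# Passing the dict to pd.concat uses its keys as the outer index level,
# so the result is indexed by (state, ua) and multi-state UAs no longer clash
population = pd.concat(census_ua, names=["state", "ua"])

# Look up daily per-capita VMT by UA name for every (state, ua) row
per_capita = tht_ua.reindex(population.index.get_level_values("ua"))

# Annual VMT per (state, ua); to_numpy() avoids aligning on mismatched indexes
annual_vmt = 365 * population * per_capita.to_numpy()
```

With this layout, annual_vmt.loc["OH"] returns the same per-state Series that the nested dictionary holds today, so downstream code that expects one Series per state could still be served.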
for s in vmt_for_ua:
    if s in vmt_for_state.index:
        vmt_for_ua_perc[s] = vmt_for_ua[s] / vmt_for_state.loc[s]
        vmt_for_ra_perc[s] = 1 - vmt_for_ua_perc[s].sum()
    else:
        vmt_for_ua_perc[s] = 1
        vmt_for_ra_perc[s] = 0
Maybe we could do pd.Series({s: v.sum() for s, v in vmt_for_ua.items()}) and then use series operations to get the results in one shot.
Not sure I understand why we would want to sum over all urban areas in a state. At this point of the calculation we want to get, for each urban area in a state, the fraction of VMT with respect to the total in the state:
>>> vmt_ua_perc["AL"]
Birmingham, AL 0.144348
Pensacola, FL-AL 0.001210
Mobile, AL 0.060350
Florence, AL 0.011524
Tuscaloosa, AL 0.021803
Decatur, AL 0.006910
Anniston-Oxford, AL 0.015453
Dothan, AL 0.014312
Huntsville, AL 0.045015
Gadsden, AL 0.013453
Columbus, GA-AL 0.008350
Montgomery, AL 0.049128
Auburn, AL 0.013323
Name: Annual VMT, dtype: float64
and the rural percentage of VMT is simply the remainder.
Either way, I was thinking rural VMT = state total VMT - sum(each UA VMT within the state), so that the first and second terms on the right-hand side are aligned by state, which can be done without looping through the states. But yeah, it doesn't matter as long as it works ;)
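A rough sketch of that loop-free version, assuming vmt_for_ua is a dict of per-state Series and vmt_for_state is a Series indexed by state abbreviation, as in the diff. The toy values are hypothetical, and the fillna step is an assumption that mirrors the else branch used for DC.

```python
import pandas as pd

# Hypothetical toy inputs (illustrative values only)
vmt_for_ua = {
    "AL": pd.Series({"Birmingham, AL": 30.0, "Mobile, AL": 10.0}),
    "DC": pd.Series({"Washington, DC-VA-MD": 5.0}),
}
vmt_for_state = pd.Series({"AL": 100.0})  # DC has no state-level entry

# Total urban VMT per state, as suggested in the review comment
urban_total = pd.Series({s: v.sum() for s, v in vmt_for_ua.items()})

# Align both terms on state; states missing from vmt_for_state become NaN
state_total = vmt_for_state.reindex(urban_total.index)
urban_frac = urban_total / state_total

# Rural fraction is the remainder; a state without a state-level total
# (DC here) is treated as fully urban, mirroring the else branch
vmt_for_ra_perc = (1 - urban_frac).fillna(0.0)
```

The per-UA fractions still need one division per state, so this only replaces the rural half of the loop.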
Clean and tidy. Thanks
    if s in vmt_for_state.index:
        vmt_for_ua_perc[s] = vmt_for_ua[s] / vmt_for_state.loc[s]
        vmt_for_ra_perc[s] = 1 - vmt_for_ua_perc[s].sum()
    else:
maybe add a comment that this condition is for DC
Done
for s in vmt_for_ua:
    if s in vmt_for_state.index:
        vmt_for_ua_perc[s] = vmt_for_ua[s] / vmt_for_state.loc[s]
        vmt_for_ra_perc[s] = 1 - vmt_for_ua_perc[s].sum()
Might check that these values are between 0 and 1 in case there is an issue with data? Or perhaps a warning message?
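One possible shape for such a check, as a sketch only: check_fractions is a hypothetical helper, not part of the module, and it assumes the fractions can be passed as a mapping or Series of state to value.

```python
import warnings

import pandas as pd


def check_fractions(fractions, label="VMT fraction"):
    """Warn if any value is missing or outside [0, 1]; return the offenders."""
    fractions = pd.Series(fractions, dtype=float)
    # Flag values below 0, above 1, or NaN
    bad = fractions[(fractions < 0) | (fractions > 1) | fractions.isna()]
    if not bad.empty:
        warnings.warn(f"{label} missing or outside [0, 1] for: {list(bad.index)}")
    return bad


# Usage with made-up values: only the out-of-range states are returned
bad = check_fractions({"AL": 0.6, "XX": 1.2, "YY": -0.1})
```

A warning rather than an exception keeps the calculation running while still flagging suspicious input data.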
All good:
>>> for s, v in vmt_ua_perc.items():
...     if s != "DC":
...         print(f"{s}: {v.max()}")
...     else:
...         print(f"{s}: {v}")
...
AL: 0.14434760346942355
AK: 0.26437207277844105
AZ: 0.44991303311393605
AR: 0.14908502626966288
CA: 0.2982427206720955
CO: 0.36397376132480563
CT: 0.26032021034444786
DE: 0.3423880212011323
DC: 1
FL: 0.22618050635128556
GA: 0.4205194715339222
HI: 0.4555934566561944
ID: 0.15263531331780308
IL: 0.5686896022596508
IN: 0.21845602247508317
IA: 0.11619400114840563
KS: 0.23042077253226498
KY: 0.06590201248930878
LA: 0.1167878455075878
ME: 0.11716040474540913
MD: 0.33409851003528707
MA: 0.5934979073891866
MI: 0.3462470055304457
MN: 0.4069880542887952
MS: 0.14115482249519
MO: 0.2858434202091423
MT: 0.06639716734171276
NE: 0.22775869035927454
NV: 0.48426477080479685
NH: 0.13401929285747738
NJ: 0.47782865005090186
NM: 0.21564355353847547
NY: 0.5389666383119348
NC: 0.10540342948979183
ND: 0.08904652146843647
OH: 0.1287217813988377
OK: 0.2354443740440485
OR: 0.29558155277403003
PA: 0.25133818104564754
RI: 0.9151826365467751
SC: 0.09462195359436316
SD: 0.11024035295479549
TN: 0.1736666305348097
TX: 0.179917688878257
UT: 0.28819452292365755
VT: 0.16083495321895766
VA: 0.20818156599974846
WA: 0.4318657588567066
WV: 0.12309620751263774
WI: 0.18486930899464962
WY: 0.0705196807271032
Looks good
Pull Request doc
Purpose
Calculate urban and rural Vehicle Miles Traveled (VMT) fractions. Closes #316 and #317.
What the code is doing
Testing
Existing unit tests
Where to look
New module
generate_scaling_factors
Usage Example/Visuals
Note
We will need to go over the data intake procedure for the input files
Time estimate
20min