
Calculate urban and rural annual VMT fractions #333

Merged
merged 1 commit into from
Jan 24, 2023

Conversation


@rouille rouille commented Jan 20, 2023

Pull Request doc

Purpose

Calculate urban and rural Vehicle Miles Traveled (VMT) fractions. Closes #316 and #317.

What the code is doing

  • Load three datasets (census data for states and urban areas, plus the Transportation and Health Tool data) with dedicated loading functions
  • Calculate the total annual Vehicle Miles Traveled (VMT) in urban areas and states
  • Calculate the percentage of Vehicle Miles Traveled (VMT) in urban and rural areas
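The three steps above can be sketched end-to-end on toy data (the series names and numbers below are invented for illustration; the real module derives them from the census and THT files):

```python
import pandas as pd

# Toy stand-ins for the module's intermediate results (invented values):
# per-UA annual VMT keyed by state, and state-level annual VMT totals.
vmt_ua = {
    "AL": pd.Series({"Birmingham, AL": 20.0, "Mobile, AL": 10.0}),
    "DC": pd.Series({"Washington, DC-VA-MD": 5.0}),
}
vmt_state = pd.Series({"AL": 100.0})  # DC has no state-level entry

vmt_ua_perc, vmt_ra_perc = {}, {}
for s in vmt_ua:
    if s in vmt_state.index:
        # Each UA's share of the state total; rural is the remainder.
        vmt_ua_perc[s] = vmt_ua[s] / vmt_state.loc[s]
        vmt_ra_perc[s] = 1 - vmt_ua_perc[s].sum()
    else:
        # A state absent from the state totals (DC) is treated as all urban.
        vmt_ua_perc[s] = 1
        vmt_ra_perc[s] = 0
```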

Testing

Existing unit tests

Where to look

New module generate_scaling_factors

Usage Example/Visuals

>>> from prereise.gather.demanddata.transportation_electrification.generate_scaling_factors import (
...     census_ua_url,
...     census_state_url,
...     tht_data_url,
...     load_census_ua,
...     load_census_state,
...     load_dot_vmt_per_capita,
...     calculate_vmt_for_ua,
...     calculate_vmt_for_state,
...     calculate_urban_rural_fraction,
... )
>>> census_ua = load_census_ua(census_ua_url)
>>> census_state = load_census_state(census_state_url)
>>> tht_ua, tht_state = load_dot_vmt_per_capita(tht_data_url)
>>> vmt_ua = calculate_vmt_for_ua(census_ua, tht_ua)
>>> vmt_state = calculate_vmt_for_state(census_state, tht_state)
>>> vmt_ua_perc, vmt_ra_perc = calculate_urban_rural_fraction(vmt_ua, vmt_state)
>>> vmt_ra_perc
{'AL': 0.5948209907004509, 'AZ': 0.4072678991599876, 'AR': 0.6782718495006606, 'CA': 0.2522036919615287, 'CO': 0.4365774632878532, 'CT': 0.18269470942707955, 'DE': 0.559915703167249, 'DC': 0, 'FL': 0.2923051798349785, 'GA': 0.42402572834199637, 'ID': 0.6723734528747444, 'IL': 0.28372082436406665, 'IN': 0.5298558463320291, 'IA': 0.7022578956589101, 'KS': 0.5763025206199697, 'KY': 0.8477578601781653, 'LA': 0.5246373942279056, 'ME': 0.7882474490409153, 'MD': 0.3333401978676642, 'MA': 0.11947580340547559, 'MI': 0.3789053832349696, 'MN': 0.5432410752038628, 'MS': 0.7266752412880431, 'MO': 0.5071803635012153, 'MT': 0.8600569689974888, 'NE': 0.6739144929625285, 'NV': 0.399831570902325, 'NH': 0.6009439272679777, 'NJ': 0.33585021596160347, 'NM': 0.6848299106793774, 'NY': 0.19194166956193792, 'NC': 0.5092809432132339, 'ND': 0.8361712780525931, 'OH': 0.40514557017474595, 'OK': 0.5972732841621138, 'OR': 0.5636801026484732, 'PA': 0.3984935379624328, 'RI': 0.05679462197338614, 'SC': 0.588343323457865, 'SD': 0.8225877011609205, 'TN': 0.5049657185947951, 'TX': 0.4028581131849771, 'UT': 0.38149202467572874, 'VT': 0.8391650467810423, 'VA': 0.4227252560298854, 'WA': 0.364071745825326, 'WV': 0.7005357353635477, 'WI': 0.6074652247653839, 'WY': 0.8867296939626887}
>>> vmt_ua_perc["AL"]
Birmingham, AL         0.144348
Florence, AL           0.011524
Mobile, AL             0.060350
Pensacola, FL-AL       0.001210
Gadsden, AL            0.013453
Auburn, AL             0.013323
Montgomery, AL         0.049128
Decatur, AL            0.006910
Anniston-Oxford, AL    0.015453
Huntsville, AL         0.045015
Tuscaloosa, AL         0.021803
Dothan, AL             0.014312
Columbus, GA-AL        0.008350
Name: Annual VMT, dtype: float64

Note

We will need to go over the data intake procedure for the input files

Time estimate

20min

@rouille rouille self-assigned this Jan 20, 2023
Comment on lines 148 to 170
for s in census_ua:
    census_ua_format = census_ua[s].copy()
    for p in pattern:
        census_ua_format.index = census_ua_format.index.str.replace(
            p, "", regex=True
        )
    common = set(tht_ua_format.index).intersection(set(census_ua_format.index))
    vmt_for_ua[s] = (
        pd.DataFrame(
            {
                "Annual VMT": [
                    365 * tht_ua_format.loc[i] * census_ua_format.loc[i]
                    for i in common
                ]
            },
            index=list(common),
        )
        .rename(index=format2original)
        .squeeze()
    )
Collaborator

Maybe we can construct a data frame with UA only given the state abbreviations are suffixes of UA names, i.e. pd.concat(census_ua.values()), then we should be able to do the multiplication directly with tht_ua_format?
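For illustration, the suggested vectorized approach might look roughly like this sketch (toy data; the identifiers `census_ua` and `tht_ua_format` mirror the snippet above, but the values are invented). As the follow-up replies point out, it only works while the flattened UA index stays unique:

```python
import pandas as pd

# Invented stand-ins: per-state census population series and the THT
# VMT-per-capita series keyed by the same formatted UA names.
census_ua = {
    "OH": pd.Series({"Cleveland, OH": 1000.0, "Toledo, OH-MI": 500.0}),
    "PA": pd.Series({"Pittsburgh, PA": 800.0}),
}
tht_ua_format = pd.Series(
    {"Cleveland, OH": 20.0, "Toledo, OH-MI": 25.0, "Pittsburgh, PA": 30.0}
)

# Flatten {state: Series} into one Series and multiply by the aligned
# per-capita figures in a single vectorized step (no per-state loop).
population = pd.concat(census_ua.values())
annual_vmt = 365 * population * tht_ua_format.reindex(population.index)
```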

Collaborator Author

The problem is that the UA names are not unique. For example you have:

>>> vmt_ua = calculate_vmt_for_ua(census_ua, tht_ua)
>>> vmt_ua["OH"]
Cleveland, OH                     1.450011e+10
Toledo, OH-MI                     4.215456e+09
Parkersburg, WV-OH                6.109252e+07
Canton, OH                        2.109291e+09
Middletown, OH                    7.553366e+08
Huntington, WV-KY-OH              2.850411e+08
Springfield, OH                   7.482752e+08
Lorain-Elyria, OH                 2.271439e+09
Wheeling, WV-OH                   3.209512e+08
Youngstown, OH-PA                 3.122444e+09
Dayton, OH                        6.994014e+09
Akron, OH                         5.258336e+09
Columbus, OH                      1.153874e+10
Mansfield, OH                     6.816990e+08
Weirton-Steubenville, WV-OH-PA    3.434335e+08
Lima, OH                          6.162137e+08
Newark, OH                        5.972769e+08
Cincinnati, OH-KY-IN              1.258937e+10
Name: Annual VMT, dtype: float64
>>> vmt_ua["PA"]
Hagerstown, MD-WV-PA              7.916503e+07
Harrisburg, PA                    3.792602e+09
Johnstown, PA                     3.919511e+08
Lebanon, PA                       3.038400e+08
Lancaster, PA                     2.520083e+09
Uniontown-Connellsville, PA       3.988131e+08
York, PA                          1.561355e+09
Cumberland, MD-WV-PA              2.896863e+05
Scranton, PA                      2.841143e+09
Williamsport, PA                  6.703207e+08
Allentown, PA-NJ                  4.718352e+09
Youngstown, OH-PA                 3.541347e+08
Pottstown, PA                     5.967203e+08
Altoona, PA                       5.087245e+08
Reading, PA                       1.739473e+09
Weirton-Steubenville, WV-OH-PA    2.598250e+06
Pittsburgh, PA                    1.207367e+10
Hazleton, PA                      3.577484e+08
State College, PA                 3.945960e+08
Philadelphia, PA-NJ-DE-MD         2.501896e+10
Binghamton, NY-PA                 2.692702e+07
Monessen-California, PA           4.604961e+08
Erie, PA                          1.063796e+09
Name: Annual VMT, dtype: float64
>>> vmt_ua["WV"]
Parkersburg, WV-OH                4.803244e+08
Hagerstown, MD-WV-PA              5.980238e+08
Huntington, WV-KY-OH              9.474756e+08
Wheeling, WV-OH                   5.430394e+08
Morgantown, WV                    4.670725e+08
Charleston, WV                    2.317763e+09
Weirton-Steubenville, WV-OH-PA    2.638600e+08
Cumberland, MD-WV-PA              2.101628e+07
Name: Annual VMT, dtype: float64

For example, Weirton-Steubenville, WV-OH-PA is one UA (belonging to 3 different states, with 3 different values)

Collaborator

I thought we would have Weirton-Steubenville, WV, Weirton-Steubenville, OH and Weirton-Steubenville, PA respectively, so that each (UA, State) pair is unique.


The other challenge here is the shapefiles for the spatial translation. From what Dan Muldrew tested the other day, I believe the UAs in those files still use the whole Weirton-Steubenville, WV-OH-PA name, so we did not want to lose the full string, at least for now.


@BainanXia , that said, you're correct that a cleaner way to do this would be Weirton-Steubenville, WV, Weirton-Steubenville, OH and Weirton-Steubenville, PA
If we can confirm that we won't break other processes, that would certainly be a preference.
We'll indeed have the UA and the specific state in its own column in the csv outputs later, fyi

Collaborator

I was thinking we break such multi-state UAs into UA in each state for both population and VMT per capita (even though the latter is the same among each state). Then we will get unique pairs of UA+state so that we could get rid of the multi-layer structure, {state: UA within the state, ...}. But yeah, it doesn't matter as long as it works.
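A sketch of that splitting idea on toy data (names and values invented): each multi-state UA becomes one (UA, state) row per member state, duplicating its value, which removes the {state: UA, ...} nesting:

```python
import pandas as pd

# Invented per-UA annual VMT; the second entry spans three states.
ua = pd.Series(
    {"Morgantown, WV": 4.6e8, "Weirton-Steubenville, WV-OH-PA": 2.6e8}
)

rows = []
for name, value in ua.items():
    # Split "City, ST1-ST2-..." into the city and its member states.
    city, states = name.rsplit(", ", 1)
    for st in states.split("-"):
        # One row per member state, so every (UA, State) pair is unique.
        rows.append((f"{city}, {st}", st, value))

flat = pd.DataFrame(rows, columns=["UA", "State", "Annual VMT"])
```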

Comment on lines 198 to 209
for s in vmt_for_ua:
    if s in vmt_for_state.index:
        vmt_for_ua_perc[s] = vmt_for_ua[s] / vmt_for_state.loc[s]
        vmt_for_ra_perc[s] = 1 - vmt_for_ua_perc[s].sum()
    else:
        vmt_for_ua_perc[s] = 1
        vmt_for_ra_perc[s] = 0
Collaborator

Maybe we could do pd.Series({s:v.sum() for s, v in vmt_for_ua.items()}) then using series operations to get the results in one shot.
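That one-shot version could look roughly like this (toy data; `vmt_for_ua` and `vmt_for_state` are invented stand-ins), collapsing each state's UAs to an urban total and then letting index alignment do the rest:

```python
import pandas as pd

# Invented stand-ins: per-UA annual VMT keyed by state, and state totals.
vmt_for_ua = {
    "AL": pd.Series({"Birmingham, AL": 20.0, "Mobile, AL": 10.0}),
    "GA": pd.Series({"Atlanta, GA": 50.0}),
}
vmt_for_state = pd.Series({"AL": 100.0, "GA": 80.0})

# Sum each state's UAs to an urban total, then one aligned division gives
# every state's rural fraction without an explicit state loop.
urban_total = pd.Series({s: v.sum() for s, v in vmt_for_ua.items()})
rural_frac = 1 - urban_total / vmt_for_state
```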

Collaborator Author

Not sure I understand why we would want to sum over all urban areas in a state. At this point in the calculation we want to get, for each urban area in a state, the fraction of VMT with respect to the total in the state:

>>> vmt_ua_perc["AL"]
Birmingham, AL         0.144348
Pensacola, FL-AL       0.001210
Mobile, AL             0.060350
Florence, AL           0.011524
Tuscaloosa, AL         0.021803
Decatur, AL            0.006910
Anniston-Oxford, AL    0.015453
Dothan, AL             0.014312
Huntsville, AL         0.045015
Gadsden, AL            0.013453
Columbus, GA-AL        0.008350
Montgomery, AL         0.049128
Auburn, AL             0.013323
Name: Annual VMT, dtype: float64

and the percentage of rural VMT is simply the remainder

Collaborator

Either way, I was thinking rural VMT = state total VMT - sum(each UA VMT within the state), so that the first and second terms on the right-hand side are aligned by state, which can be done without looping through the states. But yeah, it doesn't matter as long as it works;)

Collaborator

@BainanXia BainanXia left a comment

Clean and tidy. Thanks

    if s in vmt_for_state.index:
        vmt_for_ua_perc[s] = vmt_for_ua[s] / vmt_for_state.loc[s]
        vmt_for_ra_perc[s] = 1 - vmt_for_ua_perc[s].sum()
    else:
Collaborator

maybe add a comment that this condition is for DC

Collaborator Author

Done

for s in vmt_for_ua:
    if s in vmt_for_state.index:
        vmt_for_ua_perc[s] = vmt_for_ua[s] / vmt_for_state.loc[s]
        vmt_for_ra_perc[s] = 1 - vmt_for_ua_perc[s].sum()
Collaborator

@dmuldrew dmuldrew Jan 24, 2023

Might check that these values are between 0 and 1 in case there is an issue with data? Or perhaps a warning message?
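A sketch of such a sanity check (the helper name and structure are invented for illustration, not part of the PR):

```python
import warnings

import pandas as pd

def check_fraction_range(vmt_ua_perc):
    """Warn on urban VMT fractions outside [0, 1]; return offending states.

    `vmt_ua_perc` maps state -> per-UA fraction Series (or a scalar, as
    for DC in this PR). Purely a sketch of the suggested data check.
    """
    bad = []
    for s, v in vmt_ua_perc.items():
        values = v if isinstance(v, pd.Series) else pd.Series([float(v)])
        if (values < 0).any() or (values > 1).any():
            warnings.warn(f"urban VMT fraction outside [0, 1] for {s}")
            bad.append(s)
    return bad
```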

Collaborator Author

All good:

>>> for s, v in vmt_ua_perc.items():
...     if s != "DC":
...             print(f"{s}: {v.max()}")
...     else:
...             print(f"{s}: {v}")
... 
AL: 0.14434760346942355
AK: 0.26437207277844105
AZ: 0.44991303311393605
AR: 0.14908502626966288
CA: 0.2982427206720955
CO: 0.36397376132480563
CT: 0.26032021034444786
DE: 0.3423880212011323
DC: 1
FL: 0.22618050635128556
GA: 0.4205194715339222
HI: 0.4555934566561944
ID: 0.15263531331780308
IL: 0.5686896022596508
IN: 0.21845602247508317
IA: 0.11619400114840563
KS: 0.23042077253226498
KY: 0.06590201248930878
LA: 0.1167878455075878
ME: 0.11716040474540913
MD: 0.33409851003528707
MA: 0.5934979073891866
MI: 0.3462470055304457
MN: 0.4069880542887952
MS: 0.14115482249519
MO: 0.2858434202091423
MT: 0.06639716734171276
NE: 0.22775869035927454
NV: 0.48426477080479685
NH: 0.13401929285747738
NJ: 0.47782865005090186
NM: 0.21564355353847547
NY: 0.5389666383119348
NC: 0.10540342948979183
ND: 0.08904652146843647
OH: 0.1287217813988377
OK: 0.2354443740440485
OR: 0.29558155277403003
PA: 0.25133818104564754
RI: 0.9151826365467751
SC: 0.09462195359436316
SD: 0.11024035295479549
TN: 0.1736666305348097
TX: 0.179917688878257
UT: 0.28819452292365755
VT: 0.16083495321895766
VA: 0.20818156599974846
WA: 0.4318657588567066
WV: 0.12309620751263774
WI: 0.18486930899464962
WY: 0.0705196807271032

Collaborator

Looks good
