Sprint 3: Complete ML workflow from start to finish

Past due by almost 2 years. 85% complete.

By the end of this sprint, we'd like to be able to run a workflow from start to finish, i.e. downloading, loading, transposing, dimensionality reduction (by time averaging), machine learning, and cross-validation.

Suggested workflow:

Datasets:
- NPN (which species?)
- MODIS (which variables/bands/products?)
- Daymet (which variables?)
Area:
- BBOX with ~15 stations inside it and several MODIS pixels (Dakota?)
- Specify bbox --> get NPN stations --> get MODIS and Daymet pixels for those stations
Time:
- 2015 - 2020
- Monthly averages (for now)
Train/test splitting:
- Random / shuffle split (70% train, 30% test)
ML (hardcoded/default hyperparameters):
- scikit-learn (linear regression)
- scikit-learn (random forest)
- MERF (mixed-effects random forest)
- EBM (Explainable Boosting Machine)
Output:
- Score: RMSE, MAE
- Print parameters or persist model to disk
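
The steps above could be sketched roughly as follows. This is only a minimal illustration under stated assumptions, not the project's implementation: the input data is synthetic (random numbers standing in for MODIS/Daymet predictors at 15 stations over 2015 - 2020), the target is made up, and the helper `monthly_means` is a hypothetical name. Only the two scikit-learn models are shown; MERF and EBM are left out.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Hypothetical stand-in data: 15 stations x 6 years (2015-2020) x 365 days
# x 2 predictor variables. Real inputs would come from MODIS and Daymet.
n_stations, n_years, n_days, n_vars = 15, 6, 365, 2
daily = rng.normal(size=(n_stations, n_years, n_days, n_vars))


def monthly_means(daily_values):
    """Reduce the day axis (axis 2) to 12 monthly averages (non-leap calendar)."""
    month_lengths = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
    edges = np.cumsum([0] + month_lengths)
    months = [
        daily_values[:, :, start:stop].mean(axis=2)
        for start, stop in zip(edges[:-1], edges[1:])
    ]
    return np.stack(months, axis=2)  # shape: (stations, years, 12, vars)


# Dimensionality reduction by time averaging: 365 daily values -> 12 monthly means.
monthly = monthly_means(daily)

# One sample per station-year; features are the flattened monthly averages.
X = monthly.reshape(n_stations * n_years, 12 * n_vars)

# Synthetic target (made up here; a real one would be derived from NPN observations).
y = 120 + 5 * X[:, 0] + rng.normal(scale=1.0, size=X.shape[0])

# Random / shuffle split: 70% train, 30% test.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, shuffle=True, random_state=0
)

# Default hyperparameters, apart from a fixed seed for reproducibility.
models = {
    "linear_regression": LinearRegression(),
    "random_forest": RandomForestRegressor(random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    scores[name] = {
        "rmse": float(np.sqrt(mean_squared_error(y_test, pred))),
        "mae": float(mean_absolute_error(y_test, pred)),
    }
    # Output step: print the scores and the model parameters.
    print(name, scores[name], model.get_params())
```

Persisting a fitted model to disk instead of printing its parameters could be done with `joblib.dump(model, path)`, where the path is up to the caller.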