Sprint 3: Complete ML workflow from start to finish
By the end of this sprint, we'd like to be able to run a complete workflow end to end: downloading, loading, transposing, dimensionality reduction (by time averaging), machine learning, and cross-validation.
Suggested workflow:
Datasets:
- NPN (which species?)
- MODIS (which variables/bands/products?)
- Daymet (which variables?)
Area:
- BBOX with ~15 stations inside it and several MODIS pixels (Dakota?)
- Specify BBOX --> get NPN stations --> get MODIS and Daymet pixels for those stations
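
The first step of that chain (BBOX to stations) could look roughly like the sketch below. Everything in it is hypothetical: the `BBox` helper, the station dictionaries, and the coordinates are placeholders, not real NPN stations or an agreed study area. The MODIS/Daymet pixel lookup is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BBox:
    """Geographic bounding box in decimal degrees (hypothetical helper)."""
    west: float
    south: float
    east: float
    north: float

    def contains(self, lon: float, lat: float) -> bool:
        # Inclusive on all four edges; assumes the box does not cross the antimeridian.
        return self.west <= lon <= self.east and self.south <= lat <= self.north


def stations_in_bbox(stations, bbox):
    """Return the stations whose coordinates fall inside the bounding box."""
    return [s for s in stations if bbox.contains(s["lon"], s["lat"])]


# Placeholder station records (made-up ids and coordinates).
stations = [
    {"id": "A", "lon": -100.5, "lat": 46.8},
    {"id": "B", "lon": -98.2, "lat": 44.1},
    {"id": "C", "lon": -80.0, "lat": 35.0},
]

# Placeholder box; the real extent is still to be decided.
bbox = BBox(west=-104.0, south=43.0, east=-96.5, north=49.0)

selected = stations_in_bbox(stations, bbox)
selected_ids = [s["id"] for s in selected]
print(selected_ids)
```

The selected station coordinates would then drive the lookup of the matching MODIS and Daymet pixels.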
Time:
- 2015 - 2020
- Monthly averages (for now)
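
A minimal sketch of the time averaging, assuming the data for one pixel ends up as a daily pandas series. The column names and the synthetic values are placeholders, not real MODIS or Daymet data.

```python
import numpy as np
import pandas as pd

# Synthetic daily series standing in for one variable at one pixel.
days = pd.date_range("2015-01-01", "2020-12-31", freq="D")
daily = pd.DataFrame({"date": days, "value": np.arange(len(days), dtype=float)})

# Resample to calendar months ("MS" labels each bin by its month start)
# and take the mean of the daily values in each month.
monthly = daily.set_index("date")["value"].resample("MS").mean()

print(monthly.head())
```

Six years of daily values reduce to 72 monthly averages per variable and pixel, which is the dimensionality reduction step named in the sprint goal.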
Train/test splitting:
- Random / shuffle split (70% train 30% test)
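
The split can be sketched with scikit-learn's `train_test_split`. The feature matrix here is random placeholder data, and the shape (12 features per sample) is only an assumption about what the monthly averages might look like.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 12))  # placeholder features, e.g. 12 monthly averages
y = rng.normal(size=100)        # placeholder target

# Shuffled 70/30 split; a fixed random_state keeps the split reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, shuffle=True, random_state=0
)

print(X_train.shape, X_test.shape)
```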
ML (hardcoded/default hyperparameters):
- scikit-learn (linear regression)
- scikit-learn (random forest)
- merf (mixed-effects random forest)
- EBM (Explainable Boosting Machine)
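
A rough sketch of the two scikit-learn models with default hyperparameters, fitted on synthetic placeholder data. merf and EBM are left out here because they need extra packages; the idea would be to add them to the same dictionary once their interfaces are wrapped.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

# Synthetic regression problem (placeholder for the real feature table).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
true_coef = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ true_coef + rng.normal(scale=0.1, size=200)

# Default hyperparameters; random_state is set only for reproducibility.
models = {
    "linear_regression": LinearRegression(),
    "random_forest": RandomForestRegressor(random_state=0),
}

# fit() returns the estimator itself, so the dict holds fitted models.
fitted = {name: model.fit(X, y) for name, model in models.items()}
predictions = {name: model.predict(X) for name, model in fitted.items()}

for name, pred in predictions.items():
    print(name, pred.shape)
```

Keeping the models in one dictionary means the scoring step can loop over all of them in the same way.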
Output:
- Score: RMSE, MAE
- Print parameters or persist model to disk
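
The output step might be sketched as follows, assuming scikit-learn metrics and joblib for persistence. The numbers are toy values chosen so the scores are easy to check by hand, and the file name is a placeholder.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Toy observations and predictions (not real results).
y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.0, 2.0, 3.0, 6.0])

mae = mean_absolute_error(y_true, y_pred)
# RMSE is taken as the square root of the mean squared error.
rmse = float(np.sqrt(mean_squared_error(y_true, y_pred)))
print(f"MAE={mae:.3f} RMSE={rmse:.3f}")

# Fit a small model so there is something to print and persist.
model = LinearRegression().fit(y_true.reshape(-1, 1), y_pred)
print(model.get_params())             # hyperparameters
print(model.coef_, model.intercept_)  # fitted parameters

# Persist to disk and load back; a temporary directory keeps the sketch tidy.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.joblib")
    joblib.dump(model, path)
    restored = joblib.load(path)
```

A model saved with joblib should be loaded with the same scikit-learn version that wrote it.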