Once upon a time we accidentally deployed all of the pre-consolidation EPA CEMS parquet files to our S3 bucket, and there are more than 1000 of them.
The paths to these objects now show up as possible parquet tables in the usage metrics, which clogs up the dashboard display/legend.
These paths should never have appeared, and I think in most cases they were never downloaded, so we can remove them from the logging data during the ETL and end up with a cleaner, simpler output to work with.
Currently these paths are filtered out of the dashboard display because they have 0 downloads, but if we go back to showing all valid paths, they'll reappear.
They do still appear in the dropdown on the left-hand side of the User Metrics dashboard (which is why it lists 1600+ tables).
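Filtering on the path itself, rather than on the download count, would keep these objects out even if zero-download paths are shown again. Below is a minimal sketch of that filtering step. It assumes the usage logs are a list of dicts with the requested S3 key under a hypothetical "object_path" field. It also assumes the partitioned files are .parquet objects nested inside a directory whose name contains "epacems". The real key layout isn't spelled out here, so treat the regex as a guess to adjust.

```python
import re

# Hypothetical pattern: any .parquet object nested inside a directory whose
# name contains "epacems". A consolidated table stored as a single top-level
# .parquet file has no "/" after "epacems", so it is NOT matched.
PARTITIONED_EPACEMS = re.compile(r"epacems[^/]*/.+\.parquet$")


def drop_zombie_epacems(records: list[dict]) -> list[dict]:
    """Return only the log records that are not partitioned EPA CEMS files.

    Assumes each record stores the requested S3 key under "object_path"
    (a hypothetical field name).
    """
    return [
        record
        for record in records
        if not PARTITIONED_EPACEMS.search(record["object_path"])
    ]
```

Matching on the key structure rather than enumerating the 1000+ keys means any partition file we haven't noticed yet is dropped too, without maintaining a list.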
It's possible that there was an intentional deployment of a state-year partitioned version of the data at some point, way back before we settled on the current output format, but there shouldn't be any under nightly, stable, or any of the existing versioned releases.
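The expectation that none of these files live under nightly, stable, or a versioned release can be checked directly against the logged paths. This is a hedged sketch with two assumptions. First, versioned releases sit under prefixes shaped like v2024.02.05/, a hypothetical date-tag format. Second, partitioned files are .parquet objects nested in a directory whose name contains "epacems", which is a guess at the layout.

```python
import re

# Hypothetical: partitioned files are .parquet objects nested in a directory
# whose name contains "epacems".
PARTITIONED_EPACEMS = re.compile(r"epacems[^/]*/.+\.parquet$")
# Hypothetical: releases live under nightly/, stable/, or vYYYY.MM.DD/.
RELEASE_PREFIX = re.compile(r"^(nightly|stable|v\d{4}\.\d{2}\.\d{2})/")


def misplaced_partitions(paths: list[str]) -> list[str]:
    """Return partitioned EPA CEMS paths found under a release prefix.

    Any result is unexpected: an intentional state-year deployment, if one
    ever existed, should sit outside these prefixes.
    """
    return [
        path
        for path in paths
        if RELEASE_PREFIX.match(path) and PARTITIONED_EPACEMS.search(path)
    ]
```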
We should also keep an eye out for other accidentally deployed files. I think it was just the partitioned parquet files that were a problem, but it's possible there were others.
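One way to look for other accidental deployments is to compare each logged file name against an allowlist of outputs we expect to publish. The sketch then counts the misses per directory, so a bulk upload like the 1000+ partition files stands out as one large group. This is a minimal sketch: the allowlist and the example paths are hypothetical, and it only looks at file names, not full keys.

```python
from collections import Counter
from pathlib import PurePosixPath


def unexpected_objects(paths: list[str], expected_names: set[str]) -> Counter:
    """Count logged paths whose file name is not an expected output, by directory.

    `expected_names` is a hypothetical allowlist of published file names.
    Grouping by parent directory makes a bulk accidental upload easy to spot.
    """
    counts: Counter = Counter()
    for raw in paths:
        path = PurePosixPath(raw)  # S3 keys use "/" separators
        if path.name not in expected_names:
            counts[str(path.parent)] += 1
    return counts
```

Sorting the result with `most_common()` puts the largest unexpected directories first, which is where a mass deployment would show up.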
Success Criteria
No more zombie EPA CEMS parquet paths show up where they shouldn't:
- the post-ETL data
- the dropdowns in the sidebar
- the data visualizations
Next steps