The GUD datasets currently include only the release version of Fenix and they ignore any data in the org_mozilla_fenix_nightly_stable dataset. We should probably build in that support.
But it brings up a bigger question of how we want to present Fenix data to users. Should we have separate ETL pathways for the different source tables, unioning together the final results? Or should we union together these different channels as early as possible?
We could alter the org_mozilla_fenix.baseline view to be a union of the release and nightly tables, setting the normalized_channel field to "release" for rows coming from one table and "nightly" for rows coming from the other. That approach would be vulnerable to schema drift between the two tables; it's not clear to me whether the probes are sourced independently for the different Fenix channels or if we should always expect the schemas to match exactly. If the schemas ever diverged, the view would return errors, which would be a bad user experience.
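To make the schema-drift concern concrete, here is a rough sketch of how the view SQL could be generated so that it only selects columns present in both tables. The table names (`org_mozilla_fenix_stable.baseline_v1`, `org_mozilla_fenix_nightly_stable.baseline_v1`), the column lists, and the helper `build_union_view_sql` are all hypothetical, chosen for illustration only; the real column lists would have to come from the live schemas.

```python
from typing import Dict, List, Tuple


def build_union_view_sql(sources: Dict[str, Tuple[str, List[str]]]) -> str:
    """Build a UNION ALL query over the columns shared by every source table.

    `sources` maps a channel name to (table name, top-level column names).
    """
    column_lists = [cols for _, cols in sources.values()]
    # Keep only columns that exist in every table, in the first table's order.
    # normalized_channel is dropped here because it is overridden per table.
    shared = [
        c for c in column_lists[0]
        if c != "normalized_channel" and all(c in cols for cols in column_lists)
    ]
    selects = []
    for channel, (table, _) in sources.items():
        select_list = ",\n  ".join(shared + [f"'{channel}' AS normalized_channel"])
        selects.append(f"SELECT\n  {select_list}\nFROM\n  `{table}`")
    return "\nUNION ALL\n".join(selects)


# Hypothetical schemas: nightly has one extra column that release lacks.
sql = build_union_view_sql({
    "release": (
        "org_mozilla_fenix_stable.baseline_v1",
        ["submission_timestamp", "client_info", "normalized_channel", "metrics"],
    ),
    "nightly": (
        "org_mozilla_fenix_nightly_stable.baseline_v1",
        ["submission_timestamp", "client_info", "normalized_channel", "metrics", "new_probe"],
    ),
})
```

This only compares top-level column names. Drift inside nested structs would still break a plain UNION ALL, so it narrows the problem rather than solving it.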
It would certainly be possible to union the two tables at the clients_daily level and let rows from nightly flow through that way.
Or we could duplicate all the queries from Fenix release to Fenix nightly. This is the purest solution, but leads to code duplication and proliferation of tasks in Airflow.
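If we went the duplication route, one way to limit the copy-paste might be to render each channel's query from a single template. The sketch below is only an illustration: the table names, the query body, and the `n_pings` aggregate are hypothetical and do not reflect the actual clients_daily logic, and it does nothing to reduce the number of Airflow tasks.

```python
from string import Template

# Hypothetical, simplified stand-in for a per-channel clients_daily query.
CLIENTS_DAILY_TEMPLATE = Template("""\
SELECT
  client_info.client_id AS client_id,
  DATE(submission_timestamp) AS submission_date,
  '$channel' AS normalized_channel,
  COUNT(*) AS n_pings
FROM
  `$source_table`
GROUP BY
  client_id,
  submission_date
""")

# Hypothetical mapping from channel to source table.
SOURCES = {
    "release": "org_mozilla_fenix_stable.baseline_v1",
    "nightly": "org_mozilla_fenix_nightly_stable.baseline_v1",
}

# One rendered query per channel; each would still be scheduled separately.
queries = {
    channel: CLIENTS_DAILY_TEMPLATE.substitute(channel=channel, source_table=table)
    for channel, table in SOURCES.items()
}
```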
cc @fbertsch @relud