In discussion with @6a68, we've identified a subset of the FxA log data (amplitudeEvent messages) that we want to process in real time via Pub/Sub, but which don't need to be sent to BigQuery.
We are planning to keep the current Stackdriver-to-BigQuery pipeline in place as the canonical source for the scheduled queries that create derived tables, and these amplitudeEvents will continue to be covered by that pipeline. But we also want Stackdriver's Pub/Sub output to send these amplitudeEvents through the Decoder, so that we get a chance to do more rigorous schema validation on the events before routing them to Amplitude, and so that we can take advantage of the existing support for error output and avoid dropping non-conforming messages.
For a first pass, I think it's fine to simply let these amplitudeEvents also flow to a live table in BigQuery, even though they'll duplicate the rows loaded via the Stackdriver BQ integration; we can apply a short retention period to the associated stable table to reduce cost if needed. Longer-term, we may want to consider adding configuration to the pipeline to specify a subset of docTypes that are for Pub/Sub output only.
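To make the longer-term idea concrete, the docType-based routing could be as simple as a configured allow-list consulted when choosing output sinks. This is only a sketch of the shape such configuration might take; the setting name and the `fxa-amplitude-event` docType are hypothetical, not the pipeline's actual config mechanism:

```python
# Hypothetical routing helper: decide which sinks receive a message
# based on its docType. The set of Pub/Sub-only docTypes would come
# from pipeline configuration; the names here are assumed for
# illustration only.
PUBSUB_ONLY_DOCTYPES = {"fxa-amplitude-event"}


def sinks_for(doctype: str) -> set:
    """Return the set of output sinks for a message with this docType."""
    if doctype in PUBSUB_ONLY_DOCTYPES:
        # Routed through the Decoder to Pub/Sub only; skipping the
        # BigQuery sink avoids duplicating rows already loaded by the
        # Stackdriver BQ integration.
        return {"pubsub"}
    # Default: everything else goes to both sinks as today.
    return {"pubsub", "bigquery"}
```

Until something like this exists, the duplicate rows in the live/stable tables are tolerable given a short retention period.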
cc @whd @relud