Add derived stub attribution logs #4557
Merged
5 commits
78618d3 Add derived stub attribution logs (fbertsch)
8230df0 Rename view (fbertsch)
f7bf998 Use correct dataset name in view (fbertsch)
8cd7312 Skip dryrun; no access (fbertsch)
b44fa60 Merge branch 'main' into stub_attribution_triplets (fbertsch)
14 changes: 14 additions & 0 deletions
sql/moz-fx-data-shared-prod/stub_attribution_service/dataset_metadata.yaml
@@ -0,0 +1,14 @@
friendly_name: Stub Attribution Service
description: |-
  Stub attribution service data, usually from the logs.
dataset_base_acl: view_restricted
user_facing: true
labels: {}
default_table_workgroup_access:
- role: roles/bigquery.dataViewer
  members:
  - workgroup:data-science/duet
workgroup_access:
- role: roles/bigquery.dataViewer
  members:
  - workgroup:data-science/duet
7 changes: 7 additions & 0 deletions
sql/moz-fx-data-shared-prod/stub_attribution_service/dl_token_ga_attribution_lookup/view.sql
@@ -0,0 +1,7 @@
CREATE OR REPLACE VIEW
  `moz-fx-data-shared-prod.stub_attribution_service.dl_token_ga_attribution_lookup`
AS
SELECT
  *
FROM
  `moz-fx-data-shared-prod.stub_attribution_service_derived.dl_token_ga_attribution_lookup_v1`
15 changes: 15 additions & 0 deletions
sql/moz-fx-data-shared-prod/stub_attribution_service_derived/dataset_metadata.yaml
@@ -0,0 +1,15 @@
friendly_name: Stub Attribution Service Derived
description: |-
  Stub Attribution Service data.
  Separated into a new dataset to ensure correct workgroup access.
dataset_base_acl: derived_restricted
user_facing: false
labels: {}
default_table_workgroup_access:
- role: roles/bigquery.dataViewer
  members:
  - workgroup:data-science/duet
workgroup_access:
- role: roles/bigquery.dataViewer
  members:
  - workgroup:data-science/duet
6 changes: 6 additions & 0 deletions
...shared-prod/stub_attribution_service_derived/dl_token_ga_attribution_lookup_v1/checks.sql
@@ -0,0 +1,6 @@
#fail
{{ is_unique(['dl_token', 'ga_client_id', 'stub_session_id']) }}

#fail
{{ min_row_count(1000) }}
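The two checks above can be read as plain assertions over the table's rows. The sketch below is only an illustration in Python: the real `is_unique` and `min_row_count` are Jinja macros that render SQL, and the semantics shown here (no repeated triplet, at least 1000 rows) are assumed from the macro names rather than taken from their definitions.

```python
from collections import Counter


def is_unique(rows, columns):
    """Return True when no combination of `columns` appears in more than one row.

    Assumption: None values compare equal to each other, as they would
    under a SQL GROUP BY.
    """
    keys = Counter(tuple(row.get(col) for col in columns) for row in rows)
    return all(count == 1 for count in keys.values())


def min_row_count(rows, threshold):
    """Return True when the table holds at least `threshold` rows (assumed semantics)."""
    return len(rows) >= threshold
```

Both checks are marked `#fail` in checks.sql, so under this reading a repeated triplet or a table with fewer than 1000 rows would fail the run.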
23 changes: 23 additions & 0 deletions
...red-prod/stub_attribution_service_derived/dl_token_ga_attribution_lookup_v1/metadata.yaml
@@ -0,0 +1,23 @@
friendly_name: DL Token GA Attribution Lookup
description: |-
  This table lets you look up GA attribution data for dl_tokens.

  1 row per (dl_token, ga_client_id, stub_session_id) triplet.

  dl_token - Available in Stub Attribution Service and Telemetry
  ga_client_id - Available in Stub Attribution Service and GA
  stub_session_id - Available in Stub Attribution Service and GA
owners:
- [email protected]
labels:
  incremental: true
  owner1: [email protected]
scheduling:
  dag_name: bqetl_mozilla_org_derived
  date_partition_parameter: null
  parameters: ["download_date:DATE:{{ds}}"]
bigquery:
  clustering:
    fields: [first_seen_date]
references: {}
deprecated: false
35 changes: 35 additions & 0 deletions
...-shared-prod/stub_attribution_service_derived/dl_token_ga_attribution_lookup_v1/query.sql
@@ -0,0 +1,35 @@
WITH historical_triplets AS (
  SELECT
    dl_token,
    ga_client_id,
    stub_session_id,
    first_seen_date,
  FROM
    stub_attribution_service_derived.dl_token_ga_attribution_lookup_v1
),
new_downloads AS (
  SELECT DISTINCT
    mozfun.ga.nullify_string(jsonPayload.fields.dltoken) AS dl_token,
    mozfun.ga.nullify_string(jsonPayload.fields.visit_id) AS ga_client_id,
    mozfun.ga.nullify_string(jsonPayload.fields.session_id) AS stub_session_id,
    @download_date AS first_seen_date,
  FROM
    `moz-fx-stubattribut-prod-32a5`.stubattribution_prod.stdout
  WHERE
    DATE(timestamp) = @download_date
)
SELECT
  dl_token,
  ga_client_id,
  stub_session_id,
  -- Least and greatest return NULL if any input is NULL, so we coalesce each value first
  LEAST(
    COALESCE(_previous.first_seen_date, _current.first_seen_date),
    COALESCE(_current.first_seen_date, _previous.first_seen_date)
  ) AS first_seen_date,
FROM
  historical_triplets AS _previous
FULL OUTER JOIN
  new_downloads AS _current
USING
  (dl_token, ga_client_id, stub_session_id)
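As a rough illustration of the `new_downloads` CTE in the query above, the following Python sketch filters log rows to one day, normalizes the three ID fields, and deduplicates them. The `nullify_string` helper here is a hypothetical stand-in: it assumes `mozfun.ga.nullify_string` maps empty strings and `(not set)` to NULL, which this PR does not show. The row layout follows the stdout schema file included in the PR.

```python
from datetime import date, datetime, timezone


def nullify_string(value):
    """Assumed stand-in for mozfun.ga.nullify_string: '' and '(not set)' become None."""
    return None if value in ("", "(not set)") else value


def new_downloads(log_rows, download_date):
    """Return distinct (dl_token, ga_client_id, stub_session_id, first_seen_date) tuples."""
    distinct = {}
    for row in log_rows:
        # BigQuery's DATE(timestamp) uses UTC unless a time zone is given.
        if row["timestamp"].astimezone(timezone.utc).date() != download_date:
            continue
        fields = row["jsonPayload"]["fields"]
        triplet = (
            nullify_string(fields.get("dltoken")),
            nullify_string(fields.get("visit_id")),
            nullify_string(fields.get("session_id")),
        )
        distinct[triplet] = None  # dict keys dedupe while keeping first-seen order
    return [triplet + (download_date,) for triplet in distinct]


# Usage: the same log line seen twice collapses to a single tuple.
ts = datetime(2023, 3, 31, 1, 16, 43, tzinfo=timezone.utc)
row = {
    "jsonPayload": {
        "fields": {
            "dltoken": "dltoken_1",
            "visit_id": "(not set)",
            "session_id": "stub_session_id_1",
        }
    },
    "timestamp": ts,
}
print(new_downloads([row, row], date(2023, 3, 31)))
```

The `SELECT DISTINCT` matters because the service can log the same download more than once per day, as the test fixture for this query does.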
17 changes: 17 additions & 0 deletions
...hared-prod/stub_attribution_service_derived/dl_token_ga_attribution_lookup_v1/schema.yaml
@@ -0,0 +1,17 @@
fields:
- name: dl_token
  mode: NULLABLE
  type: STRING
  description: "A download token (dl_token). Associated with a single Firefox binary generated by the stub attribution service."
- name: ga_client_id
  mode: NULLABLE
  type: STRING
  description: "Uniquely identifies a GA client, using a cookie on moz.org."
- name: stub_session_id
  mode: NULLABLE
  type: STRING
  description: "An ID identifying a single stub attribution session. Can be found in GA logs, in the 'Stub Session ID' Event."
- name: first_seen_date
  mode: NULLABLE
  type: DATE
  description: "The first date we saw this triplet."
46 changes: 46 additions & 0 deletions
...tribution_lookup_v1/moz-fx-stubattribut-prod-32a5.stubattribution_prod.stdout.schema.json
@@ -0,0 +1,46 @@
[
  {
    "fields": [
      {
        "fields": [
          {
            "mode": "NULLABLE",
            "name": "log_type",
            "type": "STRING"
          },
          {
            "mode": "NULLABLE",
            "name": "visit_id",
            "type": "STRING"
          },
          {
            "mode": "NULLABLE",
            "name": "dltoken",
            "type": "STRING"
          },
          {
            "mode": "NULLABLE",
            "name": "session_id",
            "type": "STRING"
          }
        ],
        "mode": "NULLABLE",
        "name": "fields",
        "type": "RECORD"
      },
      {
        "mode": "NULLABLE",
        "name": "timestamp",
        "type": "FLOAT"
      }
    ],
    "mode": "NULLABLE",
    "name": "jsonPayload",
    "type": "RECORD"
  },
  {
    "mode": "NULLABLE",
    "name": "timestamp",
    "type": "TIMESTAMP"
  }
]
16 changes: 16 additions & 0 deletions
..._lookup_v1/stub_attribution_service_derived.dl_token_ga_attribution_lookup_v1.schema.yaml
@@ -0,0 +1,16 @@
- name: dl_token
  mode: NULLABLE
  type: STRING
  description: "A download token (dl_token). Associated with a single Firefox binary generated by the stub attribution service."
- name: ga_client_id
  mode: NULLABLE
  type: STRING
  description: "Uniquely identifies a GA client, using a cookie on moz.org."
- name: stub_session_id
  mode: NULLABLE
  type: STRING
  description: "An ID identifying a single stub attribution session. Can be found in GA logs, in the 'Stub Session ID' Event."
- name: first_seen_date
  mode: NULLABLE
  type: DATE
  description: "The first date we saw this triplet."
12 changes: 12 additions & 0 deletions
...attribution_service_derived/dl_token_ga_attribution_lookup_v1/test_single_day/expect.yaml
@@ -0,0 +1,12 @@
- dl_token: dltoken_1
  ga_client_id: ga_client_id_1
  stub_session_id: stub_session_id_1
  first_seen_date: 2023-03-31
- dl_token: dltoken_2
  ga_client_id: also_present_today
  stub_session_id: stub_session_id_2
  first_seen_date: 2023-01-01
- dl_token: dltoken_3
  ga_client_id: only_present_historically
  stub_session_id: stub_session_id_3
  first_seen_date: 2023-01-01
21 changes: 21 additions & 0 deletions
..._lookup_v1/test_single_day/moz-fx-stubattribut-prod-32a5.stubattribution_prod.stdout.yaml
@@ -0,0 +1,21 @@
- jsonPayload:
    fields:
      visit_id: ga_client_id_1
      dltoken: dltoken_1
      session_id: stub_session_id_1
      log_type: download_started
  timestamp: '2023-03-31 01:16:43.101135 UTC'
- jsonPayload:
    fields:
      visit_id: ga_client_id_1
      dltoken: dltoken_1
      session_id: stub_session_id_1
      log_type: download_started
  timestamp: '2023-03-31 01:16:43.101135 UTC'
- jsonPayload:
    fields:
      visit_id: also_present_today
      dltoken: dltoken_2
      session_id: stub_session_id_2
      log_type: download_started
  timestamp: '2023-03-31 01:16:43.101135 UTC'
4 changes: 4 additions & 0 deletions
...ution_service_derived/dl_token_ga_attribution_lookup_v1/test_single_day/query_params.yaml
@@ -0,0 +1,4 @@
---
- name: download_date
  type: DATE
  value: 2023-03-31
8 changes: 8 additions & 0 deletions
...1/test_single_day/stub_attribution_service_derived.dl_token_ga_attribution_lookup_v1.yaml
@@ -0,0 +1,8 @@
- dl_token: dltoken_3
  ga_client_id: only_present_historically
  stub_session_id: stub_session_id_3
  first_seen_date: 2023-01-01
- dl_token: dltoken_2
  ga_client_id: also_present_today
  stub_session_id: stub_session_id_2
  first_seen_date: 2023-01-01
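The test_single_day fixture can be traced by hand against the final SELECT of query.sql. The sketch below models the FULL OUTER JOIN plus the COALESCE/LEAST expression as a dictionary merge that keeps the earliest date per triplet. `merge_triplets` is a hypothetical name used for illustration only. The fixture contains no NULL IDs, so the sketch does not model how SQL join keys behave when a value is NULL.

```python
from datetime import date


def merge_triplets(historical, current):
    """Combine two {triplet: first_seen_date} maps, keeping the earliest date.

    A triplet present on only one side keeps that side's date, which is
    what the COALESCE calls in the query arrange before LEAST is applied.
    """
    merged = dict(historical)
    for triplet, seen in current.items():
        previous = merged.get(triplet)
        merged[triplet] = seen if previous is None else min(previous, seen)
    return merged


# Existing table contents, from the fixture for the derived table.
historical = {
    ("dltoken_3", "only_present_historically", "stub_session_id_3"): date(2023, 1, 1),
    ("dltoken_2", "also_present_today", "stub_session_id_2"): date(2023, 1, 1),
}

# The stdout fixture has three log rows, two of them identical,
# so SELECT DISTINCT leaves two triplets for download_date.
download_date = date(2023, 3, 31)
current = {
    ("dltoken_1", "ga_client_id_1", "stub_session_id_1"): download_date,
    ("dltoken_2", "also_present_today", "stub_session_id_2"): download_date,
}

result = merge_triplets(historical, current)
for triplet, first_seen in result.items():
    print(triplet, first_seen)
```

The three resulting rows correspond to expect.yaml: dltoken_1 is new on 2023-03-31, while dltoken_2 and dltoken_3 keep their historical first_seen_date of 2023-01-01.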
@whd I moved to a new dataset so we don't have to restrict the GA data. Does this setup for the derived and view datasets look good on your end to restrict dev access to just Airflow (unless I later add ppl to tf) and query access to just the duet wg?
This PR looks correct from an access perspective in that it matches the uniform access archetype.
It isn't necessary to configure access this way, but it's probably the easiest to reason about. The other option is to leverage a single dataset with a restricted base ACL and specific workgroup-confidential tables configured and have default_table_workgroup_access set to workgroup:mozilla-confidential to allow for mixed ACL management. Table automation is all hooked up to support this and I'd be interested in seeing if it works (we haven't set up any deprecation configuration yet AFAIK) but splitting out the workgroup-confidential data like this PR does also works.