MPP-3352: Add first Glean metrics to measure email mask usage #4452

jwhitlock · 2024-02-28T19:50:57Z

Add Glean metrics for measuring P0 email mask usage. This includes email mask creation, label updates, and deletion, as well as receiving emails (forwarded or blocked) through those masks.

This is tracked as epic MPP-3352. The related bug for data review is bug 1882565.

Some parts of this PR:

Add the requirements glean_parser, to generate Python code from Glean YAML, and django-ipware, to extract the user's IP address from headers
Add a Glean metrics YAML file at telemetry/glean/relay-server-metrics.yaml. In the future, this folder can hold front-end or other application metrics, and specification for other telemetry methods.
Add the generated Python code at privaterelay/glean/server_events.py
Add a CircleCI job to confirm that the generated Python code has not been modified, as well as code to regenerate the file after a requirement or metrics update.
Add privaterelay/glean_interface.py to simplify data extraction and opt-out checking, and to publish events as log entries formatted with MozLog. Add tests for the emitted logs.
Add Profile properties to determine the Relay user's subscription from current data. In the future, we may query Mozilla Accounts for more detailed information on the user's plan term (monthly or yearly).
Add glean event hooks to the API and email processing. Add tests for the emitted logs.

Glean Data

This adds five Glean events that have been identified as critical for creating a user engagement model for the email mask feature:

email_mask.created - An email mask is created
email_mask.label_updated - An email mask's label is updated
email_mask.deleted - An email mask is deleted
email.forwarded - An email to or from an email mask is forwarded
email.blocked - An email to or from an email mask is not forwarded

Some data is sent with each event type, identified with a * below. Others are specific to event types.

Event	Measurement Name	Measurement Description
`*`	`#client_id`	Firefox client ID
`*`	`#fxa_id`	Mozilla accounts user ID
`*`	`#platform`	Relay client platform
`*`	`#n_random_masks`	Number of random masks
`*`	`#n_domain_masks`	Number of premium subdomain masks
`*`	`#n_deleted_random_masks`	Number of deleted random masks
`*`	`#n_deleted_domain_masks`	Number of deleted domain masks
`*`	`#date_joined_relay`	Timestamp for joining Relay, seconds since epoch
`*`	`#premium_status`	Subscription type and term
`*`	`#date_joined_premium`	Timestamp for starting `premium_status` subscription, seconds since epoch, -1 if not subscribed
`*`	`#has_extension`	The user has the Relay Add-on
`*`	`#date_got_extension`	Timestamp for adding Relay Add-on, seconds since epoch, -1 if not used
`*`	`#mask_id`	Mask ID, 'R' or 'D' followed by a number
`*`	`#is_random_mask`	The mask is a random mask, instead of a domain mask
`email_mask.created`	`#created_by_api`	The mask was created via the API, rather than an incoming email
`email_mask.created`	`#has_website`	The mask was created by the Add-on or integration on a website
`email.*`	`#is_reply`	The email is a reply from the Relay user
`email.blocked`	`#reason`	Code describing why the email was blocked
`email.blocked`	`#can_retry`	The email processes can retry the email, so it may fail multiple times

How to test:

Install the requirements
Run the script to regenerate the Python interface:
bash .circleci/python_job.bash run build_glean
- There are no errors. git does not show any changed files
Run the script to check the Python generated code:
bash .circleci/python_job.bash run check_glean
- There are no errors. Final output is ✓ Files are identical

Run the server. Log in as a user with Mozilla accounts telemetry turned on (the default). Create a mask.

Among others, there should be a glean-server-event log, followed by a request.summary log, entry like:

{"Timestamp": 1709148528194074112, "Type": "glean-server-event", "Logger": "fx-private-relay", "Hostname": "jwhitlock-MB-M2.local", "EnvVersion": "2.0", "Severity": 6, "Pid": 87746, "Fields": {"document_namespace": "relay-backend", "document_type": "events", "document_version": "1", "document_id": "695328d4-24fb-42b3-bd4d-853e92efbc0d", "user_agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:125.0) Gecko/20100101 Firefox/125.0", "ip_address": "", "payload": "{\"metrics\": {}, \"events\": [{\"category\": \"email_mask\", \"name\": \"created\", \"extra\": {\"client_id\": \"\", \"fxa_id\": \"fac5c63eb1c04dd4b9b0192f1a5483cb\", \"platform\": \"\", \"n_random_masks\": \"4\", \"n_domain_masks\": \"0\", \"n_deleted_random_masks\": \"1\", \"n_deleted_domain_masks\": \"0\", \"date_joined_relay\": \"1709058088\", \"premium_status\": \"email_unknown\", \"date_joined_premium\": \"-1\", \"has_extension\": \"false\", \"date_got_extension\": \"-1\", \"mask_id\": \"R7\", \"is_random_mask\": \"true\", \"created_by_api\": \"true\", \"has_website\": \"false\"}, \"timestamp\": 1709148528194}], \"ping_info\": {\"seq\": 0, \"start_time\": \"2024-02-28T19:28:48.194018+00:00\", \"end_time\": \"2024-02-28T19:28:48.194018+00:00\"}, \"client_info\": {\"telemetry_sdk_build\": \"glean_parser v12.0.1\", \"first_run_date\": \"Unknown\", \"os\": \"Unknown\", \"os_version\": \"Unknown\", \"architecture\": \"Unknown\", \"app_build\": \"Unknown\", \"app_display_version\": \"unknown\", \"app_channel\": \"local\"}}", "msg": "glean-server-event"}}
{"Timestamp": 1709148528194563072, "Type": "request.summary", "Logger": "fx-private-relay", "Hostname": "jwhitlock-MB-M2.local", "EnvVersion": "2.0", "Severity": 6, "Pid": 87746, "Fields": {"errno": 0, "agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:125.0) Gecko/20100101 Firefox/125.0", "lang": "en-US,en;q=0.5", "method": "POST", "path": "/api/v1/relayaddresses/", "uid": 3, "rid": "677474f3-172b-4e06-81f0-60ef2a558d8d", "t": 61}}

Change the label of the mask

Among others, there should be a glean-server-event log, followed by a request.summary log, like:

{"Timestamp": 1709148608654010880, "Type": "glean-server-event", "Logger": "fx-private-relay", "Hostname": "jwhitlock-MB-M2.local", "EnvVersion": "2.0", "Severity": 6, "Pid": 87746, "Fields": {"document_namespace": "relay-backend", "document_type": "events", "document_version": "1", "document_id": "374e0998-ec59-46e9-a5f3-f26f7bdb3ece", "user_agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:125.0) Gecko/20100101 Firefox/125.0", "ip_address": "", "payload": "{\"metrics\": {}, \"events\": [{\"category\": \"email_mask\", \"name\": \"label_updated\", \"extra\": {\"client_id\": \"\", \"fxa_id\": \"fac5c63eb1c04dd4b9b0192f1a5483cb\", \"platform\": \"\", \"n_random_masks\": \"4\", \"n_domain_masks\": \"0\", \"n_deleted_random_masks\": \"1\", \"n_deleted_domain_masks\": \"0\", \"date_joined_relay\": \"1709058088\", \"premium_status\": \"email_unknown\", \"date_joined_premium\": \"-1\", \"has_extension\": \"false\", \"date_got_extension\": \"-1\", \"mask_id\": \"R7\", \"is_random_mask\": \"true\"}, \"timestamp\": 1709148608653}], \"ping_info\": {\"seq\": 0, \"start_time\": \"2024-02-28T19:30:08.653926+00:00\", \"end_time\": \"2024-02-28T19:30:08.653926+00:00\"}, \"client_info\": {\"telemetry_sdk_build\": \"glean_parser v12.0.1\", \"first_run_date\": \"Unknown\", \"os\": \"Unknown\", \"os_version\": \"Unknown\", \"architecture\": \"Unknown\", \"app_build\": \"Unknown\", \"app_display_version\": \"unknown\", \"app_channel\": \"local\"}}", "msg": "glean-server-event"}}
{"Timestamp": 1709148608654702080, "Type": "request.summary", "Logger": "fx-private-relay", "Hostname": "jwhitlock-MB-M2.local", "EnvVersion": "2.0", "Severity": 6, "Pid": 87746, "Fields": {"errno": 0, "agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:125.0) Gecko/20100101 Firefox/125.0", "lang": "en-US,en;q=0.5", "method": "PATCH", "path": "/api/v1/relayaddresses/7/", "uid": 3, "rid": "d041f4b3-376d-40a0-8188-96073ab894aa", "t": 50}}

In the profile menu, select "Manage your account" to go to Mozilla accounts. Under "Data Collection and Use", "Help improve ⁨Mozilla accounts⁩", turn from "On" to "Off". Return to Relay, sign out and sign in again. Update the label for a mask.

There is no glean-server-event log, just the request.summary log, like:

{"Timestamp": 1709148843867493888, "Type": "request.summary", "Logger": "fx-private-relay", "Hostname": "jwhitlock-MB-M2.local", "EnvVersion": "2.0", "Severity": 6, "Pid": 87746, "Fields": {"errno": 0, "agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:125.0) Gecko/20100101 Firefox/125.0", "lang": "en-US,en;q=0.5", "method": "GET", "path": "/api/v1/relayaddresses/", "uid": 3, "rid": "0abf6db1-7bd6-476f-a688-f73c9f38297e", "t": 5}}

groovecoder

Looked thru all the code; still need to test and spot-check. Biggest blocker is that we should NOT send a glean event when the email is blocked because of an error. We should only send a glean even when the email is blocked because of a user action.

api/tests/views_tests.py

api/views/__init__.py

groovecoder · 2024-03-04T16:37:39Z

emails/models.py

+    @property
+    def plan(self) -> Literal["free", "email", "phone", "bundle"]:
+        """The user's Relay plan as a string."""
+        if self.has_premium:
+            if self.has_phone:
+                return "bundle" if self.has_vpn else "phone"
+            else:
+                return "email"
+        else:
+            return "free"


question (non-blocking): should we consider using @cached_property for this? I seem to remember that has_premium or has_phone can be quite slow when they access self.fxa?

This function is doing very little that would justify caching. If has_premium and has_phone are slow, should they be the cached properties? Or fxa?

Possibly, but I think caching those caused some other stale data issues last time we tried it. I thought maybe caching at this level wouldn't cause those issues?

emails/models.py

privaterelay/glean_interface.py

groovecoder · 2024-03-04T19:10:27Z

telemetry/glean/relay-server-metrics.yaml

+    data_reviews:
+      - https://bugzilla.mozilla.org/show_bug.cgi?id=1882565
+    notification_emails:
+      - [email protected]


suggestion (non-blocking): [email protected] ?

The Glean documents have no guidance, so I reached out to internal Slack channel. According to @perrymcmanis144, we have to have a single person for metrics that do not expire. We could add a mailing list, but not replace the single person email.

However, [email protected] doesn't look like a good choice to me. There are many people on that list, including people no longer on the project. Once I have some experience with the kinds of emails sent to this address, we could add a maintained list with the people who should get notices and can do something about them.

Sorry for the drive-by, I just wanted to add some additional context to what Perry shared with you.

The Glean docs lack guidance here because this is a Data Stewarding concern (see the wiki on Data Collection for more info). That being said, I'll file a bug in Glean to try and add some breadcrumbs to this information to make it easier to understand the expectations here.

Permanent data-collection policy from Trust & Legal currently requires an IC to be responsible for the collection, in order to audit it from time to time to determine that it is still useful and needed. Temporary collections that expire don't run the same risk of indefinitely collecting some data that might not be useful or problematic.

As a side-note, many teams/project create a specific group email to serve as a backup for an IC bearing this responsibility (should the IC leave or not be available, etc.).

telemetry/glean/relay-server-metrics.yaml

groovecoder · 2024-03-04T19:26:17Z

privaterelay/tests/glean_tests.py

+    """Check and return parts of the Glean payload that vary over test runs."""
+    event_ts_ms = payload["events"][0]["timestamp"]
+    event_time = datetime.fromtimestamp(event_ts_ms / 1000.0)
+    assert 0 < (datetime.now() - event_time).total_seconds() < 0.5


question (non-blocking): why do we assert that the event time must have been less than 0.5 seconds ago?

The timestamp should be in milliseconds. I'm trying to find a way to say "if we calculate the timestamp wrong, then it will be off by a factor of 1,000". This only tests the x1000 bigger, however, not the x1000 smaller. Maybe it is not worth it.

jwhitlock · 2024-03-06T21:41:45Z

@groovecoder, this is ready for re-review. Changes since the last review:

The data review in bug 1882565 was approved. I'll return to that with the changes made since the review opened.
I added the user to create_expected_glean_event, to centralize the boilerplate for checking user-derived fields
I created an AddressViewSet base class to store the generalized perform_create, perform_update, and perform_destroy methods. I could not figure out a way to do this as a mixin that made mypy happy.
Add comments for the leading 'R' and 'D' for mask_id metrics.
Fixed a typo users → user
Use a new function log_email_dropped to log details, with message email_dropped, for emails dropped for reasons other than mask blocking settings. Tests are updated to check for the new logs.
Glean email_blocked events are now only for emails dropped due to mask blocking settings. Dropped can_retry from the event extra fields, since these can not be re-tried like some errors.
Add some comments around turning Glean event time fields into Python datetimes.
Add # pragma: no cover for some code that only runs when tests fail
Fix a test that was exiting early instead of running all assertions.

I've opened two issues for follow-on work:

MPP-3764 - Store subscription details for users
MPP-3765 - Distinguish between Firefox Integration calls and other API calls

groovecoder · 2024-03-07T14:16:17Z

emails/tests/views_tests.py

+            assert drop_log is None, "duplicate email_dropped log entry"
+            drop_log = record
+    assert drop_log is not None, "email_dropped log entry not found."
+    assert drop_log.levelno == logging.INFO


question (non-blocking): This looks like emails are logged as "dropped" when there's an error, so should they be logged at logging.ERROR level instead?

I only use ERROR logging if I want SREs to monitor and to get tracking Sentry issues. I didn't want that for these, but do you?

groovecoder · 2024-03-07T15:28:11Z

emails/tests/views_tests.py

@@ -724,6 +810,16 @@ def successful_reply_test_implementation(
        assert (last_en := self.relay_address.user.profile.last_engagement) is not None
        assert last_en > self.pre_reply_last_engagement

+    def assert_my_log_email_dropped(


quibble:

Suggested change

def assert_my_log_email_dropped(

def assert_reply_log_email_dropped(

I like it, thanks! I went with assert_log_reply_email_dropped

groovecoder · 2024-03-07T16:57:32Z

emails/tests/views_tests.py

+            event_time=timestamp,
+        )
+
+    def assert_my_log_email_dropped(


quibble:

Suggested change

def assert_my_log_email_dropped(

def assert_bounce_log_email_dropped(

There's more than bounce tests, so I went with assert_log_incoming_email_dropped

groovecoder · 2024-03-07T20:00:26Z

privaterelay/tests/glean_tests.py

+        },
+    )
+    request_data = RequestData.from_request(request)
+    assert request_data.user_agent == "Mozilla/5.0 Firefox/125.0"


quibble (non-blocking): test name says this test is for ip - do we need to assert on the user agent?

I added a test for just the user agent, and dropped the user agent from the IP tests.

The initial metrics for measuring mask usage are in telemetry/glean/relay-server-metrics.yaml. This new folder can be used for front-end Glean metrics and other metrics specifications. glean_parser is used to generate privaterelay/glean/server_events.py and the empty __init__.py. The generated file is post-processed with black. A function in .circleci/python_job.bash has the generation commands and the post-processing commands. This is used by a new job to validate in CircleCI that the generated file was not hand-modified. It can also be used to regenerate the file locally (tested on Linux and macOS).

These should eventually be stored on the profile, but for now these properties provide the known subscription status for the user.

RelayGleanLogger extends the EventsServerEventLogger to emit records as logs instead of print() statements. It takes data as User and *Address models, and extracts the data needed for the .record_* methods.

Pass the related user to the test helper create_expected_glean_event, so that the user-specific values such as fxa_id and date_joined_relay can be extracted in the helper rather than each test function.

AddressViewSet is a parent class for RelayAddressViewSet and DomainAddressViewSet, to avoid duplicating glean event logging.

Because RelayAddress and DomainAddress are in different tables, a row ID might be used between them. Use a leading letter "R" or "D" to distinguish between the two.

Send Glean events for blocked emails due to mask setting, such as block all or block promotional. Emit non-Glean logs for other reasons emails are blocked.

Coverage showed the (incorrect) metrics tests was never executed.

jwhitlock requested review from groovecoder and say-yawn February 28, 2024 19:57

groovecoder requested changes Mar 4, 2024

View reviewed changes

jwhitlock force-pushed the add-glean-mpp-3352 branch from bf15260 to f84a5ae Compare March 5, 2024 20:42

jwhitlock requested a review from groovecoder March 6, 2024 21:45

groovecoder approved these changes Mar 7, 2024

View reviewed changes

jwhitlock added 23 commits March 8, 2024 10:59

Make RELAY_CHANNEL_NAME a public type

a53db83

Add plan data as profile properties

8bde20b

These should eventually be stored on the profile, but for now these properties provide the known subscription status for the user.

Add metrics_id properties to masks

2257058

Add RelayGleanLogger, tests

f9f48c1

RelayGleanLogger extends the EventsServerEventLogger to emit records as logs instead of print() statements. It takes data as User and *Address models, and extracts the data needed for the .record_* methods.

Add glean events to API

9bbfe15

Add glean events to email processing

88e4fbb

Do not emit metrics if the user has opted-out

69c6509

Add bug URLs to Glean metrics YAML

9f3c33c

Pass user to create_expected_glean_event

ef4d5de

Pass the related user to the test helper create_expected_glean_event, so that the user-specific values such as fxa_id and date_joined_relay can be extracted in the helper rather than each test function.

Create AddressViewSet base class

3106cc6

AddressViewSet is a parent class for RelayAddressViewSet and DomainAddressViewSet, to avoid duplicating glean event logging.

Explain why row ID isn't sufficent for uniqueness.

51e20a1

Because RelayAddress and DomainAddress are in different tables, a row ID might be used between them. Use a leading letter "R" or "D" to distinguish between the two.

Adjust descriptions

700454a

Only send Glean events for emails blocked by mask

985e3a3

Send Glean events for blocked emails due to mask setting, such as block all or block promotional. Emit non-Glean logs for other reasons emails are blocked.

Remove can_retry from email_blocked event

899f2fd

Remove draft comment from relay-server-metrics

f5beadd

Add name for error log

7dac89a

Omit data in email_dropped log when user opts-out

9714417

Add comments for payload value checks

f0a2377

Skip coverage for fail-only code

84c7e56

Fix test of _get_keys_from_headers

4dd457c

Coverage showed the (incorrect) metrics tests was never executed.

Rename to assert_log_reply_email_dropped

d305176

Rename to assert_log_incoming_email_dropped

1c2dc20

Split up RequestData tests

37e8b6c

jwhitlock force-pushed the add-glean-mpp-3352 branch from 1a9d4e0 to 37e8b6c Compare March 8, 2024 17:03

jwhitlock added this pull request to the merge queue Mar 8, 2024

Merged via the queue into main with commit a942ac3 Mar 8, 2024
28 checks passed

jwhitlock deleted the add-glean-mpp-3352 branch March 8, 2024 18:24

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

MPP-3352: Add first Glean metrics to measure email mask usage #4452

MPP-3352: Add first Glean metrics to measure email mask usage #4452

jwhitlock commented Feb 28, 2024

groovecoder left a comment

groovecoder Mar 4, 2024

jwhitlock Mar 5, 2024

groovecoder Mar 6, 2024

groovecoder Mar 4, 2024

jwhitlock Mar 5, 2024

travis79 Mar 6, 2024

groovecoder Mar 4, 2024

jwhitlock Mar 5, 2024 •

edited

Loading

jwhitlock commented Mar 6, 2024 •

edited

Loading

groovecoder Mar 7, 2024

jwhitlock Mar 8, 2024

groovecoder Mar 7, 2024

jwhitlock Mar 8, 2024 •

edited

Loading

groovecoder Mar 7, 2024

jwhitlock Mar 8, 2024

groovecoder Mar 7, 2024

jwhitlock Mar 8, 2024

	def assert_my_log_email_dropped(
	def assert_reply_log_email_dropped(

	def assert_my_log_email_dropped(
	def assert_bounce_log_email_dropped(

MPP-3352: Add first Glean metrics to measure email mask usage #4452

MPP-3352: Add first Glean metrics to measure email mask usage #4452

Conversation

jwhitlock commented Feb 28, 2024

Glean Data

How to test:

groovecoder left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jwhitlock Mar 5, 2024 • edited Loading

Choose a reason for hiding this comment

jwhitlock commented Mar 6, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jwhitlock Mar 8, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jwhitlock Mar 5, 2024 •

edited

Loading

jwhitlock commented Mar 6, 2024 •

edited

Loading

jwhitlock Mar 8, 2024 •

edited

Loading