MPP-3352: Add first Glean metrics to measure email mask usage #4452
Conversation
Looked through all the code; I still need to test and spot-check. The biggest blocker is that we should NOT send a Glean event when the email is blocked because of an error. We should only send a Glean event when the email is blocked because of a user action.
```python
@property
def plan(self) -> Literal["free", "email", "phone", "bundle"]:
    """The user's Relay plan as a string."""
    if self.has_premium:
        if self.has_phone:
            return "bundle" if self.has_vpn else "phone"
        else:
            return "email"
    else:
        return "free"
```
question (non-blocking): should we consider using `@cached_property` for this? I seem to remember that `has_premium` or `has_phone` can be quite slow when they access `self.fxa`?
This function is doing very little that would justify caching. If `has_premium` and `has_phone` are slow, should they be the cached properties? Or `fxa`?
Possibly, but I think caching those caused some other stale data issues last time we tried it. I thought maybe caching at this level wouldn't cause those issues?
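For illustration, a minimal sketch of what caching at this level could look like with `functools.cached_property`. The stand-in `Profile` class and its constructor flags are hypothetical simplifications, not the real model:

```python
from functools import cached_property
from typing import Literal


class Profile:
    """Stand-in for the real model; flags normally come from subscription data."""

    def __init__(self, has_premium=False, has_phone=False, has_vpn=False):
        self.has_premium = has_premium
        self.has_phone = has_phone
        self.has_vpn = has_vpn

    @cached_property
    def plan(self) -> Literal["free", "email", "phone", "bundle"]:
        # Computed once per instance, then stored in the instance __dict__,
        # so later reads skip has_premium / has_phone entirely.
        if self.has_premium:
            if self.has_phone:
                return "bundle" if self.has_vpn else "phone"
            return "email"
        return "free"


profile = Profile(has_premium=True, has_phone=True, has_vpn=True)
print(profile.plan)  # "bundle"
```

The stale-data concern still applies at this level: the cached value lives as long as the instance, so a subscription change mid-request would not be reflected.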
```yaml
data_reviews:
  - https://bugzilla.mozilla.org/show_bug.cgi?id=1882565
notification_emails:
  - [email protected]
```
suggestion (non-blocking): `[email protected]`?
The Glean documents have no guidance, so I reached out to an internal Slack channel. According to @perrymcmanis144, we have to have a single person for metrics that do not expire. We could add a mailing list, but not replace the single-person email.

However, `[email protected]` doesn't look like a good choice to me. There are many people on that list, including people no longer on the project. Once I have some experience with the kinds of emails sent to this address, we could add a maintained list of the people who should get notices and can do something about them.
Sorry for the drive-by, I just wanted to add some additional context to what Perry shared with you.
The Glean docs lack guidance here because this is a Data Stewarding concern (see the wiki on Data Collection for more info). That being said, I'll file a bug in Glean to try and add some breadcrumbs to this information to make it easier to understand the expectations here.
Permanent data-collection policy from Trust & Legal currently requires an IC to be responsible for the collection, in order to audit it from time to time and determine that it is still useful and needed. Temporary collections that expire don't run the same risk of indefinitely collecting data that is no longer useful, or that is problematic.
As a side note, many teams/projects create a specific group email to serve as a backup for the IC bearing this responsibility (should the IC leave or not be available, etc.).
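If a backup group email is added later, the metrics YAML could list both, since `notification_emails` accepts multiple entries in the glean_parser schema. The group address below is purely hypothetical:

```yaml
notification_emails:
  - [email protected]            # responsible IC (required for non-expiring metrics)
  - [email protected]  # hypothetical maintained backup list
```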
```python
"""Check and return parts of the Glean payload that vary over test runs."""
event_ts_ms = payload["events"][0]["timestamp"]
event_time = datetime.fromtimestamp(event_ts_ms / 1000.0)
assert 0 < (datetime.now() - event_time).total_seconds() < 0.5
```
question (non-blocking): why do we assert that the event time must have been less than 0.5 seconds ago?
The timestamp should be in milliseconds. I'm trying to find a way to say "if we calculate the timestamp wrong, then it will be off by a factor of 1,000". This only catches the case where it is 1,000× too big, however, not 1,000× too small. Maybe it is not worth it.
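A sketch of a check that fails in both directions, assuming only that the payload timestamp is meant to be milliseconds since the epoch (the helper name is made up):

```python
from datetime import datetime


def assert_recent_ms_timestamp(event_ts_ms: float, max_age_s: float = 0.5) -> None:
    """Fail if event_ts_ms is not a recent milliseconds-since-epoch value.

    If the code under test accidentally produced seconds, the computed age is
    roughly 1000x too large; if it produced microseconds, the age goes negative.
    Both cases fall outside the allowed window.
    """
    now_ms = datetime.now().timestamp() * 1000.0
    age_ms = now_ms - event_ts_ms
    assert 0 <= age_ms < max_age_s * 1000.0, f"timestamp age {age_ms:.1f} ms out of range"


# A correct millisecond timestamp passes:
assert_recent_ms_timestamp(datetime.now().timestamp() * 1000.0)
```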
Force-pushed from bf15260 to f84a5ae
@groovecoder, this is ready for re-review. Changes since the last review:
I've opened two issues for follow-on work:
```python
assert drop_log is None, "duplicate email_dropped log entry"
drop_log = record
assert drop_log is not None, "email_dropped log entry not found."
assert drop_log.levelno == logging.INFO
```
question (non-blocking): This looks like emails are logged as "dropped" when there's an error, so should they be logged at `logging.ERROR` level instead?
I only use `ERROR` logging if I want SREs to monitor it and to get tracking Sentry issues. I didn't want that for these, but do you?
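For context, a small illustration of the distinction in Python's `logging` module (the logger name and `reason` value here are made up):

```python
import logging

# Levels are ordered integers: DEBUG(10) < INFO(20) < WARNING(30) < ERROR(40).
# INFO keeps email_dropped queryable in the logs; ERROR would additionally be
# picked up by Sentry-style error tracking and SRE alerting.
assert logging.INFO < logging.ERROR

logger = logging.getLogger("relay.emails")  # hypothetical logger name
logger.info("email_dropped", extra={"reason": "hard_bounce_pause"})
```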
`emails/tests/views_tests.py` (outdated)
```diff
@@ -724,6 +810,16 @@ def successful_reply_test_implementation(
     assert (last_en := self.relay_address.user.profile.last_engagement) is not None
     assert last_en > self.pre_reply_last_engagement


+def assert_my_log_email_dropped(
```
quibble:

```suggestion
def assert_reply_log_email_dropped(
```
I like it, thanks! I went with `assert_log_reply_email_dropped`
`emails/tests/views_tests.py` (outdated)
```python
    event_time=timestamp,
)


def assert_my_log_email_dropped(
```
quibble:

```suggestion
def assert_bounce_log_email_dropped(
```
These are used by more than the bounce tests, so I went with `assert_log_incoming_email_dropped`
`privaterelay/tests/glean_tests.py` (outdated)
```python
    },
)
request_data = RequestData.from_request(request)
assert request_data.user_agent == "Mozilla/5.0 Firefox/125.0"
```
quibble (non-blocking): the test name says this test is for `ip` - do we need to assert on the user agent?
I added a test for just the user agent, and dropped the user agent from the IP tests.
The initial metrics for measuring mask usage are in `telemetry/glean/relay-server-metrics.yaml`. This new folder can be used for front-end Glean metrics and other metrics specifications. `glean_parser` is used to generate `privaterelay/glean/server_events.py` and the empty `__init__.py`. The generated file is post-processed with `black`. A function in `.circleci/python_job.bash` contains the generation and post-processing commands. This is used by a new CircleCI job to validate that the generated file was not hand-modified. It can also be used to regenerate the file locally (tested on Linux and macOS).
These should eventually be stored on the profile, but for now these properties provide the known subscription status for the user.
`RelayGleanLogger` extends the `EventsServerEventLogger` to emit records as logs instead of `print()` statements. It takes data as `User` and `*Address` models, and extracts the data needed for the `.record_*` methods.
Pass the related user to the test helper `create_expected_glean_event`, so that user-specific values such as `fxa_id` and `date_joined_relay` can be extracted in the helper rather than in each test function.
`AddressViewSet` is a parent class for `RelayAddressViewSet` and `DomainAddressViewSet`, to avoid duplicating Glean event logging.
Because `RelayAddress` and `DomainAddress` are in different tables, the same row ID might be used by both. Use a leading letter "R" or "D" to distinguish between the two.
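As a sketch of this ID scheme (the function name is invented; the real code derives this from the model instances):

```python
def glean_mask_id(is_random_mask: bool, row_id: int) -> str:
    """Prefix the table row ID with "R" (RelayAddress) or "D" (DomainAddress)
    so IDs from the two tables cannot collide in Glean data."""
    prefix = "R" if is_random_mask else "D"
    return f"{prefix}{row_id}"


print(glean_mask_id(True, 42))   # "R42"
print(glean_mask_id(False, 42))  # "D42" - same row ID, distinct mask_id
```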
Send Glean events for emails blocked due to mask settings, such as block all or block promotional. Emit non-Glean logs for the other reasons emails are blocked.
Coverage showed the (incorrect) metrics tests were never executed.
Force-pushed from 1a9d4e0 to 37e8b6c
Add Glean metrics for measuring P0 email mask usage. This includes email mask creation, label updates, and deletion, as well as receiving emails (forwarded or blocked) through those masks.
This is tracked as epic MPP-3352. The related bug for data review is bug 1882565.
Some parts of this PR:

- New dependencies: `glean_parser`, to generate Python code from Glean YAML, and `django-ipware`, to extract the user's IP address from headers
- `telemetry/glean/relay-server-metrics.yaml`. In the future, this folder can hold front-end or other application metrics, and specifications for other telemetry methods.
- The generated `privaterelay/glean/server_events.py`
- `privaterelay/glean_interface.py`, to simplify data extraction and opt-out checking, and to publish events as log entries formatted with MozLog. Add tests for the emitted logs.
- `Profile` properties to determine the Relay user's subscription from current data. In the future, we may query Mozilla Accounts for more detailed information on the user's plan term (monthly or yearly).

Glean Data
This adds five Glean events that have been identified as critical for creating a user engagement model for the email mask feature:

- `email_mask.created` - An email mask is created
- `email_mask.label_updated` - An email mask's label is updated
- `email_mask.deleted` - An email mask is deleted
- `email.forwarded` - An email to or from an email mask is forwarded
- `email.blocked` - An email to or from an email mask is not forwarded

Some data is sent with each event type, identified with a `*` below. Others are specific to event types.

| Extra key | Events |
| --- | --- |
| `client_id` | `*` |
| `fxa_id` | `*` |
| `platform` | `*` |
| `n_random_masks` | `*` |
| `n_domain_masks` | `*` |
| `n_deleted_random_masks` | `*` |
| `n_deleted_domain_masks` | `*` |
| `date_joined_relay` | `*` |
| `premium_status` | `*` |
| `date_joined_premium` (date joined the `premium_status` subscription, seconds since epoch, -1 if not subscribed) | `*` |
| `has_extension` | `*` |
| `date_got_extension` | `*` |
| `mask_id` | `*` |
| `is_random_mask` | `*` |
| `created_by_api` | `email_mask.created` |
| `has_website` | `email_mask.created` |
| `is_reply` | `email.*` |
| `reason` | `email.blocked` |
| `can_retry` | `email.blocked` |
How to test:

- Run `bash .circleci/python_job.bash run build_glean`, and verify that `git` does not show any changed files
- Run `bash .circleci/python_job.bash run check_glean`, and verify the output includes `✓ Files are identical`
- Verify there is a `glean-server-event` log, followed by a `request.summary` log entry
- Verify there is a `glean-server-event` log, followed by a `request.summary` log
- Verify there is no `glean-server-event` log, just the `request.summary` log