Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MPP-3932: Disable masks on complaints, round 2 #5115

Merged
merged 37 commits into from
Oct 24, 2024

Conversation

jwhitlock
Copy link
Member

@jwhitlock jwhitlock commented Oct 16, 2024

This replaces #5114, and goes a lot farther.

This rewrites _handle_complaint, moving most of the logic into three new helper functions:

  • _get_complaint_data gets the relevant data out of the complaint notification, with lots of checks and logging if the message is not what we're expecting.
  • _gather_complainers cross-checks the sender and receiver data against the database, getting the related users and masks (RelayAddress, DomainAddress). It also checks for the developer "simulate_complaint" mode, and swaps the real user email back in.
  • _reduce_future_complaints takes the actions to reduce future complaints.
    • On the first complaint, turns on the auto_block_spam flag to drop emails that AWS SES identifies as spam (same as before)
    • On the second complaint, it turns the associated mask to block emails and sends an email to the user. This has been altered to look at the From: header data, which has the mask, rather than the source field, which is our reply address.
    • On future complaints (how?), it does nothing but takes note that it did nothing (new)

This leaves _handle_complaint to log the complaint in developer mode, log what happened for each detected Relay user, and emit a counter.

I also extended the developer mode to correctly work for domain masks, added _get_address_if_exists, removed a mypy override, and fixed Heroku Redis... This PR can definitely be broken into smaller bits.

How to test:

The test suite has been expanded to cover the new code. There are lots of new tests.

I pushed this branch to Heroku for manual testing. There are two issues I've seen that are now fixed on Heroku:

  • Each forwarded email appears to result in two complaints - Misconfiguration of SES on dev server
  • In the email "Firefox Relay⁩ has deactivated one of your email masks", there's some extra ??? and the links do not work. - Fixed the HTML template

Here's the test process:

  • Deploy this branch
  • Send a test email to a mask. It should be forwarded.
  • Load the Papertrail app in Heroku. Filter on program:"app/worker.1". You'll see logs related to your email.
  • Add the developer_mode flag to your Heroku user. You can run a Heroku bash shell:
    heroku run --app fx-private-relay bash
    then run
    ./manage.py waffle_flag developer_mode --append --user "<[email protected]>"
  • Create a random mask, give it the description "DEV:simulate_complaint" (no spaces)
  • Send an email to that mask
    • The email is not delivered to you, but instead to the AWS SES Complaint Simulator address
    • You should see several logs with "msg": "_handle_received: developer_mode". The "notification_gza85" fields can be combined and passed to emails.utils.decode_dict_gza85 to get the original delivery notification.
    • After a few more messages, you should see a log with "msg": "_handle_complaint: developer mode". The "notification_gza85" fields can be combined and passed to emails.utils.decode_dict_gza85 to get the original complaint notification.
    • The next log is "msg": "complaint_notification", and has the standard log for a complaint. The "relay_action" should be "auto_block_spam"
    • There will be other logs as well, but these are the important ones.
  • Send a second email to that mask
    • The logs repeat, with "relay_action" as no_action since auto_block_spam is already on.
  • Add the disable_mask_on_complaint flag to your Heroku user. You can run a Heroku bash shell:
    heroku run --app fx-private-relay bash
    then run
    ./manage.py waffle_flag disable_mask_on_complaint --append --user "<[email protected]>"
  • Send a third email to that mask.
    • The logs repeat. The "relay_action" is "disable_mask" (and then no_action if there is a second complaint)
  • Check your email after a minute
    • You should get the email "Firefox Relay⁩ has deactivated one of your email masks." to your real address. The links should go to the email dashboard.
  • Go to the Email Dashboard, refreshing if needed (such as going to FAQ and back to the dashboard)
    • The mask is now set to Blocking

@jwhitlock jwhitlock requested a review from groovecoder October 16, 2024 23:44
Copy link
Member Author

@jwhitlock jwhitlock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoa this is big for just 3 files. You should probably skim emails/tests/views_tests.py, and focus on emails/views.py. Here's some comments that might help.

emails/tests/views_tests.py Show resolved Hide resolved
emails/tests/views_tests.py Outdated Show resolved Hide resolved
emails/tests/views_tests.py Show resolved Hide resolved
emails/tests/views_tests.py Outdated Show resolved Hide resolved
emails/tests/views_tests.py Show resolved Hide resolved
emails/views.py Show resolved Hide resolved
emails/views.py Show resolved Hide resolved
emails/views.py Show resolved Hide resolved
emails/views.py Show resolved Hide resolved
@@ -58,7 +58,6 @@ module = [
"django_ftl",
"django_ftl.bundles",
"google.*",
"google_measurement_protocol",
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the only line I meant to change in here. The rest were done by my editor's TOML formatter.

@jwhitlock
Copy link
Member Author

There are indeed 2 complaints for each time the simulator is used on dev. For one pair, the first has a timestamp and arrivalDate of 2024-10-16T23:15:27.100Z. For the second, the timestamp was "timestamp": "2024-10-16T23:15:27.000Z (0.1 seconds earlier) and arrivalDate was 2024-10-16T23:15:27.167Z (0.06 seconds later). The other differences

  • The first had eventType: Complaint and the second had notificationType: Complaint
  • The second had mail.sourceIP and mail.callerIdentity, the first did not
  • The first had mail.commonHeaders.messageId, the second did not
  • The first had mail.tags with several items list ses:source-tls-version, the second did not.

I'm going to see if there is a similar pattern in production complaints. If not, this may be a dev misconfiguration, or a quirk of the simulator. It looks like we might want to ignore either eventType or notificationType.

@jwhitlock
Copy link
Member Author

No, production does not have this pattern.

Digging in, there are two ways to monitor complaints:

  • Set up event publishing with a configuration set, and specify this configuration set when sending email. This will send "eventType": "Complaint" objects to your configured destination (SNS -> SQS for us)
  • Set up a domain identity for your email sending domain, and configure Email feedback forwarding. This will send "notificationType": "Complaint" objects to your configured destination (SNS -> SQS for us).

We have both configured on the development server. Looking at my notes, it is possible I enabled this, as I was looking into complaint formats in May 2022. I turned off email feedback forwarding of complaints for our dev domain identity. This will also turn off duplicate bounce notifications and the unhandled delivery notifications.

On stage, we have not configured the SNS topics for the email feedback forwarding, no changes there.

Copy link
Member

@groovecoder groovecoder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great update! I haven't looked at the test file yet, and it looks like you updated it again since I started the review, but here's my first few comments.

emails/views.py Outdated
if key in source:
return source[key], True
logger.error(
"_get_complaint_data: Unexpected message",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion (non-blocking): this seems more like "unexpected message format" - i.e., a message that's missing a key we expect, rather than an unexpected message?

Suggested change
"_get_complaint_data: Unexpected message",
"_get_complaint_data: Unexpected message format",

Comment on lines +1865 to +1874
# Only present for feedback reports
user_agent = complaint.get("userAgent", "")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question (non-blocking): does this mean we can use the userAgent field or value to know whether the end user or their email service provider marked the email as spam?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope. The agent is the mail provider's agent. All complaints from yahoo.com, ymail.com, yahoo.com.mx, yahoo.de, etc. have agent Yahoo!-Mail-Feedback/2.0. A lot use ReturnPathFBL/2.0. Microsoft leaves it blank. See the link "Relay Bounces and Complaints v2" on the Relay Engineering page on Confluence.


T = TypeVar("T")

def get_or_log(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment (non-blocking): This felt almost generic enough to try to turn into a more general re-usable function. But after a few minutes with ChatGPT, I gave up trying to refactor it.

emails/views.py Outdated
for email_address, extra_data in complaint_data.complained_recipients:
local, domain = email_address.split("@", 1)

# For developer mode complaint simulation, swap with developer's email
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit (non-blocking): this code seems specific to AWS complaint simulator, but not necessarily to the "developer mode" controlled by the mask label?

If it is, can we link from here to a doc/comment about the "developer mode"?

Suggested change
# For developer mode complaint simulation, swap with developer's email
# For AWS complaint simulation, swap with developer's email

Copy link
Member

@groovecoder groovecoder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All comments are "dupes" of 4 main issues:

suggestion (non-blocking): When I woke up this morning I had to remember what the new "developer mode" did. It would be great to copy some of the explanation from that PR into a more permanent doc or comment in the code-base somewhere.

suggestion (non-blocking): Similarly, a lot of the doc-blocks for the new tests assume high-context knowledge, which we may not have next time we have to touch this code. So I made some suggestions for updating the doc-blocks to add more context.

correction (blocking): A couple of the test doc-blocks have copy-pasted incorrect text.

question (blocking): It looks like, unlike the code in process_emails_from_sqs, this complaint-handling code is run in the context of an HTTP callback request from AWS SNS. What does SNS do if it receives a 404 from Relay? Answer

privaterelay/settings.py Outdated Show resolved Hide resolved

@override_flag("developer_mode", active=True)
def test_complaint_developer_mode_missing_mask(self):
"""If the simulator email can not be turned into a mask, it is not changed."""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sugguestion (non-blocking): add a little more detail here? I don't understand what "simulator email can not be turned into a mask" means.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to move these "cover all the edge cases" tests to a new class, so ComplaintHandlingTest is more like integration tests.


@override_flag("developer_mode", active=True)
def test_complaint_developer_mode_domain_address(self):
"""If the simulator email can not be turned into a mask, it is not changed."""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same sugguestion (non-blocking): add a little more detail here? I don't understand what "simulator email can not be turned into a mask" means.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm moving these tests of _gather_complainers out of ComplaintHandlingTest into a focused test case.

assert rec2.msg == "complaint_notification"

def test_complaint_no_complained_recipients_error_logged(self):
"""Without complained recipients, an error is logged, but matches on From"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion (non-blocking): I had to read thru the code and test fixture to understand what this test was doing.

Suggested change
"""Without complained recipients, an error is logged, but matches on From"""
"""
Without complainedRecipients, we can still match the user on From:,
perform the appropriate action, and log an error
"""

)

def test_complaint_empty_complained_recipients_error_logged(self):
"""With empty complained recipients, an error is logged, but matches on From"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same suggestion (non-blocking):

Suggested change
"""With empty complained recipients, an error is logged, but matches on From"""
"""
With complainedRecipients empty, we can still match the user on From:,
perform the appropriate action, and log an error
"""

emails/tests/views_tests.py Show resolved Hide resolved
emails/tests/views_tests.py Outdated Show resolved Hide resolved
emails/tests/views_tests.py Outdated Show resolved Hide resolved
emails/tests/views_tests.py Outdated Show resolved Hide resolved
emails/tests/views_tests.py Outdated Show resolved Hide resolved
The "source" was the mask, but AWS SES uses the generic Reply-To address.
The messageId is set, probably by SES when sending. The null values in
commonHeaders are not present.

These changes break the mask disable test, which uses the "source". It
is marked as an expected failure for now.
If _get_address is called with create=False, pass create=False when
calling _get_domain_address.

When create=False, _get_domain_address logs less for bad domains,
subdomains, etc. It raises DomainAddress.DoesNotExist().

The new _get_address_if_exists calls _get_address(create=False), catches
the expected exceptions, and returns None.

Add tests for existing and new behavior.
Rewrite _handle_complaint to merge user collection and mask processing.
This changes how complaints are handled:
* On the first complaint, the auto_block_spam flag is set
* On the second complaint, the mask is blocked
* If the complaint comes from the AWS SES complaint simulator, swap in
  the developer's email address for the simulator email address
* Log an error if the JSON message is missing mail.commonHeaders.from
* Log what action was taken: auto_block_spam, disable_mask, no_action
* Log if a mask was found when disabling the mask
Rather than initializing waffle flags each time they are used,
initialize once in _sns_message.
@jwhitlock jwhitlock force-pushed the fix-dev-complaint-log-mpp-3932 branch from 1bd8030 to a0b19f9 Compare October 21, 2024 20:00
@jwhitlock
Copy link
Member Author

@groovecoder, this should be ready for recheck.

Main changes:

  • Added docs/developer_mode.md
  • Use the metrics ID in the complaint simulator email, like [email protected]. Added _get_mask_by_metrics_id function and GetMaskByMetricsIdTest test case.
  • Added GetComplaintDataTest test case to test _get_complaint_data, and GatherComplainersTest test case to test _gather_complainers. These now have "happy path" tests, as well as the edge case tests to get full coverage. This makes these tests more focused and the test data easier to understand. I've moved the similar tests out of ComplaintHandlingTest, leaving it to test the most important end-to-end test cases. I could do something similar for _reduce_further_complaints, but that seemed unneeded.

Smaller changes:

  • Added a log_group_id to the developer_mode log, to make it a bit easier to get all the logs for a long notification.
  • Updated docstrings and comments as suggested
  • Rebase on main

@jwhitlock jwhitlock requested a review from groovecoder October 21, 2024 20:09
Copy link
Member

@groovecoder groovecoder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! 1 small change to the doc file name to make it easier to find for future ENGRs (including us).

@@ -1325,6 +1414,532 @@ def test_build_disabled_mask_for_spam_email(self):
)


class GetComplaintDataTest(TestCase):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

praise: this is a great test that spells out all the scenarios very clearly. thanks!

assert getattr(err_log, "found_keys") == "complainedRecipients"


class GatherComplainersTest(TestCase):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

praise: another clear test case. I was able to quickly scan the docstrings to understand each test.

emails/views.py Show resolved Hide resolved
@jwhitlock jwhitlock requested a review from groovecoder October 24, 2024 16:41
@groovecoder groovecoder added this pull request to the merge queue Oct 24, 2024
Merged via the queue into main with commit 9cd189e Oct 24, 2024
29 checks passed
@groovecoder groovecoder deleted the fix-dev-complaint-log-mpp-3932 branch October 24, 2024 17:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants