Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve email validation regexp #4341

Closed
wants to merge 2 commits into from
Closed

Improve email validation regexp #4341

wants to merge 2 commits into from

Conversation

johnpitchko
Copy link
Contributor

@johnpitchko johnpitchko commented Apr 14, 2022

Description

URI::MailTo::EMAIL_REGEXP matches email addresses based on the specification in the original email RFC. Its pretty broad and not a good reflection of email addresses used in the real world.

For example, URI::MailTo::EMAIL_REGEXP will return true for addresses such as test@test:

irb(main):004:0> 'test@test'.match? URI::MailTo::EMAIL_REGEXP
=> true

While that is technically a valid address according to the RFC, in practicality nobody uses an address in this style.

This patch better matches the significant majority of email addresses in reality. Tightening up the regex used for validation should prevent customers from inputting incorrect (but valid) email addresses.

Checklist:

  • I have followed Pull Request guidelines
  • I have added a detailed description into each commit message
  • I have updated Guides and README accordingly to this change (if needed)
  • I have added tests to cover this change (if needed)
  • I have attached screenshots to this PR for visual changes (if needed)

@waiting-for-dev
Copy link
Contributor

Thanks for your contribution, @johnpitchko. I'm only concerned that we had security issues in the past related to a custom email regexp (see ff33a8b). It's always safer to use a battle-tested regexp.

@kennyadsl
Copy link
Member

@waiting-for-dev you are right. What if we make this configurable and use some documentation to explain:

  1. why we are on the safe side with the default.
  2. which may be other possible values that have sense for this configuration, eg. the one @johnpitchko proposed.

What do you think?

@johnpitchko johnpitchko reopened this Apr 16, 2022
@johnpitchko
Copy link
Contributor Author

Apologies all! I really mucked up this PR. I didn't realize that I opened it against v2.11 branch rather than master. Would it be cleaner to close and re-open against master? Or keep this one w/ the comments?

record.errors.add(attribute, :invalid, { value: value }.merge!(options))
end
end

def valid_email?(value)
EMAIL_REGEXP.match?(value) && EMAIL_DOMAIN_REGEXP.match?(value)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIK the key issue is that URI::MailTo::EMAIL_REGEXP doesn't validate that addresses end with a domain suffix. Perhaps you can maintain the protection of the original regex and just add an additional validation that the address includes a domain suffix?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer using a single step for the validation, it's just a matter to find a good and battle testes regexp 🙂

@waiting-for-dev
Copy link
Contributor

What if we make this configurable and use some documentation to explain:

I'm ok with that. We could also use by default or recommend devise's one. From where did you take yours, @johnpitchko? As long as it's been exposed a lot in the past, I think it would be ok.

@johnpitchko
Copy link
Contributor Author

johnpitchko commented Apr 18, 2022

What if we make this configurable and use some documentation to explain:

I'm ok with that. We could also use by default or recommend devise's one. From where did you take yours, @johnpitchko? As long as it's been exposed a lot in the past, I think it would be ok.

The original one? Why StackOverflow of course.

There are several at that link, but not all address the original issue which was validating address that did not include a domain suffix.

@waiting-for-dev
Copy link
Contributor

Why StackOverflow of course

Haha. Got it. But, could you be more precise? I mean, if that's one broadly used, it would be ok. But if it's just from an answer without references, I'd prefer not to use it by default. Anyway, as @kennyadsl said, it'd be nice to make it something configurable.

@waiting-for-dev
Copy link
Contributor

Hey @johnpitchko, would you still like to work on this PR? Is it still a use case you need to cover?

@johnpitchko
Copy link
Contributor Author

Hey @johnpitchko, would you still like to work on this PR? Is it still a use case you need to cover?

Hello @waiting-for-dev my apologies! Yes we still have this use case but I have not been able to find a regex for email addresses that meet your criteria.

If you have some guidance that you can share, please share and I will commit to making progress on this.

@waiting-for-dev
Copy link
Contributor

Thanks for getting back, @johnpitchko. Yeah, that's a topic as there's not such a thing as universally valid email addresses. We can say that the validation only happens if the user gets a confirmation email. Providers can ignore RFC documentation and implement whatever they want in practical terms.

For that reason, I'd opt for simplicity and security and keep using the one provided in the Ruby distribution. However, as suggested above, I think it'd be a great idea to leave users the option to configure it to something else.

What do you think, @johnpitchko? Would you still be interested in working on it?

@waiting-for-dev
Copy link
Contributor

After #4341 has been merged, I think we can close it as we're giving enough flexibility to the user. Thanks for looking into it, @johnpitchko!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants