Closed Bug 1423439 Opened 3 years ago Closed 3 years ago

Mailsploit: Strip highly unusual characters from the email address

Categories

(MailNews Core :: MIME, enhancement)

enhancement
Not set
normal

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: BenB, Unassigned)

References

(Depends on 1 open bug, Blocks 1 open bug)

Details

This is part of bug 1423430, and a continuation of bug 1423437. See there for background.

4. Strip highly unusual characters in email addresses that hardly any real email address uses. In effect, create a whitelist of allowed characters in the email address, and strip everything else.

Allowed should be:
* alpha characters a-z and A-Z
* numeric characters 0-9
* ".", "-", "_", "+"

This is mostly about the local part, because the domain part should already be very strict. I'm proposing to make the local part just as strict, and allow only those addresses that are widely used. Anything that appears in less than 0.0001% of legit emails (1 in a million) should be considered unusual and unsupported.
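
A minimal sketch of the whitelist above, written in Python for illustration only. It is not Thunderbird code: the names `strip_unusual` and `sanitize_address` are hypothetical, and it assumes the bare address has already been extracted from the header.

```python
import re

# Everything outside the proposed whitelist:
# a-z, A-Z, 0-9, ".", "-", "_", "+"
_DISALLOWED = re.compile(r"[^A-Za-z0-9._+-]")


def strip_unusual(part: str) -> str:
    """Remove every character that is not on the whitelist."""
    return _DISALLOWED.sub("", part)


def sanitize_address(address: str) -> str:
    """Apply the whitelist to both sides of the last "@" (hypothetical helper)."""
    local, sep, domain = address.rpartition("@")
    if not sep:
        # No "@" found: treat the whole string as one part.
        return strip_unusual(address)
    return strip_unusual(local) + "@" + strip_unusual(domain)
```

Under this sketch, `sanitize_address('john"doe@example.com')` yields `johndoe@example.com`, and embedded control characters such as NUL are dropped in the same way.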

5. Remove support for international domains

This proposal is likely to be controversial, because it would drop support for international characters in email addresses. I do think they were a bad idea to start with, and they have caused all kinds of spoofing, compatibility, and usability problems. Their owners cannot rely on them working everywhere anyway, because a lot of software does not support them.

We should use the plain ASCII fallback representation of these domains, as if we had never heard of international domain names.
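
To illustrate what the plain ASCII fallback looks like, here is a sketch using Python's built-in `idna` codec, which implements the older IDNA 2003 rules. `to_ascii_domain` is a hypothetical helper name, and this is not a description of how Thunderbird performs the conversion.

```python
def to_ascii_domain(domain: str) -> str:
    """Return the ASCII ("xn--") form of a possibly international domain.

    Uses Python's stdlib "idna" codec (IDNA 2003); purely illustrative.
    """
    return domain.encode("idna").decode("ascii")


# An international label is shown in its encoded form;
# plain ASCII domains pass through unchanged.
print(to_ascii_domain("bücher.example"))  # xn--bcher-kva.example
print(to_ascii_domain("example.com"))     # example.com
```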

Please note that I suggest this only for the email address (both local and domain part), not for the display/real name.
I definitely support your request to sanitize the local part of an email address, but I think your request goes too far. Thunderbird should conform to the RFCs, which it would not do if it followed your request. The relevant RFCs are RFC 5322 and possibly also RFC 5321. Wikipedia has a short summary of the characters allowed in the local part [1].

Your request "4." would kill features such as comments (in parentheses), among others, which is undesirable.

I would vote strongly against your request "5." as a side effect of "4.": if there are security issues in the handling of IDN, they should be addressed directly instead of removing IDN as a whole. Compatibility issues exist precisely because some software developers do not follow the standards (leaving aside the problem of poorly formulated standards). TB should not mimic such bad practice and should follow the standards as closely as possible. There are certainly users out there who rely on IDN.

[1] https://en.wikipedia.org/wiki/Email_address#Local-part
Group: mail-core-security
Regarding 4., we might consider using a blacklist instead of a whitelist. We should achieve the following:
* Remove any control and/or non-printable characters. Most of them are forbidden by the spec anyways.
* Remove any characters that may cause harm downstream, e.g. "\" (backslash), '"' (double quote), and similar. No sane person has these in their email address. If they did, they wouldn't get much email, because most software would probably bark at them.
* (Possibly) Avoid homographs, compare https://wiki.mozilla.org/IDN_Display_Algorithm
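
The first two points of this blacklist could be sketched as follows, in Python and for illustration only. The function name `strip_blacklisted` and the contents of `HARMFUL` are assumptions: the comment only names backslash and double quote as examples. Homograph avoidance is not covered here.

```python
import unicodedata

# Characters assumed harmful downstream; the list is illustrative,
# taken from the two examples given above (backslash, double quote).
HARMFUL = frozenset('\\"')


def strip_blacklisted(address: str) -> str:
    """Drop control/non-printable and known-harmful characters."""
    kept = []
    for ch in address:
        # Unicode general categories starting with "C" cover control (Cc),
        # format (Cf), surrogate (Cs), private-use (Co) and unassigned (Cn).
        if unicodedata.category(ch).startswith("C"):
            continue
        if ch in HARMFUL:
            continue
        kept.append(ch)
    return "".join(kept)
```

Unlike a whitelist, this approach leaves other RFC-permitted characters, such as parentheses, untouched.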
I'm going to mark this WONTFIX. We need to stay RFC compliant, and also, IDN support should be improved not removed.
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → WONTFIX
Magnus, I think you misunderstood.

1. The RFC does forbid control characters.
2. I do not suggest to remove IDN support
3. The RFCs are 30 years old and allow cases that today are unnecessary and not safe anymore.

Please reconsider.
(In reply to Ben Bucksch (:BenB) from comment #4)
> Magnus, I think you misunderstood.
> 
> 1. The RFC does forbid control characters.

Yes, I'm not opposed to stripping ASCII control chars.

> 2. I do not suggest to remove IDN support

You wrote "5. Remove support for international domains"

> 3. The RFCs are 30 years old and allow cases that today are unnecessary and
> not safe anymore.

RFC 5322 is from 2008. I don't think people ever had strange-looking addresses. The RFC just lists what's not allowed and tries not to limit potential use cases, just like we have to do.