Sender Email Address Spoofing by extraneous white space in display name
Categories
(Thunderbird :: Message Reader UI, defect)
Tracking
(thunderbird_esr68 fixed, thunderbird69 fixed, thunderbird70 fixed)
People
(Reporter: 0xsobky, Assigned: mkmelin)
References
(Regressed 1 open bug)
Details
(Keywords: csectype-spoof, sec-low)
Attachments
(6 files, 7 obsolete files)
38.22 KB,
image/png
|
Details | |
4.39 KB,
text/plain
|
Details | |
2.86 KB,
patch
|
jorgk-bmo
:
review+
|
Details | Diff | Splinter Review |
10.47 KB,
patch
|
jorgk-bmo
:
review+
jorgk-bmo
:
approval-comm-beta+
jorgk-bmo
:
approval-comm-esr68+
|
Details | Diff | Splinter Review |
84.55 KB,
image/png
|
Details | |
4.17 KB,
text/plain
|
Details |
Assignee | ||
Comment 3•6 years ago
|
||
Assignee | ||
Comment 4•6 years ago
|
||
Updated•6 years ago
|
Comment 5•6 years ago
|
||
Updated•6 years ago
|
Comment 6•6 years ago
|
||
Comment 7•6 years ago
|
||
Assignee | ||
Comment 9•6 years ago
|
||
Assignee | ||
Comment 10•6 years ago
|
||
Assignee | ||
Comment 11•6 years ago
|
||
Reporter | ||
Comment 12•6 years ago
|
||
Updated•6 years ago
|
Assignee | ||
Updated•6 years ago
|
Comment 15•6 years ago
|
||
I think we cannot use the suggestion to limit the display name to ASCII, because we want to show international characters in sender names, right?
We could detect "name" subsets that attempt to look like an email address. I think the name field wasn't intended to be used for the characters '<', '>', '@', and maybe it's acceptable to simply display them differently, to avoid confusion.
change < to {
change > to }
change @ to /
Alternatively, we could decide to not show such misleading strings at all, and show it as
(Misleading characters in sender name) <sender@from.com>
Comment 16•6 years ago
|
||
Thoughts?
Assignee | ||
Comment 17•6 years ago
|
||
I don't think replacing is the way to go. There are many potential use cases where special chars can be used (and @ is inparticular rather common, e.g. for mailing lists).
What I've been thinking we should do is gather the specials we can come up with like the patch attached ealier, but also have a hard cap on length we're showing in the UI, so that it's normally not possible to push out of view.
Comment 18•6 years ago
|
||
0xsobky, bug 1550431 is another spoffing issue related to email headers.
Would you allow the reporter of bug 1550431 to be cc'ed to this bug?
Comment 19•6 years ago
|
||
It you don't want to replace it, alternatively we could use a different method to clarify to the user that the @ isn't part of the real sender/to email address.
For example, the email address could be shown bold, the name parts not-bold.
The order could be changed: Always show the email address first, then add "Name: ", and only afterwards append the display name.
Comment 20•6 years ago
|
||
Magnus, if you find an example where a mailing list legitimately includes the @ character in their display name, or gets automatically by a daemon as part of the display name, please show me.
Assignee | ||
Comment 21•6 years ago
|
||
I don't seem to have a real example handy, but I think commonly especially by ticketing systems (iirc github also, at least used to). Like "John <john@example> via mailing list Foo Bar" <notifications@example.com>)
Assignee | ||
Comment 22•6 years ago
|
||
Email just has too many unexpected real usages that are real use cases.
Why not the hard cap of something like 60 chars for display name during display, like comment 16?
Comment 23•6 years ago
|
||
(In reply to Magnus Melin [:mkmelin] from comment #21)
I don't seem to have a real example handy, but I think commonly especially by ticketing systems (iirc github also, at least used to). Like "John <john@example> via mailing list Foo Bar" <notifications@example.com>)
I searched my archive of email.
I found a few examples of the "via mailing list" scenario, but it never included a full email address.
Rather, it was worded like this:
From: someone lastname via dev-security <dev-security@lists.mozilla.org>
In other emails, more than one @ was found only, if the sender name is set identical to the sender's email address.
Comment 24•6 years ago
|
||
(In reply to Magnus Melin [:mkmelin] from comment #22)
Email just has too many unexpected real usages that are real use cases.
Why not the hard cap of something like 60 chars for display name during display, like comment 16?
Even if the From is shown as ...
Alice <alice@mozilla.com> - Magnus <magnus@thunderbird.net>
... it can still be considered confusing. When you're in a hurry, you might not read through the end. You might notice the important part that you're expecting at the beginning, and you might stop parsing/reading after it.
Comment 25•6 years ago
•
|
||
(In reply to Kai Engert (:kaie:) from comment #20)
Magnus, if you find an example where a mailing list legitimately includes the @ character in their
display name, or gets automatically by a daemon as part of the display name, please show me.
Yahoo's Barcelona Freecycle list has hundreds/thousands of these:
From: "Antonia Palas APalas007@gmail.com [Barcelona-Freecycle]" <Barcelona-Freecycle-noreply@yahoogroups.com>
EDIT: edited original address (which was of course publicly visible on the Yahoo list)
Is that the format you're after?
Comment 26•6 years ago
|
||
(In reply to Jorg K (GMT+2) from comment #25)
Is that the format you're after?
Yes, thanks. So it's confirmed, the @ sign is indeed used in the name part of email recipients...
Comment 27•6 years ago
|
||
I've skimmed through RFCs. Below are excerpts of definitions from rfc 5322:
address = mailbox / group
mailbox = name-addr / addr-spec
name-addr = [display-name] angle-addr
display-name = phrase
phrase = 1*word / obs-phrase
word = atom / quoted-string
atext = ALPHA / DIGIT / ; Printable US-ASCII
"!" / "#" / ; characters not including
"$" / "%" / ; specials. Used for atoms.
"&" / "'" /
"*" / "+" /
"-" / "/" /
"=" / "?" /
"^" / "_" /
"`" / "{" /
"|" / "}" /
"~"
atom = [CFWS] 1*atext [CFWS]
specials = "(" / ")" / ; Special characters that do
"<" / ">" / ; not appear in atext
"[" / "]" /
":" / ";" /
"@" / "\" /
"," / "." /
DQUOTE
rfc 2047 defines how to encode unicode in mail headers, like in the spoof example attached to this bug. A relevant excerpt is:
(3) As a replacement for a 'word' entity within a 'phrase', for example,
one that precedes an address in a From, To, or Cc header. The ABNF
definition for 'phrase' from RFC 822 thus becomes:
phrase = 1*( encoded-word / word )
In this case the set of characters that may be used in a "Q"-encoded
'encoded-word' is restricted to: <upper and lower case ASCII
letters, decimal digits, "!", "*", "+", "-", "/", "=", and "_"
(underscore, ASCII 95.)>. An 'encoded-word' that appears within a
'phrase' MUST be separated from any adjacent 'word', 'text' or
'special' by 'linear-white-space'.
Comment 28•6 years ago
|
||
(In reply to Kai Engert (:kaie:) from comment #26)
Yes, thanks. So it's confirmed, the @ sign is indeed used in the name part of email recipients...
That was never in doubt, see "From: "Bugzilla@Mozilla" <bugzilla-daemon@mozilla.org>". But you wanted it from a mailing list. Prior discussion about @ in display names (and people who wanted to remove that) in bug 1423440.
Comment 29•6 years ago
|
||
Thanks Jörg. I had missed that even bugzilla uses it in inside display-name...
It's interesting that all examples don't display the '<' and '>' characters.
Could we strip those from the display name, and never show them?
Assignee | ||
Comment 30•6 years ago
|
||
This gets rid of a bunch of unwanted characters. While they are not illegal, they should be super unusual in practice, so I think we can get away with removing them.
Assignee | ||
Comment 31•6 years ago
|
||
The try run was pretty happy (fixed the small things it found). https://treeherder.mozilla.org/#/jobs?repo=try-comm-central&revision=4c2ae3db74786f6b403af4682d8b3853a1657229&selectedJob=254263226
Comment 32•6 years ago
|
||
Magnus, did you expect that your patch has any effect for the attached spoof.eml ? I don't see any difference.
Should the generated expressions get updated once a year? If yes, how could we remember to do that?
Updated•6 years ago
|
Assignee | ||
Updated•6 years ago
|
Assignee | ||
Updated•6 years ago
|
Assignee | ||
Comment 33•6 years ago
|
||
This patch wouldn't have any effect on spoof.eml since that was fixed already with the earlier patch.
I guess they should get updated. I've added a code comment about that.
Assignee | ||
Comment 34•6 years ago
|
||
Assignee | ||
Updated•6 years ago
|
Updated•6 years ago
|
Comment 35•6 years ago
•
|
||
Comment 36•6 years ago
|
||
Updated•6 years ago
|
Assignee | ||
Comment 37•6 years ago
|
||
You're right that the spacings should be replaced with space.
But for the other classes... well I looked through them even more closely now. I think it's possible that the Mn class removes some things wanted, in some cases. However, some points:
- they may very well show as space if the font used on the system do not support them
- for the combiners usually the "already combined" character is used, and the combination way of writing it is quite rare (e.g. a\u0302 vs â, or a\u030A vs å).
So in summary, I would still remove them for the common good.
For the SAMWAT case you quote, I think that is also not something legit found in a from address. "used for writing Samvat era dates in Urdu" - https://scriptsource.org/cms/scripts/page.php?item_id=character_detail&key=U000604. We don't have so support stuff like that in the display name of emails, when there are clear drawbacks from doing so.
Comment 38•6 years ago
|
||
But you plan to cut into JS Mime's Token()
function. Is that only used for addressing headers or also the subject header? I added debug in it and only saw parts of addresses go past, no subjects. In this case being more restrictive it possibly OK.
Assignee | ||
Comment 39•6 years ago
|
||
Yes it's used for all headers, but I don't think that makes a significant difference. We want to allow as much as possible, but I'm concerned what is shown would vary from system to system and some of what's showing on my system will be blank on others, and vice versa.
Comment 40•6 years ago
|
||
Hmm, I'm not sure why you're saying that it's used for all headers since that would make the patch less viable. I rebuilt the MSF file and still didn't see any subjects going past with my debugging added in that function.
As you know, subjects can be encoded in raw UTF-8 and we really shouldn't mess with that too much, if at all.
Assignee | ||
Comment 41•6 years ago
|
||
By the time they get to new Token, it is always UTF-8, otherwise this whole thing would be pretty useless.
It is only used for addresses, dates, and content type (looking at it more closely).
Assignee | ||
Comment 42•6 years ago
|
||
Or well, decoded from utf-8 to the internal string representation.
Assignee | ||
Comment 43•6 years ago
|
||
Just removing some (well, a lot) for the Mn class
Comment 44•6 years ago
•
|
||
Assignee | ||
Comment 45•6 years ago
|
||
(In reply to Jorg K (GMT+2) from comment #44)
Can you add some tests like this:
We already have that, though I added some more tabs now
Why are we maintaining tabs and not replace them with spaces?
We aren't but didn't want to mess with them in the token stage since they can have meaning, and of course, since for the replacing of their class, we probably want them to be " " and not "".
Why are we replacing \u0600-\u0605, they seem printable. As I said, I found
a webpage with , so why wouldn't that be valid in a display name?
Like I wrote in comment 37, that would be a rather strange display name. At any case, they are all formatting per definition, so we can certainly drop them for reasons of getting unformatted output too.
Can you explain in a way I can retrace the steps where
\uD804[\uDCBD\uDCCD]|\uD80D[\uDC30-\uDC38]|\uD82F[\uDCA0- \uDCA3]|\uD834[\uDD73-\uDD7A]|\uDB40[\uDC01\uDC20-\uDC7F]
came from ... and
add a comment explaining it.
There is a comment already, I was using the regexpu tool - https://mothereff.in/regexpu#input=/\p{Cf}/u&unicodePropertyEscape=1
EDIT: OK, it mentions "surrogate code point pairs", but a reference how
those are generated would be useful.
For the surrogate point pairs, I just looked up what is at e.g. https://www.fileformat.info/info/unicode/char/e0100/index.htm - I assume there is some other way to get it too, but this was the easiest I found.
Assignee | ||
Comment 46•6 years ago
|
||
Comment 47•6 years ago
|
||
Comment 48•6 years ago
•
|
||
Interesting, changing to displayName = displayName.replace(/\s+/g, " ").trim();
now makes all tabs disappear, when before only some disappeared:
https://bugzilla.mozilla.org/attachment.cgi?oldid=9078646&action=interdiff&newid=9079004&headers=1
There were already some tabs in the test case before you added more. Also, my test in comment #44 passed, so those tabs weren't collapsed with the other white-space before.
That's certainly a plus. I still think some additional test cases to "visualise" what's happening wouldn't be bad. I've adjusted my additions to the new behaviour:
+ // Collapse multiple "special" spaces like NBSP (\u00A0), EM space (\u2003), etc.
+ ["Friend \u00A0\u00A0\u2003 A <ws@example.com>",
+ [{name: "Friend A", email: "ws@example.com"}]],
+ // Remove tabs.
+ ["Friend \t \t A\t\tB <ws@example.com>",
+ [{name: "Friend A B", email: "ws@example.com"}]],
So with these added and \u0600-\u0605 removed since we're really looking to avoid white-space spoofing, not eliminate characters that we deem not useful, that would be an r+, assuming that the surrogates are correct, which I haven't checked yet.
Assignee | ||
Comment 49•6 years ago
|
||
(In reply to Jorg K (GMT+2) from comment #47)
Comment on attachment 9079004 [details] [diff] [review]
bug1506587_whitespace-spoof-followup.patchReview of attachment 9079004 [details] [diff] [review]:
We don't have a test for multiple "strange" spaces to be collapsed like I
suggested: "Friend \u00A0\u00A0\u2003 A <ws@example.com>", at least I can't
see it. Extra testing doesn't hurt.I'm confused about the tabs now. Weren't you saying that you didn't want to
mess with them?
I didn't want to mess with them inside the Token, but prefer to clean them up later (like the patch does).
I still think removing the subtending marks \u0600-\u0605 has any benefit.
// Replace consecutive whitespace in the name with a single whitespace.
- displayName = displayName.replace(/\s+/g, " ").trim();
Why this change now? I already questioned this in comment #7 and you
answered in comment #11. What's changed now to change your mind?
I realized we wanted also a single non-space space (like tab or \n) converted to " " if it happens to be there.
Assignee | ||
Comment 50•6 years ago
|
||
Comment 51•6 years ago
•
|
||
Nice. I tweaked the comments a bit. Any reason some links had < > and others didn't?
EDIT: See my changes here: https://bugzilla.mozilla.org/attachment.cgi?oldid=9079129&action=interdiff&newid=9079136&headers=1
Assignee | ||
Comment 52•6 years ago
|
||
So reason really.
Comment 53•6 years ago
|
||
You mean: No reason. Anyway, consistent now and with added links for get all that crazy stuff.
Comment 54•6 years ago
|
||
https://hg.mozilla.org/comm-central/rev/3ed8b845bb24fb8a1b48c7551d1cad979bdfdc27
remove unwanted characters from headers. r=jorgk
Updated•6 years ago
|
Comment 55•6 years ago
|
||
Let's leave the target milestone at TB 65. Second part landed on TB 70, will backport to TB 69 beta and TB 68 ESR.
Comment 56•6 years ago
|
||
https://hg.mozilla.org/comm-central/rev/6ad08b4647eb8866389f09e8adfa77a597a74ac4
Looks like a comment got lost in action.
Comment 57•6 years ago
|
||
TB 69 beta:
https://hg.mozilla.org/releases/comm-beta/rev/8ed355a905feffd822c5a543344018f1c38566b6 (merged patch)
Updated•6 years ago
|
Comment 58•6 years ago
|
||
TB 68.0 ESR:
https://hg.mozilla.org/releases/comm-esr68/rev/e79abc4716bf821bbde70b2a69b5f9792488d607
Comment 59•5 years ago
|
||
After reading a little about unicode normalisation here: https://en.wikipedia.org/wiki/Unicode_equivalence#Example I can only confirm that the characters from the Mn list https://www.fileformat.info/info/unicode/category/Mn/list.htm must not be stripped (as originally proposed).
I'm still worried that the bit of stripping we do here
https://hg.mozilla.org/comm-central/rev/3ed8b845bb24fb8a1b48c7551d1cad979bdfdc27#l2.34
token = token.replace(/[\u034F\u17B4\u17B5\u180B-\u180D\uFE00-\uFE0F]/g, "");
could do damage.
Personally, I'd remove stripping characters from the Mn list completely, but we could keep stripping \uFE00-\uFE0F.
Magnus? Or Henri, do you have any insights here?
Assignee | ||
Comment 60•5 years ago
|
||
Even if there's theoretical damage, that damage would be rather minor. We haven't heard of any problems in the year it's been landed either. It would seem safer to me to strip more rather than less.
What's the new info?
Comment 61•5 years ago
|
||
Removing the Mn class as a whole seems like a weird thing to do. Comment 12 does not fully why it is removed. Why is it removed?
Comment 62•5 years ago
|
||
The new info is that the Mn list is there for a reason and characters we don't understand should be stripped. Of course raw UTF-8 is rare in mail headers and those characters don't affect our main user groups in the US, Europe and Japan. That doesn't prove that the software is correct though.
Comment 63•5 years ago
|
||
Removing the Mn class as a whole seems like a weird thing to do.
Henri, that was the initial plan in attachment 9075449 [details] [diff] [review] but fortunately that didn't go ahead. Now we're down to stripping just a tiny bit, see comment #59. So what's your opinion about stripping this:
token = token.replace(/[\u034F\u17B4\u17B5\u180B-\u180D\uFE00-\uFE0F]/g, "");
Comment 64•5 years ago
|
||
token = token.replace(/[\u034F\u17B4\u17B5\u180B-\u180D\uFE00-\uFE0F]/g, "");
Is the idea here that these are default ignorables that aren't in the other classes?
As I understand it, default ignorables should be ignorable in rendering, but that doesn't mean they are freely omissible from the underlying data. Could the resulting string compare equal with something it should not compare equal with if these are removed? As I understood this bug, the problem was rendering spaces that push stuff out of view. AFAICT, these characters don't take display space. Why are these a problem here?
Comment 65•5 years ago
|
||
Thanks for the answer, Henri. As I said in comment #59, I'd remove that particular line completely since it's there for no good reason.
Assignee | ||
Comment 66•5 years ago
|
||
If they are not included, you can trick the multi-space removal with data that has <space><invisible><space>.
Reporter | ||
Comment 67•5 years ago
|
||
I just retested this issue on the latest Thunderbird releases—v68.5.0 (stable) and v75.0a1 (daily). I used a few space characters (U+0020) mixed with zero-width spaces (U+200B), but Thunderbird failed to strip them out as shown in the screenshot (while consecutive spaces '<space><space><space>' seem to get stripped out successfully, character sequences like '<space><zero-width-space><space>' don't get stripped).
Also, while this issue has been marked as fixed for a while, I haven't seen it mentioned anywhere in the release notes: https://www.mozilla.org/en-US/security/known-vulnerabilities/thunderbird/
Reporter | ||
Comment 68•5 years ago
|
||
Attaching a new test case file which you could use to reproduce the issue.
Assignee | ||
Updated•5 years ago
|
Assignee | ||
Comment 69•5 years ago
|
||
Thanks! Let's continue in bug 1617370.
Updated•5 years ago
|
Updated•5 years ago
|
Description
•