Closed Bug 1506587 Opened 6 years ago Closed 6 years ago

Sender Email Address Spoofing by extraneous white space in display name

Tracking

(thunderbird_esr68 fixed, thunderbird69 fixed, thunderbird70 fixed)

Status:

RESOLVED FIXED

Milestone:

Thunderbird 65.0

Tracking Flags:

Tracking

Status

thunderbird_esr68

---

fixed

thunderbird69

---

fixed

thunderbird70

---

fixed

People

(Reporter: 0xsobky, Assigned: mkmelin)

References

(Regressed 1 open bug)

Details

(Keywords: csectype-spoof, sec-low)

Attachments

(6 files, 7 obsolete files)

screenshot.png 6 years ago 0xsobky 38.22 KB, image/png		Details
spoof.eml 6 years ago 0xsobky 4.39 KB, text/plain		Details
bug1506587_from_whitespace_spoof.patch [checked in] 6 years ago Magnus Melin [:mkmelin] 2.86 KB, patch	jorgk-bmo : review+	Details \| Diff \| Splinter Review
bug1506587_whitespace-spoof-followup.patch 6 years ago Magnus Melin [:mkmelin] 5.27 KB, patch		Details \| Diff \| Splinter Review
1506587-replace-v1.patch 6 years ago Kai Engert [:KaiE:] 1.22 KB, patch		Details \| Diff \| Splinter Review
bug1506587_whitespace-spoof-followup.patch 6 years ago Magnus Melin [:mkmelin] 12.24 KB, patch	KaiE : review+	Details \| Diff \| Splinter Review
bug1506587_whitespace-spoof-followup.patch 6 years ago Magnus Melin [:mkmelin] 12.50 KB, patch	mkmelin : review+ jorgk-bmo : review-	Details \| Diff \| Splinter Review
bug1506587_whitespace-spoof-followup.patch 6 years ago Magnus Melin [:mkmelin] 9.54 KB, patch		Details \| Diff \| Splinter Review
bug1506587_whitespace-spoof-followup.patch 6 years ago Magnus Melin [:mkmelin] 9.54 KB, patch		Details \| Diff \| Splinter Review
bug1506587_whitespace-spoof-followup.patch 6 years ago Magnus Melin [:mkmelin] 10.03 KB, patch		Details \| Diff \| Splinter Review
bug1506587_whitespace-spoof-followup.patch 6 years ago Jorg K (CEST = GMT+2) 10.47 KB, patch	jorgk-bmo : review+ jorgk-bmo : approval-comm-beta+ jorgk-bmo : approval-comm-esr68+	Details \| Diff \| Splinter Review
screenshot.png 5 years ago 0xsobky 84.55 KB, image/png		Details
testcase.eml 5 years ago 0xsobky 4.17 KB, text/plain		Details

0xsobky

Reporter

Description

•

6 years ago

By abusing Unicode whitespaces (e.g., U+3000) within the 'From:' email header, an attacker can spoof the sender email address when displayed by the Thunderbird mail client. This issue can be reproduced by sending an email to your Yahoo address with the following From header (ignore the > sign at the beginning): >From: "Mozilla <mail@mozilla.com>　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　." <youremail@example.com> This can be easily reproduced from a Gmail account by following these steps: 1. Go to "https://mail.google.com/mail/u/0/#settings/accounts". 2. Click "edit info" to modify the "Send mail as" setting. 3. Set the "Name" field to (ignore the > sign at the beginning): >Mozilla <mail@mozilla.com>　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　. 4. Send a new email to be received by Thunderbird. This issue can be reasonably mitigated by making the following changes: 1. Accepted characters in the 'From:' header should be limited to ASCII characters only with the exclusion of "<", ">", and "@". 2. Overly long values should be truncated so that the actual sender address is always visible.

0xsobky

Reporter

Comment 1

•

6 years ago

Attached image screenshot.png — Details

0xsobky

Reporter

Comment 2

•

6 years ago

Attached file spoof.eml — Details

Attaching the .eml file for easier reproduction....

Magnus Melin [:mkmelin]

Assignee

Comment 3

•

6 years ago

Thanks for the report and test case. While email headers of course can't be trusted, I think we should fix the situation here by removing extraneous whitespace from the display name.

Assignee: nobody → mkmelin+mozilla

Status: UNCONFIRMED → NEW

Ever confirmed: true

OS: Mac OS X → All

Summary: Sender Email Address Spoofing → Sender Email Address Spoofing by extraneous white space in display name

Magnus Melin [:mkmelin]

Assignee

Comment 4

•

6 years ago

Attached patch bug1506587_from_whitespace_spoof.patch [checked in] — Details — Splinter Review

While allowed, there's no point in allowing many consecutive spaces to confuse people.

Attachment #9025760 - Flags: review?(jorgk)

Jorg K (CEST = GMT+2)

Updated

•

6 years ago

Attachment #9024464 - Attachment mime type: message/rfc822 → text/plain

Jorg K (CEST = GMT+2)

Comment 5

•

6 years ago

Comment on attachment 9025760 [details] [diff] [review] bug1506587_from_whitespace_spoof.patch [checked in] Works for me. "consequtive" is a very novel spelling, I've already fixed it in the patch I will land.

Attachment #9025760 - Flags: review?(jorgk) → review+

Jorg K (CEST = GMT+2)

Updated

•

6 years ago

Target Milestone: --- → Thunderbird 65.0

Jorg K (CEST = GMT+2)

Comment 6

•

6 years ago

https://hg.mozilla.org/comm-central/rev/8ca96cbae03df48764aae6ab431b91164f16fd9b Compact extraneous white space in display name to avoid sender address spoofing. r=jorgk

Status: NEW → RESOLVED

Closed: 6 years ago

Resolution: --- → FIXED

Jorg K (CEST = GMT+2)

Comment 7

•

6 years ago

Actually, why /\s\s+/? /\s+/ would have been enough. Or you didn't want to match a single space?

0xsobky

Reporter

Comment 8

•

6 years ago

I took a quick look at the patch, and I'm afraid it does not seem to take non-printing characters into account. > displayName = displayName.replace(/\s\s+/g, " "); This would match and replace only consecutive whitespace characters, but a combination of whitespace characters and zero-width characters would bypass that (e.g., "\x20\u200B\x20\u200B\x20").

Magnus Melin [:mkmelin]

Assignee

Comment 9

•

6 years ago

You're right, let's make sure that's fixed too.

Status: RESOLVED → REOPENED

Resolution: FIXED → ---

Magnus Melin [:mkmelin]

Assignee

Comment 10

•

6 years ago

Attached patch bug1506587_whitespace-spoof-followup.patch (obsolete) — Details — Splinter Review

What do you think?

Attachment #9025899 - Flags: feedback?(0xsobky)

Magnus Melin [:mkmelin]

Assignee

Comment 11

•

6 years ago

(In reply to Jorg K (GMT+1) from comment #7) > Actually, why /\s\s+/? /\s+/ would have been enough. Or you didn't want to > match a single space? I don't know if it's smart enough to not do any work to replace space with space. I only wanted to replace when more than one space.

0xsobky

Reporter

Comment 12

•

6 years ago

Comment on attachment 9025899 [details] [diff] [review] bug1506587_whitespace-spoof-followup.patch Review of attachment 9025899 [details] [diff] [review]: ----------------------------------------------------------------- > token = token.replace(/[\x00-\x08\x0B-\x0C\x0E-\x1F\x7F\u200B\u00AD]+/g, ""); While this indeed matches those problematic non-printable characters, it is still not an exhaustive match. For example, the combining grapheme joiner (CGJ) "U+034F" would still slip through this regular expression. Unfortunately, Unicode seems to be full of such problematic code points, but we can safely assume they fall into these four categories: "Separator, space" (Zs), "Mark, Nonspacing" (Mn), "Other, Control" (Cc), and "Other, Format" (Cf). The first category (Zs) is trivial to match by only using the metacharacter `\s`; however, the other three ("Mn", "Cc" and "Cf") can only be matched using Unicode property escapes (`\p{Mn}`, `\p{Cc}`, and `\p{Cf}` respectively). Still, not all characters in these categories are non-printable, so we probably will have to specify the code-point ranges for all non-printable characters within the three categories. A list of all characters in the "Mn", "Cc", and "Cf" categories can be found at: [1] https://www.fileformat.info/info/unicode/category/Mn/list.htm [2] https://www.fileformat.info/info/unicode/category/Cc/list.htm [3] https://www.fileformat.info/info/unicode/category/Cf/list.htm We might have to explore other options in case this approach sounds too complicated and prone to error (which it does)....

Daniel Veditz [:dveditz]

Updated

•

6 years ago

Keywords: csectype-spoof, sec-low

Magnus Melin [:mkmelin]

Assignee

Updated

•

6 years ago

tracking-thunderbird68: --- → +

Kai Engert [:KaiE:]

Comment 15

•

6 years ago

I think we cannot use the suggestion to limit the display name to ASCII, because we want to show international characters in sender names, right?

We could detect "name" subsets that attempt to look like an email address. I think the name field wasn't intended to be used for the characters '<', '>', '@', and maybe it's acceptable to simply display them differently, to avoid confusion.

change < to {
change > to }
change @ to /

Alternatively, we could decide to not show such misleading strings at all, and show it as
(Misleading characters in sender name) <sender@from.com>

Kai Engert [:KaiE:]

Comment 16

•

6 years ago

Attached patch 1506587-replace-v1.patch (obsolete) — Details — Splinter Review

Thoughts?

Magnus Melin [:mkmelin]

Assignee

Comment 17

•

6 years ago

I don't think replacing is the way to go. There are many potential use cases where special chars can be used (and @ is inparticular rather common, e.g. for mailing lists).

What I've been thinking we should do is gather the specials we can come up with like the patch attached ealier, but also have a hard cap on length we're showing in the UI, so that it's normally not possible to push out of view.

Kai Engert [:KaiE:]

Comment 18

•

6 years ago

0xsobky, bug 1550431 is another spoffing issue related to email headers.
Would you allow the reporter of bug 1550431 to be cc'ed to this bug?

Kai Engert [:KaiE:]

Comment 19

•

6 years ago

It you don't want to replace it, alternatively we could use a different method to clarify to the user that the @ isn't part of the real sender/to email address.

For example, the email address could be shown bold, the name parts not-bold.

The order could be changed: Always show the email address first, then add "Name: ", and only afterwards append the display name.

Kai Engert [:KaiE:]

Comment 20

•

6 years ago

Magnus, if you find an example where a mailing list legitimately includes the @ character in their display name, or gets automatically by a daemon as part of the display name, please show me.

Magnus Melin [:mkmelin]

Assignee

Comment 21

•

6 years ago

I don't seem to have a real example handy, but I think commonly especially by ticketing systems (iirc github also, at least used to). Like "John <john@example> via mailing list Foo Bar" <notifications@example.com>)

Magnus Melin [:mkmelin]

Assignee

Comment 22

•

6 years ago

Email just has too many unexpected real usages that are real use cases.
Why not the hard cap of something like 60 chars for display name during display, like comment 16?

Kai Engert [:KaiE:]

Comment 23

•

6 years ago

(In reply to Magnus Melin [:mkmelin] from comment #21)

I don't seem to have a real example handy, but I think commonly especially by ticketing systems (iirc github also, at least used to). Like "John <john@example> via mailing list Foo Bar" <notifications@example.com>)

I searched my archive of email.

I found a few examples of the "via mailing list" scenario, but it never included a full email address.
Rather, it was worded like this:
From: someone lastname via dev-security <dev-security@lists.mozilla.org>

In other emails, more than one @ was found only, if the sender name is set identical to the sender's email address.

Kai Engert [:KaiE:]

Comment 24

•

6 years ago

(In reply to Magnus Melin [:mkmelin] from comment #22)

Email just has too many unexpected real usages that are real use cases.
Why not the hard cap of something like 60 chars for display name during display, like comment 16?

Even if the From is shown as ...

Alice <alice@mozilla.com> - Magnus <magnus@thunderbird.net>

... it can still be considered confusing. When you're in a hurry, you might not read through the end. You might notice the important part that you're expecting at the beginning, and you might stop parsing/reading after it.

Jorg K (CEST = GMT+2)

Comment 25

•

6 years ago

•

Edited

(In reply to Kai Engert (:kaie:) from comment #20)

Magnus, if you find an example where a mailing list legitimately includes the @ character in their
display name, or gets automatically by a daemon as part of the display name, please show me.

Yahoo's Barcelona Freecycle list has hundreds/thousands of these:

From: "Antonia Palas APalas007@gmail.com [Barcelona-Freecycle]" <Barcelona-Freecycle-noreply@yahoogroups.com>
EDIT: edited original address (which was of course publicly visible on the Yahoo list)

Is that the format you're after?

Kai Engert [:KaiE:]

Comment 26

•

6 years ago

(In reply to Jorg K (GMT+2) from comment #25)

Is that the format you're after?

Yes, thanks. So it's confirmed, the @ sign is indeed used in the name part of email recipients...

Kai Engert [:KaiE:]

Comment 27

•

6 years ago

I've skimmed through RFCs. Below are excerpts of definitions from rfc 5322:

address         =   mailbox / group

mailbox         =   name-addr / addr-spec

name-addr       =   [display-name] angle-addr

display-name    =   phrase

phrase          =   1*word / obs-phrase

word            =   atom / quoted-string

atext           =   ALPHA / DIGIT /    ; Printable US-ASCII
                    "!" / "#" /        ;  characters not including
                    "$" / "%" /        ;  specials.  Used for atoms.
                    "&" / "'" /
                    "*" / "+" /
                    "-" / "/" /
                    "=" / "?" /
                    "^" / "_" /
                    "`" / "{" /
                    "|" / "}" /
                    "~"

atom            =   [CFWS] 1*atext [CFWS]

specials        =   "(" / ")" /        ; Special characters that do
                    "<" / ">" /        ;  not appear in atext
                    "[" / "]" /
                    ":" / ";" /
                    "@" / "\" /
                    "," / "." /
                    DQUOTE

rfc 2047 defines how to encode unicode in mail headers, like in the spoof example attached to this bug. A relevant excerpt is:

(3) As a replacement for a 'word' entity within a 'phrase', for example,
    one that precedes an address in a From, To, or Cc header.  The ABNF
    definition for 'phrase' from RFC 822 thus becomes:

    phrase = 1*( encoded-word / word )

    In this case the set of characters that may be used in a "Q"-encoded
    'encoded-word' is restricted to: <upper and lower case ASCII
    letters, decimal digits, "!", "*", "+", "-", "/", "=", and "_"
    (underscore, ASCII 95.)>.  An 'encoded-word' that appears within a
    'phrase' MUST be separated from any adjacent 'word', 'text' or
    'special' by 'linear-white-space'.

Jorg K (CEST = GMT+2)

Comment 28

•

6 years ago

(In reply to Kai Engert (:kaie:) from comment #26)

Yes, thanks. So it's confirmed, the @ sign is indeed used in the name part of email recipients...

That was never in doubt, see "From: "Bugzilla@Mozilla" <bugzilla-daemon@mozilla.org>". But you wanted it from a mailing list. Prior discussion about @ in display names (and people who wanted to remove that) in bug 1423440.

Kai Engert [:KaiE:]

Comment 29

•

6 years ago

Thanks Jörg. I had missed that even bugzilla uses it in inside display-name...

It's interesting that all examples don't display the '<' and '>' characters.
Could we strip those from the display name, and never show them?

Magnus Melin [:mkmelin]

Assignee

Comment 30

•

6 years ago

Attached patch bug1506587_whitespace-spoof-followup.patch (obsolete) — Details — Splinter Review

This gets rid of a bunch of unwanted characters. While they are not illegal, they should be super unusual in practice, so I think we can get away with removing them.

Attachment #9025899 - Attachment is obsolete: true

Attachment #9070828 - Attachment is obsolete: true

Attachment #9025899 - Flags: feedback?(0xsobky)

Attachment #9075236 - Flags: review?(kaie)

Attachment #9075236 - Flags: feedback?(0xsobky)

Magnus Melin [:mkmelin]

Assignee

Comment 31

•

6 years ago

The try run was pretty happy (fixed the small things it found). https://treeherder.mozilla.org/#/jobs?repo=try-comm-central&revision=4c2ae3db74786f6b403af4682d8b3853a1657229&selectedJob=254263226

Kai Engert [:KaiE:]

Comment 32

•

6 years ago

Magnus, did you expect that your patch has any effect for the attached spoof.eml ? I don't see any difference.

Should the generated expressions get updated once a year? If yes, how could we remember to do that?

Kai Engert [:KaiE:]

Updated

•

6 years ago

Attachment #9075236 - Flags: review?(kaie) → review+

Magnus Melin [:mkmelin]

Assignee

Updated

•

6 years ago

Attachment #9025760 - Attachment filename: bug1506587_from_whitespace_spoof.patch → bug1506587_from_whitespace_spoof.patch [checked in]

Magnus Melin [:mkmelin]

Assignee

Updated

•

6 years ago

Attachment #9025760 - Attachment description: bug1506587_from_whitespace_spoof.patch → bug1506587_from_whitespace_spoof.patch [checked in]

Attachment #9025760 - Attachment filename: bug1506587_from_whitespace_spoof.patch [checked in] → bug1506587_from_whitespace_spoof.patch

Magnus Melin [:mkmelin]

Assignee

Comment 33

•

6 years ago

This patch wouldn't have any effect on spoof.eml since that was fixed already with the earlier patch.
I guess they should get updated. I've added a code comment about that.

Magnus Melin [:mkmelin]

Assignee

Comment 34

•

6 years ago

Attached patch bug1506587_whitespace-spoof-followup.patch (obsolete) — Details — Splinter Review

Attachment #9075236 - Attachment is obsolete: true

Attachment #9075236 - Flags: feedback?(0xsobky)

Attachment #9075449 - Flags: review+

Magnus Melin [:mkmelin]

Assignee

Updated

•

6 years ago

Keywords: checkin-needed

Jorg K (CEST = GMT+2)

Updated

•

6 years ago

Attachment #9075449 - Flags: approval-comm-beta+

Jorg K (CEST = GMT+2)

Comment 35

•

6 years ago

•

Edited

Comment on attachment 9075449 [details] [diff] [review] bug1506587_whitespace-spoof-followup.patch I don't think this is right. This bug is about "white-space" spoofing, not removing a whole range of Unicode characters and potentially obliterating message subjects. I see no reason to remove most of the Mn class, https://www.fileformat.info/info/unicode/category/Mn/list.htm, and I'd also be more careful with the Cf class, https://www.fileformat.info/info/unicode/category/Cf/list.htm. There are a lot of printable characters or some that add "adjustments" to following characters. "they should be super unusual in practice" is completely unfounded. Please prove that you're not obliterating Arabic writing or anything that uses stuff from \u0300-\u036F. BTW, you can find a use of ؄ (U+0604 ARABIC SIGN SAMVAT) here: https://archive.urdu.siasat.com/news/%D8%AD%D8%B6%D8%B1%D8%AA-%D8%B9%D9%84%DB%8C-%D9%85%D8%B1%D8%AA%D8%B6%DB%8C%D9%B0-%D8%84-%DA%A9%DB%8C-%D8%AC%D8%A7%DA%BA-%D9%86%D8%AB%D8%A7%D8%B1%DB%8C-730082/ - Search for the character on the page.

Attachment #9075449 - Flags: approval-comm-beta+ → review-

Jorg K (CEST = GMT+2)

Comment 36

•

6 years ago

Comment on attachment 9075449 [details] [diff] [review] bug1506587_whitespace-spoof-followup.patch Review of attachment 9075449 [details] [diff] [review]: ----------------------------------------------------------------- ::: mailnews/mime/jsmime/jsmime.js @@ +399,5 @@ > + // For a full list of categories, see <http://unicode.org/Public//5.0.0/ucd/UCD.html>. > + > + // -- case A: /\p{Zs}/u, except space > + // https://www.fileformat.info/info/unicode/category/Zs/list.htm > + token = token.replace(/(?![ ])[\xA0\u1680\u2000-\u200A\u202F\u205F\u3000]/g, ""); BTW, this is wrong, too. You're simply removing any space other than the "normal" space, so you will potentially glue words together. What you need to do is replace all "non-standard" spaces and then collapse them to a single one.

Jorg K (CEST = GMT+2)

Updated

•

6 years ago

Keywords: checkin-needed

Magnus Melin [:mkmelin]

Assignee

Comment 37

•

6 years ago

You're right that the spacings should be replaced with space.
But for the other classes... well I looked through them even more closely now. I think it's possible that the Mn class removes some things wanted, in some cases. However, some points:

they may very well show as space if the font used on the system do not support them
for the combiners usually the "already combined" character is used, and the combination way of writing it is quite rare (e.g. a\u0302 vs â, or a\u030A vs å).
So in summary, I would still remove them for the common good.

For the SAMWAT case you quote, I think that is also not something legit found in a from address. "used for writing Samvat era dates in Urdu" - https://scriptsource.org/cms/scripts/page.php?item_id=character_detail&key=U000604. We don't have so support stuff like that in the display name of emails, when there are clear drawbacks from doing so.

Jorg K (CEST = GMT+2)

Comment 38

•

6 years ago

But you plan to cut into JS Mime's Token() function. Is that only used for addressing headers or also the subject header? I added debug in it and only saw parts of addresses go past, no subjects. In this case being more restrictive it possibly OK.

Magnus Melin [:mkmelin]

Assignee

Comment 39

•

6 years ago

Yes it's used for all headers, but I don't think that makes a significant difference. We want to allow as much as possible, but I'm concerned what is shown would vary from system to system and some of what's showing on my system will be blank on others, and vice versa.

Jorg K (CEST = GMT+2)

Comment 40

•

6 years ago

Hmm, I'm not sure why you're saying that it's used for all headers since that would make the patch less viable. I rebuilt the MSF file and still didn't see any subjects going past with my debugging added in that function.

As you know, subjects can be encoded in raw UTF-8 and we really shouldn't mess with that too much, if at all.

Magnus Melin [:mkmelin]

Assignee

Comment 41

•

6 years ago

By the time they get to new Token, it is always UTF-8, otherwise this whole thing would be pretty useless.
It is only used for addresses, dates, and content type (looking at it more closely).

Magnus Melin [:mkmelin]

Assignee

Comment 42

•

6 years ago

Or well, decoded from utf-8 to the internal string representation.

Magnus Melin [:mkmelin]

Assignee

Comment 43

•

6 years ago

Attached patch bug1506587_whitespace-spoof-followup.patch (obsolete) — Details — Splinter Review

Just removing some (well, a lot) for the Mn class

Attachment #9075449 - Attachment is obsolete: true

Attachment #9078646 - Flags: review?(jorgk)

Jorg K (CEST = GMT+2)

Comment 44

•

6 years ago

•

Edited

Comment on attachment 9078646 [details] [diff] [review] bug1506587_whitespace-spoof-followup.patch Some questions: Can you add some tests like this: ``` + // Collapse multiple "special" spaces like NBSP (\u00A0), EM space (\u2003), etc. + ["Friend \u00A0\u00A0\u2003 A <ws@example.com>", + [{name: "Friend A", email: "ws@example.com"}]], + // Maintain tabs. + ["Friend \t \t A <ws@example.com>", + [{name: "Friend \t \t A", email: "ws@example.com"}]], ``` Why are we maintaining tabs and not replace them with spaces? Why are we replacing \u0600-\u0605\, they seem printable. As I said, I found a webpage with ؄, so why wouldn't that be valid in a display name? Can you explain in a way I can retrace the steps where `\uD804[\uDCBD\uDCCD]|\uD80D[\uDC30-\uDC38]|\uD82F[\uDCA0-\uDCA3]|\uD834[\uDD73-\uDD7A]|\uDB40[\uDC01\uDC20-\uDC7F]` came from ... and add a comment explaining it. EDIT: OK, it mentions "surrogate code point pairs", but a reference how those are generated would be useful.

Attachment #9078646 - Flags: review?(jorgk)

Magnus Melin [:mkmelin]

Assignee

Comment 45

•

6 years ago

(In reply to Jorg K (GMT+2) from comment #44)

Can you add some tests like this:

We already have that, though I added some more tabs now

Why are we maintaining tabs and not replace them with spaces?

We aren't but didn't want to mess with them in the token stage since they can have meaning, and of course, since for the replacing of their class, we probably want them to be " " and not "".

Why are we replacing \u0600-\u0605, they seem printable. As I said, I found
a webpage with ؄, so why wouldn't that be valid in a display name?

Like I wrote in comment 37, that would be a rather strange display name. At any case, they are all formatting per definition, so we can certainly drop them for reasons of getting unformatted output too.

Can you explain in a way I can retrace the steps where
\uD804[\uDCBD\uDCCD]|\uD80D[\uDC30-\uDC38]|\uD82F[\uDCA0- \uDCA3]|\uD834[\uDD73-\uDD7A]|\uDB40[\uDC01\uDC20-\uDC7F] came from ... and
add a comment explaining it.

There is a comment already, I was using the regexpu tool - https://mothereff.in/regexpu#input=/\p{Cf}/u&unicodePropertyEscape=1

EDIT: OK, it mentions "surrogate code point pairs", but a reference how
those are generated would be useful.

For the surrogate point pairs, I just looked up what is at e.g. https://www.fileformat.info/info/unicode/char/e0100/index.htm - I assume there is some other way to get it too, but this was the easiest I found.

Magnus Melin [:mkmelin]

Assignee

Comment 46

•

6 years ago

Attached patch bug1506587_whitespace-spoof-followup.patch (obsolete) — Details — Splinter Review

Attachment #9078646 - Attachment is obsolete: true

Attachment #9079004 - Flags: review?(jorgk)

Jorg K (CEST = GMT+2)

Comment 47

•

6 years ago

Comment on attachment 9079004 [details] [diff] [review] bug1506587_whitespace-spoof-followup.patch Review of attachment 9079004 [details] [diff] [review]: ----------------------------------------------------------------- We don't have a test for multiple "strange" spaces to be collapsed like I suggested: "Friend \u00A0\u00A0\u2003 A <ws@example.com>", at least I can't see it. Extra testing doesn't hurt. I'm confused about the tabs now. Weren't you saying that you didn't want to mess with them? Now you added some to the input and they're not expected in the output any more. Does that test still pass? I still think removing the subtending marks \u0600-\u0605 has any benefit. So if in doubt, they should stay. ::: mailnews/mime/jsmime/jsmime.js @@ +858,5 @@ > addrSpec.substring(addrSpec.lastIndexOf("@")); > } > > // Replace consecutive whitespace in the name with a single whitespace. > + displayName = displayName.replace(/\s+/g, " ").trim(); Why this change now? I already questioned this in comment #7 and you answered in comment #11. What's changed now to change your mind?

Jorg K (CEST = GMT+2)

Comment 48

•

6 years ago

•

Edited

Interesting, changing to displayName = displayName.replace(/\s+/g, " ").trim(); now makes all tabs disappear, when before only some disappeared:
https://bugzilla.mozilla.org/attachment.cgi?oldid=9078646&action=interdiff&newid=9079004&headers=1
There were already some tabs in the test case before you added more. Also, my test in comment #44 passed, so those tabs weren't collapsed with the other white-space before.

That's certainly a plus. I still think some additional test cases to "visualise" what's happening wouldn't be bad. I've adjusted my additions to the new behaviour:

+      // Collapse multiple "special" spaces like NBSP (\u00A0), EM space (\u2003), etc.
+      ["Friend \u00A0\u00A0\u2003 A <ws@example.com>",
+        [{name: "Friend A", email: "ws@example.com"}]],
+      // Remove tabs.
+      ["Friend \t \t A\t\tB <ws@example.com>",
+        [{name: "Friend A B", email: "ws@example.com"}]],

So with these added and \u0600-\u0605 removed since we're really looking to avoid white-space spoofing, not eliminate characters that we deem not useful, that would be an r+, assuming that the surrogates are correct, which I haven't checked yet.

Magnus Melin [:mkmelin]

Assignee

Comment 49

•

6 years ago

(In reply to Jorg K (GMT+2) from comment #47)

Comment on attachment 9079004 [details] [diff] [review]
bug1506587_whitespace-spoof-followup.patch

Review of attachment 9079004 [details] [diff] [review]:

We don't have a test for multiple "strange" spaces to be collapsed like I
suggested: "Friend \u00A0\u00A0\u2003 A <ws@example.com>", at least I can't
see it. Extra testing doesn't hurt.

I'm confused about the tabs now. Weren't you saying that you didn't want to
mess with them?

I didn't want to mess with them inside the Token, but prefer to clean them up later (like the patch does).

I still think removing the subtending marks \u0600-\u0605 has any benefit.

 // Replace consecutive whitespace in the name with a single whitespace.
displayName = displayName.replace(/\s+/g, " ").trim();
Why this change now? I already questioned this in comment #7 and you
answered in comment #11. What's changed now to change your mind?

I realized we wanted also a single non-space space (like tab or \n) converted to " " if it happens to be there.

Magnus Melin [:mkmelin]

Assignee

Comment 50

•

6 years ago

Attached patch bug1506587_whitespace-spoof-followup.patch (obsolete) — Details — Splinter Review

Attachment #9079004 - Attachment is obsolete: true

Attachment #9079004 - Flags: review?(jorgk)

Attachment #9079129 - Flags: review?(jorgk)

Jorg K (CEST = GMT+2)

Comment 51

•

6 years ago

•

Edited

Attached patch bug1506587_whitespace-spoof-followup.patch — Details — Splinter Review

Nice. I tweaked the comments a bit. Any reason some links had < > and others didn't?

EDIT: See my changes here: https://bugzilla.mozilla.org/attachment.cgi?oldid=9079129&action=interdiff&newid=9079136&headers=1

Attachment #9079129 - Attachment is obsolete: true

Attachment #9079129 - Flags: review?(jorgk)

Attachment #9079136 - Flags: review+

Magnus Melin [:mkmelin]

Assignee

Comment 52

•

6 years ago

So reason really.

Jorg K (CEST = GMT+2)

Comment 53

•

6 years ago

You mean: No reason. Anyway, consistent now and with added links for get all that crazy stuff.

Jorg K (CEST = GMT+2)

Comment 54

•

6 years ago

https://hg.mozilla.org/comm-central/rev/3ed8b845bb24fb8a1b48c7551d1cad979bdfdc27
remove unwanted characters from headers. r=jorgk

Jorg K (CEST = GMT+2)

Updated

•

6 years ago

Attachment #9079136 - Flags: approval-comm-esr68+

Attachment #9079136 - Flags: approval-comm-beta+

Jorg K (CEST = GMT+2)

Comment 55

•

6 years ago

Let's leave the target milestone at TB 65. Second part landed on TB 70, will backport to TB 69 beta and TB 68 ESR.

Status: REOPENED → RESOLVED

Closed: 6 years ago → 6 years ago

Resolution: --- → FIXED

Jorg K (CEST = GMT+2)

Comment 56

•

6 years ago

https://hg.mozilla.org/comm-central/rev/6ad08b4647eb8866389f09e8adfa77a597a74ac4
Looks like a comment got lost in action.

Jorg K (CEST = GMT+2)

Comment 57

•

6 years ago

TB 69 beta:
https://hg.mozilla.org/releases/comm-beta/rev/8ed355a905feffd822c5a543344018f1c38566b6 (merged patch)

status-thunderbird69: --- → fixed

status-thunderbird70: --- → fixed

Daniel Veditz [:dveditz]

Updated

•

6 years ago

Group: mail-core-security → core-security-release

Jorg K (CEST = GMT+2)

Comment 58

•

6 years ago

TB 68.0 ESR:
https://hg.mozilla.org/releases/comm-esr68/rev/e79abc4716bf821bbde70b2a69b5f9792488d607

status-thunderbird_esr68: --- → fixed

tracking-thunderbird68: + → ---

Jorg K (CEST = GMT+2)

Comment 59

•

5 years ago

After reading a little about unicode normalisation here: https://en.wikipedia.org/wiki/Unicode_equivalence#Example I can only confirm that the characters from the Mn list https://www.fileformat.info/info/unicode/category/Mn/list.htm must not be stripped (as originally proposed).

I'm still worried that the bit of stripping we do here
https://hg.mozilla.org/comm-central/rev/3ed8b845bb24fb8a1b48c7551d1cad979bdfdc27#l2.34
token = token.replace(/[\u034F\u17B4\u17B5\u180B-\u180D\uFE00-\uFE0F]/g, "");
could do damage.

Personally, I'd remove stripping characters from the Mn list completely, but we could keep stripping \uFE00-\uFE0F.

Magnus? Or Henri, do you have any insights here?

Flags: needinfo?(mkmelin+mozilla)

Flags: needinfo?(hsivonen)

Magnus Melin [:mkmelin]

Assignee

Comment 60

•

5 years ago

Even if there's theoretical damage, that damage would be rather minor. We haven't heard of any problems in the year it's been landed either. It would seem safer to me to strip more rather than less.

What's the new info?

Flags: needinfo?(mkmelin+mozilla)

Henri Sivonen (:hsivonen)

Comment 61

•

5 years ago

Removing the Mn class as a whole seems like a weird thing to do. Comment 12 does not fully why it is removed. Why is it removed?

Flags: needinfo?(hsivonen)

Jorg K (CEST = GMT+2)

Comment 62

•

5 years ago

The new info is that the Mn list is there for a reason and characters we don't understand should be stripped. Of course raw UTF-8 is rare in mail headers and those characters don't affect our main user groups in the US, Europe and Japan. That doesn't prove that the software is correct though.

Jorg K (CEST = GMT+2)

Comment 63

•

5 years ago

Removing the Mn class as a whole seems like a weird thing to do.

Henri, that was the initial plan in attachment 9075449 [details] [diff] [review] but fortunately that didn't go ahead. Now we're down to stripping just a tiny bit, see comment #59. So what's your opinion about stripping this:
token = token.replace(/[\u034F\u17B4\u17B5\u180B-\u180D\uFE00-\uFE0F]/g, "");

Flags: needinfo?(hsivonen)

Henri Sivonen (:hsivonen)

Comment 64

•

5 years ago

token = token.replace(/[\u034F\u17B4\u17B5\u180B-\u180D\uFE00-\uFE0F]/g, "");

Is the idea here that these are default ignorables that aren't in the other classes?

As I understand it, default ignorables should be ignorable in rendering, but that doesn't mean they are freely omissible from the underlying data. Could the resulting string compare equal with something it should not compare equal with if these are removed? As I understood this bug, the problem was rendering spaces that push stuff out of view. AFAICT, these characters don't take display space. Why are these a problem here?

Flags: needinfo?(hsivonen)

Jorg K (CEST = GMT+2)

Comment 65

•

5 years ago

Thanks for the answer, Henri. As I said in comment #59, I'd remove that particular line completely since it's there for no good reason.

Magnus Melin [:mkmelin]

Assignee

Comment 66

•

5 years ago

If they are not included, you can trick the multi-space removal with data that has <space><invisible><space>.

0xsobky

Reporter

Comment 67

•

5 years ago

Attached image screenshot.png — Details

I just retested this issue on the latest Thunderbird releases—v68.5.0 (stable) and v75.0a1 (daily). I used a few space characters (U+0020) mixed with zero-width spaces (U+200B), but Thunderbird failed to strip them out as shown in the screenshot (while consecutive spaces '<space><space><space>' seem to get stripped out successfully, character sequences like '<space><zero-width-space><space>' don't get stripped).

Also, while this issue has been marked as fixed for a while, I haven't seen it mentioned anywhere in the release notes: https://www.mozilla.org/en-US/security/known-vulnerabilities/thunderbird/

0xsobky

Reporter

Comment 68

•

5 years ago

Attached file testcase.eml — Details

Attaching a new test case file which you could use to reproduce the issue.

Magnus Melin [:mkmelin]

Assignee

Updated

•

5 years ago

Comment 69

•

5 years ago

Thanks! Let's continue in bug 1617370.

Jorg K (CEST = GMT+2)

Updated

•

5 years ago

Attachment #9128245 - Attachment mime type: message/rfc822 → text/plain

Daniel Veditz [:dveditz]

Updated

•

5 years ago

Group: core-security-release

Alfred Peters [:infofrommozilla]

Updated

•

9 months ago

Regressions: 1900708

You need to log in before you can comment on or make changes to this bug.