Closed Bug 1506587 Opened 2 years ago Closed 1 year ago

Sender Email Address Spoofing by extraneous white space in display name

Categories

(Thunderbird :: Message Reader UI, defect)

Desktop
All
defect
Not set
normal

Tracking

(thunderbird_esr68 fixed, thunderbird69 fixed, thunderbird70 fixed)

RESOLVED FIXED
Thunderbird 65.0
Tracking Status
thunderbird_esr68 --- fixed
thunderbird69 --- fixed
thunderbird70 --- fixed

People

(Reporter: 0xsobky, Assigned: mkmelin)

References

Details

(Keywords: csectype-spoof, sec-low)

Attachments

(6 files, 7 obsolete files)

38.22 KB, image/png
Details
4.39 KB, text/plain
Details
2.86 KB, patch
jorgk-bmo
: review+
Details | Diff | Splinter Review
10.47 KB, patch
jorgk-bmo
: review+
Details | Diff | Splinter Review
84.55 KB, image/png
Details
4.17 KB, text/plain
Details
By abusing Unicode whitespaces (e.g., U+3000) within the 'From:' email header, an attacker can spoof the sender email address when displayed by the Thunderbird mail client.

This issue can be reproduced by sending an email to your Yahoo address with the following From header (ignore the > sign at the beginning):
>From: "Mozilla <mail@mozilla.com>                                                                     ." <youremail@example.com>

This can be easily reproduced from a Gmail account by following these steps:
1. Go to "https://mail.google.com/mail/u/0/#settings/accounts".
2. Click "edit info" to modify the "Send mail as" setting.
3. Set the "Name" field to (ignore the > sign at the beginning):
>Mozilla <mail@mozilla.com>                                                                     .
4. Send a new email to be received by Thunderbird.

This issue can be reasonably mitigated by making the following changes:
1. Accepted characters in the 'From:' header should be limited to ASCII characters only with the exclusion of "<", ">", and "@".
2. Overly long values should be truncated so that the actual sender address is always visible.
Attached image screenshot.png
Attached file spoof.eml
Attaching the .eml file for easier reproduction....
Thanks for the report and test case. 

While email headers of course can't be trusted, I think we should fix the situation here by removing extraneous whitespace from the display name.
Assignee: nobody → mkmelin+mozilla
Status: UNCONFIRMED → NEW
Ever confirmed: true
OS: Mac OS X → All
Summary: Sender Email Address Spoofing → Sender Email Address Spoofing by extraneous white space in display name
While allowed, there's no point in allowing many consecutive spaces to confuse people.
Attachment #9025760 - Flags: review?(jorgk)
Attachment #9024464 - Attachment mime type: message/rfc822 → text/plain
Comment on attachment 9025760 [details] [diff] [review]
bug1506587_from_whitespace_spoof.patch  [checked in]

Works for me. "consequtive" is a very novel spelling, I've already fixed it in the patch I will land.
Attachment #9025760 - Flags: review?(jorgk) → review+
Target Milestone: --- → Thunderbird 65.0
https://hg.mozilla.org/comm-central/rev/8ca96cbae03df48764aae6ab431b91164f16fd9b
Compact extraneous white space in display name to avoid sender address spoofing. r=jorgk
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Actually, why /\s\s+/? /\s+/ would have been enough. Or you didn't want to match a single space?
I took a quick look at the patch, and I'm afraid it does not seem to take non-printing characters into account.

> displayName = displayName.replace(/\s\s+/g, " ");
This would match and replace only consecutive whitespace characters, but a combination of whitespace characters and zero-width characters would bypass that (e.g., "\x20\u200B\x20\u200B\x20").
You're right, let's make sure that's fixed too.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
What do you think?
Attachment #9025899 - Flags: feedback?(0xsobky)
(In reply to Jorg K (GMT+1) from comment #7)
> Actually, why /\s\s+/? /\s+/ would have been enough. Or you didn't want to
> match a single space?

I don't know if it's smart enough to not do any work to replace space with space. I only wanted to replace when more than one space.
Comment on attachment 9025899 [details] [diff] [review]
bug1506587_whitespace-spoof-followup.patch

Review of attachment 9025899 [details] [diff] [review]:
-----------------------------------------------------------------

> token = token.replace(/[\x00-\x08\x0B-\x0C\x0E-\x1F\x7F\u200B\u00AD]+/g, "");
While this indeed matches those problematic non-printable characters, it is still not an exhaustive match. For example, the combining grapheme joiner (CGJ) "U+034F" would still slip through this regular expression. Unfortunately, Unicode seems to be full of such problematic code points, but we can safely assume they fall into these four categories: "Separator, space" (Zs), "Mark, Nonspacing" (Mn), "Other, Control" (Cc), and "Other, Format" (Cf).

The first category (Zs) is trivial to match by only using the metacharacter `\s`; however, the other three ("Mn", "Cc" and "Cf") can only be matched using Unicode property escapes (`\p{Mn}`, `\p{Cc}`, and `\p{Cf}` respectively). Still, not all characters in these categories are non-printable, so we probably will have to specify the code-point ranges for all non-printable characters within the three categories.

A list of all characters in the "Mn", "Cc", and "Cf" categories can be found at:
[1] https://www.fileformat.info/info/unicode/category/Mn/list.htm
[2] https://www.fileformat.info/info/unicode/category/Cc/list.htm
[3] https://www.fileformat.info/info/unicode/category/Cf/list.htm

We might have to explore other options in case this approach sounds too complicated and prone to error (which it does)....
Duplicate of this bug: 53703
Duplicate of this bug: 1521611

I think we cannot use the suggestion to limit the display name to ASCII, because we want to show international characters in sender names, right?

We could detect "name" subsets that attempt to look like an email address. I think the name field wasn't intended to be used for the characters '<', '>', '@', and maybe it's acceptable to simply display them differently, to avoid confusion.

change < to {
change > to }
change @ to /

Alternatively, we could decide to not show such misleading strings at all, and show it as
(Misleading characters in sender name) <sender@from.com>

Attached patch 1506587-replace-v1.patch (obsolete) — Splinter Review

Thoughts?

I don't think replacing is the way to go. There are many potential use cases where special chars can be used (and @ is inparticular rather common, e.g. for mailing lists).

What I've been thinking we should do is gather the specials we can come up with like the patch attached ealier, but also have a hard cap on length we're showing in the UI, so that it's normally not possible to push out of view.

0xsobky, bug 1550431 is another spoffing issue related to email headers.
Would you allow the reporter of bug 1550431 to be cc'ed to this bug?

It you don't want to replace it, alternatively we could use a different method to clarify to the user that the @ isn't part of the real sender/to email address.

For example, the email address could be shown bold, the name parts not-bold.

The order could be changed: Always show the email address first, then add "Name: ", and only afterwards append the display name.

Magnus, if you find an example where a mailing list legitimately includes the @ character in their display name, or gets automatically by a daemon as part of the display name, please show me.

I don't seem to have a real example handy, but I think commonly especially by ticketing systems (iirc github also, at least used to). Like "John <john@example> via mailing list Foo Bar" <notifications@example.com>)

Email just has too many unexpected real usages that are real use cases.
Why not the hard cap of something like 60 chars for display name during display, like comment 16?

(In reply to Magnus Melin [:mkmelin] from comment #21)

I don't seem to have a real example handy, but I think commonly especially by ticketing systems (iirc github also, at least used to). Like "John <john@example> via mailing list Foo Bar" <notifications@example.com>)

I searched my archive of email.

I found a few examples of the "via mailing list" scenario, but it never included a full email address.
Rather, it was worded like this:
From: someone lastname via dev-security <dev-security@lists.mozilla.org>

In other emails, more than one @ was found only, if the sender name is set identical to the sender's email address.

(In reply to Magnus Melin [:mkmelin] from comment #22)

Email just has too many unexpected real usages that are real use cases.
Why not the hard cap of something like 60 chars for display name during display, like comment 16?

Even if the From is shown as ...

Alice <alice@mozilla.com> - Magnus <magnus@thunderbird.net>

... it can still be considered confusing. When you're in a hurry, you might not read through the end. You might notice the important part that you're expecting at the beginning, and you might stop parsing/reading after it.

(In reply to Kai Engert (:kaie:) from comment #20)

Magnus, if you find an example where a mailing list legitimately includes the @ character in their
display name, or gets automatically by a daemon as part of the display name, please show me.

Yahoo's Barcelona Freecycle list has hundreds/thousands of these:

From: "Antonia Palas APalas007@gmail.com [Barcelona-Freecycle]" <Barcelona-Freecycle-noreply@yahoogroups.com>
EDIT: edited original address (which was of course publicly visible on the Yahoo list)

Is that the format you're after?

(In reply to Jorg K (GMT+2) from comment #25)

Is that the format you're after?

Yes, thanks. So it's confirmed, the @ sign is indeed used in the name part of email recipients...

I've skimmed through RFCs. Below are excerpts of definitions from rfc 5322:

address         =   mailbox / group

mailbox         =   name-addr / addr-spec

name-addr       =   [display-name] angle-addr

display-name    =   phrase

phrase          =   1*word / obs-phrase

word            =   atom / quoted-string

atext           =   ALPHA / DIGIT /    ; Printable US-ASCII
                    "!" / "#" /        ;  characters not including
                    "$" / "%" /        ;  specials.  Used for atoms.
                    "&" / "'" /
                    "*" / "+" /
                    "-" / "/" /
                    "=" / "?" /
                    "^" / "_" /
                    "`" / "{" /
                    "|" / "}" /
                    "~"

atom            =   [CFWS] 1*atext [CFWS]

specials        =   "(" / ")" /        ; Special characters that do
                    "<" / ">" /        ;  not appear in atext
                    "[" / "]" /
                    ":" / ";" /
                    "@" / "\" /
                    "," / "." /
                    DQUOTE

rfc 2047 defines how to encode unicode in mail headers, like in the spoof example attached to this bug. A relevant excerpt is:

(3) As a replacement for a 'word' entity within a 'phrase', for example,
    one that precedes an address in a From, To, or Cc header.  The ABNF
    definition for 'phrase' from RFC 822 thus becomes:

    phrase = 1*( encoded-word / word )

    In this case the set of characters that may be used in a "Q"-encoded
    'encoded-word' is restricted to: <upper and lower case ASCII
    letters, decimal digits, "!", "*", "+", "-", "/", "=", and "_"
    (underscore, ASCII 95.)>.  An 'encoded-word' that appears within a
    'phrase' MUST be separated from any adjacent 'word', 'text' or
    'special' by 'linear-white-space'.

(In reply to Kai Engert (:kaie:) from comment #26)

Yes, thanks. So it's confirmed, the @ sign is indeed used in the name part of email recipients...

That was never in doubt, see "From: "Bugzilla@Mozilla" <bugzilla-daemon@mozilla.org>". But you wanted it from a mailing list. Prior discussion about @ in display names (and people who wanted to remove that) in bug 1423440.

Thanks Jörg. I had missed that even bugzilla uses it in inside display-name...

It's interesting that all examples don't display the '<' and '>' characters.
Could we strip those from the display name, and never show them?

This gets rid of a bunch of unwanted characters. While they are not illegal, they should be super unusual in practice, so I think we can get away with removing them.

Attachment #9025899 - Attachment is obsolete: true
Attachment #9070828 - Attachment is obsolete: true
Attachment #9025899 - Flags: feedback?(0xsobky)
Attachment #9075236 - Flags: review?(kaie)
Attachment #9075236 - Flags: feedback?(0xsobky)

Magnus, did you expect that your patch has any effect for the attached spoof.eml ? I don't see any difference.

Should the generated expressions get updated once a year? If yes, how could we remember to do that?

Attachment #9075236 - Flags: review?(kaie) → review+
Attachment #9025760 - Attachment filename: bug1506587_from_whitespace_spoof.patch → bug1506587_from_whitespace_spoof.patch [checked in]
Attachment #9025760 - Attachment description: bug1506587_from_whitespace_spoof.patch → bug1506587_from_whitespace_spoof.patch [checked in]
Attachment #9025760 - Attachment filename: bug1506587_from_whitespace_spoof.patch [checked in] → bug1506587_from_whitespace_spoof.patch

This patch wouldn't have any effect on spoof.eml since that was fixed already with the earlier patch.
I guess they should get updated. I've added a code comment about that.

Attachment #9075236 - Attachment is obsolete: true
Attachment #9075236 - Flags: feedback?(0xsobky)
Attachment #9075449 - Flags: review+
Keywords: checkin-needed
Attachment #9075449 - Flags: approval-comm-beta+
Comment on attachment 9075449 [details] [diff] [review]
bug1506587_whitespace-spoof-followup.patch

I don't think this is right. This bug is about "white-space" spoofing, not removing a whole range of Unicode characters and potentially obliterating message subjects.

I see no reason to remove most of the Mn class, https://www.fileformat.info/info/unicode/category/Mn/list.htm, and I'd also be more careful with the Cf class, https://www.fileformat.info/info/unicode/category/Cf/list.htm. There are a lot of printable characters or some that add "adjustments" to following characters. "they should be super unusual in practice" is completely unfounded. Please prove that you're not obliterating Arabic writing or anything that uses stuff from \u0300-\u036F.

BTW, you can find a use of ؄ (U+0604 ARABIC SIGN SAMVAT) here: https://archive.urdu.siasat.com/news/%D8%AD%D8%B6%D8%B1%D8%AA-%D8%B9%D9%84%DB%8C-%D9%85%D8%B1%D8%AA%D8%B6%DB%8C%D9%B0-%D8%84-%DA%A9%DB%8C-%D8%AC%D8%A7%DA%BA-%D9%86%D8%AB%D8%A7%D8%B1%DB%8C-730082/ - Search for the character on the page.
Attachment #9075449 - Flags: approval-comm-beta+ → review-
Comment on attachment 9075449 [details] [diff] [review]
bug1506587_whitespace-spoof-followup.patch

Review of attachment 9075449 [details] [diff] [review]:
-----------------------------------------------------------------

::: mailnews/mime/jsmime/jsmime.js
@@ +399,5 @@
> +    // For a full list of categories, see <http://unicode.org/Public//5.0.0/ucd/UCD.html>.
> +
> +    // -- case A: /\p{Zs}/u, except space
> +    // https://www.fileformat.info/info/unicode/category/Zs/list.htm
> +    token = token.replace(/(?![ ])[\xA0\u1680\u2000-\u200A\u202F\u205F\u3000]/g, "");

BTW, this is wrong, too. You're simply removing any space other than the "normal" space, so you will potentially glue words together. What you need to do is replace all "non-standard" spaces and then collapse them to a single one.

You're right that the spacings should be replaced with space.
But for the other classes... well I looked through them even more closely now. I think it's possible that the Mn class removes some things wanted, in some cases. However, some points:

  • they may very well show as space if the font used on the system do not support them
  • for the combiners usually the "already combined" character is used, and the combination way of writing it is quite rare (e.g. a\u0302 vs â, or a\u030A vs å).
    So in summary, I would still remove them for the common good.

For the SAMWAT case you quote, I think that is also not something legit found in a from address. "used for writing Samvat era dates in Urdu" - https://scriptsource.org/cms/scripts/page.php?item_id=character_detail&key=U000604. We don't have so support stuff like that in the display name of emails, when there are clear drawbacks from doing so.

But you plan to cut into JS Mime's Token() function. Is that only used for addressing headers or also the subject header? I added debug in it and only saw parts of addresses go past, no subjects. In this case being more restrictive it possibly OK.

Yes it's used for all headers, but I don't think that makes a significant difference. We want to allow as much as possible, but I'm concerned what is shown would vary from system to system and some of what's showing on my system will be blank on others, and vice versa.

Hmm, I'm not sure why you're saying that it's used for all headers since that would make the patch less viable. I rebuilt the MSF file and still didn't see any subjects going past with my debugging added in that function.

As you know, subjects can be encoded in raw UTF-8 and we really shouldn't mess with that too much, if at all.

By the time they get to new Token, it is always UTF-8, otherwise this whole thing would be pretty useless.
It is only used for addresses, dates, and content type (looking at it more closely).

Or well, decoded from utf-8 to the internal string representation.

Just removing some (well, a lot) for the Mn class

Attachment #9075449 - Attachment is obsolete: true
Attachment #9078646 - Flags: review?(jorgk)
Comment on attachment 9078646 [details] [diff] [review]
bug1506587_whitespace-spoof-followup.patch

Some questions:

Can you add some tests like this:
```
+      // Collapse multiple "special" spaces like NBSP (\u00A0), EM space (\u2003), etc.
+      ["Friend \u00A0\u00A0\u2003 A <ws@example.com>",
+        [{name: "Friend A", email: "ws@example.com"}]],
+      // Maintain tabs.
+      ["Friend \t \t A <ws@example.com>",
+        [{name: "Friend \t \t A", email: "ws@example.com"}]],
```

Why are we maintaining tabs and not replace them with spaces?

Why are we replacing \u0600-\u0605\, they seem printable. As I said, I found a webpage with ؄, so why wouldn't that be valid in a display name?

Can you explain in a way I can retrace the steps where `\uD804[\uDCBD\uDCCD]|\uD80D[\uDC30-\uDC38]|\uD82F[\uDCA0-\uDCA3]|\uD834[\uDD73-\uDD7A]|\uDB40[\uDC01\uDC20-\uDC7F]` came from ... and add a comment explaining it.

EDIT: OK, it mentions "surrogate code point pairs", but a reference how those are generated would be useful.
Attachment #9078646 - Flags: review?(jorgk)

(In reply to Jorg K (GMT+2) from comment #44)

Can you add some tests like this:

We already have that, though I added some more tabs now

Why are we maintaining tabs and not replace them with spaces?

We aren't but didn't want to mess with them in the token stage since they can have meaning, and of course, since for the replacing of their class, we probably want them to be " " and not "".

Why are we replacing \u0600-\u0605, they seem printable. As I said, I found
a webpage with ؄, so why wouldn't that be valid in a display name?

Like I wrote in comment 37, that would be a rather strange display name. At any case, they are all formatting per definition, so we can certainly drop them for reasons of getting unformatted output too.

Can you explain in a way I can retrace the steps where
\uD804[\uDCBD\uDCCD]|\uD80D[\uDC30-\uDC38]|\uD82F[\uDCA0- \uDCA3]|\uD834[\uDD73-\uDD7A]|\uDB40[\uDC01\uDC20-\uDC7F] came from ... and
add a comment explaining it.

There is a comment already, I was using the regexpu tool - https://mothereff.in/regexpu#input=/\p{Cf}/u&unicodePropertyEscape=1

EDIT: OK, it mentions "surrogate code point pairs", but a reference how
those are generated would be useful.

For the surrogate point pairs, I just looked up what is at e.g. https://www.fileformat.info/info/unicode/char/e0100/index.htm - I assume there is some other way to get it too, but this was the easiest I found.

Attachment #9078646 - Attachment is obsolete: true
Attachment #9079004 - Flags: review?(jorgk)
Comment on attachment 9079004 [details] [diff] [review]
bug1506587_whitespace-spoof-followup.patch

Review of attachment 9079004 [details] [diff] [review]:
-----------------------------------------------------------------

We don't have a test for multiple "strange" spaces to be collapsed like I suggested: "Friend \u00A0\u00A0\u2003 A <ws@example.com>", at least I can't see it. Extra testing doesn't hurt.

I'm confused about the tabs now. Weren't you saying that you didn't want to mess with them? Now you added some to the input and they're not expected in the output any more. Does that test still pass?

I still think removing the subtending marks \u0600-\u0605 has any benefit. So if in doubt, they should stay.

::: mailnews/mime/jsmime/jsmime.js
@@ +858,5 @@
>                   addrSpec.substring(addrSpec.lastIndexOf("@"));
>      }
>  
>      // Replace consecutive whitespace in the name with a single whitespace.
> +    displayName = displayName.replace(/\s+/g, " ").trim();

Why this change now? I already questioned this in comment #7 and you answered in comment #11. What's changed now to change your mind?

Interesting, changing to displayName = displayName.replace(/\s+/g, " ").trim(); now makes all tabs disappear, when before only some disappeared:
https://bugzilla.mozilla.org/attachment.cgi?oldid=9078646&action=interdiff&newid=9079004&headers=1
There were already some tabs in the test case before you added more. Also, my test in comment #44 passed, so those tabs weren't collapsed with the other white-space before.

That's certainly a plus. I still think some additional test cases to "visualise" what's happening wouldn't be bad. I've adjusted my additions to the new behaviour:

+      // Collapse multiple "special" spaces like NBSP (\u00A0), EM space (\u2003), etc.
+      ["Friend \u00A0\u00A0\u2003 A <ws@example.com>",
+        [{name: "Friend A", email: "ws@example.com"}]],
+      // Remove tabs.
+      ["Friend \t \t A\t\tB <ws@example.com>",
+        [{name: "Friend A B", email: "ws@example.com"}]],

So with these added and \u0600-\u0605 removed since we're really looking to avoid white-space spoofing, not eliminate characters that we deem not useful, that would be an r+, assuming that the surrogates are correct, which I haven't checked yet.

(In reply to Jorg K (GMT+2) from comment #47)

Comment on attachment 9079004 [details] [diff] [review]
bug1506587_whitespace-spoof-followup.patch

Review of attachment 9079004 [details] [diff] [review]:

We don't have a test for multiple "strange" spaces to be collapsed like I
suggested: "Friend \u00A0\u00A0\u2003 A <ws@example.com>", at least I can't
see it. Extra testing doesn't hurt.

I'm confused about the tabs now. Weren't you saying that you didn't want to
mess with them?

I didn't want to mess with them inside the Token, but prefer to clean them up later (like the patch does).

I still think removing the subtending marks \u0600-\u0605 has any benefit.

 // Replace consecutive whitespace in the name with a single whitespace.
  • displayName = displayName.replace(/\s+/g, " ").trim();

Why this change now? I already questioned this in comment #7 and you
answered in comment #11. What's changed now to change your mind?

I realized we wanted also a single non-space space (like tab or \n) converted to " " if it happens to be there.

Attachment #9079004 - Attachment is obsolete: true
Attachment #9079004 - Flags: review?(jorgk)
Attachment #9079129 - Flags: review?(jorgk)

Nice. I tweaked the comments a bit. Any reason some links had < > and others didn't?

EDIT: See my changes here: https://bugzilla.mozilla.org/attachment.cgi?oldid=9079129&action=interdiff&newid=9079136&headers=1

Attachment #9079129 - Attachment is obsolete: true
Attachment #9079129 - Flags: review?(jorgk)
Attachment #9079136 - Flags: review+

So reason really.

You mean: No reason. Anyway, consistent now and with added links for get all that crazy stuff.

Attachment #9079136 - Flags: approval-comm-esr68+
Attachment #9079136 - Flags: approval-comm-beta+

Let's leave the target milestone at TB 65. Second part landed on TB 70, will backport to TB 69 beta and TB 68 ESR.

Status: REOPENED → RESOLVED
Closed: 2 years ago1 year ago
Resolution: --- → FIXED
Group: mail-core-security → core-security-release

After reading a little about unicode normalisation here: https://en.wikipedia.org/wiki/Unicode_equivalence#Example I can only confirm that the characters from the Mn list https://www.fileformat.info/info/unicode/category/Mn/list.htm must not be stripped (as originally proposed).

I'm still worried that the bit of stripping we do here
https://hg.mozilla.org/comm-central/rev/3ed8b845bb24fb8a1b48c7551d1cad979bdfdc27#l2.34
token = token.replace(/[\u034F\u17B4\u17B5\u180B-\u180D\uFE00-\uFE0F]/g, "");
could do damage.

Personally, I'd remove stripping characters from the Mn list completely, but we could keep stripping \uFE00-\uFE0F.

Magnus? Or Henri, do you have any insights here?

Flags: needinfo?(mkmelin+mozilla)
Flags: needinfo?(hsivonen)

Even if there's theoretical damage, that damage would be rather minor. We haven't heard of any problems in the year it's been landed either. It would seem safer to me to strip more rather than less.

What's the new info?

Flags: needinfo?(mkmelin+mozilla)

Removing the Mn class as a whole seems like a weird thing to do. Comment 12 does not fully why it is removed. Why is it removed?

Flags: needinfo?(hsivonen)

The new info is that the Mn list is there for a reason and characters we don't understand should be stripped. Of course raw UTF-8 is rare in mail headers and those characters don't affect our main user groups in the US, Europe and Japan. That doesn't prove that the software is correct though.

Removing the Mn class as a whole seems like a weird thing to do.

Henri, that was the initial plan in attachment 9075449 [details] [diff] [review] but fortunately that didn't go ahead. Now we're down to stripping just a tiny bit, see comment #59. So what's your opinion about stripping this:
token = token.replace(/[\u034F\u17B4\u17B5\u180B-\u180D\uFE00-\uFE0F]/g, "");

Flags: needinfo?(hsivonen)

token = token.replace(/[\u034F\u17B4\u17B5\u180B-\u180D\uFE00-\uFE0F]/g, "");

Is the idea here that these are default ignorables that aren't in the other classes?

As I understand it, default ignorables should be ignorable in rendering, but that doesn't mean they are freely omissible from the underlying data. Could the resulting string compare equal with something it should not compare equal with if these are removed? As I understood this bug, the problem was rendering spaces that push stuff out of view. AFAICT, these characters don't take display space. Why are these a problem here?

Flags: needinfo?(hsivonen)

Thanks for the answer, Henri. As I said in comment #59, I'd remove that particular line completely since it's there for no good reason.

If they are not included, you can trick the multi-space removal with data that has <space><invisible><space>.

Attached image screenshot.png

I just retested this issue on the latest Thunderbird releases—v68.5.0 (stable) and v75.0a1 (daily). I used a few space characters (U+0020) mixed with zero-width spaces (U+200B), but Thunderbird failed to strip them out as shown in the screenshot (while consecutive spaces '<space><space><space>' seem to get stripped out successfully, character sequences like '<space><zero-width-space><space>' don't get stripped).

Also, while this issue has been marked as fixed for a while, I haven't seen it mentioned anywhere in the release notes: https://www.mozilla.org/en-US/security/known-vulnerabilities/thunderbird/

Attached file testcase.eml

Attaching a new test case file which you could use to reproduce the issue.

See Also: → CVE-2020-12397

Thanks! Let's continue in bug 1617370.

Attachment #9128245 - Attachment mime type: message/rfc822 → text/plain
Group: core-security-release
You need to log in before you can comment on or make changes to this bug.