Closed Bug 1141446 Opened 9 years ago Closed 9 years ago

Behaviour of malformed rfc2047 encoded From message header inconsistent

Categories

(MailNews Core :: MIME, defect)

defect
Not set
normal

Tracking

(thunderbird38+ fixed, thunderbird39 fixed, thunderbird40 fixed, thunderbird_esr31 unaffected)

RESOLVED FIXED
Thunderbird 40.0
Tracking Status
thunderbird38 + fixed
thunderbird39 --- fixed
thunderbird40 --- fixed
thunderbird_esr31 --- unaffected

People

(Reporter: jorgk-bmo, Assigned: jorgk-bmo)

Details

(Keywords: regression)

Attachments

(9 files, 8 obsolete files)

18.57 KB, image/png
Details
25.61 KB, application/zip
Details
18.90 KB, image/png
Details
5.88 KB, application/zip
Details
429.05 KB, application/x-7z-compressed
Details
100.98 KB, image/png
Details
3.62 KB, patch
Details | Diff | Splinter Review
3.85 KB, patch
Details | Diff | Splinter Review
3.86 KB, patch
rkent
: review+
Details | Diff | Splinter Review
rfc2047 encoded message headers, which are mal-formed, are treated inconsistently by TB. Enclosed are four messages with this from header:

From: "=?UTF-8?Q?S=C3=A1nchez_Edgar?= edgsanchezg@hotmail.com [Barcelona-Freecycle]" <Barcelona-Freecycle-noreply@yahoogroups.com>

The quotes should not be there.

In the *message list* for one message TB displays the undecoded text, for the others it shows "Sánchez Edgar". This behaviour is consistent for TB 24 to TB 38. I tried 24, 31, 36 und 38.

In the message preview or when opening the message, the different versions do different things: TB 24 and TB 31 show
  From "Sánchez Edgar"
TB 36 and further show the undecoded text.

Import from EML files:
Importing into TB 31, all messages are now displayed good.
Importing into TB 36, newly imported messages display bad, the ones imported before with TB 31 still show good, see screenshot.

This is very confusing and inconsistent behaviour.

It would be good if all messages were always displayed well, despite the mal-formed headers.
Another observation:
If in TB 31, 36 or 38 you select a set of these messages, so a summary is displayed in the preview pane, the decoded "Sánchez Edgar" is shown for all messages (however, in TB 36 the accented character doesn't show correctly, most likely due to bug 1082896, which is fixed in TB 38).
There is something really weird going on here. In TB 36 I revisited the messages previously showing correctly, and now they are *all* showing bad.

This is the reproducible case:
Import the four attached messages from the ELM files into TB 31.5 using drag from a file system folder onto the TB window. They should show correctly.
Start TB 36 or 38. These message should show still correctly. Now import the same messages in the same way. They should look bad.
This is what is shown in the attached screenshot.
Flags: needinfo?(Pidgeot18)
Keywords: regression
Would be good if TB 38 didn't look worse than TB 31 ;-)
This is seriously weird. Here the From address shows correctly in the message list, it shows bad in the message pane.

I've had cases where the From header looked bad in the message list and a "Repair Folder" fixed it. I tried again, and the second time, it didn't fix it.

Hard to find a reproducible case. To me it looks like some caching problem, since at times the From header is shown correctly, other times not.
OK. Things are getting clearer. The information in the message list is displayed from the MSF file. So what matters is with which version the message was received or imported. That explains the observation in comment #3.

Most likely it's just a JSMime regression. Joshua Cranmer pointed me to:
https://dxr.mozilla.org/comm-central/source/mailnews/mime/jsmime/jsmime.js#375

There we decode headers of the form
From: "=?UTF-8?Q?S=C3=A1nchez_Edgar?="
so with quotes, but =? and ?= at the start/end of the string
(if I read it correctly).

The reported case:
From: "=?UTF-8?Q?S=C3=A1nchez_Edgar?= more stuff" and even more stuff
is not handled.
Assignee: nobody → mozilla
Attached patch Proposal for discussion (obsolete) — Splinter Review
This is a simple one line change to be more aggressive with rfc2047 decoding.
Please let me know what you think.
Attachment #8577674 - Flags: feedback?(Pidgeot18)
Here's a message, where the decoder worked in 24.x (before jsmime) and now it doesn't. From header is:

From: "=?iso-8859-1?Q?Daniel_Rius_garcer=C3=A1n?= daniriusgarceran@yahoo.es [Barcelona-Freecycle]" <Barcelona-Freecycle-noreply@yahoogroups.com>

OK, this header is wrong, it should be UTF-8, but how come it worked before?
Some more testing: This mailbox contains 332 messages with rfc2047 encoded From field. With the patch in attachment 8577674 [details] [diff] [review] all but one message display fine. The one that doesn't display "correctly" is the one mentioned in comment #8. Maybe we can live with the fact that a "really bad" rfc2047 now doesn't decode correctly any more.
When multiple messages are selected, the "conversations" displayed in the message pane are also displayed correctly.
Summary: Behaviour of mal-formed rfc2047 encoded message header inconsistent → Behaviour of mal-formed rfc2047 encoded From message header inconsistent
Attached patch Proposed patch (obsolete) — Splinter Review
Attachment #8577674 - Attachment is obsolete: true
Attachment #8577674 - Flags: feedback?(Pidgeot18)
Attachment #8584753 - Flags: review?(Pidgeot18)
Comment on attachment 8584753 [details] [diff] [review]
Proposed patch

This is really a one-line change:
Old:
if (opts.rfc2047 && text.startsWith("=?") && text.endsWith("?="))
New:
if (opts.rfc2047 && text.indexOf("=?") > -1 && text.indexOf("?=") > -1)

In other words: Instead of only doing the decoding when the text starts and ends with the rfc2047 markers =? and ?=, we always (try to) decode when these markers are found anywhere in the text.

Sorry for being impatient, but we need this for TB 38 right about *now* ;-)
Attachment #8584753 - Flags: review?(kent)
Comment on attachment 8584753 [details] [diff] [review]
Proposed patch

Review of attachment 8584753 [details] [diff] [review]:
-----------------------------------------------------------------

Just because we need it now doesn't mean you can skimp on tests.

::: mailnews/mime/jsmime/jsmime.js
@@ +372,5 @@
>          // Quoted strings don't include their delimiters.
>          let text = value.slice(tokenStart + 1, i);
>  
> +        // If RFC 2047 is enabled, always decode the qstring if a both 2047 tokens are found
> +        if (opts.rfc2047 && text.indexOf("=?") > -1 && text.indexOf("?=") > -1)

First off, you're not explaining the rationale of the test.

Secondly, the test is not correct. (It may be simpler to just unconditionally decode the string).

Finally, this needs tests.
Attachment #8584753 - Flags: review?(kent)
Attachment #8584753 - Flags: review?(Pidgeot18)
Attachment #8584753 - Flags: review-
Can you propose a patch then please.
I'm not at all familiar with (automated?) testing in Thunderbird.
Assignee: mozilla → nobody
Attached patch Proposed patch, second attempt (obsolete) — Splinter Review
Here is a second patch with unconditional decoding.
This *may* be what Joshua Cranmer had in mind, sadly, he didn't give exact instructions.

I don't know how to add an automated test. I tested it manually with the mailbox in attachment 8577713 [details]: Delete the MSF and open the mailbox. All sender addresses look good.

As Joshua confirmed in his comment #12, we need this now, or else a problem will regress into TB 38. I believe we should be pragmatic and if need be, add a test later.

Sadly I can't do anything else, so I unassigned myself from the bug.
(In reply to Jorg K from comment #13)
> Can you propose a patch then please.
> I'm not at all familiar with (automated?) testing in Thunderbird.

Fortunately, this is easy for this sort of patch to jsmime: just add some extra entries to the array here:
<https://dxr.mozilla.org/comm-central/source/mailnews/mime/jsmime/test/test_header.js#341>.

You can test that the tests pass with |mozilla/mach xpcshell-test mailnews/mime|
Flags: needinfo?(Pidgeot18)
OK, thanks, but I still don't know where in the file you want the tests to go.
Here are the tests, can you please add them to avoid further delay (I will be travelling for the next few days):

Header, includes the quotes:
From: "=?UTF-8?Q?Jordi_Fern=C3=A1ndez_Nieto?= jordi.fernandez.nieto@gmail.com [Barcelona-Freecycle]"
From: "=?UTF-8?B?TWlyaWFtIEJlcm5hYsOpIFBlcmVsbMOz?= miriambernabe@gmail.com [Barcelona-Freecycle]"
From: "=?iso-8859-1?Q?Juana_Mar=EDa_Furi=F3_Sancho?= shangaguay@yahoo.es [Barcelona-Freecycle]"
From: "=?iso-8859-1?B?U29maWEgQ2FzdGVsbPMgUm9tZXJv?= soniasofia2@hotmail.com [Barcelona-Freecycle]"

Expected:
Jordi Fernández Nieto jordi.fernandez.nieto@gmail.com [Barcelona-Freecycle]
Miriam Bernabé Perelló miriambernabe@gmail.com [Barcelona-Freecycle]
Juana María Furió Sancho shangaguay@yahoo.es [Barcelona-Freecycle]
Sofia Castelló Romero soniasofia2@hotmail.com [Barcelona-Freecycle]
If you don't like the comments, I'll fix them.
Please tell me *exactly* what you want.
Attachment #8586091 - Attachment is obsolete: true
Attachment #8586986 - Flags: review?(Pidgeot18)
Component: Mail Window Front End → MIME
Product: Thunderbird → MailNews Core
Target Milestone: --- → Thunderbird 38.0
Version: 31 → 38
Comment on attachment 8586986 [details] [diff] [review]
Proposed patch, third attempt, now with tests

Review of attachment 8586986 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good, but the patch does need some nits fixed before it can land.

::: mailnews/mime/jsmime/jsmime.js
@@ +371,5 @@
>        if (ch == endQuote && ch == '"') {
>          // Quoted strings don't include their delimiters.
>          let text = value.slice(tokenStart + 1, i);
>  
> +        // If RFC 2047 is enabled, always decode the qstring

Nit: period.

::: mailnews/mime/jsmime/test/test_header.js
@@ +349,5 @@
>        ["=?UTF-8?Q?<?= <a@b.c>", [{name: "<", email: "a@b.c"}]],
>        ["Simple =?UTF-8?Q?<?= a@b.c>",
>          [{name: "", email: '"Simple < a"@b.c'}]],
>        ["Tag <=?UTF-8?Q?email?=@b.c>", [{name: "Tag", email: "email@b.c"}]],
> +	  // Bug 1141446: mal-formed From addresses with erroneous quotes

Nit: "malformed"

Also, please replace the tabs with spaces, and keep to an 80-character maximum line length.
Attachment #8586986 - Flags: review?(Pidgeot18) → feedback+
Attached patch Proposed patch, after review (obsolete) — Splinter Review
Thanks!
Sorry about the tabs.
The longest line is now 85 characters long, I hope that's acceptable as line 104 was already 86 characters long (I thought the days of the teletype (24x80) are over) ;-)
Attachment #8584753 - Attachment is obsolete: true
Attachment #8586986 - Attachment is obsolete: true
Attachment #8594393 - Flags: review?(Pidgeot18)
Summary: Behaviour of mal-formed rfc2047 encoded From message header inconsistent → Behaviour of malformed rfc2047 encoded From message header inconsistent
(In reply to Jorg K (at the beach until 22nd April) from comment #20)
> Thanks!
> Sorry about the tabs.
> The longest line is now 85 characters long, I hope that's acceptable as line
> 104 was already 86 characters long (I thought the days of the teletype
> (24x80) are over) ;-)

I do most of my development using vim in tiled 24x80 xterms...
I used to be a 'vi' man, too, but now I like using Notepad++ on Windows ;-)
I hope you can accept the slightly overlong lines.
Attachment #8594393 - Attachment is obsolete: true
Attachment #8594393 - Flags: review?(Pidgeot18)
Attachment #8594429 - Flags: review?(Pidgeot18)
Comment on attachment 8594429 [details] [diff] [review]
Proposed patch, after review, again, this time with shorter lines

Review of attachment 8594429 [details] [diff] [review]:
-----------------------------------------------------------------

::: mailnews/mime/jsmime/test/test_header.js
@@ +349,5 @@
>        ["=?UTF-8?Q?<?= <a@b.c>", [{name: "<", email: "a@b.c"}]],
>        ["Simple =?UTF-8?Q?<?= a@b.c>",
>          [{name: "", email: '"Simple < a"@b.c'}]],
>        ["Tag <=?UTF-8?Q?email?=@b.c>", [{name: "Tag", email: "email@b.c"}]],
> +    // Bug 1141446: malformed From addresses with erroneous quotes

This block of text is still improperly formatted, unfortunately.
Attachment #8594429 - Flags: review?(Pidgeot18) → feedback+
Sorry, I don't follow.

The previous review said:
> > +	  // Bug 1141446: mal-formed From addresses with erroneous quotes
> Nit: "malformed"

I fixed that. What else don't you like? Why don't you just say?
Would you like "Malformed" or perhaps a period at the end, it's not a full sentence though?

Most lines have a period, but not all:
// Make sure the input is an array (accept a single entry)
// Unstructured headers (just decode RFC 2047 for the first header value)
// Make the buffer be a typed array for what follows
*   name: The display name of the address
*   email: The address of the object
*   name: The display name of the group
*   group: An array of address object for members in the group.
* @param {String} headerValue The MIME header value to parse.
* @param {Boolean} doRFC2047  If true, decode RFC 2047 encoded-words.
* @param {Boolean} doRFC2231  If true, decode RFC 2231 encoded parameters.
* @return {Map(String -> String)} A map of parameter names to parameter values.

Let me know the rules that you follow and I adhere to them.
Attachment #8594429 - Attachment is obsolete: true
Attachment #8596351 - Flags: review?(Pidgeot18)
(In reply to Jorg K from comment #25)
> Let me know the rules that you follow and I adhere to them.

The code you have looks like this:
.....
      ["Simple =?UTF-8?Q?<?= a@b.c>",
        [{name: "", email: '"Simple < a"@b.c'}]],	
        [{name: "", email: '"Simple < a"@b.c'}]],
      ["Tag <=?UTF-8?Q?email?=@b.c>", [{name: "Tag", email: "email@b.c"}]],	
      ["Tag <=?UTF-8?Q?email?=@b.c>", [{name: "Tag", email: "email@b.c"}]],
    // Bug 1141446: Malformed From addresses with erroneous quotes,
    ["name

Notice how your new lines fail to match up? That's the formatting issues I complained about, now for the third time. The first time you used tabs, then you converted to spaces, but you failed to convert to enough spaces to cause the lines to match up properly.
Attachment #8596351 - Flags: review?(Pidgeot18) → review-
Sorry about the incorrect indent of the inserted block.
Attachment #8596351 - Attachment is obsolete: true
Attachment #8596365 - Flags: review?(Pidgeot18)
Assignee: nobody → mozilla
Status: NEW → ASSIGNED
OS: Windows 7 → All
Hardware: x86_64 → All
Target Milestone: Thunderbird 38.0 → ---
Attached patch UnbitrottedSplinter Review
I think I can review the remaining issues, which is formatting. Unfortunately if I am going to add my name, then I am going to add my standards.

Are the names in question like jordi.f.nieto@gmail.com and miriam@gmail.com names of real valid users? If so, I don't think we should be adding them to our codebase. I don't mind Barcelona-Freecycle-noreply@yahoogroups.com as I assume it is a public group. All you have to do is to change gmail.com to example.com and change the user part, so jordi.f.nieto@gmail.com can be aaaaa.b.ccccc@example.com and vary parts of the names, so for example "Jordi Fern" can be something like "Jazzy Plant" or "Aaaaa Bbbbb"
Attachment #8602407 - Flags: review?(rkent)
Hi Kent, good to see some movement here. Here is an updated patch.
Remarks:
1) I made the changes in a text editor. I am in Spain with the travel laptop.
   I have only Firefox compiled, not Thunderbird, so I can't run the test
   (and I'd like to avoid a 90 minute compilation just for that):
   |mozilla/mach xpcshell-test mailnews/mime|
2) The names are real, the e-mail address had already been changed/shortened,
   to comply with the holy 80 character per line limit.
3) I changed gmail.com and hotmail.com to example.com, of course that results
   in a line that is now 81 characters long.
4) I changed Jordi to Jazzy and Juana to First, I can't change the other two
   since their complete names are ?B? (binary) encoded.
5) There is no Fern, it's part of Fernández.
6) There is no privacy issue here, since the names are visible on Yahoo Groups:
   https://es.groups.yahoo.com/neo/groups/Barcelona-Freecycle/conversations/messages
7) Am I really surprised how stuff was checked in, that clearly violates the
   80 character rule, see bug 1069790 comment #21. I think this rules is antiquated,
   but I don't understand why some and not others must adhere to it :-((
Attachment #8602524 - Flags: review?(rkent)
Attachment #8602524 - Attachment is obsolete: true
Attachment #8602524 - Flags: review?(rkent)
Attachment #8602542 - Flags: review?(rkent)
I did the 72 minute compile (after running into bug 777770) and ran the test:
mozilla/mach xpcshell-test mailnews/mime

Summary
=======
Ran 26 tests
Expected results: 26
Unexpected results: 0
OK

So it's ready to land, finally ...
Attachment #8602542 - Flags: review?(rkent)
Attachment #8602542 - Flags: review+
Attachment #8602542 - Flags: approval-comm-beta?
Attachment #8602542 - Flags: approval-comm-aurora?
Comment on attachment 8602542 [details] [diff] [review]
Obscured names and e-mail addresses, even more obfuscated

https://hg.mozilla.org/releases/comm-beta/rev/47119288c8c5
https://hg.mozilla.org/releases/comm-aurora/rev/6f83b5819850
Attachment #8602542 - Flags: approval-comm-beta?
Attachment #8602542 - Flags: approval-comm-beta+
Attachment #8602542 - Flags: approval-comm-aurora?
Attachment #8602542 - Flags: approval-comm-aurora+
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → Thunderbird 40.0
Attachment #8602407 - Flags: review?(rkent)
Attachment #8596365 - Flags: review?(Pidgeot18)
Unless I'm blind, this never landed on trunk (TB40)
Flags: needinfo?(rkent)
OK, it got landed, I pulled it back to my local tree, but you didn't post the changeset:
https://hg.mozilla.org/comm-central/rev/df51276ef640
Flags: needinfo?(rkent)
You need to log in before you can comment on or make changes to this bug.