Closed Bug 1023285 Opened 10 years ago Closed 8 years ago

Message preview pane header has encoding problem (When RSS feed has not-rfc2047-encoded display-name of mail address and raw UTF-8 binary of « and » is contained in display-name part, mail address display at header pane is garbled)

Categories

(MailNews Core :: Backend, defect)

defect
Not set
normal

Tracking

(thunderbird38-, thunderbird47 wontfix, thunderbird48 wontfix, thunderbird49 fixed, thunderbird50 fixed, thunderbird_esr31 unaffected, thunderbird_esr38+ wontfix, thunderbird_esr4550+ fixed)

RESOLVED FIXED
Thunderbird 50.0
Tracking Status
thunderbird38 - ---
thunderbird47 --- wontfix
thunderbird48 --- wontfix
thunderbird49 --- fixed
thunderbird50 --- fixed
thunderbird_esr31 --- unaffected
thunderbird_esr38 + wontfix
thunderbird_esr45 50+ fixed

People

(Reporter: cakbugzilla, Assigned: jorgk-bmo)

References

Details

(Keywords: regression)

Attachments

(2 files, 13 obsolete files)

1.54 KB, message/rfc822
Details
1.47 KB, patch
rkent
: review+
Details | Diff | Splinter Review
Attached image EarlyBird.png (obsolete) —
User Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:31.0) Gecko/20100101 Firefox/31.0 (Beta/Release)
Build ID: 20140609004001

Steps to reproduce:

Mozilla/5.0 (Windows NT 6.1; WOW64; rv:32.0) Gecko/20100101 Thunderbird/32.0a2
Application Build: 20140610004001

After the latest update of Earlybird (June 10, 2014) the message preview pane header has a problem with the Greek encoding. 


Actual results:

Unreadable characters in message preview pane header. (Please see attachment).


Expected results:

The header up until before the update was working fine.
Because this is email, any data shown by mail viewing comes from mail data stream, and the "mail data stream" can be known by "View/Message Source" always, and From: field of Message Header Pane & From column value at Thread pane comes from From: header in the "mail data stream".

What "From: header" is seen in message source?
If non-ascii is contained in data of From: header, is it correctly encoded?

How about Subject: header? Non-ascii characters correctly encoded? What charset is specified in Content-Type: header of the mail or of "message body text part" if multipart mail?
Attached image EarlyBird2.png (obsolete) —
PNG of Source of message shown in attachment EarlyBird.png (8437681)
(In reply to WADA from comment #1)
> Because this is email, any data shown by mail viewing comes from mail data
> stream, and the "mail data stream" can be known by "View/Message Source"
> always, and From: field of Message Header Pane & From column value at Thread
> pane comes from From: header in the "mail data stream".
> 
> What "From: header" is seen in message source?
> If non-ascii is contained in data of From: header, is it correctly encoded?
> 
> How about Subject: header? Non-ascii characters correctly encoded? What
> charset is specified in Content-Type: header of the mail or of "message body
> text part" if multipart mail?

Hi!

Please see attachment EarlyBird2.png for an answer to your questions. I'm not sure if it makes a difference but it's not exactly an e-mail. It's an RSS news article.

Thank you!
(In reply to CAK from comment #3)
"Message Source" is text. So, "Copy&Paste in comment", "save as .eml file & Edit by Text Editor," etc. is possible and it's pretty easy. And, if saved as .eml file & Edited by Text Editor, "attaching .eml file to this bug" is also pretty easy. And, if original mail source file is attached to bug, many peoples can do duplication test.
Why do you use image even though text data and messeage source?

Content-Type: text/html; charset=utf-8, and HTML source is shown as expected.
i.e. Data of this mail is utf-8 binary, if characters in HTML is displayed as expected. 
i.e. display-name part of From: header, text in Subject: header, is written in utf-8 binary, and is not correctly encoded by RFC2047.
> http://tools.ietf.org/html/rfc2047
i.e. Essentially fault of mail sender.

Reason why Subject is shown as you expect at Thread Pane.
   Quirks for "non encoded non-ascii string in Subject:" works.
   When Subject: is not RFC2047 encoded, Thunderbird doesn't convert Subject: data from ascii to utf-8.
   Tb saves it as-is, and apply charset of mail(=utf-8 in your case) upon display.
Reason why Subject: is shown as you expect at message header pane.
  In Message header pane display, when Subject: is not RFC2047 encoded, data of Subject: header is placed in HTML element
  such as <p>, <span>, as-is.
  And charset of mail(=utf-8 in your case) is applied.  So, utf-8 binary in Subject: is shown as utf-8.
When From:(and To:/CC: display), in both Thread Pane and Message Header Pane, different process is used, because "replacement by Display: in Address Book" is applied and because "IDN support on addr-spec of mail address" will be needed.

(Q1) Quirks for "non encoded non-ascii string in display-name of From:/To:/CC:" worked in former builds of Tb?

(Q2) Who received this mail?
If received by older Thunderbird who showed the From: as you expect, data used for From column of Thread Pane is cached by the older Thunderbird. 
Is From: display changed in current Thunderbird by next?
- Tools/Opsions/Advanced/Readig&Display, Display,
   Check "Show only..." option, Uncheck  "Show only..." option again, OK and close option setting.
   Please keep "Show only..." option Unchecked, in order to make it simple, to surely rule out Address Book relevant issues.

(Q3) Is From: display changed in current Thunderbird by next? How about From column of Thread Pane?
- Create a local mail folder(call FolderX), 
  Folder Properties/General, Default character encoding: utf-8, Check [x] Apply default ... (may be "fall back charset" in trunk)
  This is option for such "not correctly encoded" mail, mail of "wrong charset which is set by mail sender".
- Copy the mail to FolderX, Folder Properties/General, Repair Folder
Attached image Properties window of FolderX (obsolete) —
(In reply to WADA from comment #4)
> (In reply to CAK from comment #3)
> "Message Source" is text. So, "Copy&Paste in comment", "save as .eml file &
> Edit by Text Editor," etc. is possible and it's pretty easy. And, if saved
> as .eml file & Edited by Text Editor, "attaching .eml file to this bug" is
> also pretty easy. And, if original mail source file is attached to bug, many
> peoples can do duplication test.
> Why do you use image even though text data and messeage source?
> 
> Content-Type: text/html; charset=utf-8, and HTML source is shown as expected.
> i.e. Data of this mail is utf-8 binary, if characters in HTML is displayed
> as expected. 
> i.e. display-name part of From: header, text in Subject: header, is written
> in utf-8 binary, and is not correctly encoded by RFC2047.
> > http://tools.ietf.org/html/rfc2047
> i.e. Essentially fault of mail sender.
> 
> Reason why Subject is shown as you expect at Thread Pane.
>    Quirks for "non encoded non-ascii string in Subject:" works.
>    When Subject: is not RFC2047 encoded, Thunderbird doesn't convert
> Subject: data from ascii to utf-8.
>    Tb saves it as-is, and apply charset of mail(=utf-8 in your case) upon
> display.
> Reason why Subject: is shown as you expect at message header pane.
>   In Message header pane display, when Subject: is not RFC2047 encoded, data
> of Subject: header is placed in HTML element
>   such as <p>, <span>, as-is.
>   And charset of mail(=utf-8 in your case) is applied.  So, utf-8 binary in
> Subject: is shown as utf-8.
> When From:(and To:/CC: display), in both Thread Pane and Message Header
> Pane, different process is used, because "replacement by Display: in Address
> Book" is applied and because "IDN support on addr-spec of mail address" will
> be needed.
> 
> (Q1) Quirks for "non encoded non-ascii string in display-name of
> From:/To:/CC:" worked in former builds of Tb?
> 
> (Q2) Who received this mail?
> If received by older Thunderbird who showed the From: as you expect, data
> used for From column of Thread Pane is cached by the older Thunderbird. 
> Is From: display changed in current Thunderbird by next?
> - Tools/Opsions/Advanced/Readig&Display, Display,
>    Check "Show only..." option, Uncheck  "Show only..." option again, OK and
> close option setting.
>    Please keep "Show only..." option Unchecked, in order to make it simple,
> to surely rule out Address Book relevant issues.
> 
> (Q3) Is From: display changed in current Thunderbird by next? How about From
> column of Thread Pane?
> - Create a local mail folder(call FolderX), 
>   Folder Properties/General, Default character encoding: utf-8, Check [x]
> Apply default ... (may be "fall back charset" in trunk)
>   This is option for such "not correctly encoded" mail, mail of "wrong
> charset which is set by mail sender".
> - Copy the mail to FolderX, Folder Properties/General, Repair Folder

Hi!

I have to say that since English is not my mother tongue and there are technical terms that I'm not familiar with I may have misunderstood some of your points/questions but here are my comments/answers:

I used an image because I thought that if I pasted a text file of the source it would be mis-encoded at your end. I guess I was wrong so I have attached two new attachments (Politis.eml and Politis_Source_UTF-8.txt) with the message itself and the source as text. I hope this is what you wanted.

I'm not very literate with HTML matters so I can't follow your reasoning of encoding in the header section. I understand that the problem is to the mail sender?

Q1: Former builds of TB are attached (Attachments: ThunderBird_Ver24.5 and ThunderBird_Ver24.6). In both it shows alright.

Q2: It makes no difference to the end result. New or Old RSS feeds messages appear the same in EarlyBird.
I followed your instructions regarding Tools/Options/Advanced/Reading&Display and it didn't change anything. The Sender is not in my Address book as the messages are NOT e-mail messages.

Q3: Yes. The display has changed with the latest version of EarlyBird.
The From column of Thread Pane (in EarlyBird) shows fine.

I followed your instructions for the creation of FolderX. Unfortunately I cannot set the Default character encoding in the Properties (please see attachment). I attached a capture of the Properties Window. The dropdown button as it appears on it will not work i.e. It doesn't drop down when clicked.

I copied the news article to FolderX and repaired the folder with no change whatsoever.
Hello again!

Here is the source of a message received from another Greek source that SHOWS FINE in the Message Preview Pane:

From - Wed, 11 Jun 2014 05:24:33 +0000
X-Mozilla-Status: 0041
X-Mozilla-Status2: 00000000
X-Mozilla-Keys:                                                                                 
Received: by localhost; Wed, 11 Jun 2014 08:29:14 +0300
Date: Wed, 11 Jun 2014 05:24:33 +0000
Message-Id: <http://ikypros.com/?p=22322@localhost.localdomain>
From: Ηλεκτρονική Πύλη Ikypros
MIME-Version: 1.0
Subject: Στο Κακουργιοδικείο σήμερα Λευκαρίτης – Νικολάου
Keywords: Έγκλημα,ΚΥΠΡΟΣ,ΚΥΡΙΕΣ ΕΙΔΗΣΕΙΣ,ΛΕΥΚΑΡΙΤΗΣ
Content-Transfer-Encoding: 8bit
Content-Base: http://ikypros.com/?p=22322
Content-Type: text/html; charset=UTF-8


And here is the source of the PROBLEM message:

From - Wed, 11 Jun 2014 07:40:00 +0300
X-Mozilla-Status: 0001
X-Mozilla-Status2: 00000000
X-Mozilla-Keys:                                                                                 
Received: by localhost; Wed, 11 Jun 2014 07:53:05 +0300
Date: Wed, 11 Jun 2014 07:40:00 +0300
Message-Id: <http://www.politisnews.eu/cgibin/hweb?-A=266694&-V=articles@localhost.localdomain>
From: «ΠΟΛΙΤΗΣ» ONLINE
MIME-Version: 1.0
Subject: Η πολιτισμική διπλωματία της ΕΕ θέλει νέα «ώθηση»
Keywords: Ευρώπη
Content-Transfer-Encoding: 8bit
Content-Base: http://www.politisnews.eu/cgibin/hweb?-A=266694&-V=articles
Content-Type: text/html; charset=UTF-8


Comparing the two the only difference I see is the X-Mozilla-Status: (0041 vs 0001). 
I have no idea what that is but I thought I should let you know. :)
(In reply to CAK from comment #11)
Thanks for valuable information.

Good.
> From: Ηλεκτρονική Πύλη Ikypros
> Subject: Στο Κακουργιοδικείο σήμερα Λευκαρίτης – Νικολάου
Bad.
> From: «ΠΟΛΙΤΗΣ» ONLINE
> Subject: Η πολιτισμική διπλωματία της ΕΕ θέλει νέα «ώθηση»

"«ΠΟΛΙΤΗΣ» ONLINE" part is shown like "� ... � ONLINE <>" in screen shot.
> � : http://www.fileformat.info/info/unicode/char/0fffd/index.htm
What is shown for  "«ώθηση»" part in Subject column of Thread Pane and Subject: field of Message Header Pane?
> « : http://www.fileformat.info/info/unicode/char/ab/index.htm
> » : http://www.fileformat.info/info/unicode/char/bb/index.htm

(In reply to CAK from comment #10)
> Q2: (snip) New or Old RSS feeds messages appear the same in EarlyBird.

Oh, it was RSS feed.
If so, who generated the From: header/Subject: header is perhaps Thunderird himself.
i.e. Who violated RFC2822/RFC2047 is Thunderbird.

Do you see problem by following?
1. Start Tb 24, Folder Properties of the RSS feed, default character encoding = UTF-8, check "Apply default..."
2. Start Tb 32. Does problem occur in Thread Pane and Message Header Pane of the RSS feed?
3. In Tb 32, Folder Properties/Repair Folder of the RSS feed.
    Does something wrong occur?
. Start Tb 24, Folder Properties of the RSS feed, default character encoding = UTF-8, check "Apply default..."
    2. Start Tb 32. Does problem occur in Thread Pane and Message Header Pane of the RSS feed?
    3. In Tb 32, Folder Properties/Repair Folder of the RSS feed.
        Does something wrong occur?
(In reply to CAK from comment #9)
> Encoded "From" shows correctly in TB Ver 24.5/24.6

Please note that utf-8 binary in From: header(and in Subject: header) is not correctly encoded using RFC 2047.
(.eml file == text file saved in UTF-8. i.e. mail data is actually written in UTF-8.)
"8bit binaly" MUST NOT be placed in From: header, Subject: header, if the mail like data is data of email which is based on RFC/5321/RFC5322/RFC2047 etc.
If RFC5322 is correctly/fully applied to the mail like data, "binary in From:/Subject: header of the mail" should be interpreted as us-ascii.

RSS feed code might have ignored RFCs, because Tb already had quirks for "no RFC2047 encoding on non-ascii binary in message header".
(In reply to CAK from comment #10)
> Q3: Yes. The display has changed with the latest version of EarlyBird.
> The From column of Thread Pane (in EarlyBird) shows fine.

It looks that utf-8 was fortunately used as "fallback encoding character", even if dropdown menu at Folder Properties panel is broken.

(In reply to CAK from comment #9)
> Created attachment 8438187 [details]
> Properties window of FolderX
(In reply to CAK from comment #10)
> Unfortunately I cannot set the Default character encoding in the Properties (please see attachment). 
> I attached a capture of the Properties Window. 
> The dropdown button as it appears on it will not work i.e. It doesn't drop down when clicked.

Term "default character encoding" in Tb 24 was changed to "fallback encoding character" in trunk, at the panel  and backend.
Quirks for "non-encoded non-ascii binary in From:/To:/CC:" is broken by change from "default character encoding" to "fallback encoding character" at the panel and bakend?
Anyway, thanks very much for attaching .eml file. We can do duplication test without typing text from screen shot.
Hello once more!

I think the offending character is the Unicode Character 'LEFT-POINTING DOUBLE ANGLE QUOTATION MARK' (U+00AB) (as you probably suspected yourself) :)

I have done some more testing and I managed to make the error show NOT in the preview pane but in the POPUP ALERT (please see attachment "Testing3.png"

To reproduce the problem:
1. Edit the settings of a mail account and change "Your Name" field to something that includes the offending Unicode Character.
2. Send a message to another account of yours.
3. The message alert shown has the unreadable characters.
(In reply to WADA from comment #12)
> What is shown for  "«ώθηση»" part in Subject column of Thread Pane and
> Subject: field of Message Header Pane?

Please see attachment "EarlyBird3.png"
Attached image EarlyBird3.png (obsolete) —
> 3. In Tb 32, Folder Properties/Repair Folder of the RSS feed.
>     Does something wrong occur?
> . Start Tb 24, Folder Properties of the RSS feed, default character encoding
> = UTF-8, check "Apply default..."
>     2. Start Tb 32. Does problem occur in Thread Pane and Message Header
> Pane of the RSS feed?
>     3. In Tb 32, Folder Properties/Repair Folder of the RSS feed.
>         Does something wrong occur?


1. In TB32: Repairing the FolderX causes the messages in it to appear with unreadable characters. (attachment EarlyBird4_FolderX.png)

2. Starting TB24 and checking the properties of FolderX it shows the Default Character Encoding to be Western (ISO-8859-1) and:
       a. The message body in unreadable (similar to attachment EarlyBird4_FolderX.png)
       b. The From/Subject fields are fine.

3. Changing the encoding to Unicode (UTF-8) (even without repairing) results to:
       a. The message body to be fine
       b. The From/Subject fields are fine.

4. Closing TB24 and starting TB32. The messages in FolderX:
      a. The message body is fine
      b. The From field contains unreadable characters
      c. The Subject field is fine

5. Repairing FolderX (still in TB32 and WITHOUT BEING ABLE TO SET the default character encoding)
      a. The message body is UNREADABLE (similar to attachment EarlyBird4_FolderX.png)
      b. The From field contains unreadable characters
      c. The subject field is fine.

6. Starting TB24 again shows the Default Character Encoding to be AGAIN: Western (ISO-8859-1)
      (so I guess the UNSETTABLE Fallback Character Encoding is Western (ISO-8859-1))
Attached image Message in FolderX after repairing. (obsolete) —
> It looks that utf-8 was fortunately used as "fallback encoding character",
> even if dropdown menu at Folder Properties panel is broken.

This does not appear to be case. I think what is used as Fallback Encoding Character is Western (ISO-8859-1) and I don't know of a way to change it since the drop down is not working.
(In reply to CAK from comment #22)
> I don't know of a way to change it since the drop down is not working.
There is a simple way: Go to Tb 24, change it as you want, then go back to Tb trunk :-)

Anyway, I could observe problem with test mail like following in Tb trunk nightly.
   From:    "123 «ΠΟΛΙΤΗΣ» 456 «ώθηση» 789" <a@a.a.a>
   To:         "123 «ΠΟΛΙΤΗΣ» 456 «ώθηση» 789" <b@a.a.a>
   CC:        "123 «ΠΟΛΙΤΗΣ» 456 «ώθηση» 789" <c@a.a.a>
   Subject: "123 «ΠΟΛΙΤΗΣ» 456 «ώθηση» 789" <a@a.a.a>
   Content-Type: text/plain;charset=utf-8, message body text is also same as Subject: content written in utf-8 binary.
Result:
   Thread Pane : Subject, From, Recipient column : String is shown as expected.
   Message Header Pane: 
       Subject: String is shown as expected.
       From:, To:, CC: Garbled text is shown : 123 ��������� 456 �θ�÷� 789 <#@a.a.a> where # is 'a' or 'b' or 'c'
       Note:
         '"' at both side for quoted-text is removed.
         as known by your test result of "� ...� ONLINE <>", From:/To:/CC: is re-constructed by Tb for Header Pane display.
  Message Body : because Content-Type: text/xxx;charset=utf-8 is effective, normally displayed.

Phenomenon is perhaps;
   Auto-detect is invoked on UTF-8 binary when showing From:, To:, CC: at Message Header Pane.
   Because of U+00AB/u+00BB in UTF-8 binary, and because volume of string is very small when mail address,
   auto-detect can't know that binary is utf-8(perfect auto-detect is essentially impossible).
   So, conversion from unknown-charset to UTF-8/UTF-16 occurs.
I guess that quirks of "apply charset of message body" is lost or broken in From:/To:/CC: re-construction, after change to "fallback character encoding" from "default character encoding" or after changes in mime/character encoding  relevant code.

Confirming.
Status: UNCONFIRMED → NEW
Component: Untriaged → Backend
Ever confirmed: true
Product: Thunderbird → MailNews Core
(In reply to CAK from comment #10)
> Unfortunately I cannot set the Default character encoding in the Properties (please see attachment). 
> I attached a capture of the Properties Window. 
> The dropdown button as it appears on it will not work i.e. It doesn't drop down when clicked.

This "broken dropdown of fallback character encoding" is a part of bug 1003716.
bug 1003716 may be relivant to problem of this bug.
So setting dependency for ease of analysis, tracking, search.
Depends on: 1003716
Keywords: regression
Summary: Message preview pane header has encoding problem → Message preview pane header has encoding problem (When RSS feed has not-rfc2047-encoded display-name of mail address and raw UTF-8 binary of « and » is contained in display-name part, mail address display at header pane is garbled)
This message, and all others from a subscription to, eg,
http://www.politis-news.com/rss/news_greece.xml
work fine in build Thunderbird/32.0a1 ID:20140601145900 CSet: 74976b70c827
but no longer work in current Tb32 or Tb33.

the most likely regression is Bug 858337 or friends.  jcranmer, thoughts?

note that UTF-8 is now valid in a subject, local-part of an address (and domain likely too per idn specs), and body; this is per RFC6532[1].

[1] https://tools.ietf.org/html/rfc6532#section-3.2
Flags: needinfo?(Pidgeot18)
> this is per RFC6532.

Can "parser of message header" always assume utf-8 instead of current us-ascii when not RC 2047 encoded?
If so, problem of this bug won't occur.
Is change in auto-detect needed?
   Convert from utf-8 to utf-16 with converting non-utf-8 char to U+FFFD.
   Count #of chars which is "not U+FFFD in utf-8 binary" but "U+FFFD in converted utf-16".
   If # of "charcter converted to U+FFFD" exceeds 10% of chars, try other charset than utf-8.

Is RFC6532 already widely accepted and used? Can any mailer put raw utf-8 binary in header without RFC2047 encoding already?
(In reply to WADA from comment #26)
> > this is per RFC6532.
> 
> Can "parser of message header" always assume utf-8 instead of current
> us-ascii when not RC 2047 encoded?

Ha ha ha ha ha ha ha ha ha ha ha. The idea that headers are/were 7-bit pre-6532 is laughable. If anything, UTF-8 is actually an unsafe default for 8-bit headers.

The jsmime logic is to try UTF-8 and then fallback to a default charset (usually the charset of the body text) if UTF-8 decoding fails.

(In reply to alta88 from comment #25)
> the most likely regression is Bug 858337 or friends.  jcranmer, thoughts?

I suspect we're at some point passing a decoded string header to the decoder, which causes it to convert to treat the JS string as a binary string instead of a Unicode string. This is the sort of thing that structured header support should eventually kill off for good.
Flags: needinfo?(Pidgeot18)
To be more precise, Bug 858337 rewrote the working parseHeadersWithArray to use parseEncodedHeader for a non 2047 encoded string, emitted to processHeaders for display, causing this bustage.
Blocks: 858337
OS: Windows 7 → All
Hardware: x86_64 → All
(In reply to alta88 from comment #28)
> To be more precise, Bug 858337 rewrote the working parseHeadersWithArray to
> use parseEncodedHeader for a non 2047 encoded string, emitted to
> processHeaders for display, causing this bustage.

parseHeadersWithArray did RFC 2047 decoding in the original C++ version (and people relied on this fact), therefore parseHeadersWithArray now uses parseEncodedHeader internally.
the previous parseHeadersWithArray handled a non 2047 string and its replacement doesn't.  otherwise we wouldn't be here, right?
alta88, good to hear from you!

We are very late in the TB 38 release cycle, so tracking is being really used for blockers. With no progress on this for over 10 months, I don't think that we should block release here.

But we are slowly fixing the jsmime regressions, and this one can also probably be fixed with a few day's work. Tracking-esr38 means that we will investigate getting this fixed in a point release.
See Also: → 1175190
Alta88:

Once upon a time I reported bug 1146099 after having noticed that threat pane and headers/message pane show some headers differently. Bug 1146099 is about different treatment of malformed headers, and this bug here is about incorrect treatment of headers which have UTF-8 in them according RFC 6532. This works in the tread pane but not the headers pane.

Meanwhile I looked at bug 1285481 and bug 1279804 and although we don't know whether the message source was correct in those bugs (might not have been UTF-8 bit but some other 8bit character and there were no messages supplied), at least the reporters noticed the inconsistency.

You've already done some analysis in comment #28.

Do you think it would be possible to come up with a patch to achieve the following?
1) UTF-8 in the headers (RFC 6532) to be displayed correctly not only in the tread pane
   but also in the headers/message pane.
2) As a side effect, headers to be displayed consistently everywhere
   (and malformed ones also to be displayed equally badly everywhere).

Would you be interested in working on that?
Flags: needinfo?(alta88)
See Also: → 1146099
Attached patch WIP: debug patch (obsolete) — Splinter Review
I've done a bit of digging around. When using a message with
To: Testäöü <test@test.com> - that is To: Testäöü <test@test.com> in UTF-8
I see:

=== OutputEmailAddresses +
====|Testäöü <test@test.com>|====
== JS stack> file:///C:/mozilla-source/comm-central/obj-i686-pc-mingw32/dist/bin/components/mimeJSComponents.js (391)
== JS stack> chrome://messenger/content/msgHdrViewOverlay.js (1231)
== JS stack> chrome://messenger/content/msgHdrViewOverlay.js (1126)
== JS stack> chrome://messenger/content/msgHdrViewOverlay.js (468)
== JS stack> chrome://messenger/content/msgHdrViewOverlay.js (569)
====|test@test.com|Test���|Test��� <test@test.com>|====
=== OutputEmailAddresses -

So something seems to go wrong in parseEncodedHeader().

Although this calls the jsmime parser with
MimeParser.HEADER_OPTION_ALL_I18N
https://dxr.mozilla.org/comm-central/source/mailnews/mime/src/mimeJSComponents.js#264

and HEADER_OPTION_ALL_I18N includes HEADER_OPTION_ALLOW_RAW:
https://dxr.mozilla.org/comm-central/source/mailnews/mime/src/mimeParser.jsm#216

As far as I can see, it should work, but it doesn't.

Note that the subject doesn't get processed in this code, so
Subject: Here we have UTF-8 äöüáóú - Subject: Here we have UTF-8 äöüáóú
is displayed correctly.
Attachment #8438184 - Attachment is obsolete: true
Kent, you and I are a great team to hunt down jsmime regressions. Here is yet another one.

Please use attachment 8438183 [details] or attachment 8769429 [details] for testing.

The fix is dead simple: Instead of passing HEADER_OPTION_ALL_I18N I pass

  HEADER_OPTION_DECODE_2231 | HEADER_OPTION_DECODE_2047

but not HEADER_OPTION_ALLOW_RAW any more.

(I'm superseding all screenshots attached to the bug to make it less confusing.)
Assignee: nobody → mozilla
Attachment #8437681 - Attachment is obsolete: true
Attachment #8437949 - Attachment is obsolete: true
Attachment #8438178 - Attachment is obsolete: true
Attachment #8438179 - Attachment is obsolete: true
Attachment #8438187 - Attachment is obsolete: true
Attachment #8438243 - Attachment is obsolete: true
Attachment #8438253 - Attachment is obsolete: true
Attachment #8438273 - Attachment is obsolete: true
Attachment #8769450 - Attachment is obsolete: true
Status: NEW → ASSIGNED
Flags: needinfo?(alta88)
Attachment #8769453 - Flags: review?(rkent)
After applying this patch (and disabling my extension), I can verify it fixes the problem described here (and elsewhere [1]); jorg you're faster at fixes than most are at reading bugmail on weekends..

There are still issues[2] with Bug 858337 regressions. First, after being rammed in[3], it was almost immediately abandoned. Those trying to release 38 know this well. A review requirement to support groups was never implemented, problematic for the point below.

Feeds, unfortunately, has never addressed (no pun) From: correctly.  RFC5322 requires an addr-spec, but things worked without one, until 858337; feeds are sort of different (reply UI is disabled etc).  RFC6854[4] recognizes the need to relax strict addr-spec requirements for From: for display purposes (for mail not just our use in feeds) and does so by allowing the group syntax. So jsmime can be tweaked to not hardcode angle brackets in the absence of addr-spec, or groups can be supported, per spec, and feeds can store From: with a group syntax in the absence of addr-spec in their <author> tags.

[1]Bug 858337 comment 44
[2]Bug 858337 comment 45
[3]Bug 1055077
[4]https://tools.ietf.org/html/rfc6854
(In reply to alta88 from comment #41)
> jorg you're faster at fixes than most are at reading bugmail on weekends..
Thanks. This is not the first jsmime regression I and Kent are fixing. (Sadly you are right, jsmime was landed and then problems were left for others to fix, still open: bug 1175900 and others).

After looking at the strings in convert8BitHeader()

  Failed for the message pane:
  convert8BitHeader in  |"Test äöü" <test@test.com>|
  convert8BitHeader out |"Test ���" <test@test.com>|
  returned is Test ???

  Worked for the tread pane:
  convert8BitHeader in  |"Test äöü" <test@test.com>|
  convert8BitHeader out |"Test äöü" <test@test.com>|
  returned is Test äöü.

it was pretty clear that we are dealing with a simple double encoding problem here. There doesn't need to be any raw encoding in parseEncodedHeader().
Comment on attachment 8769453 [details] [diff] [review]
Simple fix to yet another jsmime regression.

Damn, this conflicts with the fix in bug 1146099.

So we can't always remove the raw option but need to do it a bit smarter. Coming up.
Attachment #8769453 - Flags: review?(rkent)
With the fix from bug 1146099 we need the raw option for the thread pane, so I created a new function parseEncodedHeaderNoRaw to use exclusively in the message pane.

Please apply together with the fix from bug 1146099 and then test with the following messages:
1) The message attached here in Greek (attachment 8438183 [details]).
2) Attachment 8769429 [details] from bug 1285481.
3) Two messages from ZIP file from bug 1146099 (attachment 8581255 [details]).

Remember that some of the strings are stored in the MSF file, so if you juggle patches, you either need to reimport the messages or repair the folder to recreate the MSF file.
Attachment #8769453 - Attachment is obsolete: true
Attachment #8769647 - Flags: review?(rkent)
Fixed checkin comment.
Attachment #8769647 - Attachment is obsolete: true
Attachment #8769647 - Flags: review?(rkent)
Attachment #8769651 - Flags: review?(rkent)
Comment on attachment 8769651 [details] [diff] [review]
Simple fix to yet another jsmime regression (v2a).

Review of attachment 8769651 [details] [diff] [review]:
-----------------------------------------------------------------

You need to at least do the change I suggested, as it breaks the design to do it the way you are proposing.

You should also try to do a test that is fixed by your changes.

Other than that, I have the same comment that I made on bug 1146099, namely am I really the best reviewer of this?

::: mailnews/mime/src/mimeJSComponents.js
@@ +263,5 @@
>      let value = MimeParser.parseHeaderField(aHeader,
>        MimeParser.HEADER_ADDRESS | MimeParser.HEADER_OPTION_ALL_I18N, aCharset);
>      return fixArray(value, aPreserveGroups, count);
>    },
> +  parseEncodedHeaderNoRaw: function (aHeader, aCharset, aPreserveGroups, count) {

You are adding here a method to the JS implementation of a component, but you really are not adding a method to the component. So you really do not want to add a method here unless you are willing to do the full treatment, like add it to nsIMsgHeaderParser, add tests for the specific method, etc.

@@ +395,5 @@
>    // What follows is the deprecated API that will be removed shortly.
>  
>    parseHeadersWithArray: function (aHeader, aAddrs, aNames, aFullNames) {
>      let addrs = [], names = [], fullNames = [];
> +    let allAddresses = this.parseEncodedHeaderNoRaw(aHeader, undefined, false);

Assuming that the code in parseEncodedHeaderNoRaw is correct, and that we really want to modify parseHeadersWithArray (neither of which have I confirmed yet), you could just add the code from parseEncodedHeaderNoRaw here directly rather than trying to (incorrectly) add a method to the component implementation.
Attachment #8769651 - Flags: review?(rkent) → review-
OK, no new method.

Kent, you're one of the suggested reviewers.

https://treeherder.mozilla.org/#/jobs?repo=try-comm-central&revision=561cc6161b81
Mozmill is green, Xpcshell hasn't completed.

If you want, I can do a Mozmill test and have Aceman review it. It would just be:
Load a message from a file, check that the strings look OK in the UI.
Attachment #8769651 - Attachment is obsolete: true
Attachment #8769911 - Flags: review?(rkent)
parseHeadersWithArray() has a few call sites:
https://dxr.mozilla.org/comm-central/search?q=parseHeadersWithArray&redirect=false

Many of them are in tests, these are the non-test ones:

mail/base/content/mailWindowOverlay.js
https://dxr.mozilla.org/comm-central/source/mail/base/content/mailWindowOverlay.js#1221
https://dxr.mozilla.org/comm-central/source/mail/base/content/mailWindowOverlay.js#3194

mail/base/content/msgHdrViewOverlay.js
https://dxr.mozilla.org/comm-central/source/mail/base/content/msgHdrViewOverlay.js#1231

mail/components/compose/content/MsgComposeCommands.js
https://dxr.mozilla.org/comm-central/source/mail/components/compose/content/MsgComposeCommands.js#3203

mailnews/addrbook/content/abMailListDialog.js
https://dxr.mozilla.org/comm-central/source/mailnews/addrbook/content/abMailListDialog.js#546

mailnews/db/gloda/modules/utils.js
https://dxr.mozilla.org/comm-central/source/mailnews/db/gloda/modules/utils.js#50

Some of these are only interested in the count returned, like in IsReplyAllEnabled()
https://dxr.mozilla.org/comm-central/source/mail/base/content/mailWindowOverlay.js#1221

We know for sure that OutputEmailAddresses() at
https://dxr.mozilla.org/comm-central/source/mail/base/content/msgHdrViewOverlay.js#1231
calls parseHeadersWithArray() and must not use raw encoding.

Hard to say whether any other call site needs the raw encoding. I suspect that not, since these call are not with "raw" headers retrieved from a message source but with addresses which have already been converted from raw UTF-8.

The more conservative approach would be to create a new method parseHeadersWithArrayNoRaw() and only call that where it is required. The more risky approach would be to land it "as is" and if anything garbled appears in the UI, switch to the conservative approach.

The trouble is that there is no clear definition of what parseHeadersWithArray() should to with respect do decoding, and I have the feeling that just calling parseEncodedHeader() was the lazy way out with the known consequences.

Here's another try run *without* the patch from bug 1146099:
https://treeherder.mozilla.org/#/jobs?repo=try-comm-central&revision=ed42e5046611
That try is green (apart from the current bustage in bug 1285735).
Comment on attachment 8769911 [details] [diff] [review]
Fix to yet another jsmime regression (v3).

Review of attachment 8769911 [details] [diff] [review]:
-----------------------------------------------------------------

OK I've given it my best shot, and I can't see any flaws in your reasoning. r+=me

Note to future readers whining about the regression this causes: we tried to audit the upstream callers, but with multiple levels of poorly typed string inputs, it is really hard to ensure that there is not some hidden assumption of raw processing. As far as we can tell, there is not, but sometimes we just need more eyes and testing, which landing this will do. Thank you for finding the regression!

Jörg: Bonus points for attempting to document parseHeadersWithArray now that you are the world expert and have introduced some new behavior assumptions.

::: mailnews/mime/src/mimeJSComponents.js
@@ +386,5 @@
>    // What follows is the deprecated API that will be removed shortly.
>  
>    parseHeadersWithArray: function (aHeader, aAddrs, aNames, aFullNames) {
>      let addrs = [], names = [], fullNames = [];
> +    // Parse header, but without EADER_OPTION_ALLOW_RAW.

Nit: missing leading "H" in HEADER_OPTION_ALLOW_RAW
Attachment #8769911 - Flags: review?(rkent) → review+
Thanks for the quick review. Mojibake (https://en.wikipedia.org/wiki/Mojibake) is really a pet hate of mine ;-)

https://hg.mozilla.org/comm-central/rev/1ecabc4781e3 --> FIXED

I earned bonus points for adding a comment to the IDL file.
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → Thunderbird 50.0
Comment on attachment 8769911 [details] [diff] [review]
Fix to yet another jsmime regression (v3).

[Approval Request Comment]
Regression caused by (bug #): Bug 858337 (jsmime)
User impact if declined: https://en.wikipedia.org/wiki/Mojibake
Testing completed (on c-c, etc.): Manual testing on various messages.
Risk to taking this patch (and alternatives if risky):
Some risk, that's why I'm uplifting it to TB 49 so it can get into the next beta early.
Attachment #8769911 - Flags: approval-comm-aurora+
Comment on attachment 8769911 [details] [diff] [review]
Fix to yet another jsmime regression (v3).

[Approval Request Comment]
This is tracking-esr45 and has been through beta, but no previous request to uplift to tb45. Any reason not to?
Attachment #8769911 - Flags: approval-comm-esr45?
I didn't want to annoy the ESR45 manager ;-)
Comment on attachment 8769911 [details] [diff] [review]
Fix to yet another jsmime regression (v3).

https://hg.mozilla.org/releases/comm-esr45/rev/18061283a1e8
Attachment #8769911 - Flags: approval-comm-esr45? → approval-comm-esr45+
Depends on: 1528496
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: