If order of http-equiv and content in <meta> is reversed(<meta content="...;charset=..." http-equiv="Content-Type">), charset in <meta> is applied to mail data coverted to utf-8. Thus, non-ascii characters are shown in garbled

RESOLVED FIXED in Thunderbird 24.0

Status

MailNews Core
MIME
--
major
RESOLVED FIXED
7 years ago
4 years ago

People

(Reporter: World, Assigned: protz)

Tracking

(Blocks: 1 bug, {html5})

Trunk
Thunderbird 24.0
html5
Dependency tree / graph

Firefox Tracking Flags

(blocking-thunderbird5.0 -)

Details

(Whiteboard: [needs a mozmill test] [View/Message Body As/Original HTML case])

Attachments

(5 attachments, 3 obsolete attachments)

(Reporter)

Description

7 years ago
Created attachment 473359 [details]
Test mails in Shift_JIS

[Build ID]
> Mozilla/5.0 (Windows NT 5.1; rv:2.0b6pre) Gecko/20100906 Thunderbird/3.2a1pre

With html5.enable=true, <meta http-equiv="Content-Type" content="..."> is not properly processed and non-ascii characters are not shown correctly, if order of attributes in <meta> is reversed.

It was found during addtional testing for bug 572886 with some variants of test cases such as different charset, non-ascii data case. In testing, similar phenomenon to Bug 505072, Bug 508946, Bug 593894 was observed with some test cases, but such problem could not be observed with some other cases even though mail data is very similar(nearly same as mail produces problem). So I checked difference of test mails.   
Attached mail folder file contains three mails written in Shift_JIS(Content-Type: text/html; charset="Shift_JIS").
  mail-01 :
   <meta http-equiv="Content-Type" content="text/html; charset=Shift_JIS">
  mail-02 :
   <meta content="text/html; charset=Shift_JIS" http-equiv="Content-Type">
  mail-03 :
   <meta content="text/html; charset=UTF-8" http-equiv="Content-Type">
Expected display of text is;
  ここは日本語です。 Here is Japanese.
Observed phenomenon.
  mail-01 : no problem
  mail-02 : garbage is shown for non-ascii characters.
            (probably phenomenon reported to Bug 593894)
  mail-03 : non-ascii character is shown as expected,
            if charset=utf-8 is specified in mail-02.

I don't know "internal reload by <meta> charset"(problem of bug 572886) happens or not in above cases.
(Reporter)

Updated

7 years ago
Attachment #473359 - Attachment mime type: application/octet-stream → text/plain
(Reporter)

Updated

7 years ago
Summary: With html5.enable=true, <meta http-equiv="Content-Type" content="..."> is not properly processed and non-ascii characters are not shown correctly, if order of attributes in <meta> is reversed. → With html5.enable=true, if order of http-equiv and content in <meta> is reversed(<meta content="...;charset=..." http-equiv="Content-Type">), charset in <meta> is applied to mail data coverted to utf-8. Thus, non-ascii characters are shown in garbled
(Reporter)

Comment 1

7 years ago
Note:
If http-equiv and content attributes are written in normal order in <meta>, charset in <meta> is properly ignored as expected, when charset is specified in Content-Type: header.
> <meta http-equiv="Content-Type" content="text/html; charset=xxxxx">
(Reporter)

Comment 2

7 years ago
CCing to Henri Sivonen.
No HTML5 parser side issue? MIME wrongly passes "charset in meta" to HTML5 parser if reversed order?
All the test emails have
Content-Type: text/html; charset="Shift_JIS"
on the MIME container. Thunderbird needs to pass Shift_JIS (without quotes) to the HTML parser with charset source set to kCharsetFromChannel.

If you can verify that Thunderbird does this and it *still* doesn't work, then there's a parser bug. However, considering that the problem hasn't been reported in the HTTP context, I bet Thunderbird isn't passing Shift_JIS with kCharsetFromChannel to the HTML parser.

Moreover, the MIME code shouldn't be looking at <meta> *at all*. That's the HTML parser's job--but only when the caller passes kCharsetFromUserDefault (which it should do when the charset parameter in the MIME container is absent.).
(Reporter)

Comment 4

7 years ago
Henri Sivonen, thanks for your comment.
CCing to Zane U. Ji.
Is there any fault of MIME or inconsistency between MIME and HTML5 parser?
Why can "order of http-equiv and content in <meta>" be relevant to problem?
(Reporter)

Comment 5

7 years ago
FYI.
If multiple <meta>'s is put in <head> section, following occurs.
(A) Normal order of http-equiv and content in all <meta>
Content-Type: text/html; charset="Shift_JIS"
1-st <meta>: http-equiv -> content, charset=charset-01
2-st <meta>: http-equiv -> content, charset=charset-02
3-rd <meta>: http-equiv -> content, charset=charset-03
=> charset-02 is always applied to data converted to utf-8
(B) Order of http-equiv and content is reversed in first <meta> only
Content-Type: text/html; charset="Shift_JIS"
1-st <meta>: content -> http-equiv, charset=charset-01
2-st <meta>: http-equiv -> content, charset=charset-02
3-rd <meta>: http-equiv -> content, charset=charset-03
=> charset-01 is always applied to data converted to utf-8.
   Same phenomenon as this bug.
I'll open separate bug if it's required, for ease of diagnosis.
(Reporter)

Updated

7 years ago
Blocks: 571704
(Reporter)

Comment 6

7 years ago
FYI.
Above phenomenon/problem is with View/Message Body As/Original HTML.
If "Simple HTML", garbled disply is shown even for next simple/quoted-printable HTML other than utf-8.
> Content-Type: text/html; charset=Shift_JIS
> Content-Transfer-Encoding: quoted-printable
> 
> <html><head></head>
> <body>
> <p>=93=FA=96{=8C=EA</p>
> </body></html>
It's similar phenomenon but different problem of "Simple HTML" with quoted-printable text/html or quoted-printable(text/plain or text/html) in multipart.
(Reporter)

Updated

7 years ago
Whiteboard: [View/Message Body As/Original HTML case]
(Reporter)

Updated

7 years ago
Blocks: 600178
(Reporter)

Updated

7 years ago
Blocks: 598740
(Reporter)

Comment 7

7 years ago
FYI. Bug for "Simple HTML" case is currently bug 598740.
status-thunderbird3.2: --- → ?
(Reporter)

Comment 8

7 years ago
As for Thunderbird release builds, simplest solution of this kind of bug is;
  Default of html5.enable=false, until this kind of bugs will be resolved.
(Reporter)

Updated

7 years ago
Keywords: html5
(Reporter)

Updated

7 years ago
Blocks: 593894
(Reporter)

Comment 9

7 years ago
After fix of Bug 582788, phenomenon/problem of Bug 572886 was morphed to this bug. Setting dependency to Bug 572886 for ease of tracking and iagnosis.
Depends on: 572886
(Reporter)

Updated

7 years ago
Duplicate of this bug: 607071
blocking-thunderbird3.2: --- → ?
status-thunderbird3.2: ? → ---
Flags: blocking-thunderbird-next+
(Reporter)

Updated

7 years ago
No longer blocks: 593894
Duplicate of this bug: 593894
(Reporter)

Updated

7 years ago
No longer blocks: 600178
(Reporter)

Updated

7 years ago
Duplicate of this bug: 571704
(Reporter)

Updated

7 years ago
No longer blocks: 598740
blocking-thunderbird5.0: --- → needed
Flags: blocking-thunderbird-next+
(Assignee)

Comment 13

7 years ago
Created attachment 493779 [details] [diff] [review]
Tentative patch

Wada-San, could you please tell me if this patch fixes the issue for you? It does for me in bug 615183.
Attachment #493779 - Flags: review?(hsivonen)
(Assignee)

Updated

7 years ago
Duplicate of this bug: 615183
(Assignee)

Comment 15

7 years ago
David you might want to have a look into this as well.
(Reporter)

Comment 16

7 years ago
(In reply to comment #13)
> Tentative patch

It looks for me right thing, because of next in comment #3 by Henri Sivonen.
> I bet Thunderbird isn't passing Shift_JIS with kCharsetFromChannel to the HTML parser.
> Moreover, the MIME code shouldn't be looking at <meta> *at all*.
> That's the HTML parser's job

(Q1) Is it correctly done for next?
> That's the HTML parser's job
> --but only when the caller passes kCharsetFromUserDefault.
> (which it should do when the charset parameter in the MIME container is
absent.).

(Q2) Can you explain why no problem with (A) and problem occurs only with (B)? (A) <meta http-equiv="Content-Type" content="text/html; charset=...">
(B) <meta content="text/html; charset=..." http-equiv="Content-Type">
As current code is next, I guess Tb misses charset in meta tag when (B) then passes null or garbage.
> muDV->SetHintCharacterSetSource(kCharsetFromMetaTag);

Jonathan Protzenko, where can I obtain build in which your patch landed? I don't have environment to build. (before it, I do't know how to build :-) )
(Assignee)

Comment 17

7 years ago
I've pushed this patch to a try server, which means I should be able to point you to a build in a couple hours.

My theory is that libmime actually takes care of all the encoding issues and already outputs UTF8, so the HTML parser actually must ignore the <meta> tag at all costs. What supports my theory is the following line:

http://mxr.mozilla.org/comm-central/source/mailnews/base/src/nsMessenger.cpp#394

but I'm definitely no expert.
(Reporter)

Comment 18

7 years ago
(In reply to comment #17)
> and already outputs UTF8,
> so the HTML parser actually must ignore the <meta> tag at all costs.

Questions.

Why Tb should pass to HTML(5) parser converte-to-utf-8 version of HTML by Tb?
- non base64/quoted-printable, charset specified in Content-Type: header
  HTML in "charset in Content-Type" exists in mail source.
- non base64/quoted-printable, no charset in Content-Type: header
  HTML of unknown charset exists in mail source.
- base64/quoted-printable, charset specified in Content-Type: header
  Before covert to utf-8 by Tb, Tb has to decode data by base64 or QP.
  Decoded data exists and is data in "charset in Content-Type:",
- base64/quoted-printable, charset is not specified in Content-Type: header
  Before covert to utf-8 by Tb, Tb has to decode data by base64 or QP.
  Decoded data exists and is data in unknow charset.
If "charset in Content-Type:" and data of above instead of converted-to-utf-8 version is passed to HTML(5) parser, I think HTML(5) parser and rendering engine will do every thing including "charset in <meta>" handling.
Is it right?

Correct "charset of Content-Type: header" handling and correct "charset of <meta>" handling by Tb is mandatory for resolving <img src="cid:...">, so utf-8-version should be passed to HTML(5) parser and rendering engine?
If so, "charset in <meta>" should be passed when case (B) to fake HTML(5) parser, reversed order of attributes in <meta>, as done when case (A) to fake HTML(5) parser, normal order of attributes in <meta>, which won't produce problem.
Is it right?
(Assignee)

Comment 19

7 years ago
Builds available here http://stage.mozilla.org/pub/mozilla.org/thunderbird/tryserver-builds/jonathan.protzenko@gmail.com-3c07a32669ff/
(Assignee)

Comment 20

7 years ago
I think the bottom line is libmime already does handle the conversion to UTF8, and many other parts of Thunderbird are relying on this feature. Plus, no one really knows libmime, and it's been working fine so far, so I don't see any reason for changing this.

But once again, I'm no expert.

The OSX64 build seems to be taking longer than I thought, I hope you're not running OSX. There were no test failures with that patch activated.

Comment 21

7 years ago
Testing with the tryserver build in winxp, this seems to fix my testcase here:
https://bugzilla.mozilla.org/attachment.cgi?id=450875
Comment on attachment 493779 [details] [diff] [review]
Tentative patch

I'm not really the right person to review mailnews code.

Today, I discovered that there was an actual bug in this area in the HTML5 parser: kCharsetFromMetaTag was treated incorrectly. (kCharsetFromChannel was treated correctly, though.) This will be fixed in bug 594730. This probably explains why the patch here works (if it does; I didn't test it).

Anyway, if the MIME container has the 
Content-Type: text/html; charset=foo
header, the MIME code should pass the base64-decoded or quoted-printable-decoded bytes to the parser with foo as the charset and kCharsetFromChannel as the source.

If the MIME container doesn't have charset=foo, the MIME code shouldn't be sniffing for <meta>. It should pass the base64-decoded or quoted-printable-decoded bytes to the parser with the user's default charset and kCharsetFromUserDefault as the source.
Attachment #493779 - Flags: review?(hsivonen)
(Reporter)

Comment 23

7 years ago
(In reply to comment #22)
> Today, I discovered that there was an actual bug in this area in the HTML5
> parser: kCharsetFromMetaTag was treated incorrectly. (kCharsetFromChannel was
> treated correctly, though.) This will be fixed in bug 594730.

Questions to Henri Sivonen.

As for this bug's problem, problem only when reversed order of attributes in <meta> for http-equiv="Content-Type", it sounds for me that cause of the problem is "html5 parser ignored charset specified in <meta>".
Henri Sivonen, is it right?

If so, the tentative patch by Jonathan Protzenko is not needed and this bug is fixed by your patch for html5 parser only.
Is it right?

Even if this bug is fixed by your patch for html5 parser only, the tentative patch by Jonathan Protzenko should be landed to avoid other unwanted problems?
(Reporter)

Comment 24

7 years ago
(In reply to comment #21)
> Testing with the tryserver build in winxp, this seems to fix my testcase here:
> https://bugzilla.mozilla.org/attachment.cgi?id=450875

I couldn't reproduce this bug's problem(both Original HTML and Simple HTML) with your tryserver build. I could't reproduce problem of bug 598740(no <meta>, quoted-printable, Simple HTML only issue) too.
(In reply to comment #23)
> As for this bug's problem, problem only when reversed order of attributes in
> <meta> for http-equiv="Content-Type", it sounds for me that cause of the
> problem is "html5 parser ignored charset specified in <meta>".
> Henri Sivonen, is it right?

Probably not exactly right, but that doesn't really matter for the answers below.
 
> If so, the tentative patch by Jonathan Protzenko is not needed and this bug is
> fixed by your patch for html5 parser only.
> Is it right?

Most likely.

> Even if this bug is fixed by your patch for html5 parser only, the tentative
> patch by Jonathan Protzenko should be landed to avoid other unwanted problems?

Yes, assuming that no one want to make the MIME library have the right layering. Comment 20 suggests that there's no interest in making the MIME library have the right layering.

So I'm OK with attachment 493779 [details] [diff] [review] landing, but I'm not a mailnews reviewer. Someone who is should review.
(Reporter)

Comment 26

7 years ago
(In reply to comment #21)
> Testing with the tryserver build in winxp, this seems to fix my testcase here:
> https://bugzilla.mozilla.org/attachment.cgi?id=450875

I couldn't reproduce problem with multiple <meta> tags in comment #5 too with your tryserver build.

(In reply to comment #25)
> Yes, assuming that no one want to make the MIME library have the right layering.

Henri, correct assumption is perhaps next :-)
  No one here can make the MIME library have the right layering.
Do we need to check with your patch for html5 parser before land of patch by Jonathan Protzenko?
(In reply to comment #26)
> Do we need to check with your patch for html5 parser before land of patch by
> Jonathan Protzenko?

No, attachment 493779 [details] [diff] [review] can land first.
(Assignee)

Comment 28

7 years ago
Comment on attachment 493779 [details] [diff] [review]
Tentative patch

Andrew, following our discussion on IRC, I'm requesting approval for this patch.

:hsivonen, it does appear that my theory is correct and that libmime already handles the character decoding and outputs UTF8. This means any charset notification that comes from meta tags was until recently ignored, and should continue to be ignored. I can understand that you should regard this as pretty scandalous ;-), but until we actually perform the big libmime rewrite we all dream of, we'll have to stick with that...
Attachment #493779 - Flags: review?(bugmail)
Comment on attachment 493779 [details] [diff] [review]
Tentative patch

I suggest a lot of baking on trunk will be in order for this...
Attachment #493779 - Flags: review?(bugmail) → review+
Standard8, for your general release driver scary breakage awareness, may I present this bug...
(Assignee)

Comment 31

7 years ago
Hey, I pushed to try and it didn't break anything, so it should all be fine, right? Right? ...
Sorry, I may have made that sound more concerning that I really feel that it is.  This just seems like one of those problem spaces where I could see there existing many ridiculously illegal messages and that any change in behaviour may result in things that did work but should not have worked no longer working and resulting in a lot of new bug reports or what not.

Comment 33

7 years ago
FWIW I ran on the tryserver build for quite some time for a large variety of multi-part related newsgroups postings with no bad effects.
Might I suggest though that ignoring the meta tag data be made subject to a hidden pref. Without that any future changes in the html5 parser would be difficult to evaluate.
(Assignee)

Comment 34

7 years ago
Joe, I quite didn't get what you mean in your last comment. Since this is the right thing to do anyway, we'll never want the html5 parser to take into account the meta tags, so I'm afraid I don't see the point of having a pref. Could you elaborate? :)
Keywords: checkin-needed

Comment 35

7 years ago
(In reply to comment #34)
> Joe, I quite didn't get what you mean in your last comment. Since this is the
> right thing to do anyway, we'll never want the html5 parser to take into
> account the meta tags, so I'm afraid I don't see the point of having a pref.
> Could you elaborate? :)

I'm probably missing something...but might not there be meta-data (new for html5, or older) that would be handled better in the new parser, rather than our libemime. "<meta http-equiv="refresh" content="x" />" for remote content in an iframe comes to mind.
And for the future don't web-apps embed info in meta-data.

Granted, not many folks try these things in TB.
FWIW, bug 594730 landed, so it would be worthwhile to test if it fixed the problem already.

Comment 37

7 years ago
Testing with Mozilla/5.0 (Windows NT 5.1; rv:2.0b8pre) Gecko/20101210 Thunderbird/3.3a2pre
My testcase in comment 21 seems to work fine now in the stock nightly build.
(Reporter)

Comment 38

7 years ago
(In reply to comment #36)
> bug 594730 landed, (snip)

Tested with next build.
> Mozilla/5.0 (Windows NT 5.1; rv:2.0b8pre) Gecko/20101210 Thunderbird/3.3a2pre
Problems with html5.enable=true of all next types was not reproduced.
(a) reversed attributes order in meta tag for charset
(b) multiple meta tags for charset
(c) View/Message Body As/Simple HTML and meta tag(s) for charset
(Assignee)

Comment 39

7 years ago
:asuth, :bienvenu: :Standard8, the fix in the html5 parser seems to be enough to restore the previous behavior. Shall we consider that my patch is not needed anymore or do you feel like it might be worth applying?
I would suggest that we add a mozmill test that streams some messages that had issues before that most recent core patch.  We do not land the new patch, but make sure the mozmill test references this bug.  At such time as we can produce a mozmill test case that requires the patch to pass, we land the patch.
Keywords: checkin-needed
(Assignee)

Comment 41

7 years ago
Ok, I'll take care of it, adding this on my to-do list. I recall :bienvenu saying that streaming messages through a docshell was not entirely scriptable, so I might end up doing this through mozmill, but I all cases, I don't think it should be too hard to write.
Assignee: nobody → jonathan.protzenko
Status: NEW → ASSIGNED
blocking-thunderbird3.2: ? → ---
Based on comments 36 - 39 it looks like the core change fixed this bug.

I do agree with Andrew that we should get a testcase for this, either in this bug, or in a follow-up bug (with this one marked as wontfix).
blocking-thunderbird5.0: needed → -

Comment 43

5 years ago
Hi folks,
FYI, this bug is manifesting again, and you don't even need html5.enable=true, as that setting doesn't even seem to exist in Thunderbird 20:
Mozilla/5.0 (X11; Linux i686; rv:20.0) Gecko/20100101 Thunderbird/20.0a1
I tried adding the html5.enable setting in about config, and tried it at both true and false, and that doesn't affect anything.
I can attach a few more test messages, but WADA's test messages work fine as well.

Should I report this as a separate bug, or can it be dealt with here?
HTML5 parser was always enabled far back.
Summary: With html5.enable=true, if order of http-equiv and content in <meta> is reversed(<meta content="...;charset=..." http-equiv="Content-Type">), charset in <meta> is applied to mail data coverted to utf-8. Thus, non-ascii characters are shown in garbled → If order of http-equiv and content in <meta> is reversed(<meta content="...;charset=..." http-equiv="Content-Type">), charset in <meta> is applied to mail data coverted to utf-8. Thus, non-ascii characters are shown in garbled
(Reporter)

Comment 45

5 years ago
(In reply to Rimas Kudelis from comment #43)
> but WADA's test messages work fine as well.

As clearly written in bug summary, and as seen in very very simple test mails, this bug is for problem, with additional condition written at Whiteboard: field, in test mails which I attached to this bug. This bug is never for generic "garbled display" phenomenon.
It indicates that your case is different from my case(==this bug's case).  

> Should I report this as a separate bug, or can it be dealt with here?

Open separate bug for your case, with attaching mail, with clear description about specific conditions in your case, please.

Comment 46

5 years ago
I'm not exactly following you.
The problem I referred to also depends on same conditions ("View → Message body as → Original HTML" and the order of attributes in Content-Type meta header reversed). The only difference from your case is that html5.enable doesn't matter anymore (presumably because that pref is now non-functional and html5 is the default).

I can report it as a separate bug, no problem, but I strongly suspect that it's still the same problem.
(Reporter)

Comment 47

5 years ago
(In reply to Rimas Kudelis from comment #46)
Bug summry implies "problem never occurs if order is not reversed".
See simplest test case I attached, please.
Do you mean "no problem occurs in your case too if order of parameter in meta tag is changed to normai"?

Comment 48

5 years ago
(In reply to WADA from comment #47)
> Bug summry implies "problem never occurs if order is not reversed".
> See simplest test case I attached, please.
> Do you mean "no problem occurs in your case too if order of parameter in
> meta tag is changed to normai"?

Yes, that's what I mean.

Just test your own simple messages with html5.enable unset in latest nightly, and you should see the problem manifest itself.
(Reporter)

Comment 49

5 years ago
I also could observe garbled display with test mail-02 in my test case what I attached to this bug in following trunk nightly.
> Application Build ID=20121210030329 
> Mozilla/5.0 (Windows NT 5.1; rv:20.0) Gecko/20100101 Thunderbird/20.0a1.
You said "no problem with my test case, with Tb 20, in your test", didn't you?
Did your "work fine" mean "successfully reproduce problem/regression"?

I couldn't see problem in trunk of Application Build ID=20121009030549.
> Mozilla/5.0 (Windows NT 5.1; rv:19.0) Gecko/19.0 Thunderbird/19.0a1
No html5.enabled is shown by Config Editor too. 

Apparently regression in Tb 20. Change like proposed patch by Jonathan Protzenko may be needed in Tb 20's code. 
Rimas Kudelis, open new bug with referring to this bug and test case of this bug, please, because original problem of this bug with html5.enabled=true itself was resolved in Tb 3.0beta and problem of this bug didn't occur till Tb 19.
(Reporter)

Updated

5 years ago
Whiteboard: [View/Message Body As/Original HTML case] → [resolved in Tb 3 by fix of bug 594730] [View/Message Body As/Original HTML case]
(Reporter)

Comment 50

5 years ago
Rimas Kudelis, if possible, find regression window by binary search of trunk nightly build, please.
(Reporter)

Updated

5 years ago
See Also: → bug 594730

Comment 51

5 years ago
(In reply to WADA from comment #49)
> I also could observe garbled display with test mail-02 in my test case what
> I attached to this bug in following trunk nightly.
> > Application Build ID=20121210030329 
> > Mozilla/5.0 (Windows NT 5.1; rv:20.0) Gecko/20100101 Thunderbird/20.0a1.
> You said "no problem with my test case, with Tb 20, in your test", didn't
> you?
> Did your "work fine" mean "successfully reproduce problem/regression"?

Yes, that's what I meant: they work fine as reproducable testcases. Sorry for my vagueness.

> 
> I couldn't see problem in trunk of Application Build ID=20121009030549.
> > Mozilla/5.0 (Windows NT 5.1; rv:19.0) Gecko/19.0 Thunderbird/19.0a1
> No html5.enabled is shown by Config Editor too. 
> 
> Apparently regression in Tb 20. Change like proposed patch by Jonathan
> Protzenko may be needed in Tb 20's code. 
> Rimas Kudelis, open new bug with referring to this bug and test case of this
> bug, please, because original problem of this bug with html5.enabled=true
> itself was resolved in Tb 3.0beta and problem of this bug didn't occur till
> Tb 19.

OK. Not sure about narrowing the window, but I'll report the bug in a few minutes.

By the way, if this bug is fixed, then why is it still open?
(Reporter)

Comment 52

5 years ago
(In reply to Rimas Kudelis from comment #51)
> By the way, if this bug is fixed, then why is it still open?

(a) Original problem of this bug incidentally disappeared by change of bug 594730 in HTML parset/HTML5 parser, but no one knows correct solution in Tb side to perfectly resolve issues in Tb.
(b) "Patch attached by Jonathan Protzenko is needed in Tb or not" is still unclear.
(c) Automated test for this bug is not available yet.
(Reporter)

Comment 53

5 years ago
I'll close this bug after you open new bug for regression in Tb 20, in order to avoid confusion.

Comment 54

5 years ago
Reported as bug 820335
(Reporter)

Comment 55

5 years ago
Changing to FIXED.
Status: ASSIGNED → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED
(Reporter)

Updated

5 years ago
Whiteboard: [resolved in Tb 3 by fix of bug 594730] [View/Message Body As/Original HTML case] → [resolved in Tb 3.3 by fix of bug 594730] [View/Message Body As/Original HTML case]
Target Milestone: --- → Thunderbird 3.3a2
(Assignee)

Updated

5 years ago
Duplicate of this bug: 820335
(Assignee)

Comment 57

5 years ago
So apparently bug 820335 has a test case that shows that the patch from this bug is actually needed, so I'm reopening.

What remains to be done:
- write a mozmill test or an xpcshell test that exercises the bug,
- commit something in the tree.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
(Assignee)

Comment 58

5 years ago
WADA, if you feel like this should remain in bug 820335, feel free to undo my changes :), but please post a copy of the patch over there then.
Duplicate of this bug: 822641

Comment 60

5 years ago
Gee, if noone is going to prepare any tests, can we please have the patch committed into the repo? It would absolutely suck if we'd keep this bug manifesting just because nobody has time or is willing to prepare tests. Tests should be there to prevent breakage, not to prevent bugfixes from happening...

please!
When looking at nsIMarkupDocumentViewer for an unrelated reason, I noticed that there are some sad code in comm-central for attempting to make the HTML parser use UTF-8. See bug 829543.

If you have a component in mailnews that converts messages to UTF-8 before passing them to the HTML parser, the best way to make sure that the HTML parser consumes the data as UTF-8 is to start the stream with the UTF-8 BOM.

Comment 62

5 years ago
*bump*
Jonathan, is there anything that has to be done prior to checking your patch in?
OS: Windows XP → All
Hardware: x86 → All
(Assignee)

Comment 63

5 years ago
In order to checkin my patch, there need to be a test for it first.

However, since I read bug 829543, I believe that fixing bug 829543 is the right way to fix our problems here, and once bug 829543 is fixed, the present bug will be fixed as a side-effect.

So fixing 829543 is the right thing to do now.

Comment 64

5 years ago
Are you sure that actually fixing this (even if temporarily) by your patch is a no-no? It would be sad to have this bug pushed on users just because a patch that works is not the Right Way. I wonder if there's any timeframe for Bug 829543. Does your patch even prevent Bug 829543 from being fixed anyhow?

Updated

5 years ago
Duplicate of this bug: 843990

Updated

5 years ago
Blocks: 843600

Updated

5 years ago
No longer blocks: 843600
Duplicate of this bug: 843600

Updated

5 years ago
Duplicate of this bug: 845963

Updated

5 years ago
Duplicate of this bug: 845833

Updated

5 years ago
Duplicate of this bug: 845421
(In reply to Rimas Kudelis from comment #64)
> Does your patch even prevent Bug 829543 from being fixed anyhow?

Not technically. In terms of sense of urgency, maybe.

Given how the duplicates keep flowing in, I think it would make sense to land the patch here.
Whiteboard: [resolved in Tb 3.3 by fix of bug 594730] [View/Message Body As/Original HTML case] → [needs a mozmill test] [View/Message Body As/Original HTML case]

Comment 71

5 years ago
Henri, well landing the patch would be appreciated, at least on my side...

What surprises me is how this issue has been handled up until now. It's rather obvious that bug 829543 isn't being fixed as quickly as desired, and now that the sh*t hit the fan, those dupes flowing are anything but unexpected.

I would encourage to commit this fix to comm-aurora and comm-beta as well, cause it seems really lame to let a regression like that slip freely into release just because some tests are missing, that it seems nobody is willing to write anyway.

Please forgive my grumpiness. I know it's not exactly the best tone I could use in Bugzilla, but I'm a Thunderbird user myself, and I'm really annoyed by what has happened.

Updated

5 years ago
Duplicate of this bug: 852984

Comment 73

5 years ago
So, seems we should land this, even if it turns out to be just a temp fix.
Flags: needinfo?(jonathan.protzenko)
(Assignee)

Comment 74

5 years ago
Yes, I think we should land this, even though I don't even have time to write tests for this. I didn't land it because I'd like to have confirmation from someone else (e.g. standard8) that landing this without tests is the right call.
Flags: needinfo?(jonathan.protzenko) → needinfo?(mbanner)
There is a student who has expressed interest in writing tests for this.
Joshua said over irc he'll look at moving this forward.
Flags: needinfo?(mbanner) → needinfo?(Pidgeot18)

Comment 77

5 years ago
Created attachment 728241 [details]
testcase, contains two mails, one with display problems and one correct

Hi,

here is a testcase (mailfolder called "Test" containing 2 mails which show the difference).

Best regards
J.
Duplicate of this bug: 851488
See Also: → bug 855329

Updated

5 years ago
Attachment #728241 - Attachment description: testcase, contains to mails, one with display problems and one correct → testcase, contains two mails, one with display problems and one correct
Duplicate of this bug: 857992
Created attachment 733720 [details] [diff] [review]
tests v1

The test compares the textContent of body tag for the .eml files.
It may be not the best solution. :)

Two tests included
- one for new line delimited "charset" attribute (see Bug 822641 Comment #2)
- and the other one for reverse ordered attributes

Interestingly the reverse ordered test passes.

Forgot to test with the fix applied. :/
Attachment #733720 - Flags: review?(mconley)
(In reply to Szabolcs Hubai (:xabolcs) from comment #80)
> Created attachment 733720 [details] [diff] [review]
> tests v1

/me smells the start of a test suite for comparing email rendering... exactly what I'll need down the line for the jsmime stuff. :-)

Running this test with and without the patch to see what happens.
Flags: needinfo?(Pidgeot18)
Comment on attachment 733720 [details] [diff] [review]
tests v1

One more question about those fancy characters.
Should I, could I check in those *.eml files as a real (with bom) utf-8 file?
Does mercrurial (and diff, and bugzilla) understand "bommed" files?

BTW I would be even more happy if I could add the japanese test messages too from WADA's attachement [1].


[1] https://bugzilla.mozilla.org/attachment.cgi?id=473359
(In reply to Szabolcs Hubai (:xabolcs) from comment #82)
> Comment on attachment 733720 [details] [diff] [review]
> tests v1
> 
> One more question about those fancy characters.
> Should I, could I check in those *.eml files as a real (with bom) utf-8 file?
> Does mercrurial (and diff, and bugzilla) understand "bommed" files?

The BOM doesn't go over well with email (and it's also frowned upon by Unicode people). A safer alternative is to encode the problematic characters with, say, quoted printable (a test case I use locally to verify the patch was using =91, for example).

Updated

4 years ago
Duplicate of this bug: 857974

Updated

4 years ago
Duplicate of this bug: 858982
Comment on attachment 733720 [details] [diff] [review]
tests v1

Building with c-c qparent: 1d584cce7616, m-c:97cfc16ba5dc (due to bug 858014) ...
... without tentative patch [1] applied, tests fails;
... with tentative patch applied, tests passes. :)


BTW I'd to do some experiments with messages containing quoted printable characters.



[1] https://bugzilla.mozilla.org/attachment.cgi?id=493779
Comment on attachment 733720 [details] [diff] [review]
tests v1

Review of attachment 733720 [details] [diff] [review]:
-----------------------------------------------------------------

Hey xabolcs, these are *fantastic* looking tests! Thanks so much for doing this!

Just a few nits below. I haven't had a chance to run these tests yet, but once you update the patch, I'll happily push to try to see how they perform.

Great work,

-Mike

::: mail/test/mozmill/message-reader/test-bug594646.js
@@ +1,5 @@
> +/* This Source Code Form is subject to the terms of the Mozilla Public
> +  * License, v. 2.0. If a copy of the MPL was not distributed with this file,
> +  * You can obtain one at http://mozilla.org/MPL/2.0/. */
> +
> +/**

Format block comments like:

/**
 * Blah blah..
 * blah blah blah..
 */

@@ +12,5 @@
> +const RELATIVE_ROOT = "../shared-modules";
> +const MODULE_REQUIRES = ["folder-display-helpers"];
> +
> +function setupModule(module) {
> +   collector.getModule("folder-display-helpers").installInto(module);

It looks like you're using 3 space indentation. Please use 2 instead.

@@ +13,5 @@
> +const MODULE_REQUIRES = ["folder-display-helpers"];
> +
> +function setupModule(module) {
> +   collector.getModule("folder-display-helpers").installInto(module);
> +   collector.getModule("window-helpers").installInto(module);

If this module uses window-helpers, then window-helpers should be added to MODULE_REQUIRES.

@@ +23,5 @@
> +
> +   // Be sure to view message body as Original HTML
> +   msgc.window.MsgBodyAllowHTML();
> +
> +   let textContent = msgc.window.msgWindow.messageWindowDocShell

For long chains like this, we tend to do something like:

let textContent = msgc.window
                      .msgWindow
                      .messageWindowDocShell
                      .contentViewer
                      .DOMDocument
                      .documentElement
                      .textContent;

@@ +30,5 @@
> +   close_window(msgc);
> +   return textContent;
> +}
> +
> +/**

Indent one more space.

@@ +39,5 @@
> +   let textContentB = extract_eml_body_textcontent(emlB);
> +   assert_equals(textContentA, textContentB);
> +}
> +
> +/**

Indent one more space.

@@ +49,5 @@
> +function test_original_html_characters_head_meta_content_charset_httpEq() {
> +   check_eml_textcontent("./bug594646_reference.eml", "./bug594646_reversed_order.eml");
> +}
> +
> +/**

Indent one more space.
Attachment #733720 - Flags: review?(mconley) → feedback+
(In reply to Mike Conley (:mconley) from comment #87)
> Comment on attachment 733720 [details] [diff] [review]
> tests v1
> 
> Review of attachment 733720 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Hey xabolcs, these are *fantastic* looking tests! Thanks so much for doing
> this!

Feedback like this makes me happy! :)

> 
> Just a few nits below. I haven't had a chance to run these tests yet, but
> once you update the patch, I'll happily push to try to see how they perform.
> 

I manually tested QP messages, and they are also garbled in original HTML mode.

To increase compatibility and portablility I'd like to "convert" the already
included *.emls into quoted printable format.
Should I keep the 8-bit encoded messages, and extend the test coverage with QP encoding,
or may I include QP encoded messages only?

> Great work,
> 
> -Mike
> 


Thanks!
(In reply to Szabolcs Hubai (:xabolcs) from comment #88)
> (In reply to Mike Conley (:mconley) from comment #87)
> > Comment on attachment 733720 [details] [diff] [review]
> > tests v1
> > 
> > Review of attachment 733720 [details] [diff] [review]:
> > -----------------------------------------------------------------
> > 
> > Hey xabolcs, these are *fantastic* looking tests! Thanks so much for doing
> > this!
> 
> Feedback like this makes me happy! :)
> 
> > 
> > Just a few nits below. I haven't had a chance to run these tests yet, but
> > once you update the patch, I'll happily push to try to see how they perform.
> > 
> 
> I manually tested QP messages, and they are also garbled in original HTML
> mode.
> 
> To increase compatibility and portablility I'd like to "convert" the already
> included *.emls into quoted printable format.
> Should I keep the 8-bit encoded messages, and extend the test coverage with
> QP encoding,
> or may I include QP encoded messages only?

I don't think I know enough about the problem domain to really say with any certainty. I just know how to write Mozmill tests. :)

Maybe jcranmer has a position on this?
Flags: needinfo?(Pidgeot18)

Updated

4 years ago
Duplicate of this bug: 861129
I tend to prefer QP-encoded/B64-encoded messages in tests to 8-bit messages, simply because they interact better with software. Our mime subsystem will anyways decode quoted-printable/base64 to Unicode before passing it off anyways, so QP encoding really oughtn't have an effect.
Flags: needinfo?(Pidgeot18)
Created attachment 737305 [details] [diff] [review]
tests v2

Addressed nits:
- 2 space indent
- block comments
- MODULE_REQUIRES
- textContext long chain

"one more space indent" doesn't applied yet.
Are you sure that is what you want?

Text context of the reference message was promoted to a global variable,
and function check_eml_textcontent() was modified to take only one eml,
and compare that one to the reference.

One more test was added - http-equiv attribute is in new line.

Messages was converted to QP encoding.
Interestingly messages in QP encoding shows different results from 8 bit messages.
Therefore I ask only for feedback.

jcranmer! It seems like there is a little effect due to encoding.
I'll include the 8 bit encoded messages in the next version in the next days.
Are there any bug about this difference?
Attachment #733720 - Attachment is obsolete: true
Attachment #737305 - Flags: feedback?(mconley)
Flags: needinfo?(Pidgeot18)
Comment on attachment 737305 [details] [diff] [review]
tests v2

Review of attachment 737305 [details] [diff] [review]:
-----------------------------------------------------------------

Looking good!

::: mail/test/mozmill/message-reader/test-bug594646.js
@@ +13,5 @@
> +
> +const RELATIVE_ROOT = "../shared-modules";
> +const MODULE_REQUIRES = ["folder-display-helpers", "window-helpers"];
> +
> +let referenceTextContent;

Globals should be prefixed with a g, so let gReferenceTextContent.
Attachment #737305 - Flags: feedback?(mconley) → feedback+
(In reply to Szabolcs Hubai (:xabolcs) from comment #92)
> jcranmer! It seems like there is a little effect due to encoding.
> I'll include the 8 bit encoded messages in the next version in the next days.
> Are there any bug about this difference?

Can you be more specific about what the difference is?
Flags: needinfo?(Pidgeot18)
Created attachment 737850 [details]
8bit and qp encoded messages

(In reply to Joshua Cranmer [:jcranmer] from comment #94)
> (In reply to Szabolcs Hubai (:xabolcs) from comment #92)
> > jcranmer! It seems like there is a little effect due to encoding.
> > I'll include the 8 bit encoded messages in the next version in the next days.
> > Are there any bug about this difference?
> 
> Can you be more specific about what the difference is?

Check zipped emls with affected builds! (I hope my QP messages are valid.)
* test_newline_charset_8bit.eml - mojibaked
* test_newline_charset_qp.eml - OK

* test_newline_httpequiv_8bit.eml - OK
* test_newline_httpequiv_qp.eml - mojibaked

* test_reverse_order_8bit.eml - mojibaked
* test_reverse_order_qp.eml - mojibaked


I tought you say, 8bit and QP encodings should provide the same test results,
but with the attached emls they provide different results.

Are there any bug about this difference?
Or this is a feature? ;)
(In reply to Szabolcs Hubai (:xabolcs) from comment #95)
> Are there any bug about this difference?
> Or this is a feature? ;)

I'm only mildly surprised at there being a difference, but I don't think it's worth filing a bug. If you have tests to see if any of them are mojibaked they can be fixed all at once.
Created attachment 738825 [details] [diff] [review]
tests v3

- added back the 8 bit encoded messages
- addressed gReferenceTextContent nit


And the testrun. :)

> INFO Passed: 5
> INFO Failed: 0
> INFO Skipped: 0
> SUMMARY-PASS | test-bug594646.js::setupModule
> SUMMARY-PASS | test-bug594646.js::test_original_html_characters_head_meta_content_charset_httpEq
> SUMMARY-PASS | test-bug594646.js::test_original_html_characters_head_meta_httpEq_content_newline_charset
> SUMMARY-PASS | test-bug594646.js::test_original_html_characters_head_meta_content_charset_newline_httpEq
> SUMMARY-PASS | test-bug594646.js::teardownModule
Attachment #737305 - Attachment is obsolete: true
Attachment #738825 - Flags: review?(mconley)
(Reporter)

Comment 98

4 years ago
(In reply to Szabolcs Hubai (:xabolcs) from comment #95)
> * test_newline_charset_8bit.eml - mojibaked
> * test_newline_charset_qp.eml - OK
> * test_newline_httpequiv_8bit.eml - OK
> * test_newline_httpequiv_qp.eml - mojibaked
> * test_reverse_order_8bit.eml - mojibaked
> * test_reverse_order_qp.eml - mojibaked
> Are there any bug about this difference?
> Or this is a feature? ;)

Quoted-printable case is bug 571704 and bug 598740.
- bug 571704 : Orginal HTML, quoted-printable, meta : reversed order
               I closed as dup of this bug.
- bug 598740 : Simple HTML, any of quoted-printable and base64,
               meta : any order
(Reporter)

Comment 99

4 years ago
FYI. Relevant conditions:
(1) View/Message Body As : (1-1) Original HTML, (1-2) Simple HTML
(2) Content-Type charset : (2-1) UTF-8,         (2-2) Non UTF-8
(3) C-T-Encoding : (3-1) 8/7bits, (3-2) quoted-printable, (3-3) base64
(4) <meta> tag           : (4-1) Normal order,  (4-2) Reversed order
(5) <meta> charset       : (5-1) UTF-8,         (5-2) Non UTF-8
(6) Number of <meta>     : (6-1) one <meta>,    (6-2) multiple <meta>
    if multi-meta, many variations in <meta> order and <meta> charset.
(7) Tb version :
    (7-1) Bug 594646 occured initially,
    (7-2) Bug 594646 disappeared once, after change in HTML parser
    (7-3) Bug 594646 came back
    (7-4) Patch for Bug 594646 by Jonathan Protzenko is applied
Duplicate of this bug: 857858
Comment on attachment 738825 [details] [diff] [review]
tests v3

FTR.

[21:20] xabolcs mconley: wada in comment #99 wrote a lot about combinations about buggy meta tags
[21:20] xabolcs mconley: how much of them should i add into the testcase?
[21:21] mconley xabolcs: oh my
[21:21] mconley xabolcs: that's a lot of cases
[21:21] mconley xabolcs: we probably shouldn't block landing bug 594646 for all of those
[21:21] xabolcs mconley: yeah
[21:21] mconley xabolcs: I'd rather we deal with them in follow-ups.

<snip>

[22:11] xabolcs mconley: well, i try to add one or two more test email to the testcase and update the patch
[22:11] mconley xabolcs: sorry - didn't notice you'd responded (include my name next time in case I'm unresponsive, as I may have moved channels)
[22:11] xabolcs mconley: and attach to the bug
[22:11] mconley xabolcs: I think that's a fine strategy
[22:11] xabolcs mconley: ok, will do
Attachment #738825 - Flags: review?(mconley)

Updated

4 years ago
Duplicate of this bug: 870301
(In reply to WADA from comment #99)
> <snip>
> FYI. Relevant conditions:
> (6) Number of <meta>     : (6-1) one <meta>,    (6-2) multiple <meta>
>     if multi-meta, many variations in <meta> order and <meta> charset.
> <snip>

FYI this fix doesn't fix emails with multiple meta tags (case (6-2), "see also" bug 855329).

protz, jcranmer: should it fix?
Flags: needinfo?(jonathan.protzenko)
Flags: needinfo?(Pidgeot18)
(In reply to Szabolcs Hubai (:xabolcs) from comment #103)
> (In reply to WADA from comment #99)
> > <snip>
> > FYI. Relevant conditions:
> > (6) Number of <meta>     : (6-1) one <meta>,    (6-2) multiple <meta>
> >     if multi-meta, many variations in <meta> order and <meta> charset.
> > <snip>
> 
> FYI this fix doesn't fix emails with multiple meta tags (case (6-2), "see
> also" bug 855329).
> 
> protz, jcranmer: should it fix?

It's best to keep that to a separate bug, especially since it's a less common issue.
Flags: needinfo?(jonathan.protzenko)
Flags: needinfo?(Pidgeot18)
(In reply to Joshua Cranmer [:jcranmer] from comment #104)
> (In reply to Szabolcs Hubai (:xabolcs) from comment #103)
> > 
> > FYI this fix doesn't fix emails with multiple meta tags (case (6-2), "see
> > also" bug 855329).
> > 
> > protz, jcranmer: should it fix?
> 
> It's best to keep that to a separate bug, especially since it's a less
> common issue.

Thanks!
Will hack some base64 encoded mail and update the test patch today.
Created attachment 753208 [details] [diff] [review]
tests v4

Added base64 encoded messages.


Tested as
> make SOLO_TEST=message-reader/test-bug594646.js  mozmill-one

Interestingly I got 1 TEST-UNEXPECTED-FAIL with c-c: fd449e55a4d3 m-c: c21ef3664c67.
See below!

> INFO Passed: 5
> INFO Failed: 1
> INFO Skipped: 0
> SUMMARY-UNEXPECTED-FAIL | test-nntp-helpers.js | test-nntp-helpers.js::<TOP_LEVEL>
>   EXCEPTION: MailServices is not defined
>     at: nonesuch line 129
> SUMMARY-PASS | test-bug594646.js::setupModule
> SUMMARY-PASS | test-bug594646.js::test_original_html_characters_head_meta_content_charset_httpEq
> SUMMARY-PASS | test-bug594646.js::test_original_html_characters_head_meta_httpEq_content_newline_charset
> SUMMARY-PASS | test-bug594646.js::test_original_html_characters_head_meta_content_charset_newline_httpEq
> SUMMARY-PASS | test-bug594646.js::teardownModule
> make: *** [mozmill-one] Error 1
Attachment #738825 - Attachment is obsolete: true
Attachment #753208 - Flags: review?(mconley)

Comment 107

4 years ago
That's also discussed in bug 99019.

Comment 108

4 years ago
Err, bug 874690.
mconley: ping.

(In reply to Szabolcs Hubai (:xabolcs) from comment #106)
> Created attachment 753208 [details] [diff] [review]
> tests v4
> 
> Added base64 encoded messages.
> 
>
(Reporter)

Comment 110

4 years ago
Question to developers:
What is reason why "Tentative patch by Jonathan Protzenko"(one line change) is still not landed even on trunk, even though status = review++ ?
Automated test is mandatory to land and ship the one-line patch?
"not land the patch" is needed to complete auto-test implmentation?
(Assignee)

Comment 111

4 years ago
Last time I checked, a test was mandatory to check that in, and I don't have any time to write such a test. I suggest, with the upcoming release of Thunderbird 24, that we check the patch in without tests.

Mark, would you be ok with giving me a green light for committing this as an interim fix? I feel like the issue stings users pretty hard, and we wouldn't want this to be in the next stable release...
(Assignee)

Comment 112

4 years ago
WADA: re-reading the bug, I think we just need to ping mconley for the test review. The test is already written, we just need it to be reviewed.

Mark, could we set this bug as blocking-thunderbird24?
Flags: needinfo?(mconley)
Flags: needinfo?(mbanner)
I had the patch in my queue ready for imminent landing, I was just waiting for the test to get reviewed, which has gone through far more iterations than I was expecting.
Comment on attachment 753208 [details] [diff] [review]
tests v4

Review of attachment 753208 [details] [diff] [review]:
-----------------------------------------------------------------

An excellently written test. Thanks Szabolcs!
Attachment #753208 - Flags: review?(mconley) → review+
Flags: needinfo?(mconley)
https://hg.mozilla.org/comm-central/rev/706c1e8df227
https://hg.mozilla.org/comm-central/rev/4f4be8d2de81
Status: REOPENED → RESOLVED
Last Resolved: 5 years ago4 years ago
Flags: needinfo?(mbanner)
Resolution: --- → FIXED
Target Milestone: Thunderbird 3.3a2 → Thunderbird 24.0
(Reporter)

Updated

4 years ago
Duplicate of this bug: 884973
Mozilla/5.0 (X11; Linux i686 on x86_64; rv:25.0) Gecko/20100101 Firefox/25.0 SeaMonkey/2.22a1 ID:20130625003001 c-c:dd35e67c6af4 m-c:bc569033125a

I VERIFY that HTML emails where accented Latin letters were corrupt before because of this bug, appear now all right, even with View → Message Body As → Original HTML.

If someone from the Thunderbird QA team can do the same check on his/her side, the Status of this bug can be changed to VERIFIED.
Target Milestone: Thunderbird 24.0 → ---
Version: Trunk → Other Branch
... and I didn't request a Version or Milestone change: :-?
Target Milestone: --- → Thunderbird 24.0
Version: Other Branch → Trunk
Blocks: 889656
Duplicate of this bug: 892914
You need to log in before you can comment on or make changes to this bug.