Closed
Bug 572886
Opened 15 years ago
Closed 15 years ago
Problems display base64 text/html message (With html5.enable=true set by default, if charset=utf-16 is specified in <meta http-equiv> of non-utf-16 text/html, internal reload is invoked, and if text/html is base64 encoded, HTML is rendered in utf-16)
Categories
(MailNews Core :: MIME, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: iannbugzilla, Unassigned)
References
(Blocks 1 open bug)
Details
(Whiteboard: ["Internal Reload due to meta charset" was fixed by Bug 582788])
Attachments
(6 files)
For some reason SeaMonkey cannot display a message which has the following settings:
Content-Type: text/html; charset="utf-8"
Content-Transfer-Encoding: base64
Attachment #452105 -
Attachment mime type: application/x-mimearchive → text/plain
Comment 1•15 years ago
WORKSFORME with Sm 2.0.2 and Tb 3.0.4 on MS Win-XP.
With which build of SeaMonkey?
What phenomenon do you mean by "cannot display a message"?
(In reply to comment #1)
> WORKSFORME with Sm 2.0.2 and Tb 3.0.4 on MS Win-XP.
> With which build of SeaMonkey?
> What phenomenon do you mean by "cannot display a message"?
Happens on trunk so both SM 2.1a2pre and TB 3.2a1pre
Component: MailNews: Message Display → Backend
Product: SeaMonkey → MailNews Core
QA Contact: message-display → backend
Comment 4•15 years ago
(In reply to comment #2)
> Screenshot demonstrating issue
Some characters look as if they are shown in a Tamil font.
What are your Character Encoding related settings?
- Selected View/Character Encoding when you saw the screen shot.
- Folder Properties/General Information, your charset related settings.
- Edit/Preferences/Appearance/Fonts, your font choice for all of "Fonts for:".
Especially for "Other Languages", which is used for Unicode.
(In reply to comment #4)
> (In reply to comment #2)
> > Screenshot demonstrating issue
>
> Some characters look as if they are shown in a Tamil font.
As far as I am aware the actual email, once decoded out of base64, is just English and HTML.
> What are your Character Encoding related settings?
> - Selected View/Character Encoding when you saw the screen shot.
Unicode (UTF-8)
> - Folder Properties/General Information, your charset related settings.
Default Character Encoding is Western (ISO-8859-1)
> - Edit/Preferences/Appearance/Fonts, your font choice for all of "Fonts for:".
> Especially for "Other Languages", which is used for Unicode.
Proportional - Serif
Serif/Cursive/Fantasy - serif
Sans-serif - sans-serif
Monospace - monospace
Comment 6•15 years ago
(In reply to comment #5)
Do you mean the following is shown?
> Fonts for: Other Languages
> Typeface
> Proportional: Serif
> Serif: serif
> Sans-serif: sans-serif
> Cursive: serif
> Fantasy: serif
> Monospace: monospace
If so, do you really have fonts named serif, sans-serif, or monospace?
Comment 7•15 years ago
I could see the same result with the following Tb 3.2a1 build, using a profile that had been used by Tb 3.0 (no problem with Tb 3.0; the font settings correctly point to the required fonts).
> Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.3a6pre) Gecko/20100617 Shredder/3.2a1pre
The following lines appear in the text/html part.
> <html>
> <head>
> <META http-equiv="Content-Type" content="text/html; charset=utf-16">
> <title>npower</title>
> <meta http-equiv="content-type" content="text/html; charset=utf-8">
> <meta http-equiv="Content-Language" content="EN-GB">
> </head>
A new friend of Bug 227360, Bug 367240, Bug 378008, Bug 505072.
See also Bug 506504.
The problem occurs with the decoded HTML text.
Has Bug 505072 regressed? Or is this a problem with multiple <meta http-equiv>?
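A minimal sketch, assuming Python's standard `html.parser` as a stand-in for a real HTML parser, of how the conflicting declarations in the quoted head can be listed. The class name `MetaCharsetCollector` is hypothetical and only illustrates that the first <meta http-equiv> declares utf-16 while the second declares utf-8.

```python
from html.parser import HTMLParser


class MetaCharsetCollector(HTMLParser):
    """Collect every charset declared via <meta http-equiv="Content-Type">."""

    def __init__(self):
        super().__init__()
        self.charsets = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("http-equiv", "").lower() != "content-type":
            return
        # content looks like "text/html; charset=utf-16"
        for param in attr.get("content", "").split(";")[1:]:
            name, _, value = param.strip().partition("=")
            if name.lower() == "charset":
                self.charsets.append(value.strip().strip("\"'").lower())


# The head quoted from the text/html part in this comment.
HEAD = """<html>
<head>
<META http-equiv="Content-Type" content="text/html; charset=utf-16">
<title>npower</title>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<meta http-equiv="Content-Language" content="EN-GB">
</head>"""

collector = MetaCharsetCollector()
collector.feed(HEAD)
meta_charsets = collector.charsets
print(meta_charsets)
```

The Content-Language <meta> is skipped because only http-equiv="Content-Type" entries carry a charset parameter.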
Comment 8•15 years ago
Comment 9•15 years ago
The special character is as follows. If the attached HTML text is viewed as GB2312 in a browser, it is shown like this:
> <td>聽</td>
> <p>聽</p>
Comment 10•15 years ago
"The" should have been "A"...
With a simple text/html mail, the problem due to <meta> doesn't occur, so Bug 505072 is still fixed.
If the <meta> tags are removed, the problem doesn't occur. So, it's a multipart/alternative version of Bug 505072.
Note:
When checking the simple text/html case, the phenomenon of Bug 506504 was observed. It may be caused by detection of the special character (probably it was not properly converted to utf-8 or utf-16 from the initial binary for EN-GB).
Comment 11•15 years ago
Binary of the following part.
> < t d > 聽 < / t d >
> 3C74643E C2A0 3C2F74643E
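A minimal sketch of what those bytes mean, assuming the browser's GB2312 decoder accepts the same byte pairs as Python's `gbk` codec (Python's strict `gb2312` codec is not used here). Read as UTF-8, `C2A0` is a no-break space (U+00A0); read as GBK, the same two bytes form a single CJK character.

```python
# Bytes taken from the hex dump in this comment: "<td>", C2 A0, "</td>".
raw = bytes.fromhex("3C74643E C2A0 3C2F74643E")

# Interpreted with the charset from the Content-Type: header.
as_utf8 = raw.decode("utf-8")

# Interpreted as a GB2312-family encoding, as in the browser check above.
as_gbk = raw.decode("gbk")

print(repr(as_utf8))
print(repr(as_gbk))
```

So the special character is an ordinary no-break space in the UTF-8 data, and it only looks like a CJK character when the bytes are decoded with the wrong charset.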
Comment 12•15 years ago
CC-ing Zane U. Ji, who created the patch for Bug 505072.
A different change seems to be required for a part in multipart/xxx.
Updated•15 years ago
Summary: Problems display base64 text/html message → Problems display base64 text/html message (charset in <meta http-equiv> is used instead of charset in Content-Type: header upon rendering of text/html part in multipart/alternative)
Comment 13•15 years ago
Bug 528736 is for "charset in Content-Type: == charset in <meta>" of a text/html part in a multipart/xxx mail, and covers the "reloading" problem only.
The phenomenon of Bug 506504 that I saw with simple text/html looks like Bug 528736.
(Case-1)
charset in Content-Type: != charset in <meta>, text/html part in multipart
Bug 572886 (this bug) occurs.
(Case-2)
charset in Content-Type: == charset in <meta>, text/html part in multipart
Bug 528736 occurs (reload by <meta> with charset), then the phenomenon of Bug 506504 happens.
(Case-3)
Even with a normal mail, Bug 506504 occurs when View -> Character Encoding -> Auto Detect is touched.
Comment 14•15 years ago
FYI.
The main cause of "Bug 528736 then Bug 506504" with simple text/html was "multiple meta/http-equiv/charset".
Comment 15•15 years ago
This bug doesn't occur with html5.enable=false.
The following problem that I saw also disappeared with html5.enable=false:
"Bug 528736 then Bug 506504" occurs even with simple text/html
when a <meta http-equiv="content-type" ...> specifies a charset different from
the charset specified in the Content-Type: header.
Since testing HTML5 is not the purpose of Thunderbird, a default of html5.enable=false would be better for Thunderbird until the implementation of HTML5 support is completed.
Updated•15 years ago
Summary: Problems display base64 text/html message (charset in <meta http-equiv> is used instead of charset in Content-Type: header upon rendering of text/html part in multipart/alternative) → Problems display base64 text/html message (with html5.enable=true set by default, charset in <meta http-equiv> is used instead of charset in Content-Type: header upon rendering of text/html part in multipart/alternative)
Comment 16•15 years ago
Can different html5.enable values be used for the Browser and Mail&News components of SeaMonkey?
Comment 17•15 years ago
When I changed the base64 encoded text/html part to a non-encoded part (using a script to avoid issues caused by wrong binary), the original problem shown in the attached screen shot was not observed (with html5.enable=true).
The observed phenomenon was:
internal reload occurs, then the same phenomenon as Bug 506504 happens.
It looks like an issue specific to base64 encoded parts, or to encoded parts in general.
Comment 18•15 years ago
My humble guess is that "Content-Transfer-Encoding: base64" is ignored. It seems that MimeInlineTextHTML_parse_begin should call MimeLeaf_parse_begin.
Comment 19•15 years ago
With html5.enable=true.
# : "Bug 528736 then Bug 506504" occurs
X : Same display as attached screen shot by original mail
Y : Similar to attached screen shot, forced by Content-Type:charset=utf-16
O : Displayed as expected
# mail-A-1 : plain text/html, meta-charset=utf-16/gb2312
X mail-A-2 : base64 text/html, meta-charset=utf-16/gb2312
# mail-B-1 : plain text/html, meta-charset=gb2312/utf-16
X mail-B-2 : base64 text/html, meta-charset=gb2312/utf-16
O mail-C-1 : plain text/html, meta-charset=windows-1252/gb2312
O mail-C-2 : base64 text/html, meta-charset=windows-1252/gb2312
O mail-D-1 : plain text/html, meta-charset=gb2312/windows-1252
O mail-D-2 : base64 text/html, meta-charset=gb2312/windows-1252
Y mail-X : plain text/html, no meta-charset, Content-Type: charset=utf-16
# mail-Y : multipart/mixed, plain text/html, meta-charset=utf-16/gb2312
html5.enable=true and meta-charset=utf-16 seem to trigger the problem of Bug 528736 (after which the phenomenon of Bug 506504 happens).
It looks like both "charset=utf-16 in meta tag" and "base64 encoding" are required to see rendering in utf-16.
Multipart seems irrelevant to the problem.
The wrong binary in the original mail seems irrelevant.
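A minimal sketch, assuming Python's standard `email` package and a made-up HTML body, of the mail-A-2 style combination: a base64 text/html part whose Content-Type: header says utf-8 while the first <meta> says utf-16. It only illustrates how different the result is when the decoded bytes are read as utf-16; it does not model the internal reload itself.

```python
import base64
from email import message_from_string

# Hypothetical body; only the two conflicting <meta> tags mirror the report.
HTML = (
    "<html><head>"
    '<META http-equiv="Content-Type" content="text/html; charset=utf-16">'
    '<meta http-equiv="content-type" content="text/html; charset=utf-8">'
    "</head><body><p>Hello</p></body></html>"
)

RAW_MAIL = (
    'Content-Type: text/html; charset="utf-8"\n'
    "Content-Transfer-Encoding: base64\n"
    "\n"
    + base64.encodebytes(HTML.encode("utf-8")).decode("ascii")
)

msg = message_from_string(RAW_MAIL)
payload = msg.get_payload(decode=True)        # undo Content-Transfer-Encoding
header_charset = msg.get_content_charset()    # charset from Content-Type:

# Expected rendering: trust the Content-Type: header.
expected = payload.decode(header_charset)

# What the same bytes look like if they are read as utf-16 instead.
misread = payload.decode("utf-16-le", errors="replace")

print(expected[:40])
print(misread[:20])
```

Every pair of ASCII bytes collapses into one non-ASCII code point in the utf-16 reading, which is consistent with an English-only mail showing up as unrelated scripts.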
Updated•15 years ago
Summary: Problems display base64 text/html message (with html5.enable=true set by default, charset in <meta http-equiv> is used instead of charset in Content-Type: header upon rendering of text/html part in multipart/alternative) → Problems display base64 text/html message (With html5.enable=true set by default, if charset=utf-16 is specified in <meta http-equiv> of non-utf-16 text/html, internal reload is invoked, and if text/html is base64 encoded, HTML is rendered in utf-16)
Comment 20•15 years ago
NSPR log for loading the following mails.
offset=0 # mail-A-1 : plain text/html, meta-charset=utf-16/gb2312
offset=815 X mail-A-2 : base64 text/html, meta-charset=utf-16/gb2312
offset=3548 O mail-C-1 : plain text/html, meta-charset=windows-1252/gb2312
offset=4375 O mail-C-2 : base64 text/html, meta-charset=windows-1252/gb2312
InternalLoad is executed twice for mail-A-1 (offset=0) and mail-A-2 (offset=815).
Comment 21•15 years ago
Don't assume the final data type is the same as that of the original one.
Attachment #453373 -
Flags: review?(cbiesinger)
Updated•15 years ago
Component: Backend → HTML: Parser
Product: MailNews Core → Core
QA Contact: backend → parser
Comment 22•15 years ago
Changed to Component=HTML: Parser per Bug 528736 Comment #42.
Comment 23•15 years ago
Does this problem occur if you change the type of the MIME part from
Content-Type: text/html; charset="utf-8"
to
Content-Type: text/html; charset=utf-8
?
What charset and charset source is passed to the HTML parser?
If this were a parser bug, attachment 452423 [details] should show the problem over HTTP, shouldn't it?
Comment 24•15 years ago
(In reply to comment #23)
> Does this problem occur if you change the type of the MIME part from
> Content-Type: text/html; charset="utf-8"
> to
> Content-Type: text/html; charset=utf-8
> ?
Henri Sivonen (:hsivonen), what is the reason you suspect the problem is due to the difference between charset="utf-8" and charset=utf-8 (the difference between a charset quoted with " and one not quoted with ")?
Have you checked by yourself using your hand and brain?
If so, what is the difference between my test result (the report in this bug) and your test result?
Comment 25•15 years ago
(In reply to comment #24)
> (In reply to comment #23)
> > Does this problem occur if you change the type of the MIME part from
> > Content-Type: text/html; charset="utf-8"
> > to
> > Content-Type: text/html; charset=utf-8
> > ?
>
> Henri Sivonen (:hsivonen), what is the reason you suspect the problem is due to
> the difference between charset="utf-8" and charset=utf-8 (the difference between
> a charset quoted with " and one not quoted with ")?
I'm suspecting that the parser isn't initialized properly with external encoding information. Having heard an IRC remark that Gecko has (had?) a bug in this area, I thought one possibility might be that the MIME wrapper is tripping over the quotes.
> Have you checked by yourself using your hand and brain?
No. I don't have SeaMonkey or Thunderbird builds set up, so I was hoping someone who does would take the time to check what charset and charset source is passed to the HTML5 parser.
Since so far, there have been no reported bugs in this area in the HTTP context and since the test case work right over HTTP, I'm inclined to suspect the bug is in the MIME wrapper code and not in the parser.
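A minimal sketch of the quoted versus unquoted check, using Python's standard `email` package purely as an illustration. It is not the Mozilla MIME wrapper, so it says nothing about whether that code trips over the quotes; it only shows that both spellings are expected to yield the same charset value.

```python
from email import message_from_string

# The two header spellings asked about in comment #23.
quoted = message_from_string('Content-Type: text/html; charset="utf-8"\n\n')
bare = message_from_string("Content-Type: text/html; charset=utf-8\n\n")

# get_content_charset() strips the quotes and lowercases the value.
quoted_charset = quoted.get_content_charset()
bare_charset = bare.get_content_charset()

print(quoted_charset, bare_charset)
```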
Comment 26•15 years ago
(In reply to comment #25)
> I'm inclined to suspect the bug is in the MIME wrapper code and not in the parser.
Please note that Internal Reload due to charset in <meta> is done by the HTML parser instead of MIME code, and it happens only when html5.enable=true. Have you read through Bug 528736?
Comment 27•15 years ago
(In reply to comment #26)
> (In reply to comment #25)
> > I'm inclined to suspect the bug is in the MIME wrapper code and not in the parser.
>
> Please note that Internal Reload due to charset in <meta> is done by HTML
> parser instead of MIME code, and it happens only when html5.enable=true.
If the MIME wrapper set the charset source correctly, the parser shouldn't do any <meta> stuff at all.
> Have you read through Bug 528736?
I have now, and from the comments it sure looks like the bug is that the MIME wrapper code fails to set the charset source properly to kCharsetFromChannel.
Updated•15 years ago
Attachment #452423 -
Attachment mime type: text/plain; charset="utf-8" → text/html; charset="utf-8"
Updated•15 years ago
Attachment #452423 -
Attachment mime type: text/html; charset="utf-8" → text/plain; charset="utf-8"
Comment 28•15 years ago
(In reply to comment #23)
> What charset and charset source is passed to the HTML parser?
> If this were a parser bug, attachment 452423 [details] should show the problem over HTTP, shouldn't it?
(In reply to comment #27)
> the MIME wrapper code fails to set the charset source properly to kCharsetFromChannel.
Is the following understanding right?
Since no HTTP GET is issued for HTML mail, the MIME wrapper should pass the charset information from the Content-Type: header to the HTML parser and/or HTTP channel with kCharsetFromChannel, but the MIME wrapper fails to pass it correctly.
Comment 29•15 years ago
As my patch indicates, the problem is we use a wrong content type when we reload the data. The real type is MIME, yet we ask it to be loaded as HTML.
It happens only when html5.enable=true because when html5.enable is false, the data is reloaded from history WITHOUT a type. If you check the constructor of nsSHEntry, you'll notice that there is a comment "// XXX why not copy mContentType?". I guess this bug just reveals one reason.
Comment 30•15 years ago
Henri Sivonen and Zane U. Ji look interested in different things.
Henri Sivonen : Internal Reload is invoked by <meta> tag,
even though Content-Type: header has charset=utf-8.
Zane U. Ji : content type text/html is set instead of MIME
when Internal Reload is invoked.
Both need to be addressed to resolve the problems relevant to this bug, although the original phenomenon in comment #0 will disappear if "Internal Reload" is not invoked.
Comment 31•15 years ago
This might have the same cause as bug 582788. It's probably worthwhile to wait for that bug to get fixed in the parser before landing patches outside the parser for this bug.
Depends on: 582788
Comment 32•15 years ago
Bug 582788 landed. Please re-test with a build that has that fix.
Comment 33•15 years ago
(In reply to comment #32)
> Bug 582788 landed. Please re-test with a build that has that fix.
Any news if this problem persists after that landing?
Updated•15 years ago
Priority: -- → P3
Comment 34•15 years ago
"Display of full message source" due to internal reload by <meta> charset (the same display as bug 506504) was not observed with the following builds (6/09 build produces this bug). This bug with base64 encoded HTML was not reproduced with these builds either.
> Mozilla/5.0 (Windows NT 5.1; rv:2.0b4pre) Gecko/20100806 Shredder/3.2a1pre
> Mozilla/5.0 (Windows NT 5.1; rv:2.0b4pre) Gecko/20100906 Shredder/3.2a1pre
i.e. I am no longer able to reproduce bug 528736 with the test cases for this bug.
Comment 35•15 years ago
> (6/09 build produces this bug)
I'm not sure I follow. What remaining behavior is keeping this bug open?
Comment 36•15 years ago
(In reply to comment #35)
> > (6/09 build produces this bug)
> I'm not sure I follow. What remaining behavior is keeping this bug open?
Sorry for the confusing comment. I wanted to say:
BAD : 2010/6/09 build(Jun 9th)
Patch for bug 582788 landed after 2010/6/09 and before 2010/8/06
OK : 2010/8/06 build(Aug 6th), 2010/9/06 build(Sep 6th)
Reasons why I have not closed this as FIXED or WORKSFORME:
(a) As you know, there is at least one problem around <meta> charset and the HTML5
parser, Bug 593894. I don't know whether some other changes are required in the HTML5
parser for this bug.
(b) I don't know whether the change mentioned by Zane U. Ji is still required after
the patch for Bug 593894.
Comment 37•15 years ago
I observed "garbled display of non-ascii caused by reversed order of http-equiv and content in <meta>" during some additional checking for this bug. I opened Bug 594646 for it.
Comment 38•15 years ago
Not an HTML parser bug anymore. Moving over to MailNews Core.
Component: HTML: Parser → MIME
Priority: P3 → --
Product: Core → MailNews Core
QA Contact: parser → mime
Comment 39•15 years ago
(In reply to comment #38)
> Not an HTML parser bug anymore. Moving over to MailNews Core.
Henri Sivonen, wait for a moment, please.
I looked at my test results in comment #19 again, looked at the patch for bug 582788, and did some quick tests with non-ascii data. I now understand that the cause of the internal reload in this bug was utf-16 in <meta>.
- With html5.enable=true, with a single <meta> tag with utf-16,
with normal order of http-equiv and content in <meta>,
with plain text/html of non-ascii characters
- 6/09(Jun 9th) build(patch for bug 582788 is not landed yet)
- Internal reload was invoked, "whole mail source display" was observed.
- 9/06(Sep 6th) build(patch for bug 582788 is already landed)
- mail was displayed as expected, so internal reload was not executed,
and non-ascii characters were shown as expected.
However, when I checked this bug's test case (simple text/plain, multiple <meta> of different charset, normal order of http-equiv and content) with non-ascii data in the HTML using the 9/06 (Sep 6th) build, garbled display of non-ascii was observed, even with the normal order of http-equiv and content in <meta>.
This indicates that there is still a problem in multiple <meta charset> processing even after "internal-reload due to utf-16" is resolved. And "multiple <meta>" is a condition of this bug, in addition to utf-16 in <meta>.
Henri Sivonen:
Is this remaining issue (observed with the test cases for this bug with non-ascii data) irrelevant to the HTML5 parser?
Should I open a separate bug for "non-ascii display problem with multiple <meta charset>, even with normal order of http-equiv and content in <meta>", keeping this bug for the issue of "internal reload due to utf-16 in <meta>" only?
Comment 40•15 years ago
If the Content-Type header on the MIME layer doesn't have a charset parameter, the HTML parser should be initialized with the user's default charset and kCharsetFromUserDefault. If there is a charset parameter on the MIME layer, the HTML parser should be initialized with that charset and kCharsetFromChannel.
In either case, the bytes extracted from the MIME encapsulation should be passed to the HTML parser unmodified.
If you can't reproduce the problem on HTTP without a charset parameter on the HTTP layer and with the meta within the first 1024 bytes of the payload, chances are the mailnews side is to blame. OTOH, if you can, then there's an HTML parser bug.
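A minimal sketch of the initialization rule described in this comment, written in Python with the standard `email` package. The helper `parser_init_for` and the `ParserInit` tuple are hypothetical; the two source labels are plain strings named after the constants mentioned above, not the real Gecko values.

```python
from email.message import Message
from typing import NamedTuple

# Hypothetical labels named after the constants mentioned in this comment.
CHARSET_FROM_USER_DEFAULT = "kCharsetFromUserDefault"
CHARSET_FROM_CHANNEL = "kCharsetFromChannel"


class ParserInit(NamedTuple):
    charset: str
    source: str


def parser_init_for(part: Message, user_default: str) -> ParserInit:
    """Pick the charset and charset source to hand to the HTML parser."""
    declared = part.get_content_charset()  # None if no charset parameter
    if declared is None:
        return ParserInit(user_default, CHARSET_FROM_USER_DEFAULT)
    return ParserInit(declared, CHARSET_FROM_CHANNEL)


with_charset = Message()
with_charset["Content-Type"] = 'text/html; charset="utf-8"'

without_charset = Message()
without_charset["Content-Type"] = "text/html"

# ISO-8859-1 is the folder default reported earlier in this bug.
print(parser_init_for(with_charset, "iso-8859-1"))
print(parser_init_for(without_charset, "iso-8859-1"))
```

In both branches the payload bytes themselves are left alone; only the charset and its source change.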
Comment 41•15 years ago
(In reply to comment #40)
In a real HTTP environment, I only know of the following case of internal reload by the parser:
No charset in Content-Type: header
Auto-detect = On
Bytes that differ from the default charset (charset1) appear within the first 1024 bytes
or first chunk, before the <meta> tag, and auto-detect changes the charset (charset2).
A <meta> with a third charset (charset3) appears (possibly after the first 1024 bytes).
=> Internal reload is invoked.
I didn't know about internal reload caused by some special charsets, as in bug 582788.
I'll check the multiple <meta> case with real HTTP first, to see the parser's behaviour. And I'll open a separate bug for the multiple <meta> case.
Regarding "reversed order of http-equiv and content in single <meta>" and "garbled display of non-ascii" (Bug 594646):
Is the HTML5 parser irrelevant? Is "charset information in <meta> passed by MIME to the HTML5 parser" the cause? If so, why is there no problem with the normal order of http-equiv and content in a single <meta>? Does MIME pass the charset in <meta> for the reversed order but not for the normal order?
Updated•15 years ago
Whiteboard: ["Internal Reload due to meta charset" was fixed by Bug 582788]
Comment 42•15 years ago
FYI.
After the fix of "Internal Reload due to meta charset" by Bug 582788, the problems with html5.enable=true have morphed into the following:
(1) Bug 594646
"Order of attributes in <meta> for charset" is relevant.
"quoted-printable/base64 or not" is irrelevant, if "View/Message Body
As/Original HTML". Observed with "View/Message Body As/Original HTML" only,
if not quoted-printable/base64.
(2) Bug 598740
"existence of <meta> for charset" is irrelevant.
"quoted-printable/base64 or not" is relevant.
Observed with "View/Message Body As/Simple HTML" only,
if no <meta> tag for charset.
I don't know which bug causes the garbled display with Original HTML or Simple HTML when HTML with a <meta> for charset (reversed attribute order) is encoded in quoted-printable or base64.
I think a change like the "Possible patch" attached in Comment #21 by Zane U. Ji is still needed to resolve problems like Bug 598740.
Comment 43•15 years ago
FYI.
If the html5.enable=true related issues in Mail&News are called a "regression", the regression window is apparent: it opened when the patch for bug 373864, which changed the default to html5.enable=true, landed on 2010-05-03.
Simplest solution of html5.enable=true related issues for Tb release build:
Change the default of Tb's official releases to html5.enable=false.
It's not applicable to SeaMonkey, as Sm has a Browser component in it.
Comment 44•15 years ago
All of the following problems disappeared with the one-line patch proposed in Bug 594646.
Bug 594646 comment #0 : reversed attribute order in <meta>
Bug 594646 comment #5 : multiple <meta>, successor of this bug's case
Bug 598740 : no <meta>, quoted-printable/base64, Simple HTML only issue
Patch proposed to Bug 594646.
> mailnews/base/src/nsMessenger.cpp
> - muDV->SetHintCharacterSetSource(kCharsetFromMetaTag);
> + muDV->SetHintCharacterSetSource(kCharsetFromChannel);
The problem seems to be as follows:
If kCharsetFromMetaTag is set, the html5 parser fails to process the charset in
the meta tag correctly in some situations. (bug 594730)
The above patch passes kCharsetFromChannel as the parser expects, and there is no
parser side problem in kCharsetFromChannel handling, so the problem on the
parser side is bypassed.
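A simplified model, in Python, of why the hint source matters. It assumes only what comment #27 states: when the charset source is kCharsetFromChannel, the parser should not act on <meta> at all. The enum `CharsetSource`, its numeric ranks, and the function `effective_charset` are hypothetical and are not taken from Gecko.

```python
from enum import IntEnum
from typing import Optional


class CharsetSource(IntEnum):
    # Illustrative ranks only; not the real Gecko constant values.
    META_TAG = 1
    CHANNEL = 2


def effective_charset(hint_charset: str,
                      hint_source: CharsetSource,
                      meta_charset: Optional[str]) -> str:
    """Return the charset used after the parser has seen a <meta> charset."""
    if meta_charset is not None and hint_source < CharsetSource.CHANNEL:
        # A weak hint leaves room for <meta> processing in the parser.
        return meta_charset
    # A channel-level hint wins; <meta> is ignored.
    return hint_charset


before_patch = effective_charset("utf-8", CharsetSource.META_TAG, "utf-16")
after_patch = effective_charset("utf-8", CharsetSource.CHANNEL, "utf-16")
print(before_patch, after_patch)
```

Under this model the one-line change keeps the header charset in force regardless of what the <meta> tags say.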
Zane U. Ji, I think this bug's original problem of "garbled display due to internal reload if UTF-16 is specified in <meta>" is completely resolved by the fix for the html5 parser bug, Bug 582788, and I think your "Possible patch" attached in Comment #21 is no longer needed.
What do you think?
Comment 45•15 years ago
(In reply to comment #44)
> Zane U. Ji, I think this bug's original problem of "garbled display due to
> internal reload if UTF-16 is specified in <meta>" is completely resolved by
> fixing of html5 parser bug of Bug 582788, and I think your "Possible patch"
> proposed to Comment #21 is not needed any more.
> What do you think?
As content type is not used when we reload a history entry, maybe it is best that we don't use it when reloading a URL. Anyway, that patch can be ignored as long as everything works.
Comment 46•15 years ago
(In reply to comment #45)
> As content type is not used when we reload a history entry, maybe it is best
> that we don't use it when reloading a URL.
> Anyway, that patch can be ignored as long as everything works.
It sounds like your patch is still required for issues around "internal reload is invoked", although your patch is not required for this bug.
Should we open a separate bug for the issue after "internal reload is invoked", in both the "needless internal reload" case and the "required internal reload" case?
Comment 47•15 years ago
Comment on attachment 453373 [details] [diff] [review]
Possible patch
Based on Zane's comment it looks like this patch is no longer required. In any case, cbiesinger isn't going to be able to review it.
This bug seems to be covering multiple issues - one of which is fixed, the other is unclear from my reading as to what the actual issue is, if there is a real issue there.
I'm therefore going to mark this bug as fixed by bug 582788. If there are still outstanding issues, please file a new bug(s) with clear and as simple as possible steps to repeat with a test case.
Attachment #453373 -
Flags: review?(cbiesinger)
Updated•15 years ago
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED