Closed
Bug 599320
Opened 14 years ago
Closed 12 years ago
Don't inherit non-ASCII superset encoding as the tentative encoding of subframes
Categories
(Core :: DOM: HTML Parser, defect, P3)
Core
DOM: HTML Parser
Tracking
()
RESOLVED
FIXED
mozilla19
People
(Reporter: moga_marius22, Assigned: hsivonen)
References
()
Details
Attachments
(2 files, 3 obsolete files)
422.64 KB,
image/jpeg
|
Details | |
18.19 KB,
patch
|
smaug
:
review+
|
Details | Diff | Splinter Review |
User-Agent: Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)
Build Identifier: Firefox 3.6.10
Wrong display of some pages from www.manastireazemes.ro. Any other browser like Internet explorer, Opera, or Google Chrome works fine and the Multimedia page or Contact page are displayed corectly.
Reproducible: Always
Steps to Reproduce:
1.Start mozilla firefox 3.6.10
2.Browse to http://www.manastireazemes.ro/
3.Select Multimedia and Contact menus.
Actual Results:
Instead of the Multimedia and Contact pages, you will see a lot of chinese characters.
Expected Results:
Those pages should be displayed corectly on your browser, like any other browser
Works fine with Firefox 3.6.8
Comment 1•14 years ago
|
||
The reason is that the originating page uses the character encoding: text/html (BOM UTF-16, litte-endian) given as http header.
The iframe ( http://www.manastireazemes.ro/HTMLRO/Contact.html ) is using iso8859-1 and that carset info is only present in the html header as <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
I can confirm this with SM trunk but it works for me with FF3.6.10
Status: UNCONFIRMED → NEW
Ever confirmed: true
Product: Firefox → Core
QA Contact: general → general
Version: unspecified → Trunk
![]() |
||
Comment 2•14 years ago
|
||
The <meta> tag should override the charset inherited from parent. So I have no idea why you're seeing this issue. If you _are_ seeing it, care to debug?
Reporter | ||
Comment 3•14 years ago
|
||
Yes, I am interested in finding a solution. I have access to modify the html code for this website.
Reporter | ||
Comment 5•14 years ago
|
||
Reporter | ||
Comment 6•14 years ago
|
||
Ok, sorry for misunderstanding, I have attached a jpg for this issue.
Kindly thank you.
Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.11pre) Gecko/20100924 Namoroka/3.6.11pre
Able to reproduce. When we directly click the Contact menu, we see the map. To reproduce click on the Menus one by one and wait for the page to load and at the last click on the Contact menu, we can see the lots of chinese characters instead of displaying the map. Able to reproduce in 3.6.8 version also, but not in 3.6.7
Comment 8•14 years ago
|
||
The change between 1.9.2 and trunk is the html5 parser. Disabling the html5 parser makes it working again on the trunk.
I tested that with FF4.0b6 on windows.
I can't reproduce the issue with 1.9.2 on my system.
@bz: sure, i can try to debug it if you have suggestions how to do it.
![]() |
||
Comment 9•14 years ago
|
||
Well, a good start is stepping through nsHTMLDocument::StartDocumentLoad and seeing where we end up picking up our charset from.
Though given the HTML5 parser thing, perhaps Henri just needs to look at this. But I thought we had disabled the HTML5 parser on branch, so that wouldn't be what Marius is seeing....
Comment 10•14 years ago
|
||
Oh, you mean real debugging. I'm sorry but that is beyond of my possibilities :-(
What I'm seeing on my system seems to be different from that what Marius is seeing.
Reporter | ||
Comment 11•14 years ago
|
||
I've made some changes on http://www.manastireazemes.ro/HTMLRO/Multimedia.html code and that page is working now. I think that I will make those changes on every web page. There are other Multimedia pages that right now are displaing those chineze characters.
These are the changes for Multimedia.html:
1. The meta info is now before javascript code
2. On <style type="text/css"> I have deleted the spaces and new lines from navigation_menu's properties
3. I have changed the summary parameter name, but I think this is not important.
If it's alright with you I will make the changes.
Regards
![]() |
||
Comment 12•14 years ago
|
||
Actually, it would be good if there were a page that shows the problem, so Henri can actually debug and fix it....
Was it just moving the <meta> to before the <script> that did it? But that shouldn't matter on 1.9.2, without the HTML5 parser.
Reporter | ||
Comment 13•14 years ago
|
||
MultimediaFotoViata.html, MultimediaFotoPeisaje.html, MultimediaFotoIstorie.html, MultimediaFotoConstructie.html are similar with Multimedia.html but I have not modified them yet. You can access them from
the iframe menu (on the top of the iframe, go to "Foto" and than select another submeniu than "Asezamant")
Reporter | ||
Comment 14•14 years ago
|
||
Ok, I'm not going to modify MultimediaFotoPeisaje.html (on the top of the iframe, go to "Foto" and than select "Peisaje" submeniu) and next I will try to find out if the position of <meta> was the real problem.
Reporter | ||
Comment 15•14 years ago
|
||
Yes, this is the problem. When I move the <meta> before the <script> it is working.
On my PC every page is working after I delete browsers history, restart an than load the website again, but after I restart the browser again (without deleting history) and than reload a webpage (that I haven't modified yet) it appears the chinese problem.
Assignee | ||
Comment 16•14 years ago
|
||
I don't see the described problem on trunk on Mac or Linux. Have all the pages on the site now been modified not to show the problem? Is there still a test case somewhere?
Using UTF-16 (any flavor) on the Web is just asking for trouble.
Maybe we shouldn't inherit non-ASCII superset encodings to subframes, because if the meta prescan fails, a later meta fail to correct the situation if the tentative encoding wasn't an ASCII superset.
Component: General → HTML: Parser
QA Contact: general → parser
Reporter | ||
Comment 17•14 years ago
|
||
@hsivonen: Yes there is now only one page intentionally left for you to debug, please go to "Multimedia" menu item and than on the top of the iframe, go to "Foto" menu and than select "Peisaje" submeniu item. I'm writing about the "http://www.manastireazemes.ro/HTMLRO/MultimediaFotoPeisaje.html" page. I couldn't let the website uncovered with all the pages damaged like this one. Thank you for understanding.
Assignee | ||
Comment 18•14 years ago
|
||
(In reply to comment #17)
> @hsivonen: Yes there is now only one page intentionally left for you to debug,
> please go to "Multimedia" menu item and than on the top of the iframe, go to
> "Foto" menu and than select "Peisaje" submeniu item. I'm writing about the
> "http://www.manastireazemes.ro/HTMLRO/MultimediaFotoPeisaje.html" page. I
> couldn't let the website uncovered with all the pages damaged like this one.
> Thank you for understanding.
Thanks. Since the <meta> isn't within the 1 KB, the meta prescan fails to recover from the bad inherited encoding (UTF-16) and since the bytes don't decode to something that parses as a <meta> tag, the late <meta> reload mechanism can't correct the situation.
As mentioned in comment 16, this can be fixed by not inhering non-ASCII superset encodings as subframe defaults.
Summary: please browse to Multimedia and than to Contact menu → Don't inherit non-ASCII superset encoding as the tentative encoding of subframes
Assignee | ||
Updated•14 years ago
|
Priority: -- → P3
Assignee | ||
Comment 19•12 years ago
|
||
In addition to parent encoding, this patch also refuses hints from the previous doc and the cache entry the same way.
Assignee: nobody → hsivonen
Status: NEW → ASSIGNED
Assignee | ||
Comment 20•12 years ago
|
||
The binary patch is like the -ref version except:
1) It’s in UTF-16.
2) It references the non-“-ref” child frame.
3) It has class="reftest-wait".
#3 is required because the parser in the child frame restarts, which would confuse the reftest framework without reftest-wait.
Attachment #666928 -
Attachment is obsolete: true
Attachment #667352 -
Flags: review?(bugs)
Comment 21•12 years ago
|
||
This patch may break some old Japanese pages whose encoding is ISO-2022-JP. Some Japanese user complained when IE8 dropped the auto-detect for ISO-2022-JP.
Do other browsers also ignore parent frame, previous doc, and cache entry for ISO-2022-JP?
Assignee | ||
Comment 22•12 years ago
|
||
Comment on attachment 667352 [details] [diff] [review]
Refuse encodings that are not rough ASCII supersets as hints, v2
These days, the spec allows encodings that may encode non-Basic Latin to ASCII bytes as long as Basic Latin encodes to ASCII bytes. That is, the legacy Japanese encodings should not be blacklisted here.
Attachment #667352 -
Attachment is obsolete: true
Attachment #667352 -
Flags: review?(bugs)
Assignee | ||
Comment 23•12 years ago
|
||
(In reply to Masatoshi Kimura [:emk] from comment #21)
> This patch may break some old Japanese pages whose encoding is ISO-2022-JP.
> Some Japanese user complained when IE8 dropped the auto-detect for
> ISO-2022-JP.
> Do other browsers also ignore parent frame, previous doc, and cache entry
> for ISO-2022-JP?
This patch didn’t affect ISO-2022-JP in any way. The Japanese encodings I had blacklisted were not supported by Gecko anyway, they were cruft carried over from the blacklist I had for Java. Removing those. Also removing x-user-defined, because it is ASCII-compatible by code inspection.
Attachment #667431 -
Flags: review?(bugs)
Comment 24•12 years ago
|
||
Comment on attachment 667431 [details] [diff] [review]
Blacklist UTF-16 from being inherited from parent
>- else if (kCharsetFromCache <= parentSource) {
>- // Make sure that's OK
>- if (!aParentDocument || !CheckSameOrigin(this, aParentDocument)) {
>- return false;
>- }
>-
>- source = kCharsetFromParentFrame;
>+
>+ // if parent is posted doc, set this prevent autodections
I know you just moved this stuff, but 'autodections' ?
> nsHTMLDocument::UseWeakDocTypeDefault(int32_t& aCharsetSource,
>- nsACString& aCharset)
>+ nsACString& aCharset)
Odd indentation
>- // fallback value in case docshell return error
>- aCharset.AssignLiteral("ISO-8859-1");
>+ return;
>
> const nsAdoptingCString& defCharset =
> Preferences::GetLocalizedCString("intl.charset.default");
>
>- if (!defCharset.IsEmpty()) {
>+ // Don't let the user break things by setting intl.charset.default to
>+ // not a rough ASCII superset
>+ if (!defCharset.IsEmpty() && IsAsciiCompatible(defCharset)) {
> aCharset = defCharset;
>- aCharsetSource = kCharsetFromWeakDocTypeDefault;
>+ } else {
>+ aCharset.AssignLiteral("windows-1252");
So I still don't understand the reason to change default to windows-1252
Anyway it shouldn't be about this bug, so could to just not do it.
Attachment #667431 -
Flags: review?(bugs) → review+
Assignee | ||
Comment 25•12 years ago
|
||
(In reply to Olli Pettay [:smaug] from comment #24)
> Comment on attachment 667431 [details] [diff] [review]
> Blacklist UTF-16 from being inherited from parent
>
>
> >- else if (kCharsetFromCache <= parentSource) {
> >- // Make sure that's OK
> >- if (!aParentDocument || !CheckSameOrigin(this, aParentDocument)) {
> >- return false;
> >- }
> >-
> >- source = kCharsetFromParentFrame;
> >+
> >+ // if parent is posted doc, set this prevent autodections
> I know you just moved this stuff, but 'autodections' ?
Fixed.
> > nsHTMLDocument::UseWeakDocTypeDefault(int32_t& aCharsetSource,
> >- nsACString& aCharset)
> >+ nsACString& aCharset)
> Odd indentation
Fixed. (The monospace font in my editor had different widths for regular and bold. Sigh.)
> >- // fallback value in case docshell return error
> >- aCharset.AssignLiteral("ISO-8859-1");
> >+ return;
> >
> > const nsAdoptingCString& defCharset =
> > Preferences::GetLocalizedCString("intl.charset.default");
> >
> >- if (!defCharset.IsEmpty()) {
> >+ // Don't let the user break things by setting intl.charset.default to
> >+ // not a rough ASCII superset
> >+ if (!defCharset.IsEmpty() && IsAsciiCompatible(defCharset)) {
> > aCharset = defCharset;
> >- aCharsetSource = kCharsetFromWeakDocTypeDefault;
> >+ } else {
> >+ aCharset.AssignLiteral("windows-1252");
> So I still don't understand the reason to change default to windows-1252
> Anyway it shouldn't be about this bug, so could to just not do it.
Changed back to ISO-8859-1. Filed bug 802030.
Thanks for the r+. Landed:
https://hg.mozilla.org/integration/mozilla-inbound/rev/6db369a15ea7
Assignee | ||
Comment 26•12 years ago
|
||
I forgot that there was a GUI for the fallback encoding. Filed bug 802033.
Comment 27•12 years ago
|
||
Backed out for intermittent (but frequent) Win7 debug reftest failures in bug599320-1.html, eg:
https://tbpl.mozilla.org/php/getParsedLog.php?id=16150844&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=16153575&tree=Mozilla-Inbound
Backout:
https://hg.mozilla.org/integration/mozilla-inbound/rev/ea4e5cb4e240
Assignee | ||
Comment 28•12 years ago
|
||
(In reply to Ed Morley [:edmorley UTC+1] from comment #27)
> Backed out for intermittent (but frequent) Win7 debug reftest failures in
> bug599320-1.html, eg:
The log says:
REFTEST INFO | Saved log: START file:///c:/talos-slave/test/build/reftest/tests/parser/htmlparser/tests/reftest/bug599320-1.html
REFTEST INFO | Saved log: [CONTENT] OnDocumentLoad triggering WaitForTestEnd
REFTEST INFO | Saved log: [CONTENT] WaitForTestEnd: Adding listeners
REFTEST INFO | Saved log: Initializing canvas snapshot
REFTEST INFO | Saved log: DoDrawWindow 0,0,800,1000
REFTEST INFO | Saved log: [CONTENT] MakeProgress: STATE_WAITING_TO_FIRE_INVALIDATE_EVENT
REFTEST INFO | Saved log: [CONTENT] MakeProgress: waiting for MozAfterPaint
REFTEST INFO | Saved log: [CONTENT] AfterPaintListener in file:///c:/talos-slave/test/build/reftest/tests/parser/htmlparser/tests/reftest/bug599320-1.html
REFTEST INFO | Saved log: [CONTENT] SendUpdateCanvasForEvent with 1 rects
REFTEST INFO | Saved log: [CONTENT] Rect: 10 154 310 555
REFTEST INFO | Saved log: Updating canvas for invalidation
REFTEST INFO | Saved log: DoDrawWindow 10,154,300,401
REFTEST INFO | Saved log: [CONTENT] WaitForTestEnd: Adding listeners
REFTEST INFO | Saved log: [CONTENT] MakeProgress: STATE_WAITING_TO_FIRE_INVALIDATE_EVENT
REFTEST INFO | Saved log: [CONTENT] MakeProgress: waiting for MozAfterPaint
REFTEST INFO | Saved log: [CONTENT] AfterPaintListener in file:///c:/talos-slave/test/build/reftest/tests/parser/htmlparser/tests/reftest/bug599320-1.html
REFTEST INFO | Saved log: [CONTENT] SendUpdateCanvasForEvent with 1 rects
REFTEST INFO | Saved log: [CONTENT] Rect: 10 154 310 555
REFTEST INFO | Saved log: [CONTENT] AfterPaintListener in file:///c:/talos-slave/test/build/reftest/tests/parser/htmlparser/tests/reftest/frame599320-1.html
REFTEST INFO | Saved log: [CONTENT] AttrModifiedListener fired
REFTEST INFO | Saved log: [CONTENT] MakeProgress: STATE_WAITING_TO_FIRE_INVALIDATE_EVENT
REFTEST INFO | Saved log: [CONTENT] MakeProgress: waiting for MozAfterPaint
REFTEST INFO | Saved log: [CONTENT] AfterPaintListener in file:///c:/talos-slave/test/build/reftest/tests/parser/htmlparser/tests/reftest/bug599320-1.html
REFTEST INFO | Saved log: [CONTENT] SendUpdateCanvasForEvent with 0 rects
REFTEST INFO | Saved log: Updating canvas for invalidation
REFTEST INFO | Saved log: [CONTENT] AttrModifiedListener fired
REFTEST INFO | Saved log: [CONTENT] MakeProgress: STATE_WAITING_TO_FIRE_INVALIDATE_EVENT
REFTEST INFO | Saved log: [CONTENT] MakeProgress: waiting for MozAfterPaint
REFTEST INFO | Saved log: [CONTENT] AfterPaintListener in file:///c:/talos-slave/test/build/reftest/tests/parser/htmlparser/tests/reftest/bug599320-1.html
REFTEST INFO | Saved log: [CONTENT] SendUpdateCanvasForEvent with 0 rects
REFTEST INFO | Saved log: [CONTENT] AfterPaintListener in file:///c:/talos-slave/test/build/reftest/tests/parser/htmlparser/tests/reftest/frame599320-1.html
REFTEST INFO | Saved log: [CONTENT] MakeProgress: STATE_WAITING_TO_FIRE_INVALIDATE_EVENT
REFTEST INFO | Saved log: [CONTENT] MakeProgress: dispatching MozReftestInvalidate
REFTEST INFO | Saved log: [CONTENT] MakeProgress: STATE_WAITING_FOR_REFTEST_WAIT_REMOVAL
REFTEST INFO | Saved log: [CONTENT] MakeProgress: STATE_WAITING_TO_FINISH
REFTEST INFO | Saved log: [CONTENT] MakeProgress: Completed
REFTEST INFO | Saved log: [CONTENT] MakeProgress: STATE_COMPLETED
The iframe has a restarting parser. The parser restart confuses the reftest framework on all platforms, but the reftest-wait stuff is supposed to take care of it.
Any ideas why the snapshotting would misfire on Windows despite reftest-wait?
Should I just mark the test as intermittent on Windows and not think about it too much? The code being tested is cross-platform after all, so there’s a good reason to believe the actual code change is doing OK.
Assignee | ||
Comment 29•12 years ago
|
||
This patch is the same as the one landed before except the removal of reftest-wait tries harder to defer removal of the class. This was green on Windows debug on try, but proper landing is needed to see if it is still random.
Attachment #667431 -
Attachment is obsolete: true
Attachment #672733 -
Flags: review?(bugs)
Updated•12 years ago
|
Attachment #672733 -
Flags: review?(bugs) → review+
Assignee | ||
Comment 30•12 years ago
|
||
Comment 31•12 years ago
|
||
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla19
You need to log in
before you can comment on or make changes to this bug.
Description
•