Last Comment Bug 675499 - in a utf-8 page, a utf-8 encoded NBSP (c2a0) from a linked css with unspecified encoding is decoded and rendered as latin
: in a utf-8 page, a utf-8 encoded NBSP (c2a0) from a linked css with unspecifi...
Status: RESOLVED FIXED
[inbound]
:
Product: Core
Classification: Components
Component: HTML: Parser (show other bugs)
: unspecified
: x86 Windows XP
: -- normal (vote)
: mozilla8
Assigned To: Henri Sivonen (:hsivonen) (Not reading bugmail or doing reviews until 2016-09-26)
:
Mentors:
http://es5.github.com/
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2011-07-31 07:57 PDT by al_9x
Modified: 2011-08-03 02:15 PDT (History)
6 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
new profile (36.48 KB, image/png)
2011-07-31 11:08 PDT, al_9x
no flags Details
Move charset setting to the other op queue (9.28 KB, patch)
2011-08-02 04:16 PDT, Henri Sivonen (:hsivonen) (Not reading bugmail or doing reviews until 2016-09-26)
bzbarsky: review+
Details | Diff | Splinter Review

Description al_9x 2011-07-31 07:57:52 PDT
Fx 5.0.1, new profile
load http://es5.github.com/

"NOTEÂ Â Â " will be seen at the beginning and throughout

the three utf-8 encoded NBSP following NOTE are coming from a linked css whose encoding is not specified (http, bom, @charset)

.note::after {
  content: "   ";
}

according to http://www.w3.org/TR/CSS2/syndata.html#charset

the css should inherit the encoding (if not specified) of the doc, which is utf-8
Comment 1 j.j. 2011-07-31 10:28:27 PDT
Mozilla/5.0 (Windows NT 5.1; rv:5.0) Gecko/20100101 Firefox/5.0            Mozilla/5.0 (Windows NT 5.1; rv:8.0a1) Gecko/20110729 Firefox/8.0a1

> "NOTEÂ Â Â " will be seen at the beginning
cannot reproduce
Comment 2 al_9x 2011-07-31 11:08:58 PDT
Created attachment 549671 [details]
new profile
Comment 3 al_9x 2011-07-31 11:54:23 PDT
8.0a1 (2011-07-31) as well
Comment 4 Matthias Versen [:Matti] 2011-07-31 13:34:19 PDT
wfm with Mozilla/5.0 (Windows NT 6.1; rv:8.0a1) Gecko/20110731 Firefox/8.0a1 SeaMonkey/2.5a1
Comment 5 al_9x 2011-07-31 15:18:09 PDT
If I copy and load the page (and css) locally (file:) I can still repro, but if I remove all but the first DIV from the body, the problem goes away.

The page is large, perhaps it's a question of timing: the determination of the  page's charset is not being made quickly enough (due to its size) so that when the presumably async load of the css tries to determine its own charset, the page still thinks it's latin (which is the default)
Comment 6 Matthias Versen [:Matti] 2011-07-31 15:33:33 PDT
the <meta charset="utf-8"> is at the top jut before the css link.
Comment 7 al_9x 2011-07-31 15:45:27 PDT
(In reply to comment #6)
> the <meta charset="utf-8"> is at the top jut before the css link.

Nevertheless, the problem is present when the page is large and slower to load and absent when small.
Comment 8 Boris Zbarsky [:bz] (TPAC) 2011-07-31 17:35:26 PDT
Henri, it looks like the style preload (off of nsHtml5LoadFlusher::Run calling FlushSpeculativeLoads) happens before the document charset is set (that comes via nsHtml5ExecutorFlusher::Run calling nsHtml5TreeOpExecutor::RunFlushLoop).

Is that sort of expected?  I can add the current document charset charset to the hashtable key, I suppose, but it's a little weird to have the preload start (and presumably get as far as OnDetermineCharset, so the data is already partially available!) before the charset is set on the document.
Comment 9 Henri Sivonen (:hsivonen) (Not reading bugmail or doing reviews until 2016-09-26) 2011-08-01 08:53:49 PDT
(In reply to comment #8)
> Henri, it looks like the style preload (off of nsHtml5LoadFlusher::Run
> calling FlushSpeculativeLoads) happens before the document charset is set
> (that comes via nsHtml5ExecutorFlusher::Run calling
> nsHtml5TreeOpExecutor::RunFlushLoop).
> 
> Is that sort of expected?

Given the code, one can expect this to happen, but this is an implementation error on my part. eTreeOpSetDocumentCharset shouldn't be a tree op in the first place. It should be in the preload op queue instead.
Comment 10 Henri Sivonen (:hsivonen) (Not reading bugmail or doing reviews until 2016-09-26) 2011-08-02 04:16:38 PDT
Created attachment 550040 [details] [diff] [review]
Move charset setting to the other op queue

I don't see the problem in action on my computer, but by code inspection, the bug is clearly there. Due to the race condition nature of the bug, there isn't a way to write a deterministic test case for this.

Anyway, let's use this fix if the tryserver is OK with it.
Comment 11 Boris Zbarsky [:bz] (TPAC) 2011-08-02 08:10:44 PDT
I can in fact reproduce the problem, and the attached patch fixes it.
Comment 12 Boris Zbarsky [:bz] (TPAC) 2011-08-02 08:15:10 PDT
Comment on attachment 550040 [details] [diff] [review]
Move charset setting to the other op queue

>+++ b/parser/html/nsHtml5SpeculativeLoad.cpp
>+        aExecutor->SetDocumentCharsetAndSource(narrowName,
>+                                             intSource);

Fix indent, please?

>+++ b/parser/html/nsHtml5SpeculativeLoad.h
>+     * queue as opposed to tree operation queue is that the manifest must get

"manifest"?  You mean "charset change"?

>+     * processed before any actually speculative loads such as style sheets.

s/actually/actual/ ?

>+     * If mOpCode is eSpeculativeLoadImage, this is the value of the
>+     * "crossorigin" attribute.  If mOpCode is eSpeculativeLoadStyle
>+     * or eSpeculativeLoadScript then this is the value of the
>+     * "charset" attribute.  Otherwise it's empty.

This needs fixing for the new eSpeculativeLoadSetDocumentCharset opcode.

r=me with those nits.
Comment 13 Henri Sivonen (:hsivonen) (Not reading bugmail or doing reviews until 2016-09-26) 2011-08-02 10:50:30 PDT
Thanks. Pushed with the nits fixed:
http://hg.mozilla.org/integration/mozilla-inbound/rev/8e4e181a9ce9
Comment 14 Daniel Holbert [:dholbert] 2011-08-02 12:11:51 PDT
Landed followup to fix maemo build bustage:
https://bugzilla.mozilla.org/show_bug.cgi?id=675499
e.g.
http://tinderbox.mozilla.org/showlog.cgi?log=Mozilla-Inbound/1312310251.1312310578.31642.gz
> /builds/slave/m-in-lnx-maemo5-gtk/build/parser/html/nsHtml5SpeculativeLoad.h:54: error: comma at end of enumerator list

Note You need to log in before you can comment on or make changes to this bug.