Last Comment Bug 437915 - acid3 test 33 fails: whitespace error in class processing
: acid3 test 33 fails: whitespace error in class processing
Status: RESOLVED FIXED
:
Product: Core
Classification: Components
Component: CSS Parsing and Computation (show other bugs)
: Trunk
: All All
: -- normal with 1 vote (vote)
: mozilla1.9.1a1
Assigned To: David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch)
:
Mentors:
http://acid3.acidtests.org/
Depends on:
Blocks: acid3
  Show dependency treegraph
 
Reported: 2008-06-08 20:45 PDT by David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch)
Modified: 2011-10-09 13:29 PDT (History)
12 users (show)
dbaron: in‑testsuite+
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
simplified testcase (251 bytes, text/html; charset=UTF-8)
2008-06-08 20:46 PDT, David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch)
no flags Details
test of implementations (2.30 KB, text/html; charset=UTF-8)
2008-06-09 18:32 PDT, David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch)
no flags Details
patch (8.84 KB, patch)
2008-06-09 20:18 PDT, David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch)
bzbarsky: review+
jonas: superreview+
Details | Diff | Splinter Review
more thorough test of implementations (2.41 KB, text/html; charset=UTF-8)
2008-06-10 11:53 PDT, David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch)
no flags Details
more thorough test of implementations (2) (3.03 KB, text/html; charset=UTF-8)
2008-06-10 13:55 PDT, David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch)
no flags Details

Description David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2008-06-08 20:45:02 PDT
Test 33 in http://acid3.acidtests.org/ fails with the message "whitespace error in class processing".  I'll attach a simplified testcase (which Webkit passes).

I still need to figure out what spec says our behavior is wrong and what it says we should do instead.
Comment 1 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2008-06-08 20:46:11 PDT
Created attachment 324236 [details]
simplified testcase
Comment 2 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2008-06-08 20:53:28 PDT
CSS 2.1 says:

[att~=val]
    Match when the element's "att" attribute value is a space-separated list of "words", one of which is exactly "val". If this selector is used, the words in the value must not contain spaces (since they are separated by spaces). 


... that was a change from earlier drafts, I think.  I think Acid3 is testing something that's since been changed in the specs.
Comment 3 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2008-06-08 20:58:53 PDT
It's actually always been that way in CSS2 / CSS 2.1.

I do recall the WG discussing the discrepancy at some point, though.
Comment 4 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2008-06-08 21:33:35 PDT
It's also possible the discussion was about the internal discrepancy in css3-selectors between the summary table in section 2 and the text in 6.3.1, but that looks like it was fixed by Hixie in http://lists.w3.org/Archives/Member/w3c-css-wg/2006JanMar/0073.html without it being discussed elsewhere.  Then again, I really do remember a discussion, but I can't find any record of it.

Maybe the discussion I remember is this one, though:
http://lists.w3.org/Archives/Public/www-style/2005Mar/0107.html
http://lists.w3.org/Archives/Public/www-style/2005Mar/0111.html
http://lists.w3.org/Archives/Public/www-style/2005Mar/0120.html
http://lists.w3.org/Archives/Public/www-style/2007Mar/0000.html
http://lists.w3.org/Archives/Public/www-style/2007Mar/0001.html
Comment 5 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2008-06-08 21:39:46 PDT
Ah, the discussion I was remembering was this one, I think:
http://lists.w3.org/Archives/Member/w3c-css-wg/2005OctDec/0093.html
http://lists.w3.org/Archives/Member/w3c-css-wg/2005OctDec/0158.html
http://lists.w3.org/Archives/Member/w3c-css-wg/2005OctDec/0160.html
... which resulted in the change being made to css3-selectors between:
http://www.w3.org/TR/2001/CR-css3-selectors-20011113/#attribute-selectors
and
http://www.w3.org/TR/2005/WD-css3-selectors-20051215/#attribute-representation

So in this case, the newer change was to css3-selectors, and this probably is a bug.  But we should probably get the change backported to CSS 2.1.
Comment 6 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2008-06-08 21:49:33 PDT
So, it turns out we need to fix this bug separately for ~= and for class.

For ~= we need to patch ValueIncludes in nsCSSRuleProcessor.cpp.

class attribute parsing, however, lives somewhere in content code, perhaps multiple places.
Comment 7 Patrick Dark 2008-06-09 17:07:58 PDT
(In reply to comment #0)
> Test 33 in http://acid3.acidtests.org/ fails with the message "whitespace error
> in class processing".  I'll attach a simplified testcase (which Webkit passes).
> 
> I still need to figure out what spec says our behavior is wrong and what it
> says we should do instead.

Based exclusively upon looking at the test case that you uploaded in comment #2, the relevant specification seems to be that of HTML 4.01. Section 7.5.2 [1] says that class names specified in |class| attribute values need to be separated by white space characters. White space characters are defined in section 9.1 [2] to be the following:

U+0009 <control> (HORIZONTAL TABULATION)
U+000B <control> (VERTICAL TABULATION)
U+000C <control> (FORM FEED)
U+0020 SPACE

Also included are the line‐breaking characters [3]:

U+000A <control> (LINE FEED)
U+000D <control> (CARRIAGE RETURN)
U+000A U+000D <control> (LINE FEED) <control> (CARRIAGE RETURN)

Therefore, the |class| attribute in your test should be equivalent to |a b c d e f&#x2003;g&#x3000;h|.

[1] http://www.w3.org/TR/html401/struct/global.html#adef-class
[2] http://www.w3.org/TR/html401/struct/text.html#h-9.1
[3] http://www.w3.org/TR/html401/struct/text.html#line-breaks
Comment 8 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2008-06-09 18:13:45 PDT
But unless HTML defines normalization, what CSS matches is defined by CSS, which is different.
Comment 9 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2008-06-09 18:22:32 PDT
HTML 5, at http://www.whatwg.org/specs/web-apps/current-work/#space , defines white space as: "U+0020 SPACE, U+0009 CHARACTER TABULATION (tab), U+000A LINE FEED (LF), U+000B LINE TABULATION, U+000C FORM FEED (FF), and U+000D CARRIAGE RETURN (CR)".

css3-selectors says: "Only the characters "space" (U+0020), "tab" (U+0009), "line feed" (U+000A), "carriage return" (U+000D), and "form feed" (U+000C) can occur in white space."

What we currently implement (in both places) uses nsCRT::IsAsciiSpace, which NS_IsAsciiWhitespace, which checks for ' ', '\r', '\n', and '\t'.

So it seems like the following characters count as whitespace:

U+0020: HTML 4.01, HTML 5, css3-selectors, current implementation
U+000D: HTML 4.01, HTML 5, css3-selectors, current implementation
U+000A: HTML 4.01, HTML 5, css3-selectors, current implementation
U+0009: HTML 4.01, HTML 5, css3-selectors, current implementation
U+000C: HTML 4.01, HTML 5, css3-selectors
U+000B: HTML 4.01, HTML 5

I'd note that Hixie's test tests U+000C, but not U+000B.

I'm curious what other implementations do here.
Comment 10 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2008-06-09 18:32:09 PDT
Created attachment 324380 [details]
test of implementations
Comment 11 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2008-06-09 18:37:10 PDT
Webkit trunk from a few days/weeks ago follows css3-selectors.

IE7 follows HTML.

Opera 9.27 and 9.5b2 consider all but two of the characters as whitespace.
Comment 12 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2008-06-09 18:57:56 PDT
I think your summary of HTML4 was actually incorrect, though.  It makes U+200B (zero-width space) a whitespace character, not U+000B (vertical tab).

So what IE7 matches is HTML5.
Comment 13 Patrick Dark 2008-06-09 19:35:40 PDT
(In reply to comment #8)
> But unless HTML defines normalization, what CSS matches is defined by CSS,
> which is different.

I figured that the HTML parser would split the names based on its own white space rules then pass the separated names to the CSS parser for use with class selectors. I guess not.

(In reply to comment #12)
> I think your summary of HTML4 was actually incorrect, though.  It makes U+200B
> (zero-width space) a whitespace character, not U+000B (vertical tab).

Yes, I noticed this myself and was going to correct it before you replied. Somehow I turned a two into a zero.

> So what IE7 matches is HTML5.

Using your comment #10 test case as a measure, Win. Internet Explorer 8 Beta 1 (8.0.6001.17184) treats all of the following characters as white space for the purposes of class selector matching: U+0009, U+000A, U+000B, U+000C, U+000D, and U+0020. (Interestingly enough, this includes U+000B.) The “Emulate IE7” mode behaves identically.

Based on comment #9, IE8 and its IE7 emulation mode match the rules of css3-selectors and do not match HTML 4.01/5 rules (U+200B is missing).
Comment 14 Patrick Dark 2008-06-09 19:48:43 PDT
(In reply to comment #13)
> Based on comment #9, IE8 and its IE7 emulation mode match the rules of
> css3-selectors and do not match HTML 4.01/5 rules (U+200B is missing).

Sorry, it matches HTML 5 and not HTML 4.01.

Comment 15 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2008-06-09 19:57:13 PDT
So, for now, I'm going to patch this to match WebKit and css3-selectors, since that's the smaller change and the obviously correct part.

I think HTML's definition of the class attribute needs to match css3-selectors's definition of attribute selectors (but that need not match other whitespace handling in either language), but I don't have the energy for the spec fight right now.
Comment 16 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2008-06-09 20:18:39 PDT
Created attachment 324389 [details] [diff] [review]
patch

This does as I described in my previous comment -- adding only U+000C (form feed) to our set of whitespace characters.
Comment 17 Boris Zbarsky [:bz] 2008-06-09 22:14:43 PDT
Comment on attachment 324389 [details] [diff] [review]
patch

r=bzbarsky
Comment 18 Hixie (not reading bugmail) 2008-06-10 10:12:41 PDT
I'd like to make HTML5 consistent with CSS and with the most widely deployed browser. My impression is that changing CSS will be easier. Any chance we can push back on the CSSWG and have CSS include the same whitespace characters?
Comment 19 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2008-06-10 11:27:06 PDT
Sure, I'm ok with going that direction; I almost did that yesterday, actually.
Comment 20 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2008-06-10 11:53:20 PDT
Created attachment 324491 [details]
more thorough test of implementations
Comment 21 Hixie (not reading bugmail) 2008-06-10 11:55:07 PDT
k, mailed the csswg
Comment 22 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2008-06-10 13:55:20 PDT
Created attachment 324515 [details]
more thorough test of implementations (2)

Here's a similar one that works in IE.

IE's CSS parsing is really hard to test since it recovers from errors in ridiculous ways (e.g., it accepts "1pxA2pxA3pxA4px" as a value for margin, meaning "1px 2px 3px 4px" -- this makes it hard to tell what is whitespace versus what's "not part of identifiers", though this test tries two approaches).

But it seems that IE uses the same or very similar lists of characters for whitespace in HTML and CSS, and I'd really rather not change CSS's overall definition of whitespace here, since that's been stable for over a decade and is implemented correctly by everyone other than IE.  So, actually, I think HTML5 should change.
Comment 23 Jonas Sicking (:sicking) No longer reading bugmail consistently 2008-06-10 14:08:56 PDT
Comment on attachment 324389 [details] [diff] [review]
patch

_Personally_ I would prefer if what's considered whitespace was consistent across CSS, HTML and XML. And even better if that list was as short as possible.

But I'll rely on dbaron here and he says this is what we want to do at least for now :)
Comment 24 Hixie (not reading bugmail) 2008-06-10 14:33:06 PDT
I removed U+000B from HTML5.
Comment 25 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2008-06-10 18:52:30 PDT
Fixed:
http://hg.mozilla.org/mozilla-central/index.cgi/rev/b220fe3a520b
Comment 26 Wayne Mery (:wsmwk, NI for questions) 2011-10-09 13:29:21 PDT
*** Bug 106311 has been marked as a duplicate of this bug. ***

Note You need to log in before you can comment on or make changes to this bug.