Test 33 in http://acid3.acidtests.org/ fails with the message "whitespace error in class processing". I'll attach a simplified testcase (which Webkit passes).
I still need to figure out what spec says our behavior is wrong and what it says we should do instead.
Created attachment 324236 [details]
CSS 2.1 says:
Match when the element's "att" attribute value is a space-separated list of "words", one of which is exactly "val". If this selector is used, the words in the value must not contain spaces (since they are separated by spaces).
... that was a change from earlier drafts, I think. I think Acid3 is testing something that's since been changed in the specs.
It's actually always been that way in CSS2 / CSS 2.1.
I do recall the WG discussing the discrepancy at some point, though.
It's also possible the discussion was about the internal discrepancy in css3-selectors between the summary table in section 2 and the text in 6.3.1, but that looks like it was fixed by Hixie in http://lists.w3.org/Archives/Member/w3c-css-wg/2006JanMar/0073.html without it being discussed elsewhere. Then again, I really do remember a discussion, but I can't find any record of it.
Maybe the discussion I remember is this one, though:
Ah, the discussion I was remembering was this one, I think:
... which resulted in the change being made to css3-selectors between:
So in this case, the newer change was to css3-selectors, and this probably is a bug. But we should probably get the change backported to CSS 2.1.
So, it turns out we need to fix this bug separately for ~= and for class.
For ~= we need to patch ValueIncludes in nsCSSRuleProcessor.cpp.
class attribute parsing, however, lives somewhere in content code, perhaps multiple places.
(In reply to comment #0)
> Test 33 in http://acid3.acidtests.org/ fails with the message "whitespace error
> in class processing". I'll attach a simplified testcase (which Webkit passes).
> I still need to figure out what spec says our behavior is wrong and what it
> says we should do instead.
Based exclusively upon looking at the test case that you uploaded in comment #2, the relevant specification seems to be that of HTML 4.01. Section 7.5.2  says that class names specified in |class| attribute values need to be separated by white space characters. White space characters are defined in section 9.1  to be the following:
U+0009 <control> (HORIZONTAL TABULATION)
U+000B <control> (VERTICAL TABULATION)
U+000C <control> (FORM FEED)
Also included are the line‐breaking characters :
U+000A <control> (LINE FEED)
U+000D <control> (CARRIAGE RETURN)
U+000A U+000D <control> (LINE FEED) <control> (CARRIAGE RETURN)
Therefore, the |class| attribute in your test should be equivalent to |a b c d e f g　h|.
But unless HTML defines normalization, what CSS matches is defined by CSS, which is different.
HTML 5, at http://www.whatwg.org/specs/web-apps/current-work/#space , defines white space as: "U+0020 SPACE, U+0009 CHARACTER TABULATION (tab), U+000A LINE FEED (LF), U+000B LINE TABULATION, U+000C FORM FEED (FF), and U+000D CARRIAGE RETURN (CR)".
css3-selectors says: "Only the characters "space" (U+0020), "tab" (U+0009), "line feed" (U+000A), "carriage return" (U+000D), and "form feed" (U+000C) can occur in white space."
What we currently implement (in both places) uses nsCRT::IsAsciiSpace, which NS_IsAsciiWhitespace, which checks for ' ', '\r', '\n', and '\t'.
So it seems like the following characters count as whitespace:
U+0020: HTML 4.01, HTML 5, css3-selectors, current implementation
U+000D: HTML 4.01, HTML 5, css3-selectors, current implementation
U+000A: HTML 4.01, HTML 5, css3-selectors, current implementation
U+0009: HTML 4.01, HTML 5, css3-selectors, current implementation
U+000C: HTML 4.01, HTML 5, css3-selectors
U+000B: HTML 4.01, HTML 5
I'd note that Hixie's test tests U+000C, but not U+000B.
I'm curious what other implementations do here.
Created attachment 324380 [details]
test of implementations
Webkit trunk from a few days/weeks ago follows css3-selectors.
IE7 follows HTML.
Opera 9.27 and 9.5b2 consider all but two of the characters as whitespace.
I think your summary of HTML4 was actually incorrect, though. It makes U+200B (zero-width space) a whitespace character, not U+000B (vertical tab).
So what IE7 matches is HTML5.
(In reply to comment #8)
> But unless HTML defines normalization, what CSS matches is defined by CSS,
> which is different.
I figured that the HTML parser would split the names based on its own white space rules then pass the separated names to the CSS parser for use with class selectors. I guess not.
(In reply to comment #12)
> I think your summary of HTML4 was actually incorrect, though. It makes U+200B
> (zero-width space) a whitespace character, not U+000B (vertical tab).
Yes, I noticed this myself and was going to correct it before you replied. Somehow I turned a two into a zero.
> So what IE7 matches is HTML5.
Using your comment #10 test case as a measure, Win. Internet Explorer 8 Beta 1 (8.0.6001.17184) treats all of the following characters as white space for the purposes of class selector matching: U+0009, U+000A, U+000B, U+000C, U+000D, and U+0020. (Interestingly enough, this includes U+000B.) The “Emulate IE7” mode behaves identically.
Based on comment #9, IE8 and its IE7 emulation mode match the rules of css3-selectors and do not match HTML 4.01/5 rules (U+200B is missing).
(In reply to comment #13)
> Based on comment #9, IE8 and its IE7 emulation mode match the rules of
> css3-selectors and do not match HTML 4.01/5 rules (U+200B is missing).
Sorry, it matches HTML 5 and not HTML 4.01.
So, for now, I'm going to patch this to match WebKit and css3-selectors, since that's the smaller change and the obviously correct part.
I think HTML's definition of the class attribute needs to match css3-selectors's definition of attribute selectors (but that need not match other whitespace handling in either language), but I don't have the energy for the spec fight right now.
Created attachment 324389 [details] [diff] [review]
This does as I described in my previous comment -- adding only U+000C (form feed) to our set of whitespace characters.
Comment on attachment 324389 [details] [diff] [review]
I'd like to make HTML5 consistent with CSS and with the most widely deployed browser. My impression is that changing CSS will be easier. Any chance we can push back on the CSSWG and have CSS include the same whitespace characters?
Sure, I'm ok with going that direction; I almost did that yesterday, actually.
Created attachment 324491 [details]
more thorough test of implementations
k, mailed the csswg
Created attachment 324515 [details]
more thorough test of implementations (2)
Here's a similar one that works in IE.
IE's CSS parsing is really hard to test since it recovers from errors in ridiculous ways (e.g., it accepts "1pxA2pxA3pxA4px" as a value for margin, meaning "1px 2px 3px 4px" -- this makes it hard to tell what is whitespace versus what's "not part of identifiers", though this test tries two approaches).
But it seems that IE uses the same or very similar lists of characters for whitespace in HTML and CSS, and I'd really rather not change CSS's overall definition of whitespace here, since that's been stable for over a decade and is implemented correctly by everyone other than IE. So, actually, I think HTML5 should change.
Comment on attachment 324389 [details] [diff] [review]
_Personally_ I would prefer if what's considered whitespace was consistent across CSS, HTML and XML. And even better if that list was as short as possible.
But I'll rely on dbaron here and he says this is what we want to do at least for now :)
I removed U+000B from HTML5.
*** Bug 106311 has been marked as a duplicate of this bug. ***