acid3 test 33 fails: whitespace error in class processing

RESOLVED FIXED in mozilla1.9.1a1

Status

()

RESOLVED FIXED
11 years ago
8 years ago

People

(Reporter: dbaron, Assigned: dbaron)

Tracking

Trunk
mozilla1.9.1a1
Points:
---
Bug Flags:
in-testsuite +

Firefox Tracking Flags

(Not tracked)

Details

(URL)

Attachments

(5 attachments)

Test 33 in http://acid3.acidtests.org/ fails with the message "whitespace error in class processing".  I'll attach a simplified testcase (which Webkit passes).

I still need to figure out what spec says our behavior is wrong and what it says we should do instead.
Blocks: 410460
CSS 2.1 says:

[att~=val]
    Match when the element's "att" attribute value is a space-separated list of "words", one of which is exactly "val". If this selector is used, the words in the value must not contain spaces (since they are separated by spaces). 


... that was a change from earlier drafts, I think.  I think Acid3 is testing something that's since been changed in the specs.
It's actually always been that way in CSS2 / CSS 2.1.

I do recall the WG discussing the discrepancy at some point, though.
It's also possible the discussion was about the internal discrepancy in css3-selectors between the summary table in section 2 and the text in 6.3.1, but that looks like it was fixed by Hixie in http://lists.w3.org/Archives/Member/w3c-css-wg/2006JanMar/0073.html without it being discussed elsewhere.  Then again, I really do remember a discussion, but I can't find any record of it.

Maybe the discussion I remember is this one, though:
http://lists.w3.org/Archives/Public/www-style/2005Mar/0107.html
http://lists.w3.org/Archives/Public/www-style/2005Mar/0111.html
http://lists.w3.org/Archives/Public/www-style/2005Mar/0120.html
http://lists.w3.org/Archives/Public/www-style/2007Mar/0000.html
http://lists.w3.org/Archives/Public/www-style/2007Mar/0001.html
So, it turns out we need to fix this bug separately for ~= and for class.

For ~= we need to patch ValueIncludes in nsCSSRuleProcessor.cpp.

class attribute parsing, however, lives somewhere in content code, perhaps multiple places.

Comment 7

11 years ago
(In reply to comment #0)
> Test 33 in http://acid3.acidtests.org/ fails with the message "whitespace error
> in class processing".  I'll attach a simplified testcase (which Webkit passes).
> 
> I still need to figure out what spec says our behavior is wrong and what it
> says we should do instead.

Based exclusively upon looking at the test case that you uploaded in comment #2, the relevant specification seems to be that of HTML 4.01. Section 7.5.2 [1] says that class names specified in |class| attribute values need to be separated by white space characters. White space characters are defined in section 9.1 [2] to be the following:

U+0009 <control> (HORIZONTAL TABULATION)
U+000B <control> (VERTICAL TABULATION)
U+000C <control> (FORM FEED)
U+0020 SPACE

Also included are the line‐breaking characters [3]:

U+000A <control> (LINE FEED)
U+000D <control> (CARRIAGE RETURN)
U+000A U+000D <control> (LINE FEED) <control> (CARRIAGE RETURN)

Therefore, the |class| attribute in your test should be equivalent to |a b c d e f&#x2003;g&#x3000;h|.

[1] http://www.w3.org/TR/html401/struct/global.html#adef-class
[2] http://www.w3.org/TR/html401/struct/text.html#h-9.1
[3] http://www.w3.org/TR/html401/struct/text.html#line-breaks
But unless HTML defines normalization, what CSS matches is defined by CSS, which is different.
HTML 5, at http://www.whatwg.org/specs/web-apps/current-work/#space , defines white space as: "U+0020 SPACE, U+0009 CHARACTER TABULATION (tab), U+000A LINE FEED (LF), U+000B LINE TABULATION, U+000C FORM FEED (FF), and U+000D CARRIAGE RETURN (CR)".

css3-selectors says: "Only the characters "space" (U+0020), "tab" (U+0009), "line feed" (U+000A), "carriage return" (U+000D), and "form feed" (U+000C) can occur in white space."

What we currently implement (in both places) uses nsCRT::IsAsciiSpace, which NS_IsAsciiWhitespace, which checks for ' ', '\r', '\n', and '\t'.

So it seems like the following characters count as whitespace:

U+0020: HTML 4.01, HTML 5, css3-selectors, current implementation
U+000D: HTML 4.01, HTML 5, css3-selectors, current implementation
U+000A: HTML 4.01, HTML 5, css3-selectors, current implementation
U+0009: HTML 4.01, HTML 5, css3-selectors, current implementation
U+000C: HTML 4.01, HTML 5, css3-selectors
U+000B: HTML 4.01, HTML 5

I'd note that Hixie's test tests U+000C, but not U+000B.

I'm curious what other implementations do here.
Webkit trunk from a few days/weeks ago follows css3-selectors.

IE7 follows HTML.

Opera 9.27 and 9.5b2 consider all but two of the characters as whitespace.
I think your summary of HTML4 was actually incorrect, though.  It makes U+200B (zero-width space) a whitespace character, not U+000B (vertical tab).

So what IE7 matches is HTML5.

Comment 13

11 years ago
(In reply to comment #8)
> But unless HTML defines normalization, what CSS matches is defined by CSS,
> which is different.

I figured that the HTML parser would split the names based on its own white space rules then pass the separated names to the CSS parser for use with class selectors. I guess not.

(In reply to comment #12)
> I think your summary of HTML4 was actually incorrect, though.  It makes U+200B
> (zero-width space) a whitespace character, not U+000B (vertical tab).

Yes, I noticed this myself and was going to correct it before you replied. Somehow I turned a two into a zero.

> So what IE7 matches is HTML5.

Using your comment #10 test case as a measure, Win. Internet Explorer 8 Beta 1 (8.0.6001.17184) treats all of the following characters as white space for the purposes of class selector matching: U+0009, U+000A, U+000B, U+000C, U+000D, and U+0020. (Interestingly enough, this includes U+000B.) The “Emulate IE7” mode behaves identically.

Based on comment #9, IE8 and its IE7 emulation mode match the rules of css3-selectors and do not match HTML 4.01/5 rules (U+200B is missing).

Comment 14

11 years ago
(In reply to comment #13)
> Based on comment #9, IE8 and its IE7 emulation mode match the rules of
> css3-selectors and do not match HTML 4.01/5 rules (U+200B is missing).

Sorry, it matches HTML 5 and not HTML 4.01.

So, for now, I'm going to patch this to match WebKit and css3-selectors, since that's the smaller change and the obviously correct part.

I think HTML's definition of the class attribute needs to match css3-selectors's definition of attribute selectors (but that need not match other whitespace handling in either language), but I don't have the energy for the spec fight right now.
Assignee: nobody → dbaron
Created attachment 324389 [details] [diff] [review]
patch

This does as I described in my previous comment -- adding only U+000C (form feed) to our set of whitespace characters.
Attachment #324389 - Flags: superreview?(jonas)
Attachment #324389 - Flags: review?(bzbarsky)
Comment on attachment 324389 [details] [diff] [review]
patch

r=bzbarsky
Attachment #324389 - Flags: review?(bzbarsky) → review+
I'd like to make HTML5 consistent with CSS and with the most widely deployed browser. My impression is that changing CSS will be easier. Any chance we can push back on the CSSWG and have CSS include the same whitespace characters?
Sure, I'm ok with going that direction; I almost did that yesterday, actually.
k, mailed the csswg
Created attachment 324515 [details]
more thorough test of implementations (2)

Here's a similar one that works in IE.

IE's CSS parsing is really hard to test since it recovers from errors in ridiculous ways (e.g., it accepts "1pxA2pxA3pxA4px" as a value for margin, meaning "1px 2px 3px 4px" -- this makes it hard to tell what is whitespace versus what's "not part of identifiers", though this test tries two approaches).

But it seems that IE uses the same or very similar lists of characters for whitespace in HTML and CSS, and I'd really rather not change CSS's overall definition of whitespace here, since that's been stable for over a decade and is implemented correctly by everyone other than IE.  So, actually, I think HTML5 should change.
Attachment #324515 - Attachment is patch: false
Attachment #324515 - Attachment mime type: text/plain → text/html; charset=UTF-8
Comment on attachment 324389 [details] [diff] [review]
patch

_Personally_ I would prefer if what's considered whitespace was consistent across CSS, HTML and XML. And even better if that list was as short as possible.

But I'll rely on dbaron here and he says this is what we want to do at least for now :)
Attachment #324389 - Flags: superreview?(jonas) → superreview+
I removed U+000B from HTML5.
Fixed:
http://hg.mozilla.org/mozilla-central/index.cgi/rev/b220fe3a520b
Status: NEW → RESOLVED
Last Resolved: 11 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Target Milestone: --- → mozilla1.9.1

Updated

8 years ago
Duplicate of this bug: 106311
You need to log in before you can comment on or make changes to this bug.