The default bug view has changed. See this FAQ.

acid3 test 33 fails: whitespace error in class processing

RESOLVED FIXED in mozilla1.9.1a1

Status

()

Core
CSS Parsing and Computation
RESOLVED FIXED
9 years ago
6 years ago

People

(Reporter: dbaron, Assigned: dbaron)

Tracking

Trunk
mozilla1.9.1a1
Points:
---
Bug Flags:
in-testsuite +

Firefox Tracking Flags

(Not tracked)

Details

(URL)

Attachments

(5 attachments)

(Assignee)

Description

9 years ago
Test 33 in http://acid3.acidtests.org/ fails with the message "whitespace error in class processing".  I'll attach a simplified testcase (which Webkit passes).

I still need to figure out what spec says our behavior is wrong and what it says we should do instead.
(Assignee)

Comment 1

9 years ago
Created attachment 324236 [details]
simplified testcase
Blocks: 410460
(Assignee)

Comment 2

9 years ago
CSS 2.1 says:

[att~=val]
    Match when the element's "att" attribute value is a space-separated list of "words", one of which is exactly "val". If this selector is used, the words in the value must not contain spaces (since they are separated by spaces). 


... that was a change from earlier drafts, I think.  I think Acid3 is testing something that's since been changed in the specs.
(Assignee)

Comment 3

9 years ago
It's actually always been that way in CSS2 / CSS 2.1.

I do recall the WG discussing the discrepancy at some point, though.
(Assignee)

Comment 4

9 years ago
It's also possible the discussion was about the internal discrepancy in css3-selectors between the summary table in section 2 and the text in 6.3.1, but that looks like it was fixed by Hixie in http://lists.w3.org/Archives/Member/w3c-css-wg/2006JanMar/0073.html without it being discussed elsewhere.  Then again, I really do remember a discussion, but I can't find any record of it.

Maybe the discussion I remember is this one, though:
http://lists.w3.org/Archives/Public/www-style/2005Mar/0107.html
http://lists.w3.org/Archives/Public/www-style/2005Mar/0111.html
http://lists.w3.org/Archives/Public/www-style/2005Mar/0120.html
http://lists.w3.org/Archives/Public/www-style/2007Mar/0000.html
http://lists.w3.org/Archives/Public/www-style/2007Mar/0001.html
(Assignee)

Comment 5

9 years ago
Ah, the discussion I was remembering was this one, I think:
http://lists.w3.org/Archives/Member/w3c-css-wg/2005OctDec/0093.html
http://lists.w3.org/Archives/Member/w3c-css-wg/2005OctDec/0158.html
http://lists.w3.org/Archives/Member/w3c-css-wg/2005OctDec/0160.html
... which resulted in the change being made to css3-selectors between:
http://www.w3.org/TR/2001/CR-css3-selectors-20011113/#attribute-selectors
and
http://www.w3.org/TR/2005/WD-css3-selectors-20051215/#attribute-representation

So in this case, the newer change was to css3-selectors, and this probably is a bug.  But we should probably get the change backported to CSS 2.1.
(Assignee)

Comment 6

9 years ago
So, it turns out we need to fix this bug separately for ~= and for class.

For ~= we need to patch ValueIncludes in nsCSSRuleProcessor.cpp.

class attribute parsing, however, lives somewhere in content code, perhaps multiple places.

Comment 7

9 years ago
(In reply to comment #0)
> Test 33 in http://acid3.acidtests.org/ fails with the message "whitespace error
> in class processing".  I'll attach a simplified testcase (which Webkit passes).
> 
> I still need to figure out what spec says our behavior is wrong and what it
> says we should do instead.

Based exclusively upon looking at the test case that you uploaded in comment #2, the relevant specification seems to be that of HTML 4.01. Section 7.5.2 [1] says that class names specified in |class| attribute values need to be separated by white space characters. White space characters are defined in section 9.1 [2] to be the following:

U+0009 <control> (HORIZONTAL TABULATION)
U+000B <control> (VERTICAL TABULATION)
U+000C <control> (FORM FEED)
U+0020 SPACE

Also included are the line‐breaking characters [3]:

U+000A <control> (LINE FEED)
U+000D <control> (CARRIAGE RETURN)
U+000A U+000D <control> (LINE FEED) <control> (CARRIAGE RETURN)

Therefore, the |class| attribute in your test should be equivalent to |a b c d e f&#x2003;g&#x3000;h|.

[1] http://www.w3.org/TR/html401/struct/global.html#adef-class
[2] http://www.w3.org/TR/html401/struct/text.html#h-9.1
[3] http://www.w3.org/TR/html401/struct/text.html#line-breaks
(Assignee)

Comment 8

9 years ago
But unless HTML defines normalization, what CSS matches is defined by CSS, which is different.
(Assignee)

Comment 9

9 years ago
HTML 5, at http://www.whatwg.org/specs/web-apps/current-work/#space , defines white space as: "U+0020 SPACE, U+0009 CHARACTER TABULATION (tab), U+000A LINE FEED (LF), U+000B LINE TABULATION, U+000C FORM FEED (FF), and U+000D CARRIAGE RETURN (CR)".

css3-selectors says: "Only the characters "space" (U+0020), "tab" (U+0009), "line feed" (U+000A), "carriage return" (U+000D), and "form feed" (U+000C) can occur in white space."

What we currently implement (in both places) uses nsCRT::IsAsciiSpace, which NS_IsAsciiWhitespace, which checks for ' ', '\r', '\n', and '\t'.

So it seems like the following characters count as whitespace:

U+0020: HTML 4.01, HTML 5, css3-selectors, current implementation
U+000D: HTML 4.01, HTML 5, css3-selectors, current implementation
U+000A: HTML 4.01, HTML 5, css3-selectors, current implementation
U+0009: HTML 4.01, HTML 5, css3-selectors, current implementation
U+000C: HTML 4.01, HTML 5, css3-selectors
U+000B: HTML 4.01, HTML 5

I'd note that Hixie's test tests U+000C, but not U+000B.

I'm curious what other implementations do here.
(Assignee)

Comment 10

9 years ago
Created attachment 324380 [details]
test of implementations
(Assignee)

Comment 11

9 years ago
Webkit trunk from a few days/weeks ago follows css3-selectors.

IE7 follows HTML.

Opera 9.27 and 9.5b2 consider all but two of the characters as whitespace.
(Assignee)

Comment 12

9 years ago
I think your summary of HTML4 was actually incorrect, though.  It makes U+200B (zero-width space) a whitespace character, not U+000B (vertical tab).

So what IE7 matches is HTML5.

Comment 13

9 years ago
(In reply to comment #8)
> But unless HTML defines normalization, what CSS matches is defined by CSS,
> which is different.

I figured that the HTML parser would split the names based on its own white space rules then pass the separated names to the CSS parser for use with class selectors. I guess not.

(In reply to comment #12)
> I think your summary of HTML4 was actually incorrect, though.  It makes U+200B
> (zero-width space) a whitespace character, not U+000B (vertical tab).

Yes, I noticed this myself and was going to correct it before you replied. Somehow I turned a two into a zero.

> So what IE7 matches is HTML5.

Using your comment #10 test case as a measure, Win. Internet Explorer 8 Beta 1 (8.0.6001.17184) treats all of the following characters as white space for the purposes of class selector matching: U+0009, U+000A, U+000B, U+000C, U+000D, and U+0020. (Interestingly enough, this includes U+000B.) The “Emulate IE7” mode behaves identically.

Based on comment #9, IE8 and its IE7 emulation mode match the rules of css3-selectors and do not match HTML 4.01/5 rules (U+200B is missing).

Comment 14

9 years ago
(In reply to comment #13)
> Based on comment #9, IE8 and its IE7 emulation mode match the rules of
> css3-selectors and do not match HTML 4.01/5 rules (U+200B is missing).

Sorry, it matches HTML 5 and not HTML 4.01.

(Assignee)

Comment 15

9 years ago
So, for now, I'm going to patch this to match WebKit and css3-selectors, since that's the smaller change and the obviously correct part.

I think HTML's definition of the class attribute needs to match css3-selectors's definition of attribute selectors (but that need not match other whitespace handling in either language), but I don't have the energy for the spec fight right now.
Assignee: nobody → dbaron
(Assignee)

Comment 16

9 years ago
Created attachment 324389 [details] [diff] [review]
patch

This does as I described in my previous comment -- adding only U+000C (form feed) to our set of whitespace characters.
Attachment #324389 - Flags: superreview?(jonas)
Attachment #324389 - Flags: review?(bzbarsky)
Comment on attachment 324389 [details] [diff] [review]
patch

r=bzbarsky
Attachment #324389 - Flags: review?(bzbarsky) → review+
I'd like to make HTML5 consistent with CSS and with the most widely deployed browser. My impression is that changing CSS will be easier. Any chance we can push back on the CSSWG and have CSS include the same whitespace characters?
(Assignee)

Comment 19

9 years ago
Sure, I'm ok with going that direction; I almost did that yesterday, actually.
(Assignee)

Comment 20

9 years ago
Created attachment 324491 [details]
more thorough test of implementations
k, mailed the csswg
(Assignee)

Comment 22

9 years ago
Created attachment 324515 [details]
more thorough test of implementations (2)

Here's a similar one that works in IE.

IE's CSS parsing is really hard to test since it recovers from errors in ridiculous ways (e.g., it accepts "1pxA2pxA3pxA4px" as a value for margin, meaning "1px 2px 3px 4px" -- this makes it hard to tell what is whitespace versus what's "not part of identifiers", though this test tries two approaches).

But it seems that IE uses the same or very similar lists of characters for whitespace in HTML and CSS, and I'd really rather not change CSS's overall definition of whitespace here, since that's been stable for over a decade and is implemented correctly by everyone other than IE.  So, actually, I think HTML5 should change.
(Assignee)

Updated

9 years ago
Attachment #324515 - Attachment is patch: false
Attachment #324515 - Attachment mime type: text/plain → text/html; charset=UTF-8
Comment on attachment 324389 [details] [diff] [review]
patch

_Personally_ I would prefer if what's considered whitespace was consistent across CSS, HTML and XML. And even better if that list was as short as possible.

But I'll rely on dbaron here and he says this is what we want to do at least for now :)
Attachment #324389 - Flags: superreview?(jonas) → superreview+
I removed U+000B from HTML5.
(Assignee)

Comment 25

9 years ago
Fixed:
http://hg.mozilla.org/mozilla-central/index.cgi/rev/b220fe3a520b
Status: NEW → RESOLVED
Last Resolved: 9 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Target Milestone: --- → mozilla1.9.1
(Assignee)

Updated

9 years ago
Target Milestone: mozilla1.9.1 → mozilla1.9.1a1
Duplicate of this bug: 106311
You need to log in before you can comment on or make changes to this bug.