Closed Bug 254337 Opened 20 years ago Closed 16 years ago

className attribute on DOM-created elements prematurely stripped of trailing whitespace

Categories

(Core :: DOM: Core & HTML, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: jkilmer, Unassigned)

References

()

Details

(Keywords: testcase)

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7.1) Gecko/20040707
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7.1) Gecko/20040707

Starting with the Mozilla 1.7 / Firefox .9x releases, trailing whitespace is
being prematurely stripped (?) from the className attribute of elements
constructed in the DOM using Javascript.  Prior to 1.7/.9x, Mozilla was handling
this "correctly" -- or, at least was consistent with other browsers.

In 1.7 and later, if a className attribute is set using postfix-whitespace as in
the test case at http://penguin.theopalgroup.com/~jkilmer/testcase.html, the
whitespace is immediately removed.  Thus, if an additional style is appended on,
the result is className="class1class2" instead of "class1 class2", which is not
valid with respect to the intended style sheet, and results in effectively no
styles being applied at all.  

For example, the following breaks:
   x.className = "class1 "
   x.className += "class2"

One trivial solution is to switch to prefix-whitespace usage, as in:
  x.className = "class1"
  x.className += " class2"

... which will work.  I've searched through the DOM spec, and can't find any
explicit guidance on this, so it may be a new bug, or it may be some obscure
spec rule I can't find, and Mozilla is now the only browser doing correctly... 
In either event, there seems to be ample examples of both syntaxes floating
around on the web.

Any thoughts?  Thanks for your time.


Reproducible: Always
Steps to Reproduce:
sicking, this is fallout from your attr landing...

The basic issue, of course, is that className is not round-trippable.  This bug
is a particular instance of the problem, but anyone trying to manipulate
className as a string could run into other issues of this sort.
yeah, this was known when i designed this. I'm not sure what to do. We could of
course keep the classname both as a string and as an atomlist. It wouldn't be
hard codewise, it'd just take up more memory.

Another alternative would be to store information about how many whitespace
characters were found between each atom. That would solve things most of the
time, but wouldn't cover things like tabs or newlines as whitespace.

One solution i just thought about now is to optionally store a string with the
atomlist, but only do it when 'extra' or non-0x20 whitespace were found. That
would only cost 4 bytes most of the time. And it wouldn't cost anything in the
case of a single atom without any whitespace (since we then store an atom rather
then an atomlist)
Status: UNCONFIRMED → NEW
Ever confirmed: true
*** Bug 339992 has been marked as a duplicate of this bug. ***
(In reply to comment #2)
> yeah, this was known when i designed this. I'm not sure what to do. We could of
> course keep the classname both as a string and as an atomlist. It wouldn't be
> hard codewise, it'd just take up more memory.

The issue here is one of compatibility.  Every other browser stores class names as strings (apparently, though no idea what else they are doing) and does not strip preceding or trailing white space.  

If it's a choice between efficiency and compatibility, I suggest compatibility.
Without addressing the issue of compatibility with other browsers, we did go through the process of "fixing" our code to eliminate this issue, which was easy if annoying.  Efficiency may not be the primary concern.

The official spec for the matter (http://www.w3.org/TR/1999/REC-html401-19991224/types.html#type-cdata) states that any CR and tab characters in a declaration can be replaced with a single space, and that user agents can ignore leading or trailing whitespace on a className declaration.  In fact, it specifically states: "Authors should not declare attribute values with leading or trailing white space" and blesses agents which ignore them.

So while I was certainly annoyed with it enough to file this bug way back when, it would seem to be trivially worked around on the developer's part, and officially, Gecko isn't doing anything wrong here.  So that leaves us at: Follow the spec, or follow the pack and be conveniently broken like everyone else... ;)
(In reply to comment #5)
> So while I was certainly annoyed with it enough to file this bug way back when,
> it would seem to be trivially worked around on the developer's part, and
> officially, Gecko isn't doing anything wrong here.  So that leaves us at:
> Follow the spec, or follow the pack and be conveniently broken like everyone
> else... ;)

Thanks for the reference.  Still, I think it's bad form to force millions of web site developers to deal with two different implementations unnecessarily.

A good example of this was the XMLHttp object, which in IE is an ActiveX control and everywhere else is a native object.  Even Microsoft decided to make it a native object in IE7 to be compatible with everyone else.

Two implementation require developers to insert additional code in their pages, bloating them out and increasing the download sizes.  This isn't much of a problem with one different implementation, but these things are cumulative and pretty soon you have code with all kinds of different checks and code paths for different code.

Maybe it's just me, but when you're the lone man out, and everyone else is doing it a different way, maybe you should just go along with the crowd unless there is a major reason not to.
Blocks: acid3
Assignee: general → nobody
QA Contact: ian → general
(In reply to comment #5)
> The official spec for the matter
> (http://www.w3.org/TR/1999/REC-html401-19991224/types.html#type-cdata) states
> that any CR and tab characters in a declaration can be replaced with a single
> space, and that user agents can ignore leading or trailing whitespace on a
> className declaration.  In fact, it specifically states: "Authors should not
> declare attribute values with leading or trailing white space" and blesses
> agents which ignore them.

Hixie, what do you say to this?
It's not clear to me whether the HTML4 text refers to the parsing rules, or the rules for interpreting the value once it is in the DOM. Since Mozilla only does this for certain attributes, not all of them (e.g. it doesn't strip spaces at the start of 'title' attributes) my impression is that it is being interpreted as the latter. FWIW, HTML5 defines this to explicitly require attributes not to be munged.
HTML spec says that class names are a list separated by "white space characters" and then goes on to define that set of characters as space, tab, form feed, zero-width space, line feed, and carriage return.  Therefore I don't really see any solution to this bug other than carrying around the exact string value provided by the called, alongside the internal AtomArray.
OS: Windows 2000 → All
Hardware: PC → All
Note that the existing test being used for the delimiter, nsCRT::IsAsciiSpace, doesn't meet the requirements because it doesn't include ASCII 0x0C form feed nor U+200B zero-width space.  Not really sure why the W3C felt compelled to include those.
I think that for any attribute, if reserializing the parsed form doesn't give the original string, then we should store the original string along with the parsed form.
Seems like that's doable in the current storage framework. Strings and atoms always roundtrip. For integers, colors, enums and percentages, if the parsed value doesn't roundtrip then fall back to the MiscContainer. Give the MiscContainer an nsString member to hold the original string --- but only fill it in if it's needed.
Looks like there's a patch in related bug 433533.
Depends on: 433533
Keywords: testcase
yes
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
Component: DOM → DOM: Core & HTML
You need to log in before you can comment on or make changes to this bug.