Closed Bug 95754 Opened 23 years ago Closed 23 years ago

Should strip spaces from ends of CDATA attributes

Categories

(Core :: DOM: HTML Parser, defect, P3)

defect

Tracking

()

VERIFIED DUPLICATE of bug 87894
mozilla1.0

People

(Reporter: jim_nance, Assigned: harishd)

Attachments

(1 file)

This page does not lay out correctly.  Parts of it are just missing.  It works
Mozilla decided that I wanted to submit that last piece all by itself.  I guess
that is another bug :-)  Anyway I was saying that it renders fine with Netscape
4.77.  I tried with Mozilla 0.9.3 and with last night's build and both of them
fail to render the page.
Hmm Composer pulls it in better.. you can look at its "fixed" source and see the
problems.  I think that all those
<img src="(newline)
(spaces)imagename" ....>
and
<a href="(newline)
(spaces)link" ....>
are really screwing things up.
I don't know how to classify this though, INVALID or not..
I just don't understand why it wasn't written
<img(newline)
(spaces)src="imagename" ....>
and
<a(newline)
href="link" ....>

Anyway this is why the images aren't coming up.. I guess this is affecting
alignment too, which may actually be a bug..
OS: Linux → All
Hardware: PC → All
This is probably evangelism, see bug 42287.
Reassigning to parser.
Assignee: karnaze → harishd
Component: Layout → Parser
QA Contact: petersen → bsharma
Actually, HTML 4.01 (6.2) says this about CDATA attributes:

| User agents may ignore leading and trailing white space in CDATA attribute
| values (e.g., "   myval   " may be interpreted as "myval"). Authors should
| not declare attribute values with leading or trailing white space.

Because we're already ignoring '\t', '\n', '\r' and '\b', I think we should
ignore spaces too. This would be a simple 2-line change at
http://lxr.mozilla.org/seamonkey/source/content/html/document/src/nsHTMLContentSink.cpp#710

This seems to be an issue for quite a few real web pages.
I haven't duped this to bug 42287, because that is only about non-CDATA
attributes. But it's basically the same bug.
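
The proposed behaviour can be illustrated with a minimal sketch. This is not the nsHTMLContentSink.cpp code; `strip_cdata_value` and `STRIPPED` are hypothetical names, and the character set is simply the four characters listed above plus the space.

```python
# Hypothetical helper for illustration only; not Mozilla's implementation.
# The four characters the comment says are already ignored, plus the
# space character it proposes to add.
STRIPPED = "\t\n\r\b "


def strip_cdata_value(value: str) -> str:
    """Drop leading and trailing whitespace from a CDATA attribute value.

    Whitespace inside the value is left untouched.
    """
    return value.strip(STRIPPED)


# An attribute written as src="(newline)(spaces)imagename.gif"
print(repr(strip_cdata_value("\n      imagename.gif")))
```

With this rule, the HTML 4.01 example value "   myval   " is interpreted as "myval", while interior spaces survive.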
Summary: Page does not render correctly → Should strip spaces from ends of CDATA attributes
*** Bug 97129 has been marked as a duplicate of this bug. ***
The patch is an easy way to fix this, but it would be even better to do it in
the tokenizer and remove HTMLContentSink::GetAttributeValueAt() altogether,
because for XHTML this applies only to non-CDATA attributes, and even then
a tab resulting from "&#x9;" should not be stripped. Only the tokenizer knows
if a tab was once a character entity.
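
A rough sketch of why this belongs in the tokenizer: while decoding the raw attribute text it can remember which characters came from numeric character references (such as `&#x9;`) and strip only literal whitespace. This is an assumption-laden illustration, not the Mozilla tokenizer; `decode_and_strip` and `CHAR_REF` are hypothetical names, and only numeric references are handled.

```python
import re

# Matches numeric character references such as &#x9; or &#9;
CHAR_REF = re.compile(r"&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));")


def decode_and_strip(raw: str) -> str:
    """Decode numeric references, then strip only literal end whitespace."""
    # Pair every decoded character with a flag: did it come from a reference?
    chars = []
    pos = 0
    for m in CHAR_REF.finditer(raw):
        chars.extend((c, False) for c in raw[pos:m.start()])
        code = int(m.group(1), 16) if m.group(1) else int(m.group(2))
        chars.append((chr(code), True))
        pos = m.end()
    chars.extend((c, False) for c in raw[pos:])

    def literal_ws(item):
        ch, from_ref = item
        return not from_ref and ch in " \t\n\r"

    # Trim from both ends, stopping at the first character that is either
    # not whitespace or was produced by a character reference.
    start, end = 0, len(chars)
    while start < end and literal_ws(chars[start]):
        start += 1
    while end > start and literal_ws(chars[end - 1]):
        end -= 1
    return "".join(ch for ch, _ in chars[start:end])
```

Once the value has been decoded into a plain string, as it is by the time the content sink sees it, the flag is gone and the two kinds of tab can no longer be told apart.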
Keywords: patch
<broken_record>Perhaps we should consider this a quirks-mode-only change?</broken_record>
Rationale: as you point out, XML, unlike HTML, does not allow this bit of
normalization in CDATA attributes (i.e., all attributes, for our
non-validating-XML-parser purposes).  People shouldn't have their pages
breaking when they convert from standards mode into XML.  (Indeed, I wonder if
such ambiguity in whitespace processing contravenes the SGML spec...but beating
that horse is Arjun Ray's job.)
*** Bug 101087 has been marked as a duplicate of this bug. ***
QA Contact: bsharma → moied
Cc:ing bratell since LXR gives him the blame for that code which was checked in 
to fix bug 83945.
I just moved the functionality to a faster tokenizer. The tokenizer still parses
the data the same way. You may want to look at the "blame" for the code I replaced.
Status: NEW → ASSIGNED
Priority: -- → P3
Target Milestone: --- → mozilla0.9.5
--> 0.9.6
Target Milestone: mozilla0.9.5 → mozilla0.9.6
Moving lower priority bugs to m1.0.
Target Milestone: mozilla0.9.6 → mozilla1.0

*** This bug has been marked as a duplicate of 87894 ***
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → DUPLICATE
Verified duplicate of 87894
Status: RESOLVED → VERIFIED