Closed Bug 121040 Opened 23 years ago Closed 14 years ago

Attribute values not normalized in XML

Categories: Core :: XML, defect
Priority: Not set
Severity: normal
Status: RESOLVED INVALID
Target Milestone: Future
Reporter: brant
Assignee: Unassigned
Keywords: testcase, xhtml
Attachments: 2 files

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:0.9.7) Gecko/20011221
BuildID:    2001122106

Mozilla is rendering multiple spaces when it should be rendering only one.  I
quote from the XHTML 1.0 specification:

4.7 Whitespace handling in attribute values

In attribute values, user agents will strip leading and trailing whitespace from
attribute values and map sequences of one or more whitespace characters
(including line breaks) to a single inter-word space (an ASCII space character
for western scripts). See Section 3.3.3 of [XML].

I could not find the section in the HTML 4.01 specs that addresses this.  It
seems to only address extra white space between characters.

Reproducible: Always
Steps to Reproduce:
1. Open example URL.
2. The first line of the third paragraph contains "WYSIWYG" as an acronym with
title attribute.

Actual Results:  The title attribute is rendered with the multiple spaces
intact, which I believe is incorrect.

Expected Results:  To my knowledge, the title should be rendered with a single
space in place of the multiple spaces.

The suspect HTML code is:
<acronym
title="What You See Is What You (might)                                
Get">"WYSIWYG</acronym>

Notice, there are multiple spaces after (might).
Over to parser.  This happens on Linux too.
Assignee: asa → harishd
Status: UNCONFIRMED → NEW
Component: Browser-General → Parser
Ever confirmed: true
OS: Windows XP → All
QA Contact: doronr → moied
Hardware: PC → All
Similar to bug 47078.  There are some bugs floating out there about stripping spaces in non-CDATA attributes too (for instance), but I don't remember if there's anything on this specifically.
The people at the original URL removed the extra space.  It can no longer be
used as an example of the problem.
URL removed because it is no longer an example
Status: NEW → ASSIGNED
This is a test case for the white space in acronym title bug.  The rendered
tool tip will contain five spaces between "test" and "case" although it should
be rendered with only one space between the two.
Simplified test case has been added.
Keywords: testcase
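For reference, the behaviour described by the test case can be reproduced outside the browser. The following is a minimal sketch that uses Python's standard-library `xml.etree.ElementTree` (expat-based) as a stand-in for a non-validating XML parser; it is not Gecko's parser, and the markup is a hypothetical reduction of the attached test case rather than a copy of it.

```python
import xml.etree.ElementTree as ET

# Hypothetical reduction of the test case: five spaces between "test" and "case".
doc = '<acronym title="test     case">TC</acronym>'
title = ET.fromstring(doc).get("title")
print(repr(title))  # the run of five spaces is still present

# The same attribute with a literal line break instead of the spaces,
# as a word-wrapping editor might produce.
wrapped_doc = '<acronym title="test\ncase">TC</acronym>'
wrapped_title = ET.fromstring(wrapped_doc).get("title")
print(repr(wrapped_title))  # the line break is reported as a single space
```

With no attribute declaration available, the parser turns each literal whitespace character into a space but does not collapse runs of spaces.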
All tested browsers get this wrong: IE6, Mozilla, W3C's Amaya, Opera.
Not a high priority. 

This bug has been marked "future" because the original Netscape engineer working
on this is over-burdened. If you feel this is an error, that you or another
known resource will be working on this bug, or if it blocks your work in some way
-- please attach your concern to the bug for reconsideration.

Target Milestone: --- → Future
It isn't uncommon for editors to wordwrap and pretty print their HTML.  If this
happens to any title attribute, then this problem will occur.  I think it has a
fair likelihood of repetition.
Repurposing this bug for the XML case; the HTML case is a duplicate of bug 47078.
Maybe heikki's bug?
Summary: title attribute of acronym element rendered incorrectly → Attribute values not normalized in XML
Attachment #69987 - Attachment mime type: text/html → application/xhtml+xml
Keywords: xhtml
The relative seriousness of the bug shows up better if CR/LFs are inserted into
the title attributes. Instead of the CR/LFs, strange-looking characters are
displayed. An extra incentive should be that IE6 handles the CR/LFs correctly.
*** Bug 299365 has been marked as a duplicate of this bug. ***
Moving to XML based on comment 10.
Assignee: harishd → xml
Status: ASSIGNED → NEW
Component: HTML: Parser → XML
QA Contact: moied → ashshbhatt
Assignee: xml → nobody
QA Contact: ashshbhatt → xml
Marking invalid. XHTML1 misrepresents the XML specification here. XML states: [1]

> If the attribute type ***is not CDATA,*** then the XML processor MUST further
> process the normalized attribute value by discarding any leading and trailing
> space (#x20) characters, and by replacing sequences of space (#x20) characters
> by a single space (#x20) character.

(Emphasis mine.) This makes attachment 69987 incorrect, as @title *is* CDATA. [2]
We don't seem to normalize non-CDATA attributes either. However, as we aren't a validator and don't actually read the full DTD, the following exception applies: [1]

> All attributes for which no declaration has been read SHOULD be treated by a
> non-validating processor as if declared CDATA.

... which makes our behavior for those attributes correct as well.

[1] http://www.w3.org/TR/REC-xml/#AVNormalize
[2] http://www.w3.org/TR/xhtml1/dtds.html#dtdentry_xhtml1-strict.dtd_coreattrs
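
The distinction drawn above can be sketched in a few lines. This is an illustration of the normalization rule in [1], not the code of any real parser: the function name `normalize_attribute_value` and the `is_cdata` flag are hypothetical, and the sketch assumes the raw value contains no character or entity references (which the specification handles separately).

```python
import re


def normalize_attribute_value(raw: str, is_cdata: bool = True) -> str:
    """Simplified XML 1.0 attribute-value normalization (no references)."""
    # Line ends are normalized on input: CRLF and lone CR both become LF.
    value = raw.replace("\r\n", "\n").replace("\r", "\n")
    # Every literal whitespace character (space, tab, CR, LF) becomes one space.
    value = re.sub(r"[\x20\x09\x0d\x0a]", " ", value)
    if not is_cdata:
        # Only non-CDATA types are trimmed and have runs of spaces collapsed.
        value = re.sub(r" +", " ", value.strip(" "))
    return value


# An undeclared attribute is treated as CDATA by a non-validating processor,
# so the five spaces survive.
cdata_result = normalize_attribute_value("test     case")
# A type such as NMTOKENS would be collapsed.
tokens_result = normalize_attribute_value("  test     case ", is_cdata=False)
print(repr(cdata_result), repr(tokens_result))
```

Defaulting `is_cdata` to `True` mirrors the rule quoted above for attributes whose declaration has not been read.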
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → INVALID