Closed Bug 153777 Opened 23 years ago Closed 16 years ago

Entities should not be parsed in HTML ID attribute values

Categories

(Core :: DOM: HTML Parser, defect, P3)

RESOLVED WONTFIX
Future

People

(Reporter: bugmail, Unassigned)

Details

(Keywords: testcase)

Attachments

(2 files, 1 obsolete file)

Character references in ID and class attribute values should not be treated as equivalent to their decoded counterparts when matching style rules, since such references are not allowed in ID and class attribute values. That is, an element <p id="&#68;"> should not be styled by a rule with the selector "#D", and likewise for classes.
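To make the reported behaviour concrete, here is a minimal sketch using Python's standard-library html.parser rather than Mozilla's parser. It only illustrates the general pattern of decoding character references in every attribute value before anything downstream sees them; the names AttrCollector and parsed_id are made up for this example.

```python
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Collect (tag, attribute dict) pairs from start tags."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        # html.parser hands over attribute values with character
        # references already replaced, whatever the attribute is called.
        self.seen.append((tag, dict(attrs)))


def parsed_id(markup):
    """Return the id of the first start tag as the parser reports it."""
    collector = AttrCollector()
    collector.feed(markup)
    collector.close()
    return collector.seen[0][1].get("id")


# The reference &#68; and the literal letter D end up as the same ID.
print(parsed_id('<p id="&#68;">'))
print(parsed_id('<p id="D">'))
```

Once the two spellings collapse to the same string at parse time, a selector such as "#D" has no way to tell them apart, which is the effect described above.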
Attached file: Reduced testcase (obsolete)
Scratch the part about classes; character references are allowed in those. My mistake.
Summary: Character references in IDs and classes should not be interpreted for style purposes → Character references in IDs should not be interpreted for style purposes
Attachment #88880 - Attachment is obsolete: true
I believe the CLASS part of this bug is invalid, since http://www.w3.org/TR/html4/struct/global.html#h-7.5.2 says that class is a cdata-list, whose definition explicitly includes character entity handling, and the definition of CLASS makes no exception. The issue with IDs is valid but is a bug in the HTML parser, not the style system.
Assignee: dbaron → harishd
Component: Style System → Parser
QA Contact: ian → moied
Summary: Character references in IDs should not be interpreted for style purposes → Entities should not be handled in HTML ID attributes
I submitted the previous comment over a mid-air collision since I wanted to override the changes. Confirming bug, since the collision handling didn't do that right the first time.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Blocks: 153699
The bug will be hard to fix, unfortunately. Our attribute parsing more or less assumes that all attributes are CDATA, so at the point where the tokenizer actually expands entities in attributes, it doesn't know an ID from its own right elbow. This might change when harishd tries to remove the tokenizer.
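A toy model of the problem described in this comment, assuming nothing about the real tokenizer's code: decode_attribute and its id_aware flag are hypothetical names used only to contrast "treat everything as CDATA" with a decoder that knows which attribute it is handling.

```python
from html import unescape


def decode_attribute(name, raw_value, id_aware=False):
    """Toy attribute-value decoder (hypothetical, not Gecko code)."""
    if id_aware and name.lower() == "id":
        # What the report asks for: leave references in ID values alone.
        return raw_value
    # The behaviour described above: every attribute is treated as CDATA,
    # so references are expanded regardless of the attribute's name.
    return unescape(raw_value)


print(decode_attribute("id", "&#68;"))                 # CDATA-style handling
print(decode_attribute("id", "&#68;", id_aware=True))  # ID-aware handling
```

The sketch only works because the attribute name is passed in alongside the value; the comment's point is that the tokenizer does not have that information at the moment it expands entities.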
Keywords: testcase
Status: NEW → ASSIGNED
Priority: -- → P3
Target Milestone: --- → Future
Also reproduced using Win32/2003010408. Setting All/All.
OS: MacOS X → All
Hardware: Macintosh → All
Summary: Entities should not be handled in HTML ID attributes → Entities should not be parsed in HTML ID attribute values
Christopher, is there a bug about harishd's change you refer to in your comment 6?
Bug 105138, but it's very blue-sky at the moment; harishd has essentially placed parser in maintenance mode (he's working mostly on SOAP/P3P/WSDL), and it's rather regression-prone, to say the least. There might be an opportunity for this in 1.5 alpha, but I don't know who'd do the work...
Chris: I did land a patch some time last year to reduce heap overhead caused by the parser nodes (bug 177994); however, I never got the time to spend on tokens. That said, eliminating tokens is nontrivial and I dare not do it :(
According to http://www.w3.org/TR/html4/types.html#type-id, IDs should all match the regexp [a-zA-Z][a-zA-Z0-9:._-]*. I believe this is relevant, because another side effect of parsing the ID attribute as CDATA is that it allows other illegal ID forms such as id="42", id="Div 1", and id="item#3". None of these are matched by CSS selectors, but all are accessible via JavaScript. (See attached example in #11.)
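The pattern quoted in this comment can be checked directly. The following is a small sketch in Python (the helper name is_html4_id is made up for illustration) that applies it to the three IDs listed above:

```python
import re

# Pattern as quoted in the comment: a letter, then any number of
# letters, digits, colons, periods, underscores, or hyphens.
HTML4_ID = re.compile(r"[a-zA-Z][a-zA-Z0-9:._-]*")


def is_html4_id(value):
    """Return True if the whole value matches the HTML 4 ID pattern."""
    return HTML4_ID.fullmatch(value) is not None


for candidate in ["D", "42", "Div 1", "item#3"]:
    print(repr(candidate), is_html4_id(candidate))
```

fullmatch is used instead of match so that a valid prefix such as "Div" in "Div 1" does not make the whole value pass.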
Assignee: harishd → nobody
Status: ASSIGNED → NEW
QA Contact: moied → parser
I'm taking the liberty of marking this WONTFIX per HTML5.
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → WONTFIX