Closed Bug 20253 Opened 26 years ago Closed 26 years ago

when parsing html, eHTMLTag_entity GetText includes ; but not &

Categories

(Core :: DOM: HTML Parser, defect, P3)

VERIFIED FIXED

People

(Reporter: akkzilla, Assigned: akkzilla)

Details

When parsing XIF to the nsHTMLContentSinkStream, when we get an eHTMLTag_entity tag in AddLeaf at line 996, GetText() returns the inner part of the entity (e.g. "lt"). But when we're parsing HTML, GetText() for an eHTMLTag_entity includes the semicolon. This causes double semicolons in HTML output that was originally parsed from HTML (e.g. in automated tests, or in a stream converter).

To see this, build in htmlparser/tests/outsinks, add a printf or set a breakpoint to see what GetText() is returning, then go to dist/bin and run:

  TestOutput -i text/html -o text/html -f 0 OutTestData/simple.html

I don't see a way to hack in a temporary workaround, because nsHTMLContentSinkStream doesn't know whether it's being called from parsing XIF or HTML; it needs to be able to rely on the result being consistent either way.
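The mismatch described above can be illustrated with a minimal, self-contained sketch. This is not the Mozilla source: the functions EntityTextFromXif, EntityTextFromHtml, and SinkEmitEntity are hypothetical stand-ins for what GetText() returns on each path and for how the sink writes an entity, assuming the sink always appends a semicolon.

```cpp
#include <string>

// Hypothetical stand-in for GetText() on the XIF path: returns only the
// bare entity name. Assumes `source` starts with '&'.
std::string EntityTextFromXif(const std::string& source) {
  std::string text = source.substr(1);        // drop the leading '&'
  if (!text.empty() && text.back() == ';') {
    text.pop_back();                          // XIF has the ';' stripped
  }
  return text;
}

// Hypothetical stand-in for GetText() on the HTML path: the '&' is dropped,
// but a trailing ';' is kept if the source had one.
std::string EntityTextFromHtml(const std::string& source) {
  return source.substr(1);                    // drop the leading '&' only
}

// Hypothetical sink behavior that assumes it was handed a bare name,
// so it adds both the '&' and the ';' itself.
std::string SinkEmitEntity(const std::string& text) {
  return "&" + text + ";";
}
```

With these stand-ins, the entity "&lt;" comes out of the XIF path as "&lt;" but out of the HTML path with a doubled semicolon, which is the symptom reported here.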
Assignee: harishd → akkana
This isn't a bug in the parser. When the HTML file is loaded, the entity is stored as it was seen, so the semicolon may or may not be present. We don't strip the semicolons. The XIF buffer that is provided to the XIFDTD, and subsequently to the nsHTMLContentSinkStream, has had the semicolons stripped from the entities. There's nothing I can do about that. The semicolons need to come out of the content model if they were present when we read the file.
Status: NEW → RESOLVED
Closed: 26 years ago
Resolution: --- → FIXED
Rick and I discussed this: turns out that the ; isn't actually required, and isn't always there, so the parser includes it when it is there. The &, on the other hand, is required, so the parser doesn't bother to include it. We've changed the nsXIFDTD to include the semicolon like the CNavDTD does, and removed the inclusion of the semicolon from the sink.
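As a rough sketch of the convention described in this comment (again not the actual nsXIFDTD, CNavDTD, or sink code), both paths hand the sink the entity text without the "&" but with the ";" whenever the source had one, and the sink adds only the "&". The names EntityTextAfterFix and SinkEmitEntityAfterFix are hypothetical.

```cpp
#include <string>

// Hypothetical stand-in for GetText() after the change: both DTDs drop the
// required '&' and keep the optional ';' exactly as it appeared.
// Assumes `source` starts with '&'.
std::string EntityTextAfterFix(const std::string& source) {
  return source.substr(1);
}

// Hypothetical sink behavior after the change: restore the '&' and leave
// the semicolon decision to the text it was given.
std::string SinkEmitEntityAfterFix(const std::string& text) {
  return "&" + text;
}
```

Under this convention an entity round-trips unchanged whether or not the original had a semicolon, so the sink no longer needs to know which DTD produced the token.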
Status: RESOLVED → VERIFIED
QA: you can't verify this with a release build, and no one else cares, so I'll mark it verified.