Closed Bug 44372 Opened 25 years ago Closed 25 years ago

&amp; becomes & in attributes (when saving from editor)

Categories

(Core :: DOM: Serializers, defect, P2)

VERIFIED FIXED

People

(Reporter: hobbit_mak, Assigned: akkzilla)

Details

(Keywords: testcase, Whiteboard: [nsbeta3+][p:2])

Attachments

(2 files)

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; m16) Gecko/20000621
BuildID: 2000062108

I wrote html
<IMG SRC="http://s003.ultraranking.com/cgi-shl/nph-ultra.asp?515&amp;*&amp;none" class=nobdr ISMAP alt="Ultra Ranking">
as in HTML 4.01 B2.2 (http://www.w3.org/TR/REC-html40/appendix/notes.html#h-B.2)

Mozilla changed &amp; to &

Reproducible: Always

Steps to Reproduce:
1. Edit page of URI
2. Save it to local file
3. &amp; changed to & only.

&amp; should not be changed.
Looks like the output system needs to convert ampersands to &amp; on output. Naoki: shouldn't the entity converter be handling that? Or should we get eHTMLTag_entity for the ampersand, in which case we'd handle it directly in nsHTMLContentSinkStream::AddLeaf? Cc'ing Harish, who might know.
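The round trip described in the report can be reproduced outside Mozilla. The sketch below uses Python's standard `html.parser` as a stand-in for the parser and output sink; this is an assumption made purely for illustration, and names such as `AttrCollector` and `serialize` are hypothetical and do not come from the Mozilla code base. The parser hands over attribute values with entities already decoded, so a serializer that writes them back verbatim loses `&amp;`, while one that escapes on output preserves it.

```python
from html import escape
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Collects (tag, attrs) pairs; attribute values arrive entity-decoded."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append((tag, attrs))


def serialize(tag, attrs, escape_values):
    """Write a start tag back out, optionally re-escaping attribute values."""
    parts = [tag]
    for name, value in attrs:
        if value is None:  # boolean attribute such as ISMAP
            parts.append(name)
            continue
        text = escape(value, quote=True) if escape_values else value
        parts.append(f'{name}="{text}"')
    return "<" + " ".join(parts) + ">"


source = '<img src="nph-ultra.asp?515&amp;*&amp;none" alt="Ultra Ranking">'
collector = AttrCollector()
collector.feed(source)
collector.close()
tag, attrs = collector.tags[0]

naive = serialize(tag, attrs, escape_values=False)  # the reported behaviour
fixed = serialize(tag, attrs, escape_values=True)   # escaping on output
```

With this input, `naive` contains a bare `&` in the `src` value, and `fixed` is identical to `source`.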
Assignee: beppe → nhotta
Component: Editor → DOM to Text Conversion
Oops, forgot to cc some people who wanted to be cc'ed when I made that last comment.
Status: UNCONFIRMED → NEW
Ever confirmed: true
> Naoki: shouldn't the entity converter be handling that?
No, the entity converter's input can be HTML data which may already contain '&'. So it does not generate &quot;, &amp;, &lt;, &gt;.
Okay, I thought the job of the entity converter was to convert unicode-encoded entities into the & entity form. I thought that was the whole reason we were using it. If it's not the entity encoder that does this, there must be a service somewhere that does. Clearly the output sink is the wrong place to maintain the list of all possible entities in their unicode and ampersand representations; I'm sure this code has already been written somewhere else. Any idea what service does handle this?
It is possible to add a function to the entity converter which first converts &quot;, &amp;, &lt;, &gt; and then does the conversion for everything else.
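A minimal sketch of such a two-pass function, written in Python rather than against the real entity converter interface. The function names are hypothetical, and using `html.entities.codepoint2name` as the entity table is an assumption for illustration only.

```python
from html.entities import codepoint2name

# The four characters that are significant to HTML markup itself.
MARKUP_ENTITIES = (("&", "&amp;"), ("<", "&lt;"), (">", "&gt;"), ('"', "&quot;"))


def encode_markup(text):
    """Pass 1: escape markup characters. '&' must be handled first."""
    for char, entity in MARKUP_ENTITIES:
        text = text.replace(char, entity)
    return text


def encode_named(text):
    """Pass 2: replace non-ASCII characters that have a named entity."""
    out = []
    for ch in text:
        code = ord(ch)
        if code > 127 and code in codepoint2name:
            out.append(f"&{codepoint2name[code]};")
        else:
            out.append(ch)
    return "".join(out)


def encode_all(text):
    """Run the markup pass first, then the general entity pass."""
    return encode_named(encode_markup(text))
```

The order matters: if the general pass ran first, the markup pass would re-escape the ampersands it had just produced. The sketch also shows the concern raised earlier in the thread: input that already contains `&amp;` comes out as `&amp;amp;`, so a function like this is only safe on decoded text.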
Well, this is about CDATA encoding within %uri context...

<a href="foo.cgi?a=<hmmm>">hmmm</a> (1)
<a href="foo.cgi?a=%3chmmm%3e">hmmm</a> (2)
<a href="foo.cgi?a=&lt;hmmm&gt;">hmmm</a> (3)

(1) is not valid URI syntax. (2) is. In html however one is allowed to encode it as (3) which will be written out by the editor as (1), but should be written out as (2) or (3). Same (more so) for &, %26 and &amp;. (2) appeals to me since that's what the browser will need anyway.
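The three forms can be generated side by side. This is a sketch using Python's standard library, not anything the editor actually does; the variable names are illustrative.

```python
from html import escape
from urllib.parse import quote

value = "<hmmm>"

# Form (1): the value is written out untouched.
raw = f'<a href="foo.cgi?a={value}">'
# Form (2): percent-encoding, applied at the URI level.
percent_encoded = f'<a href="foo.cgi?a={quote(value, safe="")}">'
# Form (3): entity encoding, applied at the HTML level.
entity_encoded = f'<a href="foo.cgi?a={escape(value)}">'
```

`quote` emits uppercase hex digits (`%3C`, `%3E`), which are equivalent to the lowercase form used in (2). One caveat for the `&` case: when `&` separates query parameters, as in the reporter's URL, rewriting it as `%26` turns it into data rather than a separator, whereas `&amp;` decodes back to the same URI.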
Marking this All All (guessing it's XP ;-) )
OS: Windows 2000 → All
Hardware: PC → All
Okay, trying for the third time to make this comment (the first time, mozilla crashed, the second time, bugzilla lost my comment in a midair collision and "submit anyway" failed but caused the form cache to lose the original info). Guess I'd better do it in small chunks this time.
Assignee: nhotta → akkana
Harish and I talked about this for a while. This only happens in attributes. It turns out that the code that currently decides whether to generate inflated entities (in text nodes, but not inside attributes) lives in nsXIFConverter, so it makes sense for the attribute-expansion code to live there, too. I'll do this part.
Status: NEW → ASSIGNED
Summary: &amp; becomes & → &amp; becomes & in attributes
Target Milestone: --- → M18
However, there are problems with that. What happens if we expand the entity recognition code to include quotes (currently it's only &, <, >) and someone has an attribute bordered by quotes but including a &quot;? Not clear that there's anything the nsXIFConverter can do in this case -- it doesn't have enough information on the difference between the two quotes. Harish said he'd look into trying to find a way for the parser to retain information about which characters were originally entities, which would solve this problem the right way. However, he's not sure that this is possible, so for now I'm taking the bug to do the simpler fix.
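For comparison, here is a sketch of what the escaping looks like when the serializer does know which quotes are delimiters, that is, when it receives the decoded attribute value separately from its surrounding quotes. That is an assumption; the comment above suggests nsXIFConverter does not have this information. `serialize_attr` is a hypothetical helper, not Mozilla code.

```python
def serialize_attr(name, value):
    """Return name="value" with the value made safe for a double-quoted attribute.

    `value` is the decoded attribute text, without its surrounding quotes.
    """
    escaped = (
        value.replace("&", "&amp;")   # must run first, or later entities get re-escaped
        .replace("<", "&lt;")
        .replace(">", "&gt;")
        .replace('"', "&quot;")       # only the delimiter in use needs escaping
    )
    return f'{name}="{escaped}"'
```

Given `say "hi" & go` as the value, every inner quote becomes `&quot;` and the delimiters stay literal. A converter that only sees the flat string `"say "hi" & go"` cannot make that distinction.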
Keywords: correctness, nsbeta3
setting to nsbeta3+
Whiteboard: nsbeta3+
setting priority in status whiteboard
Priority: P3 → P4
Whiteboard: nsbeta3+ → [nsbeta3+][p:4]
Moving up to P2 so I'll have a fix early enough that Harish can decide whether he has time to do the other part.
Priority: P4 → P2
Whiteboard: [nsbeta3+][p:4] → [nsbeta3+][p:2]
Changing summary so that I don't panic each time I see this bug (because on the browser side of things '&amp;' *should* become '&' in attributes!).
Keywords: testcase
Summary: &amp; becomes & in attributes → &amp; becomes & in attributes (when saving from editor)
Whiteboard: [nsbeta3+][p:2] → [nsbeta3+][p:2] FIX IN HAND, AWAITING REVIEW
Fixed.
Status: ASSIGNED → RESOLVED
Closed: 25 years ago
Resolution: --- → FIXED
Scott showed me a more efficient way to do this; attaching patch (which I'm about to check in).
Whiteboard: [nsbeta3+][p:2] FIX IN HAND, AWAITING REVIEW → [nsbeta3+][p:2]
Attached patch: Patch from Scott
verified in 8/30 build.
Status: RESOLVED → VERIFIED