Closed Bug 155723 Opened 23 years ago Closed 11 years ago

innerHTML will need to be fixed to work with XHTML

Categories

(Core :: DOM: Core & HTML, defect)

defect
Not set
normal

Tracking

()

RESOLVED WORKSFORME

People

(Reporter: bzbarsky, Unassigned)

References

Details

(Whiteboard: [has landed, needs testing?])

Attachments

(4 files, 1 obsolete file)

With the fix for bug 125746, we will need to do some changes to innerHTML to
make it work right with XHTML elements (eg, CDATA kids of textareas are not well
supported by the current setup).
Um, CDATA kids of textareas? Sorry folks but when XHTML is served with the
correct MIME type (application/xhtml+xml), innerHTML does not work *at all*...
Yeah, that's bug 133827.  I plan to fix it sometime in the not-too-far future. 
That's why I used the future tense when filing this bug -- "will need to fix"
instead of "need to fix".
QA Contact: lchiang → ian
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.6b) Gecko/20031119

<br /> are transformed into <br> when I look at the innerhtml value.

That's maybe not the best place to ask but why is the html code
transformed ?
I see that IE also transforms that code (and of course the way to
transform it is different :( ).
I agree. The innerHTML should fit the page's DOCTYPE. IE is already far behing
as it converts every tag to capitals, alphabetizes all html tag attributes, and
even unquotes non-numeric values. I think we should fix this so we can remain
forward compatible. M$ dug their own grave :p. I also think that since an XHTML
document would have proper tag endings, it would benefit the browser in speed
when working with innerHTML.
Attached patch tentative patchSplinter Review
Reasons for the changes:

Change in ConfirmPrefix():
 nsXMLContentSerializer::ConfirmPrefix(nsAString& aPrefix,
				       const nsAString& aURI)
 {
-  if (aPrefix.Equals(kXMLNS)) {
+  if (aPrefix.Equals(kXMLNS) ||
+     (aPrefix.Equals(kXMLPrefix) && aURI.Equals(kXMLNameSpaceURI))) {
     return PR_FALSE;
   }

needed to stop, e.g., <p xml:lang="en"> from becoming <p xml:lang="en"
xlmns:xml="http://www.w3.org/XML/1998/namespace">

(i.e., xml: is built-in and doesn't need to be declared)

=============
Change in nsGenericHTMLElement::SetAttrAndNotify()
+    rv = mNodeInfo->NodeInfoManager()->GetNodeInfo(aAttribute, aPrefix,
						    aNamespaceID,
						    getter_AddRefs(ni));

Needed because the prefix (e.g. on xml:lang) is lost with XHTML elements for no
apparent reasons. This has been a bug all along. But it does no harm because
nobody has been using the prefix so far. Consumers relies on the namespaceID
which is kept properly. For serialization purposes however, the prefix is not
optional, it is needed.

EXAMPLE OF OUTPUT with the patch
=================================
The output isn't pretty with the patch however... In its eagerness to produce a
self-contained output, the XML serializer gives an output with "generated"
prefices... For example, in my internal testing with view-selection-source
somewhere:

<a0:body xmlns:a0="http://www.w3.org/1999/xhtml">
<a0:span lang="test">lang</a0:span>
<a0:br/> break <a0:br/>
</a0:body>

===========
It is okay otherwise. But such an output will make people wonder was is going
on.
> Needed because the prefix (e.g. on xml:lang) is lost with XHTML elements for no
> apparent reasons.

sicking?  What's up with that?

rbs, there is also the SetInnerHTML issue, though bug 133827 covers most of
that... once that's fixed, we'll have to see what remains to be done.  For
example, what if SetInnerHTML is passed a string that's not well-formed?
Depends on: 133827
> what if SetInnerHTML is passed a string that's not well-formed?

Seems fair to fire a DOM error in the usual DOM way. (i.e., only commit changes
to the DOM if everything is ok; preserve things as they were before an invalid
SetInnerHTML).
Yeah, i noticed that we're loosing the prefix the other day too. It's actually
fairly bad since it'll affect callers of 'setAttribute'. Same problem exists in
XUL too.
Bug 45627 was the last time that a major surgery happened to the XML serializer.
Cc:ing heikki and jst. I wonder if the "generated prefixes" can go away -- c.f.
comment #5. 

Heikki was apparently puzzled by |ConfirmPrefix| too and ignored it at some
point, but re-enabled it again, writing that it was needed for scripts. But he
left it in the dark the problem that it solved? Also, jst mentioned that things
might be different with nsINodeInfo. Well, nsINodeInfo is the foundation now,
any thoughts on how to prettify the output -- beyond what is in comment #5 above?
*** Bug 240652 has been marked as a duplicate of this bug. ***
The change in comment 5 for nsGenericHTMLElement::SetAttrAndNotify() to get the
real prefix has been checked in bug 248172.
Depends on: 248172
(In reply to comment #9)
> Bug 45627 was the last time that a major surgery happened to the XML serializer.
> Cc:ing heikki and jst. I wonder if the "generated prefixes" can go away -- c.f.
> comment #5. 
> 
> Heikki was apparently puzzled by |ConfirmPrefix| too and ignored it at some
> point, but re-enabled it again, writing that it was needed for scripts. But he
> left it in the dark the problem that it solved? Also, jst mentioned that things
> might be different with nsINodeInfo. Well, nsINodeInfo is the foundation now,
> any thoughts on how to prettify the output -- beyond what is in comment #5 above?

The problem that generated prefixes fix is that anyone can insert nodes through
the DOM that have a namespace URI and a random or empty or previously existing
prefix that's totally unrelated to the prefixes declared at that point through
xmlns attributes. There's other ways to solve this, but we always have to keep
track of the prefix->namespace URI mapping.
I think your example in comment 5 could be improved by checking for an empty
prefix and use a default namespace instead of generating a prefix if there
hasn't been a default namespace set before. Or you could go all out and always
map the prefix from the DOM, it'll lead to surprising behaviour in certain edge
cases but I'm not sure we care and I think it would work.
Note that you'll still need to handle conflicting prefixes (inserting
http://foo/|x:foo and http://bar/|x:bar attributes in one element for example).
Conflicting attributes are pretty rare anyway -- but yeah, when they occur we 
need to (probably arbitrarily, since I doubt we can know which one came first) 
change one of the prefixes on output.
Blocks: 251695
This doesn't assume that non-HTML documents have to be XHTML.  You can have
other documents (eg svg, etc)... Probably doesn't matter much for serialization
purposes, but....

First issue:  The XML serializer is pretty terrible.  With this patch, if I
take:

<div id="thetarget" onclick="clickHandler()">Click me <br/>  More text here
<span>And more</span>  <img /></div>

the innerHTML of the <div/> is:

Click me <a0:br xmlns:a0="http://www.w3.org/1999/xhtml"/>  More text here
<a1:span xmlns:a1="http://www.w3.org/1999/xhtml">And more</a1:span>  <a2:img
xmlns:a2="http://www.w3.org/1999/xhtml"/>

The code really needs to be better at not outputing redundant namespace
URIs.... and perhaps at using xmlns="whatever" in preference to making up
random prefixes for nodes that started out without prefix.
BTW, these redundant namespace declarations and prefixes often break rendering.
  There are many users out there that serialize i.e. transformation result trees
to (X)HTML and the browser simply doesn't render that properly with the
redundant prefixes/declarations. Could be the chrome CSS, i dont really know. i
have even reccomended the xml method without namespaces in the result tree! The
browser will render that fine when the result tree has no namespace and the host
document is XHTML which is actually wrong behaviour.

Just wanted to also say a better serializer is needed, but *please* dont change
the interfaces in a non backwards compatible way.

Thanks for listening.
While I agree about the prefixes I wonder which namespace declaration is
redundant in

Click me <a0:br xmlns:a0="http://www.w3.org/1999/xhtml"/>  More text here
<a1:span xmlns:a1="http://www.w3.org/1999/xhtml">And more</a1:span>  <a2:img
xmlns:a2="http://www.w3.org/1999/xhtml"/>
Ah, oops.  You're right that none of those tags are nested...  The real testcase
should be:

<div id="thetarget" onclick="clickHandler()"><span>And more<img /> </span>  </div>

which gives:

<a0:span xmlns:a0="http://www.w3.org/1999/xhtml">And more<a0:img/> </a0:span>  

By using default namespaces, we could reduce the number of namespace prefixes
hanging about considerably, I think.

Manos, the XML generated here is valid (if ugly), so if there is a problem with
rendering it please file a bug on it (and cc me).
Ok, submitted bug 263144. IMHO this should be fixed via innerHTML as I also
suggest and try to back up there.
Depends on: 263225
I filed bug 263225, with patch, on the unnecessary namespace prefixes the
serializer creates.
I did some research on this.

This is NOT a bug.  It is valid according to the XHTML 1.1 standards.

Say you have valid XHTML 1.1, then someone does this: object.innerHTML =
'<b><i></b></i>', you are now officially breaking the most fundamental of all
standards rules.  Your standards are completely broken.

This is critical for two reasons:
1) XHTML 1.1 is XML and 
2) XHTML 1.1 is sent with application/xhtml+xml ONLY(not text/html!!), thus it
is parsed by the browser.

Having it parsed properly in it's static structure and then destroying it in the
dynamic structure is inappropriate.

IE team promotes the ability to write to innerHTML, the Mozilla team promotes
standards.  The two a mutually exclusive.

If you need similar functionality, then use the DOM to create the elements.

Please close this bug.
> It is valid according to the XHTML 1.1 standards.

XHTML has nothing to say about this extension to the DOM.

> then someone does this: object.innerHTML = '<b><i></b></i>'

Mozilla throws a DOM_SYNTAX_ERR exception if you try to do this in XHTML.  As
you would know if you'd, say, tried it.

> Please close this bug.

Please do your research on the code you're commenting on, not on unrelated
specifications, before commenting on bugs.
Well, the truth will prove itself in the end.

XHTML has nothing to do with the DOM.  PERIOD(well, you know that).  That's why
we use the DOM to get the functionality that XHTML cannot provide.  Two separate
worlds which can live in harmony.  You know this, you sound like a serious pro.
 Probably better than me in some areas, but I respecfully think you are flawed here.

It throws an exception to make sure you do not violate the standards.  This is
very apprpriate.  I wish IE did that.  In fact in my .NET code, in all my custom
libraries.  If you try to violate a standard.  BAM.  Exception it thrown. 
Mozilla is forcing you to be proper.  Please thank them.  I found this out the
hard way by switching to XHTML1.1 with application/xhtml+xml.  I immediately
posted in all the bug forums about this "SERIOUS" bug and that it "NEEDS" to be
fixed.  Well, then I grew up and recanted of that.  I rewrote my code to be more
standards based.  The DOM helps you do that.

Please don't be a jerk, this is not a forum.  This is not a bug, it is a
complaint about convienence which I admit that I was party to,  But I retract my
statement on calling this a bug and I wish I could remove my bug reports from a
few websites.  It was completely immature of me to do that.  You're right.  I
should have done my research first.  I'm sorry.

End of Thread.

(In reply to comment #23)
> > It is valid according to the XHTML 1.1 standards.
> 
> XHTML has nothing to say about this extension to the DOM.
> 
> > then someone does this: object.innerHTML = '<b><i></b></i>'
> 
> Mozilla throws a DOM_SYNTAX_ERR exception if you try to do this in XHTML.  As
> you would know if you'd, say, tried it.
> 
> > Please close this bug.
> 
> Please do your research on the code you're commenting on, not on unrelated
> specifications, before commenting on bugs.

David, I think you misunderstand what this bug is about.  It's about two issues,
mostly:

1)  _Reading_ the innerHTML property is broken in XHTML in some cases (while
    writing works).
2)  Both reading and writing are broken for certain (X)HTML node classes that
    are parsed "specially" in HTML (but not in XHTML, hence it being broken).

Please feel free to send me mail privately if you have any more questions about
this bug, but don't add any more long misguided comments here, ok?  Certainly
not ones in which you quote an entire previous comment.  As you said, this is
not a message board.  And this bug is not exactly a discussion.  It's just a way
to track development filed based on a discussion about innerHTML that happened
back in 2002 between one Mozilla developer (me) and another Mozilla developer
(the bug's assignee).
Comment on attachment 161091 [details] [diff] [review]
Another partial patch

I think we should just take this patch for 1.8.  It makes things much better,
though possibly not completely perfect...
Attachment #161091 - Flags: superreview?(jst)
Attachment #161091 - Flags: review?(bugmail)
Flags: blocking1.8b4?
Flags: blocking1.8b4? → blocking1.8b4+
Comment on attachment 161091 [details] [diff] [review]
Another partial patch

sr=jst
Attachment #161091 - Flags: superreview?(jst) → superreview+
Comment on attachment 161091 [details] [diff] [review]
Another partial patch

Requesting 1.8b4 approval.  This just makes the innerHTML serialize as the MIME
type of the document (so as XML for XML documents, basically).	HTML documents
are not affected, but for XHTML/XML we get a string that can be passed to the
innerHTML setter and not cause an XML parse error...
Attachment #161091 - Flags: approval1.8b4?
Attachment #161091 - Flags: approval1.8b4? → approval1.8b4+
bz, this causes lonely elements from media documents (such as from view image)
to fail because there are no serializers for img/gif & etc. Suggesting to add
this fallback (it would be more consistent to use the original containing
mimetype where these elements came from):

   [...]
+  if (!docEncoder)
+-   docEncoder = do_CreateInstance(..."text/html"...)...

   NS_ENSURE_TRUE(docEncoder, NS_ERROR_FAILURE);
Er... people are doing innerHTML for those cases?  <sigh>....  Patch coming up.
Attached patch Hackety-hack (obsolete) — Splinter Review
Attachment #189506 - Flags: superreview?(jst)
Attachment #189506 - Flags: review?(jst)
Checked in the previous patch, by the way.  We still need to deal with
HTMLElement impls that hack together a weird innerHTML getter and setter, I
think...  Not sure how many of those are left.
Yeah, there are all sort of bookmarklets these days...
The patch checked in from this bug (attachment 161091 [details] [diff] [review]) seems to have broken
ChatZilla rather badly, but don't panic just yet!

+ QueryInterface (function) 3 lines
+ message (string) 'Component returned failure code: 0x80004005
(NS_ERROR_FAILURE) [nsIDOMNSHTMLElement.innerHTML]'
+ result (number) 2147500037
+ name (string) 'NS_ERROR_FAILURE'
+ filename (string) 'chrome://chatzilla/content/static.js'
+ lineNumber (number) 1356
+ columnNumber (number) 0
+ location (object)
+ inner (object) null
+ data (object) null
+ initialize (function) 3 lines

I'm going to explain what ChatZilla does, why I think it broke, and what I think
it going to fix it.

Firstly, ChatZilla creates elements (using document.createElementNS) when
processing messages, to apply effects like *bold* and so on. They're nothing
more than html:span usually. Now, I am reasonably sure these elements are
sorta-attached to the XUL ChatZilla document at this point.

So along comes innerHTML (which we use as a mega-hack to flatten the message for
logging purposes), and it goes to get a doc encoder thingy for XUL. Oops, there
isn't one! So we fail.

Now, that's all a little dodgy to start with, but it seems that it /could/ work
(like it has since at least Mozilla 1.0). I think bz's hackety-hack patch will
fix it.

So, really, I'm just warning you that the dups (if we get any) may be a
little... strange. :)
so question... what about random xml content? what about, say, svg? do all XML
mime types that can contain XHTML need their own encoder?
(In reply to comment #31)
> Created an attachment (id=189506) [edit]
> Hackety-hack
> 

I have created a build including this patch and can verify that it fixes the
issue with chatzilla described in comment #34.
Attachment #189506 - Flags: superreview?(jst)
Attachment #189506 - Flags: superreview-
Attachment #189506 - Flags: review?(jst)
Attachment #189506 - Flags: review-
This should fix XUL to use the XML serializer...  Problem is that other
consumers of serializers will fail to deal with XUL, still.  :(
Attachment #189506 - Attachment is obsolete: true
Attachment #189651 - Flags: superreview?(jst)
Attachment #189651 - Flags: review?(jst)
(In reply to comment #37)
> Created an attachment (id=189651) [edit]
> Better hackity-hack
> 
> This should fix XUL to use the XML serializer...  Problem is that other
> consumers of serializers will fail to deal with XUL, still.  :(

I have verified that chatzilla works with this patch as well.
Comment on attachment 189651 [details] [diff] [review]
Better hackity-hack

r+sr=jst
Attachment #189651 - Flags: superreview?(jst)
Attachment #189651 - Flags: superreview+
Attachment #189651 - Flags: review?(jst)
Attachment #189651 - Flags: review+
Comment on attachment 189651 [details] [diff] [review]
Better hackity-hack

Requesting 1.8b4 approval.  more innerHTML in XML fun...
Attachment #189651 - Flags: approval1.8b4?
*** Bug 301273 has been marked as a duplicate of this bug. ***
Attachment #189651 - Flags: approval1.8b4? → approval1.8b4+
Checked that in too.
Whiteboard: [has landed, needs testing?]
Chris, see comment 32.  But yes, the basic reading of innerHTML in XML docs
needs testing...
What's left to do here? Have we fixed the blocking1.8b4 parts?
Yeah, the blocking parts are fixed.  Perhaps we should file a followup bug on
the remaining work...
Assignee: bugmail → general
Flags: blocking1.8b4+
Depends on: 340800
What work remains? Seems like this bug should've been closed a couple years ago, eh?
Um... read the bug?  Comment 0 and comment 32 pretty much cover what remains to be done, and has remained to be done the entire time this bug has been filed.
Assignee: general → nobody
QA Contact: ian → general
I've already filed a bug for this on Webkit 
(https://bugs.webkit.org/show_bug.cgi?id=30923), but this seems like the right bug to append information to:

XHTML character entities don't work when you feed them into XHTML innerHTML.

For instance:

var element = document.createElement("div");
element.innerHTML = "test &nbsp;";

Will fail.

It shouldn't - non-breaking space is a valid character entity in the XHTML namespace, but despite recognizing XHTML elements, Firefox does not recognize entities. 

That would seem to fly in the face of the purpose of innerHTML - that any string passed to it would be parsed as though inserted into the page.  And of course, in the case of a text/html DOM document, this example succeeds.
Component: DOM: Mozilla Extensions → DOM
Attachment #189651 - Flags: checkin+
Attachment #161091 - Flags: checkin+
I believe this is now following the relevant spec, so we're all good here.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → WORKSFORME
Component: DOM → DOM: Core & HTML
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: