Last Comment Bug 500937 - html xmlns submitted to MySpace with "update mood & status" feature
: html xmlns submitted to MySpace with "update mood & status" feature
Status: RESOLVED FIXED
: regression
Product: Core
Classification: Components
Component: DOM: Core & HTML (show other bugs)
: unspecified
: All All
: P2 normal (vote)
: ---
Assigned To: Henri Sivonen (:hsivonen) (Not reading bugmail or doing reviews until 2016-09-26)
:
Mentors:
: 501226 (view as bug list)
Depends on: 547667 541937
Blocks: 468708
  Show dependency treegraph
 
Reported: 2009-06-27 14:29 PDT by Noah (oldtimer) [:Noah]
Modified: 2010-03-24 05:46 PDT (History)
12 users (show)
jst: blocking1.9.2+
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
beta1-fixed


Attachments
Screencast of issue (563.01 KB, application/x-shockwave-flash)
2009-06-27 14:55 PDT, Noah (oldtimer) [:Noah]
no flags Details
Firefight patch (1.89 KB, patch)
2009-07-01 01:12 PDT, Henri Sivonen (:hsivonen) (Not reading bugmail or doing reviews until 2016-09-26)
no flags Details | Diff | Splinter Review
Make XMLSerializer be a proper HTML serializer when used with HTML docs (2.10 KB, patch)
2009-07-01 02:04 PDT, Henri Sivonen (:hsivonen) (Not reading bugmail or doing reviews until 2016-09-26)
no flags Details | Diff | Splinter Review
Add a constructor argument for forcing particular serializer (5.87 KB, patch)
2009-08-13 04:40 PDT, Henri Sivonen (:hsivonen) (Not reading bugmail or doing reviews until 2016-09-26)
no flags Details | Diff | Splinter Review
Make the XML mode XHTML-friendly (15.91 KB, patch)
2009-08-28 04:24 PDT, Henri Sivonen (:hsivonen) (Not reading bugmail or doing reviews until 2016-09-26)
laurent: review-
Details | Diff | Splinter Review
OutputRaw, uuid, test case (19.85 KB, patch)
2009-09-03 07:22 PDT, Henri Sivonen (:hsivonen) (Not reading bugmail or doing reviews until 2016-09-26)
no flags Details | Diff | Splinter Review
OutputRaw, uuid, test case v2 (21.60 KB, patch)
2009-09-04 08:06 PDT, Henri Sivonen (:hsivonen) (Not reading bugmail or doing reviews until 2016-09-26)
laurent: review+
Details | Diff | Splinter Review
Keep output always as XML but make it HTML-friendlier as in WebKit's XML mode (14.55 KB, patch)
2009-09-25 04:14 PDT, Henri Sivonen (:hsivonen) (Not reading bugmail or doing reviews until 2016-09-26)
no flags Details | Diff | Splinter Review
Keep output always as XML but make it HTML-friendlier as in WebKit's XML mode, v2 (16.82 KB, patch)
2009-09-25 07:04 PDT, Henri Sivonen (:hsivonen) (Not reading bugmail or doing reviews until 2016-09-26)
laurent: review+
jst: superreview+
Details | Diff | Splinter Review

Description Noah (oldtimer) [:Noah] 2009-06-27 14:29:28 PDT
User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2a1pre) Gecko/20090617 Firefox/3.0
Build Identifier: 

Using myspace, I can reproduce this behavior everytime. It may happen elsewhere but who knows.

View my screencast to see the steps to repro & if you want to view the source, login using the details for the dummy account I made:
username: mozsupport@gmail.com
pass: firefox1

This was tested and confirmed in a new profile. Regression range was traced back to 6/8 working, 6/9 broken. Which makes me blame bug 468708.

Reproducible: Always
Comment 1 Noah (oldtimer) [:Noah] 2009-06-27 14:55:06 PDT
Created attachment 385600 [details]
Screencast of issue

Whatever you type into the form field, once you hit 'Update', it will always be overwritten by: <html xmlns=" http://lnk.ms/*random letter/number variation*
Comment 2 Tyler Downer [:Tyler] 2009-06-27 17:22:50 PDT
Can you give a build ID that this happens in? Is it 3.0.x, 3.5, or trunk?
Comment 3 Noah (oldtimer) [:Noah] 2009-06-28 13:07:14 PDT
It's trunk. The Firefox/3.0 bit in my UA is custom.
Comment 4 Noah (oldtimer) [:Noah] 2009-06-30 15:56:27 PDT
Steps to repro

Login here using the dummy account/pass listed in comment 1:
http://www.myspace.com/

You'll arrive at a homepage area, mouseover to the "Status and Mood" section, which is centered in the middle of the page at the top.

You should see a textbox prefilled with: What are you doing right now?

Click inside it, it should auto erase the prefilled text & auto expand that field into a bigger textarea.

Then feel free to type whatever phrase you want. Click the Update button, see noted bug.
Comment 5 Henri Sivonen (:hsivonen) (Not reading bugmail or doing reviews until 2016-09-26) 2009-06-30 22:55:16 PDT
I'm seeing this is Firefox 3.5 RC, so I doubt this was caused by the HTML namespace patch. Investigating further.
Comment 6 Henri Sivonen (:hsivonen) (Not reading bugmail or doing reviews until 2016-09-26) 2009-06-30 23:29:23 PDT
Hmm. What I was seeing was probably the field autofilling itself from the previous status.
Comment 7 Henri Sivonen (:hsivonen) (Not reading bugmail or doing reviews until 2016-09-26) 2009-07-01 01:12:18 PDT
Created attachment 386231 [details] [diff] [review]
Firefight patch

So MySpace uses XMLSerializer on a HTML tree and can't deal if the result is actually an XML serialization rather than a bogo-serialization somewhere in between HTML and XML. (MySpace runs h=h.replace(/<html>|<\/html>|<html\/>/gi,"") on the result to get rid of the root element.)

XMLSerializer applied to HTML trees has never done the right thing: the result has neither been round-trippable if parsed as XML (if you parsed the old XMLSerializer output as XML, the element that were instanceof HTMLElement in the source doc wouldn't be in the reparsed doc) and the result hasn't been round-trippable if parsed as HTML, either (e.g. empty divs would round trip to the same tree).

This patch breaks XMLSerializer so that it approximates the old brokenness well enough not to show the problem on MySpace.

I'm not happy with this way of "fixing" the issue though. Ideally, we'd evangelize MySpace not to use XMLSerializer when they want HTML and would introduce a distinct HTMLSerializer. That course of action may be futile depending on how many other sites use XMLSerializer. A different proper way of fixing would be to make XMLSerializer to act as a proper HTML serializer when the owner doc of the input node is an HTML document (i.e. handle empty divs and stuff in a way that round trips with an HTML parser).
Comment 8 Henri Sivonen (:hsivonen) (Not reading bugmail or doing reviews until 2016-09-26) 2009-07-01 01:15:31 PDT
*** Bug 501226 has been marked as a duplicate of this bug. ***
Comment 9 Laurent Jouanneau 2009-07-01 01:50:45 PDT
Perhaps we could change nsDOMSerializer so it would "guess" the content-type of the document, so the document encoder would select the serializer corresponding to the document, instead of always getting the xml serializer. In this function:

http://mxr.mozilla.org/mozilla-central/source/content/base/src/nsDOMSerializer.cpp#75
Comment 10 Henri Sivonen (:hsivonen) (Not reading bugmail or doing reviews until 2016-09-26) 2009-07-01 02:04:05 PDT
Created attachment 386234 [details] [diff] [review]
Make XMLSerializer be a proper HTML serializer when used with HTML docs

Attaching a proper fix for the underlying issue. This patch makes XMLSerializer act as an HTML serializer when used with HTML docs.
Comment 11 Henri Sivonen (:hsivonen) (Not reading bugmail or doing reviews until 2016-09-26) 2009-07-01 04:59:10 PDT
Comment on attachment 386234 [details] [diff] [review]
Make XMLSerializer be a proper HTML serializer when used with HTML docs

This patch seems would make Gecko match WebKit on this point. (The earlier patch would have matched Opera.)
Comment 12 Jesse Ruderman 2009-07-01 15:04:36 PDT
I would prefer wontfix/evang and creating a separate API for serializing as HTML.

I frequently need to serialize HTML as XML and XML as HTML.  I have been using XMLSerializer to serialize HTML as XML, and using a hack to add the missing xmlns attribute.

MySpace is being silly.  We should not cripple XMLSerializer forever just to make MySpace work in nightlies.  Having a single API called "XMLSerializer" that changes its behavior depending on the document would be limiting, misleading, and an easy source of bugs in code that needs to work in both HTML and XML.
Comment 13 Olli Pettay [:smaug] (TPAC) 2009-07-01 15:06:19 PDT
It wouldn't be just evang, because other browsers don't add xmlns.
Comment 14 Henri Sivonen (:hsivonen) (Not reading bugmail or doing reviews until 2016-09-26) 2009-07-01 23:23:44 PDT
(In reply to comment #12)
> I would prefer wontfix/evang and creating a separate API for serializing as
> HTML.

I would have agreed before yesterday, but attachment 386234 [details] [diff] [review] would gain interop with WebKit and compat with MySpace, so I think that's the desirable default behavior.

> I frequently need to serialize HTML as XML and XML as HTML.  I have been using
> XMLSerializer to serialize HTML as XML, and using a hack to add the missing
> xmlns attribute.

I suggest adding an optional argument to the XMLSerializer constructor for requesting XML or HTML serializer explicitly (and making the no-argument case do what attachment 386234 [details] [diff] [review] does).

This should be coordinated with other vendors on public-webapps, in my opinion.

> MySpace is being silly.

I agree.

> Having a single API called "XMLSerializer"
> that changes its behavior depending on the document would be limiting,
> misleading,

We already have innerHTML that says "HTML", but I can see why digging deeper into that hole isn't quite nice. Still, I'd go for interop with WebKit plus optional constructor argument.

(In reply to comment #13)
> It wouldn't be just evang, because other browsers don't add xmlns.

WebKit runs an HTML serializer. Opera serializes to something that doesn't round-trip as either XHTML or HTML (like Gecko did before but different in case with what Gecko did before).
Comment 15 Brian Carpenter [:geeknik] 2009-07-04 12:36:29 PDT
I was not seeing this in any of the nightly 3.5pre or 3.5.1pre builds but as soon as I installed the 070409 nightly Minefield build, I started seeing this, but only on Myspace.
Comment 16 conny 2009-07-17 19:25:48 PDT
It's in Minefield for 2 Month now, and not only in Windows
Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.2a1pre) Gecko/20090717 Minefield/3.6a1pre
Comment 17 Jesse Ruderman 2009-07-17 19:36:17 PDT
I reported this issue to MySpace today.
Comment 18 James R. Landrum 2009-07-27 11:03:40 PDT
This really is MySpace's stupidity. Statuses cannot have HTML, as < is converted to &lt;. I doubt PHP is the only one with striptags (I believe this is the right function...)
Comment 19 Henri Sivonen (:hsivonen) (Not reading bugmail or doing reviews until 2016-09-26) 2009-08-07 03:40:30 PDT
Jesse, will your tools break if the current patch is checked in (i.e. without a novel optional constructor argument to force particular behavior)?

That is, other than addressing potential review comments, is there any additional code here that I need to write in the 3.6 timeframe?
Comment 20 Jesse Ruderman 2009-08-07 12:16:05 PDT
Yes, I'll have no way to serialize parts of HTML documents as XML.

But more importantly, I think it's foolish to make the API confusing for eternity just because a single site breaks right now.  This should be treated as tech evang.
Comment 21 Henri Sivonen (:hsivonen) (Not reading bugmail or doing reviews until 2016-09-26) 2009-08-12 07:17:44 PDT
It's not just about one site. It's about WebKit parity.

I think it makes sense to add a constructor argument for requesting particular behavior, even though I'd prefer to do it with multivendor agreement at the WebApps WG as opposed to doing it unilaterally.
Comment 22 Jesse Ruderman 2009-08-12 10:19:50 PDT
If the bar for adding an argument is "multivendor agreement at the WebApps WG", the bar for making XMLSerializer not do what it says by default should be a lot higher than "WebKit happens to do it that way".

Have you tried to get WebKit to change to our behavior?
Comment 23 Henri Sivonen (:hsivonen) (Not reading bugmail or doing reviews until 2016-09-26) 2009-08-13 02:00:13 PDT
I haven't tried to get WebKit to change their behavior, because I think the WebKit behavior makes sense as the default. It's closer to Opera's behavior and Gecko's old behavior than the current Gecko trunk behavior but isn't broken like Opera's behavior or Gecko's old behavior.

It's unlikely to get formal multivendor agreement in the 3.6 time frame, since there's no spec deliverable for this in WebApps, so I guess I can count the lack of disagreement from other vendors on public-webapps as good enough. 

Maybe this read leads to me having to edit a spec for XMLSerializer and DOMParser...
Comment 24 Henri Sivonen (:hsivonen) (Not reading bugmail or doing reviews until 2016-09-26) 2009-08-13 04:40:35 PDT
Created attachment 394252 [details] [diff] [review]
Add a constructor argument for forcing particular serializer

This patch adds a constructor argument to XMLSerializer. 

 * new XMLSerializer() behaves as with attachment 386234 [details] [diff] [review].
 * new XMLSerializer("application/xml") forces XML serialization
 * new XMLSerializer("text/html") forces HTML serialization
 * Other argument values are reserved for future extension but unknown are treated as XML.

Feedback from others:
Anne van Kesteren suggested using "html" and "xml" instead of MIME types.
Maciej Stachowiak suggested adding .markupAsXML and .markupAsHTML to the Document and Element interfaces.

I think Maciej's suggestion is more JS-like and better for feature sniffing.

In any case, I think either this patch (possibly with Anne's suggested modification) or attachment 386234 [details] [diff] [review] (possibly accompanied by an implementation of Maciej's suggestion) should go in in the Gecko 1.9.2 timeframe.

Opinions?
Comment 25 Laurent Jouanneau 2009-08-13 06:36:38 PDT
Please, supports application/xhtml+xml for XHTML. The serializer supports now well XHTML like HTML since my patch for bug 422403. In other word, serialization of XML documents is now different from serialization of XHTML documents.
Comment 26 Henri Sivonen (:hsivonen) (Not reading bugmail or doing reviews until 2016-09-26) 2009-08-14 04:29:57 PDT
Does the XHTML serializer have any ill effects when applied to other kind of XML? Maciej said WebKit's XML serialization code path is always XHTML-friendly. Maybe we should do the same when XMLSerializer is instantiated without a constructor argument and is applied to an XML DOM.
Comment 27 Laurent Jouanneau 2009-08-19 03:26:25 PDT
The XHTML serializer serializes XHTML elements as it should be, and for any other XML elements, it serializes as the XML serializer does. But the XML serializer doesn't know anything about XHTML or other XML language. So, in pretty printing mode, it can't always serialize as good as the XHTML serializer for XHTML elements.

For exemple, XML serializer doesn't know what is a <ul> and a <li>. In pretty printing mode, if there isn't any non-empty DOMtext node between two <li> elements in a <ul> element, the XML serializer won't introduce whitespace. It doesn't know if whitespaces between <xhtml:li> elements are significant or not. However, the HTML/XHTML serializer know it so it can generate a better pretty printing serialization. An other exemple is the <xhtml:pre> element, which should be always serialized in a "raw" manner. Its content should be modified (the serializer shouldn't remove or add whitespace), even in pretty printing mode. The XML serializer doesn't know that, so it modify it.

However, in our case, we don't do pretty printing. But we should still use the XHTML serializer for XHTML documents (and not the XML serializer), because it serializes some attributes in a different manner (like those inserted by the HTML editor), it serializes some elements like <script> with an end tag (</script> here) even if they are empty, to keep the compatibility with old browsers/parsers etc...

I have some ideas for the futur of the XML serializer, with a system of "plugins" (via XPCOM components so we could enhance the XML serializer through an extension), to allow it to know what to do with each element of a specific namespace. I need that for Etna, my generic wysiwyg XML editor. With this system, we could "learn" to the XML serializer how to generate good pretty printing serialization and specific serialization, for XUL, SVG, MATHML or any other XML language, even if the document contain mixed elements of each of this namespace. But it's an other history :-)
Comment 28 Henri Sivonen (:hsivonen) (Not reading bugmail or doing reviews until 2016-09-26) 2009-08-19 03:57:58 PDT
(In reply to comment #27)
> The XHTML serializer serializes XHTML elements as it should be, and for any
> other XML elements, it serializes as the XML serializer does.

OK. This suggests to me that both for WebKit compat and for making it harder for site developers to shoot themselves in the foot, new XMLSerializer().serializeToString(xmlDocument) and new XMLSerializer().serializeToString("application/xhtml+xml") should run the XHTML serializer.

Is there any value in making new XMLSerializer().serializeToString("application/xml") run the XML serializer as opposed to it, too, running the XHTML serializer?
Comment 29 Henri Sivonen (:hsivonen) (Not reading bugmail or doing reviews until 2016-09-26) 2009-08-28 02:25:20 PDT
Comment on attachment 386234 [details] [diff] [review]
Make XMLSerializer be a proper HTML serializer when used with HTML docs

This doesn't do the compatible thing with <meta> rewriting.
Comment 30 Henri Sivonen (:hsivonen) (Not reading bugmail or doing reviews until 2016-09-26) 2009-08-28 02:25:41 PDT
Comment on attachment 394252 [details] [diff] [review]
Add a constructor argument for forcing particular serializer

This doesn't do the compatible thing with <meta> rewriting.
Comment 31 Henri Sivonen (:hsivonen) (Not reading bugmail or doing reviews until 2016-09-26) 2009-08-28 04:24:26 PDT
Created attachment 397251 [details] [diff] [review]
Make the XML mode XHTML-friendly

This patch uses nsXHTMLContentSerializer instead of nsXMLContentSeriazer for the XML case for maximal author-friendliness and WebKit-compatibility in the case where the author assumes the XML path output is moderately text/html-safe.

I didn't expose the XHTML-unfriendly XML serializer at all, because I'll be on the hook for writing a spec for this having touched XMLSerializer last, and I can't think of a good reason to provide an explicitly less safe XML serialization mode.

I made both "text/html" and "html" (ASCII-case-insensitively) trigger the HTML mode for maximal DWIM.

Also, this patch suppresses the <meta> encoding declaration rewriting for backwards-compatibility and doesn't use an entity for &nbsp; in the XML case, since &nbsp; is not an XML pre-defined entity.

Note to self: Change IsHTML() to !IsCaseSensitive() on the 1.9.2 branch.
Comment 32 Laurent Jouanneau 2009-08-30 08:00:17 PDT
Comment on attachment 397251 [details] [diff] [review]
Make the XML mode XHTML-friendly


>+  if ((aMode == HTML) || ((aMode == AUTO) && doc && doc->IsHTML())) {
>+    encoder = do_CreateInstance(NS_DOC_ENCODER_CONTRACTID_BASE "text/html", &rv);
>+    NS_ENSURE_SUCCESS(rv, rv);
>+    // This method will fail if no document
>+    rv = encoder->Init(domDoc, NS_LITERAL_STRING("text/html"),
>+                       nsIDocumentEncoder::OutputEncodeBasicEntities | 
>+                       nsIDocumentEncoder::OutputDontRewriteEncodingDeclaration);
>+    NS_ENSURE_SUCCESS(rv, rv);
>+  } else {
>+    encoder = do_CreateInstance(NS_DOC_ENCODER_CONTRACTID_BASE "application/xhtml+xml", &rv);
>+    NS_ENSURE_SUCCESS(rv, rv);
>+    // This method will fail if no document
>+    rv = encoder->Init(domDoc, NS_LITERAL_STRING("application/xhtml+xml"),
>+                       nsIDocumentEncoder::OutputDontRewriteEncodingDeclaration);
>+    NS_ENSURE_SUCCESS(rv, rv);
>+  }


I think we should use the XML serializer for any XML contents except XHTML. Since the XHTML serializer has some few different behaviors (for example, it does wrap by default, but XML does not), we should keep the XML serializer so we keep maximum compatibilities with existing code. I think XMLSerializer is often used on any XML contents in XulRunner app, extensions... (at least, I use it in some of my projects).

So I don't see any reason to drop the support of the XML serializer.  And in this case, you could keep the OutputEncodeBasicEntities flag for the XHTML serializer (note this flag has no effect with the XML serializer).


>+nsDOMSerializer::Initialize(nsISupports* aOwner, JSContext *cx, JSObject *obj,
>+                            PRUint32 argc, jsval *argv)

I'm not familiar with manipulation of JS contexts. You should ask an other reviewer to review this part at least.

>+  nsDependentJSString str(jsstr);
>+  if (str.LowerCaseEqualsLiteral("text/html") || 
>+      str.LowerCaseEqualsLiteral("html")) {
>+    mMode = HTML;
>+  } else {
>+    mMode = XHTML;
>+  }

If the XML serializer is supported, you should then support "xhtml", or having only HTML and XML mode, and in HTML mode, detecting if this is an HTML or an XHTML document to choose the encoder.

Except this issues, the rest of the patch looks good for me.

I think you should add some tests with the XMLSerializer object, and HTML, XHTML, XML documents, and tests against the current bug. (tests I added for serializers are directly using the document encoder).
Comment 33 Henri Sivonen (:hsivonen) (Not reading bugmail or doing reviews until 2016-09-26) 2009-08-31 00:57:59 PDT
(In reply to comment #32)
> I think we should use the XML serializer for any XML contents except XHTML.
> Since the XHTML serializer has some few different behaviors (for example, it
> does wrap by default, but XML does not), we should keep the XML serializer so
> we keep maximum compatibilities with existing code.

Can the XHTML serializer be configured not wrap? What I'm looking for is an XML serializer that uses the <foo></foo> syntax for non-void http://www.w3.org/1999/xhtml elements even when they have no content.

> I think XMLSerializer is
> often used on any XML contents in XulRunner app, extensions... (at least, I use it in some of my projects).

Using XMLSerializer as the entry point?

> So I don't see any reason to drop the support of the XML serializer.

It adds complexity to the Web-exposed feature set. Why is it useful to have two XML serialization options: XHTML-safer and less XHTML-safe? Why not expose only the safer option when both are equally right for round-tripping using a real XML parser?

If a spec is written for this stuff and the spec has three different serialization code paths, how would you sell the XHTML-unsafe XML serialization code path to WebKit developers? Their XML serializer already exhibits the traits of Gecko's XHTML serializer. What would they gain by having a third path that can output <div xmlns="http://www.w3.org/1999/xhtml"/> as opposed to <div xmlns="http://www.w3.org/1999/xhtml"></div>?
Comment 34 Laurent Jouanneau 2009-09-01 03:53:21 PDT
Ok, I just readed the code again, and finally, I think there isn't issue if we use the XHTML serializer to serialize any XML content. We could set the OutputRaw flag to avoid wrapping.

However, I think we should add tests on the use of the XHTML serializer with non XHTML documents, to be sure it is ok.

A little mistake: you forgot to change the UUID of the nsIDocumentEncoder interface.
Comment 35 Henri Sivonen (:hsivonen) (Not reading bugmail or doing reviews until 2016-09-26) 2009-09-03 07:22:28 PDT
Created attachment 398378 [details] [diff] [review]
OutputRaw, uuid, test case
Comment 36 Henri Sivonen (:hsivonen) (Not reading bugmail or doing reviews until 2016-09-26) 2009-09-04 08:06:28 PDT
Created attachment 398664 [details] [diff] [review]
OutputRaw, uuid, test case v2

There were existing tests with non-XHTML stuff in content/test/unit/test_xml_serializer.js. Only XHTML elements were affected in those tests as expected.
Comment 37 Laurent Jouanneau 2009-09-04 08:32:14 PDT
Comment on attachment 398664 [details] [diff] [review]
OutputRaw, uuid, test case v2

ok for me.
Comment 38 Jonas Sicking (:sicking) No longer reading bugmail consistently 2009-09-23 15:10:00 PDT
If webkit does HTML serialization rather than XML serialization then I think it's really hard to argue that we should have webkit compat here.

We've had this API longer than webkit has, and we have more users, so it seems more likely that sites depend on our behavior than webkits.

If only one site is breaking over this, I think we should attempt to evangelize rather than sacrificing the XMLSerializer API on the shrine of website compatibility.

It would make a lot more sense to me to make sure that we have proper APIs for serializing to HTML/XML explicitly. Be that as .markupAsHTML/.markupAsXML or something else.
Comment 39 Henri Sivonen (:hsivonen) (Not reading bugmail or doing reviews until 2016-09-26) 2009-09-24 09:21:34 PDT
(In reply to comment #38)
> If webkit does HTML serialization rather than XML serialization then I think
> it's really hard to argue that we should have webkit compat here.

I think the situation is like innerHTML. It's called "HTML" but depends on the kind of document.

> We've had this API longer than webkit has, and we have more users, so it seems
> more likely that sites depend on our behavior than webkits.

Gecko's previously shipped behavior is that the output is neither round-trippable as HTML nor round-trippable as XML. I think that kind of behavior is way more broken than WebKit's.

AFAICT, no one has shipped a release with the current Gecko trunk behavior, so in the absence of further data, I think it's fair to assume that WebKit's behavior is more Web-compatible than Gecko's trunk behavior.

> If only one site is breaking over this, I think we should attempt to evangelize
> rather than sacrificing the XMLSerializer API on the shrine of website
> compatibility.

The new constructor argument is there so that the API is not sacrificed for people who really want to serialize an HTML-mode DOM as XML or an XML-mode DOM as HTML.

> It would make a lot more sense to me to make sure that we have proper APIs for
> serializing to HTML/XML explicitly. Be that as .markupAsHTML/.markupAsXML or
> something else.

I think we should regardless have WebKit's default behavior for XMLSerializer. .markupAsHTML/.markupAsXML on Element and Document would be nicer than the constructor argument but because the idea got Warnocked at comment 24, I went with the smaller API surface change that still addresses Jesse's use case.
Comment 40 Henri Sivonen (:hsivonen) (Not reading bugmail or doing reviews until 2016-09-26) 2009-09-24 09:31:52 PDT
(In reply to comment #39)
> .markupAsHTML/.markupAsXML on Element and Document would be nicer than the
> constructor argument but because the idea got Warnocked at comment 24, I went
> with the smaller API surface change that still addresses Jesse's use case.

Even if we decide that we want .markupAsHTML/.markupAsXML on Element and Document in the long term, I think it would still make sense to land this patch in time for the beta freeze, because it would be bad if the beta shipped in a MySpace-breaking state.

(The problem here is that WebKit demonstrates that HTML5 compliance as far as the HTML element namespace works on the Web if you implement all the pieces that WebKit implements, but no one knows if failing to implement this piece of the package breaks more than MySpace.)
Comment 41 Jonas Sicking (:sicking) No longer reading bugmail consistently 2009-09-24 10:53:46 PDT
But keeping things are they are is a smaller departure from previous gecko behavior than landing this patch.

We have had success with evangelizing sites before. And this is a site that's maintained so chances are pretty high that we can get them to change.

Until we see more sites breaking over this I don't think we should assume that this is a problem prevalent on the web.

I suggest we hoist this over the tech evangelism asap so that we at least give them a *chance* to try to deal with this. If it fails we always have patch that we can land later.
Comment 42 Henri Sivonen (:hsivonen) (Not reading bugmail or doing reviews until 2016-09-26) 2009-09-24 11:17:29 PDT
(In reply to comment #41)
> But keeping things are they are is a smaller departure from previous gecko
> behavior than landing this patch.

This depends on whether outputting the namespace URI that trips up MySpace is a smaller departure than outputting <img> without a slash.

Adding the namespace URI is a large departure in both the case where someone parses the result using a proper XML parser and in the case where someone tries to match a regexp over the output.

Not having the slash is a large departure when someone really tries to parse the output as XML but a small departure when the output is used as HTML source.
Comment 43 Henri Sivonen (:hsivonen) (Not reading bugmail or doing reviews until 2016-09-26) 2009-09-24 11:24:16 PDT
(In reply to comment #41)
> We have had success with evangelizing sites before. And this is a site that's
> maintained so chances are pretty high that we can get them to change.

What's the benefit that could be obtained by evangelism apart from avoiding using the name *XML*Serializer to serialize to HTML?

Is there any evidence that there's usage that would be broken by the patch but wouldn't be broken by the trunk behavior? At least we have a high-visibility single data point in the other direction.
Comment 44 Jonas Sicking (:sicking) No longer reading bugmail consistently 2009-09-24 15:18:23 PDT
Having "XMLSerializer" serialize to XML I think is no small benefit. No, there is no proof that it would cause confusion, but there's also no proof that it wouldn't.

The longer we debate if we should even attempt to evangelize, the later we'll start the effort to do so, and the less likely that is to succeed. So I say lets just give it a try, and if it fails we can reevaluate.
Comment 45 Jonas Sicking (:sicking) No longer reading bugmail consistently 2009-09-24 15:20:07 PDT
Ugh, sorry, i missunderstood part of your last comment.

The fact that we've only gotten reports from a single site out there breaking I actually think is pretty good indication that this is not a widespread problem. Sure, it's not conclusive proof, but we'll never get that one way or another.
Comment 46 Henri Sivonen (:hsivonen) (Not reading bugmail or doing reviews until 2016-09-26) 2009-09-25 01:23:21 PDT
(In reply to comment #44)
> Having "XMLSerializer" serialize to XML I think is no small benefit. No, there
> is no proof that it would cause confusion, but there's also no proof that it
> wouldn't.

I think logical naming is a smaller benefit than interop between Gecko and WebKit.

> The longer we debate if we should even attempt to evangelize, the later we'll
> start the effort to do so, and the less likely that is to succeed. So I say
> lets just give it a try, and if it fails we can reevaluate.

Even if site evang succeeded, we still wouldn't have interop and would need to convince WebKit to change to achieve interop. So the overall success of the evang path depends on convincing (at least) two external parties (Myspace and WebKit) while the patch would take us to interop (in the no-argument constructor case) without having to get buy-in from external parties.

However, I'll try pinging WebKit devs to see if they'd entertain the idea of changing WebKit if we managed to evangelize MySpace.
Comment 47 Henri Sivonen (:hsivonen) (Not reading bugmail or doing reviews until 2016-09-26) 2009-09-25 01:51:14 PDT
One more thing: If Gecko ships without interop with WebKit and WebKit didn't change soon, the situation would effective amount to handing the decision on where the interoperable platform converges to Opera, since they could create a two against one situation either way if they changed from their currently third position to interoperate with either Gecko or WebKit. (Or if there's no convergence, authors won't be able to use the API with HTML DOMs in a non-broken way and everyone's impl is effectively useless for HTML DOMs. )
Comment 48 Henri Sivonen (:hsivonen) (Not reading bugmail or doing reviews until 2016-09-26) 2009-09-25 03:38:53 PDT
OK, I pinged Maciej and Anne. Let's go for attempt at evang and if it succeeds, making both XML and HTML DOMs serialize to XML but with the HTML-friendly behavior for <div></div> instead of <div/>. See http://krijnhoetmer.nl/irc-logs/whatwg/20090925#l-481

I'll write a patch for the case that assumes evang success. I think we should put at least one of the patches in the beta instead of leaving things as they are even if we don't yet know at the time of the beta freeze whether evang has succeeded.

Jesse, did MySpace give any reportable feedback on the bug report mentioned in comment 17?
Comment 49 Henri Sivonen (:hsivonen) (Not reading bugmail or doing reviews until 2016-09-26) 2009-09-25 04:14:30 PDT
Created attachment 402813 [details] [diff] [review]
Keep output always as XML but make it HTML-friendlier as in WebKit's XML mode

Here's the patch that would get us to a situation WebKit and Opera could converge on, too, if evangelizing MySpace succeeds. (Will test this a big more before setting the review flags.)
Comment 50 Laurent Jouanneau 2009-09-25 05:31:51 PDT
--- a/content/base/test/Makefile.in	
+++ a/content/base/test/Makefile.in	
@@ -272,16 +272,17 @@ _TEST_FILES = 	test_bug5141.html \
 		file_xhtmlserializer_1_sibling_body_only_body.xhtml \
 		file_xhtmlserializer_1_no_body.xhtml \
 		test_bug422403-2.xhtml \
 		file_xhtmlserializer_2.xhtml \
 		file_xhtmlserializer_2_basic.xhtml \
 		file_xhtmlserializer_2_enthtml.xhtml \
 		file_xhtmlserializer_2_entw3c.xhtml \
 		file_xhtmlserializer_2_latin1.xhtml \
+		test_bug500937.html \
 		test_htmlcopyencoder.html \


Henri, I think the test_bug500937.html file is missing in your patch :-)
Comment 51 Henri Sivonen (:hsivonen) (Not reading bugmail or doing reviews until 2016-09-26) 2009-09-25 07:04:00 PDT
Created attachment 402828 [details] [diff] [review]
 Keep output always as XML but make it HTML-friendlier as in WebKit's XML mode, v2

Oops. Indeed the test file was missing.
Comment 52 Laurent Jouanneau 2009-09-25 07:25:51 PDT
Comment on attachment 402828 [details] [diff] [review]
 Keep output always as XML but make it HTML-friendlier as in WebKit's XML mode, v2

Ok for me. There is just two useless "\"  in two JS strings in your html test file : <\/script> .
Comment 53 Jesse Ruderman 2009-09-25 12:56:50 PDT
> Jesse, did MySpace give any reportable feedback on the bug report mentioned
> in comment 17?

No, MySpace only sent me a stream of generic messages, despite the form's apparent enthusiasm for bug reports.
Comment 54 Johnny Stenback (:jst, jst@mozilla.com) 2009-09-25 17:39:22 PDT
Comment on attachment 402828 [details] [diff] [review]
 Keep output always as XML but make it HTML-friendlier as in WebKit's XML mode, v2

sr=jst

Let's get this patch in for beta and I'll be working with our evangelism team to see what we can do to reach out to MySpace about this issue.
Comment 55 Henri Sivonen (:hsivonen) (Not reading bugmail or doing reviews until 2016-09-26) 2009-09-28 01:14:49 PDT
Thank you for the reviews. Fixed on trunk:
http://hg.mozilla.org/mozilla-central/rev/e915fafc9ba5

I'll wait for it to bake on the tinderboxen before pushing to branch.
Comment 56 Henri Sivonen (:hsivonen) (Not reading bugmail or doing reviews until 2016-09-26) 2009-09-28 05:32:30 PDT
Fixed on branch as well:
http://hg.mozilla.org/releases/mozilla-1.9.2/rev/8bcc1a265ced

Note You need to log in before you can comment on or make changes to this bug.