nsHTMLCopyEncoder should produce text/html for XHTML




nsHTMLCopyEncoder produces text/html encodings only for HTML documents, but it should do so for XHTML too.  nsHTMLCopyEncoder::SetSelection checks that the document implements nsIHTMLDocument and that IsHTML() returns true.  If either check fails (and IsHTML() is false for XHTML), it treats the document as a text widget and doesn't produce text/html:

> // also consider ourselves in a text widget if we can't find an html document
> nsCOMPtr<nsIHTMLDocument> htmlDoc = do_QueryInterface(mDocument);
> if (!(htmlDoc && mDocument->IsHTML())) {
>   mIsTextWidget = true;

The IsHTML term seems to have been introduced in bug 270145; see bz's bug 270145 comment 5.  (IsCaseSensitive apparently used to serve the purpose that IsHTML serves now; bug 487023 made the change.)  I tried removing that term and then copying and pasting from an XHTML document.  That produced text/html, but when a CDATA section was present, its closing "]]>" was visible in the resulting paste.  I don't know if there are other problems.
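
To make that experiment concrete, here is a minimal standalone sketch of the decision in SetSelection, with and without the IsHTML term.  It is not the Gecko code: DocInfo, IsTextWidgetCurrent, and IsTextWidgetWithoutIsHTML are hypothetical names made up for illustration, and the sketch assumes that an XHTML document still QIs to nsIHTMLDocument while IsHTML() returns false.

```cpp
#include <cassert>

// Hypothetical stand-in for the two facts the quoted condition looks at.
struct DocInfo {
  bool implementsHTMLDocument;  // would do_QueryInterface to nsIHTMLDocument succeed?
  bool isHTML;                  // what would mDocument->IsHTML() return?
};

// Mirrors the quoted condition: !(htmlDoc && mDocument->IsHTML()).
bool IsTextWidgetCurrent(const DocInfo& aDoc) {
  return !(aDoc.implementsHTMLDocument && aDoc.isHTML);
}

// The same check with the IsHTML term dropped, as in the experiment.
bool IsTextWidgetWithoutIsHTML(const DocInfo& aDoc) {
  return !aDoc.implementsHTMLDocument;
}
```

Under that assumption, an XHTML document (DocInfo{true, false}) is treated as a text widget by the current check but not by the relaxed one.  HTML documents and documents that are not nsIHTMLDocument at all come out the same either way.  The sketch says nothing about whether the resulting serialization is correct, e.g. the stray "]]>".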

Spun out from bug 723163.  See Henri's comments there about how this is the wrong behavior: bug 723163 comment 13, bug 723163 comment 14, and bug 723163 comment 34.

Of course the problem is that some XHTML documents can't be serialized as text/html...

Are you saying the prospect of nsHTMLCopyEncoder producing text/html for XHTML is doomed from the get-go?  Henri says, "In most normal cases, the text/html serializer can convert XHTML to HTML just fine."  I'm new to how this works -- is there some other class that could (even lossily) translate XHTML to HTML, or could put application/xhtml+xml on the clipboard?

I think what Henri says doesn't contradict what I said...  We can certainly produce text/html, and for most actual uses of XHTML it would work.  In some cases it would produce something quite different from what we started with.  Not necessarily even "lossy", but "different".
I guess the real question is whether we care about those cases.  We may not.

Ah, bug 703514 comment 2, but apparently it broke some about:memory tests (bug 703514 comment 13).  I still think we should drop that IsHTML() term :-)
