Closed Bug 247024 Opened 20 years ago Closed 12 years ago

Encoding for text/xml over HTTP defaults to UTF-8

Categories

(Core :: XML, defect)

x86
Windows XP
defect
Not set
normal

Tracking

()

RESOLVED WONTFIX

People

(Reporter: bugzilla.mozilla.org-3, Assigned: hjtoi-bugzilla)

References

()

Details

(Keywords: helpwanted, testcase)

Attachments

(2 files, 6 obsolete files)

According to RFC 3023, XML documents served over HTTP with Content-Type:
text/xml and no charset= part, should be treated as US-ASCII. Mozilla appears to
treat it as UTF-8.

Steps to reproduce:
1. Open an XML document encoded in UTF-8 from HTTP with Content-Type: text/xml
with no charset= part. The document should contain at least one non-US-ASCII
character, e.g. é (e accent), i.e. the document is invalid.

Actual results:
The non-US-ASCII characters are displayed.

Expected results:
The non-US-ASCII characters should be displayed as ? (or generate a fatal error
when bug 174351 is fixed).

I see this using Firefox 0.9 on Win XP.
Same error under linux (Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8a2)
Gecko/20040615 Firefox/0.8.0+)
Note that this RFC basically flatly contradicts the XML specification. 
Unfortunately, real-life pages rely on the "unlabeled XML means UTF8"
stipulations of the XML specification...
Ah, OK.  Mental note -- a prerequisite to doing this is to stop using text/xml
almost everywhere in Gecko.  See
http://lxr.mozilla.org/seamonkey/search?string=text/xml and
http://lxr.mozilla.org/seamonkey/search?string=text_xml -- most of those should
use application/xml.
The basic change to nsXMLDocument should be pretty simple.  There's some tedious
(but not hard) work with all those text/xml users in the tree...
Keywords: helpwanted
Whiteboard: [good first bug]
Should nsMimeTypes define application/xml?

I'm looking into fixing a part of this bug although I'm not sure if I know a
proper solution for all legacy code. Perhaps I should just make a start and ask
for review.
Yeah, defining it in nsMimeTypes is a good idea.

Feel free to ask me questions if you run into issues!
Attached patch patch - untested (obsolete) — Splinter Review
This adds APPLICATION_XML and replaces existing usage of TEXT_XML.
Attachment #174601 - Attachment is obsolete: true
Attachment #175434 - Flags: review?(bzbarsky)
Comment on attachment 175434 [details] [diff] [review]
patch #a - fixes some internal usage

>Index: mozilla/extensions/sroaming/resources/content/transfer/conflictCheck.js
>-const kListingMimetype = "text/xml";
>+const kListingMimetype = "application/xml";

I can't tell whether this part is right; break it out into a separate patch,
and get review from someone who knows this code.

>Index: mozilla/extensions/transformiix/source/xslt/txMozillaXMLOutput.cpp
>Index: mozilla/extensions/transformiix/source/xslt/txOutputFormat.cpp

Same here.

>Index: mozilla/extensions/transformiix/source/xslt/txStandaloneXSLTProcessor.cpp
>- * or type="text/xml" the href pseudo attribute value will be appended to
>+ * or type="application/xml" the href pseudo attribute value will be appended to
...
>-                    type.EqualsLiteral("text/xml")) {
>+                    type.EqualsLiteral("application/xml")) {

This is almost certainly wrong, but again please get review from an XSLT peer.

The rest looks ok.
Attachment #175434 - Flags: review?(bzbarsky) → review-
Attached patch patch #b (obsolete) — Splinter Review
I removed those files and and added some files I missed in the first patch.
Attachment #175434 - Attachment is obsolete: true
Attachment #175540 - Flags: review?(bzbarsky)
Attached patch patch #a.1 (obsolete) — Splinter Review
Address comment 12 for patch #a.
Attachment #175546 - Flags: review?(bzbarsky)
Attachment #175540 - Flags: review?(bzbarsky) → review-
Attachment #175546 - Flags: review?(bzbarsky) → review+
Attached patch patch #b.1 (obsolete) — Splinter Review
Marking checked in or obsolete patches obsolete. This patch is an update to
patch #b to exclude all files from #a.1, since #a.1 has been checked in.
Attachment #175540 - Attachment is obsolete: true
Attachment #175546 - Attachment is obsolete: true
Attachment #175771 - Flags: superreview?(bzbarsky)
Attachment #175771 - Flags: review?(bzbarsky)
Comment on attachment 175771 [details] [diff] [review]
patch #b.1

>Index: extensions/xmlextras/base/src/nsXMLHttpRequest.cpp
>       // If no content type header was set by the client, we set it to
>-      // text/xml.
>+      // application/xml.

This needs review from someone who knows that code and whether servers handle
this.

r+sr=bzbarsky on the rest.  I won't be able to check this in till at least
Sunday, so find someone else to do it (and don't forget to remove the
XMLHttpRequest part!).
Attachment #175771 - Flags: superreview?(bzbarsky)
Attachment #175771 - Flags: superreview+
Attachment #175771 - Flags: review?(bzbarsky)
Attachment #175771 - Flags: review+
Attached patch patch #b.2 (obsolete) — Splinter Review
Without that file + a fresh |cvs diff|.
Attachment #175771 - Attachment is obsolete: true
Attachment #177295 - Flags: superreview+
Attachment #177295 - Flags: review+
Depends on: 286073
Depends on: 286074
Depends on: 286090
Depends on: 286198
Depends on: 282698
Attached patch patch #cSplinter Review
I was wondering about the other text/xml occurances:
 <http://lxr.mozilla.org/seamonkey/search?string=text%2Fxml>
... most have been fixed but I'm not sure about what should happen to the last
few...

(Marking patch #b.2 obsolete as it has been checked in.)
Attachment #177295 - Attachment is obsolete: true
Attachment #180034 - Flags: review?(bzbarsky)
Attachment #180034 - Flags: superreview+
Attachment #180034 - Flags: review?(bzbarsky)
Attachment #180034 - Flags: review+
Comment on attachment 174604 [details] [diff] [review]
patch #2 - untested

I believe that this patch would be good to have. It fixes a problem reported recently on the Ltru mailing list: when uploading a local .xml file containing UTF-8 to the W3C validator we send it with the content-type "text/xml" and the validator flags any non-US-ASCII codepoints as errors.

>Index: mozilla/uriloader/exthandler/nsExternalHelperAppService.cpp

>@@ -442,16 +442,17 @@ static nsExtraMimeTypeEntry extraMimeEnt

>+  { APPLICATION_XML, "xml,xsl,xbl", "Extensible Markup Language", MAC_TYPE('TEXT'), MAC_TYPE('ttxt') },
>   { TEXT_XML, "xml,xsl,xbl", "Extensible Markup Language", MAC_TYPE('TEXT'), MAC_TYPE('ttxt') },

I think this is an error: the new line should replace the following line, no?
Attachment #174604 - Flags: superreview?(cbiesinger)
Attachment #174604 - Flags: review?(cbiesinger)
Attachment #174604 - Flags: superreview?(cbiesinger)
Attachment #174604 - Flags: superreview+
Attachment #174604 - Flags: review?(cbiesinger)
Attachment #174604 - Flags: review+
Attachment #174604 - Flags: approval1.9?
Conserned about how old this patch is .. can you confirm what it fixes and what this may break.  Are there any automated tests?
Comment on attachment 174604 [details] [diff] [review]
patch #2 - untested

a-'ing this until schrep's comments are addressed.
Attachment #174604 - Flags: approval1.9? → approval1.9-
Changing .xml to map to application/xml as opposed to text/xml would make things like content type on form-based file upload better in the sense that text/xml is harmful due to the US-ASCII default per RFC.

However, the US-ASCII default for text/xml in the RFC does not really help anyone in *practice*. In my opinion, it would be more productive to get the RFC repealed than to make Gecko display the REPLACEMENT CHARACTER on non-ASCII.
QA Contact: ashshbhatt → xml
I think we should WONTFIX and willfully violate RFC 3023. RFC 3023 is an excellent example of the IETF appreciating the internal consistency of the RFC canon over what's useful to Web authors of web software implementors. It makes no practical sense to make ancient email-motivated restrictions that predate widespread UTF-8 deployment on text/* apply to HTTP payloads. It would make more sense for the IETF to repeal the ancient restrictions on text/*.
CSS, HTML, and various file formats in the HTML spec all already violate the text/*-is-ASCII rule, FWIW.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → WONTFIX
Whiteboard: [good first bug]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: