Closed
Bug 247024
Opened 21 years ago
Closed 13 years ago
Encoding for text/xml over HTTP defaults to UTF-8
Categories
(Core :: XML, defect)
Tracking
()
RESOLVED
WONTFIX
People
(Reporter: bugzilla.mozilla.org-3, Assigned: hjtoi-bugzilla)
References
()
Details
(Keywords: helpwanted, testcase)
Attachments
(2 files, 6 obsolete files)
4.84 KB,
patch
|
Biesinger
:
review+
Biesinger
:
superreview+
damons
:
approval1.9-
|
Details | Diff | Splinter Review |
3.11 KB,
patch
|
bzbarsky
:
review+
bzbarsky
:
superreview+
|
Details | Diff | Splinter Review |
According to RFC 3023, XML documents served over HTTP with Content-Type:
text/xml and no charset= part, should be treated as US-ASCII. Mozilla appears to
treat it as UTF-8.
Steps to reproduce:
1. Open an XML document encoded in UTF-8 from HTTP with Content-Type: text/xml
with no charset= part. The document should contain at least one non-US-ASCII
character, e.g. é (e accent), i.e. the document is invalid.
Actual results:
The non-US-ASCII characters are displayed.
Expected results:
The non-US-ASCII characters should be displayed as ? (or generate a fatal error
when bug 174351 is fixed).
I see this using Firefox 0.9 on Win XP.
Reporter | ||
Comment 1•21 years ago
|
||
Test case: http://serber.chsc.dk/misc/encoding.php
Keywords: testcase
Comment 2•21 years ago
|
||
Same error under linux (Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8a2)
Gecko/20040615 Firefox/0.8.0+)
![]() |
||
Comment 3•21 years ago
|
||
Note that this RFC basically flatly contradicts the XML specification.
Unfortunately, real-life pages rely on the "unlabeled XML means UTF8"
stipulations of the XML specification...
Comment 4•21 years ago
|
||
No, XML (both 1.0 and 1.1) defers to RFC3023 for text/xml's encoding:
http://www.w3.org/TR/REC-xml/#sec-guessing-with-ext-info
http://www.w3.org/TR/xml11/#sec-guessing-with-ext-info
![]() |
||
Comment 5•21 years ago
|
||
Ah, OK. Mental note -- a prerequisite to doing this is to stop using text/xml
almost everywhere in Gecko. See
http://lxr.mozilla.org/seamonkey/search?string=text/xml and
http://lxr.mozilla.org/seamonkey/search?string=text_xml -- most of those should
use application/xml.
![]() |
||
Comment 6•21 years ago
|
||
The basic change to nsXMLDocument should be pretty simple. There's some tedious
(but not hard) work with all those text/xml users in the tree...
Keywords: helpwanted
Whiteboard: [good first bug]
Comment 7•20 years ago
|
||
Should nsMimeTypes define application/xml?
I'm looking into fixing a part of this bug although I'm not sure if I know a
proper solution for all legacy code. Perhaps I should just make a start and ask
for review.
![]() |
||
Comment 8•20 years ago
|
||
Yeah, defining it in nsMimeTypes is a good idea.
Feel free to ask me questions if you run into issues!
Comment 9•20 years ago
|
||
This adds APPLICATION_XML and replaces existing usage of TEXT_XML.
Comment 10•20 years ago
|
||
Attachment #174601 -
Attachment is obsolete: true
Comment 11•20 years ago
|
||
Updated•20 years ago
|
Attachment #175434 -
Flags: review?(bzbarsky)
![]() |
||
Comment 12•20 years ago
|
||
Comment on attachment 175434 [details] [diff] [review]
patch #a - fixes some internal usage
>Index: mozilla/extensions/sroaming/resources/content/transfer/conflictCheck.js
>-const kListingMimetype = "text/xml";
>+const kListingMimetype = "application/xml";
I can't tell whether this part is right; break it out into a separate patch,
and get review from someone who knows this code.
>Index: mozilla/extensions/transformiix/source/xslt/txMozillaXMLOutput.cpp
>Index: mozilla/extensions/transformiix/source/xslt/txOutputFormat.cpp
Same here.
>Index: mozilla/extensions/transformiix/source/xslt/txStandaloneXSLTProcessor.cpp
>- * or type="text/xml" the href pseudo attribute value will be appended to
>+ * or type="application/xml" the href pseudo attribute value will be appended to
...
>- type.EqualsLiteral("text/xml")) {
>+ type.EqualsLiteral("application/xml")) {
This is almost certainly wrong, but again please get review from an XSLT peer.
The rest looks ok.
Attachment #175434 -
Flags: review?(bzbarsky) → review-
Comment 13•20 years ago
|
||
I removed those files and and added some files I missed in the first patch.
Attachment #175434 -
Attachment is obsolete: true
Attachment #175540 -
Flags: review?(bzbarsky)
Comment 14•20 years ago
|
||
Address comment 12 for patch #a.
Attachment #175546 -
Flags: review?(bzbarsky)
![]() |
||
Updated•20 years ago
|
Attachment #175540 -
Flags: review?(bzbarsky) → review-
![]() |
||
Updated•20 years ago
|
Attachment #175546 -
Flags: review?(bzbarsky) → review+
Comment 15•20 years ago
|
||
Marking checked in or obsolete patches obsolete. This patch is an update to
patch #b to exclude all files from #a.1, since #a.1 has been checked in.
Attachment #175540 -
Attachment is obsolete: true
Attachment #175546 -
Attachment is obsolete: true
Attachment #175771 -
Flags: superreview?(bzbarsky)
Attachment #175771 -
Flags: review?(bzbarsky)
![]() |
||
Comment 16•20 years ago
|
||
Comment on attachment 175771 [details] [diff] [review]
patch #b.1
>Index: extensions/xmlextras/base/src/nsXMLHttpRequest.cpp
> // If no content type header was set by the client, we set it to
>- // text/xml.
>+ // application/xml.
This needs review from someone who knows that code and whether servers handle
this.
r+sr=bzbarsky on the rest. I won't be able to check this in till at least
Sunday, so find someone else to do it (and don't forget to remove the
XMLHttpRequest part!).
Attachment #175771 -
Flags: superreview?(bzbarsky)
Attachment #175771 -
Flags: superreview+
Attachment #175771 -
Flags: review?(bzbarsky)
Attachment #175771 -
Flags: review+
Comment 17•20 years ago
|
||
Without that file + a fresh |cvs diff|.
Updated•20 years ago
|
Attachment #175771 -
Attachment is obsolete: true
Attachment #177295 -
Flags: superreview+
Attachment #177295 -
Flags: review+
Comment 18•20 years ago
|
||
I was wondering about the other text/xml occurances:
<http://lxr.mozilla.org/seamonkey/search?string=text%2Fxml>
... most have been fixed but I'm not sure about what should happen to the last
few...
(Marking patch #b.2 obsolete as it has been checked in.)
Attachment #177295 -
Attachment is obsolete: true
Attachment #180034 -
Flags: review?(bzbarsky)
![]() |
||
Updated•20 years ago
|
Attachment #180034 -
Flags: superreview+
Attachment #180034 -
Flags: review?(bzbarsky)
Attachment #180034 -
Flags: review+
Comment 19•17 years ago
|
||
Comment on attachment 174604 [details] [diff] [review]
patch #2 - untested
I believe that this patch would be good to have. It fixes a problem reported recently on the Ltru mailing list: when uploading a local .xml file containing UTF-8 to the W3C validator we send it with the content-type "text/xml" and the validator flags any non-US-ASCII codepoints as errors.
>Index: mozilla/uriloader/exthandler/nsExternalHelperAppService.cpp
>@@ -442,16 +442,17 @@ static nsExtraMimeTypeEntry extraMimeEnt
>+ { APPLICATION_XML, "xml,xsl,xbl", "Extensible Markup Language", MAC_TYPE('TEXT'), MAC_TYPE('ttxt') },
> { TEXT_XML, "xml,xsl,xbl", "Extensible Markup Language", MAC_TYPE('TEXT'), MAC_TYPE('ttxt') },
I think this is an error: the new line should replace the following line, no?
Attachment #174604 -
Flags: superreview?(cbiesinger)
Attachment #174604 -
Flags: review?(cbiesinger)
Updated•17 years ago
|
Attachment #174604 -
Flags: superreview?(cbiesinger)
Attachment #174604 -
Flags: superreview+
Attachment #174604 -
Flags: review?(cbiesinger)
Attachment #174604 -
Flags: review+
Updated•17 years ago
|
Attachment #174604 -
Flags: approval1.9?
Comment 20•17 years ago
|
||
Conserned about how old this patch is .. can you confirm what it fixes and what this may break. Are there any automated tests?
Comment 21•17 years ago
|
||
Comment on attachment 174604 [details] [diff] [review]
patch #2 - untested
a-'ing this until schrep's comments are addressed.
Attachment #174604 -
Flags: approval1.9? → approval1.9-
Comment 22•17 years ago
|
||
Changing .xml to map to application/xml as opposed to text/xml would make things like content type on form-based file upload better in the sense that text/xml is harmful due to the US-ASCII default per RFC.
However, the US-ASCII default for text/xml in the RFC does not really help anyone in *practice*. In my opinion, it would be more productive to get the RFC repealed than to make Gecko display the REPLACEMENT CHARACTER on non-ASCII.
Updated•16 years ago
|
QA Contact: ashshbhatt → xml
Comment 23•14 years ago
|
||
WONTFIX?
Comment 24•14 years ago
|
||
I think we should WONTFIX and willfully violate RFC 3023. RFC 3023 is an excellent example of the IETF appreciating the internal consistency of the RFC canon over what's useful to Web authors of web software implementors. It makes no practical sense to make ancient email-motivated restrictions that predate widespread UTF-8 deployment on text/* apply to HTTP payloads. It would make more sense for the IETF to repeal the ancient restrictions on text/*.
Comment 25•14 years ago
|
||
CSS, HTML, and various file formats in the HTML spec all already violate the text/*-is-ASCII rule, FWIW.
Updated•13 years ago
|
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → WONTFIX
Whiteboard: [good first bug]
You need to log in
before you can comment on or make changes to this bug.
Description
•