Closed Bug 431701 Opened 16 years ago Closed 16 years ago

set characterSet of XML DOM document created with createDocument() to UTF-8 and not ISO-8859-1

Categories

(Core :: DOM: Core & HTML, defect, P1)

x86
Windows XP
defect

Tracking

()

VERIFIED FIXED
mozilla1.9.1b1

People

(Reporter: martin.honnen, Assigned: bzbarsky)

References

()

Details

(Keywords: verified1.9.0.4, verified1.9.1)

Attachments

(4 files, 1 obsolete file)

Firefox (both the 2.0 release and a 3.0 nightly) sets the characterSet property of an XML DOM document created with document.implementation.createDocument() to ISO-8859-1.

Firefox 3.0 uses that characterSet property of the document to encode it when it is passed to the send method of an XMLHttpRequest object and sets the request header Content-Type: application/xml; charset=ISO-8859-1. So with Firefox 2.0 an XML DOM document created with createDocument() is UTF-8 encoded while with Firefox 3.0 it is ISO-8859-1 encoded.
 
That leads to two problems: 
1) server-side code expecting a UTF-8 encoded document is not able to parse the XML sent unless the server-side code is changed to read and use the charset parameter
2) there is unnecessary bloat in the data sent as any characters outside of ISO-8859-1 are now escaped as numeric character references.
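To make problem 2 concrete, here is a rough sketch that compares the size of the same text under both encodings. The helper names are made up for illustration, and it assumes the serializer writes decimal numeric character references; the exact form Gecko emits may differ.

```javascript
// Hypothetical helpers, not Gecko code: compare the size of one string
// under the two encodings discussed in this bug.

// Bytes needed to send the text as UTF-8.
function utf8ByteLength(text) {
  return new TextEncoder().encode(text).length;
}

// Bytes needed to send the text as ISO-8859-1, assuming every character
// above U+00FF is written as a decimal reference such as &#8364;.
function latin1ByteLength(text) {
  let bytes = 0;
  for (const ch of text) {
    const codePoint = ch.codePointAt(0);
    bytes += codePoint <= 0xff ? 1 : `&#${codePoint};`.length;
  }
  return bytes;
}

const sample = "price: 5 €";
console.log(utf8ByteLength(sample), latin1ByteLength(sample));
```

Characters that ISO-8859-1 can represent directly (such as umlauts) are smaller in ISO-8859-1; the bloat only appears for characters outside that range.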

Therefore this bug is filed to suggest changing the characterSet property of documents created with createDocument() to UTF-8 instead of ISO-8859-1.
Attached file test case
Simple test case that uses document.implementation.createDocument() to create an XML DOM document and outputs its characterSet, inputEncoding and xmlEncoding properties.
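For readers without access to the attachment, a check of this kind might look roughly like the following. This is a sketch, not the attached file; the helper name describeEncoding and the root element name are assumptions.

```javascript
// Sketch of the kind of check the test case performs; the helper name is
// made up for illustration.
function describeEncoding(doc) {
  // Properties a given browser does not implement show up as "undefined".
  return (
    "characterSet: " + doc.characterSet +
    "; inputEncoding: " + doc.inputEncoding +
    "; xmlEncoding: " + doc.xmlEncoding
  );
}

// Only runs in a browser, where a DOM is available.
if (typeof document !== "undefined") {
  const xmlDoc = document.implementation.createDocument(null, "root", null);
  console.log(describeEncoding(xmlDoc));
}
```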
Note that I have set intl.charset.default to UTF-8, and still get ISO-8859-1, so the reason is not what I expected it to be.
The reason is right here:
  http://bonsai.mozilla.org/cvsblame.cgi?file=mozilla/content/base/public/nsIDocument.h&rev=3.299&mark=119#117

I think we should just make createDocument() call SetDocumentCharacterSet("UTF-8").

Not sure how I feel about trying to do this for Gecko 1.9.  We _are_ breaking existing content with our current behavior, though...

If we think this change is safe enough to take, it's a trivial patch.  The question is whether we think it's safe.
What are the risks? I.e. what code picks up the characterSet and uses it for anything?
Whatever uses "GetDocumentCharacterSet(", "GetCharacterSet(" and ".characterSet"):

http://lxr.mozilla.org/seamonkey/search?string=GetDocumentCharacterSet%28

http://lxr.mozilla.org/seamonkey/search?string=GetCharacterSet%28

http://lxr.mozilla.org/seamonkey/search?string=.characterSet

I suspect that most of these are touching documents that are in a window, so not relevant here.  There's URI creation, etc, but those are not really that relevant for data documents.  So I _think_ it should be safe, but I won't have time to look through those lists for a few days.
(In reply to comment #0)
> 
> 1) server-side code expecting an UTF-8 encoded document is not able to parse
> the XML sent unless the server-side code is changed to read and use the charset
> parameter
>

This was the scenario I ran into. A workaround for developers is to do something like:
https://bugzilla.mozilla.org/show_bug.cgi?id=407213#c8
Blocks: 407213
Assignee: nobody → jonas
Flags: wanted1.9.0.x+
This was brought up in the newsgroups again. I wonder if this is bad enough to be a blocker.

Definitely something to look out for in RC1 feedback.
Priority: -- → P1
I assume there was in fact not much on this in RC1 feedback?
So realistically, this bug means that we've changed behavior for sending documents over XMLHttpRequest from Firefox 2.  I'm not sure why neither this bug nor bug 407213 got nominated for blocking 1.9; they probably should have been.  :(

We should definitely make sure bug 407213 is fixed ASAP, whether by making this change or some other means.
Flags: blocking1.9.1?
Flags: blocking1.9.0.1?
The other possibility is that createDocument() could use the encoding of the document that the DOMImplementation came from.  Not sure which is better.

But I should note that the XMLHttpRequest spec does require us to send documents with an inputEncoding of null as UTF-8, and the DOM spec says documents created in memory should have an inputEncoding of null.

So if we really wanted to, we could change nothing in terms of our documentCharacterSet, but simply flag documents created using createDocument as such.
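The rule described here can be sketched as a small decision function. This is an illustration only, not the Gecko implementation; the function names are hypothetical, and it assumes that a non-null inputEncoding is used as-is for the request.

```javascript
// Hypothetical illustration of the rule: documents with no input encoding
// (such as those created in memory) are sent as UTF-8.
function chooseXhrEncoding(doc) {
  // The loose comparison treats both null and undefined as "no input encoding".
  return doc.inputEncoding == null ? "UTF-8" : doc.inputEncoding;
}

// Content-Type header value that would accompany the serialized document.
function contentTypeFor(doc) {
  return "application/xml; charset=" + chooseXhrEncoding(doc);
}

console.log(contentTypeFor({ inputEncoding: null }));
```

With this approach, characterSet can stay whatever it is; only the flag "this document has no input encoding" decides the wire encoding.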
Flags: blocking1.9.1?
Flags: blocking1.9.1+
Flags: blocking1.9.0.1?
Flags: blocking1.9.0.1-
Progress on this, Jonas?
This seems to be breaking a fair number of sites; we should really fix it.
Flags: blocking1.9.0.2?
Comment on attachment 329781 [details] [diff] [review]
Patch to do the input encoding thing if we decide that's safer

Let's do this at least. We could probably also set the default encoding to UTF-8 at least on moz-central.
Attachment #329781 - Flags: superreview+
Attachment #329781 - Flags: review+
I landed attachment 329781 [details] [diff] [review] on trunk.  I'll work on getting up a patch for default encoding, and some tests.
Boris, are you going to have time to make a patch (and tests) for this before tomorrow? I don't think this blocks but we can revisit for 1.9.0.3.

Also, since the patch landed, can we call this FIXED?
Flags: in-testsuite?
Flags: blocking1.9.0.3?
Flags: blocking1.9.0.2?
Flags: blocking1.9.0.2-
(In reply to comment #16)
> Boris, are you going to have time to make a patch (and tests) for this before
> tomorrow?

Um... since I was out of town with no net connection (announced and all, too), clearly not.

> I don't think this blocks

It's a regression that's breaking sites.  We need to fix this on branch.  The only reason I haven't posted a branch fix yet is because I haven't had time to write the tests.

> Also, since the patch landed, can we call this FIXED?

No, since it's not fixed.  We've just landed a workaround branch-safe patch to get trunk baking; we still want to fix this bug as filed.
Hello,

A lot of our customers (mainly in Germany, because of umlauts) cannot update to Firefox 3 because of this problem (Ajax application).

Could you give a time frame for a solution of this problem?

Thank you

(In reply to comment #18)
> caused by this problem. (Ajax Application)

While we also ran into this with our AJAX-based Email client (http://www.scalix.com) we were able to change our application to explicitly set the correct character set. So while an official browser-side solution is obviously still preferable, you should also get in touch with your developers or application vendors and see if they want to address this. We've been supporting FF3 properly since about 3.0b4 came out.
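One application-side approach (not necessarily the one used above) is to serialize the document yourself and send the resulting string, since the XMLHttpRequest specification says string bodies are sent as UTF-8. The sketch below assumes that behavior; sendXmlAsUtf8 is a made-up helper name, and whether the browser honors the explicit header may vary by version.

```javascript
// Hypothetical helper: serialize the document ourselves and send the
// string, rather than passing the Document object to send().
function sendXmlAsUtf8(xhr, xmlDoc, serializer) {
  const xmlText = serializer.serializeToString(xmlDoc);
  // Declare the encoding explicitly so the server does not have to guess.
  xhr.setRequestHeader("Content-Type", "application/xml; charset=UTF-8");
  xhr.send(xmlText);
  return xmlText;
}

// Browser usage (the URL is a placeholder):
//   const xhr = new XMLHttpRequest();
//   xhr.open("POST", "/save");
//   sendXmlAsUtf8(xhr, xmlDoc, new XMLSerializer());
```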
If all goes to plan this will be fixed in Firefox 3.0.2 and Firefox 3.1.  You can help right now by downloading a current trunk nightly from http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/latest-trunk/ and seeing whether the problem is fixed there (it should be).
Thanks Boris.

I tried latest trunk. I'm afraid the problem isn't fixed there.

Unfortunately we cannot explicitly set the char set, because our application is based on a precompiled framework.
Silke: does the test case attached to this bug work for you?  If so, can you attach a test case that does not?
Our result for the test case is:

characterSet: ISO-8859-1; inputEncoding: null; xmlEncoding: null

So in my opinion the inputEncoding should be ISO-8859-1 too?
Uh... The whole point of the change made so far is that inputEncoding is null for that document (where it used to claim to be ISO-8859-1) and therefore XHR sends the document as UTF-8.

So maybe you should clearly describe what "the problem" is in your case.  In a separate bug, since it doesn't sound like what you're seeing is this bug.
OK, I finally got a test written for bug 407213 and discovered that the original patch here is wrong.  Updated patch, with the UTF-8 thing as well, and with updated tests coming up.
Assignee: jonas → bzbarsky
Status: NEW → ASSIGNED
Attachment #334715 - Flags: superreview?(jonas)
Attachment #334715 - Flags: review?(jonas)
Attached patch Branch version
Attachment #334715 - Flags: superreview?(jonas)
Attachment #334715 - Flags: superreview+
Attachment #334715 - Flags: review?(jonas)
Attachment #334715 - Flags: review+
Pushed changeset 336c686c17aa.
Status: ASSIGNED → RESOLVED
Closed: 16 years ago
Flags: in-testsuite? → in-testsuite+
Resolution: --- → FIXED
Comment on attachment 334719 [details] [diff] [review]
Branch version

Looking for branch approval here.  This fixes various sites that blindly parse XHR data as UTF-8 without looking at the HTTP headers.
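For sites that want to stop parsing blindly, the server-side change amounts to reading the charset parameter before decoding. A minimal sketch, assuming the raw request body is available as bytes; the function names are hypothetical and the header parsing is deliberately lenient rather than a full HTTP parser.

```javascript
// Hypothetical helper: extract the charset parameter from a Content-Type
// header value, falling back to UTF-8 when none is given.
function charsetFromContentType(contentType, fallback = "utf-8") {
  const match = /;\s*charset\s*=\s*"?([^";\s]+)"?/i.exec(contentType || "");
  return match ? match[1].toLowerCase() : fallback;
}

// Decode the request body according to the declared charset.
function decodeBody(bytes, contentType) {
  const charset = charsetFromContentType(contentType);
  if (charset === "iso-8859-1") {
    // ISO-8859-1 maps each byte directly to the code point of the same value.
    return Array.from(bytes, (b) => String.fromCharCode(b)).join("");
  }
  return new TextDecoder(charset).decode(bytes);
}

console.log(charsetFromContentType("application/xml; charset=ISO-8859-1"));
```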
Attachment #334719 - Flags: approval1.9.0.3?
Well if this helps anyone who uses PHP, this seems to work right now:

header('Content-type: text/xml; charset=ISO-8859-1');

on XML files without a proper encoding header like this:

<?xml version="1.0" encoding="ISO-8859-1"?>

I didn't get anywhere with UTF-8, though.  Nor with JavaScript's
setRequestHeader() as a fast and easy workaround like the above.  Why doesn't that work?  Anyway, this might be a fast fix if you use PHP and can't edit the XML files.

best of luck.
Firefox 3.0.2 still sends XML in ISO-8859-1. Is that normal? (A recent Firefox 3.1 nightly sends it in UTF-8.)
Yes, this fix wasn't in Firefox 3.0.2.  It'll be in Firefox 3.0.4.
Flags: blocking1.9.0.4? → blocking1.9.0.4+
Comment on attachment 334719 [details] [diff] [review]
Branch version

Approved for 1.9.0.4, a=dveditz for release-drivers
Attachment #334719 - Flags: approval1.9.0.4? → approval1.9.0.4+
Fixed on 1.9 branch.
Keywords: fixed1.9.0.4
Verified for 1.9.0.4 using the testcase with Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.0.4pre) Gecko/2008102304 GranParadiso/3.0.4pre.

Verified on trunk with Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.1b2pre) Gecko/20081023 Minefield/3.1b2pre.
Status: RESOLVED → VERIFIED
Hi there,
We've got a new encoding problem which is probably connected with this fix in Firefox 3.0.4.

All XML requests are still in ISO-8859-1, as in v3.x, but Firefox 3.0.4 is sending the header "Content-Type: text/xml; charset=UTF-8". This header cannot be overwritten; we tried:

setRequestHeader('Content-Type','text/xml; charset='+ (xData.characterSet || 'utf-8'))

...but it has no effect: the data is still in ISO-8859-1 while the request header says "UTF-8". So we are unable to use our mail client now.
Lukas, the data Firefox is sending _is_ encoded in UTF-8 if the header says UTF-8.  What do you mean when you say that "XML requests are still in ISO-8859-1"?

It might be worth either filing a separate bug with your testcase or sending me the URL of your testcase directly...
(In reply to comment #36)
> Hi there,
> We've got new encoding problem which is probably connected with this fix in
> FireFox 3.0.4.
 
> All XML requests are still in ISO-8859-1 like in v3.x but FFox 3.0.4 is sending
> header "Content-Type: text/xml; charset=UTF-8". This header can not be
> overwritten we were trying: 
> ...but it has no effect - styl data in ISO and request header says "UTF-8". So
> we are unable to use our mail client now.

I have the same problem.

The request sent by Firefox 3.0.4 (German version) says the encoding is "UTF-8", but sends the data encoded in ISO-8859-1.

The problem was not there with Firefox 3.0.3 (German and US versions).

The problem is not present in the US version of Firefox 3.0.4.
Uh...  The dependence on localization is very odd.  Incredibly odd.  Can you please file a bug report with a testcase that shows the problem for you?  I'll try reproducing in the German version, but that's hard to do without a page to try on.
Andrea, thanks for that testcase.

It looks like the problem on branch is that XHR is serializing using the document encoding, still, but claiming UTF-8 if the document was created via createDocument.  So we really do need to change nsXMLDocument on branch.  :(
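A mismatch of this kind (header claims UTF-8, body bytes are in the document encoding) can be spotted on the receiving side by checking whether the body is valid UTF-8 at all. The sketch below is only a diagnostic aid with a made-up function name, not part of any patch here. Note that ASCII-only bodies are byte-identical in both encodings, so the mismatch only surfaces with characters such as umlauts.

```javascript
// Hypothetical diagnostic: returns true if the bytes form valid UTF-8.
function isValidUtf8(bytes) {
  try {
    // With fatal: true, decode() throws on malformed input instead of
    // substituting replacement characters.
    new TextDecoder("utf-8", { fatal: true }).decode(bytes);
    return true;
  } catch (e) {
    return false;
  }
}

// "ü" as UTF-8 (0xC3 0xBC) versus "ü" as ISO-8859-1 (0xFC).
console.log(isValidUtf8(Uint8Array.of(0xc3, 0xbc)), isValidUtf8(Uint8Array.of(0xfc)));
```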
Flags: blocking1.9.0.5?
Attached patch Do that (obsolete)
Attachment #348216 - Flags: superreview?(jonas)
Attachment #348216 - Flags: review?(jonas)
Blocks: 464958
No longer blocks: 464958
Depends on: 464958
Please please please don't morph bugs, especially closed fixed verified bugs. We have a hard enough time tracking things for the branches without mixing in half-fixed and not fixed bugs.

Created bug 464958 to track the regression in comment 36 on
Flags: blocking1.9.0.5?
Attachment #348216 - Attachment is obsolete: true
Attachment #348216 - Flags: superreview?(jonas)
Attachment #348216 - Flags: review?(jonas)
Keywords: fixed1.9.1
Target Milestone: --- → mozilla1.9.1b1
Keywords: verified1.9.1
Keywords: fixed1.9.1
Depends on: 474211
Component: DOM: Mozilla Extensions → DOM
Component: DOM → DOM: Core & HTML