Ymail problem with character encoding

Status: NEW
Assignee: Unassigned
Product: Core
Component: DOM
Priority: --
Severity: major
Opened: 11 years ago
Updated: 5 years ago
Reporter: Manuel Deschamps
Version: Trunk
Platform: x86, Windows XP
Attachments: 2

(Reporter)

Description

11 years ago
User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9b1) Gecko/2007110703 Firefox/3.0b1
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9b1) Gecko/2007110703 Firefox/3.0b1

I just found out that we can't send messages with FF3 if they contain non-ASCII characters, e.g. "ó". I debugged the request body and it is exactly the same for FF2 and FF3, but by the time the request is sent the encoding is not correct. I tried changing the request headers, but no luck. I would appreciate any help you can give me. I am using FF3 b1.

I am trying to set the encoding to UTF-8. This works in FF2, but in FF3 the request still says ISO-8859-1.

Reproducible: Always
(Reporter)

Comment 1

11 years ago
Created attachment 291954 [details]
FF2 request (WORKS)
(Reporter)

Comment 2

11 years ago
Created attachment 291955 [details]
FF3 request (FAILS)

Comment 3

How are you sending the message, exactly?  Is there a web page I can look at that shows the problem?
(Reporter)

Comment 4

11 years ago
you can look at the attachments I have checked in. Basically, what I do is set the Content-Type header's charset to UTF-8, insert the SOAP content into the XHR, and then send the request. Using a sniffer you can see that FF3 replaces the Content-Type charset with ISO-8859-1.


Comment 5

> you can look at the attachments I have checked in.

That tells me what data is sent, not how you got there.  What exact JS do you run?
(Reporter)

Comment 6

11 years ago
it is a normal XHR, something like:

    var req = new XMLHttpRequest();
    req.open("POST", url);
    req.onreadystatechange = _handleReadyStateChange;
    req.setRequestHeader("Content-Type", "application/xml;charset=utf-8");

    // "envelope" is a SOAP envelope containing the word "comunicación"
    req.send(envelope);

I have also noticed that even if I get FF3 to set UTF-8 in the header, the data is not correctly encoded.

Comment 7

So "envelope" is a DOM node, right?

It looks like the DOM node comes from a document encoded as ISO-8859-1, so that's the encoding used for the request.

In Fx2, the request is sent without a "charset" parameter, but is encoded in ISO-8859-1 (as you can check by just looking at that attachment using different charsets from the charset menu).

In Fx3, it looks like there is a literal '?' there, which is odd.  I can't think of any circumstance where Gecko code would do that, so I wonder where that text is coming from.  The Fx3 behavior of setting charset=ISO-8859-1 is correct.

Again, an actual testcase that shows the problem would be good.  For one thing, it would allow pinpointing the day when the behavior changed.  For another thing, it would allow debugging to see what's going on.
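
To make the encoding difference discussed above concrete, here is a minimal sketch of how "ó" is represented in UTF-8 and in ISO-8859-1, and what a receiver sees when it applies the wrong charset. It is not taken from the bug; the helper names encodeLatin1 and decodeLatin1 are made up for illustration, and the code assumes an environment with the standard TextEncoder and TextDecoder globals.

```javascript
// Hypothetical helper: encode a string as ISO-8859-1 (one byte per character).
// Only code points up to 0xFF can be represented.
function encodeLatin1(text) {
  const bytes = new Uint8Array(text.length);
  for (let i = 0; i < text.length; i++) {
    const code = text.charCodeAt(i);
    if (code > 0xff) {
      throw new RangeError("not representable in ISO-8859-1: " + text[i]);
    }
    bytes[i] = code;
  }
  return bytes;
}

// Hypothetical helper: decode ISO-8859-1 bytes (each byte is one code point).
function decodeLatin1(bytes) {
  return Array.from(bytes, (b) => String.fromCharCode(b)).join("");
}

const oAcute = "\u00f3"; // "ó"

const utf8Bytes = new TextEncoder().encode(oAcute); // bytes 0xC3 0xB3
const latin1Bytes = encodeLatin1(oAcute);           // byte 0xF3

// UTF-8 bytes read as ISO-8859-1 turn into two unrelated characters.
const utf8ReadAsLatin1 = decodeLatin1(utf8Bytes); // "\u00c3\u00b3"

// ISO-8859-1 bytes read as UTF-8 are invalid and become U+FFFD.
const latin1ReadAsUtf8 = new TextDecoder("utf-8").decode(latin1Bytes); // "\ufffd"
```

Either mismatch garbles the character, which is why the charset label in the Content-Type header and the actual bytes on the wire have to agree.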
(Reporter)

Comment 8

11 years ago
I have found that the problem is that in both Fx2 and Fx3, when you create a new DOM document using implementation.createDocument(), the new document has an ISO-8859-1 encoding regardless of the character set of the original document.

For example:

var mainDoc = window.document;
var newDoc = mainDoc.implementation.createDocument("","",null);

alert("main " + mainDoc.characterSet);   // output:  main utf-8
alert("new "  + newDoc.characterSet);    // output:  new  ISO-8859-1

If you want to create a new document, to make a SOAP request for instance, you might want a document that is UTF-8 encoded. Since Firefox does not let you change the characterSet, the only workaround I found for this problem is:

var domDocument = new DOMParser().parseFromString("<?xml version='1.0' encoding='UTF-8'?><dummy/>","application/xml");

domDocument.removeChild(domDocument.firstChild);

alert("new "  + domDocument.characterSet);    // output:  new  UTF-8

parseFromString is about 10x slower than createDocument, but at least you get a UTF-8 encoded document.

The ideal solution would be for createDocument to take the calling document's charset and use it for the new document.

Comment 9

That doesn't really explain what's going on here.  Firefox is sending ISO-8859-1-encoded data and labeling it as ISO-8859-1, right?  Or is it sending something else?  What exact data is going on the wire?  Is it identical for Fx2 and Fx3b1?

It's worth filing a separate bug about the createDocument charset issue.

Updated

10 years ago
Depends on: 431701

Comment 10

10 years ago
Hi there,
we've got the same problem.

Our AJAX application is fully UTF-8 (main document, JS) and we are unable to create a UTF-8 encoded XML document via document.implementation.createDocument("", "", null);

ALL other browsers, including FF2, are not strict and allow ANY characters in the document, because it usually has an undefined encoding.

The proposed solution with .parseFromString("<?xml version='1.0' encoding='UTF-8'?><dummy/>","application/xml"); is too dirty for ANY serious use. IT IS 10x SLOWER!!
 

Comment 11

10 years ago
Any progress here?
This is a really annoying bug. We've already got over 100 requests from our clients, and RC2 behaves the same.
(Reporter)

Comment 12

10 years ago
Dan,

Is this bug on your radar? Do you think it can be fixed in a patch in the near future?

Thanks,
Manuel

Comment 13

This behavior isn't going to get changed for the initial Firefox 3 release...  See bug 431701 for details.
Status: UNCONFIRMED → NEW
Component: General → DOM: Mozilla Extensions
Ever confirmed: true
Product: Firefox → Core
QA Contact: general → general
Version: unspecified → Trunk

Comment 14

10 years ago
So what is official solution for this problem? parseFromString is slow, and it looks more like an interim solution for an alpha version.

I do not know ANY app with 8859-1 xml encoding! This bug will force all EU developers to rewrite their code. What about default UTF-8 encoding for createDocument()?

Comment 15

> So what is official solution for this problem?

Either fixing your server to actually look at the incoming HTTP headers or doing the parseFromString thing, looks like...  Those are the only two options I can think of.

> This bug will force all EU developers to rewrite their code.

I assume that none of them have tested the RC, which is why we're not getting a lot of bug reports?  And that they're all using a somewhat-broken server-side that ignores the incoming headers?

Note that we've always been sending the ISO-8859-1 data.  It's just that now we correctly label it as such!

> What about default UTF-8 encoding for createDocument()?

Did you read comment 13?
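
As an illustration of the "look at the incoming HTTP headers" option, here is a minimal server-side sketch that reads the charset parameter from a Content-Type value and decodes the request body accordingly. It is an assumption-laden example, not code from any server mentioned in this bug: the function names charsetFromContentType and decodeBody are hypothetical, only UTF-8 and ISO-8859-1 are handled, and the fallback of UTF-8 when no charset is given is a choice made for this sketch.

```javascript
// Hypothetical helper: extract the charset parameter from a Content-Type
// header value, or return the fallback when none is present.
function charsetFromContentType(contentType, fallback) {
  const match = /;\s*charset\s*=\s*"?([^";\s]+)"?/i.exec(contentType || "");
  return match ? match[1].toLowerCase() : fallback;
}

// Hypothetical helper: decode raw body bytes using the declared charset.
function decodeBody(bytes, contentType) {
  // Assumption for this sketch: an unlabeled body is treated as UTF-8.
  const charset = charsetFromContentType(contentType, "utf-8");
  if (charset === "utf-8" || charset === "utf8") {
    return new TextDecoder("utf-8").decode(bytes);
  }
  if (charset === "iso-8859-1") {
    // In ISO-8859-1 every byte maps directly to the same code point.
    return Array.from(bytes, (b) => String.fromCharCode(b)).join("");
  }
  throw new Error("unsupported charset: " + charset);
}

// The same character arrives as different bytes depending on the label.
const fromLatin1 = decodeBody(Uint8Array.of(0xf3), "application/xml; charset=ISO-8859-1");
const fromUtf8 = decodeBody(Uint8Array.of(0xc3, 0xb3), "application/xml;charset=utf-8");
```

With this approach both requests decode to the same "ó", regardless of which browser produced them.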

Comment 16

10 years ago
That's cool, so we have SQL data in UTF-8, a server side working fully in UTF-8, and a client side in UTF-8, but we are forced to encode/decode all data to mime/iso-8859-1 because of the new strict approach in FF3. I hope our 60,000 customers and millions of their users will appreciate the slowdown.

On top of that, FF2 and the other browsers send an 8859-1 header but the text is actually UTF-8 because of the JS client encoding, so there is no need for re-encoding. That means the server must also check the request headers for the browser version.
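
One possible alternative to checking the browser version is to validate the bytes themselves. The sketch below is a heuristic under stated assumptions, not something proposed in this bug: it assumes the body can only be UTF-8 or ISO-8859-1, tries a strict UTF-8 decode first, and falls back to ISO-8859-1 if the bytes are not valid UTF-8. The function name decodeUtf8OrLatin1 is hypothetical.

```javascript
// Hypothetical helper: decode as UTF-8 when the bytes are valid UTF-8,
// otherwise treat them as ISO-8859-1 (one byte per code point).
function decodeUtf8OrLatin1(bytes) {
  try {
    // With fatal: true the decoder throws on malformed input instead of
    // silently substituting U+FFFD.
    return new TextDecoder("utf-8", { fatal: true }).decode(bytes);
  } catch (err) {
    return Array.from(bytes, (b) => String.fromCharCode(b)).join("");
  }
}

// "có" sent as UTF-8 and as ISO-8859-1 decodes to the same string.
const sentAsUtf8 = decodeUtf8OrLatin1(Uint8Array.of(0x63, 0xc3, 0xb3));
const sentAsLatin1 = decodeUtf8OrLatin1(Uint8Array.of(0x63, 0xf3));
```

This can misdetect, because some ISO-8859-1 byte sequences also happen to be valid UTF-8, so it is at best a fallback for bodies whose label cannot be trusted.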

Comment 17

10 years ago
The XML document should have the same encoding as the main document, or stay non-strict as in FF2.

It is bizarre logic...
window.document => UTF8
window.document.implementation.createDocument("","",null); => ISO-8859-1 ???

Comment 18

No one is saying this is a good situation, or that we don't want to fix it.  Just that Firefox 3 is in code freeze at this point...

Comment 19

Oh, and I was wrong in comment 15.  As you point out, Firefox 2 does send UTF-8 data.

Comment 20

10 years ago
We fixed our server side. Luckily, other browsers do not send any charset in the header, so it was quite simple.

But I still see no reason for doing this when the others work like FF2.

It must slow down Firefox if it is encoding to 8859-1. The point is that WebKit is sometimes 2x as fast as FF3, so another pointless slowdown doesn't help...

Thx for your time, we are looking forward to FF3 final.
(Reporter)

Comment 21

10 years ago
Lukas,

I believe you took the wrong approach; when I investigated this issue 6 months ago, the only work-around I found was the parseFromString solution. It is indeed 10x slower, but we are talking about less than 2ms. If you are creating XML documents to send a request to a server, the network time would be at least 500 times more than that, so those 2 ms are "nothing".

Changing the server and hoping you don't get a different encoding from another browser seems kind of brittle.

I am not trying to tell you what to do, and I don't expect to have an argument about these 2 "hacks" we need to do to work around FF3 bugs... just my 2 cents...

good luck!

Comment 22

10 years ago
Hi Manuel,
the truth is that our server sends an accept encoding header with 8859-1 and utf-8. It was made to understand UTF-8 only, without any charset check, so that's why it crashes with FF3. We tested all supported browsers (Opera 9, FF2, FF1.5, Safari 3+) and none of them send ANY charset information in the request header during an XML request. FF3 does, and in its case it is 8859-1 with properly encoded "above" characters. So FF3 actually uses the charset from the default accept-charset header and thus behaves differently from the others, but still properly as far as our server side is concerned.

Your hack is useful for people who have no chance to fix the server side, but in our case we are trying to avoid "hack solutions" as much as we can.

Comment 23

10 years ago
Hi,

I have a problem related to this bug and bug 431701 too. I'm not sure if this is the right place to post this comment, but here it goes anyway:

When I do an HTML POST in Mozilla/5.0 (Windows; U; Windows NT 5.1; da; rv:1.9.0.5) Gecko/2008120122 Firefox/3.0.5, the browser posts the header "Content-Type: application/x-www-form-urlencoded; charset=UTF-8" but actually posts the content in ISO-8859-1 (at least, ASP.NET can't parse it correctly unless you change the content encoding to ISO-8859-1).

This means that my (ASP.NET 2.0) solution parses input incorrectly: the Danish æ, ø and å characters are not recognized server side.

I can fix this by changing the content encoding server side before the content is being parsed, like

(in Global.asax - code here is VB.NET)
Sub Application_BeginRequest(ByVal sender As Object, ByVal e As System.EventArgs)
        If Request.UserAgent = "Mozilla/5.0 (Windows; U; Windows NT 5.1; da; rv:1.9.0.5) Gecko/2008120122 Firefox/3.0.5" Then
            Request.ContentEncoding = System.Text.Encoding.GetEncoding(28591)
        End If
End Sub

However, I believe that this should be fixed natively in Fx and not by my code :-P

I am not sure which versions of Fx have this problem, so I don't know in which cases I should change the encoding. Can anyone give me a hint on how to figure this out?

Comment 24

Odd.  That sounds like bug 464958, which should be fixed in 3.0.5.

Can you please:

1) File a new bug, so as not to confuse this one any more
2) Post a URI that would let me reproduce this problem?

In particular, I'd like to see exactly what you pass to send().
(Assignee)

Updated

5 years ago
Component: DOM: Mozilla Extensions → DOM
Product: Core → Core