Closed Bug 410217 Opened 14 years ago Closed 4 years ago

XMLHttpRequest should check BOM before using UTF-8 charset

Categories

(Core :: DOM: Core & HTML, defect)

defect
Not set
normal

RESOLVED DUPLICATE of bug 1070763

People

(Reporter: BijuMailList, Unassigned)

Details

(Keywords: testcase, Whiteboard: parity-IE)

Attachments

(4 files)

XMLHttpRequest corrupts UTF-16LE content served as text/plain, but not when it is served as text/xml.
PS: Let me upload a testcase.
Attached file XHR_UTF16LE.html
Attachment #294870 - Attachment mime type: text/plain → text/plain; charset=utf-16le
Attached file XHR_UTF16LE.zip
Sorry, if no charset is given, Bugzilla sends UTF-8 as the default, and I am unable to stop it.

So please see my test at 
http://w3l.sourceforge.net/xhr/XHR_UTF16LE.html
http://w3l.sourceforge.net/xhr/
I am also attaching XHR_UTF16LE.zip so that you can put it on your own test server.

Steps:
1. Go to http://w3l.sourceforge.net/xhr/rss_json.txt
   Result: the user is able to see the text file.

2. Go to http://w3l.sourceforge.net/xhr/XHR_UTF16LE.html
3. Click Test Case link "0".
4. Click the "Test" button.
   Result: the user sees corrupted content.

5. Click Test Case link "1".
6. Click the "Test" button.
   Result: the user sees the test file content.


What was observed:
1. Step 1 shows that Firefox is able to display "rss_json.txt" properly, even though it is served with the content type "text/plain" only (no charset).

2. Step 6 shows that Firefox XHR is able to read "rss_json.txt" properly when it is served with the content type "application/xml" only (no charset).


So why is Firefox XHR unable to automatically detect UTF-16LE when reading a file with a plain-text MIME type?
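
To illustrate the kind of corruption seen in step 4, here is a minimal Python sketch (not Firefox code). The sample text is made up; it only stands in for the real content of rss_json.txt, which is not reproduced here. It shows what happens when bytes that start with a UTF-16LE BOM are decoded as UTF-8 anyway.

```python
import codecs

# Hypothetical sample payload standing in for the real test file.
text = '{"title": "feed"}'
payload = codecs.BOM_UTF16_LE + text.encode("utf-16-le")

# Decoding as UTF-8 regardless of the BOM: the two BOM bytes are invalid
# UTF-8 and become U+FFFD, and every ASCII character is followed by a NUL.
as_utf8 = payload.decode("utf-8", errors="replace")

# Letting the decoder honour the BOM recovers the original text.
as_utf16 = payload.decode("utf-16")

print(repr(as_utf8))
print(repr(as_utf16))
```

The first `print` shows replacement characters and embedded NULs, which is consistent with the "corrupted content" result; the second shows the intended text.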

You can try this on your own server using the contents of XHR_UTF16LE.zip.
Make sure the web server is not automatically appending a charset value while serving.
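
For anyone without a suitable server at hand, the following is a rough Python sketch of a local test server that sends `Content-Type: text/plain` with no charset parameter. The handler name, the `fetch_once` helper, and the payload are all hypothetical; this does not serve the actual files from the zip.

```python
import http.client
import http.server
import threading

# Hypothetical UTF-16LE payload with a BOM, standing in for the test file.
BODY = "\ufeff".encode("utf-16-le") + '{"title": "feed"}'.encode("utf-16-le")


class NoCharsetHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        # Deliberately no "; charset=..." here.
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(BODY)))
        self.end_headers()
        self.wfile.write(BODY)

    def log_message(self, format, *args):
        pass  # keep the console quiet


def fetch_once():
    """Start the server on a free port, fetch one response, shut down."""
    server = http.server.HTTPServer(("127.0.0.1", 0), NoCharsetHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_port, timeout=5)
        conn.request("GET", "/rss_json.txt")
        resp = conn.getresponse()
        result = (resp.getheader("Content-Type"), resp.read())
        conn.close()
        return result
    finally:
        server.shutdown()
        server.server_close()


content_type, body = fetch_once()
print(content_type, body[:2])
```

To test a browser against it, one would keep the server running (call `serve_forever()` directly on a fixed port) and serve the test page from the same origin.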
XHR_UTF16LE.zip
It is also a problem for UTF-16BE.
The URLs you are fetching using XHR do not mention the charset in the Content-Type header.

curl -v http://w3l.sourceforge.net/xhr/rss_json.txt
Content-Type: text/plain

curl -v http://w3l.sourceforge.net/xhr/rss_json.xml
Content-Type: application/xml

The current behavior when the content type is not XML and the charset is not specified in the headers is to fall back to UTF-8:

http://bonsai.mozilla.org/cvsblame.cgi?file=/mozilla/content/base/src/nsXMLHttpRequest.cpp&rev=1.211&mark=775-778#775

What the draft XMLHttpRequest specification requires in that situation is defined in step 6 of http://www.w3.org/TR/XMLHttpRequest/#text-response-entity-body

If you add the right charset in the Content-type header of the above URLs, your testcase should work as expected.
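
As a rough illustration of what "check the BOM before using UTF-8" could look like, here is a Python sketch. It is not the Gecko code and not a transcription of the specification's algorithm. The function name is hypothetical, and giving a header charset precedence over the BOM is an assumption made for this sketch.

```python
import codecs
from typing import Optional

# BOMs checked in this sketch. UTF-32 is left out for brevity; note that
# the UTF-32LE BOM begins with the same two bytes as the UTF-16LE BOM.
_BOMS = (
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
)


def decode_response_text(body: bytes, header_charset: Optional[str] = None) -> str:
    """Decode a non-XML response body (hypothetical helper).

    1. Use the charset from the Content-Type header if there is one
       (assumed precedence).
    2. Otherwise look for a BOM and decode accordingly.
    3. Otherwise fall back to UTF-8, which is the current behaviour.
    """
    if header_charset:
        return body.decode(header_charset, errors="replace")
    for bom, encoding in _BOMS:
        if body.startswith(bom):
            return body[len(bom):].decode(encoding, errors="replace")
    return body.decode("utf-8", errors="replace")


sample = codecs.BOM_UTF16_LE + "héllo".encode("utf-16-le")
print(decode_response_text(sample))
```

With step 2 in place, a UTF-16LE or UTF-16BE file served as plain text/plain would decode correctly, while BOM-less responses would keep the existing UTF-8 fallback.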
Severity: major → normal
Component: Networking → DOM: Mozilla Extensions
OS: Windows XP → All
QA Contact: networking → general
Hardware: PC → All
Summary: XMLHttpRequest corrupts UTF-16LE served by text/plain but not when text/xml → XMLHttpRequest should check BOM before using UTF-8 charset
Now I know the solution, so it is OK for me if this is not fixed.

I am only reporting the inconsistency I found.
1. text/plain vs text/xml   (testcase step 4 vs step 6)
2. XHR.responseText vs what is displayed in the browser (step 4 vs step 1)
3. (http:// or https://) vs (file:// or ftp://)
    please unzip http://w3l.sourceforge.net/xhr/XHR_UTF.zip
    to local drive or a ftp folder and test.

Also parity-IE: IE7 shows the content without any problem.

(In reply to comment #5)
> The URLs you are fetching using XHR do not mention the charset in the
> Content-Type header.

That is intentional.

> If you add the right charset in the Content-type header of the above URLs,
> your testcase should work as expected.

Defeats purpose of this bug.
  
> http://bonsai.mozilla.org/cvsblame.cgi?file=/mozilla/content/base/src/nsXMLHttpRequest.cpp&rev=1.211&mark=775-778#775

Why are we going into the "else" part?
I thought we have the document.

> http://www.w3.org/TR/XMLHttpRequest/#text-response-entity-body.
I don't see equivalent code for that at the bonsai.mozilla.org link.
Whiteboard: parity-IE
> I am only reporting the inconsistency I found.
> 1. text/plain vs text/xml   (testcase step 4 vs step 6)
> 2. XHR.responseText vs what is displayed in the browser (step 4 vs step 1)
> 3. (http:// or https://) vs (file:// or ftp://)
>     please unzip http://w3l.sourceforge.net/xhr/XHR_UTF.zip
>     to local drive or a ftp folder and test.

I did only test using http://. Do you have different results when using file:// or https://?

> > If you add the right charset in the Content-type header of the above URLs,
> > your testcase should work as expected.
> 
> Defeats purpose of this bug.

Yes, this was just a workaround to get you going until this is fixed.

> > http://bonsai.mozilla.org/cvsblame.cgi?file=/mozilla/content/base/src/nsXMLHttpRequest.cpp&rev=1.211&mark=775-778#775
> 
> Why are we going into the "else" part?
> I thought we have the document.

There is no document when the response is not XML, or when it failed to parse as XML.

> > http://www.w3.org/TR/XMLHttpRequest/#text-response-entity-body.
> I don't see equivalent code for that at the bonsai.mozilla.org link.
 
Yes, that's the issue. This is not implemented right now, and that should be what this bug is about.
Attachment #294870 - Attachment mime type: text/plain; charset=utf-16le → text/plain; charset=
Attachment #294871 - Attachment mime type: text/xml → text/xml; charset=
(In reply to comment #7)
> > 3. (http:// or https://) vs (file:// or ftp://)
> I did only test using http://. Do you have different results when using 
> file:// or https://?

Yes, with file:// and ftp:// the content is displayed correctly.
https:// behaves the same as http://.
Also, please see
http://groups.google.com/group/mozilla.dev.tech.dom/browse_thread/thread/cdb39c617034c75a#
for the issue when sending data.
Component: DOM: Mozilla Extensions → DOM
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → DUPLICATE
Duplicate of bug: 1070763
Component: DOM → DOM: Core & HTML