Closed Bug 120836 Opened 23 years ago Closed 22 years ago

XMLHttpRequest.responseText fails when character set indicated in headers

Categories

(Core :: XML, defect, P2)

x86
Windows 98
defect

Tracking

VERIFIED FIXED
mozilla1.0

People

(Reporter: matthew, Assigned: hjtoi-bugzilla)

Details

(Keywords: intl, regression)

Attachments

(5 files, 2 obsolete files)

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Win98; en-US; rv:0.9.7) Gecko/20011221
BuildID:    2001122106

When an XMLHttpRequest object retrieves an XML document whose character set is
indicated in HTTP headers, but not in the XML declaration, then the responseText
method can fail.

Reproducible: Always
Steps to Reproduce:
I have an XML document encoded in us-ascii, containing characters with codes
above 127. It indicates this by sending a content type of "text/xml;
charset=us-ascii". The XML declaration does not indicate an encoding. I retrieve
this document via an XMLHttpRequest.

Actual Results:  A call to responseText returns an UNEXPECTED_FAILURE.

Expected Results:  The call to responseText should return the text of the
document (subject to any JavaScript restrictions on character sets; I
don't know what those are).
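The charset the server declares lives in the `charset` parameter of the Content-Type header (`text/xml; charset=us-ascii`). As a rough illustration of what "getting the character set from the HTTP headers" involves, here is a minimal C++ sketch that extracts that parameter. `charsetFromContentType` is a hypothetical helper, not Mozilla's API, and it deliberately ignores quoted-string escapes and other corners of the header grammar.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <optional>
#include <string>

// Hypothetical helper (not Mozilla code): return the lower-cased value of the
// "charset" parameter of a Content-Type header, or std::nullopt if absent.
// Simplified sketch: no quoted-string escapes, no header comments.
std::optional<std::string> charsetFromContentType(const std::string& contentType) {
  std::string lower = contentType;
  std::transform(lower.begin(), lower.end(), lower.begin(), [](unsigned char c) {
    return static_cast<char>(std::tolower(c));
  });

  std::size_t pos = 0;
  while ((pos = lower.find(';', pos)) != std::string::npos) {
    ++pos;  // step past the ';'
    while (pos < lower.size() && std::isspace(static_cast<unsigned char>(lower[pos]))) {
      ++pos;  // skip whitespace before the parameter name
    }
    if (lower.compare(pos, 8, "charset=") != 0) {
      continue;  // some other parameter; look for the next ';'
    }
    std::size_t start = pos + 8;
    std::size_t end = lower.find(';', start);
    std::string value =
        lower.substr(start, end == std::string::npos ? std::string::npos : end - start);
    while (!value.empty() && std::isspace(static_cast<unsigned char>(value.back()))) {
      value.pop_back();  // trim trailing whitespace
    }
    if (value.size() >= 2 && value.front() == '"' && value.back() == '"') {
      value = value.substr(1, value.size() - 2);  // strip surrounding quotes
    }
    if (value.empty()) return std::nullopt;
    return value;
  }
  return std::nullopt;
}
```

For the header in this report, `charsetFromContentType("text/xml; charset=us-ascii")` yields `"us-ascii"`, while a bare `"text/xml"` yields no value. Charset names are case-insensitive, which is why the sketch lower-cases them.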

Adding some debug output to XMLHttpRequest suggests that it is
getting the character set from the nsIDocument at

http://lxr.mozilla.org/seamonkey/source/extensions/xmlextras/base/src/nsXMLHttpRequest.cpp#370

which returns a character set of UTF-8 (even though this is not
specified in the XML). As a result, the attempt to get the character
set from the HTTP headers at

http://lxr.mozilla.org/seamonkey/source/extensions/xmlextras/base/src/nsXMLHttpRequest.cpp#374

is never attempted. Hence the document is read as UTF-8 (which it is
not) and the conversion fails.

I've taken the document, removed all private information (I hope!) and placed it
at http://crashonline.org.uk/test/annotations.asc.xml. A "GET" XMLHttpRequest
for this URL should indicate the failure. As a comparison, the same document at
http://crashonline.org.uk/test/annotations.xml indicates the character set in
the XML declaration.
I was looking into working on a patch for this. It would decide whether to call
DetectCharset based on the value from GetDocumentCharsetSource. However, the
current charset seems to be set near the end of nsXMLDocument::StartDocumentLoad

http://lxr.mozilla.org/seamonkey/source/content/xml/document/src/nsXMLDocument.cpp#633

which doesn't set the character set source (at least not in the document).

Is that a fault?
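The patch idea above (consult the charset source before deciding whether to look at the HTTP header) can be sketched as follows. Everything here is hypothetical: `CharsetSource`, `DocumentCharset`, and `chooseResponseCharset` are illustrative names, not Mozilla's API, and the precedence order (HTTP header over XML declaration over a built-in default) is an assumption in the spirit of RFC 3023, where an explicit charset parameter is authoritative.

```cpp
#include <cassert>
#include <string>

// Hypothetical model of where a document's charset came from, weakest first.
// Assumed ordering: explicit HTTP charset > XML declaration > built-in default.
enum class CharsetSource { Default = 0, XmlDeclaration = 1, HttpHeader = 2 };

struct DocumentCharset {
  std::string name;      // e.g. "UTF-8"
  CharsetSource source;  // how that name was determined
};

// Choose the charset for decoding responseText. The document's charset is kept
// only if nothing stronger is available, so a merely defaulted "UTF-8" no
// longer hides an explicit charset from the HTTP header.
std::string chooseResponseCharset(const DocumentCharset& doc,
                                  const std::string& headerCharset) {
  if (!headerCharset.empty() && CharsetSource::HttpHeader > doc.source) {
    return headerCharset;
  }
  return doc.name;
}
```

The behaviour reported in this bug corresponds to returning `doc.name` whenever it is non-empty. Note that a check like this only helps if the document actually records its charset source, which is what the reporter found missing in nsXMLDocument::StartDocumentLoad.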

Priority: -- → P2
Target Milestone: --- → mozilla1.0
Actually, I think this is a duplicate. We don't support HTTP headers for XML yet,
so it is no wonder this doesn't work.

*** This bug has been marked as a duplicate of 93218 ***
Status: NEW → RESOLVED
Closed: 22 years ago
Resolution: --- → DUPLICATE
Hmm... after some more testing, I realized this is not a dupe. We *should* try to
give out some text. Fix coming up.
Status: RESOLVED → REOPENED
Resolution: DUPLICATE → ---
Attached patch: Cleaned up, tested, proposed fix (obsolete)
This might have been a regression: I believe the scanner code was changed
in response to a change in the decoders, but XMLHttpRequest was not updated.
Now XMLHttpRequest again follows the scanner, and things seem to work. The
code tries to convert the text from the requested character set; if it finds
errors (like characters that are illegal in that set), it replaces those
characters with U+FFFD, which shows as '?' in the browser, and continues
until the whole buffer has been converted.
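The replace-and-continue strategy can be illustrated for the US-ASCII case with a minimal C++ sketch. This is a stand-alone illustration, not Mozilla's decoder loop; `decodeAsciiWithReplacement` is a hypothetical helper, and the input again assumes the accented e arrived as the single byte 0xE9.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical US-ASCII decoder: any byte that is not legal US-ASCII (> 0x7F)
// becomes U+FFFD REPLACEMENT CHARACTER, and decoding continues instead of
// failing outright. If `replaced` is non-null it receives the substitution count.
std::u16string decodeAsciiWithReplacement(const std::string& bytes,
                                          std::size_t* replaced = nullptr) {
  std::u16string out;
  out.reserve(bytes.size());
  std::size_t count = 0;
  for (unsigned char b : bytes) {
    if (b <= 0x7F) {
      out.push_back(static_cast<char16_t>(b));  // ASCII maps 1:1 to UTF-16
    } else {
      out.push_back(u'\uFFFD');  // illegal in US-ASCII: substitute, keep going
      ++count;
    }
  }
  if (replaced != nullptr) *replaced = count;
  return out;
}
```

Decoding `"R\xE9millard"` yields `u"R\uFFFDmillard"`: the whole buffer is converted, and the single bad byte becomes the replacement character that the browser renders as '?'.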

Note that I can't seem to see the accented e in Remillard. I see '?'
both in a normal document load and now also using XMLHttpRequest. I do see the
correct letter in View Source, though.

Also, when testing
http://www.mozilla.org/xmlextras/xgetinvalid.html we do not completely clear
the document tree before the parsererror element. Something strange is going on
there.

These are separate bugs that I will file later.
Attachment #75100 - Attachment is obsolete: true
Yes, I confirmed that this is a regression. It works fine in NS 6.2.1, which is
based on 0.9.4.
Keywords: regression
Comment on attachment 75255
Cleaned up, tested, proposed fix

>+  if (!outBuffer) {
>+    nsMemory::Free(outBuffer);

This doesn't make sense :).
Attachment #75255 - Flags: review+
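The reviewer's objection: inside `if (!outBuffer)` the pointer is known to be null, so there is nothing to free. A minimal sketch of the intended pattern follows, using `std::malloc`/`std::free` as stand-ins for `nsMemory` (the helper `copyToNewBuffer` is hypothetical). For `std::free`, passing a null pointer is defined to do nothing, so such a call is dead code at best.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical helper: copy `len` bytes into a freshly malloc'ed,
// NUL-terminated buffer. Returns nullptr on allocation failure; the caller
// owns the result and releases it with std::free.
char* copyToNewBuffer(const char* src, std::size_t len) {
  char* outBuffer = static_cast<char*>(std::malloc(len + 1));
  if (!outBuffer) {
    // Nothing to release here: the allocation failed and outBuffer is null.
    return nullptr;
  }
  std::memcpy(outBuffer, src, len);
  outBuffer[len] = '\0';
  return outBuffer;
}
```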
Comment on attachment 75255
Cleaned up, tested, proposed fix

What harishd said, sr=jst
Attachment #75255 - Flags: superreview+
Comment on attachment 75255
Cleaned up, tested, proposed fix

You need to address the reviewers' comments before approval, that is, get rid of
the inappropriate |Free|.
Attachment #75255 - Flags: needs-work+
The only difference from the previous patch is the removed |Free|.
Attachment #75255 - Attachment is obsolete: true
Comment on attachment 75403
Removed unneeded free

a=scc, and carrying forward the previously granted r and sr
Attachment #75403 - Flags: superreview+
Attachment #75403 - Flags: review+
Attachment #75403 - Flags: approval+
Checked in.
Status: REOPENED → RESOLVED
Closed: 22 years ago
Resolution: --- → FIXED
Changing QA Contact
QA Contact: petersen → rakeshmishra
Verified on trunk build 2002-05-07-08-trunk on Windows 2000.
Status: RESOLVED → VERIFIED