Closed Bug 120836 Opened 23 years ago Closed 23 years ago

XMLHttpRequest.responseText fails when character set indicated in headers

Categories

(Core :: XML, defect, P2)

x86
Windows 98
defect

Tracking

VERIFIED FIXED
mozilla1.0

People

(Reporter: matthew, Assigned: hjtoi-bugzilla)

Details

(Keywords: intl, regression)

Attachments

(5 files, 2 obsolete files)

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Win98; en-US; rv:0.9.7) Gecko/20011221
BuildID: 2001122106

When an XMLHttpRequest object retrieves an XML document whose character set is indicated in the HTTP headers but not in the XML declaration, reading responseText can fail.

Reproducible: Always

Steps to Reproduce:
I have an XML document encoded in us-ascii, containing characters with codes above 127. It indicates this by sending a content type of "text/xml; charset=us-ascii". The XML declaration does not indicate an encoding. I retrieve this document via an XMLHttpRequest.

Actual Results:
A call to responseText returns an UNEXPECTED_FAILURE.

Expected Results:
The call to responseText should return the text of the document (allowing for any JavaScript restrictions on character sets etc.; I don't know what they are).

Adding a bit of debug output to XMLHttpRequest, it looks as though it is getting the character set from the nsIDocument at http://lxr.mozilla.org/seamonkey/source/extensions/xmlextras/base/src/nsXMLHttpRequest.cpp#370, which returns a character set of UTF-8 (even though this is not specified in the XML). As a result, the attempt to get the character set from the HTTP headers at http://lxr.mozilla.org/seamonkey/source/extensions/xmlextras/base/src/nsXMLHttpRequest.cpp#374 is never made. Hence the document is read as UTF-8 (which it is not) and the conversion fails.

I've taken the document, removed all private information (I hope!) and placed it at http://crashonline.org.uk/test/annotations.asc.xml. A "GET" XMLHttpRequest for this URL should show the failure. As a comparison, the same document at http://crashonline.org.uk/test/annotations.xml indicates the character set in the XML declaration.
I was looking at working on a patch for this. It would decide whether to call DetectCharset based on the value from GetDocumentCharsetSource. However the current charset seems to get set near the end of nsXMLDocument::StartDocumentLoad http://lxr.mozilla.org/seamonkey/source/content/xml/document/src/nsXMLDocument.cpp#633 which doesn't set the character set source (at least not in the document). Is that a fault?
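The selection logic described in the report, and the source-aware alternative floated in the patch idea, can be sketched roughly as follows. This is a standalone illustration, not the Mozilla code: DocumentCharset, CharsetSource, PickCharsetAsReported and PickCharsetSourceAware are hypothetical names, and the choice to let an explicitly set document charset win over the header is an assumption of the sketch.

```cpp
#include <cassert>
#include <string>

// Hypothetical model of how the document arrived at its charset.
enum class CharsetSource { Default, Explicit };

struct DocumentCharset {
    std::string name;      // e.g. "UTF-8"
    CharsetSource source;  // Default = filled in because nothing was specified
};

// Behaviour described in the report: the document always reports some
// charset (UTF-8 by default), so the header is never consulted.
// headerCharset is empty when the Content-Type header names no charset.
std::string PickCharsetAsReported(const DocumentCharset& doc,
                                  const std::string& headerCharset) {
    if (!doc.name.empty()) return doc.name;
    return headerCharset;
}

// Source-aware variant: a charset that is only a default does not
// override a charset named in the HTTP header.
std::string PickCharsetSourceAware(const DocumentCharset& doc,
                                   const std::string& headerCharset) {
    assert(!doc.name.empty());  // sketch assumes the document always has a charset
    if (doc.source == CharsetSource::Explicit) return doc.name;
    if (!headerCharset.empty()) return headerCharset;
    return doc.name;
}
```

With a document whose charset is a defaulted "UTF-8" and a header charset of "us-ascii", the first function yields "UTF-8" and the second yields "us-ascii".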
Priority: -- → P2
Target Milestone: --- → mozilla1.0
Actually I think this is a duplicate. We don't support HTTP headers for XML yet, so it would be a wonder if this worked. *** This bug has been marked as a duplicate of 93218 ***
Status: NEW → RESOLVED
Closed: 23 years ago
Resolution: --- → DUPLICATE
Hmm... after some more testing I realized this is not a dupe. We *should* try to give out some text. Fix coming up.
Status: RESOLVED → REOPENED
Resolution: DUPLICATE → ---
Attached patch: Cleaned up, tested, proposed fix (obsolete)
This might have been a regression: I believe the scanner code was changed in response to a change in the decoders, but XMLHttpRequest was not updated. Now XMLHttpRequest again follows the scanner and things seem to work. The code tries to convert the text using the requested character set; if it finds errors (such as characters that are illegal in that set), it replaces those characters with U+FFFD, which shows as '?' in the browser, and continues until the whole buffer has been converted. Note that I don't seem to be able to see the accented e in Remillard: I see ? both in a normal document load and now also using XMLHttpRequest. I do see the correct letter in view source, though. Also, when testing http://www.mozilla.org/xmlextras/xgetinvalid.html we do not completely clear the document tree before the parsererror element. Something strange is going on there. These are separate bugs that I will file later.
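The replace-and-continue behaviour described above can be illustrated with a minimal standalone sketch. It handles only US-ASCII input and does not use the Mozilla decoder interfaces; DecodeAsciiLenient is a hypothetical name introduced for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Decode bytes declared as US-ASCII into UTF-16 code units. Bytes outside
// the 7-bit range are not valid US-ASCII; each one is replaced with U+FFFD
// and decoding continues with the next byte instead of failing.
// If `replaced` is non-null it receives the number of substitutions made.
std::u16string DecodeAsciiLenient(const std::string& bytes,
                                  std::size_t* replaced = nullptr) {
    const char16_t kReplacement = 0xFFFD;
    std::u16string out;
    out.reserve(bytes.size());
    std::size_t count = 0;
    for (unsigned char b : bytes) {
        if (b < 0x80) {
            out.push_back(static_cast<char16_t>(b));
        } else {
            out.push_back(kReplacement);
            ++count;
        }
    }
    assert(out.size() == bytes.size());  // one output unit per input byte
    if (replaced) *replaced = count;
    return out;
}
```

For the test document in this bug, a byte such as 0xE9 (an accented e in Latin-1) served as us-ascii would come out as a single U+FFFD while the rest of the text is preserved.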
Attachment #75100 - Attachment is obsolete: true
Yes, confirmed that this is a regression. It works fine in NS 6.2.1 which is based on 0.9.4.
Keywords: regression
Comment on attachment 75255: Cleaned up, tested, proposed fix
>+ if (!outBuffer) {
>+ nsMemory::Free(outBuffer);
This doesn't make sense :).
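The point of the review comment is that inside the `!outBuffer` branch the pointer is null, so there is nothing to free. A minimal sketch of the intended shape, using std::malloc and std::free as stand-ins for the nsMemory allocator; ConvertWithBuffer is a hypothetical function and not the code from the patch:

```cpp
#include <cstddef>
#include <cstdlib>

// Allocate a buffer for `length` UTF-16 units plus a terminator, use it,
// and release it. Returns false if the allocation fails.
bool ConvertWithBuffer(std::size_t length) {
    char16_t* outBuffer =
        static_cast<char16_t*>(std::malloc((length + 1) * sizeof(char16_t)));
    if (!outBuffer) {
        // Allocation failed: outBuffer is null, so there is nothing to release.
        return false;
    }
    outBuffer[length] = 0;  // the conversion into outBuffer would happen here
    std::free(outBuffer);   // release only on the path where allocation succeeded
    return true;
}
```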
Attachment #75255 - Flags: review+
Comment on attachment 75255: Cleaned up, tested, proposed fix
What harishd said, sr=jst
Attachment #75255 - Flags: superreview+
Comment on attachment 75255: Cleaned up, tested, proposed fix
You need to address the reviewer's comments before approval; that is, get rid of the inappropriate |Free|.
Attachment #75255 - Flags: needs-work+
The only difference to the previous patch is the removed free.
Attachment #75255 - Attachment is obsolete: true
Comment on attachment 75403: Removed unneeded free
a=scc, and bringing forward previously good r and sr
Attachment #75403 - Flags: superreview+
Attachment #75403 - Flags: review+
Attachment #75403 - Flags: approval+
Checked in.
Status: REOPENED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
Changing QA Contact
QA Contact: petersen → rakeshmishra
verified on the trunk build 2002-05-07-08-trunk on Windows 2000
Status: RESOLVED → VERIFIED