Closed Bug 384298 Opened 18 years ago Closed 12 years ago

XMLHttpRequest response incorrectly assumed to be XML

Categories

(Core :: DOM: Core & HTML, defect)

x86
Windows XP
defect
Not set
major

Tracking

()

RESOLVED INVALID

People

(Reporter: carsten.orthbandt, Unassigned)

References

()

Details

(Keywords: testcase)

User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; Avant Browser; Avant Browser; .NET CLR 1.0.3705; .NET CLR 2.0.50727; .NET CLR 1.1.4322; Media Center PC 4.0; FDM)
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US; rv:1.9a5) Gecko/20070606 GranParadiso/3.0a5

When using XMLHttpRequest from JavaScript to fetch non-HTML/XML data (JSON, for instance), and the server response lacks any Content-Type specifier, GranParadiso assumes XML and throws parsing errors. According to RFC 2616, Section 7.2.1, a browser SHOULD default to text/plain and MAY try to guess the content type. Here the guessing clearly fails.

I see three ways to solve this:
1. Stick to the RFC and treat responses without a "Content-Type" header field as "text/plain", not raising errors while parsing.
2. Make the guessing a bit more sensible: check whether the content at hand at least looks like XML and only then try to parse it.
3. Always do the parse but suppress the error messages, as they are clearly wrong.

1: my favourite, but it may cause problems with servers/proxies that omit the content type on actual XML content.
2: seems to be the most sensible and stable approach, but also the highest workload.
3: performance-wise not very smart, and it may be hard to detect which error messages have to be displayed and which not.

Reproducible: Always

Steps to Reproduce:
1. Surf to a website that uses JSON and doesn't specify a content type.

Actual Results:
XMLHttpRequest responses with JSON content throw syntax errors in the console.

Expected Results:
The XMLHttpRequest specs say it's fine to send non-XML content, so there should be no XML errors in the console IFF the content type isn't specified as XML.

I think the problem is clear. If somebody would like to test this, I can easily mock up a page that demonstrates it. Currently we "fixed" this issue by explicitly returning a content type on each and every request (although this is header overhead one shouldn't need).
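Option 1 above — falling back to a default media type when the response carries no Content-Type header — could be sketched as a small helper. This is only an illustration of the proposal, not Gecko code; the function names and the text/plain default are the reporter's suggestion, not what any browser actually implements:

```javascript
// Sketch of option 1: pick an effective media type for a response.
// `contentType` is the raw Content-Type header value, or null/undefined
// when the server omitted it. Names here are illustrative only.
function effectiveMediaType(contentType) {
  if (!contentType) {
    // No header: default instead of guessing, so no XML parse is
    // attempted and no "not well-formed" error can be logged.
    return "text/plain";
  }
  // Strip any parameters such as "; charset=utf-8".
  return contentType.split(";")[0].trim().toLowerCase();
}

function shouldParseAsXML(contentType) {
  var type = effectiveMediaType(contentType);
  return type === "text/xml" ||
         type === "application/xml" ||
         /\+xml$/.test(type);
}
```

Under this sketch, a JSON response with no Content-Type would resolve to text/plain and never reach the XML parser.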
I consider this a MAJOR bug because functionality that adhered to the RFC in 1.5 and 2.0 now doesn't. It causes misleading error messages and confuses users. It may even cause website malfunctions (through exceptions that aren't real errors).
Carsten, do you have an example page somewhere that shows the issue?
Component: General → DOM: Mozilla Extensions
Product: Firefox → Core
QA Contact: general → general
Version: unspecified → Trunk
I've put up a test page at http://www.pixeltamer.net/test/mozbug.html
Expected result: no error message in the console.
Actual result: "not well-formed" error in the console.
The text file being retrieved doesn't have a Content-Type attached.
Thanks for the testcase! The testcase works fine in IE7 and Opera 9.20. So it appears to me that this could be fixed here: http://lxr.mozilla.org/seamonkey/source/content/base/src/nsXMLHttpRequest.cpp#1643 Right, Boris?
Keywords: testcase
Would that break sites that XMLHttpRequest a URL with no content-type header and try to treat the returned data as XML? (See also bug 311724.)
We could avoid setting the XML type and instead sniff for XML...

Martin, the relevant code is on line 1714 of rev 1.181 of nsXMLHttpRequest.cpp, not the code you pointed to. The code you point to is the upload code.

Note that Firefox 1.5 and 2 simply didn't report XML parse errors in XMLHttpRequest. Ever. On trunk, this is sort of a regression from bug 305243.

I should note that there is no bug per the RFC. The RFC says the UA MAY guess in this case, and we do, based on the context. As for the proposed XMLHttpRequest specs, I would be interested in hearing what they have to say on the matter. They should probably at least somewhat define it.
Blocks: 305243
Flags: blocking1.9?
Re Jesse: I'd strongly suspect that skipping XML parsing if the Content-Type is missing would break some sites, so sniffing the format is definitely the better option.

Re Boris: This is what http://www.w3.org/TR/XMLHttpRequest/ has to say about it: "The name of the object is XMLHttpRequest for compatibility with the web, though each component of this name is potentially misleading. First, the object supports any text based format, including XML."

As for the RFC interpretation, there are two options:
- The UA SHOULD treat unknown content as text/plain.
- The UA MAY guess the content type.

Guessing based on context (the calling API) is IMHO wrong since:
- XMLHttpRequest still follows the HTTP RFC and as such shouldn't change the rules.
- The XMLHttpRequest spec says (Section 2.1, responseXML) that the response is only to be treated as XML if Content-Type is "either text/xml, application/xml, or ends in +xml".
- In the same section: "If Content-Type did not contain such a media type, or if the document could not be parsed (due to an XML namespace well-formedness error or unsupported character encoding, for instance), it must be null".

Strictly following the last quote, you're theoretically not even ALLOWED to XML-parse the content if it wasn't explicitly type-tagged as XML. Obviously this would be a bad idea though, since it would break apps that expect the XML parse to happen even without a content type. I haven't tried it, but I would suspect that virtually all browsers provide a valid responseXML if the content is valid XML, even without a specified media type.
http://dev.w3.org/cvsweb/~checkout~/2006/webapi/XMLHttpRequest/Overview.html?content-type=text/html;%20charset=utf-8#dfn-responsexml defines exactly what you guys should do. Please follow that and not the version currently on the TR/ page. The difference is that responses _without_ a Content-Type are also treated as XML.
Hopefully a new version will be published on TR/ soon reflecting the URL above.
Given that this unpublished spec change was made just 3 days ago, I'd still suggest not breaking behaviour that was standards-compliant 4 days ago. I seriously question this spec change since it contradicts RFC 2616 AND the current version, but if it's gonna happen, so be it. AFAICS, even this new version doesn't imply that an actual error message should be displayed.
If you haven't specified a Content-Type header the behavior that Firefox exhibits (minus other bugs) seems to be correct (trying to parse the response as XML).
The problem as stated in the initial entry is not that FF3 is trying to parse non-XML content as XML, but that it displays an error message. Users notice that and complain.

As per the currently valid spec, content without a Content-Type header is to be treated as text/plain (and I think it should stay that way). Obviously, appcompat-wise it's a good idea to at least try to parse unknown content as XML. But in this case it was an actual decision NOT to send a content type, since it is redundant information (the content is text/plain, a custom JSON derivative). This works fine with every browser except FF3, which logs a syntax error. Since the currently binding spec explicitly says that it is fine to send non-XML without a Content-Type, this is IMHO a bug.

Since I'm not really in the position to fight the pending spec change, my proposal is:

Version 1) If the response has no Content-Type specified, treat it as XML, BUT do not log any parsing errors.
Version 2) If the response has no Content-Type specified, sniff the content type (look for the <?xml declaration), parse it no matter what, BUT ONLY log parsing errors IF sniffing hinted at XML content.

Version 2 might be better appcompat-wise, but its overall behaviour is quite vague.

Since you (Anne) are already following this: I think it might be a good idea not only to spec out when fields have to be null and processing stopped, but also to spec which of these error conditions should be user-visible. The spec already defines where exceptions have to be thrown, so I think it's only logical to also state whether certain conditions only prevent further processing or whether they are actual errors.
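Version 2's sniffing step could look roughly like this. This is a naive sketch of the idea only — real cross-UA sniffing would also have to handle byte-order marks, non-UTF-8 encodings, and leading comments, which is exactly the difficulty raised later in this thread:

```javascript
// Naive sniff for "looks like XML": skip leading whitespace (and a BOM
// character, if the text layer left one), then accept either an XML
// declaration (<?xml ...?>) or something that resembles an opening tag.
// Illustrative only; this would also match HTML, for instance.
function looksLikeXML(body) {
  var s = body.replace(/^[\s\uFEFF]+/, "");
  return /^<\?xml[\s?]/.test(s) || /^<[A-Za-z_]/.test(s);
}
```

With such a helper, the proposed rule would be: if the response has no Content-Type and looksLikeXML() returns false, parse anyway but suppress the console error.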
We need a decision from the spec here. Please bring it up in the webapi list if you don't like what the spec says. Leaving nominated for now to see how far away from what we do now the spec ends up.
Could somebody point out where RFC2616 has a default of "text/plain"? Besides, even if that were the case, it wouldn't allow parsing as JSON.
Sorry, I mixed that up. RFC2616 specifies application/octet-stream, not text/plain. But this doesn't matter. The point is that the UA complains that the response is NOT in a specific format (XML) while it nowhere said it was. How the actual response is parsed (and I mentioned it's a "custom JSON derivative") is totally up to the application. The UA shouldn't interfere with custom protocols. Confusing the user with false positives is interfering, in my book.
For what it's worth, I think not logging the error if there is an HTTP channel but no Content-Type header is a pain, since the sink is what makes that decision and the sink is just an XML sink. I suppose we could subclass nsXMLContentSink to accomplish it...

I also think the content sniffing should be pretty easy to do. Much easier than the other. Of course then there's the problem of cross-UA compatible sniffing...

As for RFC2616, as far as I can tell it defaults to application/octet-stream, not text/plain. But that's a SHOULD level requirement, not a MUST. The exact text in section 7.2.1 is:

  Any HTTP/1.1 message containing an entity-body SHOULD include a
  Content-Type header field defining the media type of that body. If
  and only if the media type is not given by a Content-Type field, the
  recipient MAY attempt to guess the media type via inspection of its
  content and/or the name extension(s) of the URI used to identify the
  resource. If the media type remains unknown, the recipient SHOULD
  treat it as type "application/octet-stream".

Now this particular server is violating the first SHOULD, because it feels that it has important enough reasons to. We're violating the second SHOULD, again because we have important enough reasons to.

If it were up to me I would add sniffing, but standardizing that would be a PITA, I suspect... I do _not_ plan to add a separate sniffer for this (myself; others are welcome to do it), and I bet our generic content sniffer would detect XML in a lot more cases than people would want... and not consistently on different users' machines.
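The section 7.2.1 text quoted above amounts to a three-step decision chain: use the header if present, optionally guess (the MAY), and otherwise fall back to application/octet-stream (the SHOULD). As a sketch, with `sniffFn` standing in for whatever guessing a particular UA chooses to do:

```javascript
// RFC 2616 section 7.2.1 as a decision chain (sketch only; names are
// hypothetical). `sniffFn` is a placeholder for the UA's optional
// guessing step; it returns a media type string or null.
function mediaTypeFor(contentTypeHeader, body, sniffFn) {
  if (contentTypeHeader) {
    // Header wins; strip parameters such as "; charset=utf-8".
    return contentTypeHeader.split(";")[0].trim().toLowerCase();
  }
  var guessed = sniffFn ? sniffFn(body) : null; // the MAY: guessing
  if (guessed) {
    return guessed;
  }
  return "application/octet-stream"; // the SHOULD: final fallback
}
```

The disagreement in this bug is entirely about what `sniffFn` should be for XMLHttpRequest: Gecko effectively guesses "XML" from the calling context, while the reporter wants either no guess or a content-based one.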
After a lot of discussion on the webapi list, the outcome was that the XMLHttpRequest specs are designed not to detail whether an error message should be displayed or not.

Given the three facts that:
- HTTP actually allows responses without a Content-Type,
- the error doesn't influence script execution at all (at least if the response isn't meant to be XML in the first place),
- due to the context (XHR) there may be hundreds of these false positives per minute,

I'd suggest to "simply" (I haven't looked at the code) disable error logging if the Content-Type is absent. Your comment raises the suspicion that the place where the error message is logged doesn't have access to that information. If that is the case and your informed judgement is that subclassing would be the way to go, then I'd say: probably too much hassle. But as far as I followed the code, the actual request (and therefore the type info) is passed down the chain.

Sniffing may indeed be very difficult and error-prone, especially with free-form text formats that explicitly do _not_ follow known rules.
> the place where the error message is logged doesn't have access to that > information. It might; I do see a channel being passed to the XML content sink Init() method. What the sink _doesn't_ know is that it was created to handle an XMLHttpRequest. All it knows is that it's being asked to handle something someone decided was XML, and it knows the envelope information (HTTP headers, etc). So changing this behavior _just_ for XMLHttpRequest would be rather difficult.
Personally, I don't see a problem with suppressing the error message for all cases IF (and only if) no Content-Type was given. If the supposed XML content came from a direct request without an XML type, the HTTP RFC prefers transparent behaviour as well (although things are a bit more complicated because of the necessary sniffing for dumb file-servers).
The point is that if no Content-Type was given and we ended up in the XML parser code, then someone is either forcing a parse (and might hence want the error message) or we detected the content as XML via sniffing (and should show the error message as we would for any other XML).

I'm also assuming that any behavior changes here would be limited to HTTP/1.0 or later, of course, so that file:// URIs, HTTP/0.9 responses, etc. are not affected.
Not a blocker, but wanted for 1.9.
Flags: blocking1.9? → blocking1.9-
Whiteboard: [wanted-1.9]
I have this problem: XMLHttpRequest does NOT parse responseXML, so I use a DOM parser (a bad solution), and the XML DOM parser TRUNCATES my data. No problem in IE.

The XML:

<?xml version="1.0" encoding="UTF-8" ?>
<kkdb>
  <ra>8</ra>
  <titolo>pippo</titolo>
  <tx>13</tx>
  <texto>my data is 20 kb</texto>
  <pubblic>06-07-2007</pubblic>
  <ar>11</ar>
</kkdb>

The JavaScript:

function xmlhttpPost(idd) {
  var strURL;
  strURL = tinyMCE.documentBasePath + '/' + tinyMCE.getParam('savajaxurl', null) + '&plg=AJXOPEN';
  // replace in file.xml
  // strURL = tinyMCE.getParam('savajaxurl', null) + '&plg=AJXOPEN';
  var xmlHttpReq = false;
  // XHR for Mozilla/Safari/IE7
  if (window.XMLHttpRequest) {
    xmlHttpReq = new XMLHttpRequest();
  }
  // for all other versions of IE
  else if (window.ActiveXObject) {
    xmlHttpReq = new ActiveXObject("Microsoft.XMLHTTP");
  }
  xmlHttpReq.open('POST', strURL, true);
  xmlHttpReq.setRequestHeader('Content-Type', 'application/x-www-form-urlencoded');
  xmlHttpReq.onreadystatechange = function() {
    if (xmlHttpReq.readyState == 4) {
      // read the XML output
      xmlpp(xmlHttpReq.responseText);
    } else {
      tinyMCE.selectedInstance.setHTML('LOADING');
    }
  }
  xmlHttpReq.send(getquerystring(idd));
  return true;
}

function getquerystring(idd) {
  var contentcd = '&id=' + idd;
  return contentcd;
}

function echi(idd) {
  var ddd = xmlhttpPost(idd);
  // setInterval('window.close()', 2000);
}

function xmlpp(tex) {
  var doc;
  if (window.ActiveXObject) {
    doc = new ActiveXObject("Microsoft.XMLDOM");
    doc.async = "false";
    doc.loadXML(tex);
  }
  // code for Mozilla, Firefox, Opera, etc.
  else {
    var parser = new DOMParser();
    doc = parser.parseFromString(tex, "text/xml");
  }
  var inst = tinyMCE.selectedInstance;
  var formObj = inst.formElement.form;
  var xccc = doc.documentElement;
  for (var x = 0, objs = formObj.elements; x < objs.length; x++) {
    // if (xccc.getElementsByTagName(objs[x].name)[0].firstChild.nodeValue != undefined) {
    switch (objs[x].tagName) {
      case "INPUT":
        formObj.elements[x].value = xccc.getElementsByTagName(objs[x].name)[0].firstChild.nodeValue;
        break;
      case "SELECT":
        var current_season = xccc.getElementsByTagName(objs[x].name)[0].firstChild.nodeValue;
        for (var i = 0; i < formObj.elements[x].length; i++) {
          if (formObj.elements[x].options[i].value == current_season) {
            formObj.elements[x].options[i].selected = true;
          }
        }
        break;
      case "CHECKBOX":
        break;
      case "RADIO":
        break;
    }
    // }
    tinyMCE.selectedInstance.setHTML(xccc.getElementsByTagName('texto')[0].firstChild.nodeValue); // this contains big (20 kb) data, not binary
  }
  return true;
}

Contact me in any case. Thanks.
Just wanted to note that this isn't a Windows-only issue, as indicated in the report; it also happens on Mac OS X (and presumably others). Also, it ALWAYS occurs when doing XMLHttpRequests on local files (i.e. file:///...) since there's no server to set the Content-Type. I would vote to suppress the error if no Content-Type is set.
Flags: wanted1.9+
Whiteboard: [wanted-1.9]
Additionally, many apps use XHR to retrieve local (chrome) files... these are often non-XML files, and XML parsing throws an unwanted error. Perhaps throw a warning rather than an error? I note that malformed CSS throws a warning rather than an error, and this type of issue seems, to me, analogous.
(In reply to comment #19)
> I'm also assuming that any behavior changes here would be limited to HTTP/1.0
> or later, of course, so that file:// URIs, HTTP/0.9 responses, etc are not
> affected.

I disagree -- it is also problematic with local files, which many extensions access via chrome://. The workaround I posted in Bug 445388 will work for some, but it is not a real solution to the problem ... it only addresses the symptom.
Per the latest spec, we have to parse the response as XML to determine the charset when the response MIME type is null. http://xhr.spec.whatwg.org/#text-response-entity-body Use .response = 'text' or .overrideMimeType('text/plain') to override the behavior.
Status: UNCONFIRMED → RESOLVED
Closed: 12 years ago
Resolution: --- → INVALID
> .response = 'text' Ugh, .responseType = 'text'.
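For reference, the two workarounds from comment 34 — overrideMimeType('text/plain') and responseType = 'text' — are both standard XMLHttpRequest API. Sketched as a small helper so the intent is explicit (the helper name is illustrative; either call alone suffices to avoid the XML parse error):

```javascript
// Configure an XHR so the engine will not attempt an XML parse on a
// response that arrives without a Content-Type header. Written
// defensively so it can run against any XHR-like object.
function requestAsPlainText(xhr) {
  // Treat the body as text/plain regardless of what (if anything)
  // the server says about its type.
  if (typeof xhr.overrideMimeType === "function") {
    xhr.overrideMimeType("text/plain");
  }
  // Ask only for the text response; no responseXML document is built.
  if ("responseType" in xhr) {
    xhr.responseType = "text";
  }
  return xhr;
}
```

In a page this would be used as: var xhr = new XMLHttpRequest(); requestAsPlainText(xhr); xhr.open('GET', url); xhr.send();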
Component: DOM: Mozilla Extensions → DOM
Component: DOM → DOM: Core & HTML