Last Comment Bug 384298 - XMLHttpRequest response incorrectly assumed to be XML
: XMLHttpRequest response incorrectly assumed to be XML
Status: RESOLVED INVALID
: testcase
Product: Core
Classification: Components
Component: DOM (show other bugs)
: Trunk
: x86 Windows XP
: -- major with 2 votes (vote)
: ---
Assigned To: Nobody; OK to take it and work on it
:
Mentors:
http://www.pixeltamer.net/test/mozbug...
: 445388 518301 (view as bug list)
Depends on:
Blocks: 305243
  Show dependency treegraph
 
Reported: 2007-06-13 06:02 PDT by Carsten Orthbandt
Modified: 2013-04-04 13:53 PDT (History)
16 users (show)
jst: blocking1.9-
reed: wanted1.9+
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments

Description Carsten Orthbandt 2007-06-13 06:02:05 PDT
User-Agent:       Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; Avant Browser; Avant Browser; .NET CLR 1.0.3705; .NET CLR 2.0.50727; .NET CLR 1.1.4322; Media Center PC 4.0; FDM)
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US; rv:1.9a5) Gecko/20070606 GranParadiso/3.0a5

When using XMLHttpRequest from JavaScript to fetch Non-HTML/XML data (JSON for istance) and the server response lacking any content-type specifier, GP assumes XML and throws parsing errors.
According to RFC 2616, Section 7.2.1 a browser SHOULD default to text/plain and MAY try to guess the content type. Here the guessing clearly fails.
I see three ways to solve this:
1. Stick to the RFC and treat responses without an "Content-Type" header field as "text/plain", not causing errors while parsing
2. Make the guessing a bit more sensible. Try to check if the content at hand at least looks like XML and only try to parse it then.
3. Always do the parse but suppress the error messages as they are clearly wrong.

1: my favourite but may cause problems with servers/proxies missing out on the content type with actual XML content
2: seems to be the most sensible and stable approach, but also highest workload
3: performance-wise not very smart and it may be hard to detect which error messages have to be displayed and which not.


Reproducible: Always

Steps to Reproduce:
1. Surf some website that uses JSON and doesn't specify a content type

Actual Results:  
XMLHttpRequest responses with JSON content throw syntax errors in the console

Expected Results:  
The XMLHttpRequest specs say it's fine to send non-XML content, so there should be no XML errors in the console IFF the content type isn't specified as XML.

I think the problem is clear. If somebody would like to test this I can easily mock up a page the demonstrates it. Currently we "fixed" this issue by explicitly returning a content type one each and every request (although this is header overhead one shouldn't need).

I consider this a MAJOR bug because functionality that adhered to the RFC in 1.5 and 2.0 now doesn't. It causes misleading error messages and confuses users. It may even cause website malfunctions (through exceptions that aren't).
Comment 1 Martijn Wargers [:mwargers] (not working for Mozilla) 2007-06-13 07:18:12 PDT
Carsten, do you haven an example page somewhere that shows the issue?
Comment 2 Carsten Orthbandt 2007-06-13 07:51:35 PDT
I've put up a test page at http://www.pixeltamer.net/test/mozbug.html

Expected result: no error message in console.
Actual result: "not well-formed" error in console.

The text file being retrieved doesn't have a content-type attached.
Comment 3 Martijn Wargers [:mwargers] (not working for Mozilla) 2007-06-13 08:12:07 PDT
Thanks for the testcase!
The testcase is working fine in IE7 and Opera9.20.
So it appears to me that this could be fixed here:
http://lxr.mozilla.org/seamonkey/source/content/base/src/nsXMLHttpRequest.cpp#1643
Right, Boris?
Comment 4 Jesse Ruderman 2007-06-13 12:28:33 PDT
Would that break sites that XMLHttpRequest a URL with no content-type header and try to treat the returned data as XML?  (See also bug 311724.)
Comment 5 Boris Zbarsky [:bz] (Out June 25-July 6) 2007-06-13 13:50:07 PDT
We could avoid setting the XML type and instead sniff for XML...

Martin, the relevant code is on line 1714 of rev 1.181 of nsXMLHttpRequest.cpp, not the code you pointed to.  The code you point to is the upload code.

Note hat Firefox 1.5 and 2 simply didn't report XML parse errors in XMLHttpRequest.  Ever.

On trunk, this is sort of a regression from bug 305243.

I should note that there is no bug per the RFC.  The RFC says the UA MAY guess in this case, and we do based on the context.

As for the proposed XMLHttpRequest specs, I would be interested in hearing what they have to say on the matter.  They should probably at least somewhat define it.
Comment 6 Carsten Orthbandt 2007-06-13 20:15:06 PDT
Re Jesse:
I'd strongly suspect that skipping XML parsing if the content-type is missing would break some sites. So sniffing the format is definately the better option.

Re Boris:
This is what http://www.w3.org/TR/XMLHttpRequest/ has to say about it:

"The name of the object is XMLHttpRequest for compatibility with the web, though each component of this name is potentially misleading. First, the object supports any text based format, including XML."

As for the RFC interpretation, there are two options:
- The UA SHOULD treat unknown content as text/plain.
- The UA MAY guess the content type.

Guessing based on context (calling API) is IMHO wrong since
- XMLHttpRequest still follows the HTTP RFC and as such shouldn't change the rules
- The XMLHttpRequest spec says (Section 2.1 responseXML) that the response is only to be treated as XML if Content-Type is "either text/xml, application/xml, or ends in +xml"
- In the same section: "If Content-Type did not contain such a media type, or if the document could not be parsed (due to an XML namespace well-formedness error or unsupported character encoding, for instance), it must be null"

Strictly following the last quote you're theoretically not even ALLOWED to XML-parse the content if it wasn't explicitly type-tagged as XML. Obviously this would be a bad idea though since it would break apps that expect the XML parse to happen even without a content type. I haven't tried it but I would suspect that virtually all browsers provide a valid responseXML if the content is valid XML even without a specified media type.

Comment 7 Anne (:annevk) 2007-06-14 06:45:45 PDT
http://dev.w3.org/cvsweb/~checkout~/2006/webapi/XMLHttpRequest/Overview.html?content-type=text/html;%20charset=utf-8#dfn-responsexml defines exactly what you guys should do. Please follow that and not the version currently on the TR/ page. The difference is that responses _without_ Content-Type are als treated as XML.
Comment 8 Anne (:annevk) 2007-06-14 06:46:23 PDT
Hopefully a new version will be published on TR/ soon reflecting the URL above.
Comment 9 Carsten Orthbandt 2007-06-14 07:31:51 PDT
Given that this unpublished spec-change was done just 3 days ago, I'd still suggest not to break behaviour that was standards-compliant 4 days ago.
I seriously question this spec change since it contradicts RFC 2616 AND the current version, but if it's gonna happen, so be it.
AFAICS, even this new version doesn't imply that an actual error message should be displayed.
Comment 10 Anne (:annevk) 2007-06-14 08:38:39 PDT
If you haven't specified a Content-Type header the behavior that Firefox exhibits (minus other bugs) seems to be correct (trying to parse the response as XML).
Comment 11 Carsten Orthbandt 2007-06-14 08:55:52 PDT
The problem as stated in the initial entry is not that FF3 is trying to parse non-XML content as XML but that it displays an error message. Users notice that and complain.
As per the currently valid spec, content without a Content-Type header is to be treated as text/plain (and I think it should stay that way).
Obviously appcompat-wise it's a good idea to at least try to parse unknown content as XML. But in this case it was an actual decision NOT to send a content type since it is redundant information (the content is text/plain (custom JSON derivative)).
This works fine with every browser except FF3 which logs a syntax error. Since the currently binding spec explicitly says that it is fine to send non-XML without a Content-Type, this is IMHO a bug.
Since I'm not really in the position to fight the pending spec change my proposal is:

Version 1)
If the response has no Content-Type specified, treat it as XML, BUT do not log any parsing errors.

Version 2)
If the response has no Content-Type specified, sniff the content type (look for the <xml> start tag), parse it no matter what BUT ONLY log parsing erros IF sniffing hinted at XML content.

Version 2 might be better appcompat-wise, but in overall behaviour quite vague.

Since you (Anne) are already following this: I think it might be a good idea not only to spec out when fields have to be null and processing stopped but also to spec which of these error conditions should be user-visible. The spec already defines where exceptions have to be thrown so I think it's only logical to also state if certain conditions only prevent further processing or if they are actual errors.
Comment 12 Jonas Sicking (:sicking) PTO Until July 5th 2007-06-14 15:40:43 PDT
We need a decision from the spec here. Please bring it up in the webapi list if you don't like what the spec says.

Leaving nominated for now to see how far away from what we do now the spec ends up.
Comment 13 Julian Reschke 2007-06-15 00:42:53 PDT
Could somebody point out where RFC2616 has a default of "text/plain"? Besides, even if that would be the case, it wouldn't allow parsing as json.
Comment 14 Carsten Orthbandt 2007-06-15 00:49:20 PDT
Sorry, I mixed that up. RFC2616 specifies octetstream, not text/plain.
But this doesn't matter. The point is that the UA complains that the response is NOT in a specific format (XML) while it nowhere said it was.
How the actual response is parsed (and I mentioned it's "custom JSON derivative") is totally up to the application. The UA shouldn't interfere with custom protocols.
Confusing the user with false positives is interfering in my book.
Comment 15 Boris Zbarsky [:bz] (Out June 25-July 6) 2007-06-15 09:03:26 PDT
For what it's worth, I think not logging the error if there is an HTTP channel but no Content-Type header is a pain, since the sink is what makes that decision and the sink is just an XML sink.  I suppose we could subclass nsXMLContentSink to accomplish it...

I also think the content sniffing should be pretty easy to do.  Much easier than the other.  Of course then there's the problem of cross-UA compatible sniffing...

As for RFC2616 as far as I can tell it defaults to application/octet-stream, not text/plain.  But that's a SHOULD level requirement, not a MUST.  The exact text in section 7.2.1 is:

  Any HTTP/1.1 message containing an entity-body SHOULD include a Content-Type
  header field defining the media type of that body. If and only if the media
  type is not given by a Content-Type field, the recipient MAY attempt to
  guess the media type via inspection of its content and/or the name
  extension(s) of the URI used to identify the resource. If the media type
  remains unknown, the recipient SHOULD treat it as type
  "application/octet-stream".

Now this particular server is violating the first SHOULD, because it feels that it has important enough reasons to.  We're violating the second SHOULD, again because we have important enough reasons to.

If it were up to me I would add sniffing, but standardizing that would be a PITA, I suspect...  I do _not_ plan to add a separate sniffer for this (myself; others are welcome to do it), and I bet our generic content sniffer would detect XML in a lot more cases than people would want... and not consistently on different users' machines.
Comment 16 Carsten Orthbandt 2007-06-15 09:24:01 PDT
After a lot of discussion on the webapi the outcome was that the XMLHttpRequest are designed not to detail if an error message should be displayed or not.
Given the three facts that:
- HTTP actually allows responses without a content-type
- The error doesn't influence script execution at all (at least if the response isn't meant to be XML in the first place)
- due to the context (XHR) there may be hundreds of these false positives per minute
I'd suggest to "simply" (haven't looked at the code) disable error loggig if the Content-Type is absent.
Your comment raises the suspicion that the place where the error message is logged doesn't have access to that information. If that is the case and your informed judgement is that subclassing would be the way to go then I'd say: Probably too much hassle. But as far as I followed the code the actual request (and therefor type info) is passed down the chain.
Sniffing may indeed be very difficult and error-prone. Especially with free-form text formats that explicitly do _not_ follow known rules.
Comment 17 Boris Zbarsky [:bz] (Out June 25-July 6) 2007-06-15 09:52:03 PDT
> the place where the error message is logged doesn't have access to that
> information.

It might; I do see a channel being passed to the XML content sink Init() method.  What the sink _doesn't_ know is that it was created to handle an XMLHttpRequest.  All it knows is that it's being asked to handle something someone decided was XML, and it knows the envelope information (HTTP headers, etc).

So changing this behavior _just_ for XMLHttpRequest would be rather difficult.
Comment 18 Carsten Orthbandt 2007-06-15 09:59:24 PDT
Personally, I don't see a problem with suppressing the error message for all cases IF (and only if) no Content-Type was given.
If the supposed XML content came from a direct request without a XML type, the HTTP RFC prefers transparent behaviour as well (although things are a bit more complicated because of the neccessary sniffing for dumb file-servers).
Comment 19 Boris Zbarsky [:bz] (Out June 25-July 6) 2007-06-15 10:07:23 PDT
The point is that if no Content-Type was given and we ended up in the XML parser code then someone is either forcing a parse (and might hence want the error message) or we detected the content as XML via sniffing (and should show the error message as we would for any other XML).

I'm also assuming that any behavior changes here would be limited to HTTP/1.0 or later, of course, so that file:// URIS, HTTP/0.9 responses, etc are not affected.
Comment 20 Johnny Stenback (:jst, jst@mozilla.com) 2007-07-02 16:56:41 PDT
Not a blocker, but wanted for 1.9.
Comment 21 red Owl 2007-07-06 16:51:19 PDT
I have this problem XMLHttpRequest NOT parse responseXML
I use DOM parser (bad solution) xml dom parser TRUNCATE my data
no problem in IE

code XML :
<?xml version="1.0" encoding="UTF-8"  ?>
<kkdb>
<ra>8</ra>
<titolo>pippo</titolo>
<tx>13</tx>
<texto>my data is 20 kb</texto>
<pubblic>06-07-2007</pubblic>
<ar>11</ar>
</kkdb>


code javascript 

function xmlhttpPost(idd) {
    var strURL;
    var content;
    var oggh;
    var setta;
    strURL = tinyMCE.documentBasePath + '/' + tinyMCE.getParam('savajaxurl',null) + '&plg=AJXOPEN';
// replace in file.xml
    // strURL = tinyMCE.getParam('savajaxurl',null) + '&plg=AJXOPEN';
          // alert(strURL);
    var xmlHttpReq = false;

    // Xhr per Mozilla/Safari/Ie7
    if (window.XMLHttpRequest) {
        xmlHttpReq = new XMLHttpRequest();
    }
    // per tutte le altre versioni di IE
    else if (window.ActiveXObject) {
        xmlHttpReq = new ActiveXObject("Microsoft.XMLHTTP");
    }

    xmlHttpReq.open('POST', strURL, true);
    //document.write(strURL);

    xmlHttpReq.setRequestHeader('Content-Type', 'application/x-www-form-urlencoded');
    xmlHttpReq.onreadystatechange = function() {
        if (xmlHttpReq.readyState == 4) {
               // tinyMCE.selectedInstance.setHTML('ucc' + xmlHttpReq.responseText);
                // alert(xmlHttpReq.responseText);    read xml output
                xmlpp(xmlHttpReq.responseText);
         }
         else{
         // alert(self.xmlHttpReq.readyState);
            tinyMCE.selectedInstance.setHTML('LOADING');
         }
    }
  xmlHttpReq.send(getquerystring(idd));
   return true;
}

function getquerystring(idd){
    contentcd = '&id=' + idd;
     return contentcd;
}

function echi(idd){
 ddd = xmlhttpPost(idd);
 // setInterval('window.close()', 2000);
}

function xmlpp(tex) {
if (window.ActiveXObject)
  {
  var doc=new ActiveXObject("Microsoft.XMLDOM");
  doc.async="false";
  doc.loadXML(tex);
  }
// code for Mozilla, Firefox, Opera, etc.
else
  {
  var parser=new DOMParser();
  var doc=parser.parseFromString(tex,"text/xml");
  }

var inst = tinyMCE.selectedInstance;
var formObj = inst.formElement.form;
var xccc=doc.documentElement;
 for(var x=0,objs=formObj.elements;x<objs.length;x++){
    objs[x].tagName
  // if(xccc.getElementsByTagName(objs[x].name)[0].firstChild.nodeValue!= undefined){
  switch(objs[x].tagName){
          case "INPUT":
                 formObj.elements[x].value = xccc.getElementsByTagName(objs[x].name)[0].firstChild.nodeValue;
          break;
          case "SELECT":
                 var current_season = xccc.getElementsByTagName(objs[x].name)[0].firstChild.nodeValue
                 for(var i=0;i<formObj.elements[x].length;i++){
                   if(formObj.elements[x].options[i].value == current_season){
                      formObj.elements[x].options[i].selected = true;
                    }
                 }
          break;
          case "CHECKBOX":

          break;
          case "RADIO":

          break;

     }
  // }


         tinyMCE.selectedInstance.setHTML(xccc.getElementsByTagName('texto')[0].firstChild.nodeValue);
// this contain big? data (20kb) not bynary
 }

return true;
}

contact me in any case
trancks
Comment 22 Tom Robinson 2007-10-28 22:40:31 PDT
Just wanted to note that this isn't a Windows-only issue, as indicated on the report, it also happens on Mac OS X (and presumably others).

Also, it ALWAYS occurs when doing XMLHttpRequests on local files (i.e. file:///...) since there's no server to set the Content-Type.

I would vote to suppress the error if no Content-Type is set.
Comment 23 Mozzy 2008-08-03 05:48:07 PDT
Additionally, may apps use XHR to retrieve local (chrome) files... these are often non XML files and XML parsing throws an unwanted error.

Perhaps throw a warning rather than an error?  I note that malformed CSS errors throw a warning rather than error, and this type of issue seems, to me, analogous.
Comment 24 Christoph Dorn 2008-08-04 13:32:56 PDT
*** Bug 445388 has been marked as a duplicate of this bug. ***
Comment 25 Mozzy 2008-08-04 14:03:13 PDT
(In reply to comment #19)

> I'm also assuming that any behavior changes here would be limited to HTTP/1.0
> or later, of course, so that file:// URIS, HTTP/0.9 responses, etc are not
> affected.

I disagree -- it is also problematic with local files, which many extensions access via chrome://  The workaround I posted in Bug 445388 will work for some, but it is not a real solution to the problem ... only the symptom.



Comment 26 Jesse Ruderman 2013-02-17 13:18:58 PST
*** Bug 518301 has been marked as a duplicate of this bug. ***
Comment 27 Masatoshi Kimura [:emk] 2013-02-17 17:25:57 PST
Per the latest spec, we have to parse the response as xml to determine the charset when the response MIME type is null.
http://xhr.spec.whatwg.org/#text-response-entity-body
Use .response = 'text' or .overrideMimeType('text/plain') to override the behavior.
Comment 28 Masatoshi Kimura [:emk] 2013-02-17 17:28:05 PST
> .response = 'text'
Ugh, .responseType = 'text'.

Note You need to log in before you can comment on or make changes to this bug.