Last Comment Bug 289714 - Loaded-as-data XML documents should not generate <parsererror>
: Loaded-as-data XML documents should not generate <parsererror>
Status: NEW
:
Product: Core
Classification: Components
Component: XML (show other bugs)
: Trunk
: x86 Linux
: P3 normal with 4 votes (vote)
: mozilla1.9beta1
Assigned To: Olli Pettay [:smaug] (high review load, please consider other reviewers)
:
Mentors:
: 837463 (view as bug list)
Depends on: 45566 251354 399502
Blocks: 313854
  Show dependency treegraph
 
Reported: 2005-04-09 11:17 PDT by Boris Zbarsky [:bz] (Out June 25-July 6)
Modified: 2016-06-18 10:11 PDT (History)
22 users (show)
reed: wanted1.9+
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
DOMParser throws (37.45 KB, patch)
2007-10-17 07:38 PDT, Olli Pettay [:smaug] (high review load, please consider other reviewers)
Ms2ger: review-
Details | Diff | Review
289714-trigger-detailed-onerror-on-failed-xml-xhrs-instead-of-adding-a-parsererror-node.diff (30.64 KB, patch)
2016-06-02 22:05 PDT, Thomas Wisniewski
no flags Details | Diff | Review
289714-null-responseXML-for-invalid-xml-xhrs.diff (13.60 KB, patch)
2016-06-04 14:59 PDT, Thomas Wisniewski
wisniewskit: review? (peterv)
Details | Diff | Review
918703-make-console-xml-parsing-notices-more-informative.diff (2.79 KB, patch)
2016-06-04 17:21 PDT, Thomas Wisniewski
wisniewskit: review? (peterv)
Details | Diff | Review
289714-fix-gmp-test.diff (1.95 KB, patch)
2016-06-16 20:28 PDT, Thomas Wisniewski
no flags Details | Diff | Review
289714-fix-xpcshell-tests.diff (4.91 KB, patch)
2016-06-16 20:31 PDT, Thomas Wisniewski
no flags Details | Diff | Review

Description Boris Zbarsky [:bz] (Out June 25-July 6) 2005-04-09 11:17:25 PDT
When we have a well-formedness error in an XML doc loaded as data, should we
really generate a <parseerror> node?  Or should we just clear out the DOM and
return?  It seems like the latter would be a better idea; the question is
whether people out in the wild seriously depend on the former behavior...

Thoughts?  I think this should be pretty easy to fix if we decide to do it (esp.
once bug 251354 is fixed).
Comment 1 Jonas Sicking (:sicking) PTO Until July 5th 2005-04-09 12:30:07 PDT
I do actually think people depend on the current behaviour. But I also think
it'd be ok to break them as long as we provided some other mechanism for
detecting the parse error. For synconous loads like xmlhttprequest (and doc.load
once we do that) we should IMHO throw an exception.
Comment 2 Boris Zbarsky [:bz] (Out June 25-July 6) 2005-04-09 12:37:44 PDT
Sync document.load() returns a boolean to indicate success or failure; it can
return false (as it does now) on parse error.

For xmlhttprequest, don't we already have a way to report errors?  We should use
it if we do.

I agree that we should have a way to indicate whether the document was parsed
correctly, though.  Maybe even add a boolean property on some document interface?
Comment 3 Jonas Sicking (:sicking) PTO Until July 5th 2005-04-09 12:54:32 PDT
I think that if all the methods of loading indicate failure somehow, then it's
enough to just empty the document on failure since a successful load can never
result in an empty document. Though of course an additional flag wouldn't hurt,
but it might get tricky to figure out when to clear that flag...
Comment 4 Boris Zbarsky [:bz] (Out June 25-July 6) 2005-04-09 13:06:14 PDT
OK, makes sense.  Will look into doing this, then.
Comment 5 Jonas Sicking (:sicking) PTO Until July 5th 2005-04-09 13:51:34 PDT
Oh, and of course, we should check how IE deals with errors.
Comment 6 Bob Clary [:bc:] 2005-04-09 13:54:06 PDT
I don't think XMLHttpRequest has any method other than the <parseerror> element
to report the actual error. If you are going to do something else, why not
standardize on what MS does:
<http://msdn.microsoft.com/library/default.asp?url=/library/en-us/xmlsdk/html/xmproparseerror.asp>
Comment 7 Boris Zbarsky [:bz] (Out June 25-July 6) 2005-04-09 14:00:55 PDT
> I don't think XMLHttpRequest has any method other than the <parseerror> element
> to report the actual error.

It has an onerror handler, looks like.  Not completely trivial to use from C++,
but really easy from JS.

> why not standardize on what MS does

It's a thought.  Unfortunately, it's not very clearly documented.  For example,
the errorCode property is undocumented.  Do we need compat in errorCode values
if we implement this?  The filepos property doesn't say what it means by
"absolute file position" (is that bytes?  Or what?).  
Comment 8 Bob Clary [:bc:] 2005-04-09 15:04:39 PDT
(In reply to comment #7)

> It has an onerror handler, looks like.  Not completely trivial to use from C++,
> but really easy from JS.

True, but what error information do you get? 

> It's a thought.  Unfortunately, it's not very clearly documented.  For example,
> the errorCode property is undocumented.  Do we need compat in errorCode values
> if we implement this?  

I wouldn't think so, but you can start with 
1)
<http://msdn.microsoft.com/library/default.asp?url=/library/en-us/xmlsdk/html/PushModelParser_ErrorCodes.asp>
2)
<http://msdn.microsoft.com/library/default.asp?url=/library/en-us/wcexmldm/html/cerefIXMLParseErrorErrorMessages.asp>

1) is deprecated and 2) is supposedly CE .Net 4.2, but they share values and I
expect are standardized across MSXML implementations.

I would think that we would not have to be identical to their errorCode, nor in
their reason so long as we agreed that non-zero errorCode was an error.

The filepos property doesn't say what it means by
> "absolute file position" (is that bytes?  Or what?).  

If you are really interested, I can experiment and try to find out.

Comment 9 Boris Zbarsky [:bz] (Out June 25-July 6) 2005-04-10 08:38:59 PDT
> True, but what error information do you get? 

Not much, really.  Just that there was an error.  Just like the return value
from document.load.

> If you are really interested, I can experiment and try to find out.

If people don't rely on it, I'm not that interested... if they do, I sort of am.
Comment 10 Boris Zbarsky [:bz] (Out June 25-July 6) 2007-02-04 14:40:55 PST
I think we should make this change, and make it as early as we can.

So the things we currently load as data are:  XMLHttpRequest, DOMParser, anything loaded by the syncload service, XMLDocument.load, and some xforms and xslt stuff.

I think we should treat these as following:

  XMLHttpRequest:  Null responseXML (already happens if the response is not XML)
                   and dispatch an error event.  For sync, throw.
  DOMParser:  Throw.
  Async XMLDocument.load: error event
  syncload service: throw
  XForms: ???
  XSLT:  ???

We seem to already have some code in our tree that assumes it should look for a parsererror tag.  We'll have to fix that, of course.

I do think we should do this asap so sites that depend on this have time to get fixed.
Comment 11 Jonas Sicking (:sicking) PTO Until July 5th 2007-03-02 17:13:05 PST
This is really annoying, there's no reason we should keep doing this for yet another release.
Comment 12 Peter Van der Beken [:peterv] 2007-06-20 11:15:26 PDT
Pushing.
Comment 13 Olli Pettay [:smaug] (high review load, please consider other reviewers) 2007-09-27 06:13:14 PDT
Let's see if I manage to do this for M9. I should.
Comment 14 Olli Pettay [:smaug] (high review load, please consider other reviewers) 2007-10-02 09:03:12 PDT
Safari has also some kind of error document, which for example DOMParser
returns. Opera throws LSException "PARSE_ERR".
Neither of those, nor IE, seem to use onerror of XHR to indicate error in parsing.
Comment 15 Olli Pettay [:smaug] (high review load, please consider other reviewers) 2007-10-08 06:04:18 PDT
So I'm not so sure how important this is or what should be done in each case.
Comment 16 Boris Zbarsky [:bz] (Out June 25-July 6) 2007-10-08 07:26:41 PDT
With XHR you could just set the document to null, I guess.
Comment 17 Jesse Ruderman 2007-10-10 19:51:30 PDT
I'll have to make changes to https://www.squarefree.com/userscripts/valid-xhtml.user.js (assuming this fixes bug 45566), but I'm fine with that.
Comment 18 Olli Pettay [:smaug] (high review load, please consider other reviewers) 2007-10-15 09:50:10 PDT
Bug 399502 should be fixed first to get events from xmldoc.load()
Comment 19 Olli Pettay [:smaug] (high review load, please consider other reviewers) 2007-10-17 07:32:47 PDT
As far as I see, XSLT generates the error page only if the page is loaded
to be visible (not as data). Document loading should be cleaned up a bit, but
for 1.9 XSLT behaves well enough, IMO.
So:
  XMLHttpRequest:  Null responseXML
  DOMParser:  Throw LSException like Opera
  Async XMLDocument.load: error event
  syncload service: throw
  XForms: Handles internally, report error to console as done already now.
  XSLT: when used in scripts, do whatever documented in nsIXSLTProcessor.idl
        (throw nsIXSLTException)

DOMParser could also just return null document?
Comment 20 Olli Pettay [:smaug] (high review load, please consider other reviewers) 2007-10-17 07:38:10 PDT
Created attachment 285230 [details] [diff] [review]
DOMParser throws

Peter, any comments?
Comment 21 Jesse Ruderman 2007-10-17 10:24:11 PDT
Please include information about the XML parsing error in the exception: line number, column number, reason.  https://www.squarefree.com/userscripts/valid-xhtml.user.js needs that information.
Comment 22 Olli Pettay [:smaug] (high review load, please consider other reviewers) 2007-10-17 10:46:43 PDT
Ah, true, that would be useful.
Comment 23 Olli Pettay [:smaug] (high review load, please consider other reviewers) 2007-10-17 12:11:54 PDT
Though, I don't know how to do it, or can't find any reasonable way to do it.
Comment 24 Damon Sicore (:damons) 2007-10-17 13:53:32 PDT
We've shipped this forever, right?  Why should this block beta?
Comment 25 Jonas Sicking (:sicking) PTO Until July 5th 2007-10-17 13:57:30 PDT
Yeah, this should not block beta. I'd even say it shouldn't even block the release at this point, but since there is a patch we might as well drive it in.
Comment 26 Damon Sicore (:damons) 2007-10-17 14:09:53 PDT
I'm minusing.  That doesn't mean it won't go in, but we need to prioritize accordingly.  Anyone who objects can renom.
Comment 27 Boris Zbarsky [:bz] (Out June 25-July 6) 2007-10-17 14:58:26 PDT
I thought the idea was that this is a pretty major change in behavior so we should get it shipped in beta so that web sites can adjust to it as needed.  That's if we're doing it for 1.9, of course.

Put another way, I would argue that if this doesn't make beta we shouldn't take it for 1.9.
Comment 28 Jonas Sicking (:sicking) PTO Until July 5th 2007-10-17 15:08:39 PDT
I don't think this is a big enough change that if it doesn't make first beta we can't include it later. Most sites probably has no error handling at all so they assume that the XML is wellformed, and likely only ever gets that.
Comment 29 Olli Pettay [:smaug] (high review load, please consider other reviewers) 2007-10-17 23:54:14 PDT
(In reply to comment #21)
> Please include information about the XML parsing error in the exception: line
> number, column number, reason. 

The exception should contain also the JS line number etc where
for example parseFromString is called. Should the stack (nsIException.location)
contain the information about xml parsing error or perhaps inner exception
(nsIException.inner)? Suggestions?
Comment 30 :Ms2ger 2012-02-12 06:43:15 PST
Comment on attachment 285230 [details] [diff] [review]
DOMParser throws

We can't throw for DOMParser parsing errors.
Comment 31 Robert Longson 2013-02-03 02:39:00 PST
*** Bug 837463 has been marked as a duplicate of this bug. ***
Comment 32 Brian Bi 2013-02-03 10:09:35 PST
What's the status of this now? I filed 837463 and it got marked as a duplicate of this, but there hasn't been any activity here for nearly a year.
Comment 33 Thomas Wisniewski 2016-06-02 22:05:28 PDT
Created attachment 8759512 [details] [diff] [review]
289714-trigger-detailed-onerror-on-failed-xml-xhrs-instead-of-adding-a-parsererror-node.diff

I decided to try my hand at unraveling this again, based on the comments in this bug. To summarize, this patch triggers onerror instead of onload for an XHR that tries and fails to parse its response as XML. The onerror ProgressEvent is also given a sub-object (tentatively named relatedError) with the related parsing failure's info (message, source, source line, and line/column number). This seems to be in-line with what the comments in this bug were aiming at.

However, there are some web platform test issues that I believe need clarification. Firstly, if the XML response body is an empty string, the second test in XMLHttpRequest/responsexml-basic.htm expects the responseXML to be nulled out and for onload to be triggered. This patch would trigger onerror with a "no element found" relatedError, so I added a check to allow 0-length XML responses to be treated as this test currently requires. I'm not sure if the test should be changed instead, though.

There is also DOMParser-parseFromString-xml.html, which actually expects <parsererror> nodes for DOMParser() failures. This runs counter to what the XMLHttpRequest web platform tests expect. However, I'm again not sure if it would be best to change the tests, and have DOMParser() throw exceptions with relatedErrors (or something along those lines). As such I opted to just honor the current tests and have DOMParser() continue to add <parsererror> nodes. This of course makes the patch a bit more complex; a new SetDocumentFlags() call is used by XHRs specifically to tell the XML content sink not to add a <parsererror> node to the resulting document. DOMParser() of course doesn't make this call.

There are also a couple of web platform tests (and one mochitest) that were timing out, simply because they're sending text/plain responses without setting their Content-Type header to reflect that. As such, the XHR expected XML content, failed to parse it, and onerror was triggered by this patch. The tests didn't even think to check for an onerror case, and so they sat there until they timed out. I altered the relevant tests so they return the correct content type, and to also check for and treat onerror as a failure, so they wouldn't have to time out.

Finally, there is XMLHttpRequest/responsetext-decoding.htm. With this patch, four of its tests time out similarly, but it seems like they may intend to treat the content-type as XML. The problem is that their responses are only a BOM or a BOM followed by another BOM, which this patch treats as a non-0-length case for onerror. However, the tests don't seem like they should care whether onload or onerror are fired, as long as the responseText is set appropriately, and so I've simply altered them to pass in either case. I'm not at all sure if that's correct, though.

As for my patch itself, I opted to save the nsIScriptError from the XML content sink on the related nsIDocument during parse-time, then save it on the XMLHttpRequest for future use during the onerror code (since the pointed to the nsIDocument can be nulled before then). I'm not sure if that's the most graceful or appropriate approach, as I'm still new to the codebase.

As mentioned before, I also added SetDocumentLoadFlags() to handle the DOMParser()/XMLHttpRequest divide, since this seemed like the easiest and most obvious way to handle that case. I'm not sure if another similar mechanism would be more appropriate, though.

I'm also not sure if adding a new WebIDL type for ParseErrors is the way to go, but it seemed to be the path of least complexity, so I thought I'd start with that.
Comment 34 Hallvord R. M. Steen [:hallvors] 2016-06-03 13:19:47 PDT
Per the XMLHttpRequest spec:

https://xhr.spec.whatwg.org/#document-response
"let document be a document that represents the result of running the XML parser with XML scripting support disabled on received bytes. If that fails (unsupported character encoding, namespace well-formedness error, etc.), then return null."

So no XML parsing failures will (per spec) cause XHR error events, as far as I can tell.

IMHO it would be nice to align DOMParser and not create those parsererrror elements - but my opinion shouldn't count for much here, I'll rather ask Boris to chime in on this.

Triggering XHR.onerror for responses sent without content-type (and assumed to be XML): no, please. This is not likely to be web compatible.

Maybe silly question: why do we make an effort to make error details available to JS at all? Can't we just report the details in a console error message and be done? Perhaps exposing those details made sense in 2005, but the web has moved on and XML is much less important. JS-implemented XML error handlers is probably a very rare beast..
Comment 35 Thomas Wisniewski 2016-06-03 13:30:44 PDT
Sure, if all that's desired is to null out the document before onload, then this patch would certainly become much simpler.

You could be right the exposing the error in a more fine-grained manner may no longer matter much, since I basing that off comment #21 (from 2007). I was concerned because while the console does show a terse error message indicating there was a parsing failure, it doesn't give you specifics (which XHR had the parse failure, what the source was, etc). Perhaps that should be extended instead?
Comment 36 Hallvord R. M. Steen [:hallvors] 2016-06-03 13:59:55 PDT
I think reporting more info in the console would be great.
(Right now we often output utterly confusing and maddening "No element found" error messages with no context. There are plenty of questions on Stackoverflow regarding what this error means, improving it would be terrific).

annevk mentioned on IRC that the parsererror element thing is actually added to the DOM parsing standard now - I think it's a wart, but it was considered necessary for web compat so..
https://w3c.github.io/DOM-Parsing/#the-domparser-interface
Comment 37 Thomas Wisniewski 2016-06-04 14:59:42 PDT
Created attachment 8760015 [details] [diff] [review]
289714-null-responseXML-for-invalid-xml-xhrs.diff

Alright, here's a much simpler version of the patch which simply nulls out responseXML upon parsing failures, rather than doing anything with onerror. I'll see if I can find a way to improve the console messages, but I'm not sure what would be best. Perhaps even just prefixing them with "Error parsing XML: " would be a good start?
Comment 38 Hallvord R. M. Steen [:hallvors] 2016-06-04 15:23:38 PDT
"Error parsing XML: no element found" would be a great improvement, especially if it's also clearly associated with a URL. I'd suggest changing the wording slightly and say "no root element found".

Also, is un-labelled XML a thing? Do other browsers assume XML if a site sends XML without sending a content-type? Perhaps assuming text/plain would work better :)
Comment 39 Thomas Wisniewski 2016-06-04 17:21:46 PDT
Created attachment 8760023 [details] [diff] [review]
918703-make-console-xml-parsing-notices-more-informative.diff

True, browsers don't assume XML in that case, as Chrome and Firefox currently return response==responseText and responseXML=null for unspecified content-type responses. My earlier patch was just flawed and I didn't realize it at the time. The new patch doesn't have that kind of issue, thankfully.

As for changing the console error, this patch seems to do the job. The messages now have this format:
  XML Parsing Error: no root element found
  Location: http://w3c-test.org/XMLHttpRequest/resources/status.py?type=application/xml&content=
  Line Number 1, Column 1:

Of course the line/column number info is superfluous, given that it's repeated in the little link to the far right of such messages, but I'd rather re-use this format string since it's already in the localization file.
Comment 40 Boris Zbarsky [:bz] (Out June 25-July 6) 2016-06-06 12:59:34 PDT
> I'll rather ask Boris to chime in on this

I don't know that I have any new opinions on this since my last comments in the bug, modulo checking what other browsers do and hence what web compat is likely to require.

> why do we make an effort to make error details available to JS at all

You mean why is there a <parseerror> thing in general?  Because that XML parser behavior was designed around an XML parser parsing content in a browsing context (i.e. yellow screen of death) and the idea was to show the user something at least moderately more useful than just a blank page, I think....  The side-effects on XHR and whatnot were not intended, but the parser behavior predated those things existing, so...

> Also, is un-labelled XML a thing? 

In general (i.e. not just over HTTP), sure.  Happens all the time for local files, right?  ;)

If you create a file with no extension containing this text and fetch it via XHR to a file:// URI, what should happen and why?  What should happen if its extension is .docx?

<?xml version="1.0"?>
<root>aaa</root>

and then try to fetch it via XHR
Comment 41 Thomas Wisniewski 2016-06-16 20:28:22 PDT
Created attachment 8763159 [details] [diff] [review]
289714-fix-gmp-test.diff

A try run uncovered two bugs: https://treeherder.mozilla.org/#/jobs?repo=try&revision=f83f3523f74a

This patch fixes the first, where GMP has an "empty" XML document that's not really an XML document at all (no root element). Looking at the test, it's fine to just add an empty root element, since it's currently passing because there's a parsererror that it's ignoring.
Comment 42 Thomas Wisniewski 2016-06-16 20:31:27 PDT
Created attachment 8763160 [details] [diff] [review]
289714-fix-xpcshell-tests.diff

This patch fixes the other related failures in the above try run, where XPC shell tests are reading a non-XML file to test overriding the mime type, which of course fails because now the responseXML is null. As such I change the test to read an actual XML file and make sure it's still overriding the mime type as expected.

A new try run shows that there are only failures that seem completely unrelated to me (running them locally gives different results, which are the same results that I get without this patchset applied): https://treeherder.mozilla.org/#/jobs?repo=try&revision=dab588e6c61e
Comment 43 Thomas Wisniewski 2016-06-18 10:11:38 PDT
Alright, I've requested review for the non-test-change patches, as I think this is about where it needs to be.

Note You need to log in before you can comment on or make changes to this bug.