When we have a well-formedness error in an XML doc loaded as data, should we
really generate a <parseerror> node? Or should we just clear out the DOM and
return? It seems like the latter would be a better idea; the question is
whether people out in the wild seriously depend on the former behavior...
Thoughts? I think this should be pretty easy to fix if we decide to do it (esp.
once bug 251354 is fixed).
I do actually think people depend on the current behaviour. But I also think
it'd be ok to break them as long as we provided some other mechanism for
detecting the parse error. For synconous loads like xmlhttprequest (and doc.load
once we do that) we should IMHO throw an exception.
Sync document.load() returns a boolean to indicate success or failure; it can
return false (as it does now) on parse error.
For xmlhttprequest, don't we already have a way to report errors? We should use
it if we do.
I agree that we should have a way to indicate whether the document was parsed
correctly, though. Maybe even add a boolean property on some document interface?
I think that if all the methods of loading indicate failure somehow, then it's
enough to just empty the document on failure since a successful load can never
result in an empty document. Though of course an additional flag wouldn't hurt,
but it might get tricky to figure out when to clear that flag...
OK, makes sense. Will look into doing this, then.
Oh, and of course, we should check how IE deals with errors.
I don't think XMLHttpRequest has any method other than the <parseerror> element
to report the actual error. If you are going to do something else, why not
standardize on what MS does:
> I don't think XMLHttpRequest has any method other than the <parseerror> element
> to report the actual error.
It has an onerror handler, looks like. Not completely trivial to use from C++,
but really easy from JS.
> why not standardize on what MS does
It's a thought. Unfortunately, it's not very clearly documented. For example,
the errorCode property is undocumented. Do we need compat in errorCode values
if we implement this? The filepos property doesn't say what it means by
"absolute file position" (is that bytes? Or what?).
(In reply to comment #7)
> It has an onerror handler, looks like. Not completely trivial to use from C++,
> but really easy from JS.
True, but what error information do you get?
> It's a thought. Unfortunately, it's not very clearly documented. For example,
> the errorCode property is undocumented. Do we need compat in errorCode values
> if we implement this?
I wouldn't think so, but you can start with
1) is deprecated and 2) is supposedly CE .Net 4.2, but they share values and I
expect are standardized across MSXML implementations.
I would think that we would not have to be identical to their errorCode, nor in
their reason so long as we agreed that non-zero errorCode was an error.
The filepos property doesn't say what it means by
> "absolute file position" (is that bytes? Or what?).
If you are really interested, I can experiment and try to find out.
> True, but what error information do you get?
Not much, really. Just that there was an error. Just like the return value
> If you are really interested, I can experiment and try to find out.
If people don't rely on it, I'm not that interested... if they do, I sort of am.
I think we should make this change, and make it as early as we can.
So the things we currently load as data are: XMLHttpRequest, DOMParser, anything loaded by the syncload service, XMLDocument.load, and some xforms and xslt stuff.
I think we should treat these as following:
XMLHttpRequest: Null responseXML (already happens if the response is not XML)
and dispatch an error event. For sync, throw.
Async XMLDocument.load: error event
syncload service: throw
We seem to already have some code in our tree that assumes it should look for a parsererror tag. We'll have to fix that, of course.
I do think we should do this asap so sites that depend on this have time to get fixed.
This is really annoying, there's no reason we should keep doing this for yet another release.
Let's see if I manage to do this for M9. I should.
Safari has also some kind of error document, which for example DOMParser
returns. Opera throws LSException "PARSE_ERR".
Neither of those, nor IE, seem to use onerror of XHR to indicate error in parsing.
So I'm not so sure how important this is or what should be done in each case.
With XHR you could just set the document to null, I guess.
I'll have to make changes to https://www.squarefree.com/userscripts/valid-xhtml.user.js (assuming this fixes bug 45566), but I'm fine with that.
Bug 399502 should be fixed first to get events from xmldoc.load()
As far as I see, XSLT generates the error page only if the page is loaded
to be visible (not as data). Document loading should be cleaned up a bit, but
for 1.9 XSLT behaves well enough, IMO.
XMLHttpRequest: Null responseXML
DOMParser: Throw LSException like Opera
Async XMLDocument.load: error event
syncload service: throw
XForms: Handles internally, report error to console as done already now.
XSLT: when used in scripts, do whatever documented in nsIXSLTProcessor.idl
DOMParser could also just return null document?
Created attachment 285230 [details] [diff] [review]
Peter, any comments?
Please include information about the XML parsing error in the exception: line number, column number, reason. https://www.squarefree.com/userscripts/valid-xhtml.user.js needs that information.
Ah, true, that would be useful.
Though, I don't know how to do it, or can't find any reasonable way to do it.
We've shipped this forever, right? Why should this block beta?
Yeah, this should not block beta. I'd even say it shouldn't even block the release at this point, but since there is a patch we might as well drive it in.
I'm minusing. That doesn't mean it won't go in, but we need to prioritize accordingly. Anyone who objects can renom.
I thought the idea was that this is a pretty major change in behavior so we should get it shipped in beta so that web sites can adjust to it as needed. That's if we're doing it for 1.9, of course.
Put another way, I would argue that if this doesn't make beta we shouldn't take it for 1.9.
I don't think this is a big enough change that if it doesn't make first beta we can't include it later. Most sites probably has no error handling at all so they assume that the XML is wellformed, and likely only ever gets that.
(In reply to comment #21)
> Please include information about the XML parsing error in the exception: line
> number, column number, reason.
The exception should contain also the JS line number etc where
for example parseFromString is called. Should the stack (nsIException.location)
contain the information about xml parsing error or perhaps inner exception
Comment on attachment 285230 [details] [diff] [review]
We can't throw for DOMParser parsing errors.
*** Bug 837463 has been marked as a duplicate of this bug. ***
What's the status of this now? I filed 837463 and it got marked as a duplicate of this, but there hasn't been any activity here for nearly a year.
Created attachment 8759512 [details] [diff] [review]
I decided to try my hand at unraveling this again, based on the comments in this bug. To summarize, this patch triggers onerror instead of onload for an XHR that tries and fails to parse its response as XML. The onerror ProgressEvent is also given a sub-object (tentatively named relatedError) with the related parsing failure's info (message, source, source line, and line/column number). This seems to be in-line with what the comments in this bug were aiming at.
However, there are some web platform test issues that I believe need clarification. Firstly, if the XML response body is an empty string, the second test in XMLHttpRequest/responsexml-basic.htm expects the responseXML to be nulled out and for onload to be triggered. This patch would trigger onerror with a "no element found" relatedError, so I added a check to allow 0-length XML responses to be treated as this test currently requires. I'm not sure if the test should be changed instead, though.
There is also DOMParser-parseFromString-xml.html, which actually expects <parsererror> nodes for DOMParser() failures. This runs counter to what the XMLHttpRequest web platform tests expect. However, I'm again not sure if it would be best to change the tests, and have DOMParser() throw exceptions with relatedErrors (or something along those lines). As such I opted to just honor the current tests and have DOMParser() continue to add <parsererror> nodes. This of course makes the patch a bit more complex; a new SetDocumentFlags() call is used by XHRs specifically to tell the XML content sink not to add a <parsererror> node to the resulting document. DOMParser() of course doesn't make this call.
There are also a couple of web platform tests (and one mochitest) that were timing out, simply because they're sending text/plain responses without setting their Content-Type header to reflect that. As such, the XHR expected XML content, failed to parse it, and onerror was triggered by this patch. The tests didn't even think to check for an onerror case, and so they sat there until they timed out. I altered the relevant tests so they return the correct content type, and to also check for and treat onerror as a failure, so they wouldn't have to time out.
Finally, there is XMLHttpRequest/responsetext-decoding.htm. With this patch, four of its tests time out similarly, but it seems like they may intend to treat the content-type as XML. The problem is that their responses are only a BOM or a BOM followed by another BOM, which this patch treats as a non-0-length case for onerror. However, the tests don't seem like they should care whether onload or onerror are fired, as long as the responseText is set appropriately, and so I've simply altered them to pass in either case. I'm not at all sure if that's correct, though.
As for my patch itself, I opted to save the nsIScriptError from the XML content sink on the related nsIDocument during parse-time, then save it on the XMLHttpRequest for future use during the onerror code (since the pointed to the nsIDocument can be nulled before then). I'm not sure if that's the most graceful or appropriate approach, as I'm still new to the codebase.
As mentioned before, I also added SetDocumentLoadFlags() to handle the DOMParser()/XMLHttpRequest divide, since this seemed like the easiest and most obvious way to handle that case. I'm not sure if another similar mechanism would be more appropriate, though.
I'm also not sure if adding a new WebIDL type for ParseErrors is the way to go, but it seemed to be the path of least complexity, so I thought I'd start with that.
Per the XMLHttpRequest spec:
"let document be a document that represents the result of running the XML parser with XML scripting support disabled on received bytes. If that fails (unsupported character encoding, namespace well-formedness error, etc.), then return null."
So no XML parsing failures will (per spec) cause XHR error events, as far as I can tell.
IMHO it would be nice to align DOMParser and not create those parsererrror elements - but my opinion shouldn't count for much here, I'll rather ask Boris to chime in on this.
Triggering XHR.onerror for responses sent without content-type (and assumed to be XML): no, please. This is not likely to be web compatible.
Maybe silly question: why do we make an effort to make error details available to JS at all? Can't we just report the details in a console error message and be done? Perhaps exposing those details made sense in 2005, but the web has moved on and XML is much less important. JS-implemented XML error handlers is probably a very rare beast..
Sure, if all that's desired is to null out the document before onload, then this patch would certainly become much simpler.
You could be right the exposing the error in a more fine-grained manner may no longer matter much, since I basing that off comment #21 (from 2007). I was concerned because while the console does show a terse error message indicating there was a parsing failure, it doesn't give you specifics (which XHR had the parse failure, what the source was, etc). Perhaps that should be extended instead?
I think reporting more info in the console would be great.
(Right now we often output utterly confusing and maddening "No element found" error messages with no context. There are plenty of questions on Stackoverflow regarding what this error means, improving it would be terrific).
annevk mentioned on IRC that the parsererror element thing is actually added to the DOM parsing standard now - I think it's a wart, but it was considered necessary for web compat so..
Created attachment 8760015 [details] [diff] [review]
Alright, here's a much simpler version of the patch which simply nulls out responseXML upon parsing failures, rather than doing anything with onerror. I'll see if I can find a way to improve the console messages, but I'm not sure what would be best. Perhaps even just prefixing them with "Error parsing XML: " would be a good start?
"Error parsing XML: no element found" would be a great improvement, especially if it's also clearly associated with a URL. I'd suggest changing the wording slightly and say "no root element found".
Also, is un-labelled XML a thing? Do other browsers assume XML if a site sends XML without sending a content-type? Perhaps assuming text/plain would work better :)
Created attachment 8760023 [details] [diff] [review]
True, browsers don't assume XML in that case, as Chrome and Firefox currently return response==responseText and responseXML=null for unspecified content-type responses. My earlier patch was just flawed and I didn't realize it at the time. The new patch doesn't have that kind of issue, thankfully.
As for changing the console error, this patch seems to do the job. The messages now have this format:
XML Parsing Error: no root element found
Line Number 1, Column 1:
Of course the line/column number info is superfluous, given that it's repeated in the little link to the far right of such messages, but I'd rather re-use this format string since it's already in the localization file.
> I'll rather ask Boris to chime in on this
I don't know that I have any new opinions on this since my last comments in the bug, modulo checking what other browsers do and hence what web compat is likely to require.
> why do we make an effort to make error details available to JS at all
You mean why is there a <parseerror> thing in general? Because that XML parser behavior was designed around an XML parser parsing content in a browsing context (i.e. yellow screen of death) and the idea was to show the user something at least moderately more useful than just a blank page, I think.... The side-effects on XHR and whatnot were not intended, but the parser behavior predated those things existing, so...
> Also, is un-labelled XML a thing?
In general (i.e. not just over HTTP), sure. Happens all the time for local files, right? ;)
If you create a file with no extension containing this text and fetch it via XHR to a file:// URI, what should happen and why? What should happen if its extension is .docx?
and then try to fetch it via XHR
Created attachment 8763159 [details] [diff] [review]
A try run uncovered two bugs: https://treeherder.mozilla.org/#/jobs?repo=try&revision=f83f3523f74a
This patch fixes the first, where GMP has an "empty" XML document that's not really an XML document at all (no root element). Looking at the test, it's fine to just add an empty root element, since it's currently passing because there's a parsererror that it's ignoring.
Created attachment 8763160 [details] [diff] [review]
This patch fixes the other related failures in the above try run, where XPC shell tests are reading a non-XML file to test overriding the mime type, which of course fails because now the responseXML is null. As such I change the test to read an actual XML file and make sure it's still overriding the mime type as expected.
A new try run shows that there are only failures that seem completely unrelated to me (running them locally gives different results, which are the same results that I get without this patchset applied): https://treeherder.mozilla.org/#/jobs?repo=try&revision=dab588e6c61e
Alright, I've requested review for the non-test-change patches, as I think this is about where it needs to be.