Closed
Bug 651072
Opened 14 years ago
Closed 13 years ago
Support HTML parsing in XMLHttpRequest per XMLHttpRequest Level 2
Categories
(Core :: DOM: Core & HTML, defect)
Core
DOM: Core & HTML
Tracking
()
RESOLVED
FIXED
mozilla11
People
(Reporter: hsivonen, Assigned: hsivonen)
References
Details
(Keywords: dev-doc-complete)
Attachments
(4 files, 4 obsolete files)
45.30 KB,
patch
|
smaug
:
review+
|
Details | Diff | Splinter Review |
1.33 KB,
patch
|
mossop
:
review+
|
Details | Diff | Splinter Review |
14.68 KB,
patch
|
smaug
:
review+
|
Details | Diff | Splinter Review |
12.00 KB,
patch
|
smaug
:
review+
|
Details | Diff | Splinter Review |
Add support for parsing text/html into a DOM in XMLHttpRequest per the XMLHttpRequest Level 2 spec:
http://dev.w3.org/2006/webapi/XMLHttpRequest-2/#document-response-entity-body
Assignee | ||
Comment 1•13 years ago
|
||
This is going to need a lot of testing...
Assignee: nobody → hsivonen
Status: NEW → ASSIGNED
Assignee | ||
Comment 2•13 years ago
|
||
Attachment #562026 -
Attachment is obsolete: true
Assignee | ||
Comment 3•13 years ago
|
||
Attachment #565887 -
Attachment is obsolete: true
Assignee | ||
Comment 4•13 years ago
|
||
As discussed earlier, HTML is not supported for multipart responses, because the current multipart infrastructure doesn't work with a parser that doesn't finish synchronously when OnStopRequest fires and it's unclear if there's demand for multipart support.
Attachment #571005 -
Attachment is obsolete: true
Attachment #573822 -
Flags: review?(bugs)
Assignee | ||
Comment 5•13 years ago
|
||
Currently, the extension updater relies on error pages getting a null responseXML. When HTML parsing is supported, text/html error pages result in a non-null responseXML. However, it's not OK to check the XHR status field for the success code 200, because the extension manager also does requests whose responses don't expose a status code (I did check if this was due to security concerns or due to the channel not being an HTTP channel).
Attachment #573824 -
Flags: review?(dtownsend)
Comment 6•13 years ago
|
||
(In reply to Henri Sivonen (:hsivonen) from comment #5)
> Created attachment 573824 [details] [diff] [review] [diff] [details] [review]
> Make extension updater treat text/html responses as errors
>
> Currently, the extension updater relies on error pages getting a null
> responseXML. When HTML parsing is supported, text/html error pages result in
> a non-null responseXML. However, it's not OK to check the XHR status field
> for the success code 200, because the extension manager also does requests
> whose responses don't expose a status code (I did check if this was due to
> security concerns or due to the channel not being an HTTP channel).
Actually mostly what we expect for non-XML pages is that the responseXML has the namespace XMLURI_PARSE_ERROR. What is the content of a failed html parse? Can we just turn off html parsing in this case? Html is not a valid response here.
Assignee | ||
Comment 7•13 years ago
|
||
(In reply to Dave Townsend (:Mossop) from comment #6)
> Actually mostly what we expect for non-XML pages is that the responseXML has
> the namespace XMLURI_PARSE_ERROR.
With text/html responses, the root element with have the http://www.w3.org/1999/xhtml namespace.
> What is the content of a failed html parse?
HTML parsing can't fail (except for Out of Memory), so if the response is of type text/html, responseXML will be non-null.
> Can we just turn off html parsing in this case?
I tried to make the extension manager override the MIME type to application/xml, but browser_manualupdates.js failed anyway. I didn't investigate further, though maybe I should.
Assignee | ||
Comment 8•13 years ago
|
||
(In reply to Henri Sivonen (:hsivonen) from comment #7)
> with have
will have
Assignee | ||
Comment 9•13 years ago
|
||
(In reply to Henri Sivonen (:hsivonen) from comment #7)
> I tried to make the extension manager override the MIME type to
> application/xml, but browser_manualupdates.js failed anyway. I didn't
> investigate further, though maybe I should.
Looking at the source, it seems like there has to be a channel type that is involved in browser_manualupdates.js and that really honor nsIChannel::SetContentType().
Assignee | ||
Comment 10•13 years ago
|
||
(In reply to Henri Sivonen (:hsivonen) from comment #9)
> that really honor nsIChannel::SetContentType().
that *does not*
I can't type English correctly today, it seems.
Comment 11•13 years ago
|
||
Comment on attachment 573824 [details] [diff] [review]
Make extension updater treat text/html responses as errors
(In reply to Henri Sivonen (:hsivonen) from comment #10)
> (In reply to Henri Sivonen (:hsivonen) from comment #9)
> > that really honor nsIChannel::SetContentType().
>
> that *does not*
>
> I can't type English correctly today, it seems.
Strange, everything I can see is using http there. That said I've realised which bit of the code this is now (more caffeine helped!) and it might actually be ok to allow http in this case, but I'd probably need to test that a bit better. This is fine for now.
Attachment #573824 -
Flags: review?(dtownsend) → review+
Comment 12•13 years ago
|
||
Comment on attachment 573822 [details] [diff] [review]
Support HTML in XHR, with tests and with sync XHR support and non-support for multipart
Btw, I filed https://bugzilla.mozilla.org/show_bug.cgi?id=701787
But as of now, it doesn't affect to this bug.
> static nsresult CreateDocument(const nsAString& aNamespaceURI,
> const nsAString& aQualifiedName,
> nsIDOMDocumentType* aDoctype,
> nsIURI* aDocumentURI,
> nsIURI* aBaseURI,
> nsIPrincipal* aPrincipal,
> nsIScriptGlobalObject* aScriptObject,
> bool aSVGDocument,
>- nsIDOMDocument** aResult);
>+ nsIDOMDocument** aResult,
>+ bool aForceHtml = false);
ok, now it is happening... Adding another "force this document".
Could you perhaps replace 'bool aSVGDocument' with some enum.
It could have values none, html, svg for now.
IIRC, there aren't really that many nsContentUtils::CreateDocument calls in the tree, so the change should
be pretty small.
>- mLoadLengthComputable(false), mLoadTotal(0),
>+ mLoadLengthComputable(false),
>+ mIsHtml(false),
>+ mLoadTotal(0),
Shouldn't you initialize also mWarnAboutMultipartHtml
>+ if (mWarnAboutMultipartHtml) {
>+ mWarnAboutMultipartHtml = false;
>+ nsContentUtils::ReportToConsole(nsContentUtils::eDOM_PROPERTIES,
>+ "HTMLMultipartXHRWarning",
>+ nsnull,
>+ 0,
>+ nsnull, // Response URL not kept around
>+ EmptyString(),
>+ 0,
>+ 0,
>+ nsIScriptError::warningFlag,
>+ "DOM Events",
Is there some better string for this than "DOM Events".
>@@ -2041,49 +2087,67 @@ nsXMLHttpRequest::OnStopRequest(nsIReque
> }
> }
>
> channel->SetNotificationCallbacks(nsnull);
> mNotificationCallbacks = nsnull;
> mChannelEventSink = nsnull;
> mProgressEventSink = nsnull;
>
>- mState &= ~XML_HTTP_REQUEST_SYNCLOOPING;
>-
> if (NS_FAILED(status)) {
> // This can happen if the server is unreachable. Other possible
> // reasons are that the user leaves the page or hits the ESC key.
>
> mErrorLoad = true;
> mResponseXML = nsnull;
> }
>
>- NS_ASSERTION(!parser || parser->IsParserEnabled(),
>- "Parser blocked somehow?");
>-
> // If we're uninitialized at this point, we encountered an error
> // earlier and listeners have already been notified. Also we do
> // not want to do this if we already completed.
> if (mState & (XML_HTTP_REQUEST_UNSENT |
> XML_HTTP_REQUEST_DONE)) {
>+ mState &= ~XML_HTTP_REQUEST_SYNCLOOPING;
> return NS_OK;
> }
This change makes the method a bit more error prone.
It is easier to forget to remove XML_HTTP_REQUEST_SYNCLOOPING.
So, add a comment to be careful with XML_HTTP_REQUEST_SYNCLOOPING handling.
>+class nsXHRParseEndListener : public nsIDOMEventListener
>+{
>+public:
>+ NS_DECL_ISUPPORTS
>+ nsresult HandleEvent(nsIDOMEvent *event) {
NS_IMETHOD HandleEvent
and { should be in the next line.
>+ mXHR->ChangeStateToDone();
>+ mXHR = nsnull;
>+ return NS_OK;
>+ }
>+ nsXHRParseEndListener(nsXMLHttpRequest* aXHR) : mXHR(aXHR) {}
>+ virtual ~nsXHRParseEndListener() {}
>+private:
>+ nsRefPtr<nsXMLHttpRequest> mXHR;
>+};
So, XHR keeps the document alive, and document keeps event listener manager alive,
and event listener manager keeps nsXHRParseEndListener alive which keeps XHR alive...
Could you have a raw reference to XHR here, and XHR could also have a raw reference to this listener,
and if either one is being deleted, the variables would be set to null.
(I wonder why the patch doesn't leak right now, but I assume it could leak in some cases)
>@@ -694,17 +694,17 @@ nsHTMLDocument::StartDocumentLoad(const
>
> if (loadAsHtml5 && !viewSource &&
> (!(contentType.EqualsLiteral("text/html") || plainText) &&
> aCommand && !nsCRT::strcmp(aCommand, "view"))) {
> loadAsHtml5 = false;
> }
>
> // TODO: Proper about:blank treatment is bug 543435
>- if (loadAsHtml5 && !viewSource) {
>+ if (loadAsHtml5 && aCommand && !nsCRT::strcmp(aCommand, "view")) {
Could you explain this change.
Attachment #573822 -
Flags: review?(bugs) → review+
Assignee | ||
Comment 13•13 years ago
|
||
(In reply to Olli Pettay [:smaug] from comment #12)
> ok, now it is happening... Adding another "force this document".
> Could you perhaps replace 'bool aSVGDocument' with some enum.
> It could have values none, html, svg for now.
I added an enum, but I used a more descriptive value instead of "none". :-)
> >- mLoadLengthComputable(false), mLoadTotal(0),
> >+ mLoadLengthComputable(false),
> >+ mIsHtml(false),
> >+ mLoadTotal(0),
> Shouldn't you initialize also mWarnAboutMultipartHtml
Yes.
> >+ if (mWarnAboutMultipartHtml) {
> >+ mWarnAboutMultipartHtml = false;
> >+ nsContentUtils::ReportToConsole(nsContentUtils::eDOM_PROPERTIES,
> >+ "HTMLMultipartXHRWarning",
> >+ nsnull,
> >+ 0,
> >+ nsnull, // Response URL not kept around
> >+ EmptyString(),
> >+ 0,
> >+ 0,
> >+ nsIScriptError::warningFlag,
> >+ "DOM Events",
> Is there some better string for this than "DOM Events".
I don't know. I don't know what consumes that string and what strings it recognizes.
> So, add a comment to be careful with XML_HTTP_REQUEST_SYNCLOOPING handling.
Added.
> >+class nsXHRParseEndListener : public nsIDOMEventListener
> >+{
> >+public:
> >+ NS_DECL_ISUPPORTS
> >+ nsresult HandleEvent(nsIDOMEvent *event) {
> NS_IMETHOD HandleEvent
> and { should be in the next line.
OK.
> >+ mXHR->ChangeStateToDone();
> >+ mXHR = nsnull;
> >+ return NS_OK;
> >+ }
> >+ nsXHRParseEndListener(nsXMLHttpRequest* aXHR) : mXHR(aXHR) {}
> >+ virtual ~nsXHRParseEndListener() {}
> >+private:
> >+ nsRefPtr<nsXMLHttpRequest> mXHR;
> >+};
>
> So, XHR keeps the document alive, and document keeps event listener manager
> alive,
> and event listener manager keeps nsXHRParseEndListener alive which keeps XHR
> alive...
> Could you have a raw reference to XHR here, and XHR could also have a raw
> reference to this listener,
> and if either one is being deleted, the variables would be set to null.
Wouldn't raw references be more error-prone than having an owning reference and a guarantee that the event will fire, so the reference will get nulled out?
> (I wonder why the patch doesn't leak right now, but I assume it could leak
> in some cases)
When the listener is added, it's clear that the parser has gotten far enough that DOMContentLoaded *will* fire. And nsXHRParseEndListener drops its XHR reference after the event has been handled.
> >- if (loadAsHtml5 && !viewSource) {
> >+ if (loadAsHtml5 && aCommand && !nsCRT::strcmp(aCommand, "view")) {
> Could you explain this change.
Now that this line of code can see a broader range of parser commands, it's important to limit the special case to loading about:blank to a docshell. Otherwise, if about:blank is loaded via XHR, the old parser would be activated, which wouldn't be good.
Attachment #574575 -
Flags: review?(bugs)
Comment 14•13 years ago
|
||
Comment on attachment 574575 [details] [diff] [review]
Address review comments
> class nsXHRParseEndListener : public nsIDOMEventListener
> {
> public:
> NS_DECL_ISUPPORTS
>- nsresult HandleEvent(nsIDOMEvent *event) {
>+ NS_IMETHOD HandleEvent(nsIDOMEvent *event)
>+ {
>+ NS_ASSERTION(mXHR, "This event is supposed to fire exactly once.");
> mXHR->ChangeStateToDone();
>- mXHR = nsnull;
>+ mXHR = nsnull; // Break the reference cycle
> return NS_OK;
> }
> nsXHRParseEndListener(nsXMLHttpRequest* aXHR) : mXHR(aXHR) {}
> virtual ~nsXHRParseEndListener() {}
> private:
> nsRefPtr<nsXMLHttpRequest> mXHR;
This feels still risky. Could you use nsWeakPtr. (XHR happens to extend nsSupportsWeakReference already)
Attachment #574575 -
Flags: review?(bugs) → review+
Assignee | ||
Comment 15•13 years ago
|
||
Thanks for the reviews. Landed with nsWeakPtr.
https://hg.mozilla.org/integration/mozilla-inbound/rev/75ace50df008
https://hg.mozilla.org/integration/mozilla-inbound/rev/5135cc4e6a36
Flags: in-testsuite+
Target Milestone: --- → mozilla11
Comment 16•13 years ago
|
||
Backed out 75ace50df008 in an attempt to fix test_XHRSendData.html crashes after a bad merge from mozilla-central to inbound:
https://hg.mozilla.org/integration/mozilla-inbound/rev/d0c677daedff
Example crash:
https://tbpl.mozilla.org/php/getParsedLog.php?id=7432104&tree=Mozilla-Inbound
Target Milestone: mozilla11 → ---
Comment 17•13 years ago
|
||
The problem is (in the redirect test)
if (NS_FAILED(status)) {
// This can happen if the server is unreachable. Other possible
// reasons are that the user leaves the page or hits the ESC key.
mErrorLoad = true;
mResponseXML = nsnull;
}
and then
nsCOMPtr<nsIDOMEventTarget> eventTarget = do_QueryInterface(mResponseXML);
nsEventListenerManager* manager = eventTarget->GetListenerManager(true);
where status is NS_ERROR_NOT_AVAILABLE. (Which comes from nsInputStreamPump::mStatus.)
Haven't investigated further.
Updated•13 years ago
|
Keywords: dev-doc-needed
Comment 18•13 years ago
|
||
Assignee | ||
Comment 19•13 years ago
|
||
(In reply to Ms2ger from comment #17)
> The problem is (in the redirect test)
>
> if (NS_FAILED(status)) {
> // This can happen if the server is unreachable. Other possible
> // reasons are that the user leaves the page or hits the ESC key.
>
> mErrorLoad = true;
> mResponseXML = nsnull;
> }
>
> and then
>
> nsCOMPtr<nsIDOMEventTarget> eventTarget =
> do_QueryInterface(mResponseXML);
> nsEventListenerManager* manager = eventTarget->GetListenerManager(true);
I changed this part to deal with null mResponseXML, but...
> where status is NS_ERROR_NOT_AVAILABLE. (Which comes from
> nsInputStreamPump::mStatus.)
>
> Haven't investigated further.
The error still propagates via mErrorLoad in ways that tests don't expect.
What did your patches change to make the NS_ERROR_NOT_AVAILABLE occur where it didn't before?
Assignee | ||
Comment 20•13 years ago
|
||
(In reply to Henri Sivonen (:hsivonen) from comment #19)
> What did your patches change to make the NS_ERROR_NOT_AVAILABLE occur where
> it didn't before?
I think I eyeballed through all the patches that came from the m-c side of the merge, but I didn't see anything that'd explain why something now says NS_ERROR_NOT_AVAILABLE but didn't when I pushed to try before the merge.
Assignee | ||
Comment 21•13 years ago
|
||
Ms2ger, whatever is making this fail is the same thing that made you see the "cannot SetMetaDataElement" warning a couple of days ago. That's where the bad rv starts propagating.
Right about here:
http://mxr.mozilla.org/mozilla-central/source/content/html/document/src/nsHTMLDocument.cpp#947
Do you recall which change made you see that warning?
Assignee | ||
Comment 22•13 years ago
|
||
This supplemental patch:
1) Deals with null mResponseXML after error
2) Avoids propagating nsresult from the cache when a late write to the cache entry fails and the error gets overwritten in the XML case anyway.
3) Makes the synchronous mode behave as if HTML parsing support had never been added (except for a console warning).
Attachment #575157 -
Attachment is obsolete: true
Attachment #575813 -
Flags: review?(bugs)
Comment 23•13 years ago
|
||
Comment on attachment 575813 [details] [diff] [review]
Supplemental patch to fix the problems that lead to backout and to unsupport the synchronous mode
>@@ -733,16 +734,30 @@ nsXMLHttpRequest::GetResponseXML(nsIDOMD
> nsnull, // Response URL not kept around
> EmptyString(),
> 0,
> 0,
> nsIScriptError::warningFlag,
> "DOM Events",
> mOwner->WindowID());
> }
>+ if (mWarnAboutSyncHtml) {
>+ mWarnAboutSyncHtml = false;
>+ nsContentUtils::ReportToConsole(nsContentUtils::eDOM_PROPERTIES,
>+ "HTMLSyncXHRWarning",
>+ nsnull,
>+ 0,
>+ nsnull, // Response URL not kept around
>+ EmptyString(),
>+ 0,
>+ 0,
>+ nsIScriptError::warningFlag,
>+ "DOM Events",
>+ mOwner->WindowID());
>+ }
"DOM Events" is just wrong. Just use "DOM". It is used on some case.
Fix also the other error reporting.
Attachment #575813 -
Flags: review?(bugs) → review+
Assignee | ||
Comment 24•13 years ago
|
||
(In reply to Olli Pettay [:smaug] from comment #23)
> "DOM Events" is just wrong. Just use "DOM". It is used on some case.
> Fix also the other error reporting.
Thanks. Landed with s/DOM Events/DOM/:
https://hg.mozilla.org/integration/mozilla-inbound/rev/0c4d3b7be17e
Comment 25•13 years ago
|
||
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Depends on: 704284
Assignee | ||
Comment 26•13 years ago
|
||
Notes for documentation:
* HTML parsing is not supported in the synchronous mode (because the synchronous mode fits badly into the architecture of Gecko and the Web platform and we don't want to let people use it more than what the legacy requires)
* HTML support when responseType == "" should be considered to be at risk. If supporting HTML in the default mode causes too much trouble with legacy uses of XHR (particularly because error pages with text/html response bodies now get a non-null responseXML), we might have to disable HTML support for responseType == ""
* HTML support in the asynchronous mode with responseType == "document" can be safely be treated as a feature that will appear in Firefox 11 (unless something totally unforeseeable happens).
* If there is no declared encoding for incoming text/html content (on the HTTP level, by using a BOM or in a <meta> tag within the first 1024 bytes), the default encoding is UTF-8.
Assignee | ||
Comment 27•13 years ago
|
||
Drafted some initial docs: https://developer.mozilla.org/en/HTML_in_XMLHttpRequest
The documentation assumes that the fix for bug 705072 will land.
Comment 28•13 years ago
|
||
This is triggering for HTTP server errors with HTML error pages. :-(
The caller is expecting XML or JSON, but you're parsing the HTML error page now. I don't think it's a good idea to parse the HTML at all unless the caller requests it, e.g. by calling .responseHTML or similar. XHR is mostly a protocol, but many servers respond with an HTML page when they are confused/broken, but most of the time, this error HTML page is useless for the calling app. Parsing it automatically is misguided, in most cases.
Assignee | ||
Comment 29•13 years ago
|
||
BenB, bug 705072 limits HTML parsing to responseType == "document". Legacy usage is resposeType == "". (The spec changed accordingly, too, yesterday.)
Comment 30•13 years ago
|
||
That's good, thanks. This also fixes us (bug 705216) and avoids that normal XMLHttpRequests are not affected. Good, thanks.
Updated•13 years ago
|
Target Milestone: --- → mozilla11
Comment 31•13 years ago
|
||
(In reply to Henri Sivonen (:hsivonen) from comment #27)
> Drafted some initial docs:
> https://developer.mozilla.org/en/HTML_in_XMLHttpRequest
>
> The documentation assumes that the fix for bug 705072 will land.
That documentation has been overhauled and cleaned up by lots of people; it should be good.
Keywords: dev-doc-needed → dev-doc-complete
Updated•6 years ago
|
Component: DOM → DOM: Core & HTML
You need to log in
before you can comment on or make changes to this bug.
Description
•