Last Comment Bug 765620 - XMLHttpRequest hangs (in pseudo-timeout) with responseType = "document" and certain HTML (DOM) structure
: XMLHttpRequest hangs (in pseudo-timeout) with responseType = "document" and c...
Status: RESOLVED FIXED
:
Product: Core
Classification: Components
Component: DOM: Core & HTML (show other bugs)
: 13 Branch
: x86 Windows XP
: -- normal (vote)
: mozilla16
Assigned To: Henri Sivonen (:hsivonen)
:
Mentors:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2012-06-17 14:16 PDT by vsemozhetbyt
Modified: 2012-07-10 21:53 PDT (History)
9 users (show)
ryanvm: in‑testsuite-
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
Potential fix (14.98 KB, patch)
2012-06-19 06:52 PDT, Henri Sivonen (:hsivonen)
no flags Details | Diff | Review
Potential fix, v2 (14.98 KB, patch)
2012-06-20 00:14 PDT, Henri Sivonen (:hsivonen)
bugs: review+
Details | Diff | Review

Description vsemozhetbyt 2012-06-17 14:16:44 PDT
I. The problem heart.

I have tried to add some new XMLHttpRequest features (direct HTML-to-DOM parsing and timeout handlers) to my extension for rutracker.org and I have run into a problem.

This code nearly always hangs into timeout in Firefox 13 (just delete the mozBackgroundRequest line), 14, 15, 16 — both on 32 and 64 bits versions (you need to log in to reproduce):

var xhr = new XMLHttpRequest();
xhr.open("GET", "http://rutracker.org/forum/index.php", true);
xhr.mozBackgroundRequest = true;
xhr.timeout = 10000;
xhr.channel.loadFlags |= Components.interfaces.nsIRequest.LOAD_BYPASS_CACHE;
xhr.responseType = "document";
xhr.overrideMimeType('text/html; charset=windows-1251');
xhr.onload = function() {
	alert(this.responseXML.title);
}
xhr.onerror = function() {
	alert("Error!");
}
xhr.ontimeout = function() {
	alert("Timeout!");
}
xhr.send(null);


The issue apparently is caused by the xhr.responseType = "document", because this code works perfectly:

var xhr = new XMLHttpRequest();
xhr.open("GET", "http://rutracker.org/forum/index.php", true);
xhr.mozBackgroundRequest = true;
xhr.timeout = 10000;
xhr.channel.loadFlags |= Components.interfaces.nsIRequest.LOAD_BYPASS_CACHE;
xhr.overrideMimeType('text/html; charset=windows-1251');
xhr.onload = function() {
	alert(this.responseText.match(/<title>.+?<\/title>/i)[0]);
}
xhr.onerror = function() {
	alert("Error!");
}
xhr.ontimeout = function() {
	alert("Timeout!");
}
xhr.send(null);


The specification (http://dvcs.w3.org/hg/xhr/raw-file/tip/Overview.html) says about the responseType = "document" case: Scripts in the resulting document tree will not be executed, resources referenced will not be loaded and no associated XSLT will be applied. The http tracking extensions (HttpFox or Live HTTP Headers) show only one main request without any subrequest (I suppose no iframes must be loaded in such XHR html parsing as well as any images, css, js and so on). In both cases, I have HTTP/1.1 200 OK in 150-200 ms and identical request and response headers. There are no CPU or memory overloads in the time of these requests.


II. The extra observations.

Then I wrote some script for DOM statistic and tried to analyze the problem pages:

var doc = content.document;
var root = doc.documentElement;
var text_char = root.textContent.length;
var elm_nodes = doc.evaluate(".//*", root, null, XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null).snapshotLength;
var txt_nodes = doc.evaluate(".//text()", root, null, XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null).snapshotLength;
var com_nodes = doc.evaluate(".//comment()", root, null, XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null).snapshotLength;
var all_nodes = doc.evaluate(".//node()", root, null, XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null).snapshotLength;
var max_nst_lv = 0;
var max_nst_lv_nodes = 0;
for (var level = 1, pattern = "./node()"; level <= 50; level++, pattern += "/node()") {
	var elm_num = doc.evaluate(pattern,root,null,XPathResult.ORDERED_NODE_SNAPSHOT_TYPE,null).snapshotLength;
	if (elm_num) {
		max_nst_lv = level;
		max_nst_lv_nodes = elm_num;
	}
}

alert(
	text_char + "\ttext characters\n\n" +	
	elm_nodes + "\telement nodes\n" +
	txt_nodes + "\ttext nodes\n" +
	com_nodes + "\tcomment nodes\n" +
	all_nodes + "\tall nodes\n\n" +	
	max_nst_lv_nodes + " nodes in the " + max_nst_lv + " maximum nesting level\n"
);

Here you have some puzzling data:

1. The main site page (http://rutracker.org/forum/index.php) for authorized users with JavaScript toggled out has 49677 text characters, 4192 HTML elements, 4285 text nodes, 77 comments, 8554 nodes in all; 577 nodes in the 25th maximum nesting level.

2. The same page for not authorized users with JavaScript toggled out has 47831 text characters, 3336 HTML elements, 4094 text nodes, 73 comments, 7503 nodes in all; 1136 nodes in the 24th maximum nesting level. Well, there are no timeouts with this simpler DOM structure.

3. I had simplified the problem page structure (delete some frequently repetitive elements) then I had uploaded the new one to a test host and there were no timeouts as well.

4. The script elements and their position have a strong hold over the problem. If I delete all the 19 script elements, the timeouts disappear. If I delete all the script except the first one, I have the timeouts. If I delete most of scripts from the beginning and leave the last, well, five scripts elements the timeouts disappear. If I leave the empty scripts tags without src or code, I have the timeouts. If I replace all the scripts with other tag (i.e. style), the timeouts disappear.

5. I have written a script on PERL for creating test pages. Then I have create a page with 10 MB of code: 9732505 text characters, 25004 HTML elements, 25002 text nodes, 1000 comments, 51006 nodes in all; 1000 nodes in the 27th maximum nesting level. There were no timeouts. However, when I had added some scripts the timeouts returned.

III. Simple test case for debugging.

With this PERL script I have create a page with some minimum problem code:

use strict;
use warnings;
open(OUTPUT, '>:raw:encoding(UTF-8)', "test.html") or die "Cannot write to test.html: $!\n";
print OUTPUT
	"<!DOCTYPE html PUBLIC '-//W3C//DTD HTML 4.01 Transitional//EN' 'http://www.w3.org/TR/html4/loose.dtd'>\n" .
	"<html><head><meta http-equiv='Content-Type' content='text/html; charset=UTF-8'><title>Test</title></head><body>" .
	 (("<div class='abcd'>abcd" x 25 . "</div>" x 25 ) x 10 . "<script type='text/javascript'>var a = 1234;</script>") x 20 .
	"</body></html>\n";
close(OUTPUT);

The page has 20265 text characters, 5024 HTML elements, 5022 text nodes, 0 comments, 10046 nodes in all including 20 simplest script elements; 200 nodes in the 27th maximum nesting level. I have 10 timeouts in 10 test requests for this page.

Different DOM simplifications for this page give different and sometimes very puzzling effects (every next change was applied after returning to previous structure, no changes were applied together).

- move all the script elements to the end of code (nothing else changes): 0 timeouts in 10 requests.
- replace all the script elements with span elements with one attribute and the same text content (nothing is moved): 0 timeouts in 10 requests.
- reduce script text by 3 characters: 7 timeouts in 10 requests.
- delete script text (leave empty tags only): 6 timeouts in 10 requests.
- reduce div text to 1 character: 5 timeouts in 10 requests.
- delete div text (the page has no visible text): 7 timeouts in 10 requests.
- reduce class attribute of div elements to 1 character: 8 timeouts in 10 requests.
- delete class attribute of div elements: 1 timeouts in 10 requests.
- reduce script numbers to 2 (in the middle and the end of the code): 10 timeouts in 10 requests again.
- reduce script numbers to 1 (in the beginning of the code): still 10 timeouts in 10 requests (but if you move this script element to the code end there will be no timeouts at all).
- reduce div numbers (and their text nodes) by half (saving maximum nesting level): 3 timeouts in 10 requests.
- reduce maximum nesting level by half (element and text node numbers remain almost the same, while the number of nodes in the maximum nesting level increase by half): 7 timeouts in 10 requests.
- reduce maximum nesting level just to 3 (body/div/text or body/script/ text) saving element numbers: 8 timeouts in 10 requests.

IV. Interim resume.

May be Firefox allocates very limited resources for XMLHttpRequest HTML-to-DOM parsing and these ones are exhausted by some puzzling combination of DOM elements' size, number, type and order. Some strange effect is given by script elements and their order. These resources have to be enlarged and the strange element type and order dependences have to be removed.
Comment 1 Boris Zbarsky [:bz] (Out June 25-July 6) 2012-06-18 09:19:48 PDT
Henri, do we end up blocking the XHR parser and never unblocking it or something?
Comment 2 Alex Vincent [:WeirdAl] 2012-06-18 16:06:05 PDT
Given that you're talking about the same source only slightly modified, these questions are probably off base.  But I have to ask anyway:

(1) Have you tried running this testcase against a local host instead of your remote server?
(2) Have you timed the execution of your Perl script?
(3) Are you loading static pages which your Perl script generated, or pages which the script is generating on the fly?
(4) Are there any HTTP redirects involved (say, from rutracker.org to www.rutracker.org )?

I ask because it may be relevant:  the clock starts when send() is called.  True, 10 seconds is a very long time for a XHR, but who knows what else might be going on?
Comment 3 vsemozhetbyt 2012-06-18 22:04:54 PDT
1. It is strange, I have no timeouts on the local host neither with http://rutracker.org/forum/index.php copy nor with the test page from the PERL script (Apache 2.2 on the 127.0.0.1, while on the test remote server and on the rutracker.org they have nginx).
2-3. I do not use PERL script as a CGI: it makes a static page on the disk less than in a second and then I upload this page to the host.
4. No, there are no redirect involved. In all cases I has HTTP/1.1 200 OK and all the code in a 200-300 ms.

Well, I guessed 10 sec may be not sufficient, and I often set a minute and still I had timeouts.
Comment 4 Henri Sivonen (:hsivonen) 2012-06-19 04:26:57 PDT
I, too, was unable to reproduce a problem using Apache.

That the problem depends on script elements being present but also depends on the server and doesn't reliably reproduce when one script element is present suggests that the problem here is that if a buffer boundary-dependent discretionary yield to the event loop occurs at a point where script execution would occur during normal page load, control fails to return to the parser.

It would be nice to have a minimized test case somewhere on the Web on a server running nginx, but it might fit possible to find the bug by reading the code with the previous paragraph in mind.
Comment 5 Henri Sivonen (:hsivonen) 2012-06-19 04:59:03 PDT
I don't see the problem when loading the page in the browser with cache cleared and JavaScript disabled.
Comment 6 Henri Sivonen (:hsivonen) 2012-06-19 05:07:10 PDT
Yeah, XHR and normal load with scripting disabled go through different code paths.
Comment 7 vsemozhetbyt 2012-06-19 05:47:57 PDT
Here are the test URLs:

The copy of the main page at rutracker.org for authorized users:
http://vsemozhetbyt.ru/temp/win_cyr/test_from_rutracker.html

The test page made by the perl script mentioned:
http://vsemozhetbyt.ru/temp/utf_8/test_from_perl.html

The server is nginx (this is my test host).
Just replace the URL in the xhr.open line in the first code snippet.
Comment 8 Henri Sivonen (:hsivonen) 2012-06-19 06:52:53 PDT
Created attachment 634380 [details] [diff] [review]
Potential fix

I still haven't seen the bug in action, but from code inspection I expect this patch to fix the bug.
Comment 9 vsemozhetbyt 2012-06-19 07:00:55 PDT
Sorry, I don't know the Nightly patching protocol. How can I test the patch if I use binary Nightly build on Windows XP with simple daily update?
Comment 10 Henri Sivonen (:hsivonen) 2012-06-20 00:14:17 PDT
Created attachment 634789 [details] [diff] [review]
Potential fix, v2

The previous patch had a silly || vs. && logic mistake in it. Sigh.

(In reply to vsemozhetbyt from comment #9)
> Sorry, I don't know the Nightly patching protocol. How can I test the patch
> if I use binary Nightly build on Windows XP with simple daily update?

Win32 and Linux x86_64 builds should appear in http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/hsivonen@iki.fi-1a3fd9c36bd4/ in a few hours. (The location doesn't exist yet.)
Comment 12 vsemozhetbyt 2012-06-20 04:39:25 PDT
@Henri Sivonen

I have tried this one: http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/hsivonen@iki.fi-1a3fd9c36bd4/try-win32/firefox-16.0a1.en-US.win32.installer.exe

I have no timeouts at all with this build, neither for rutracker.org host nor for the test host (both rutracker.org copy and perl-script-test-page are loaded and parsed fine).

Thank you. Can you tell me whether this patch could be applied to Nightly auto-updates and to 13-15 versions?
Comment 13 Henri Sivonen (:hsivonen) 2012-06-20 06:48:46 PDT
Comment on attachment 634789 [details] [diff] [review]
Potential fix, v2

Requesting review since this is green on try and comment 12 indicates that this fixes the bug. There is no test case, because it's still not clear to me what exactly the bug was so I don't know what buffer boundary conditions to try to trigger with a test case.

(In reply to vsemozhetbyt from comment #12)
> @Henri Sivonen
> 
> I have tried this one:
> http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/hsivonen@iki.fi-
> 1a3fd9c36bd4/try-win32/firefox-16.0a1.en-US.win32.installer.exe
> 
> I have no timeouts at all with this build, neither for rutracker.org host
> nor for the test host (both rutracker.org copy and perl-script-test-page are
> loaded and parsed fine).

Thank you.

> Thank you. Can you tell me whether this patch could be applied to Nightly
> auto-updates and to 13-15 versions?

If this passes review, the patch will most likely land in time for Firefox 16 and show up on Nightly shortly after landing. It's virtually certain this patch won't be provided for Firefox 13. While it would be technically possible for this patch to make it in for Firefox 15, I expect the patch not to get approval for fast-tracking to Firefox 15, so it will likely take until Firefox 16 for this to show up on the release channel.
Comment 14 vsemozhetbyt 2012-06-20 09:34:01 PDT
@Henri Sivonen

Thank you for your time and efforts. I hope the fix to be of use not only for me.
Comment 15 Henri Sivonen (:hsivonen) 2012-06-21 04:13:52 PDT
Replying to question on IRC:

The key parts of this fix are:
1) The stream parser will only emit script execution tree operations when loading content into an docshell and scripting is enabled.
2) When preventing script execution, the tree builder will now emit script execution prevention tree operations and those operations are not special the way this script running operation is, so they don't have to be the last operation in the flush queue.

With this change, when the stream parser is in a mode where script execution is going to be prevented, it will never speculate and it will never emit script running operations, so the main thread never goes near the code paths that normally involve script execution and continuations after execution. Instead, when executing tree operations, the operation for preventing scripts from executing is handled like any other operation.
Comment 16 Henri Sivonen (:hsivonen) 2012-06-21 04:15:40 PDT
Hopefully, it's easy to be convinced that the new code makes more sense than the old code even though it's harder to see where exactly the bug in the old code was. The bug must have involved the stream parser not ensuring that the script running operation was the last in a queue while the executor code still wanted it to be the last op in a queue whenever it appears in a queue at all.
Comment 17 mayhemydg 2012-06-22 05:24:06 PDT
Me and many other Firefox users could reproduce this issue: XHR with responseType='document' never firing onload even though it has finished downloading the page.


I don't understand why is that it goes to great lengths to do something equivalent to this:

var doc = document.implementation.createHTMLDocument('');
doc.documentElement.innerHTML = this.response;
Comment 18 Henri Sivonen (:hsivonen) 2012-06-22 11:25:00 PDT
If this gets r+ before July 11, please treat the patch as checkin-needed, since I'll be unable to land patches between now and July 11.
Comment 19 Olli Pettay [:smaug] (high review load, please consider other reviewers) 2012-06-23 04:37:10 PDT
I'll review this next week.
Comment 20 Olli Pettay [:smaug] (high review load, please consider other reviewers) 2012-06-23 15:20:01 PDT
Comment on attachment 634789 [details] [diff] [review]
Potential fix, v2

>+    void SetPreventScriptExecution(bool aPrevent) {
I'm not sure about the coding style in html5 parser, but in general
{ should go to its own line in this kind of case.
Comment 21 vsemozhetbyt 2012-07-08 14:04:09 PDT
Please, can anybody confirm the bug?

Here is a testcase again.

1. Open the JavaScript Editor (Scratchpad), toggle it to the browser context (see http://blog.mozilla.org/devtools/2011/08/15/introducing-scratchpad/ - the last paragraph of the part "A Word about Scopes"). This is the context of extensions as well.

2. Run this code:

var xhr = new XMLHttpRequest();
xhr.mozBackgroundRequest = true;
xhr.open("GET", "http://vsemozhetbyt.ru/temp/utf_8/test_from_perl.html", true);
xhr.timeout = 10000;
xhr.channel.loadFlags |= Components.interfaces.nsIRequest.LOAD_BYPASS_CACHE;
xhr.responseType = "document";
xhr.onload = function() {
	alert(this.responseXML.title);
}
xhr.ontimeout = function() {
	alert("Timeout");
}
xhr.onerror = function() {
	alert("HTTP error");
}
xhr.send(null);


I have a timeout for this test URL, for URL http://rutracker.org/forum/index.php for authorized users and even for URL of this bug (https://bugzilla.mozilla.org/show_bug.cgi?id=765620) when I am authorized in the bugzilla.
Comment 22 Ryan VanderMeulen [:RyanVM] 2012-07-09 17:55:17 PDT
https://hg.mozilla.org/integration/mozilla-inbound/rev/3d6027262f4f
Comment 23 Ryan VanderMeulen [:RyanVM] 2012-07-10 15:48:43 PDT
https://hg.mozilla.org/mozilla-central/rev/3d6027262f4f
Comment 24 vsemozhetbyt 2012-07-10 21:53:33 PDT
Thank you, people.

Note You need to log in before you can comment on or make changes to this bug.