Implement the Large-Allocation header

RESOLVED FIXED in Firefox 52

Status

()

defect
P3
normal
RESOLVED FIXED
3 years ago
4 months ago

People

(Reporter: Nika, Assigned: Nika)

Tracking

(Depends on 1 bug, Blocks 1 bug, {dev-doc-complete})

unspecified
mozilla52
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox52 fixed)

Details

Attachments

(2 attachments, 3 obsolete attachments)

Blocks: 1304141
Blocks: 1277066
Depends on: 1222516, 1267339
Priority: -- → P1
Just clarifying expectations around timeline here.
Priority: P1 → P3
Assignee: nobody → michael
Summary: Implement Document Isolation → Implement the Large-Allocation header
The plans here have changed to be focused on implementing a minimal non-semantics-changing form of requesting process switching, the Large-Allocation header. A short explainer of the ideas behind this header can be found in this document: https://gist.github.com/mystor/5739e222e398efc6c29108be55eb6fe3
In order to implement this hint, ehsan and I were talking about various strategies.

Ideally, we would be able to re-use the channel which is currently being used to fetch the document in the new process in order to perform the load. Ideally this would look something like the following:

* Load is initiated, and the channel is marked by the child process as being "safe to switch across processes"
  * This hint is set if there are no incoming or outgoing references to the window object which are visible from other content windows, such as an opener.
* When the headers are received in the parent process, if the Large-Allocation header is present, it checks the safe to switch processes flag. If that flag is not set, it aborts these steps and proceeds as normal (probably sending a message to devtools to emit a warning). We may want to make this decision in the child process with some sort of PreStartRequest handler which is run before OnStartRequest? I'm not sure.
* If that flag is set, the channel actors are torn down, and a new process is created. The new process somehow starts a special navigation in its new docshell. This navigation connects its actors to the existing nsHTTPChannel, and the load continues as normal.

(This is a little light on details, especially with regards to how history will need to be ported over (probably using session restore), and what will happen to the old actors (probably torn down), but I am trying to focus on the channel redirecting).

:jduell, I think you would be the person to know if this sort of swapping of target processes for channels and/or the load replaying is possible or just a pipe-dream. Any thoughts?
Flags: needinfo?(jduell.mcbugs)
I haven't tested this on an actual 32-bit system to see if it enables me to successfully consistently allocate 1GB. I worry that the process will fragment too much during startup for that allocation to succeed, so I may also have to do more work to reserve the memory up front on process startup.

I'll post here once I have tested that.
We have decided to implement at least an initial version of this patch without performing any complex channel redirection logic, instead performing multiple network requests when the header is identified.
Flags: needinfo?(jduell.mcbugs)
Attachment #8803534 - Flags: review?(bugs)
Attachment #8803535 - Flags: review?(bugs)
https://treeherder.mozilla.org/#/jobs?author=michael@thelayzells.com&repo=try

Try looks pretty green which suggests that the allocate gigabyte test passes!
My gut sense is that loading a channel in a new process is probably easiest achieved not by migrating a single HTTP channel across processes, but by having the new process "re-open" the URI with a new channel.  It sounds like this is the approach you're taking in this patch, so that's good. Re-opening should read the data from cache (unless the response comes with no-cache headers, and/or the response is bigger than our max-cache-size, which is 50 MB right now IIRC. We could add a flag or something to guarantee that the object stays cached if we need to.)
Comment on attachment 8803535 [details] [diff] [review]
Part 2: Add tests for the Large-Allocation header, r=

>+[browser_largeAllocation.js]
>+skip-if = !e10s # Large-Allocation not supported in e10s
er, what? "not supported in e10s"?

>+add_task(function*() {
>+  yield SpecialPowers.pushPrefEnv({
>+    set: [
>+      ["dom.ipc.processPriorityManager.testMode", true],
>+      ["dom.ipc.processPriorityManager.enabled", true],
>+      ["dom.largeAllocationHeader.enabled", true],
>+    ]
>+  });
So you don't need to enable dom.ipc.processCount ?
Why you need to use processPriorityManager?
Does "ipc:content-created" not work?

>+  yield SpecialPowers.pushPrefEnv({
>+    set: [
>+      ["dom.ipc.processPriorityManager.testMode", true],
>+      ["dom.ipc.processPriorityManager.enabled", true]
Also here, don't know why these are needed.

>+  // In an iframe, cross process navigation shouldn't work
I have no idea what this is trying to test and why...

>+  yield BrowserTestUtils.withNewTab("about:blank", function*(aBrowser) {
>+    ok(true, "Starting test 1");
>+    let pid1 = yield getPID(aBrowser);
>+
>+    // Fail the test if we create a process
>+    let stopExpectNoProcess = expectNoProcess();
>+
>+    yield ContentTask.spawn(aBrowser, TEST_URI, TEST_URI => {
>+      content.document.body.innerHTML = `<iframe src='${TEST_URI}'></iframe>`;
>+
>+      return new Promise(resolve => {
>+        content.document.body.querySelector('iframe').onload = () => {
>+          ok(true, "Iframe finished loading");
>+          resolve();
>+        };
>+      });
>+    });
>+
>+    let pid2 = yield getPID(aBrowser);
>+
>+    is(pid1, pid2, "The PID should not have changed");
...this checks pids of browsers, not iframes.
Nothing interesting has happened in the browser so why would process change.
Attachment #8803535 - Flags: review?(bugs) → review-
Comment on attachment 8803534 [details] [diff] [review]
Part 1: Implement support for the Large-Allocation header behind the dom.largeAllocationHeader.enabled pref, r=

>+++ b/dom/base/nsGkAtomList.h
>@@ -2470,8 +2470,10 @@ GK_ATOM(nsuri_svg, "http://www.w3.org/2000/svg")
> GK_ATOM(onsourceopen, "onsourceopen")
> GK_ATOM(onsourceended, "onsourceended")
> GK_ATOM(onsourceclosed, "onsourceclosed")
> GK_ATOM(onupdatestart, "onupdatestart")
> GK_ATOM(onupdate, "onupdate")
> GK_ATOM(onupdateend, "onupdateend")
> GK_ATOM(onaddsourcebuffer, "onaddsourcebuffer")
> GK_ATOM(onremovesourcebuffer, "onremovesourcebuffer")
>+
>+GK_ATOM(FreshProcess, "freshProcess")
Should be lowercase f, and this should go where other f* stuff are.

> ContentParent::GetNewOrUsedBrowserProcess(bool aForBrowserElement,
>                                           ProcessPriority aPriority,
>-                                          ContentParent* aOpener)
>+                                          ContentParent* aOpener,
>+                                          bool aFreshProcess)
> {
>   if (!sNonAppContentParents)
>     sNonAppContentParents = new nsTArray<ContentParent*>();
> 
>   int32_t maxContentProcesses = Preferences::GetInt("dom.ipc.processCount", 1);
>   if (maxContentProcesses < 1)
>     maxContentProcesses = 1;
> 
>-  if (sNonAppContentParents->Length() >= uint32_t(maxContentProcesses)) {
>+  if (sNonAppContentParents->Length() >= uint32_t(maxContentProcesses) &&
>+      !aFreshProcess) {
I think we really should honor maxContentProcesses here. In the test you could increase dom.ipc.processCount.
Or another option is to add some new pref controlling the number of "large allocation processes".
But we can't just let large allocation headers to always create new processes.
(and we need to have a test for that case too)

>+/* static */ bool
>+TabChild::AttemptLargeAllocationLoad(nsIHttpChannel* aChannel)
So this method in TabChild feel a bit odd. Most of it has nothing to do with TabChild and it is called in parent process too
in case non-remote xul:browsers are used.
Perhaps move to nsContentUtils.


>+  // XXX: Is the request method case sensitive? I'm not sure if I need to do the
>+  // lower case translation here.
>+  if (NS_WARN_IF(!requestMethod.LowerCaseEqualsLiteral("get"))) {
we seem to do this elsewhere too, but not always.
Looks like httpchannel defaults anyhow to GET internally, even if "get" was passed.
So this should be fine.

>+
>+  if (GetFrom(outer)->TakeIsFreshProcess())  {
This may crash, since GetFrom() may return null


>+  // At this point the fress process load should succeed! We just need to get
>+  // ourselves a nsIWebBrowserChrome3 to ask to perform the reload.
Why you think so. This code maybe called in parent process too.
(well, sure the 'if' above would have crashed in case this was called in parent process)

>+# LOCALIZATION NOTE: Do not translate "Large-Allocation", as it is a literal header name
>+LargeAllocationSuccess=This page was loaded in a new process due to a Large-Allocation header.
>+# LOCALIZATION NOTE: Do not translate "Large-Allocation", as it is a literal header name. Do not translate GET.
>+LargeAllocationNonGetRequest=A Large-Allocation header was ignored due to the load being triggered by a non-GET request.
>+# LOCALIZATION NOTE: Do not translate "Large-Allocation", as it is a literal header name
>+LargeAllocationRelatedBrowsingContexts=A Large-Allocation header was ignored due to the presence of windows which have a reference to this browsing context.
>+# LOCALIZATION NOTE: Do not translate "Large-Allocation", as it is a literal header name
>+LargeAllocationInIFrame=A Large-Allocation header was ignored due to the load occuring within an iframe.
>+# LOCALIZATION NOTE: Do not translate "Large-Allocation", as it is a literal header name
>+LargeAllocationNonE10S=A Large-Allocation header was ignored due to multiprocess FireFox being disabled.
You know, the browser using Gecko is called Firefox, not FireFox ;)


>+
>+    if (mozilla::Preferences::GetBool("dom.largeAllocationHeader.enabled", false)) {
I wouldn't mind if you cached this pref value using BoolVarCache
Attachment #8803534 - Flags: review?(bugs) → review-
(In reply to Olli Pettay [:smaug] from comment #10)
> I think we really should honor maxContentProcesses here. In the test you
> could increase dom.ipc.processCount.
> Or another option is to add some new pref controlling the number of "large
> allocation processes".
> But we can't just let large allocation headers to always create new
> processes.
> (and we need to have a test for that case too)

I think the best option would probably be to have two prefs, `dom.ipc.processCount` and `dom.ipc.maxProcessCount`, (or something like that), and only allow creating new processes up to maxProcessCount. We need to make sure that normal browsing doesn't use up all of our processes, otherwise our large allocations might fail. The entire point of this header is to create a _new_ process.

> >+  // At this point the fress process load should succeed! We just need to get
> >+  // ourselves a nsIWebBrowserChrome3 to ask to perform the reload.
> Why you think so. This code maybe called in parent process too.
> (well, sure the 'if' above would have crashed in case this was called in
> parent process)

Don't we early return already if `!XRE_IsContentProcess()`?

> You know, the browser using Gecko is called Firefox, not FireFox ;)

Shhhh
oh, I missed the XRE_IsContentProcess() check
"A Large-Allocation header was ignored due to multiprocess FireFox being disabled." for that case is wrong. One can still create non-remote <xul:browser>s even in e10s enabled FF.
(In reply to Olli Pettay [:smaug] from comment #13)
> "A Large-Allocation header was ignored due to multiprocess FireFox being
> disabled." for that case is wrong. One can still create non-remote
> <xul:browser>s even in e10s enabled FF.

Would a better wording be "A Large-Allocation header was ignored due to the document not being loaded out of process?" I imagine that the reason why that failure would be reached would be because of multiprocess Firefox being disabled, and I worry that the other error wouldn't be very useful.
Addons and such could add their own xul:browsers and we have also sidebar which, IIRC, doesn't use multiprocess etc. So I think your suggestion is fine (without '?').
Attachment #8803534 - Attachment is obsolete: true
Attachment #8803535 - Attachment is obsolete: true
MozReview-Commit-ID: B2UpbKd2ptp
Attachment #8804351 - Flags: review?(bugs)
I totally forgot to change the localization strings in the above updated patches, but I don't want to give your review queue more churn. I'll make sure to get it in for the next version.
Comment on attachment 8804351 [details] [diff] [review]
Part 2: Add tests for the Large-Allocation header

>+    // Allocate a gigabyte of memory in the content process
>+    yield ContentTask.spawn(aBrowser, null, () => {
>+      let arrayBuffer = new ArrayBuffer(1024*1024*1024);
>+      ok(arrayBuffer, "Successfully allocated a gigabyte of memory in content process");
So you've run this on tryserver couple of times and it didn't choke?
Attachment #8804351 - Flags: review?(bugs) → review+
Comment on attachment 8804350 [details] [diff] [review]
Part 1: Implement support for the Large-Allocation header behind the dom.largeAllocationHeader.enabled pref

+   * avaliable memory.
available 

+NS_IMETHODIMP nsContentTreeOwner::ReloadInFreshProcess(nsIDocShell* aDocShell,
+                                                       nsIURI* aURI,
+                                                       nsIURI* aReferrer,
+                                                       bool* _retval)
aRetVal


I'm not quite happy to propagate the bool param everywhere, but it doesn't quite fit into TabContext either. 

You probably want some browser/ peer to look at the patch too.
Attachment #8804350 - Flags: review?(bugs) → review+
Comment on attachment 8804350 [details] [diff] [review]
Part 1: Implement support for the Large-Allocation header behind the dom.largeAllocationHeader.enabled pref

Actually, since this is a rarely use feature, sending that message right after creating TabParent might not be a too bad idea after all.
Would keep rest of the codebase cleaner.

Could you update the patch to do that.
Attachment #8804350 - Flags: review+ → review-
Attachment #8804350 - Attachment is obsolete: true
/me ignores :smaug's review queue and ni?s him for another review anyways.
Flags: needinfo?(bugs)
Flags: needinfo?(bugs)
Attachment #8804839 - Flags: review+
Comment on attachment 8804839 [details] [diff] [review]
Part 1: Implement support for the Large-Allocation header behind the dom.largeAllocationHeader.enabled pref

Hey mike, is there any chance you could also look at this to make sure that the session store bits are up to snuff?
Attachment #8804839 - Flags: review+ → review?(mdeboer)
Comment on attachment 8804839 [details] [diff] [review]
Part 1: Implement support for the Large-Allocation header behind the dom.largeAllocationHeader.enabled pref

Review of attachment 8804839 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for asking! I don't see a problem here. r=me.
Attachment #8804839 - Flags: review?(mdeboer) → review+
Blocks: 1314098
No longer blocks: 1304141
Pushed by michael@thelayzells.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/f0cee6a12df7
Part 1: Implement support for the Large-Allocation header behind the dom.largeAllocationHeader.enabled pref, r=smaug
https://hg.mozilla.org/integration/mozilla-inbound/rev/9bbdaf510059
Part 2: Add tests for the Large-Allocation header, r=smaug
jukka, I just landed the initial version of this patch behind a pref (dom.largeAllocationHeader.enabled) on inbound. Is there any chance you could look into adding the headers into emunittest to see if it fixes the issues you are running into? You'll probably have to use noopener and https://developer.mozilla.org/en-US/docs/Web/API/BroadcastChannel to communicate between the test harness and the test pages, as windows which have an opener will be prevented from crossing process boundaries.

If you need any help with this, let me know and I can try to help you out! I'd love to be able to start either validating that this approach improves things, and/or look at ways we can improve its reliability even more.
Flags: needinfo?(jujjyl)
https://hg.mozilla.org/mozilla-central/rev/f0cee6a12df7
https://hg.mozilla.org/mozilla-central/rev/9bbdaf510059
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla52
Blocks: 1314792
Depends on: 1320997
Reworked the emunittest suite to work with the new Large-Allocation header, and looks like it's working very well. Without the header, the suite has no chance of successfully completing in a 32-bit browser, which after only five tests is down to only about 320MB of contiguous memory. With the header, I'm getting a solid 1792MB of free contiguous heap space at every test start (suite has 33 tests run in succession) and all tests pass. 

Great work on this one Michael, let's take this proposal forward!
Flags: needinfo?(jujjyl)
Blocks: 1331083
The test of this patch does not work with multiple content processes. Could you please look into this? I'm about to land it on nightly and might have to temporarily turn off the test.
Flags: needinfo?(michael)
examples: https://treeherder.mozilla.org/#/jobs?repo=ash&selectedJob=70192835
bc6 linux64 asan
bc2 osx10.10 debug

running locally on OSX:
68 INFO TEST-UNEXPECTED-FAIL | dom/tests/browser/browser_largeAllocation.js | PIDs 2 and 3 should match - Got 36843, expected 36858
Stack trace:
    chrome://mochikit/content/browser-test.js:test_is:913
    chrome://mochitests/content/browser/dom/tests/browser/browser_largeAllocation.js:null:291
    @chrome://mochitests/content/browser/dom/tests/browser/browser_largeAllocation.js:262:9
    TaskImpl_run@resource://gre/modules/Task.jsm:319:42
    process@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:917:23
    walkerLoop@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:801:7
    Promise*scheduleWalkerLoop@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:734:11
    schedulePromise@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:765:7
    completePromise@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:702:7
    promise callback*completePromise@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:693:7
    TaskImpl_run@resource://gre/modules/Task.jsm:324:15
    promise callback*TaskImpl_handleResultValue@resource://gre/modules/Task.jsm:396:7
    TaskImpl_run@resource://gre/modules/Task.jsm:327:15
    process@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:917:23
    walkerLoop@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:801:7
    Promise*scheduleWalkerLoop@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:734:11
    schedulePromise@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:765:7
    completePromise@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:702:7
    receiveMessage@resource://testing-common/ContentTask.jsm:113:9
69 INFO TEST-UNEXPECTED-FAIL | dom/tests/browser/browser_largeAllocation.js | There should be 3 tabs - Got 2, expected 3
Stack trace:
    chrome://mochikit/content/browser-test.js:test_is:913
    chrome://mochitests/content/browser/dom/tests/browser/browser_largeAllocation.js:null:295
    @chrome://mochitests/content/browser/dom/tests/browser/browser_largeAllocation.js:262:9
    TaskImpl_run@resource://gre/modules/Task.jsm:319:42
    process@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:917:23
    walkerLoop@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:801:7
    Promise*scheduleWalkerLoop@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:734:11
    schedulePromise@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:765:7
    completePromise@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:702:7
    promise callback*completePromise@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:693:7
    TaskImpl_run@resource://gre/modules/Task.jsm:324:15
    promise callback*TaskImpl_handleResultValue@resource://gre/modules/Task.jsm:396:7
    TaskImpl_run@resource://gre/modules/Task.jsm:327:15
    process@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:917:23
    walkerLoop@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:801:7
    Promise*scheduleWalkerLoop@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:734:11
    schedulePromise@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:765:7
    completePromise@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:702:7
    receiveMessage@resource://testing-common/ContentTask.jsm:113:9
70 INFO TEST-UNEXPECTED-FAIL | dom/tests/browser/browser_largeAllocation.js | Uncaught exception - at chrome://browser/content/tabbrowser.xml:2497 - TypeError: aTab is undefined
Stack trace:
    removeTab@chrome://browser/content/tabbrowser.xml:2497:1
    @chrome://mochitests/content/browser/dom/tests/browser/browser_largeAllocation.js:298:5
    @chrome://mochitests/content/browser/dom/tests/browser/browser_largeAllocation.js:262:9
    @chrome://mochitests/content/browser/dom/tests/browser/browser_largeAllocation.js:262:9
    Tester_execTest@chrome://mochikit/content/browser-test.js:737:9
    Tester.prototype.nextTest</<@chrome://mochikit/content/browser-test.js:657:7
    SimpleTest.waitForFocus/waitForFocusInner/focusedOrLoaded/<@chrome://mochikit/content/tests/SimpleTest/SimpleTest.js:744:59
    Tester_execTest@chrome://mochikit/content/browser-test.js:737:9
    Tester.prototype.nextTest</<@chrome://mochikit/content/browser-test.js:657:7
    SimpleTest.waitForFocus/waitForFocusInner/focusedOrLoaded/<@chrome://mochikit/content/tests/SimpleTest/SimpleTest.js:744:59
71 INFO TEST-UNEXPECTED-FAIL | dom/tests/browser/browser_largeAllocation.js | Found an unexpected tab at the end of test run: about:blank - 
Buffered messages finished
SUITE-END | took 112s
MacBook-Pro-50:mozilla-central gkrizsanits$ ./mach mochitest browser/components/contextualidentity/test/browser/browser_forgetAPI_EME_forgetThisSite.js
Blocks: 1332343
I filed bug 1332343
Flags: needinfo?(michael)
Component: DOM → DOM: Core & HTML
You need to log in before you can comment on or make changes to this bug.