Closed Bug 1576714 Opened 1 year ago Closed 1 year ago

Initiate subframe process switches from parent process

Categories

(Core :: DOM: Content Processes, enhancement, P2)

enhancement

Tracking

()

RESOLVED FIXED
mozilla71
Fission Milestone M4
Tracking Status
firefox71 --- fixed

People

(Reporter: nika, Assigned: nika)

References

(Blocks 1 open bug)

Details

Attachments

(11 files, 2 obsolete files)

47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
17.29 KB, text/plain
Details
5.91 KB, text/plain
Details
27.12 KB, text/plain
Details
60.57 KB, image/png
Details
47 bytes, text/x-phabricator-request
Details | Review

Currently when a process switch would occur, the parent process asks the child process to trigger the process internally, and the subframe creates a BrowserBridgeChild directly.

As we always trigger the process switch from the parent process, we should start creating the new BrowserBridgeParent in the parent process, and send it down instead.

Depends on: 1579213
Blocks: 1490803
No longer blocks: fission
Status: NEW → ASSIGNED
Depends on: 1556489

This is useful in part 3, where the initialization will need to be called from
multiple places.

This flips the direction in which the BrowserBridge actor is generally created
such that it is generally created in the parent and sent down to a child
process.

This is done by making the decision about what kind of switch to perform in the
parent, and sending messages down to child processes async to orchestrate these
process changes.

Process launching is changed to use an async MozPromise-returning API in this
patch, though the actual process launching still occurs synchronously. A future
patch will enable performing async process launching through the
NewOrUsedBrowserProcess mechanism.

I know of at least a few timing issues which exist with the new logic,
especially around the state of the BrowsingContext during the process
transition. I decided to not try to fix all of these issues in this patch, as
many are complex and will require changing how we manage the lifecycle of
BrowsingContext substantially. I do, however, think that the new logic is more
reliable and has fewer timing issues than the previous logic.

This patch changes when the original HttpChannelChild gets canceled during a
process switch to be after when the process switch is completed. This is needed
to prevent the load event firing too early in the original content process.

This delays when the DocumentChannelChild is canceled during a process switch to
be after the switch has been completed, to prevent the load event firing too
early in the original content process.

This test was originally written to test HTTPResponseProcessSelection before it
was hooked up into the process switch machinery. It hooks into some parts of the
process switch process which should probably be removed in the future (such as
overriding the child listener component registration), and is broken under
fission anyway.

No longer depends on: 1556489
Attachment #9095597 - Attachment is obsolete: true
Pushed by nlayzell@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/a303fc193f60
Part 1: Move BBC initialization into BrowserBridgeChild, r=kmag
https://hg.mozilla.org/integration/autoland/rev/6996b7705f06
Part 2: Remove mIPCOpen from PBrowserBridge actors, r=kmag
https://hg.mozilla.org/integration/autoland/rev/b91221da32c7
Part 3: Initiate subframe process switches from the parent, r=kmag
https://hg.mozilla.org/integration/autoland/rev/88e3b4b7fbaf
Part 4: Delay canceling original http channel during process switch, r=mattwoodrow
https://hg.mozilla.org/integration/autoland/rev/b3467f1bdde7
Part 5: Delay canceling original document channel during process switch, r=mattwoodrow
https://hg.mozilla.org/integration/autoland/rev/083967e704d2
Part 6: Remove browser_cross_process_redirect test, r=valentin
Pushed by nlayzell@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/9a5072019168
Part 1: Move BBC initialization into BrowserBridgeChild, r=kmag
https://hg.mozilla.org/integration/autoland/rev/7dce423417d8
Part 2: Remove mIPCOpen from PBrowserBridge actors, r=kmag
https://hg.mozilla.org/integration/autoland/rev/ebda40f96884
Part 3: Initiate subframe process switches from the parent, r=kmag
https://hg.mozilla.org/integration/autoland/rev/eece722082c7
Part 4: Delay canceling original http channel during process switch, r=mattwoodrow
https://hg.mozilla.org/integration/autoland/rev/2e156655c31e
Part 5: Delay canceling original document channel during process switch, r=mattwoodrow
https://hg.mozilla.org/integration/autoland/rev/faecc9f35b49
Part 6: Remove browser_cross_process_redirect test, r=valentin
https://hg.mozilla.org/integration/autoland/rev/d0c49f00eb91
Part 7: Remove now-passing fail-if, r=kmag
Regressions: 1586338
Regressions: 1586244

I spent a good amount of time trying to figure this one (comment 13) out, and I think I'm going to give it a break for a little bit.

The specific issue causing this failure is that the screenshot taken is slightly different, namely it is missing the blank white space which would normally be reserved for the scrollbar.

Unfortunately, this isn't just an issue with the screenshot. If the test is run with --keep-open=true, it's visible that the oop iframe has no scrollbar at all, and is unscrollable. It does participate in hit-testing correctly, however, as the text within the oop iframe is selectable. I've attached a screenshot: https://bug1576714.bmoattachments.org/attachment.cgi?id=9100235

The main change which occurred in this bug was around the timing of messages. With the patch applied, the PBrowser actor is created earlier than before, and more time passes before information such as dimensions is made available to it from the embedder process. I have tried moving around various messages to make sure they're delivered in as similar of an order as possible to the previous set-up, and it has unfortunately not been successful.

I performed a diff of the specific messages being received by the new content process between a version without my patch applied and one with my patch applied, and found relatively few differences (https://bug1576714.bmoattachments.org/attachment.cgi?id=9100228). The one major difference I noticed was that the working version (new) received an extra PAPZ::Msg_RequestContentRepaint message. This lead me to begin looking at APZ as a potential culprit.

I collected some APZC logs from during the process switch. I noticed that the behaviour between the two cases was somewhat different here, with the working case doing a bunch more stuff (https://bug1576714.bmoattachments.org/attachment.cgi?id=9100230). I'm still not sure if this actually means much.

I also dumped the layers information (https://bug1576714.bmoattachments.org/attachment.cgi?id=9100232).

I'm not entirely sure what's going on here yet, but I believe it has something to do with offset or other similar information being computed incorrectly for APZ early during the setup of the new PBrowser, before the dimension information becomes available. This makes me worried that the issue is an existing race in our oopif code, which is exacerbated by my changes.

Flags: needinfo?(nika)

Manually resizing the iframe with script using devtools after the page has loaded also doesn't clear things up.

I should also have said that this only appears to happen when the OOP sub-document is a generic XML document that Firefox doesn't recognize (i.e. when we use nsXMLPrettyPrinter and XMLPrettyPrint.xsl to generate the document that is displayed).

(In reply to :Nika Layzell (ni? for response) from comment #18)

The specific issue causing this failure is that the screenshot taken is slightly different, namely it is missing the blank white space which would normally be reserved for the scrollbar.

So in the passing case the first iframe has a black white space where the scroll bars are on the other two iframes?

I'm not entirely sure what's going on here yet, but I believe it has something to do with offset or other similar information being computed incorrectly for APZ early during the setup of the new PBrowser, before the dimension information becomes available. This makes me worried that the issue is an existing race in our oopif code, which is exacerbated by my changes.

Hmm, my first suspicion would be layout code, apz doesn't have much to do with deciding whether or not we get scrollbars or allocated the space for them. I could see some scrollbar related decision changing in layout affecting the apz msgs we later send though. I think a good starting point to start looking at that code is nsHTMLScrollFrame::ReflowContents and see if the decision it makes is different and if so why (I hope I'm not leading you down the wrong path!).

Needinfo'ing Emilio per his request since he knows something about the XML pretty printer binding code.

Flags: needinfo?(emilio)

That said, I brought this up during the Layout standup meeting today and we thought avoiding this XML pretty printer bug shouldn't block your patches, Nika (sites relying on it are probably very rare). If you agree, I think Emilio would prefer that you disable the test and land your patches to make it easier for him to look at the issue.

Flags: needinfo?(nika)

Comment on attachment 9095596 [details]
Bug 1576714 - Part 6: Remove browser_cross_process_redirect test,

Revision D47313 was moved to bug 1578507. Setting attachment 9095596 [details] to obsolete.

Attachment #9095596 - Attachment is obsolete: true

This flag is only meant for window.open() stuff, so not relevant to iframes at
all.

This preserves the current fission behavior (which is quite broken) of always
showing scrollbars.

The way to control scrollbars for iframes (the scrolling attribute) is not
handled at all for Fission, I filed a bug and left a few FIXMEs.

That was actually not too annoying to figure out :)

Flags: needinfo?(emilio)
Pushed by ealvarez@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/ec7d3f3eab2a
Don't update scrollbar visibility for non-top-level remote browsers. r=nika
Attachment #9098677 - Attachment description: Bug 1576714 - Part 7: Remove now-passing fail-if, → Bug 1576714 - Part 6: Remove now-passing fail-if,
Pushed by nlayzell@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/21053f81f2cf
Part 1: Move BBC initialization into BrowserBridgeChild, r=kmag
https://hg.mozilla.org/integration/autoland/rev/7a8d759536e0
Part 2: Remove mIPCOpen from PBrowserBridge actors, r=kmag
https://hg.mozilla.org/integration/autoland/rev/7935fa1ac774
Part 3: Initiate subframe process switches from the parent, r=kmag
https://hg.mozilla.org/integration/autoland/rev/c5f3fab8d36d
Part 4: Delay canceling original http channel during process switch, r=mattwoodrow
https://hg.mozilla.org/integration/autoland/rev/6e6f09db8755
Part 5: Delay canceling original document channel during process switch, r=mattwoodrow
https://hg.mozilla.org/integration/autoland/rev/0594b5176839
Part 6: Remove now-passing fail-if, r=kmag
Flags: needinfo?(nika)
Regressions: 1589054
Blocks: 1609187
Depends on: 1610614
You need to log in before you can comment on or make changes to this bug.