Proxy nsDocumentViewer::PermitUnload to the child process (make beforeunload events and dialogs work for remote tabs / e10s)

RESOLVED FIXED in Firefox 45

Status

()

defect
P3
normal
RESOLVED FIXED
5 years ago
Last year

People

(Reporter: billm, Assigned: billm)

Tracking

(Depends on 1 bug, Blocks 1 bug)

unspecified
Firefox 45
x86_64
Linux
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(e10sm7+, firefox45 fixed)

Details

Attachments

(10 attachments, 3 obsolete attachments)

3.60 KB, patch
Gijs
: review+
Details | Diff | Splinter Review
12.06 KB, patch
Gijs
: review+
Details | Diff | Splinter Review
21.98 KB, patch
Gijs
: review+
Details | Diff | Splinter Review
41.86 KB, patch
Gijs
: review+
Details | Diff | Splinter Review
1.73 KB, patch
Gijs
: review+
Details | Diff | Splinter Review
5.59 KB, patch
Gijs
: review+
Details | Diff | Splinter Review
1.38 KB, patch
Gijs
: review+
Details | Diff | Splinter Review
7.49 KB, patch
mconley
: review+
Details | Diff | Splinter Review
2.67 KB, patch
Gijs
: review+
Details | Diff | Splinter Review
47.85 KB, patch
Gijs
: review+
Details | Diff | Splinter Review
This seems tricky, since PermitUnload is typically called synchronously from browser.xul's onclose event handler. One option is to use CPOWs. Another option is to not close the window, and then wait for the child to tell us whether we should close the window.

We have a similar issue with session restore and we use CPOWs there.
Mass tracking-e10s flag change. Filter bugmail on "2be0fcce-e36a-4e2c-aa80-0e3d33eb5406".
tracking-e10s: --- → +
Assignee: nobody → wmccloskey
Priority: -- → P3
Duplicate of this bug: 1047685
Move old M2 P3 bugs to M4 (because tracking-e10s=m5+ flag doesn't exist yet).

Updated

4 years ago
Summary: Proxy nsDocumentViewer::PermitUnload to the child process → Proxy nsDocumentViewer::PermitUnload to the child process (make beforeunload events and dialogs work for remote tabs / e10s)

Comment 4

4 years ago
Making this async is going to be tricky, because generally we need to tell the onclose/onunload handler whether closing is OK in sync fashion - so I guess we'd need to tell it "don't close" and then, if the content process OKs closing, re-attempt closing it.

Updated

4 years ago
Blocks: 1100700
Posted patch functional changes (obsolete) — Splinter Review
The basic strategy I took here is to send an async message to the child asking it to run PermitUnload and spin a nested event loop until it returns a response. Since PermitUnload already uses a nested event loop, we're not really adding any new nested event loops. The strategy of canceling the window/tab closing and then actually closing it later didn't seem workable to me. For one thing, it would be a nightmare to fix all the tests that rely on removeTab being synchronous (as it is I had to change many tests).

The logic for timeouts is a little tricky. I mainly just wanted to avoid the case where you close a window and we wait for a timeout to expire in each tab. So the code works fairly hard to ensure that a single timeout is enough to cancel all the PermitUnloads ongoing at the time.

The mCallerIsClosingWindow thing was also annoying, since I didn't want to have to broadcast ResetCloseWindow messages to everyone if the unload was canceled. Instead, I just made sure that we only call PermitUnload once for a given page. Mostly I did that by moving some of the logic into JS code, where we have more information about what's going on. nsGlobalWindow::CanClose is now implemented entirely in JS for chrome windows.
Attachment #8643400 - Flags: review?(gijskruitbosch+bugs)
Posted patch test changes (obsolete) — Splinter Review
Lots of tests couldn't cope with a nested event loop running during removeTab. I hacked around this by passing in a parameter that avoids doing permitUnload in these cases.

Usually the problem was just that some event was received during the removeTab code and we hadn't installed a handler for it yet or something. Sometimes bad stuff happened if, for example, some lazy tabview initialization ran during the removeTab event loop. Another problem was tests that didn't use waitForExplicitFinish: if they spin a nested event loop, we seem to start running the next test before the previous one has finished.

A really awful e10s-related problem is that we sometimes send a sync message up to the parent to inform it of some event. If the parent calls removeTab while handling that event, then we'll deadlock. The child can't respond to the permitUnload message until it has gotten a reply to the sync message.

Mostly I was able to fix these issues by making the messages async since they didn't actually need to be sync. In one or two cases I had to run the removeTab call based off of executeSoon. I looked at all the places where we call removeTab in production code and it doesn't seem like this can happen. Add-ons can trigger it unfortunately. In the worst case, that should trigger a timeout after 5 seconds.
Attachment #8643405 - Flags: review?(gijskruitbosch+bugs)

Comment 7

4 years ago
(In reply to Bill McCloskey (:billm) from comment #6)
> Created attachment 8643405 [details] [diff] [review]
> test changes
> 
> Lots of tests couldn't cope with a nested event loop running during
> removeTab. I hacked around this by passing in a parameter that avoids doing
> permitUnload in these cases.

This sounds very scary. Why couldn't we use BrowserTestUtils.removeTab? That already assumes that the tabclose will be async. I realize it means adjusting tests, too, but it will keep them a lot closer to "real" use.

(I've not done an actual review or anything yet, just reading bugmail atm. I'll try to get to this review today but it might have to be tomorrow...)
Flags: needinfo?(wmccloskey)
(In reply to :Gijs Kruitbosch from comment #7)
> This sounds very scary. Why couldn't we use BrowserTestUtils.removeTab? That
> already assumes that the tabclose will be async. I realize it means
> adjusting tests, too, but it will keep them a lot closer to "real" use.

That would be cleaner in the sense that it would make it easier to remove the nested event loop in the future. But it doesn't solve the problem of tests that aren't prepared for things happening during the removeTab call. I could try to fix those test issues more directly (moving around the registration of event handlers, fixing the tabview issues, etc.), but that would be significantly more work and I'm not sure we'd gain a whole lot from it.
Flags: needinfo?(wmccloskey)

Comment 9

4 years ago
Comment on attachment 8643400 [details] [diff] [review]
functional changes

Review of attachment 8643400 [details] [diff] [review]:
-----------------------------------------------------------------

This looks OK, but I have a few questions below, so I'll leave the review pending for now.

::: browser/base/content/tabbrowser.xml
@@ +1167,5 @@
>  
>                this._tabAttrModified(oldTab, ["selected"]);
>                this._tabAttrModified(this.mCurrentTab, ["selected"]);
>  
> +              // This isn't working quite right in e10s. Hangs when you later try to quit.

Is this comment still accurate? Because that is scary. Do you know what in particular is causing the hang?

@@ +1180,5 @@
> +                  // This prevents confusing user flows like the following:
> +                  //   1. User attempts to close Firefox
> +                  //   2. User switches tabs (ingoring a beforeunload prompt)
> +                  //   3. User returns to tab, presses "Leave page"
> +                  let promptBox = this.getTabModalPromptBox(oldBrowser);

This callback can be async, which means that presumably oldBrowser could be dead now (ie no longer in the window's DOM) - it looks like in that case, this method will work but calling anything on promptBox might blow up (if oldBrowser.parentNode is null, which is what I'm assuming right now, but I could be wrong...).

Or will the callback just never fire? In which case, are we going to leak? :-)

@@ +2176,5 @@
> +            if (!aTab._pendingPermitUnload && !aTabWillBeMoved && !aSkipPermitUnload) {
> +              // We need to block while calling permitUnload() because it
> +              // processes the event queue and may lead to another removeTab()
> +              // call before permitUnload() returns.
> +              const kTimeout = 5000;

This is defined as the same thing in several places, which makes me wonder if we should keep it somewhere else. Maybe a field on browser (the xbl thing)?

Comment 10

4 years ago
Comment on attachment 8643405 [details] [diff] [review]
test changes

Review of attachment 8643405 [details] [diff] [review]:
-----------------------------------------------------------------

::: browser/base/content/test/general/browser_bug455852.js
@@ +9,5 @@
>    EventUtils.synthesizeKey("w", { accelKey: true });
> +
> +  while (tab.parentNode) {
> +    yield new Promise(resolve => executeSoon(resolve));
> +  }

nit:
let tabClosedPromise = BrowserTestUtils.removeTab(tab, {dontRemove: true});
synthesizeKey(...);
yield tabClosedPromise;

::: browser/base/content/test/general/browser_bug521216.js
@@ +19,5 @@
>      is(actual.toString(), expected.toString(),
>         "got events and progress notifications in expected order");
> +
> +    let thisTab = tab;
> +    executeSoon(() => {

This change and looking at the bug that this testcase was created for makes me suspicious. What is this trying to do and why is it necessary?

::: browser/devtools/sourceeditor/test/helper_codemirror_runner.js
@@ +27,5 @@
>      '};' +
>      'function check() { ' +
>      '  var doc = content.document; var out = doc.getElementById("status"); ' +
>      '  if (!out || !out.classList.contains("done")) { return setTimeout(check, 100); }' +
> +    '  sendAsyncMessage("done", { failed: content.wrappedJSObject.failed });' +

Is this an unrelated whitespace change?

::: services/fxaccounts/FxAccountsOAuthClient.jsm
@@ +195,5 @@
>                };
>              }
>  
> +            // if the message asked to close the tab
> +            if (data.closeWindow && target) {

Moving this code doesn't seem like a test change... what's going on here, if you recall?

::: toolkit/mozapps/extensions/test/xpinstall/head.js
@@ +164,5 @@
>      this.installsCompletedCallback = null;
>      this.runningInstalls = null;
>  
>      if (callback)
> +      executeSoon(() => callback(count));

This looks evil. What's going on here? :-)
Attachment #8643405 - Flags: review?(gijskruitbosch+bugs)
Will upload new patches in a bit. Making my way through the comments now.

(In reply to :Gijs Kruitbosch from comment #9)
> > +              // This isn't working quite right in e10s. Hangs when you later try to quit.
> 
> Is this comment still accurate? Because that is scary. Do you know what in
> particular is causing the hang?

It is accurate! Thanks for reminding me :-). I tracked it down and it was a bug in how remote prompts work. When we removed the tabmodalprompt, an XBL destructor would run that would call the abort handler for the prompt. That would cause RemotePrompt.jsm's onPromptClose handler to run, which would try to remove the prompt again.
http://mxr.mozilla.org/mozilla-central/source/browser/modules/RemotePrompt.jsm#44
The second remove would throw a NotFoundException, causing us to skip the move that gets us out of the nested event loop.

I fixed this by making removePrompt ignore the NotFoundException. The alternative would be to fix RemotePrompt.jsm, but I couldn't think of an easy way to know whether we need to remove the prompt or not. In the non-abort case, we do want to remove it.

> @@ +1180,5 @@
> > +                  // This prevents confusing user flows like the following:
> > +                  //   1. User attempts to close Firefox
> > +                  //   2. User switches tabs (ingoring a beforeunload prompt)
> > +                  //   3. User returns to tab, presses "Leave page"
> > +                  let promptBox = this.getTabModalPromptBox(oldBrowser);
> 
> This callback can be async, which means that presumably oldBrowser could be
> dead now (ie no longer in the window's DOM) - it looks like in that case,
> this method will work but calling anything on promptBox might blow up (if
> oldBrowser.parentNode is null, which is what I'm assuming right now, but I
> could be wrong...).
> 
> Or will the callback just never fire? In which case, are we going to leak?
> :-)

If oldBrowser is removed from the DOM, it will no longer receive messages, so the callback will never fire. Removing oldBrowser from the DOM also disconnects all message listeners, so we won't leak.

> @@ +2176,5 @@
> > +            if (!aTab._pendingPermitUnload && !aTabWillBeMoved && !aSkipPermitUnload) {
> > +              // We need to block while calling permitUnload() because it
> > +              // processes the event queue and may lead to another removeTab()
> > +              // call before permitUnload() returns.
> > +              const kTimeout = 5000;
> 
> This is defined as the same thing in several places, which makes me wonder
> if we should keep it somewhere else. Maybe a field on browser (the xbl
> thing)?

I just took out the timeout parameter since we always wait 5 seconds. I moved the constant to permitUnload in remote-browser.xml.
"causing us to skip the move that gets us out of the nested event loop"

Somehow I meant to say "code" and I said "move" instead.
(In reply to :Gijs Kruitbosch from comment #10)
> nit:
> let tabClosedPromise = BrowserTestUtils.removeTab(tab, {dontRemove: true});
> synthesizeKey(...);
> yield tabClosedPromise;

I did not know about that rather interesting option. Fixed.

> ::: browser/base/content/test/general/browser_bug521216.js
> @@ +19,5 @@
> >      is(actual.toString(), expected.toString(),
> >         "got events and progress notifications in expected order");
> > +
> > +    let thisTab = tab;
> > +    executeSoon(() => {
> 
> This change and looking at the bug that this testcase was created for makes
> me suspicious. What is this trying to do and why is it necessary?

That code is triggered off of a progress listener. Progress listeners in tests use the add-on shims, which means that they run synchronously. This way they can use CPOWs to look at the state of the content process, knowing that it hasn't moved on already.

The problem here is that the content process is waiting synchronously for the progress listener to finish. When we call removeTab, we create a deadlock since the content process can't respond to the permitUnload message.

The alternative here would be to pass in {skipPermitUnload: true}.

> ::: browser/devtools/sourceeditor/test/helper_codemirror_runner.js
> @@ +27,5 @@
> >      '};' +
> >      'function check() { ' +
> >      '  var doc = content.document; var out = doc.getElementById("status"); ' +
> >      '  if (!out || !out.classList.contains("done")) { return setTimeout(check, 100); }' +
> > +    '  sendAsyncMessage("done", { failed: content.wrappedJSObject.failed });' +
> 
> Is this an unrelated whitespace change?

No, I changed sendSyncMessage to sendAsyncMessage. The reason is essentially the same as above: the "done" listener in the parent calls removeTab.

There doesn't appear to be any reason to use a sync message in the test.

> ::: services/fxaccounts/FxAccountsOAuthClient.jsm
> @@ +195,5 @@
> >                };
> >              }
> >  
> > +            // if the message asked to close the tab
> > +            if (data.closeWindow && target) {
> 
> Moving this code doesn't seem like a test change... what's going on here, if
> you recall?

Yes, it's not. The problem is the ordering of the closeWindow code (which calls removeTab) in relation to the onComplete callback. The unpatched code calls onComplete and then does removeTab. Lots of the onComplete callbacks in tests call something that starts the next test. I think the next test won't actually start until the next turn of the event loop, so that works okay when removeTab is sync. When it runs the event loop, we start the next test while we're still in the middle of removeTab, and all sorts of chaos results.

The patch reverses the order of onComplete/tearDown and the removeTab calls. If you want, I can get a separate review on this from whoever owns this code.

> ::: toolkit/mozapps/extensions/test/xpinstall/head.js
> @@ +164,5 @@
> >      this.installsCompletedCallback = null;
> >      this.runningInstalls = null;
> >  
> >      if (callback)
> > +      executeSoon(() => callback(count));
> 
> This looks evil. What's going on here? :-)

Basically the same as above. I'm having trouble remembering the exact path, but it involves this sync message:
http://mxr.mozilla.org/mozilla-central/source/toolkit/mozapps/extensions/amInstallTrigger.js#83
Somehow we trigger removeTab from there, which deadlocks in the same way.

Comment 14

4 years ago
Comment on attachment 8643400 [details] [diff] [review]
functional changes

Clearing review while waiting for new patch(es). :-)
Attachment #8643400 - Flags: review?(gijskruitbosch+bugs)

Updated

4 years ago
Duplicate of this bug: 1199645
Posted patch functional changes v2 (obsolete) — Splinter Review
Sorry for the huge delay here. I ended up spending all my time on WebExtensions for a long time.

I haven't really changed too much here I don't think. When I rebased, I realized you already fixed the issue where we would hang when switching tabs and then quitting if a PermitUnload dialog was displayed. The change was in bug 1191973.
Attachment #8643400 - Attachment is obsolete: true
Attachment #8643405 - Attachment is obsolete: true
Attachment #8666248 - Flags: review?(gijskruitbosch+bugs)
This test is just broken as far as I can tell. It uses gBrowser.currentURI without waiting for the page to load, so it gets about:blank. It mostly seems to work anyway, but with the changes here, we can get a load notification in the middle of removing a tab, so currentURI will change unexpectedly.
Attachment #8666249 - Flags: review?(gijskruitbosch+bugs)
Posted patch tabview fixesSplinter Review
This patch fixes some tabview-related tests. What I found is that TabView listens for TabShow and TabHide events and initializes itself lazily at that time. During initialization, it resets all the .hidden properties on tabs based on its own model of visibility. For tests that try to manipulate .hidden property themselves, this is a problem: if tabview is initialized in the middle of the test, things go wrong. Most of the time that doesn't happen, but now the initialization can finish in the middle of removeTab.

To solve the problem I wait for tabview to initialize before each of these tests.
Attachment #8666252 - Flags: review?(gijskruitbosch+bugs)
Here are the remaining test changes. This is basically the same patch as I posted before, I think. The explanations for it still hold.
Attachment #8666256 - Flags: review?(gijskruitbosch+bugs)

Comment 20

4 years ago
Comment on attachment 8666248 [details] [diff] [review]
functional changes v2

Review of attachment 8666248 [details] [diff] [review]:
-----------------------------------------------------------------

Was there a trypush for this set of patches? :-)

::: browser/base/content/browser.js
@@ +6441,5 @@
> +{
> +  // Avoid redundant calls to canClose from showing multiple
> +  // PermitUnload dialogs.
> +  if (window.skipNextCanClose) {
> +    window.skipNextCanClose = false;

Are we never going to have >2 calls here?

::: browser/base/content/tabbrowser.xml
@@ +499,5 @@
> +                  stack.removeChild(aPrompt);
> +                } catch (e if e.name == "NotFoundError") {
> +                  // Prompt doesn't exist (maybe it was already removed).
> +                  return;
> +                }

Doesn't your comment on this bug mean this code is now obsolete, or did I misunderstand?

Also, if we do still need this, rather than a try... catch, you could actually check if aPrompt was still in the DOM?

@@ +4039,5 @@
>              case "DOMWindowClose": {
>                if (this.tabs.length == 1) {
> +                // We already did PermitUnload in the content process
> +                // for this tab (the only one in the window). So we don't
> +                // need to do it again for any tabs.

Is this true even in the non-e10s case?

@@ +4048,5 @@
>  
>                let tab = this.getTabForBrowser(browser);
>                if (tab) {
> +                // Skip running PermitUnload since it already happened in
> +                // the content process.

Ditto.

::: toolkit/content/widgets/browser.xml
@@ +1073,5 @@
>              window.addEventListener("keypress", this, true);
>              window.addEventListener("keyup", this, true);
>           ]]>
> +        </body>
> +      </method>

What's going on here?

::: toolkit/content/widgets/remote-browser.xml
@@ +266,5 @@
> +          let permitUnload;
> +          let id = this._permitUnloadId++;
> +          let msgListener, observer;
> +          let mm = this.messageManager;
> +          function done(result) {

Nit: You could move this below and just declare & define the msgListener in one statement, ditto for the observer - the done() function will get hoisted and there shouldn't be any TDZ issues then, I think?

@@ +294,5 @@
> +          mm.sendAsyncMessage("PermitUnload", {id});
> +          mm.addMessageListener("PermitUnload", msgListener);
> +          Services.obs.addObserver(observer, "message-manager-close", false);
> +
> +          let tm = Cc["@mozilla.org/thread-manager;1"].getService(Ci.nsIThreadManager);

Nit: not necessary, can use Services.tm (we're using Services in this file so I presume we can assume it's there)
Attachment #8666248 - Flags: review?(gijskruitbosch+bugs)

Comment 21

4 years ago
Comment on attachment 8666249 [details] [diff] [review]
fix browser_relatedTabs.js

Review of attachment 8666249 [details] [diff] [review]:
-----------------------------------------------------------------

::: browser/base/content/test/general/browser_relatedTabs.js
@@ +1,5 @@
>  /* This Source Code Form is subject to the terms of the Mozilla Public
>   * License, v. 2.0. If a copy of the MPL was not distributed with this
>   * file, You can obtain one at http://mozilla.org/MPL/2.0/. */
>  
> +add_task(function* test() {

I... I'm actually wondering if this still creates a global "test" function (I think so?) and how the test framework decides which (or both?) to use...

maybe just remove the name? :-)

@@ +17,5 @@
> +    return new Promise(resolve => {
> +      tab.linkedBrowser.addEventListener("load", function listener(e) {
> +        tab.linkedBrowser.removeEventListener("load", listener, true);
> +        resolve();
> +      }, true);

Please use BrowserTestUtils' openNewForegroundTab instead - you can pass a function for what you want to open, so you can make the addTab call with the argument.
Attachment #8666249 - Flags: review?(gijskruitbosch+bugs) → review+

Comment 22

4 years ago
Comment on attachment 8666252 [details] [diff] [review]
tabview fixes

Review of attachment 8666252 [details] [diff] [review]:
-----------------------------------------------------------------

r+ either way but see below.

::: browser/base/content/test/general/browser_visibleTabs.js
@@ +1,5 @@
>  /* This Source Code Form is subject to the terms of the Mozilla Public
>   * License, v. 2.0. If a copy of the MPL was not distributed with this
>   * file, You can obtain one at http://mozilla.org/MPL/2.0/. */
>  
> +add_task(function* test() {

Same concern as before about the globally scoped "test" function. Was there a reason to not use the same runTest thing you did in the other test? :-)

(same question for browser_visibleTabs_{contextMenu.js,tabPreview.js})
Attachment #8666252 - Flags: review?(gijskruitbosch+bugs) → review+

Comment 23

4 years ago
Comment on attachment 8666256 [details] [diff] [review]
test changes v2

Review of attachment 8666256 [details] [diff] [review]:
-----------------------------------------------------------------

::: browser/base/content/test/general/browser.ini
@@ -147,5 @@
>  [browser_autocomplete_tag_star_visibility.js]
>  [browser_backButtonFitts.js]
>  skip-if = os == "mac" # The Fitt's Law back button is not supported on OS X
>  [browser_beforeunload_duplicate_dialogs.js]
> -skip-if = e10s # bug 967873 means permitUnload doesn't work in e10s mode

Can you also enable and check:

https://dxr.mozilla.org/mozilla-central/source/docshell/test/browser/browser.ini#103

and maybe look at and/or file a followup bug for:

https://dxr.mozilla.org/mozilla-central/source/toolkit/components/startup/tests/browser/browser.ini#4

::: browser/base/content/test/general/browser_bug521216.js
@@ +19,5 @@
>      is(actual.toString(), expected.toString(),
>         "got events and progress notifications in expected order");
> +
> +    let thisTab = tab;
> +    executeSoon(() => {

Meh. Nit: function(tab) {
  ...
}.bind(null, tab);

instead of the let thisTab.

::: browser/base/content/test/general/browser_bug676619.js
@@ +1,2 @@
>  function test () {
> +  requestLongerTimeout(2);

How confident are we that this is working and we're not just adding an intermittent timeout or whatever with this patch?

::: dom/workers/test/serviceworkers/browser_force_refresh.js
@@ +31,5 @@
>          if (cachedLoad) {
> +          removeEventListener('base-load', eventHandler, true);
> +          removeEventListener('base-register', eventHandler, true);
> +          removeEventListener('base-sw-ready', eventHandler, true);
> +          removeEventListener('cached-load', eventHandler, true);

Deep sigh. Feel free to land this specific bit today because srsly.

::: services/fxaccounts/FxAccountsOAuthClient.jsm
@@ +211,5 @@
> +                }
> +              } else {
> +                log.debug("OAuth flow failed to close the tab. TabBrowser not found.");
> +              }
> +            }

So can you move these hunks to a non-test patch? :-)
Attachment #8666256 - Flags: review?(gijskruitbosch+bugs) → review+

Comment 24

4 years ago
(also, I'm sorry, but I probably just bitrotted you a little bit with bug 636905 just now :-( )
(In reply to :Gijs Kruitbosch from comment #20)
> Was there a trypush for this set of patches? :-)

https://treeherder.mozilla.org/#/jobs?repo=try&revision=c5cba1cd46c6

I'll push another one once I'm done making these changes.

> ::: browser/base/content/browser.js
> @@ +6441,5 @@
> > +{
> > +  // Avoid redundant calls to canClose from showing multiple
> > +  // PermitUnload dialogs.
> > +  if (window.skipNextCanClose) {
> > +    window.skipNextCanClose = false;
> 
> Are we never going to have >2 calls here?

I tried to come up with a concise proof that we won't, but I can't. There are too many paths that converge here.

The reason we've had problems in the past is that window.close() would trigger its own PermitUnload dialog. If the user still wanted to close, and closing the tab also caused us to close the XUL window, then we would do another PermitUnload call. That won't happen anymore because we set skipNextCanClose when closing the XUL window.

The reason I chose this approach over the "flag reset" approach in the current code is that it ensures that all the state we need is in the parent process. If we tried to use a reset function in e10s, we would need a lot of messaging to set and reset the flag in all the content processes. That seems somewhat more error-prone to me than skipNextCanClose, which is just in the parent.

> ::: browser/base/content/tabbrowser.xml
> @@ +499,5 @@
> > +                  stack.removeChild(aPrompt);
> > +                } catch (e if e.name == "NotFoundError") {
> > +                  // Prompt doesn't exist (maybe it was already removed).
> > +                  return;
> > +                }
> 
> Doesn't your comment on this bug mean this code is now obsolete, or did I
> misunderstand?

Yes, forgot about this. I've removed it.

> @@ +4039,5 @@
> >              case "DOMWindowClose": {
> >                if (this.tabs.length == 1) {
> > +                // We already did PermitUnload in the content process
> > +                // for this tab (the only one in the window). So we don't
> > +                // need to do it again for any tabs.
> 
> Is this true even in the non-e10s case?

This path is only taken in the e10s case. In non-e10s, we run the DOMWindowClose event handler in tabbrowser.xml. I've made almost identical modifications to that code (which you should see in the next hunk).

In the non-e10s case, nsGlobalWindow::CanClose calls PermitUnload. And it happens shortly before the DOMWindowClose event is fired in nsGlobalWindow::CloseOuter. So the comment is true in non-e10s.

> 
> @@ +4048,5 @@
> >  
> >                let tab = this.getTabForBrowser(browser);
> >                if (tab) {
> > +                // Skip running PermitUnload since it already happened in
> > +                // the content process.
> 
> Ditto.

Same thing here.

> 
> ::: toolkit/content/widgets/browser.xml
> @@ +1073,5 @@
> >              window.addEventListener("keypress", this, true);
> >              window.addEventListener("keyup", this, true);
> >           ]]>
> > +        </body>
> > +      </method>
> 
> What's going on here?

I noticed the indentation was wrong. Since I was changing this file, I fixed it.

> ::: browser/base/content/test/general/browser_bug676619.js
> @@ +1,2 @@
> >  function test () {
> > +  requestLongerTimeout(2);
> 
> How confident are we that this is working and we're not just adding an intermittent timeout or
> whatever with this patch?

We already have an intermittent failure bug filed on it. In all the reports I looked at, the test finished in about 48 seconds. The current timeout is 45. So I think this is fine. The test opens a lot of dialogs, which for some reason are said to be slow on tinderbox, so I guess that's the explanation.

I've made the other changes you requested. I'm also posting some new patches to enable the other tests you pointed out.
Updated main patch.
Attachment #8666248 - Attachment is obsolete: true
Attachment #8667613 - Flags: review?(gijskruitbosch+bugs)
Found this bug while trying to enable browser_onbeforeunload_navigation.js for e10s. Basically, if onPromptClose is called from within tabPrompt.appendPrompt (which happens because the test intercepts an observer that fires when the prompt is added) then we don't properly remove the prompt (since newPrompt hasn't been set yet).

I thought about running onPromptClose in a separate turn of the event loop, but this seems safer.
Attachment #8667614 - Flags: review?(gijskruitbosch+bugs)
This test was doing all sorts of unfortunate stuff with CPOWs in e10s. The main problem is that the onbeforeunload handler was actually set to an object in the chrome process. That caused all sorts of problems with the nested event loop.

I moved the beforeunload handler to the content process. There are still a bunch of CPOWs here, but they're pretty standard ones I guess.
Attachment #8667617 - Flags: review?(gijskruitbosch+bugs)
This patch fixes up the tests in toolkit/components/startup/tests/browser. The problem here is that DOMWillOpenModalDialog fires in the content process and the chrome process. The content process event is actually forwarded to the chrome process using our add-on shims. However, that's not the one we want for the test (since the dialog hasn't actually appeared yet). So I did some magic to ignore the content process event.
Attachment #8667619 - Flags: review?(gijskruitbosch+bugs)

Comment 30

4 years ago
Comment on attachment 8667613 [details] [diff] [review]
functional changes v3

Review of attachment 8667613 [details] [diff] [review]:
-----------------------------------------------------------------

(In reply to Bill McCloskey (:billm) from comment #25)
> (In reply to :Gijs Kruitbosch from comment #20)
> > Was there a trypush for this set of patches? :-)
> 
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=c5cba1cd46c6
> 
> I'll push another one once I'm done making these changes.
> 
> > ::: browser/base/content/browser.js
> > @@ +6441,5 @@
> > > +{
> > > +  // Avoid redundant calls to canClose from showing multiple
> > > +  // PermitUnload dialogs.
> > > +  if (window.skipNextCanClose) {
> > > +    window.skipNextCanClose = false;
> > 
> > Are we never going to have >2 calls here?
> 
> I tried to come up with a concise proof that we won't, but I can't. There
> are too many paths that converge here.
> 
> The reason we've had problems in the past is that window.close() would
> trigger its own PermitUnload dialog. If the user still wanted to close, and
> closing the tab also caused us to close the XUL window, then we would do
> another PermitUnload call. That won't happen anymore because we set
> skipNextCanClose when closing the XUL window.
> 
> The reason I chose this approach over the "flag reset" approach in the
> current code is that it ensures that all the state we need is in the parent
> process. If we tried to use a reset function in e10s, we would need a lot of
> messaging to set and reset the flag in all the content processes. That seems
> somewhat more error-prone to me than skipNextCanClose, which is just in the
> parent.

Right, sorry, my question wasn't to question the approach in general, but more the fact that we're resetting as soon as we double-hit this path, and I was wondering if that was safe. I just combed through the sites in this patch that set this bool to true, and actually, there is only one (besides this one) that explicitly sets it to false.

To put it more proactively: can't we just not reset this to false, and always leave it at true once it gets set (except the reset case I alluded to in the previous para)? Why do we need to reset it at all? Seems to me like we don't need to check it again once we have checked it - assuming the answer was "yes, we can unload" at that point.

r=me assuming this gets addressed either way (either you agree, we remove the falsy assignment and it passes on try, or you disagree and tell me why, and we move on - I don't think it makes sense to overrotate on this)

::: services/fxaccounts/FxAccountsOAuthClient.jsm
@@ +203,5 @@
> +              if (tabbrowser) {
> +                let tab = tabbrowser.getTabForBrowser(target);
> +
> +                if (tab) {
> +                  tabbrowser.removeTab(tab, {skipPermitUnload: false});

Nit: The options bit here seems unnecessary? Or am I missing a trick?
Attachment #8667613 - Flags: review?(gijskruitbosch+bugs) → review+

Comment 31

4 years ago
Comment on attachment 8667614 [details] [diff] [review]
RemotePrompt fix

Review of attachment 8667614 [details] [diff] [review]:
-----------------------------------------------------------------

::: browser/modules/RemotePrompt.jsm
@@ +71,5 @@
>        args.promptActive = true;
>  
>        newPrompt = tabPrompt.appendPrompt(args, onPromptClose);
>  
> +      if (needRemove) {

Can you add a comment above this code explaining why we need to do this?
Attachment #8667614 - Flags: review?(gijskruitbosch+bugs) → review+

Comment 32

4 years ago
Comment on attachment 8667617 [details] [diff] [review]
browser_onbeforeunload_navigation.js fix

Review of attachment 8667617 [details] [diff] [review]:
-----------------------------------------------------------------

(In reply to Bill McCloskey (:billm) from comment #28)
> Created attachment 8667617 [details] [diff] [review]
> browser_onbeforeunload_navigation.js fix
> 
> This test was doing all sorts of unfortunate stuff with CPOWs in e10s. The
> main problem is that the onbeforeunload handler was actually set to an
> object in the chrome process. That caused all sorts of problems with the
> nested event loop.
> 
> I moved the beforeunload handler to the content process. There are still a
> bunch of CPOWs here, but they're pretty standard ones I guess.

This patch seems to miss the enabling of the test in browser.ini (for e10s) ?

r=me with that addressed. :-)

::: docshell/test/browser/browser_onbeforeunload_navigation.js
@@ +141,2 @@
>    loadStarted = false;
> +  testTab.linkedBrowser.loadURI(TARGETED_PAGE);

You could avoid more of these CPOWs by doing a bunch of this in a ContentTask, if you wanted, but this wfm, I guess.
Attachment #8667617 - Flags: review?(gijskruitbosch+bugs) → review+

Comment 33

4 years ago
Comment on attachment 8667619 [details] [diff] [review]
enable toolkit/components/startup tests

Review of attachment 8667619 [details] [diff] [review]:
-----------------------------------------------------------------

This one too doesn't seem to have browser.ini ?
Attachment #8667619 - Flags: review?(gijskruitbosch+bugs) → review+
> To put it more proactively: can't we just not reset this to false, and always leave it at
> true once it gets set (except the reset case I alluded to in the previous para)? Why do we
> need to reset it at all? Seems to me like we don't need to check it again once we have
> checked it - assuming the answer was "yes, we can unload" at that point.

Yes, I've changed it so we do that.

Comment 38

4 years ago
Why did this get backed out?
Flags: needinfo?(wmccloskey)
I landed some patches last week (bug 1191145 and bug 1191143) that seem to be implicated in a topcrash (bug 1210821). When I tried to back them out, I got some test failures unless I backed this out too. Something about the nested event loop here must depend on bug 1191143. I'll investigate soon.
Status: RESOLVED → REOPENED
Flags: needinfo?(wmccloskey)
Resolution: FIXED → ---
The reason this stuff depended on the CPOW patches that got backed out is that the following could happen:

event fires in content process (usually "load")
we send a sync CPOW message to the parent notifying it
parent calls removeTab while processing the event
removeTab spins a nested event loop and asks the child to run beforeunload
child doesn't respond because it's blocked in its sync CPOW message
parent times out and the removeTab happens anyway, but often the test times out first

The CPOW patches made this a little nicer and avoided the test failures, but I decided I should just fix the tests to avoid using shims. Bug 1219504 does that.

However, browser_addonShims.js also calls removeTab from an event handler. I don't want to remove shims there since we're trying to test them. So instead this patch moves the removeTab to a separate turn of the event loop.

This could be a problem if an add-on calls removeTab from an event handler that's triggered in the content process. Hopefully that won't happen. If it does, the worst that will happen is a delay until removeTab times out.
Attachment #8680394 - Flags: review?(mconley)
Attachment #8680394 - Flags: review?(mconley) → review+

Updated

4 years ago
Duplicate of this bug: 1220109

Updated

4 years ago
Depends on: 1221050
Somehow I missed this test before. It may have been recently enabled on e10s.
Attachment #8683957 - Flags: review?(gijskruitbosch+bugs)

Comment 45

4 years ago
Comment on attachment 8683957 [details] [diff] [review]
fix browser_bug902350.js

Review of attachment 8683957 [details] [diff] [review]:
-----------------------------------------------------------------

r=me

::: dom/base/test/browser_bug902350.js
@@ +37,5 @@
>  }
>  
>  // Need to capture 2 loads, one for the main page and one for the iframe
>  function MixedTest1A() {
> +  dump("XYZ\n");

Nit: debugging code :-)
Attachment #8683957 - Flags: review?(gijskruitbosch+bugs) → review+
Seeing more failures. Had to back this out again.
https://hg.mozilla.org/integration/mozilla-inbound/rev/5b50bd85e4a3

It looks like a bunch of tests got enabled in e10s recently. Very frustrating.
This patch fixes all the newly enabled tests. Gijs, if you would like me to split it up and spread around the reviews I'm happy to do that.

There's one tricky issue here. The tests in toolkit/mozapps/extensions/test/browser/ were doing a very odd check:

-      gBrowser.addEventListener("load", function(event) {
-        if (!(event.target instanceof Document) ||
-            event.target.location.href == "about:blank")
-          return;
-        gBrowser.removeEventListener("load", arguments.callee, true);

The instanceof check came from bug 630462. It fixes some sort of Seamonkey intermittent orange. Neil, can you remember why this was necessary? I'd like to avoid it if possible.
Flags: needinfo?(neil)
Attachment #8685090 - Flags: review?(gijskruitbosch+bugs)

Comment 50

4 years ago
Comment on attachment 8685090 [details] [diff] [review]
more test fixes

Review of attachment 8685090 [details] [diff] [review]:
-----------------------------------------------------------------

::: docshell/test/browser/browser_bug234628-1.js
@@ +8,2 @@
>  
> +  is(content.document.getElementsByTagName("iframe")[0].contentDocument.documentElement.textContent.indexOf('\u20AC'), 85, "Child doc should be windows-1252 initially");

Here and others, I believe content.frames[0] would also work, and would be shorter and slightly clearer, maybe? :-)

::: docshell/test/browser/browser_bug92473.js
@@ +46,5 @@
>                extractJarToTmp(jar) :
>                getChromeDir(getResolvedURI(gTestPath));
>    var rootDir = Services.io.newFileURI(dir).spec;
>  
> +  waitForExplicitFinish();

Nit: Can just keep this at the top of test(), right?

::: docshell/test/browser/head.js
@@ +72,5 @@
> + */
> +function runCharsetTest(url, check1, charset, check2) {
> +  waitForExplicitFinish();
> +  gBrowser.selectedTab = gBrowser.addTab(url);
> +  BrowserTestUtils.browserLoaded(gBrowser.selectedBrowser).then(afterOpen);

BrowserTestUtils.openNewForegroundTab(gBrowser, url, true).then(afterOpen) ?
Attachment #8685090 - Flags: review?(gijskruitbosch+bugs) → review+
(In reply to Bill McCloskey from comment #49)
> The tests in toolkit/mozapps/extensions/test/browser/ were doing a very odd check:
> 
> -      gBrowser.addEventListener("load", function(event) {
> -        if (!(event.target instanceof Document) ||
> -            event.target.location.href == "about:blank")
> -          return;
> -        gBrowser.removeEventListener("load", arguments.callee, true);
> 
> The instanceof check came from bug 630462. It fixes some sort of Seamonkey
> intermittent orange. Neil, can you remember why this was necessary? I'd like
> to avoid it if possible.

Yes. You're adding a capturing load event listener to the tabbrowser itself. This means that it listens to anything that loads, and not actually the load you're trying to listen to. In particular, SeaMonkey's tabbrowser still loads its favicons. This check is needed to filter out those loads.
Flags: needinfo?(neil)
(In reply to neil@parkwaycc.co.uk from comment #51)
> Yes. You're adding a capturing load event listener to the tabbrowser itself.
> This means that it listens to anything that loads, and not actually the load
> you're trying to listen to. In particular, SeaMonkey's tabbrowser still
> loads its favicons. This check is needed to filter out those loads.

OK, it sounds like the change I've made is safe then. The listener is no longer on the tabbrowser. Now it's in a frame script, which presumably won't see favicon loads.

Comment 56

4 years ago
This is FIXED now, right?
Flags: needinfo?(wmccloskey)
Status: REOPENED → RESOLVED
Closed: 4 years ago4 years ago
Flags: needinfo?(wmccloskey)
Resolution: --- → FIXED

Updated

4 years ago
Depends on: 1234957

Comment 57

4 years ago
Hrmpf, lying flags. This only got fixed for 45, because of the backout and later relanding.
Target Milestone: Firefox 44 → Firefox 45
I can tell you that since at least mid December 2015 Nightlys, I can call .removeCurrentTab() from my add-on's chrome process and it doesn't do anything except hang the browser for a few seconds; no error, no exception, no closed tab, nothing but delay. This only seems to apply to e10s tabs. Basically though, my addon is broken by this behavior, it is definitely not fixed in FF46.

Before this, it would work sometimes for a few weeks and then stop working for a few weeks on Nightly.Could someone verify that this bug is responsible? so I don't have to file a duplicate?
Flags: needinfo?(wmccloskey)

Comment 59

4 years ago
(In reply to snuz_2@yahoo.com from comment #58)
> I can tell you that since at least mid December 2015 Nightlys, I can call
> .removeCurrentTab() from my add-on's chrome process and it doesn't do
> anything except hang the browser for a few seconds; no error, no exception,
> no closed tab, nothing but delay. This only seems to apply to e10s tabs.
> Basically though, my addon is broken by this behavior, it is definitely not
> fixed in FF46.
> 
> Before this, it would work sometimes for a few weeks and then stop working
> for a few weeks on Nightly.Could someone verify that this bug is
> responsible? so I don't have to file a duplicate?

No, because you didn't tell us which add-on this was...

Please file a new bug with exact STR and details about your add-on, and CC Bill and me.

It would also be helpful if you determined an exact regression range using mozregression ( http://mozilla.github.io/mozregression/ ) which you can use with an existing profile (switch for that on the commandline) to use a profile that has your add-on installed.
Flags: needinfo?(wmccloskey)
Depends on: 1250088
Depends on: 1312279
Removing leave-open keyword from resolved bugs, per :sylvestre.
Keywords: leave-open
You need to log in before you can comment on or make changes to this bug.