Open Bug 1598832 Opened 5 years ago Updated 2 years ago

Remote Agent has to shutdown safely while browser is shutting down [Unable to stop listener NS_ERROR_UNEXPECTED]

Categories

(Remote Protocol :: Agent, defect, P3)

defect

Tracking

(Not tracked)

People

(Reporter: impossibus, Unassigned)

References

(Regression)

Details

(Keywords: regression)

I see an error now when I run the browser with ./mach run --setpref "remote.enabled=true" --setpref "remote.log.level=Debug" --setpref "browser.dom.window.dump.enabled=true" --remote-debugger and then Quit:

1574479057499	RemoteAgent	ERROR	unable to stop listener: [Exception... "Component returned failure code: 0x8000ffff (NS_ERROR_UNEXPECTED) [nsIWindowMediator.getEnumerator]"  nsresult: "0x8000ffff (NS_ERROR_UNEXPECTED)"  location: "JS frame :: chrome://remote/content/targets/TabObserver.jsm :: stop :: line 111"  data: no] Stack trace: stop()@TabObserver.jsm:111
unwatchForTabs()@Targets.jsm:82
unwatchForTargets()@Targets.jsm:41
destructor()@Targets.jsm:121
close()@RemoteAgent.jsm:109

My first guess is that this is related to Bug 1543115, but I haven't checked.

Failure here seems to come from inside the WindowMediator, and is related to the ready state of the window:
https://searchfox.org/mozilla-central/rev/a78233c11a6baf2c308fbed17eb16c6e57b6a2ac/xpfe/appshell/nsWindowMediator.cpp#164

I assume that the window isn't ready yet, means we run code too early.

Maja, could you check with the revision before Andreas' patch?

Flags: needinfo?(mjzffr)
Priority: -- → P3

How do you quit the browser? I can’t reproduce this with SIGINT.

I think it likely that bug 1543115 surfaced this
because nsIRemoteAgent.close() is called unconditionally
when the RemoteAgent Rust API is dropped:

impl Drop for RemoteAgent {
    fn drop(&mut self) {
        // it should always be safe to call nsIRemoteAgent.close()
        if let Err(e) = self.close() {
            error!("unable to close remote agent listener: {}", e);
        }
    }
}

And becuase we now log errors that happen inside nsIRemoteAgent.close():

  async close() {
    try {
      // if called early at startup, preferences may not be available
      try {
        Preferences.reset(Object.keys(RecommendedPreferences));
      } catch (e) {}

      // destroy targets before stopping server,
      // otherwise the HTTP will fail to stop
      if (this.targets) {
        this.targets.destructor();
      }

      if (this.listening) {
        await this.server.stop();
      }
    } catch (e) {
      // this function must never fail
      log.error("unable to stop listener", e);
    } finally {
      this.server = null;
      this.targets = null;
    }
  }

The questions then are:

  1. Did we not call RemoteAgent#close() previously at the end of test runs?
  2. If we do, why is if (this.targets) true when called the second time from Rust?

(For what it’s worth, I don’t think --setpref "remote.log.level=Debug
or --setpref "browser.dom.window.dump.enabled=true"
are necessary to reproduce this.)

I don't see this with SIGINT either, only the in-browser Quit (so Alt+F4 or Cmd+Q). I'll check earlier revision later today.

I talked to whimboo about this earlier this morning and forgot to
update this bug…

The reason it’s not reproducible with SIGINT is because Firefox
doesn’t trap signals. This has been a known issue with Firefox for
a long time that geckodriver is also affected by, see bug 336193.
This means we rely heavily on the kernel to deal with freeing memory
and other resources, which would explain why Drop for RemoteAgent
isn’t being called and consequently why nsIRemoteAgent.close()
isn’t hit on SIGINT.

Command + Q is different because it’s a keyboard shortcut registered
and handled by Firefox, which means it can trigger a safer shutdown
code path. This means that when we clean up browsers/tabs/windows
and unload the JS stack, it is not always in a predictable order.

We arrived at two ideas:

  • whimboo noted that we don’t use weak references for tabs/windows
    in TabObserver and that we probably should

  • I noted that we could also have a remote-shutdown-requested
    notification that would run nsIRemoteAgent.close() explicitly as
    an async shutdown step which would hopefully trigger when Command + Q
    and/or nsIAppStartup.quit() is called

Flags: needinfo?(mjzffr)

Maja, do you still see that failure? I assume that my refactoring on bug 1634029 fixed it.

Flags: needinfo?(mjzffr)

No, still same error. Aren't you able to reproduce it on your mac?

1590410064943	RemoteAgent	ERROR	unable to stop listener: [Exception... "Component returned failure code: 0x8000ffff (NS_ERROR_UNEXPECTED) [nsIWindowMediator.getEnumerator]"  nsresult: "0x8000ffff (NS_ERROR_UNEXPECTED)"  location: "JS frame :: chrome://remote/content/observers/TargetObserver.jsm :: stop :: line 59"  data: no] Stack trace: stop()@TargetObserver.jsm:59
unwatchForTabs()@TargetList.jsm:82
unwatchForTargets()@TargetList.jsm:41
destructor()@TargetList.jsm:121
close()@RemoteAgent.jsm:115
Flags: needinfo?(mjzffr)

Not yet, no. Hm, so it actually fails in getting the window enumerator via Services.wm.getEnumerator(). The unexpected error is thrown only if the mReady property is not true:

https://searchfox.org/mozilla-central/rev/501eb4718d73870892d28f31a99b46f4783efaa0/xpfe/appshell/nsWindowMediator.cpp#163

So I assume we hit the problem because the WindowEnumerator has already unregistered all open windows due to the xpcom-shutdown observer notification:

https://searchfox.org/mozilla-central/rev/501eb4718d73870892d28f31a99b46f4783efaa0/xpfe/appshell/nsWindowMediator.cpp#722

I noticed that we do not have any observer notifications setup for a shutdown that got triggered outside of the Remote Agent while it is still running.

We would need code that runs as blocker for xpcom-shutdown or even earlier so we won't hit this error.

Summary: Unable to stop listener NS_ERROR_UNEXPECTED in TabObserver.jsm → Remote Agent has to shutdown safely while browser is shutting down [Unable to stop listener NS_ERROR_UNEXPECTED]

fyi, I see this on both my machines

Strange. I see it myself again now.

Has Regression Range: --- → yes
Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.