Closed Bug 1125377 Opened 5 years ago Closed 5 years ago

Marionette fails during getWindowHandles with e10s turned on when accessing a CPOW

Categories

(Testing :: Marionette, defect)

x86_64
Linux
defect
Not set

Tracking

(e10s+, firefox38 fixed)

RESOLVED FIXED
mozilla38
Tracking Status
e10s + ---
firefox38 --- fixed

People

(Reporter: chmanchester, Assigned: chmanchester)

References

(Blocks 1 open bug)

Details

(Keywords: pi-marionette-server)

Attachments

(3 files, 1 obsolete file)

This happens intermittently on mn-e10s jobs (bug 1124966), but constantly in the test suite :erahm is working on that ports the Are We Slim Yet tests to use marionette. He provided a test case that reproduces this that I'll upload shortly. Basically, it opens 15 tabs, forces a gc, closes all but one tab, and repeats the process. The second time through the test fails when figuring out how many tabs to close based on marionette.window_handles. In marionette-server.js, the failure occurs when attempting to retrieve a window ID from a CPOW (we check whether the CPOW is null, and then QI it. If fails in QI, not the null check). I can catch it in the browser toolbox debugger, but inspecting the window there crashes the tab.

I'm going to look into this some more, but I'm pretty over-subscribed for the next two weeks. We may need help from the e10s team figuring this one out, and we may need to stop keeping track of listener scripts in marionette via window ids (something we might need to do if CPOWs go away anyways).
This is the test. Run with --e10s to reproduce.
Assignee: nobody → cmanchester
Status: NEW → ASSIGNED
Summary: Marionette fails during get_window_handles when accessing a CPOW → Marionette fails during get_window_handles with e10s turned on when accessing a CPOW
From a quick test, translating use of a CPOW to use messages (a new command or small script to load into each browser we need to identify) effectively works around this.
Blocks: 1125484
Summary: Marionette fails during get_window_handles with e10s turned on when accessing a CPOW → Marionette fails during getWindowHandles with e10s turned on when accessing a CPOW
I see the same now for my work on handling chrome windows on bug 1123401. The failure might be a bit different, but it also happens given that the latest nightly has e10s turned on, and we do not disable it anymore by default.

https://travis-ci.org/mozilla/firefox-ui-tests/builds/48325279

File "/home/travis/build/mozilla/firefox-ui-tests/client/marionette/marionette.py", line 1293, in execute_script    filename=os.path.basename(frame[0]))
File "/home/travis/build/mozilla/firefox-ui-tests/client/marionette/decorators.py", line 36, in _    return func(*args, **kwargs)
File "/home/travis/build/mozilla/firefox-ui-tests/client/marionette/marionette.py", line 634, in _send_message    self._handle_error(response)
File "/home/travis/build/mozilla/firefox-ui-tests/client/marionette/marionette.py", line 689, in _handle_error    raise errors.JavascriptException(message=message, status=status, stacktrace=stacktrace)JavascriptException: JavascriptException: NS_ERROR_FAILURE: stacktrace:	execute_script @windows.py, line 284	inline javascript, line 1	src: "              var win = window.open();"
Blocks: 1123401
The simplest command to execute for reproduction for me is:

    def test_failure(self):
        self.marionette.execute_script(""" window.open(); """)

Then it fails with the above mentioned error.
Henrik, this doesn't look like the same issue. I think this issue could be bug 1047603.
No longer blocks: 1123401
Hi Mike. Because this works most of the time, and working around it by sending messages instead of using a CPOW works around it, it seems like this could be an issue with CPOWs and of interest to e10s. Alternatively, this code could be making a dangerous assumption about CPOWs that I need to fix.

I don't expect you're familiar with marionette, but the code that hits it is:

https://hg.mozilla.org/mozilla-central/file/6b29617a738f/testing/marionette/marionette-server.js#l1387

When run as a part of the test case attached to this bug (described in comment 0), calling QueryInterface throws undefined.  Do you have a sense of whether this sort of thing should be expected to reliably work?

Thanks in advance, and please let me know if there might be someone else on your team I should contact about this.
Flags: needinfo?(mconley)
When I say "throws undefined", I mean that when we are in a catch block starting with "catch (e) {...", "e" within that block is undefined.
It's possible that the contentWindow on the content side no longer exists, and you're attempting to access a "dead CPOW". Do you see warnings about dead CPOWs in the browser console when you hit this?
Flags: needinfo?(mconley) → needinfo?(cmanchester)
Thanks, I do have: "JavaScript error: , line 0: Error: operation not possible on dead CPOW" just before the exception. Is there a rule of thumb for safe handling of CPOWs that might be dead, or is it a question of individual callers knowing when the wrapped object must be alive (I'm not sure how to do that in this case, but I imagine it's possible)?
Flags: needinfo?(cmanchester) → needinfo?(mconley)
(In reply to Chris Manchester [:chmanchester] from comment #9)
> Thanks, I do have: "JavaScript error: , line 0: Error: operation not
> possible on dead CPOW" just before the exception. Is there a rule of thumb
> for safe handling of CPOWs that might be dead, or is it a question of
> individual callers knowing when the wrapped object must be alive (I'm not
> sure how to do that in this case, but I imagine it's possible)?

CPOWs are wonderful and dangerous things. :) While they do the job of transparently proxying synchronous calls to and from a remote object, they do not hold a reference to that object. That means if the object is garbage collected on the content side, the CPOW is now "dead" and cannot be operated upon.

In order for the CPOW to stay alive, something needs to be holding a reference to the thing you want to keep alive in the content process.

I believe you can check the dead-ness of a CPOW with Cu.isDeadWrapper, so that might help you avoid the exception. But if you really need those contentWindows to not be dead, you need something on the content side to hold a reference to them.
Flags: needinfo?(mconley)
(In reply to Mike Conley (:mconley) - Needinfo me! from comment #10)
> (In reply to Chris Manchester [:chmanchester] from comment #9)
> > Thanks, I do have: "JavaScript error: , line 0: Error: operation not
> > possible on dead CPOW" just before the exception. Is there a rule of thumb
> > for safe handling of CPOWs that might be dead, or is it a question of
> > individual callers knowing when the wrapped object must be alive (I'm not
> > sure how to do that in this case, but I imagine it's possible)?
> 
> CPOWs are wonderful and dangerous things. :) While they do the job of
> transparently proxying synchronous calls to and from a remote object, they
> do not hold a reference to that object. That means if the object is garbage
> collected on the content side, the CPOW is now "dead" and cannot be operated
> upon.
> 
> In order for the CPOW to stay alive, something needs to be holding a
> reference to the thing you want to keep alive in the content process.
> 
> I believe you can check the dead-ness of a CPOW with Cu.isDeadWrapper, so
> that might help you avoid the exception. But if you really need those
> contentWindows to not be dead, you need something on the content side to
> hold a reference to them.

Guarding with isDeadWrapper does the trick, thanks! In our case we're using the window IDs just to figure out what windows are around, so avoiding dead ones is probably enough.
Things aren't quite as easy to fix as I've said in comment 11, but I have a fix for this based on mapping browsers to outerWindowId's rather than always going through a CPOW (waiting on try results now). I noticed this same feature is being added in bug 1077168, so maybe we can piggyback on that when it lands.
/r/3877 - Bug 1125377 - Store window ids in a weak map keyed by browser objects to prevent attempts to access dead windows in marionette's getWindowHandles.
/r/3879 - Bug 1125377 - Marionette test case reproducing the issue.

Pull down these commits:

hg pull review -r d7bb50999c3557513af8a6501b65c41abe53c8ba
I've been working on this concurrently with bug 1096488 (it's sort of a flavor of the same problem, assuming we can always access an outerWindowId safely, and assuming the outerWindowId we're using as a listenerId never changes), but I think this should land first and can race with bug 1077168, which will afford a simplification but isn't strictly needed. The second commit above is erahm's test, it takes about 60 seconds to run but might be a worthy regression test.
Comment on attachment 8564438 [details]
MozReview Request: bz://1125377/chmanchester

/r/3877 - Bug 1125377 - Store window ids in a weak map keyed by browser objects to prevent attempts to access dead windows in marionette's getWindowHandles.
/r/3879 - Bug 1125377 - Marionette test case reproducing the issue.

Pull down these commits:

hg pull review -r d7bb50999c3557513af8a6501b65c41abe53c8ba
Attachment #8564438 - Flags: review?(jgriffin)
Comment on attachment 8564438 [details]
MozReview Request: bz://1125377/chmanchester

/r/3877 - Bug 1125377 - Store window ids in a weak map keyed by browser objects to prevent attempts to access dead windows in marionette's getWindowHandles.
/r/3879 - Bug 1125377 - Marionette test case reproducing the issue.

Pull down these commits:

hg pull review -r 0e1246544cb80f6b21b4bf79d08a85b19cac4b19
Attachment #8564438 - Flags: review?(jgriffin)
Comment on attachment 8564438 [details]
MozReview Request: bz://1125377/chmanchester

https://reviewboard.mozilla.org/r/3875/#review3115

::: testing/marionette/marionette-server.js
(Diff revision 2)
>    register: function BO_register(uid, id) {

With this patch, this method no longer uses the 'id' parameter, so we should adjust it and all call sites.

Very nice, I didn't know about browser.permanentKey.
Comment on attachment 8564438 [details]
MozReview Request: bz://1125377/chmanchester

/r/3877 - Bug 1125377 - Store window ids in a weak map keyed by browser objects to prevent attempts to access dead windows in marionette's getWindowHandles.
/r/3879 - Bug 1125377 - Marionette test case reproducing the issue.

Pull down these commits:

hg pull review -r 643f714fe9b114ca20121e25ae3a5eb8060019da
Attachment #8564438 - Flags: review?(jgriffin)
Comment on attachment 8564438 [details]
MozReview Request: bz://1125377/chmanchester

https://reviewboard.mozilla.org/r/3875/#review3267

Ship It!
Attachment #8564438 - Flags: review?(jgriffin) → review+
https://hg.mozilla.org/mozilla-central/rev/23c081755b11
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla38
This took care of the issue in AWSY and the test in comment 1 but not the cause of the intermittents. The failure is at the point we go to access the contentWindowAsCPOW of a newly opened window so reopening. Bug 1077168 will allow us to get our window ID without using a CPOW at all, so that should take care of this. 

Short of that we could just try/catch around that point, which is unfortunate, but this really seems like a timing quirk that needs to be worked around.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Depends on: 1136378
Status: REOPENED → RESOLVED
Closed: 5 years ago5 years ago
Resolution: --- → FIXED
Attachment #8564438 - Attachment is obsolete: true
Attachment #8619216 - Flags: review+
Attachment #8619217 - Flags: review+
You need to log in before you can comment on or make changes to this bug.