Marionette breaks in e10s enabled Nightly

RESOLVED FIXED in mozilla36

Status

defect
P3
normal
RESOLVED FIXED
5 years ago
5 years ago

People

(Reporter: pzhang, Assigned: chmanchester)

Tracking

(Blocks 1 bug, {pi-marionette-goal, pi-marionette-server})

unspecified
mozilla36
x86_64
Linux
Points:
---
Dependency tree / graph
Bug Flags:
firefox-backlog +

Firefox Tracking Flags

(e10sm6+)

Details

(Whiteboard: [qa?])

Attachments

(3 attachments, 2 obsolete attachments)

I tried to run marionette test in e10s enabled Nightly, first, i used --type=browser+oop --binary=$nightly/firefox, it turns out that the opened Nightly didn't enable e10n, then i tried to create a e10s enabled profile as a workaround:
  - Open Nightly with a new created profile, say e10s-test
  - Open about:config and set pref |browser.tabs.remote.autostart| to true
  - Close Nightly, and run tests with profile e10s-test:
    $ python runtests.py --type=browser+oop --binary=$nightly/firefox --profile=$profile_dir/e10s-test mytest.js

    or

    $ python runtests.py --type=browser --binary=$nightly/firefox --profile=$profile_dir/e10s-test mytest.js

the result is the Nightly is opened with e10s enabled, seconds later a new tab is opened, then nothing is happened, no logs, no errors.
BTW, does oop only works for b2g? 

Here is the failed log when running marionette test with --type=browser+oop:
  $ python runtests.py --type=browser+oop --binary=$nightly/firefox test_abc.js

starting httpd
running webserver on http://127.0.0.1:38150/
SUITE-START | Running 1 tests
SUITE-START | Running 0 tests
TEST-START | test_abc.js
running oop
TEST-END UNEXPECTED-ERROR | test_abc.js | expected PASS | JavascriptException: JavascriptException: TypeError: navigator.mozSettings is undefined
stacktrace:
	execute_async_script @marionette_test.py, line 469
	inline javascript, line 1
	src: "let setReq = navigator.mozSettings.createLock().set({'lockscreen.enabled': false});"
Traceback (most recent call last):
  File "/home/pzhang/work/b2g/B2G-FLAME/gecko/testing/marionette/client/marionette/marionette_test.py", line 171, in run
    testMethod()
  File "/home/pzhang/work/b2g/B2G-FLAME/gecko/testing/marionette/client/marionette/marionette_test.py", line 469, in runTest
    }""", script_timeout=60000)
  File "/home/pzhang/work/b2g/B2G-FLAME/gecko/testing/marionette/client/marionette/marionette.py", line 1167, in execute_async_script
    filename=os.path.basename(frame[0]))
  File "/home/pzhang/work/b2g/B2G-FLAME/gecko/testing/marionette/client/marionette/decorators.py", line 35, in _
    return func(*args, **kwargs)
  File "/home/pzhang/work/b2g/B2G-FLAME/gecko/testing/marionette/client/marionette/marionette.py", line 614, in _send_message
    self._handle_error(response)
  File "/home/pzhang/work/b2g/B2G-FLAME/gecko/testing/marionette/client/marionette/marionette.py", line 662, in _handle_error
    raise errors.JavascriptException(message=message, status=status, stacktrace=stacktrace) | took 39ms
SUITE-END | took 0s

SUMMARY
-------
passed: 0
failed: 1
todo: 0

FAILED TESTS
-------
test_abc.js
SUITE-END | took 0s
pzhang, thanks for testing. e10s (OOP) mostly works in Nightly desktop browser, but it sounds like there are some e10s problems with the Marionette tests.
Blocks: e10s-tests
tracking-e10s: --- → ?
Summary: Marionette breaks in e10s enabled Nightly. → Marionette breaks in e10s enabled Nightly
The --type=oop flag for Marionette is only for B2G.  Can you re-run the test with just --type=browser?
bumping to p1 for blocking e10s testing
Priority: -- → P1
Assignee: nobody → jmathies
Blocks: old-e10s-m2
(In reply to Jonathan Griffin (:jgriffin) from comment #3)
> The --type=oop flag for Marionette is only for B2G.  Can you re-run the test
> with just --type=browser?

It's ok with just --type=browser.
I think we can close this as WORKSFORME then; let me know if you disagree.
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → WORKSFORME
(In reply to Jonathan Griffin (:jgriffin) from comment #6)
> I think we can close this as WORKSFORME then; let me know if you disagree.

Sorry, i meant it works with --type=browser only in e10s *disabled* Firefox. 

What i want to do is running tests in the e10s enabled Firefox, the only way i can find out is creating a e10s enabled profile, and run with it by using --profile, but marionette doesn't work, the test didn't run.
Ah, in that case, can you attach the gecko.log file that's produced when running the tests?
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
Posted file gecko.log
This is the log i ran:
  python runtests.py --binary=$NIGHTLY/firefox --profile=$PROFILE/e10s-test --type=browser browser/base/content/test/
Attachment #8465933 - Attachment mime type: text/x-log → text/plain
The log shows that Marionette is dieing during shutdown.  The command-line you used doesn't actually run any tests; there are no Marionette tests under browser/base/content/test (those are mochitests).

You could try:
python runtests.py --binary=$NIGHTLY/firefox --profile=$PROFILE/e10s-test --type=browser $topsrcdir/testing/marionette/client/marionette/tests/unit-tests.ini

...although I suspect that won't work well either.
Status: REOPENED → ASSIGNED
Downgrading, Marionette isn't used in desktop tests for much of anything. Other test suites (mochitest-plain, mochitest-browser) take precedence over this.
Priority: P1 → P3
Status: ASSIGNED → NEW
Flags: firefox-backlog+
Flags: firefox-backlog+
Flags: firefox-backlog+
Move old M2 P3 bugs to M4 (because tracking-e10s=m5+ flag doesn't exist yet).
This gets marionette running with e10s turned on. Some of the tests fail, so this patch has a few things just to keep the harness from hanging when those do. I think the failures are related to known e10s issues: window.open, cookies, and text selection, but others would know more about this.

There's a lot of flakiness on the emulator tests on try, but I'm not sure whether my changes would have caused that. I'm a newcomer to this part of marionette so might be overlooking something obvious.

Try with e10s on:
https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=5f2eb7ec1de1

Try with e10s off:
https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=6ec7f06f5b1e
Attachment #8500751 - Flags: feedback?(jgriffin)
Assignee: jmathies → cmanchester
Status: NEW → ASSIGNED
Comment on attachment 8500751 [details] [diff] [review]
marionette support for e10s in desktop

Review of attachment 8500751 [details] [diff] [review]:
-----------------------------------------------------------------

This looks great!  As far as the emulator tests go, they're known to be problematic.  test_getthreads.js fails very frequently.  I've retriggered a bunch more so we can see if the intermittents seem to be higher with https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=6ec7f06f5b1e or not; if they do, we can take a closer look.

::: testing/marionette/marionette-server.js
@@ +2847,5 @@
>      else if (newTab) {
>        this.tab = this.addTab(this.startPage);
>        //if we have a new tab, make it the selected tab
>        this.browser.selectedTab = this.tab;
> +      callback(win, newTab);

Do we not have the ability to detect when a new tab has loaded?  If not, we'll have a race condition without this listener, so we may want to remove the creation of a new tab (and just use the active tab instead), which is something David and I have already discussed.
Attachment #8500751 - Flags: feedback?(jgriffin) → feedback+
(In reply to Jonathan Griffin (:jgriffin) from comment #15)
> Comment on attachment 8500751 [details] [diff] [review]
> marionette support for e10s in desktop
> 
> Review of attachment 8500751 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> This looks great!  As far as the emulator tests go, they're known to be
> problematic.  test_getthreads.js fails very frequently.  I've retriggered a
> bunch more so we can see if the intermittents seem to be higher with
> https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=6ec7f06f5b1e or
> not; if they do, we can take a closer look.
> 
> ::: testing/marionette/marionette-server.js
> @@ +2847,5 @@
> >      else if (newTab) {
> >        this.tab = this.addTab(this.startPage);
> >        //if we have a new tab, make it the selected tab
> >        this.browser.selectedTab = this.tab;
> > +      callback(win, newTab);
> 
> Do we not have the ability to detect when a new tab has loaded?  If not,
> we'll have a race condition without this listener, so we may want to remove
> the creation of a new tab (and just use the active tab instead), which is
> something David and I have already discussed.

I think I tracked this down to a case of bug 1012871 ("Events don't propagate automatically from child process to parent"). The answer is to move the event listener to the frame script, but trying that it doesn't seem to work reliably for windows opened by client scripts.
Maybe we should just remove the tab opening code from newSession.  David, what do you think?
Flags: needinfo?(dburns)
+100000000000 for removing tab opening code when calling newSession. I would love for it to use the currently opened tab
Flags: needinfo?(dburns)
Quick update: the problem with moving the listener to the frame script and registering based on that is that loading the frame script doesn't always correspond to something that fires a load event, so registerSelf isn't called in these cases and we lose track of the current frame (this is the case in test_window_switching.py). I have a workaround that makes the behaviour depend on whether we're starting a session on try here:

https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=955e256c7b50

I'm going to look at removing the tab opening code next, but this may suffice if that ends up being complicated.
Ok, getting rid of the tab opening code went fine. Here's that patch on try:

https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=edda96f19cce
Reusing the current open tab failed TestSwitchRemoteFrame.test_we_can_switch_to_a_remote_frame_by_index on osx. The check in marionette-server.js [1] is failing to detect when we're in a remote frame. Running through try with lots of logging inserted revealed that the href of the listenerWindow at this point is not a mismatch (without the tab reuse patch it persistently refers to "chrome://browser/content/hiddenWindow.xul"). We must be leaking some small amount of state between tests now that we reuse the tab. I thought this might just be a matter of navigating to "about:blank" between tests, and I guess we should do that, but it only moves this failure to TestSwitchRemoteFrame.test_remote_frame_revisit, where "about:blank" unexpectedly matches.

[1] http://hg.mozilla.org/mozilla-central/annotate/71edd80236b2/testing/marionette/marionette-server.js#l2639
This is just the new tab removal part, and takes a different approach for determining whether Marionette:register is coming from a remote frame. The message manager match is enforced by a prior receipt of Marionette:switchToFrame in the server. The danger is that this will match more than it did before, which could be a show-stopper, but I think the cases we care about would be matched by a switch up to the global message manager. If this seems risky I think we could add something to explicilty track the chain of commands that leads to registering a remote frame, but this looks tricky because of how registerSelf is called.

This looks pretty ok on try (the failures in Gij, Gip, and Mnw are consistent with m-c, those are hidden right now I guess):
https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=5ec0e5ccf8a4
Attachment #8505616 - Flags: feedback?(jgriffin)
(In reply to Chris Manchester [:chmanchester] from comment #22)
> Created attachment 8505616 [details] [diff] [review]
> Re-use the current open tab when starting a new session in marionette
> 
> This is just the new tab removal part, and takes a different approach for
> determining whether Marionette:register is coming from a remote frame. The
> message manager match is enforced by a prior receipt of
> Marionette:switchToFrame in the server. The danger is that this will match
> more than it did before, which could be a show-stopper, but I think the
> cases we care about would be matched by a switch up to the global message
> manager. If this seems risky I think we could add something to explicilty
> track the chain of commands that leads to registering a remote frame, but
> this looks tricky because of how registerSelf is called.

I was looking at this again, and I think in the case we load the frame script in marionette-frame-manager, currentRemoteFrame isn't set early enough to guarantee this all happens in the right order. That's fixable, but I'm still interested in feedback on the approach. 

> 
> This looks pretty ok on try (the failures in Gij, Gip, and Mnw are
> consistent with m-c, those are hidden right now I guess):
> https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=5ec0e5ccf8a4
Comment on attachment 8505616 [details] [diff] [review]
Re-use the current open tab when starting a new session in marionette

Review of attachment 8505616 [details] [diff] [review]:
-----------------------------------------------------------------

This looks great.

::: testing/marionette/marionette-server.js
@@ +2634,5 @@
>  
>          //go in here if we're already in a remote frame.
> +        if (this.curBrowser.frameManager.currentRemoteFrame !== null &&
> +            (!listenerWindow ||
> +             this.messageManager == this.curBrowser.frameManager.currentRemoteFrame.messageManager.get())) {

+1 ... this is a much better way to do this!
Attachment #8505616 - Flags: feedback?(jgriffin) → feedback+
Found fails test_set_window_size.py on linux on try (https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=9789df1ee9f6), investigating now.
(In reply to Chris Manchester [:chmanchester] from comment #25)
> Found fails test_set_window_size.py on linux on try
> (https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=9789df1ee9f6),
> investigating now.

It looks like resize events are bleeding between tests now that the tab is reused. At least I doubt I've broken the maximize_window functionality. I'm going to see if re-working the test makes a difference.
> It looks like resize events are bleeding between tests now that the tab is
> reused. At least I doubt I've broken the maximize_window functionality. I'm
> going to see if re-working the test makes a difference.

That wasn't it. screen.availHeight and screen.availWidth appear to be overreporting screen space slightly on Ubuntu, meaning that when we request a resize to these dimensions in the maximize code, we actually get two resizes: one to these slightly-too-large dimensions, and a second to correct down to the actually available screen space. Before my changes, the assertion that now fails would run between the two re-size events, but now it runs after both, so we find that the actual width/height are not the same as availWidth/availHeight (they're the smaller dimensions reflecting the true available screen space), and the test fails. Inserting a delay before the assertion runs is enough to consistently fail the test in this same manner on try (https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=4ba2669071e6).

I don't know the best way to deal with this, but I think we could either make the test more clever to detect that exactly this is happening, or perhaps skip it on linux.
> I don't know the best way to deal with this, but I think we could either make the test more clever to 
> detect that exactly this is happening, or perhaps skip it on linux.

I don't have a strong preference.  David, do you?
Flags: needinfo?(dburns)
I am happy to skip on Linux for now and then put it the queue for us to have a look at in the future
Flags: needinfo?(dburns)
See Also: → 1085717
Attachment #8500751 - Attachment is obsolete: true
Attachment #8505616 - Attachment is obsolete: true
Comment on attachment 8508742 [details] [diff] [review]
Modification to marionette to support running tests with e10s enabled.

Review of attachment 8508742 [details] [diff] [review]:
-----------------------------------------------------------------

::: testing/marionette/client/marionette/geckoinstance.py
@@ +25,5 @@
> +                      "browser.displayedE10SPrompt": 5,
> +                      "browser.displayedE10SPrompt.1": 5,
> +                      "browser.displayedE10SPrompt.2": 5,
> +                      "browser.displayedE10SPrompt.3": 5,
> +                      "browser.displayedE10SPrompt.4": 5}

What do these prefs do, btw?
Attachment #8508742 - Flags: review?(jgriffin) → review+
Attachment #8508743 - Flags: review?(jgriffin) → review+
(In reply to Jonathan Griffin (:jgriffin) from comment #32)
> Comment on attachment 8508742 [details] [diff] [review]
> Modification to marionette to support running tests with e10s enabled.
> 
> Review of attachment 8508742 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: testing/marionette/client/marionette/geckoinstance.py
> @@ +25,5 @@
> > +                      "browser.displayedE10SPrompt": 5,
> > +                      "browser.displayedE10SPrompt.1": 5,
> > +                      "browser.displayedE10SPrompt.2": 5,
> > +                      "browser.displayedE10SPrompt.3": 5,
> > +                      "browser.displayedE10SPrompt.4": 5}
> 
> What do these prefs do, btw?

These turn off the e10s prompt, which gets in the way now that we're reusing the first tab to open. https://bugzilla.mozilla.org/show_bug.cgi?id=1081996#c9 has the details on the prefs themselves.
I pushed this to try again when inbound was closed to see about anything I might have missed... unfortunately there's a timeout in W(2) on linux 64 that looks unfamiliar: https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=9223b59783c2

I ran this through try against wpt earlier as well and didn't see cause for concern. Re-triggering that a few time didn't re-create the issue: https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=98c4666c0309
(In reply to Chris Manchester [:chmanchester] from comment #34)
> I pushed this to try again when inbound was closed to see about anything I
> might have missed... unfortunately there's a timeout in W(2) on linux 64
> that looks unfamiliar:
> https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=9223b59783c2
> 
> I ran this through try against wpt earlier as well and didn't see cause for
> concern. Re-triggering that a few time didn't re-create the issue:
> https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=98c4666c0309

This was fixed by https://treeherder.mozilla.org/ui/#/jobs?repo=mozilla-inbound&revision=223e2f4b0d47 

Sorry about the noise.
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/cdaf3cf65e31
https://hg.mozilla.org/mozilla-central/rev/8947682fbed3
Status: ASSIGNED → RESOLVED
Closed: 5 years ago5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla36
Iteration: --- → 36.1
Whiteboard: [qa?]
Blocks: 1092223
Duplicate of this bug: 941770
You need to log in before you can comment on or make changes to this bug.