Closed Bug 1256439 Opened 8 years ago Closed 8 years ago

Marionette slowdown (runs twice that long) since implicit waits have been landed

Categories

(Remote Protocol :: Marionette, defect)

defect
Not set
major

Tracking

(firefox46 unaffected, firefox47 fixed, firefox48 fixed)

RESOLVED FIXED
mozilla48
Tracking Status
firefox46 --- unaffected
firefox47 --- fixed
firefox48 --- fixed

People

(Reporter: whimboo, Assigned: whimboo)

References

Details

(Keywords: perf, regression)

Attachments

(1 file)

We had bug 1244837 which fixed a massive slowdown for Firefox 47.0a1 and 46.0a2. But in the last days I can still see that Marionette runs about twice that long for me on my local machine and also all taskcluster builds. We are really wasting lots of machine power here. I wish we would have a perf test for comparison that we no longer cause those regressions.

I will have a look at mozregression now.
Here a comparison for the same kind of test:

SUMMARY (46.0a2)
-------
 0:18.47 LOG: MainThread INFO passed: 10
 0:18.47 LOG: MainThread INFO failed: 0
 0:18.47 LOG: MainThread INFO todo: 0
 0:18.52 SUITE_END: MainThread 

SUMMARY (47.0a2)
-------
 0:55.06 LOG: MainThread INFO passed: 10
 0:55.06 LOG: MainThread INFO failed: 0
 0:55.06 LOG: MainThread INFO todo: 0
 0:55.10 SUITE_END: MainThread
Regression started with this merge from inbound to mozilla-central:

https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=ad029c4e30b017efb2a255294b4528071d951430&tochange=c1e0d1890cfee9d86c8d566b0490053f21e0afc6

Testing changesets (aa5e5a12ce63 and 52b85b8b8c14) around the patch landing date of bug 1250102 gave me the real regression range.
Severity: normal → major
Flags: needinfo?(dburns)
Flags: needinfo?(ato)
Maybe the reason is the following default interval?

> function implicitlyWaitFor(func, timeout, interval = 100) {

This could definitely slow down everything.
Ok, that's definitely the problem here:
https://dxr.mozilla.org/mozilla-central/source/testing/marionette/element.js#668

Shortening the interval to 10ms makes it faster so all in all it takes about 20s instead of 55s in my local build.
Blocks: 1250102
Using 0s for the interval is even worse the latest Firefox beta: 16s vs 9s
Hm. Just a side-note for this method. Do we leak a timer instance given that we only cancel it when resolving the promise?
The underlying issue here is the initial wait time of 100ms before the function gets evaluated the first time. I do not see why this is necessary. Instead we should do the first evaluation immediately and only wait if the callback fails. This will restore the former performance of Marionette and leaves all tests as passing.
I will take it.
Assignee: nobody → hskupin
Status: NEW → ASSIGNED
Flags: needinfo?(dburns)
Flags: needinfo?(ato)
Here a measure of our firefox-ui-functional tests on my local machine with and without the upcoming fix.

Without fix:
 5:32.17 LOG: MainThread INFO passed: 92
 5:32.17 LOG: MainThread INFO failed: 0
 5:32.17 LOG: MainThread INFO todo: 1
 5:32.21 SUITE_END: MainThread 

With fix:
 2:44.97 LOG: MainThread INFO passed: 92
 2:44.97 LOG: MainThread INFO failed: 0
 2:44.97 LOG: MainThread INFO todo: 1
 2:44.99 SUITE_END: MainThread 

Those are 332s vs. 165s. So we can indeed speed-up Marionette by a factor of 2x.

I'm wrestling through exiting Marionette unit tests now given that a lot of them fail on my machine.
As of now the implicit wait method waits the default interval before doing the first check against the passed in callback. This is not necessary and it can be changed for an immediate check. If that one fails we can still enter the timer driven loop and continue to wait.

Review commit: https://reviewboard.mozilla.org/r/40093/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/40093/
Attachment #8730697 - Flags: review?(dburns)
Summary: Marionette slowdown (runs twice that long) since Firefox 47.0 → Marionette slowdown (runs twice that long) since implicit waits have been landed
https://reviewboard.mozilla.org/r/40093/#review36585

This change looks good to me, but can you please rename the commit message to be more descriptive?  Instead of explaining what it should not do, try to explain the change.

E.g. “Run implicit wait function once before polling” or something similar, and then explain what it’s doing wrong in the commit message.
Comment on attachment 8730697 [details]
MozReview Request: Bug 1256439 - Run implicit wait callback once before start polling. r?automatedtester

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/40093/diff/1-2/
Attachment #8730697 - Attachment description: MozReview Request: Bug 1256439 - Implicit wait should not wait an interval before doing the first check. r?automatedtester → MozReview Request: Bug 1256439 - Run implicit wait callback once before start polling. r?automatedtester
Comment on attachment 8730697 [details]
MozReview Request: Bug 1256439 - Run implicit wait callback once before start polling. r?automatedtester

https://reviewboard.mozilla.org/r/40093/#review36743
Attachment #8730697 - Flags: review?(dburns) → review+
https://hg.mozilla.org/mozilla-central/rev/7314496982c0
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla48
Test-only change which fixes a regression in Marionette which is also necessary on current aurora.
Whiteboard: [checkin-needed-aurora]
This has already been uplifted to Aurora https://hg.mozilla.org/releases/mozilla-aurora/rev/0a61b7f086d7
Whiteboard: [checkin-needed-aurora]
Product: Testing → Remote Protocol
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: