Closed Bug 1669169 Opened 2 months ago Closed 15 days ago

Set marionette.actors.enabled to true by default for all builds

Categories

(Testing :: Marionette, task, P1)

Default
task

Tracking

(Fission Milestone:M7, firefox84 fixed)

RESOLVED FIXED
84 Branch
Fission Milestone M7
Tracking Status
firefox84 --- fixed

People

(Reporter: whimboo, Assigned: whimboo)

References

(Blocks 7 open bugs)

Details

(Whiteboard: [marionette-fission-mvp][simple])

Attachments

(3 files, 1 obsolete file)

Once Fission builds work fine with the marionette.actors.enabled preference set to true, we will have to enable the preference for all the builds, means non-Fission too.

Blocks: 1669172
Blocks: 1669174

Tracking marionette-fission-mvp bugs for Fission Beta milestone (M7)

Fission Milestone: --- → M7
No longer blocks: 1580699

FYI actors for Fission jobs will be enabled tomorrow via bug 1660168. I will check soon what needs to be done to enable it for all builds.

Here a preliminary try push with actors enabled for non-Fission jobs:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=7b5f6e645275bb7cc1cd6c369621c1678ac43d2f

We seem to hit a performance regression for non-Fission builds at least when both the actor and framescript implementations are active.

That means that we should only register the framescript when actors aren't in use. There is bug 1669174.

No longer blocks: 1669174
Depends on: 1669174
Blocks: 1669174
No longer depends on: 1669174

(In reply to Henrik Skupin (:whimboo) [⌚️UTC+2] from comment #0)

Once Fission builds work fine with the marionette.actors.enabled preference set to true, we will have to enable the preference for all the builds, means non-Fission too.

By doing so we still have to keep test coverage for the framescript implementation. As such a new job kind would have to be added for non-Fission builds only. For Fission we should only support actors.

No longer blocks: 1669174
Depends on: 1669174
Depends on: 1673851

(In reply to Henrik Skupin (:whimboo) [⌚️UTC+2] from comment #4)

We seem to hit a performance regression for non-Fission builds at least when both the actor and framescript implementations are active.

That means that we should only register the framescript when actors aren't in use. There is bug 1669174.

Looks like the performance regression was a red herring, and that work is not really necessary to get actors enabled for all builds. Maybe we should at least move the page navigation events over to the actors (bug 1673823) to make navigation a bit more stable.

Depends on: 1673823
No longer depends on: 1669174
Depends on: 1675320
No longer depends on: 1673851

Given that bug 1675320 is actually a low intermittent failure for Fission builds we decided to not let it block enabling actors for all builds. As such I will upload the patches soon that will turn actors on by default, and also adds a new Marionette job for TaskCluster that runs the framescript code path on opt builds only.

No longer depends on: 1675320
Assignee: nobody → hskupin
Status: NEW → ASSIGNED
Priority: P3 → P1

Comment on attachment 9186839 [details]
Bug 1669169 - [marionette] Hardening registration and unregistration of MarionetteEvents actor.

Revision D96525 was moved to bug 1676283. Setting attachment 9186839 [details] to obsolete.

Attachment #9186839 - Attachment is obsolete: true
Depends on: 1676283
Regressions: 1676283
No longer regressions: 1676283
Pushed by hskupin@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/1363594da38e
[marionette] Enable usage of JSWindowActor by default when Fission is disabled. r=marionette-reviewers,jdescottes
https://hg.mozilla.org/integration/autoland/rev/4229097b45c7
[marionette] Add --disable-actors command line argument to force usage of deprecated framescripts instead of actors. r=marionette-reviewers,jdescottes,maja_zf
https://hg.mozilla.org/integration/autoland/rev/bbf90b97018a
[marionette] Add MnFr TaskCluster job to run Marionette tests with actors disabled. r=marionette-reviewers,jdescottes,jmaher,maja_zf
Status: ASSIGNED → RESOLVED
Closed: 15 days ago
Resolution: --- → FIXED
Target Milestone: --- → 84 Branch
Regressions: 1677475

As per bug 1677475 this change attributes to huge performance improvements:

Improvements:

Ratio Suite Test Platform Options Absolute values (old vs new)
22% Images macosx1014-64-shippable-qr 6,808,488.80 -> 5,324,379.10
21% Images windows10-64-shippable 7,281,816.21 -> 5,743,408.09
20% Images linux1804-64-shippable 5,531,400.10 -> 4,416,162.52
20% Images linux1804-64-shippable-qr 5,723,944.14 -> 4,580,189.36
19% Images windows10-64-shippable-qr 7,683,332.16 -> 6,187,840.94
19% Images windows7-32-shippable 6,165,551.84 -> 5,005,238.78
14% Explicit Memory windows10-64-shippable 328,308,125.66 -> 282,289,160.78
14% Explicit Memory windows7-32-shippable 270,060,971.52 -> 232,226,621.77
14% Resident Memory linux1804-64-shippable-qr 1,147,429,973.58 -> 988,011,424.95
14% Explicit Memory linux1804-64-shippable 358,584,795.96 -> 309,388,253.07
14% Explicit Memory windows10-64-shippable-qr 347,818,343.53 -> 300,111,776.46
13% Explicit Memory linux1804-64-shippable-qr 599,887,245.23 -> 520,569,325.12
12% Explicit Memory macosx1014-64-shippable-qr 398,158,096.69 -> 348,672,169.15
12% Resident Memory windows10-64-shippable-qr 547,322,517.50 -> 479,487,633.70
11% Resident Memory windows10-64-shippable 507,236,742.83 -> 449,415,892.09
11% Heap Unclassified windows7-32-shippable 35,549,067.62 -> 31,636,763.74
11% Heap Unclassified windows10-64-shippable-qr 56,739,704.72 -> 50,561,922.16
10% Heap Unclassified linux1804-64-shippable-qr 288,328,260.46 -> 258,227,806.70
10% Resident Memory windows10-64-shippable-qr 538,771,884.24 -> 483,816,172.30
10% Heap Unclassified windows10-64-shippable 44,664,979.54 -> 40,177,199.78
10% JS linux1804-64-shippable-qr 88,494,494.90 -> 79,660,064.82
10% JS linux1804-64-shippable 88,521,389.48 -> 79,873,908.01
10% Resident Memory linux1804-64-shippable 627,056,782.27 -> 565,954,584.79
9% Resident Memory macosx1014-64-shippable-qr 787,653,100.93 -> 714,441,972.31
9% JS windows10-64-shippable-qr 89,035,157.31 -> 80,776,586.75
9% JS windows10-64-shippable 88,619,251.19 -> 80,547,273.17
9% Resident Memory windows7-32-shippable 528,511,962.94 -> 480,395,089.35
9% JS macosx1014-64-shippable-qr 89,875,671.52 -> 82,001,873.13
8% Heap Unclassified linux1804-64-shippable 69,651,112.99 -> 64,218,327.82
7% JS windows7-32-shippable 66,986,737.48 -> 62,417,510.05
7% Images linux1804-64-shippable-qr tp6 8,421,448.20 -> 7,860,156.90
6% Heap Unclassified macosx1014-64-shippable-qr 109,628,575.20 -> 103,274,877.49
6% Images linux1804-64-shippable tp6 8,516,567.48 -> 8,031,870.89
6% Images windows7-32-shippable tp6 7,873,318.17 -> 7,429,401.20
5% Images macosx1014-64-shippable-qr tp6 9,296,358.33 -> 8,810,724.50

Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests. Please follow our guide to handling regression bugs and let us know your plans within 3 business days, or the offending patch(es) will be backed out in accordance with our regression policy.

For more information on performance sheriffing please see our FAQ.

This is awesome, it's always nice to reduce the overhead of our testing infrastructure, and get a more accurate read on real-world performance.

We should definitely make sure to verify that these improvements are real though, when changing a test framework, there's always the possibility of accidentally breaking our measurements, which could also lead to this kind of improvement.

Flags: needinfo?(hskupin)

So after some investigation I would say that we should trust those values. Here is what I did:

  1. I run awsy jobs locally by modifying the load time to quicker get results while it iterates through all the test pages. I did that once with actors and then with the framescript. Both modes behave exactly the same as far as I can see.

  2. I then remembered that we have memory leaks with Marionette like https://github.com/mozilla/geckodriver/issues/181, and that users of geckodriver report that Firefox's memory usage grows a lot while navigating pages. And that is exactly what awsy is doing here.

  3. I run both awsy tests again by creating a gecko profile for each. So here is the profile with actors, and the profile with the framescript. Note that both awsy were quick runs with a bit of over 1 minute. Usually those jobs run for about 1 hour.

When you compare both you can already see that the profile for the framescript case uses about twice that much as for the actor case. If awsy runs way longer and does a lot more navigations, those numbers should become even worse.

Flags: needinfo?(hskupin)

I'm not sure if this bug or bug 1676283 cause the improvement but I presume that you're happy that regression bug 1677475 was partially inaccurate and the items left are all imrpovements. Can't guarantee that some other alert items that will appear won't be regressions, but that's it for the moment.
== Change summary for alert #27636 (as of Fri, 13 Nov 2020 12:48:23 GMT) ==

Improvements:

Ratio Suite Test Platform Options Absolute values (old vs new)
10% Resident Memory windows10-64-shippable-qr 538,771,884.24 -> 483,816,172.30
6% Images linux1804-64-shippable tp6 8,516,567.48 -> 8,031,870.89

For up to date results, see: https://treeherder.mozilla.org/perfherder/alerts?id=27636

You need to log in before you can comment on or make changes to this bug.