Closed Bug 1472308 Opened Last year Closed Last year

Disable parallel OMTP for OSX 10.9 users via Normandy pref experiment

Categories

(Core :: Graphics, enhancement)

Unspecified
macOS
enhancement
Not set

Tracking

()

RESOLVED FIXED
Tracking Status
firefox61 + fixed

People

(Reporter: RyanVM, Assigned: mythmon)

Over in bug 1471892, we believe that parallel OMTP is causing crashes in OSX CoreGraphics code for users on 10.9. We would like to confirm that to be the case via Normandy.

Can we please create a recipe which changes the "layers.omtp.paint-workers" pref to a value of 1 for all OSX 10.9 users running Fx61 on the release channel? Thanks!
Flags: needinfo?(mcooper)
I've created a recipe on stage and prod implementing the behavior described in comment 0. I've enabled the one on stage for testing.

Stage recipe: https://normandy-admin.stage.mozaws.net/recipe/509/

Prod recipe: https://normandy-admin.prod.mozaws.net/recipe/495/

Public API: https://normandy.cdn.mozilla.net/api/v1/recipe/495/

For ease, here is the current filter expression:

    (
      normandy.telemetry.main.environment.system.os.name == "Darwin"
      && (
        normandy.telemetry.main.environment.system.os.version == "13.0.0"
        || normandy.telemetry.main.environment.system.os.version == "13.4.0"
      )
      && normandy.channel == "release"
      && normandy.version >= "61.0"
      && normandy.version < "62.0"
    )
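For anyone who wants to reason about this targeting outside Normandy, the same conditions can be restated as a small Python sketch. This is only an illustration, not Normandy's filter evaluator: `parse_version` and `matches_filter` are hypothetical helpers, and comparing versions as numeric tuples is an assumption made here for clarity. Darwin kernel 13.x corresponds to OS X 10.9.

```python
def parse_version(version):
    """Turn a purely numeric dotted version such as "61.0.1" into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def matches_filter(os_name, os_version, channel, firefox_version):
    """Hypothetical restatement of the filter expression above."""
    return (
        os_name == "Darwin"
        # Exact kernel versions listed in the recipe's filter.
        and os_version in ("13.0.0", "13.4.0")
        and channel == "release"
        # Firefox 61.x only: 61.0 <= version < 62.0.
        and parse_version("61.0") <= parse_version(firefox_version) < parse_version("62.0")
    )
```

With this sketch, a release-channel 61.0.1 client reporting Darwin 13.4.0 matches, while a 62.0 client or a Darwin 14.0.0 client does not.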
Flags: needinfo?(mcooper)
Recipe looks good to me, thanks! Andrei, does your team have an OSX 10.9 system handy to test this out?
Flags: needinfo?(andrei.vaida)
We've been able to test the stage recipe that disables parallel OMTP on Firefox 61.0 build3 (20180621125625). The results look good, but one detail requires clarification: the recipe is applied only after the browser is restarted. Is this expected or not? See more details about the performed testing in this etherpad: https://public.etherpad-mozilla.org/p/request_bug1472308.
Please let me know if additional testing is required or if there are questions about this report.
Flags: needinfo?(andrei.vaida) → needinfo?(mcooper)
Based on the testing details, I believe the restart behavior is as expected. Specifically:

> the recipe execution is made only after restart

This is as intended. Normally Normandy runs on a timer, every 6 hours. By setting app.normandy.dev_mode = true, Normandy runs at every start up. In this test procedure, the restart is the trigger to cause Normandy to run, instead of waiting around for the timer to fire.
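The timing described above can be modeled roughly as follows. This is a simplified sketch, not Normandy's actual scheduling code: `should_run` and its parameters are hypothetical names, and the only facts taken from this comment are the six-hour interval and the run-at-startup behavior of app.normandy.dev_mode.

```python
RUN_INTERVAL_SECONDS = 6 * 60 * 60  # six hours, as described in the comment above


def should_run(dev_mode, at_startup, seconds_since_last_run):
    """Hypothetical model of when recipes get executed.

    With dev_mode set, every startup triggers a run; otherwise a run
    happens only once the six-hour interval has elapsed.
    """
    if dev_mode and at_startup:
        return True
    return seconds_since_last_run >= RUN_INTERVAL_SECONDS
```

In the test procedure, restarting the browser with dev_mode enabled corresponds to the first branch, which is why the recipe appears to take effect only after a restart.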
Flags: needinfo?(mcooper)
Sounds like this is ready to go live then. Thanks for the quick testing!
This is now published.
Status: NEW → RESOLVED
Closed: Last year
Resolution: --- → FIXED
Hi Michael, per the investigation in bug 1471892, we'd like to revise this recipe such that we set the "layers.omtp.enabled" pref to false rather than changing "layers.omtp.paint-workers". What's the best way to go about that now?
Status: RESOLVED → REOPENED
Flags: needinfo?(mcooper)
Resolution: FIXED → ---
Because of the way that we set up this recipe, we'll need to make a new recipe to make this change, and deactivate the old one. That means that users will have a window where either both prefs are changed or neither of them is. If all goes well, it should be only a few seconds, but in some corner cases it could be up to six hours of Firefox running. Is either of these cases a problem? We can bias the edge cases one way or the other if needed.

I'll set up a new recipe for this, and we can figure out the transition.
We could probably just turn off the existing recipe ASAP since it's not helping. That said, the old recipe shouldn't matter once the new one is live (since disabling OMTP will outweigh any settings for controlling it when enabled), so it probably isn't a big deal.
The new recipe is 502.

Delivery Console (needs VPN currently): https://delivery-console.prod.mozaws.net/recipe/502/
Public API: https://normandy.cdn.mozilla.net/api/v1/recipe/502/

Feel free to disable the old recipe at any time, if it isn't helping. If you're happy with the new one, it should also be ready to go.
Flags: needinfo?(mcooper)
OK, I've disabled the old recipe and approved and published the new one. Thanks, Mike!
Status: REOPENED → RESOLVED
Closed: Last year
Resolution: --- → FIXED
I think we need to update this recipe to make it a user pref so it's present on startup.
Flags: needinfo?(mcooper)
Unfortunately, we can't change the branch of the preference experiment either, since users don't update after the initial enrollment. I'll create a new recipe with the change, and add one to my counter of how bad an idea immutable enrollment was.
Part 3 is now live.
the crash rate for bug 1471892 isn't really going away. iulia, could you check if the pref flip from recipe number 3 filters through to your installation?
Flags: needinfo?(iulia.cristescu)
https://crash-stats.mozilla.com/report/index/bca7bf69-4100-43c5-9afb-db7990180716#tab-telemetryenvironment e.g. contains:

 "hotfix-omtp-61-osx-10-9-pt3-bug-1472308":{"branch":"hotfix","type":"normandy-exp"}

but the app notes still say "OMTP+1".
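A quick way to check an annotation like the one above is to parse it and list the Normandy experiments it reports. The sketch below uses only the fragment quoted in this comment; wrapping it in braces to form a JSON object, and the helper name `normandy_experiments`, are assumptions made for illustration.

```python
import json

# Fragment quoted above, wrapped in braces so it parses as a JSON object.
fragment = '"hotfix-omtp-61-osx-10-9-pt3-bug-1472308":{"branch":"hotfix","type":"normandy-exp"}'
experiments = json.loads("{" + fragment + "}")


def normandy_experiments(experiments):
    """Map each Normandy experiment slug to the branch the client is enrolled in."""
    return {
        slug: info["branch"]
        for slug, info in experiments.items()
        if info.get("type") == "normandy-exp"
    }


print(normandy_experiments(experiments))
```

This only confirms enrollment; whether the pref actually took effect has to be checked separately, for example via the app notes.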
(In reply to [:philipp] from comment #16)
> the crash rate for bug 1471892 isn't really going away. iulia, could you
> check if the pref flip from recipe number 3 filters through to your
> installation?

I don't see the recipe being fetched or executed on OSX 10.9 (neither on stage). What is the difference between this one and the previous (the one from comment 1)? Also, the "layers.omtp.enabled" pref is still true.
Flags: needinfo?(mcooper)
Flags: needinfo?(madperson)
Flags: needinfo?(iulia.cristescu)
The second and third recipes were only published on prod, not stage. Only the third recipe is currently active. I verified that it is in the payload being sent to clients. Looking at Telemetry events, I see about 4 million enrollment events, about 15K enrollments, and only a handful of enrollment failures.

The difference between the first recipe and the second is the pref being changed. The first used "layers.omtp.paint-workers", whereas the second used "layers.omtp.enabled". It was necessary to make a second recipe since preference experiments can't be updated in place for already enrolled users.

There are two differences between the second and third recipes. The third targets the user branch, instead of the default branch (to make it compatible with a feature that requires a restart). The other change is to use a range expression for OSX version instead of fixed versions. I'm not sure this is relevant, but the new targeting behavior is a superset of the old behavior.
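To illustrate the superset claim, here is a small sketch comparing fixed-version matching with range matching. The actual range expression used by the third recipe is not shown in this bug, so the bounds below (Darwin 13.0.0 inclusive to 14.0.0 exclusive) and all function names are assumptions.

```python
FIXED_VERSIONS = {"13.0.0", "13.4.0"}  # versions listed in the first recipe's filter


def parse_version(version):
    """Turn a purely numeric dotted version into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def matches_fixed(os_version):
    """Old behavior: match only the explicitly listed kernel versions."""
    return os_version in FIXED_VERSIONS


def matches_range(os_version):
    """Assumed new behavior: match any Darwin 13.x kernel version."""
    return (13, 0, 0) <= parse_version(os_version) < (14, 0, 0)
```

Under these assumed bounds, every version matched by the fixed list is also matched by the range, and the range additionally picks up versions such as 13.2.0 that the fixed list would miss.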

Neither of these changes would affect whether or not the recipe was being fetched, since fetching happens without regard to the targeting or recipe arguments.

Iulia, can you double check that the recipe is not being fetched? Can you share your STR if you still aren't seeing the recipe?
Flags: needinfo?(mcooper)
The notebook I used to calculate enrollment is here: https://dbc-caf9527b-e073.cloud.databricks.com/#notebook/21345/command/21365
Iulia: I have a theory of why you didn't see the recipe. We recently changed the caching behavior of Normandy in a way that shouldn't affect this, but might. I have two requests: First, next time you try this can you run Firefox with MOZ_LOG=sync,timestamp,nsHttp:5,cache2:5 to record some extra network details? And second, if you get a run where you don't see the recipe again, please *save the profile*. It may be related to bug 1354151, which we have had a lot of trouble tracking down.
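For repeated test runs, the logging setup requested above can be scripted. The following is a minimal sketch: the MOZ_LOG value is the one from this comment, while `build_env`, `launch_firefox`, and the paths passed to them are hypothetical and need to be adapted to the test machine.

```python
import os
import subprocess

# Log modules requested in the comment above.
MOZ_LOG_MODULES = "sync,timestamp,nsHttp:5,cache2:5"


def build_env(base_env=None):
    """Return a copy of the environment with MOZ_LOG set."""
    env = dict(os.environ if base_env is None else base_env)
    env["MOZ_LOG"] = MOZ_LOG_MODULES
    return env


def launch_firefox(firefox_path, profile_dir):
    """Start Firefox with the extra logging, using the given profile directory.

    Both arguments are placeholders supplied by the caller.
    """
    return subprocess.Popen([firefox_path, "-profile", profile_dir], env=build_env())
```

Pointing `profile_dir` at a dedicated directory also makes it easier to keep the profile afterwards if the recipe fails to show up.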
Flags: needinfo?(iulia.cristescu)
Michael, sorry for the late answer and thank you for the clarification! 
I double checked the third recipe and today I encountered no issues. 
I used exactly the same station (Mavericks 10.9.5) and the same build (61.0.1 build1 20180704003137), created new profiles, set "app.normandy.dev_mode" to "true" to run recipes immediately on startup and "app.normandy.logging.level" to "0" to enable more logging, then restarted the browser for the changes to take effect. The "Hotfix: Disable parallel OMTP for OSX 10.9 - part 3 [Bug 1472308]" recipe was shown as fetched and executed in the Browser Console, and the "layers.omtp.enabled" pref was correctly switched to "false". At this point, the crash from bug 1471892 was still reproducible. After a restart (which makes sense, since a restart is required for OMTP to actually be disabled), I was no longer able to reproduce the crash using the steps mentioned in https://bugzilla.mozilla.org/show_bug.cgi?id=1471892#c18.
I suppose that yesterday's unsuccessful attempts were caused by bug 1354151, but unfortunately I had already deleted the "bad" profiles, and I have not managed to reproduce that situation since.
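The two test prefs used in the steps above could also be written into a fresh profile's user.js instead of being set by hand. The sketch below only generates the file contents; `format_pref` and `TEST_PREFS` are hypothetical names, and treating "app.normandy.logging.level" as an integer pref is an assumption.

```python
import json


def format_pref(name, value):
    """Render one user.js line for a boolean, integer, or string pref."""
    if isinstance(value, bool):  # check bool first, since bool is a subclass of int
        literal = "true" if value else "false"
    elif isinstance(value, int):
        literal = str(value)
    else:
        literal = json.dumps(value)
    return f'user_pref("{name}", {literal});'


# Prefs mentioned in the testing steps above.
TEST_PREFS = {
    "app.normandy.dev_mode": True,
    "app.normandy.logging.level": 0,
}

user_js = "\n".join(format_pref(name, value) for name, value in TEST_PREFS.items())
print(user_js)
```

Saving the printed lines as user.js in a new profile directory before the first launch should give the same starting state as the manual steps.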
Flags: needinfo?(madperson)
Flags: needinfo?(iulia.cristescu)
Please let me know if additional testing is required or if there is something unclear.
Thanks for the details Iulia. It sounds like your steps are correct, and make sense. I'm still puzzled why you ran into the problem last time you checked, but it seems like the problem has gone away now.
So, to summarize, the recipe *does* appear to be properly taking effect after a restart. And Iulia's STR for reproducing the crash no longer appears to work afterwards. But we're still not seeing a drop in crash rate in bug 1471892?!
This has been running for a while now. Is it still useful and necessary? Should it be ended? It is targeting Release 61 only, so its usefulness is rapidly diminishing.
Go ahead and turn it off.
The Normandy recipe (https://delivery-console.prod.mozaws.net/recipe/509/) has been disabled.