Remove Screenshots rollout pref

RESOLVED FIXED in Firefox 56

Status

()

defect
RESOLVED FIXED
2 years ago
2 years ago

People

(Reporter: _6a68, Assigned: _6a68)

Tracking

unspecified
Firefox 57
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox56 fixed, firefox57 fixed)

Details

Attachments

(1 attachment)

We planned to use the `extensions.screenshots.system-disabled` pref to allow SHIELD to gradually roll out Screenshots to our Beta and Release populations.

That's happening in 55, so we can now remove the `system-disabled` pref from Nightly.
Assignee: nobody → jhirsch
Summary: Remove Screenshots roll-out pref → Remove Screenshots rollout pref
Comment on attachment 8892518 [details]
Bug 1386333 - Remove Screenshots rollout pref;

https://reviewboard.mozilla.org/r/163508/#review168882
Attachment #8892518 - Flags: review?(standard8) → review+
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/ba769a1d2f82
Remove Screenshots rollout pref; r=standard8
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/ba769a1d2f82
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Comment on attachment 8892518 [details]
Bug 1386333 - Remove Screenshots rollout pref;

Approval Request Comment

[Feature/Bug causing the regression]:
It's not a regression.  We'd like to drop this pref in beta to get us to 100% users w/ screenshots enabled

[User impact if declined]:
Beta users don't get the screenshots button.

[Is this code covered by automated tests?]:
No

[Has the fix been verified in Nightly?]:
Yes, screenshots is on in nightly.

[Needs manual test from QE? If yes, steps to reproduce]: 
No

[List of other uplifts needed for the feature/fix]:
None

[Is the change risky?]:
No

[Why is the change risky/not risky?]:
It's just removing a pref.  Biggest risk is the screenshots service failing under too much traffic, but we were preffed on to 50% of beta already (via shield) and were fine and any problems would be fixable on the server.

[String changes made/needed]:
None
Attachment #8892518 - Flags: approval-mozilla-beta?
Comment on attachment 8892518 [details]
Bug 1386333 - Remove Screenshots rollout pref;

This moves control of the rollout to the Shield infrastructure. Right now that means 100% of beta users will get Screenshots in 56 beta 2.
Attachment #8892518 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
I had to back this out from Beta for making Win7 ts_paint nearly perma-crash. I confirmed this patch as the culprit with a series of Try pushes.

https://hg.mozilla.org/releases/mozilla-beta/rev/57026202f82d

The scary thing to me is that the stacks were all over the place. Trashed memory somehow? I worry that 57 will also be affected when it goes to Beta. Not sure if there's been any Try simulations yet during this cycle.
https://treeherder.mozilla.org/logviewer.html#?job_id=122287926&repo=mozilla-beta
https://treeherder.mozilla.org/logviewer.html#?job_id=122288473&repo=mozilla-beta
https://treeherder.mozilla.org/logviewer.html#?job_id=122288491&repo=mozilla-beta
https://treeherder.mozilla.org/logviewer.html#?job_id=122288496&repo=mozilla-beta
https://treeherder.mozilla.org/logviewer.html#?job_id=122288500&repo=mozilla-beta
https://treeherder.mozilla.org/logviewer.html#?job_id=122288504&repo=mozilla-beta
Flags: needinfo?(jhirsch)
Target Milestone: --- → mozilla57
RyanVM, was this behavior observed when 56 was in Nightly?
I know the pref was disabled by default but I don't recall seeing this crash on our talos builds where we flipped the pref.
excuse me, enabled by default
Flags: needinfo?(ryanvm)
In IRC discussion with Jared, he's pointed out a few other crashes that look maybe related to this, but they've been difficult to narrow down. Beyond that, not as far as I know, but I also don't regularly watch Treeherder regularly, so I'm not a great person to answer questions about CI runs when 56 was on Nightly.
Flags: needinfo?(ryanvm)
These are shutdown crashes where we kill the content process because because it's still starting up when the parent process is shutting down. The shutdown blockers added in bug 1357486 were meant to prevent this, but it looks like the 1s timeout added in bug 1380267 is not quite enough to make talos happy.
> These are shutdown crashes where we kill the content process because because it's still starting up when the parent process is shutting down. The shutdown blockers added in bug 1357486 were meant to prevent this, but it looks like the 1s timeout added in bug 1380267 is not quite enough to make talos happy.

Interesting. Note that the patch in bug 1386457 was designed to more strictly synchronize startup and shutdown (because of a bug at startup time in 55 RC1: https://github.com/mozilla-services/screenshots/issues/3257).

Is it possible to estimate how often this might affect real world users?
Flags: needinfo?(kmaglione+bmo)
(In reply to Jared Hirsch [:_6a68] [:jhirsch] from comment #14)
> Is it possible to estimate how often this might affect real world users?

Probably not very often. Regardless, it should be easy enough to fix by just increasing the shutdown blocker timeout.
Flags: needinfo?(kmaglione+bmo)
kmag, RyanVM did a build of 57 as 56 with the above patch and it's looking fine in Treeherder. Any differences with the shutdown handling code in 57 that would prevent the crashes in comment 8?

https://treeherder.mozilla.org/#/jobs?repo=try&revision=203bee2dc76f9f34a3cde40264b1b4b0ebbbbe09
Flags: needinfo?(kmaglione+bmo)
(In reply to Jared Hirsch [:_6a68] [:jhirsch] from comment #14)
> > These are shutdown crashes where we kill the content process because because it's still starting up when the parent process is shutting down. The shutdown blockers added in bug 1357486 were meant to prevent this, but it looks like the 1s timeout added in bug 1380267 is not quite enough to make talos happy.
> 
> Interesting. Note that the patch in bug 1386457 was designed to more
> strictly synchronize startup and shutdown (because of a bug at startup time
> in 55 RC1: https://github.com/mozilla-services/screenshots/issues/3257).

Actually, this patch will completely break the shutdown blocker that's meant to prevent this kind of issue, since we register the shutdown blocker when the shutdown method is called, and you don't call the shutdown method until the startup() method has resolved, which in the failure case is after shutdown has already started.
Flags: needinfo?(kmaglione+bmo)
> Actually, this patch will completely break the shutdown blocker that's meant to prevent this kind of issue, since we register the shutdown blocker when the shutdown method is called, and you don't call the shutdown method until the startup() method has resolved, which in the failure case is after shutdown has already started.

Oh, ok. Let me see if I understand correctly:

On one hand, screenshots needs to call `webextension.shutdown()` without waiting for startup to complete. This starts the shutdown blocker timer, which ensures the process is killed before the main Firefox process exits, avoiding a shutdown crash.

On the other hand, if we don't synchronize startup and shutdown, it seems like startup and shutdown could race, which might break the feature. For instance, [1] was caused by two racing startup() calls, but I could imagine a startup/shutdown race if shield were flipping the value the other way (from enabled back to disabled), or if the addon wasn't fully started when updates were applied. Are these edge cases possible?

What's the right fix here? Would it be enough to synchronize the startup calls, as in the fix for [1], but not synchronize the shutdown calls?

[1] https://github.com/mozilla-services/screenshots/issues/3257#issuecomment-319509605
Flags: needinfo?(kmaglione+bmo)
Please note here are the 32 and 64 bit artifact builds from the failing Beta push:

win32 build (win 7 tests run here) - https://queue.taskcluster.net/v1/task/FnA25Eo4TB24ZAv-JCPJ1Q/runs/1/artifacts/public/build/target.zip
win64 build (win 10 tests run here) - https://queue.taskcluster.net/v1/task/E2QxMRIMSQyNwao6PiqsNg/runs/0/artifacts/public/build/target.zip
(In reply to Jared Hirsch [:_6a68] [:jhirsch] from comment #18)
> > Actually, this patch will completely break the shutdown blocker that's meant to prevent this kind of issue, since we register the shutdown blocker when the shutdown method is called, and you don't call the shutdown method until the startup() method has resolved, which in the failure case is after shutdown has already started.
> 
> Oh, ok. Let me see if I understand correctly:
> 
> On one hand, screenshots needs to call `webextension.shutdown()` without
> waiting for startup to complete. This starts the shutdown blocker timer,
> which ensures the process is killed before the main Firefox process exits,
> avoiding a shutdown crash.
> 
> On the other hand, if we don't synchronize startup and shutdown, it seems
> like startup and shutdown could race, which might break the feature. For
> instance, [1] was caused by two racing startup() calls, but I could imagine
> a startup/shutdown race if shield were flipping the value the other way
> (from enabled back to disabled), or if the addon wasn't fully started when
> updates were applied. Are these edge cases possible?

No, in Firefox 56+, the Extension.shutdown method always waits for the last startup to complete before beginning shutdown, and vice versa, so this workaround is entirely unnecessary.
Flags: needinfo?(kmaglione+bmo)
Try pushes for bug 1389786 support this re-landing once Screenshots 10.12.0 is on Beta.
Depends on: 1389786
Target Milestone: mozilla57 → Firefox 57
Product: Cloud Services → Firefox
Flags: needinfo?(jhirsch)
You need to log in before you can comment on or make changes to this bug.