Closed Bug 1335736 Opened 7 years ago Closed 7 years ago

Users stuck on a couple of old beta versions (48.0b99, 49.0b99 & 29.0b8)

Categories

(Release Engineering :: Release Requests, defect, P1)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: philipp, Unassigned)

after we found out that some users on the beta channel got stranded on 44.0b1 in bug 1277925, there are also a couple of other particular older versions which diverge from a normal/expected pattern in crash stats data.

you can look at the distribution of crash reports from beta versions in the past 24 hours for example: http://bit.ly/2kVoMAl

the versions sticking out are primarily 48.0b99 + 49.0b99 (16 and 8 times the volume we have from 50.0b99), 29.0b8, 48.0b1 and 34.0b9...
See Also: → 1335732
We would expect 48 to be high up the list because it's a watershed; in fact it might be seeing an increase right now from clients suddenly able to update past 44.

The other versions listed here I don't have immediate explanations for; I can't reproduce a failure to update from those versions, so it's certainly not the same issue as was fixed in bug 1334220.
See Also: → 1337148
Priority: -- → P1
:phillip, is it still the case?
Flags: needinfo?(madperson)
yes, adjusting the query from comment #0 to the past 24 hours still shows a high number of reports coming from those outdated versions:

1 	53.0b5 	11413 	34.48 %
2 	53.0b4 	2912 	8.80 %
3 	44.0b1 	2270 	6.86 %
4 	48.0rc 	1046 	3.16 %
5 	49.0rc 	752 	2.27 %
6 	53.0b2 	664 	2.01 %
7 	53.0b1 	474 	1.43 %
8 	29.0b8 	468 	1.41 %
9 	48.0b1 	400 	1.21 %
10 	52.0rc 	357 	1.08 %
11 	34.0b9 	289 	0.87 %
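
For a quick tally of how much of this list comes from outdated builds, the rows above can be parsed and summed. This is only a rough sketch: the cutoff `CURRENT_MAJOR = 53` is an assumption (53 looks like the active beta cycle in the table), and the helper names are made up for illustration.

```python
import re

# Rows copied from the list above: rank, version, crash reports, share.
ROWS = """\
1 53.0b5 11413 34.48 %
2 53.0b4 2912 8.80 %
3 44.0b1 2270 6.86 %
4 48.0rc 1046 3.16 %
5 49.0rc 752 2.27 %
6 53.0b2 664 2.01 %
7 53.0b1 474 1.43 %
8 29.0b8 468 1.41 %
9 48.0b1 400 1.21 %
10 52.0rc 357 1.08 %
11 34.0b9 289 0.87 %
"""

CURRENT_MAJOR = 53  # assumption: anything below this counts as outdated


def parse_rows(text):
    """Return (version, report_count) tuples from whitespace-separated rows."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        rows.append((parts[1], int(parts[2])))
    return rows


def major(version):
    """Leading integer of a version string, e.g. '48.0rc' gives 48."""
    return int(re.match(r"\d+", version).group())


rows = parse_rows(ROWS)
outdated = [(v, n) for v, n in rows if major(v) < CURRENT_MAJOR]
outdated_total = sum(n for _, n in outdated)

print("outdated versions:", [v for v, _ in outdated])
print("reports from outdated versions:", outdated_total)
```

The counts only cover the eleven rows listed here, so the total is a lower bound on reports from old versions, not a share of all beta crash reports.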
Flags: needinfo?(madperson)
Following up on the latest channel meeting: are these versions high enough for release engineering to review the update rules on beta?
> 44.0b1 	2270 	6.86 %
> 48.0rc 	1046 	3.16 %
> 49.0rc 	752 	2.27 %

Thanks :)
Flags: needinfo?(lhenry)
the high user numbers on 29.0b8, 44.0b1, 48.0rc are probably explained by watershed rules.
*29.0b8 was the watershed in bug 1009893 - do we even care about this any longer and could watersheds be lifted if they aren't needed anymore?
*44.0b1 was the sha-1 signing watershed
*48.0rc was the sse2 watershed, but most crash reports from this version are from users with cpus that fully support sse2

so 49.0rc4, 48.0b1 and 34.0b9 still don't have a clear explanation...
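
To illustrate why a watershed piles users up on one version, here is a toy model of priority-based rule selection. It is a simplified sketch, not Balrog's actual matching code: the `Rule` fields, the priorities and the mapping names are all hypothetical, and real rules match on many more attributes.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Rule:
    priority: int                       # higher priority wins among matching rules
    mapping: str                        # release the client is pointed at
    below_major: Optional[int] = None   # if set, only matches clients older than this


# Hypothetical rule set: a catch-all plus one watershed at 29.
LATEST = Rule(priority=90, mapping="latest-beta")
WATERSHED_29 = Rule(priority=100, mapping="29.0b8-watershed", below_major=29)


def pick_update(client_major: int, rules: List[Rule]) -> str:
    """Return the mapping of the highest-priority rule that matches the client."""
    matching = [
        r for r in rules
        if r.below_major is None or client_major < r.below_major
    ]
    return max(matching, key=lambda r: r.priority).mapping


rules = [LATEST, WATERSHED_29]
print(pick_update(28, rules))      # old client is forced through the watershed first
print(pick_update(29, rules))      # client already on the watershed moves on to latest
print(pick_update(28, [LATEST]))   # with the watershed lifted, old clients go straight to latest
```

In this model every client older than 29 has to land on the watershed build before it can go further, so infrequent users show up there in crash stats until their next update check.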
I am not sure here. From my last conversation about update orphans with rstrong, this may just be the gradual stream of updates from infrequent users. It would likely be rstrong and mhowell who would end up investigating this, and they have a lot of work already. I don't have much time to look into it until after the 53 release.
Looking back at this for a 24 hour period last week, looks like 1179 crashes reported from 44.0b1, so maybe those users are gradually updating. I don't think this is unusual for a watershed beta. 

I agree with philipp's point about 29.0b8. We shouldn't need that watershed anymore (for an onboarding tour from 29). Johan, can we eliminate that rule? I'm not sure that will help anyone update but at least it removes a pointless update rule from balrog.
Flags: needinfo?(lhenry) → needinfo?(jlorenzo)
Redirecting to the current releaseduty folks.
Flags: needinfo?(rgarbas)
Flags: needinfo?(mtabara)
Flags: needinfo?(jlorenzo)
Removing myself as garbas++ is taking care of this!
Flags: needinfo?(mtabara)
:lizzard: I scheduled a deletion (Scheduled Change ID: 165) for 29.0b8 rule which is due on Wed, 2017-07-26.

Is there a policy in place (which I'm not aware of) that kept you from scheduling this yourself? From what I can tell, you should have enough permissions to do it.
Flags: needinfo?(rgarbas)
Oh I don't know. I've never deleted a rule before. I will take a look and maybe try it next time. It's more likely I would ask for advice/confirmation since I know the balrog rules can be tricky.
Flags: needinfo?(lhenry)
Since QE should also sign off, could we test the rule deletion? What is the staging server for balrog? 

Andrei, once we have the rule on staging maybe you could test that pre-29 versions update correctly without anything odd happening.  Thanks!
Flags: needinfo?(rgarbas)
Flags: needinfo?(andrei.vaida)
(In reply to Liz Henry (:lizzard) (needinfo? me) from comment #12)
> Oh I don't know. I've never deleted a rule before. I will take a look and
> maybe try it next time. It's more likely I would ask for advice/confirmation
> since I know the balrog rules can be tricky.

No worries, makes sense. 

(In reply to Liz Henry (:lizzard) (needinfo? me) from comment #13)
> Since QE should also sign off, could we test the rule deletion? What is the
> staging server for balrog? 
> 
> Andrei, once we have the rule on staging maybe you could test that pre-29
> versions update correctly without anything odd happening.  Thanks!

Yes ... and no. 
Yes - we have a staging instance that lies within https://balrog-admin.stage.mozaws.net/rules
No - it doesn't mirror the production rules, sadly. However, there's a workaround: since there are just a handful of them, we can copy-paste the rules from production and schedule the same rule deletion on staging as well, then QE can start testing.

I'll add a NI for me as well to discuss this with :garbas on Monday.

Steps on Monday:
0. copy-paste Firefox/beta rules from production to staging as well
1. schedule the deletion quickly (will ask Sylvestre/jcristau on European timezone to signoff from RelMan)
2. handoff to QE for testing
3. ideally, by the time lizzard comes online on PST, we'll have an answer here and we can let the production change from Tuesday take effect
Flags: needinfo?(mtabara)
Got side-tracked today and didn't have time to proceed with this. Apologies for this.
On the other hand, did some investigation and apparently there's a better way to test this.

Rule ID 52, the one we want to delete, is actually set to "Firefox : beta*", which means it also applies to "beta-localtest" and "beta-cdntest". All the other rules under the "Firefox: beta" filter have their channel set to "beta*" as well.

So what we can do is this:

0. Schedule a change for rule 52 to change channel from "beta*" to "beta"
Effect: it still stays valid for "beta" channel but is no longer valid for "beta-localtest", nor "beta-cdntest"

1. QE does the testing on "beta-localtest" or "beta-cdntest" to test that pre-29 versions update correctly without anything odd happening

2. Once they signoff that everything is okay, we can finally schedule the change to delete the rule.
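
The effect of step 0 can be sketched with a plain glob match. This uses Python's `fnmatch` as a stand-in for the channel matching and is not Balrog's own code; `rule_applies` is a made-up helper name.

```python
from fnmatch import fnmatchcase

CHANNELS = ["beta", "beta-localtest", "beta-cdntest"]


def rule_applies(rule_channel: str, request_channel: str) -> bool:
    """True if the rule's channel pattern covers the channel the client reports."""
    return fnmatchcase(request_channel, rule_channel)


# Before step 0: the rule channel is the glob "beta*".
before = [c for c in CHANNELS if rule_applies("beta*", c)]

# After step 0: the rule channel is the literal "beta".
after = [c for c in CHANNELS if rule_applies("beta", c)]

print("beta* covers:", before)
print("beta covers:", after)
```

With the pattern narrowed to "beta", the test channels fall outside rule 52 and behave as if the rule were already deleted, while regular beta users are unaffected.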
Flags: needinfo?(mtabara)
Scheduled the change :mtabara: mentioned.
Flags: needinfo?(rgarbas)
Status update:

(In reply to Mihai Tabara [:mtabara]⌚️GMT from comment #15)
> 0. Schedule a change for rule 52 to change channel from "beta*" to "beta"
> Effect: it still stays valid for "beta" channel but is no longer valid for
> "beta-localtest", nor "beta-cdntest"

Done.
 
> 1. QE does the testing on "beta-localtest" or "beta-cdntest" to test that
> pre-29 versions update correctly without anything odd happening

Currently ongoing.
(In reply to Mihai Tabara [:mtabara]⌚️GMT from comment #17)
  
> > 1. QE does the testing on "beta-localtest" or "beta-cdntest" to test that
> > pre-29 versions update correctly without anything odd happening
> 
> Currently ongoing.

The update testing was successfully performed on Windows 10 64bit, Windows 7 64bit, macOS 10.12 and Ubuntu 16.04 x64, using various pre-29 version and locale combinations. Here are the results: https://public.etherpad-mozilla.org/p/1335736.
Please let me know if you have any feedback or questions about this report.
(In reply to Iulia Cristescu, QA [:IuliaC] from comment #18)
> (In reply to Mihai Tabara [:mtabara]⌚️GMT from comment #17)
>   
> > > 1. QE does the testing on "beta-localtest" or "beta-cdntest" to test that
> > > pre-29 versions update correctly without anything odd happening
> > 
> > Currently ongoing.
> 
> The update testing was successfully performed on Windows 10 64bit, Windows 7
> 64bit, macOS 10.12 and 
> Ubuntu 16.04 x64, using various pre-29 versions and locales combinations.
> Here are the results https://public.etherpad-mozilla.org/p/1335736. 
> Please let me know if you have any feedback or questions about this report.

Thanks for all the work on this, :IuliaC!
I'll go ahead and discuss these with lizzard and see if we can move forward with deleting it from 'beta' channel too.
Thanks again!
@lizzard: at first glance, those testing results look good to me, judging by the existing Balrog rules. You wanna double check that too or you wanna go over them together at some point?
Flags: needinfo?(lhenry)
Yes, the results look fine to me, let's go ahead deleting it from beta!
Flags: needinfo?(lhenry)
Flags: needinfo?(andrei.vaida)
I scheduled a deletion for RuleID 52 (Firefox-29.0b8-build1-schema2 watershed) for today (in ~10 hours) QE and RelMan.
(In reply to Liz Henry (:lizzard) (needinfo? me) from comment #21)
> Yes, the results look fine to me, let's go ahead deleting it from beta!

Liz, please bless this with a sign-off to kill this with fire! :}
https://aus4-admin.mozilla.org/rules/scheduled_changes
Flags: needinfo?(lhenry)
Done.
Flags: needinfo?(lhenry)
(In reply to Liz Henry (:lizzard) (needinfo? me) from comment #24)
> Done.

ACK that rule is dead. Thanks!
Should this be marked as resolved now?
Flags: needinfo?(mtabara)
(In reply to Rok Garbas [:garbas] from comment #26)
> Should this be marked as resolved now?

Good question. 302 to lizzard.
Flags: needinfo?(mtabara) → needinfo?(lhenry)
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Flags: needinfo?(lhenry)