include information on whether system is at risk of bug 1296630 in the update ping

RESOLVED FIXED in Firefox 50

Status

()

defect
P1
normal
RESOLVED FIXED
3 years ago
2 years ago

People

(Reporter: dbaron, Assigned: rstrong)

Tracking

Trunk
mozilla52
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox50+ fixed, firefox51 fixed, firefox52 fixed)

Details

Attachments

(3 attachments, 4 obsolete attachments)

In bug 1296630 comment 87, :philipp suggested that we should have information in the update ping that tells us whether systems are susceptible to bug 1296630 (which is PGO-dependent, but occurs only on some microcode versions for a particular CPU family:  GenuineIntel family 6 model 61 stepping 4, with microcode version <= 0x19 (although we lack data on 0x1a..0x1c)).

This would allow us to hold a build back from this smallish set of users if a particular build that is urgent to ship turns out to be crashy on those machines.
This is similar to what was done for websense in bug 1298404.

Note also that code to detect the microcode version and include it in crash reports was added in 1305120.  (This would also need to be conditional on CPU model, though.)
Is this a reasonable thing to do?  If so, any ideas who could do it?
Flags: needinfo?(felipc)
Flags: needinfo?(benjamin)

Updated

3 years ago
Flags: needinfo?(benjamin) → needinfo?(robert.strong.bugs)
If this were to be done in product and added to the app update url it would be added to
https://dxr.mozilla.org/mozilla-central/source/toolkit/modules/UpdateUtils.jsm

If this were to be done like what was done with bug 1298404 then it would be done in an add-on and change a Firefox pref.

Are you looking to add this permanently?
Flags: needinfo?(robert.strong.bugs) → needinfo?(dbaron)
Note that it appears that many systems aren't getting the system add-on when it is pushed - bug 1307568
Ben, another heads up
Flags: needinfo?(bhearsum)
(In reply to Robert Strong [:rstrong] (use needinfo to contact me) from comment #5)
> Ben, another heads up

Thanks. Looks like we haven't decided exactly what it will look like, but if possible, something similar to what we did for websense would be good. eg: "(websense)" and "(nowebsense)". The parentheses make it easy to do positive and negative matching for either variant.
Flags: needinfo?(bhearsum)
(In reply to David Baron :dbaron: ⌚️UTC-7 from comment #2)
> Is this a reasonable thing to do?  If so, any ideas who could do it?

Yeah, reasonable definitely. Is there any urgent need to ship this to past versions? Writing the code from bug 1305120 in JS would be messy, and it'd need to be in JS if we ship it through a system add-on.


Recently, in bug 1271761, a new %SYSTEM_CAPABILITIES% parameter was added to the update url for exactly this sort of information.

My suggestion would be slightly refactor bug 1305120 to make that information easily exposed to JS, and then add that info in UpdateUtils.jsm :: gSystemCapabilities?
Flags: needinfo?(felipc)
Felipe, here is some js spaghetti code that checks "Update Revision" (Win8 and above) that I think will do the trick. Should be trivial to add "Update Signature" (Win 7).

var wrk = Components.classes["@mozilla.org/windows-registry-key;1"].createInstance(Components.interfaces.nsIWindowsRegKey);
wrk.open(wrk.ROOT_KEY_LOCAL_MACHINE, "HARDWARE\\DESCRIPTION\\System\\CentralProcessor\\0", wrk.ACCESS_READ);
var val = wrk.readBinaryValue("Update Revision");
if (val.length == 8) {
  var hex = [];
  for (var i = 0; i < 8; i++) {
    var c = val.charCodeAt(i).toString(16);
    if (c.length == 1) {
      c = "0" + c;
    }
    hex.unshift(c);
  }
  var hi = parseInt(hex.slice(0, 4).join(''), 16);
}


What do you think?
Flags: needinfo?(felipc)
Yeah, looks doable.

Marco, would you like to work on this? It'd be very similar to bug 1306081. I can guide you through the system add-on process and review the code.
Flags: needinfo?(felipc) → needinfo?(mcastelluccio)

Comment 10

3 years ago
(In reply to :Felipe Gomes (needinfo me!) from comment #7)
> (In reply to David Baron :dbaron: ⌚️UTC-7 from comment #2)
> > Is this a reasonable thing to do?  If so, any ideas who could do it?
> 
> Yeah, reasonable definitely. Is there any urgent need to ship this to past
> versions? Writing the code from bug 1305120 in JS would be messy, and it'd
> need to be in JS if we ship it through a system add-on.
I don't think that shipping to past versions would be necessary. this issue first popped up in some firefox 49 beta builds - 49.0.2 which was just released luckily seems unaffected by the pattern from bug 1296630 judged on early stability data. 
so the next time this might come in handy would be with a potential dot release for firefox 50 or the already planned firefox 50.1.
If we can avoid an add-on I'd prefer it since the ability to get them pushed to systems is being evaluated in bug 1307568.

Ben, since this will most likely be in product and there are other consumers of %OS_VERSION% I'd prefer adding this to %SYSTEM_CAPABILITIES% or possibly to %CUSTOM% which apps can set so it is easier to deploy these types of changes. Are you ok with either of these?
Flags: needinfo?(bhearsum)
(In reply to Robert Strong [:rstrong] (use needinfo to contact me) from comment #11)
> If we can avoid an add-on I'd prefer it since the ability to get them pushed
> to systems is being evaluated in bug 1307568.
> 
> Ben, since this will most likely be in product and there are other consumers
> of %OS_VERSION% I'd prefer adding this to %SYSTEM_CAPABILITIES% or possibly
> to %CUSTOM% which apps can set so it is easier to deploy these types of
> changes. Are you ok with either of these?

Either is workable, but I think %SYSTEM_CAPABILITIES% is probably better. If we add a new field, we need another watershed :(. Perhaps we should pre-emptively add %CUSTOM% for the future, though.
Flags: needinfo?(bhearsum)
If it's going to be a system add-on, I'm OK with working on it.
Flags: needinfo?(mcastelluccio)
Since this will be in product I would prefer it if this just sent the entire value from the registry just in case it becomes useful. Here is an example value from my system without commas separators.

000000001e000000

Would this suffice?
Flags: needinfo?(milan)
Note that we also need the CPU model (we want to block "GenuineIntel family 6 model 61 stepping 4" with those microcode versions, not all CPU models with those microcode versions).
So, will the value of the Identifier key be enough or does it need to be a combination of VendorIdentifier and Identifier?

From my system
Identifier = Intel64 Family 6 Model 60 Stepping 3
GenuineIntel = GenuineIntel
^^
Flags: needinfo?(mcastelluccio)
(In reply to Robert Strong [:rstrong] (use needinfo to contact me) from comment #14)
> Since this will be in product I would prefer it if this just sent the entire
> value from the registry just in case it becomes useful. Here is an example
> value from my system without commas separators.
> 
> 000000001e000000
> 
> Would this suffice?
Note: this is displayed in regedit as 00 00 00 00 1e 00 00 00
(In reply to Robert Strong [:rstrong] (use needinfo to contact me) from comment #16)
> So, will the value of the Identifier key be enough or does it need to be a
> combination of VendorIdentifier and Identifier?
> 
> From my system
> Identifier = Intel64 Family 6 Model 60 Stepping 3
> GenuineIntel = GenuineIntel

I'm not sure if there is an AMD CPU that has the same identifier ("family 6 model 61 stepping 4"), I suppose not, but perhaps it's better to include them both to be safe.
Flags: needinfo?(mcastelluccio)
I agree with comment 19 (also not sure if there's Intel/AMD overlap), with the note that we *also* need the microcode version as noted in comment 14.

And I agree with comment 10 that this doesn't need to be shipped for past versions.  For a .0 release, we'd detect the problem on the beta channel when we push the RC, so we'd be able to respin the builds to get different PGO randomness.  The issue is that for a security update we might need to move faster than that, and thus have the ability to stop offering the update to the small percentage of our audience affected by the crash until we had new builds for them.

Presumably that means in-product is fine, and it sounds like that's a preferable approach?
Flags: needinfo?(dbaron)
David, the value from comment #14 of 000000001e000000 along with the values of Identifier and VendorIdentifier?

Ben, heads up about the values being requested. Would the following work for you for %SYSTEM_CAPABILITIES%? Strings are surrounded by parenthesis.

/SSE3 000000001e000000 (GenuineIntel) (Intel64 Family 6 Model 60 Stepping 3)/
Flags: needinfo?(dbaron)
Flags: needinfo?(bhearsum)
(In reply to Robert Strong [:rstrong] (use needinfo to contact me) from comment #21)
> David, the value from comment #14 of 000000001e000000 along with the values
> of Identifier and VendorIdentifier?

Yes.  Although it's possible we want to report the bytes in a different order, since such a value is usually described as 0x1e.

> Ben, heads up about the values being requested. Would the following work for
> you for %SYSTEM_CAPABILITIES%? Strings are surrounded by parenthesis.
> 
> /SSE3 000000001e000000 (GenuineIntel) (Intel64 Family 6 Model 60 Stepping 3)/

Note that the condition we'd be testing on would be a string-match on the parenthesized parts and a numeric <= match on the 0x1e.  (Though if needed we could just list out the 10 or so possibilities of the microcode version.)
Flags: needinfo?(dbaron)
A little more detail on the microcode version number.  The  Intel® 64 and IA-32 Architectures Software Developer’s Manual Volume 2A: Instruction Set Reference's definition of the CPUID instruction says:

For processors that support the microcode update facility, the IA32_BIOS_SIGN_ID
MSR is loaded with the update signature whenever CPUID executes. The signature is
returned in the upper DWORD. For details, see Chapter 9 in the Intel® 64 and IA-32
Architectures Software Developer’s Manual, Volume 3A.

So it's possible we'd want to report only the latter 32-bits of that 64-bit value, but treating it as a little-endian number.
OK, then %SYSTEM_CAPABILITIES% would look similar to
/SSE3 0x1e (GenuineIntel) (Intel64 Family 6 Model 60 Stepping 3)/

Ben, if you'd like different delimiters or a change in the ordering let me know.
Also, we'll do something similar as was done for SSE2 and above... if there is an error the value will be "error" and unknown if the binary value isn't set in the registry. Same thing for the strings.
(In reply to David Baron :dbaron: ⌚️UTC-7 from comment #20)
>...
> Presumably that means in-product is fine, and it sounds like that's a
> preferable approach?
The success rate of pushing add-ons to clients is being investigated since it appears to be rather low so in product is preferable at this time.
Posted patch patch in progress (obsolete) — Splinter Review
Assignee: nobody → robert.strong.bugs
Status: NEW → ASSIGNED
Note: the patch needs more error handling which I will add later.
(In reply to David Baron :dbaron: ⌚️UTC-7 from comment #22)
> (In reply to Robert Strong [:rstrong] (use needinfo to contact me) from
> comment #21)
> > David, the value from comment #14 of 000000001e000000 along with the values
> > of Identifier and VendorIdentifier?
> 
> Yes.  Although it's possible we want to report the bytes in a different
> order, since such a value is usually described as 0x1e.
> 
> > Ben, heads up about the values being requested. Would the following work for
> > you for %SYSTEM_CAPABILITIES%? Strings are surrounded by parenthesis.
> > 
> > /SSE3 000000001e000000 (GenuineIntel) (Intel64 Family 6 Model 60 Stepping 3)/
> 
> Note that the condition we'd be testing on would be a string-match on the
> parenthesized parts and a numeric <= match on the 0x1e.  (Though if needed
> we could just list out the 10 or so possibilities of the microcode version.)

I want to make sure I understand the desired matching logic. It sounds like we need to either support comparsions of the microcode part, as described above or have a way to list out all of the unsupported versions.

I'm also getting the impression that we may need to match multiple, separate substrings within %SYSTEM_CAPABILITIES%. Eg: match the microcode part and possibly also the stepping.
Flags: needinfo?(robert.strong.bugs)
Flags: needinfo?(dbaron)
Flags: needinfo?(bhearsum)
(In reply to Ben Hearsum (:bhearsum) from comment #29)
> (In reply to David Baron :dbaron: ⌚️UTC-7 from comment #22)
> > (In reply to Robert Strong [:rstrong] (use needinfo to contact me) from
> > comment #21)
> > > David, the value from comment #14 of 000000001e000000 along with the values
> > > of Identifier and VendorIdentifier?
> > 
> > Yes.  Although it's possible we want to report the bytes in a different
> > order, since such a value is usually described as 0x1e.
> > 
> > > Ben, heads up about the values being requested. Would the following work for
> > > you for %SYSTEM_CAPABILITIES%? Strings are surrounded by parenthesis.
> > > 
> > > /SSE3 000000001e000000 (GenuineIntel) (Intel64 Family 6 Model 60 Stepping 3)/
> > 
> > Note that the condition we'd be testing on would be a string-match on the
> > parenthesized parts and a numeric <= match on the 0x1e.  (Though if needed
> > we could just list out the 10 or so possibilities of the microcode version.)
> 
> I want to make sure I understand the desired matching logic. It sounds like
> we need to either support comparsions of the microcode part, as described
> above or have a way to list out all of the unsupported versions.
> 
> I'm also getting the impression that we may need to match multiple, separate
> substrings within %SYSTEM_CAPABILITIES%. Eg: match the microcode part and
> possibly also the stepping.
That is my understanding from the comments in this bug.
Flags: needinfo?(robert.strong.bugs)
(In reply to Ben Hearsum (:bhearsum) from comment #29)
> I want to make sure I understand the desired matching logic. It sounds like
> we need to either support comparsions of the microcode part, as described
> above or have a way to list out all of the unsupported versions.
> 
> I'm also getting the impression that we may need to match multiple, separate
> substrings within %SYSTEM_CAPABILITIES%. Eg: match the microcode part and
> possibly also the stepping.

So the concrete example that we might need for bug 1296630 is to *not* send a particular update to systems that meet *all* of the following conditions:
 * GenuineIntel
 * family 6 model 61 stepping 4
 * microcode version is 0x19 or less (or, alternatively, microcode version is one of: 0xe, 0x11, 0x12, 0x13, 0x16, 0x18, 0x19)
Flags: needinfo?(dbaron)
(In reply to David Baron :dbaron: ⌚️UTC-7 from comment #31)
> (In reply to Ben Hearsum (:bhearsum) from comment #29)
> > I want to make sure I understand the desired matching logic. It sounds like
> > we need to either support comparsions of the microcode part, as described
> > above or have a way to list out all of the unsupported versions.
> > 
> > I'm also getting the impression that we may need to match multiple, separate
> > substrings within %SYSTEM_CAPABILITIES%. Eg: match the microcode part and
> > possibly also the stepping.
> 
> So the concrete example that we might need for bug 1296630 is to *not* send
> a particular update to systems that meet *all* of the following conditions:
>  * GenuineIntel
>  * family 6 model 61 stepping 4
>  * microcode version is 0x19 or less (or, alternatively, microcode version
> is one of: 0xe, 0x11, 0x12, 0x13, 0x16, 0x18, 0x19)

OK, thanks for the clarifications. I'm not exactly sure what to do. This field currently does full string matching in Balrog, because we need to be able to match plain "SSE". Using a substring match obviously causes us to match any SSE level, and will break the SSE deprecation rules that we have in place.

Doing a full string match looks like it would have 28 unique values (7 microcode versions * 4 unique SSE values), which is well past what we can put in one Rule. I can't think of a way to implement this without it being extremely unwieldy.

The current model in Balrog only allows us to AND things that are in separate fields, so if these all got their own section in the update URL, this would be quite easy. I admit that's a pretty unpalatable solution though. In the medium/long term we might be able to do a little better if a field like this had a well defined format, but that's not something that's going to help us now.

Stepping back for a minute - how many crashes are we talking about? I don't want to be stop energy, but the new watersheds and websense Rules have already put us in a place where it's very difficult to reason about and correctly modify Rules. I don't think it's a good trade-off to add even more complexity if we're talking about handful of users. As we increase Rule complexity, it makes it harder for us to make quick changes (eg: when there's a chemspill or a startup crash or whatnot), and increases the risk that we mess them up when we do.
Ben, would putting this info in its own field help?
Flags: needinfo?(bhearsum)

Comment 34

3 years ago
(In reply to Ben Hearsum (:bhearsum) from comment #32)
> Stepping back for a minute - how many crashes are we talking about? 

in badly affected beta builds bug 1296630 can account for >10% of all browser crashes.

maybe not to increase complexity with balrog rules too much, could the logic (intel+family/model+microcode) be implemented client-side and only submit a simple indicator like "(bug1296630)" to the update server if all conditions are met?
(In reply to [:philipp] from comment #34)
> (In reply to Ben Hearsum (:bhearsum) from comment #32)
> > Stepping back for a minute - how many crashes are we talking about? 
> 
> in badly affected beta builds bug 1296630 can account for >10% of all
> browser crashes.
> 
> maybe not to increase complexity with balrog rules too much, could the logic
> (intel+family/model+microcode) be implemented client-side and only submit a
> simple indicator like "(bug1296630)" to the update server if all conditions
> are met?
I'm ok with this with the understanding that if more or less conditions are desired that we won't be able to do anything since the check will be in product and that pushing out system add-ons to change the behavior isn't a viable option since they appear to only reach about 55% of systems - bug 1307568
(In reply to Robert Strong [:rstrong] (use needinfo to contact me) from comment #33)
> Ben, would putting this info in its own field help?

We'd have to put each section in its own field, unfortunately. So, 3 new fields.

(In reply to Robert Strong [:rstrong] (use needinfo to contact me) from comment #35)
> (In reply to [:philipp] from comment #34)
> > (In reply to Ben Hearsum (:bhearsum) from comment #32)
> > > Stepping back for a minute - how many crashes are we talking about? 
> > 
> > in badly affected beta builds bug 1296630 can account for >10% of all
> > browser crashes.
> > 
> > maybe not to increase complexity with balrog rules too much, could the logic
> > (intel+family/model+microcode) be implemented client-side and only submit a
> > simple indicator like "(bug1296630)" to the update server if all conditions
> > are met?
> I'm ok with this with the understanding that if more or less conditions are
> desired that we won't be able to do anything since the check will be in
> product and that pushing out system add-ons to change the behavior isn't a
> viable option since they appear to only reach about 55% of systems - bug
> 1307568

This would work with the current version of Balrog. It's not great, but for my POV it's the best trade-off until we have something for robust on the server. I filed bug 1312499 for the specific case of multi-part matching in one field.
Flags: needinfo?(bhearsum)
(In reply to Robert Strong [:rstrong] (use needinfo to contact me) from comment #14)
> Since this will be in product I would prefer it if this just sent the entire
> value from the registry just in case it becomes useful. Here is an example
> value from my system without commas separators.
> 
> 000000001e000000

Is the question whether we should update bug 1305120 patch to show both numbers, instead of just the higher bits?  I don't really mind either way.  It sounded from the original description that those "will never be used", but we know how that can change.
Flags: needinfo?(milan)
Sorry, I should have removed the needinfo as the comments evolved. It seems that the complexity of this for balrog may be too much and the current state is as follows.

(In reply to Robert Strong [:rstrong] (use needinfo to contact me) from comment #35)
> (In reply to [:philipp] from comment #34)
> > (In reply to Ben Hearsum (:bhearsum) from comment #32)
> > > Stepping back for a minute - how many crashes are we talking about? 
> > 
> > in badly affected beta builds bug 1296630 can account for >10% of all
> > browser crashes.
> > 
> > maybe not to increase complexity with balrog rules too much, could the logic
> > (intel+family/model+microcode) be implemented client-side and only submit a
> > simple indicator like "(bug1296630)" to the update server if all conditions
> > are met?
> I'm ok with this with the understanding that if more or less conditions are
> desired that we won't be able to do anything since the check will be in
> product and that pushing out system add-ons to change the behavior isn't a
> viable option since they appear to only reach about 55% of systems - bug
> 1307568

I'd like to get buy in for this if it is acceptable before implementing.
The only problem I can see with the proposed solution is that we would obviously not be able to block other configurations (e.g. Beta 9 is affected by bug 1312270, with "AuthenticAMD family 20 model 1 stepping 0" and "AuthenticAMD family 20 model 2 stepping 0", but not by bug 1296630).
Milan, can I get your or someone else that has worked on Bug 1296630 as to whether what I described in comment #35 is acceptable?

Marco, should this be added as well if we go with the simpler option?
Flags: needinfo?(milan)
Flags: needinfo?(mcastelluccio)
I'd like to tweak option 35 somewhat, and then I think we cover all the bases.  As suggested, have the check on the client side, to match the model string and check against a minimal version of the microcode.  However, specify the string and the minimal version as a preference.  We'd set a preference to something like (ignore the AMD part, but needed an example to allow multiple configurations "GenuineIntel family 6 model 61 | 0x1a; AuthenticAMD family 20 model 1 | 0x20".  We can then parse the preference, and if we don't like how the values compare to what we have, the client side makes the correct decision.

It would be a bit more work up front, to get the preference going, and we can certainly land it without that, while we only have a single example of wanting to do this.  The preference gives us a remote way of changing it, albeit only for those 55% of the systems.  Still, better than nothing.
Flags: needinfo?(milan)
(In reply to Milan Sreckovic [:milan] from comment #41)
> I'd like to tweak option 35 somewhat, and then I think we cover all the
> bases.  As suggested, have the check on the client side, to match the model
> string and check against a minimal version of the microcode.  However,
> specify the string and the minimal version as a preference.  We'd set a
> preference to something like (ignore the AMD part, but needed an example to
> allow multiple configurations "GenuineIntel family 6 model 61 | 0x1a;
> AuthenticAMD family 20 model 1 | 0x20".  We can then parse the preference,
> and if we don't like how the values compare to what we have, the client side
> makes the correct decision.
> 
> It would be a bit more work up front, to get the preference going, and we
> can certainly land it without that, while we only have a single example of
> wanting to do this.  The preference gives us a remote way of changing it,
> albeit only for those 55% of the systems.  Still, better than nothing.

I think I would prefer going with a new field in the app.update.url preference and just have a system add-on in the release replace that fields when present.

Keep in mind that we are already working with the app.update.url preference so baking in logic to get these values, compare it with a new preference, and then replace a value from the app.update.url preference is less flexible since the comparison logic is baked in. This way the compare logic can be updated via the system add-on as well instead of just the string(s) that are used for the comparison.

Sound ok?
Flags: needinfo?(milan)
Ben, if you prefer the add-on can just replace an existing field as the websense add-on does. Would you be ok with a new field or an existing field?
Flags: needinfo?(bhearsum)
Felipe, considering your understanding of system add-ons and the desire to possibly update this code does the course I stated in comment #42 make sense to you?
Flags: needinfo?(felipc)
(In reply to Robert Strong [:rstrong] (use needinfo to contact me) from comment #43)
> Ben, if you prefer the add-on can just replace an existing field as the
> websense add-on does. Would you be ok with a new field or an existing field?

I have a slightly preference for using an existing field, unless there's a reason we need to match on this new one + the existing field we're planning to use. A new field requires pushing new Balrog code, and we've already missed the train for this week's scheduled push. We can do an unscheduled one if necessary - just let me know. If we do use a new field, I assume we'll implement %CUSTOM% as we talked about in comment #11?

And just to be 100% clear: we're talking about using a single value that represents all 3 of the parts from comment #21 - right? A new field doesn't help if we still need match each of those individually.
Flags: needinfo?(bhearsum)
(In reply to Ben Hearsum (:bhearsum) from comment #45)
> (In reply to Robert Strong [:rstrong] (use needinfo to contact me) from
> comment #43)
> > Ben, if you prefer the add-on can just replace an existing field as the
> > websense add-on does. Would you be ok with a new field or an existing field?
> 
> I have a slightly preference for using an existing field, unless there's a
> reason we need to match on this new one + the existing field we're planning
> to use. A new field requires pushing new Balrog code, and we've already
> missed the train for this week's scheduled push. We can do an unscheduled
> one if necessary - just let me know. If we do use a new field, I assume
> we'll implement %CUSTOM% as we talked about in comment #11?
Not sure at this time what it will be called since this should use default prefs and CUSTOM can use user prefs. Won't be added as part of this bug.

> 
> And just to be 100% clear: we're talking about using a single value that
> represents all 3 of the parts from comment #21 - right? A new field doesn't
> help if we still need match each of those individually.
Yes, a single value that gives a yes/no answer as to whether it matches the conditions. We'll go with the same field used by the websense add-on.
(In reply to Robert Strong [:rstrong] (use needinfo to contact me) from comment #46)
> (In reply to Ben Hearsum (:bhearsum) from comment #45)
> > And just to be 100% clear: we're talking about using a single value that
> > represents all 3 of the parts from comment #21 - right? A new field doesn't
> > help if we still need match each of those individually.
> Yes, a single value that gives a yes/no answer as to whether it matches the
> conditions. We'll go with the same field used by the websense add-on.

WFM!
(In reply to Robert Strong [:rstrong] (use needinfo to contact me) from comment #42)
> ...
> 
> Sound ok?

Love it.
Flags: needinfo?(milan)
So the idea would be that there's a new %CUSTOM% field in the update ping, to be updated at any given time by a system add-on tailored to a current need? I.e. we would be changing this over time as the need arises?

That sounds good to me. My suggestion is that instead of asking the system add-on to directly manipulate app.update.url, we should use another pref, like "app.update.custom". Then UpdateUtils.formatUpdateURL would read the value of that pref (from the default branch) and replace %CUSTOM% with it.

That will be less error prone and also make it easier to have a default value if custom is blank.


My only question is if the specific request on this bug ("include information on whether system is at risk of bug 1296630") is meant to be temporary or permanent. If it's temporary (i.e., we expect to find a fix for bug 1296630) then this is fine.
Flags: needinfo?(felipc)
I am planning on going with possible values of

(noBug1296630v1) - one of the conditions doesn't match

(yesBug1296630v1) - all of the conditions match

(errBug1296630v1) - one or more of the checks had an error and the remaining conditions (if any) matched

(unkBug1296630v1) - should never happen... I just initialized the variable for the value to this.

By specifying the bug number along with v1 (for version) it should be relatively simple to update this by pushing an update to this system add-on with a new value and detect if the client is using this version or an updated version.

If the client removes this add-on it should be relatively simple to determine it by checking the Firefox version that this add-on was included with.
Posted patch untested patch (obsolete) — Splinter Review
I'll request review after I check a few things
Attachment #8803580 - Attachment is obsolete: true
Posted patch patch rev1 (obsolete) — Splinter Review
Successfully tested this locally for all 3 conditions
Attachment #8805494 - Attachment is obsolete: true
Attachment #8805621 - Flags: review?(felipc)
Comment on attachment 8805621 [details] [diff] [review]
patch rev1

Review of attachment 8805621 [details] [diff] [review]:
-----------------------------------------------------------------

::: browser/extensions/aushelper/bootstrap.js
@@ +15,5 @@
> +  if (Services.appinfo.OS != "WINNT") {
> +    return;
> +  }
> +
> +  const regCPUPath = "HARDWARE\\DESCRIPTION\\System\\CentralProcessor\\0";

Could this key not exist for some reason? Would it be worth also wrapping this first block in a try catch?
Attachment #8805621 - Flags: review?(felipc) → review+
Posted patch patch rev2 (obsolete) — Splinter Review
Attachment #8805621 - Attachment is obsolete: true
Attachment #8805634 - Flags: review?(felipc)
Posted patch patch rev3Splinter Review
Actually, falling through is probably better for aus
Attachment #8805634 - Attachment is obsolete: true
Attachment #8805634 - Flags: review?(felipc)
Attachment #8805638 - Flags: review?(felipc)
Attachment #8805638 - Flags: review?(felipc) → review+
Clearing needinfo for the second question in comment #40 since time is short to get any additional checks in this patch.
Flags: needinfo?(mcastelluccio)

Comment 57

3 years ago
Pushed by rstrong@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/33c2241696c0
include information on whether system is at risk of bug 1296630 in the update ping. r=felipc
Ben / Rail, this will be in nightly builds shortly and will soon be in aurora and beta builds if the uplift request is approved.
Flags: needinfo?(rail)
Flags: needinfo?(bhearsum)
The solution looks fine to me!
(In reply to Robert Strong [:rstrong] (use needinfo to contact me) from comment #58)
> Ben / Rail, this will be in nightly builds shortly and will soon be in
> aurora and beta builds if the uplift request is approved.

Thanks for the heads up. There won't be any code changes needed to Balrog to support this, so we should be good to go whenever we're ready to start blocking updates for these people.
Flags: needinfo?(bhearsum)
Ben, is there a bug on extending balrog to handle fields similar to the following as was proposed in this bug?
0x1e (GenuineIntel) (Intel64 Family 6 Model 60 Stepping 3)

Since I doubt the success rate of pushing system add-ons will ever be 100% it would be better if we could put the data in the url and have balrog make the decision. That will give us significantly more flexibility for bugs like this. Also, Bug 1312270 is an example of where this would come in handy.

I know it adds complexity to balrog but at least that can be 100% managed vs. having the complexity in Firefox which we will require pushing out code.

If there isn't a bug, could you file one?
Flags: needinfo?(bhearsum)
Comment on attachment 8805638 [details] [diff] [review]
patch rev3

Approval Request Comment
[Feature/regressing bug #]: This is a mitigation for bug 1296630
[User impact if declined]: Systems that are impacted by bug 1296630 will receive updates and crash
[Describe test coverage new/current, TreeHerder]: Manually tested.
[Risks and why]: Minimal in that if this code fails we should be where we were before this patch landed.
[String/UUID change made/needed]: None
Attachment #8805638 - Flags: approval-mozilla-beta?
Attachment #8805638 - Flags: approval-mozilla-aurora?
Hi Sylvestre, Julien, given that this change hasn't landed in Nightly yet (still inbound), I'd prefer to let this stabilize on Nightly and Aurora over the weekend. Could you please approve this request your Monday morning? Thanks a lot!
Flags: needinfo?(sledru)
Flags: needinfo?(jcristau)
Hi Robert, can you please comment on this bug with confirmation if you see this code working on Nightly/Aurora channels over the weekend? I would prefer to uplift a change that is stable *and* has been verified to detect the kind of information we are looking for from the update ping. I am assuming this verification can be done on Aurora/Nightly via update pings. Right?
Flags: needinfo?(robert.strong.bugs)
Not a problem whatsoever. Thanks!
Flags: needinfo?(robert.strong.bugs)
Comment on attachment 8805638 [details] [diff] [review]
patch rev3

Let's uplift to Aurora51 so it gets stabilized over the weekend. We plan to uplift to Beta50 Monday Paris morning if all goes well.
Attachment #8805638 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
[Tracking Requested - why for this release]: requesting tracking since this is a late in the cycle change to mitigate updating users to a build that would crash for them.

Pushed to mozilla-aurora
https://hg.mozilla.org/releases/mozilla-aurora/rev/0c44c99f7b57f8827429eb3634c406bfea2115a9
(In reply to Robert Strong [:rstrong] (use needinfo to contact me) from comment #58)
> Ben / Rail, this will be in nightly builds shortly and will soon be in
> aurora and beta builds if the uplift request is approved.

Thanks for the heads up. I see it's already on aurora and we should get it in tomorrow's nightlies! \o/

(In reply to Robert Strong [:rstrong] (use needinfo to contact me) from comment #50)
> I am planning on going with possible values of
> 
> (noBug1296630v1) - one of the conditions doesn't match 
> (yesBug1296630v1) - all of the conditions match
> (errBug1296630v1) - one or more of the checks had an error and the remaining
> conditions (if any) matched
> (unkBug1296630v1) - should never happen... I just initialized the variable
> for the value to this.

Sorry, I haven't been following the bug closely, do we need to tweak Balrog to handle these?
Flags: needinfo?(rail) → needinfo?(robert.strong.bugs)
Not sure if / when it will be needed... it is as I understand it precautionary. milan and dbaron would know more.

With that in mind, balrog would need a similar tweak to detect this as was done for websense. I'm not sure what would be done for the case where it wasn't possible to read the values which would be "(errBug1296630v1)" and will leave that to milan and dbaron. I hope that they should just be updated since it should be possible to read the values for the affected systems. They should be small in number as long as the registry checks that were asked for exist on our supported platforms.
Flags: needinfo?(robert.strong.bugs)
Thanks, makes sense.
Looks like xperf is complaining about this.
https://treeherder.mozilla.org/logviewer.html#?job_id=4017004&repo=mozilla-aurora
Flags: needinfo?(robert.strong.bugs)

Comment 74

3 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/33c2241696c0
https://hg.mozilla.org/mozilla-central/rev/2ac285d70ab2
Status: ASSIGNED → RESOLVED
Last Resolved: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla52
Make sure both patches get uplifted to Beta once it's approved to do so.
Flags: needinfo?(robert.strong.bugs)
Flags: needinfo?(felipc)
I made a very slight change to the logic in the original patch to handle the case when the value is larger or smaller than what it should be. This shouldn't happen but with over a 100 million systems I wouldn't be surprised if we see this in the wild so I'd prefer to handle this edge case.

I've also included the xperf_whitelist.json change that Ryan pushed in this patch.
Attachment #8805888 - Flags: review+
Comment on attachment 8805888 [details] [diff] [review]
patch for beta

Approval Request Comment
[Feature/regressing bug #]: This is a mitigation for bug 1296630
[User impact if declined]: Systems that are impacted by bug 1296630 will receive updates and crash
[Describe test coverage new/current, TreeHerder]: Manually tested.
[Risks and why]: Minimal in that if this code fails we should be where we were before this patch landed.
[String/UUID change made/needed]: None
Attachment #8805888 - Flags: approval-mozilla-beta?
I've tested the patch using the nightly build as well as the updated patch and it worked as expected. I've also tested the cases where it meets the conditions for Bug 1296630. This should be good to go. Thanks!
Comment on attachment 8805888 [details] [diff] [review]
patch for beta

Just like Ritu asked :)
Should be in the first RC.
Flags: needinfo?(sledru)
Flags: needinfo?(jcristau)
Attachment #8805888 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
(In reply to Robert Strong [:rstrong] (use needinfo to contact me) from comment #61)
> Ben, is there a bug on extending balrog to handle fields similar to the
> following as was proposed in this bug?
> 0x1e (GenuineIntel) (Intel64 Family 6 Model 60 Stepping 3)
> 
> Since I doubt the success rate of pushing system add-ons will ever be 100%
> it would be better if we could put the data in the url and have balrog make
> the decision. That will give us significantly more flexibility for bugs like
> this. Also, Bug 1312270 is an example of where this would come in handy.
> 
> I know it adds complexity to balrog but at least that can be 100% managed
> vs. having the complexity in Firefox which we will require pushing out code.
> 
> If there isn't a bug, could you file one?

https://bugzilla.mozilla.org/show_bug.cgi?id=1312499

(In reply to Rail Aliiev [:rail] ⌚️ET from comment #68)
> (In reply to Robert Strong [:rstrong] (use needinfo to contact me) from
> comment #50)
> > I am planning on going with possible values of
> > 
> > (noBug1296630v1) - one of the conditions doesn't match 
> > (yesBug1296630v1) - all of the conditions match
> > (errBug1296630v1) - one or more of the checks had an error and the remaining
> > conditions (if any) matched
> > (unkBug1296630v1) - should never happen... I just initialized the variable
> > for the value to this.
> 
> Sorry, I haven't been following the bug closely, do we need to tweak Balrog
> to handle these?

I assume we want to block users matching yesBug1296630v1 at some point. However, this should be filed as a separate bug and probably run through release-drivers, too.
Flags: needinfo?(bhearsum)
Attachment #8805887 - Flags: review?(felipc) → review+

Comment 84

3 years ago
Pushed by rstrong@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/751ba58e708f
Followup to Bug 1311515 - include information on whether system is at risk of bug 1296630 in the update ping. r=felipc
Comment on attachment 8805887 [details] [diff] [review]
followup patch for nightly and aurora

See comment #78

Approval Request Comment
[Feature/regressing bug #]: This is a followup to the mitigation for bug 1296630
[User impact if declined]: Aurora won't handle the additional failure case that is handled will soon be handled on Nightly and already landed for Beta
[Describe test coverage new/current, TreeHerder]: Manually tested.
[Risks and why]: Minimal in that if this code fails we should be where we were before this patch landed.
[String/UUID change made/needed]: None
Attachment #8805887 - Flags: approval-mozilla-aurora?
milan and dbaron, the code is in place for Firefox 50 to add the value to the url if it meets the conditions you've provided. If you need to prevent updating for the systems please file a Release Engineering bug and bhearsum or rail should be able to help you out. Thanks!
Sounds great. Thanks.
Comment on attachment 8805887 [details] [diff] [review]
followup patch for nightly and aurora

Fix a failure case. Take it in 51 aurora.
Attachment #8805887 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+

Comment 91

3 years ago
Hi all. Read today about this article. I want to suggest to you not to use some pre-defined values in update urls, because it need to be supported like "forever". May be better to extend number of parameters, which send to balrog?
Depends on: 1315213
Hi guys, at the channel meeting today, Marco mentioned that there is a crash spike in a family of crash signature that seems correlated to AMD CPUs. One mitigation is to rebuild Firefox and push to release to prevent this crash signature. As such, we do not need another dot release on 50.1.0 as there are no release blockers. 

This bug specifically addresses a class of bugs where we want to stop specific (Intel CPU family) end-users from updating to a recent Firefox build. Can we do the same on 50.1.0 to avoid this crash on AMD CPU end-users by preventing them from updating to 50.1.0? Can that be done via a system add-on update? Thanks!

* https://crash-stats.mozilla.com/signature/?signature=js%3A%3Ajit%3A%3AIonBuilder%3A%3AinspectOpcode&date=%3E%3D2016-12-13T16%3A33%3A00.000Z&date=%3C2016-12-20T16%3A33%3A00.000Z&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&_sort=-date&page=1#summary
** Signature correlated with AMD CPUs (the infamous AMD CPU bug)
Flags: needinfo?(robert.strong.bugs)
Flags: needinfo?(n.nethercote)
Flags: needinfo?(mcastelluccio)
Bug 1287087 has a list of the signatures. The CPU is AuthenticAMD family 20 model 1 or AuthenticAMD family 20 model 2.
Flags: needinfo?(mcastelluccio)
This bug won't cover anything but the criteria provided and specifically AMD cpu's.

To get a general fix first balrog will need to be fixed to be able to handle the general CPU information which is bug 1312499.

If you want to do a similar thing for AMD cpu's as was done for Intel cpu's in this bug a new bug with the exact checks that are needed would need to be filed.
Flags: needinfo?(robert.strong.bugs)
(In reply to Robert Strong [:rstrong] (use needinfo to contact me) from comment #95)
> This bug won't cover anything but the criteria provided and specifically AMD
> cpu's.
s/AMD/Intel/ cpu's

> 
> To get a general fix first balrog will need to be fixed to be able to handle
> the general CPU information which is bug 1312499.
Note: a general fix should seldom need to be updated when crashes like the AMD bug happen. First balrog would need the capability and then we would add the general info to the update url.

> If you want to do a similar thing for AMD cpu's as was done for Intel cpu's
> in this bug a new bug with the exact checks that are needed would need to be
> filed.
Thanks RStrong. I commented in the other bug about the need for such a capability in balrog. I may not understand the entire scope of this change and possible impact of leaving some users on older versions (and unwanted orphaning side effects). But it seems like a valid crash mitigation strategy for some release users.
This bug is way outside my areas of expertise and I don't have anything useful to add :)
Flags: needinfo?(n.nethercote)
You need to log in before you can comment on or make changes to this bug.