Closed Bug 1210163 Opened 9 years ago Closed 5 years ago

API read-only access to AUS for tracking release throttle changes.

Categories

(Cloud Services :: Operations: Metrics/Monitoring, task)

Type: task
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED INACTIVE

People

(Reporter: bobm, Unassigned)

Details

To whom it may concern:

The Cloud Services Operations team is requesting read-only API access to the AUS system for the purpose of tracking release throttle changes over time.  Overlaying these changes as events on our real time monitoring will help us distinguish anomalies from the normal traffic swings that result from Fx releases.  Also, we think it'll look pretty cool.

IIRC, any LDAP based account can access Balrog read-only. The routing/firewall may be tricky. You may need to use VPN with proper LDAP bits set.

(In reply to Rail Aliiev [:rail] from comment #1)
> IIRC, any LDAP based account can access Balrog read-only. 
Since we want to run this from a script, we'll probably need a dedicated LDAP account for grabbing these metrics.  Should we file another bug to have that user created?

However, perhaps we're going about this the wrong way. Connecting to Balrog periodically to poll the throttle rate, through the VPN and with a dedicated LDAP account, seems a bit heavy-handed to me. It might be easier for all involved to just submit a statsd gauge metric to our SaaS provider via their API, and only when the rate changes. We could write a bash script to do that, if the outbound access is available.
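A minimal sketch of the "send a gauge only when it changes" idea, written in Python rather than bash. It assumes a statsd-compatible agent listening on UDP 127.0.0.1:8125 that forwards to the SaaS provider; the metric name `balrog.release.background_rate` and the helpers `format_gauge`, `udp_sender`, and `ThrottleGauge` are hypothetical.

```python
import socket
from typing import Callable, Optional


def format_gauge(name: str, value: float) -> bytes:
    """Encode one metric in the plain-text statsd gauge format: <name>:<value>|g."""
    return f"{name}:{value:g}|g".encode("ascii")


def udp_sender(host: str = "127.0.0.1", port: int = 8125) -> Callable[[bytes], None]:
    """Return a function that sends each payload as one UDP datagram.

    Assumption: a statsd-compatible agent is listening at host:port.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    def send(payload: bytes) -> None:
        sock.sendto(payload, (host, port))

    return send


class ThrottleGauge:
    """Forward a throttle value to statsd only when it differs from the last one sent."""

    def __init__(self, name: str, send: Callable[[bytes], None]) -> None:
        self.name = name
        self.send = send
        self.last_value: Optional[float] = None

    def observe(self, value: float) -> bool:
        """Send a gauge if the value changed; return True when something was sent."""
        if value == self.last_value:
            return False
        self.send(format_gauge(self.name, value))
        self.last_value = value
        return True
```

Feeding it the release-week sequence 25, 25, 0, 100 sends three datagrams; the repeated 25 is suppressed. Injecting `send` keeps the change detection testable without a live agent.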

... or we can publish events to pulse. This way multiple consumers can use the events.

(In reply to Rail Aliiev [:rail] from comment #3)
> ... or we can publish events to pulse. This way multiple consumers can use
> the events.

How are you proposing that we get events out of Pulse?

Depends on: 1213037

(In reply to Rail Aliiev [:rail] from comment #1)
> IIRC, any LDAP based account can access Balrog read-only. The
> routing/firewall may be tricky. You may need to use VPN with proper LDAP
> bits set.

We now have an LDAP account with all the appropriate access.  How do we actually query the current release channels for the release throttle?

You can query the rules by their IDs, for example, https://aus4-admin.mozilla.org/api/rules/8. We'll need to come up with a set of rule IDs to be checked.

There is also https://aus4-admin.mozilla.org/api/rules where you can see all rules.
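A minimal sketch of polling one rule from the admin API mentioned above. The `backgroundRate` and `mapping` fields come from this thread; sending the LDAP credentials as HTTP Basic auth is an assumption about how the dedicated account authenticates, and `rule_url`, `parse_rule`, and `fetch_rule` are hypothetical helpers.

```python
import base64
import json
import urllib.request

ADMIN_API = "https://aus4-admin.mozilla.org/api"


def rule_url(rule_id: int) -> str:
    """Build the per-rule endpoint, e.g. .../api/rules/8."""
    return f"{ADMIN_API}/rules/{rule_id}"


def parse_rule(body: str) -> dict:
    """Keep only the throttle-related fields of a rule's JSON body."""
    rule = json.loads(body)
    return {"backgroundRate": int(rule["backgroundRate"]), "mapping": rule["mapping"]}


def fetch_rule(rule_id: int, user: str, password: str, timeout: float = 10.0) -> dict:
    """GET one rule over HTTPS (Basic auth with the LDAP account is an assumption)."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(
        rule_url(rule_id), headers={"Authorization": f"Basic {token}"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_rule(response.read().decode("utf-8"))
```

Keeping parsing separate from fetching lets the parsing be checked offline; `fetch_rule` itself only works from a host with the VPN/firewall access discussed above.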

(In reply to Rail Aliiev [:rail] from comment #6)
> You can query the rules by their IDs, for example,
> https://aus4-admin.mozilla.org/api/rules/8. We'll need to come up with a set
> of rule IDs to be checked.
> 
> There is also https://aus4-admin.mozilla.org/api/rules where you can see all
> rules.

Cool.  Is "backgroundRate" the throttle setting?  Is it typically a value of 0 to 100?  

We're mostly concerned about general audience.  Adding throttle rates for aurora and nightly release would be interesting, but not strictly essential for us to track.  I think what we need would be:

General Audience / Fx Release - rule_id: 145
Aurora - rule_id: 10 (or maybe 16?)
Nightly - rule_id: 3 

And anything else that is routinely throttled at release and might be interesting to track.

(In reply to Bob Micheletto [:bobm] from comment #7)
> (In reply to Rail Aliiev [:rail] from comment #6)
> > You can query the rules by their IDs, for example,
> > https://aus4-admin.mozilla.org/api/rules/8. We'll need to come up with a set
> > of rule IDs to be checked.
> > 
> > There is also https://aus4-admin.mozilla.org/api/rules where you can see all
> > rules.
> 
> Cool.  Is "backgroundRate" the throttle setting?  Is it typically a value of
> 0 to 100?  

Correct.

 
> We're mostly concerned about general audience.  Adding throttle rates for
> aurora and nightly release would be interesting, but not strictly essential
> for us to track.  I think what we need would be:
> 
> General Audience / Fx Release - rule_id: 145

Correct.

FTR, the rule ID is used by automation and specified in http://hg.mozilla.org/build/buildbot-configs/file/26bf785340a6/mozilla/release-firefox-mozilla-release.py.template#l121


> Aurora - rule_id: 10 (or maybe 16?)

It's 10. Rule 16 is FirefoxMetro, which is history now, but we still keep the product around.


> Nightly - rule_id: 3 

Correct.
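Using the rule IDs confirmed above, a sketch of one polling pass that snapshots each tracked rule and reports which ones changed since the previous pass. The channel labels and the `snapshot`/`changed_rules` helpers are hypothetical; `fetch` stands in for whatever function retrieves a rule, such as an authenticated GET against /api/rules/<id>.

```python
from typing import Callable, Dict, Optional, Tuple

# Rule IDs confirmed in this bug; the channel labels are our own naming.
TRACKED_RULES: Dict[str, int] = {
    "release": 145,
    "aurora": 10,  # not 16, which belongs to the retired FirefoxMetro product
    "nightly": 3,
}

Rule = dict


def snapshot(fetch: Callable[[int], Rule]) -> Dict[str, Rule]:
    """Fetch every tracked rule once and key the result by channel label."""
    return {channel: fetch(rule_id) for channel, rule_id in TRACKED_RULES.items()}


def changed_rules(
    previous: Dict[str, Rule], current: Dict[str, Rule]
) -> Dict[str, Tuple[Optional[Rule], Rule]]:
    """Return {channel: (old, new)} for every rule whose contents differ."""
    return {
        channel: (previous.get(channel), rule)
        for channel, rule in current.items()
        if previous.get(channel) != rule
    }
```

On the first pass `previous` is empty, so every tracked rule is reported with `None` as its old value; after that, only real changes show up.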

> And anything else that is routinely throttled at release and might be
> interesting to track.

There are a couple of patterns:

1) Release:
  * We usually release with the background update rate set to 25% (Tuesday).
  * A day later we set it to 0% (Wednesday).
  * A couple of days later we check the crash rates, and if everything looks good we change the rate to 100% (Friday).

2) Aurora:
  * We "freeze" updates a day before a release (Monday, merge day). This is done by changing the rule's "mapping": instead of pointing to Firefox-mozilla-aurora-nightly-latest, we point it at Firefox-mozilla-aurora-nightly-$BUILDID to keep users on that particular build.
  * 4 days later (Friday), upon QE's approval, we change the mapping back to Firefox-mozilla-aurora-nightly-latest.
  * We don't usually manipulate the background update rate on Aurora.
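The two patterns above could be turned into graph annotations by comparing a rule before and after a change. A sketch, assuming each rule is a dict with `backgroundRate` and `mapping` keys; `describe_change` is a hypothetical helper, and treating any non-latest Aurora mapping as a frozen build is an assumption based on the naming described above.

```python
from typing import Optional

AURORA_LATEST = "Firefox-mozilla-aurora-nightly-latest"
AURORA_PREFIX = "Firefox-mozilla-aurora-nightly-"


def describe_change(old: dict, new: dict) -> Optional[str]:
    """Summarize a rule change as a one-line annotation, or None if nothing changed."""
    notes = []
    if old["backgroundRate"] != new["backgroundRate"]:
        # Release pattern: 25% at launch, 0% a day later, 100% once crash rates look good.
        notes.append(f"backgroundRate {old['backgroundRate']}% -> {new['backgroundRate']}%")
    if old["mapping"] != new["mapping"]:
        if old["mapping"] == AURORA_LATEST and new["mapping"].startswith(AURORA_PREFIX):
            # Aurora pattern: merge-day freeze onto a specific $BUILDID.
            notes.append(f"Aurora updates frozen on {new['mapping']}")
        elif new["mapping"] == AURORA_LATEST and old["mapping"].startswith(AURORA_PREFIX):
            notes.append("Aurora updates unfrozen (back to latest)")
        else:
            notes.append(f"mapping {old['mapping']} -> {new['mapping']}")
    return "; ".join(notes) or None
```

For example, a freeze onto a made-up build ID 20151019004008 yields "Aurora updates frozen on Firefox-mozilla-aurora-nightly-20151019004008", and a release rate change from 25 to 0 yields "backgroundRate 25% -> 0%".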

In some cases we create custom rules. For example, when we released 40.0.1 we had 2 rules: the regular one (id 145) with the "What's New" page enabled, and an additional one for 40.0 users only, without that page.

BTW, there is a special mapping that we rarely use for releases. If we need to turn off updates completely (both background checks and active checks from Help > About Firefox), we change the mapping to point to "No-Update". You can see it for some unsupported versions/platforms in https://aus4-admin.mozilla.org/api/rules.
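Rules that have been switched to "No-Update" could be picked out of the full listing at /api/rules. A sketch, assuming each rule in the listing carries `rule_id` and `mapping` fields and that the listing is either a bare JSON array or wrapped in a `"rules"` key (the exact envelope is an assumption); `no_update_rules` is a hypothetical helper.

```python
import json
from typing import List


def no_update_rules(body: str) -> List[int]:
    """Return the sorted IDs of rules whose mapping is the special "No-Update" release."""
    data = json.loads(body)
    # Assumption: accept either [...] or {"rules": [...]} as the listing shape.
    rules = data["rules"] if isinstance(data, dict) else data
    return sorted(rule["rule_id"] for rule in rules if rule.get("mapping") == "No-Update")
```

Diffing this set between polls would show when updates are turned off entirely for a version or platform.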

:mostlygeek, this sounds like something we can do now. How much work would it take to build a feature that adds graph annotations for these throttle changes?
Component: Other → Operations: Metrics/Monitoring
Flags: needinfo?(bwong)
Product: Release Engineering → Cloud Services
QA Contact: mshal
Version: unspecified → other

With the right information, I bet it'll take a day or two to dig/build/test.
Flags: needinfo?(bwong)
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → INACTIVE