Bug 1369498
[imminent] drop all things ADI
Opened 8 years ago; closed 6 years ago
Status: RESOLVED FIXED
Component: Socorro :: General (task, P2)
Tracking: not tracked
Reporter: willkg; Assignee: willkg
Attachments: 1 file (image/png, 76.40 KB)
ADI (active daily installations) is a number that is pushed into Socorro's Postgres tables from a rooted xbox 360 that runs in a disused lavatory at the bottom of the stairs of a hidden base somewhere in the northern hemisphere.
Socorro uses this number to normalize the number of crashes for top crasher reports and probably some other things[1].
Other Mozilla systems normalize by usage kilohours, which come from Telemetry, so Socorro's numbers don't line up with theirs because the normalization is done differently. If we need normalization, maybe we can switch to that?
This tracker bug covers finding everything in Socorro that uses ADI and either removing it or replacing it with something better, as appropriate.
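A minimal sketch of why the two sets of numbers can't be compared directly: the same crash count divided by two different denominators. The scaling (per 100 ADI, per usage kilohour) and the helper names are assumptions for illustration, not Socorro's actual code.

```python
"""Hypothetical sketch: one crash count, two normalizations.

Assumption: rates are scaled per 100 ADI and per usage kilohour
(1,000 usage hours). These helpers are illustrative, not Socorro code.
"""


def crashes_per_100_adi(crashes: int, adi: int) -> float:
    """Crash rate normalized by active daily installations (ADI)."""
    if adi <= 0:
        raise ValueError("adi must be positive")
    return 100 * crashes / adi


def crashes_per_khour(crashes: int, usage_hours: float) -> float:
    """Crash rate normalized by usage kilohours (Telemetry-style)."""
    if usage_hours <= 0:
        raise ValueError("usage_hours must be positive")
    return crashes / (usage_hours / 1000)


if __name__ == "__main__":
    # Made-up numbers: the same 500 crashes yield two unrelated rates.
    print(crashes_per_100_adi(500, 250_000))  # 0.2
    print(crashes_per_khour(500, 2_000_000))  # 0.25
```

Roughly speaking, converting one rate into the other would require knowing how many hours an average active installation is used per day, which is why reports normalized one way don't line up with reports normalized the other way.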
[1] I think I heard once that Lonnen likes to print these numbers out on thermal paper and use them as streamers at Mardi Gras in New Orleans every year.
Comment 1 • 8 years ago (Assignee)
First, I want to figure out how hard it'd be to get usage kilohours data into Socorro. That will tell us how viable switching from ADI to usage kilohours is.
Second, I want to make a list of all the things in Socorro that currently use ADI. Then for each thing, we figure out whether we need to keep it and, if we do, whether we can switch it to usage kilohours.
Feel free to chime in on things in Socorro that use ADI in the comments while I figure out the usage kilohours thing.
Updated • 8 years ago (Assignee)
Assignee: nobody → willkg
Status: NEW → ASSIGNED
Comment 2 • 8 years ago
To get the kilohours, you could create a Redash query and read its results from Socorro as JSON.
One problem I see with this is that khours are not yet as timely as ADI. Another problem is that many of our current external tools depend on ADI, so we can't drop ADI from Socorro completely unless we migrate those tools.
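A minimal sketch of the Redash approach from comment 2, assuming Redash's `/api/queries/<id>/results.json` endpoint with a user API key and a saved query whose rows have the hypothetical columns `activity_date` and `usage_khours`. The base URL, query ID, and column names are placeholders, not an existing Mozilla query.

```python
"""Hypothetical sketch: reading usage kilohours from Redash query results.

Assumptions: a saved Redash query exists; its rows have the hypothetical
columns "activity_date" and "usage_khours"; results come from the
/api/queries/<id>/results.json endpoint authenticated with an API key.
"""
import json
import urllib.parse
import urllib.request


def results_url(base_url: str, query_id: int, api_key: str) -> str:
    """Build the URL for a saved query's latest cached results."""
    qs = urllib.parse.urlencode({"api_key": api_key})
    return f"{base_url.rstrip('/')}/api/queries/{query_id}/results.json?{qs}"


def parse_khours(payload: dict) -> dict[str, float]:
    """Map each activity date in the result rows to its usage kilohours."""
    rows = payload["query_result"]["data"]["rows"]
    return {row["activity_date"]: float(row["usage_khours"]) for row in rows}


def fetch_khours(base_url: str, query_id: int, api_key: str) -> dict[str, float]:
    """Fetch and parse the query results (makes a network request)."""
    with urllib.request.urlopen(results_url(base_url, query_id, api_key)) as resp:
        return parse_khours(json.load(resp))
```

Because kilohours for the most recent days keep changing for a while (the timeliness problem above), a consumer would probably need to re-fetch recent dates rather than treat them as final.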
Comment 3 • 8 years ago (Assignee)
Marco: When you say "are dependent on ADI" are you saying that your external tools are pulling ADI data from Socorro? Or are they relying on reports and other things that Socorro generates?
Can you make a list of those external tools or tell me where to find them all?
Comment 4 • 8 years ago
(In reply to Will Kahn-Greene [:willkg] ET needinfo? me from comment #3)
> Marco: When you say "are dependent on ADI" are you saying that your external
> tools are pulling ADI data from Socorro? Or are they relying on reports and
> other things that Socorro generates?
>
> Can you make a list of those external tools or tell me where to find them
> all?
IIRC, they are only pulling ADI from Socorro and not relying on reports and other things that Socorro generates.
I'll try to collect a list.
Flags: needinfo?(mcastelluccio)
Comment 5 • 8 years ago
The https://www.arewestableyet.com/ dashboard combines the two measurements (ADI-based on the left, kusage_hours on the right). As things stand, dropping the ADI-based normalisation would add two more days of lag before you get a semi-reliable crash rate and can spot and react to unexpected events, which I wouldn't look forward to.
Comment 6 • 8 years ago (Assignee)
Philip: Why two days? According to that site, it looks like one day of additional wait.
All: If ADI is required and it's just getting pulled from Socorro, is it possible to get it from another source?
Comment 7 • 8 years ago
If you look at the beta m+c-s graph, for example, the values for 2017-05-31 and 2017-05-30 are still way off, and 2017-05-29 is the most recent somewhat reliable data point (somewhat, because in my experience that rate will continue to get slightly lower when you look again in a couple of days).
With ADIs, on the other hand, I'd already have a rate from yesterday that I can reliably compare with other channels or prior events and draw conclusions from (knowing its limitations, such as not covering content shutdown kills accurately).
Comment 8 • 8 years ago
Comment 9 • 8 years ago
(In reply to Will Kahn-Greene [:willkg] ET needinfo? me from comment #1)
> First, I want to figure out how hard it'd be to get usage kilohours data
> into Socorro. Then we know what kind of option switching from ADI to usage
> kilohours will be.
>
> Second, I want to make a list of all the things in Socorro that currently
> uses ADI. Then for each thing, we figure out whether we have to keep it or
> not, and if we do, can we switch to usage kilohours.
>
Definitely here: https://github.com/mozilla-services/socorro/blob/340d9a43f896a6cfe07f810fc6beab7099ffbc3a/webapp-django/crashstats/crashstats/views.py#L179-L186
That is used by the view at https://github.com/mozilla-services/socorro/blob/340d9a43f896a6cfe07f810fc6beab7099ffbc3a/webapp-django/crashstats/crashstats/views.py#L415 to produce https://crash-stats.mozilla.com/crashes-per-day/?p=Firefox
And here: https://github.com/mozilla-services/socorro/blob/340d9a43f896a6cfe07f810fc6beab7099ffbc3a/webapp-django/crashstats/home/static/home/js/home.js#L165-L177
which comes from https://github.com/mozilla-services/socorro/blob/340d9a43f896a6cfe07f810fc6beab7099ffbc3a/webapp-django/crashstats/home/jinja2/home/home.html#L49 and fetches from https://crash-stats.mozilla.com/api/#ADI to build https://crash-stats.mozilla.com/home/product/Firefox
Comment 10 • 8 years ago
We have https://github.com/mozilla/stab-crashes, http://arewestableyet.com/ and the explosiveness reports such as https://crash-analysis.mozilla.com/release-mgmt/2017-05-29/2017-05-29.firefox.53.explosiveness.html that are pulling ADI from Socorro.
There's also https://github.com/mozilla/platform-health.
Flags: needinfo?(mcastelluccio)
Comment 11 • 8 years ago
For what it's worth, platform-health gets its ADI numbers from https://crash-analysis.mozilla.com/release-mgmt/Firefox-release-bytype.json and https://crash-analysis.mozilla.com/rkaiser/FennecAndroid-release-bytype.json
I think it used to query the Socorro ADI endpoint but doesn't any more. https://github.com/mozilla/platform-health/blob/13800528b1d45be8c5a2d6b2f671b96fb5b7a216/src/crashes.js#L59-L63
Comment 12 • 8 years ago
Comment 13 • 7 years ago (Assignee)
I appreciate all the comments in this bug. Thank you!
I asked around and everything I'm seeing suggests the age of ADI is coming to an end. Magic is beginning to fade, Elves are boarding ships for lands beyond the horizon, and dwarves are delving deeper into the earth. The age of usage kilohours is almost at hand.
One of the big issues with usage kilohours is that it takes a few days for the numbers to become concrete enough to use. As I understand it, this problem should be mostly or fully solved by the end of this quarter, give or take a couple of weeks. Usage kilohours are used exclusively in Mission Control (I don't have a good link to that project) and in new dashboards elsewhere.
So, what's with the urgency? In our ongoing efforts to make Socorro easier to maintain and more like other Mozilla systems, Socorro is being moved from one infrastructure to a new one managed by the webops team. There is a much-neglected box filled with venomous spiders that sits in SCL3 which feeds Socorro ADI data. This box isn't maintained and doesn't work with the new infrastructure. Most of us don't even have access to it (and as near as I can tell, no one knows who has access to give us access). Bug #1365665 covers figuring out what to do about that box in the context of switching infrastructures. It sure would be a lot easier if Socorro wasn't involved in ADI at all.
So, in summary: the age of ADI is coming to an end, the big issue with usage kilohours should be alleviated or fixed by the end of the quarter, and it behooves Socorro to drop that weirdo box in SCL3 so we can continue with the infrastructure migration.
I'm putting this bug on hold until we find out what happens with usage kilohours. I'll pick it back up in July at which point we can see where usage kilohours is at, come back to this project, and figure out what the plan is.
Summary: [tracker] drop all things ADI → [tracker] drop all things ADI [on hold till July]
Comment 14 • 7 years ago
I am still relying on ADI to give me a quick (daily) idea of how quickly users are updating. The data is available the next morning, while Telemetry takes several days. It is extremely useful to me. Is there some other way we can see this data?
Updated • 7 years ago (Assignee)
Comment 15 • 7 years ago (Assignee)
Updating the summary to reflect the status of this work.
Tagging Lonnen in since he knows more about the status of usage kilohours.
Flags: needinfo?(chris.lonnen)
Summary: [tracker] drop all things ADI [on hold till July] → [tracker] drop all things ADI [on hold till September]
Comment 16 • 7 years ago
I'm still figuring out a transition path here. Mission Control is progressing but not yet ready or available as a replacement.
Flags: needinfo?(chris.lonnen)
Comment 17 • 7 years ago
We will make plans, but don't want to disrupt critical workflows during the 57 push.
Summary: [tracker] drop all things ADI [on hold till September] → [tracker] drop all things ADI [on hold till November]
Updated • 7 years ago
Summary: [tracker] drop all things ADI [on hold till November] → [tracker] drop all things ADI
Comment 18 • 7 years ago (Assignee)
Unassigning myself from bugs I'm not immediately working on and/or have some meaningful progress on.
Assignee: willkg → nobody
Status: ASSIGNED → NEW
Comment 19 • 7 years ago
We can proceed on May 28. That should be enough time to conclude the work before EOQ and late enough for Mission Control to close out the remaining blockers (https://github.com/mozilla/missioncontrol/milestone/1) and even have a few weeks of testing.
Summary: [tracker] drop all things ADI → [may 28][tracker] drop all things ADI
Comment 20 • 6 years ago
The time is nearly here. Mission Control has taken over as the source of truth for rate-related queries. I've opened: https://github.com/mozilla-services/socorro/pull/4512
There will be follow up actions, including a migration and killing the old box in SCL3.
Summary: [may 28][tracker] drop all things ADI → [imminent][tracker] drop all things ADI
Comment 21 • 6 years ago
To be clear: this will remove ADI from Crash Stats and our API. Dataviz and moz data collective will still have the number.
Comment 22 • 6 years ago (Assignee)
We're going to do the removal work in this bug rather than spin off other bugs for it.
Making this a P2 and grabbing it. Lonnen already did the bulk of the work; I'll finish it up.
Assignee: nobody → willkg
Status: NEW → ASSIGNED
Priority: -- → P2
Summary: [imminent][tracker] drop all things ADI → [imminent] drop all things ADI
Comment 23 • 6 years ago
Commits pushed to master at https://github.com/mozilla-services/socorro
https://github.com/mozilla-services/socorro/commit/6c7f84f38608e7e2d9c959f6116d3ea35606fbfb
bug 1369498: remove ADI code - WIP
It does not (yet) have the necessary migrations.
follow up actions:
* remove or shutdown the VM running hive externally
https://github.com/mozilla-services/socorro/commit/3e9d5f9cf1c28619422cb012e532e776c4cc8b99
fix bug 1369498: remove adi-related tables and stored procedures
https://github.com/mozilla-services/socorro/commit/1d43d90fea6de28e21bdaf63b704c010cc77ee9e
Merge pull request #4512 from mozilla-services/1369498-remove-ADI-from-public-ADI
fix bug 1369498, 1477038: remove ADI
Updated • 6 years ago
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Comment 24 • 6 years ago (Assignee)
Reopening this to cover the work we need to do until this is in prod.
Current status:
1. Code has landed and is on -stage.
2. crontabber is broken. The infra PR has landed and I'm waiting on a stage deploy. Once that happens, I'll check crontabber again.
3. We need to let this code bake on -stage for at least a day or two and I want to go through and make sure everything is working.
Once all that happens, then we can push this to prod. Once we're happy with that, then we can remove the old stage and prod databases and clean up the box in SCL3 and all that.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 25 • 6 years ago (Assignee)
This is all done now. Marking as FIXED.
Status: REOPENED → RESOLVED
Closed: 6 years ago → 6 years ago
Resolution: --- → FIXED