Closed
Bug 1257531
Opened 9 years ago
Closed 6 years ago
[tracker] Stop saving crash data to postgresql
Categories
(Socorro :: General, task)
Socorro
General
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: adrian, Assigned: willkg)
References
Details
Now that bug 1252461 has landed, we can change the configuration to stop saving the processed crashes in postgres.
Reporter | ||
Comment 1•9 years ago
|
||
Configuration change:
> [centos@ip-172-31-10-19 ~]$ consulate kv get socorro/processor/destination.storage_classes
> socorro.external.postgresql.crashstorage.PostgreSQLCrashStorage, socorro.external.es.crashstorage.ESCrashStorage, socorro.external.boto.crashstorage.BotoS3CrashStorage, socorro.external.statsd.crashstorage.StatsdCrashStorage
> [centos@ip-172-31-10-19 ~]$ consulate kv set socorro/processor/destination.storage_classes 'socorro.external.postgresql.crashstorage.PostgreSQLBasicCrashStorage, socorro.external.es.crashstorage.ESCrashStorage, socorro.external.boto.crashstorage.BotoS3CrashStorage, socorro.external.statsd.crashstorage.StatsdCrashStorage'
r? anyone
Comment 2•9 years ago
|
||
this is not complete.
while this will set up the defaults for the destination PolyCrashStore correctly, there is another key that will override it back to the original value. In each of the separate numbered crashstores within the PolyCrashStore, we override the default with "socorro.external.statsd.statsd_base.StatsdBenchmarkingWrapper". Then we give the original intended crashstorage class to that class.
Look for:
destination.storage0.wrapped_crashstore
its current value is:
socorro.external.postgresql.crashstorage.PostgreSQLCrashStorage
change it to:
socorro.external.postgresql.crashstorage.PostgreSQLBasicCrashStorage
If I were to start over with the configuration, and knowing that the benchmarking wrappers are permanent rather than temporary, I would have set the key "socorro/processor/destination.storage_classes" to be use the "socorro.external.statsd.statsd_base.StatsdBenchmarkingWrapper" 3 times. That way there would be less ambiguity that lead you to your proposal.
Reporter | ||
Comment 3•9 years ago
|
||
Thanks Lars! We clarified on IRC, the key is actually `wrapped_object_class` instead of `wrapped_crashstore` (which is legacy).
Verification:
> [centos@ip-172-31-10-19 ~]$ consulate kv get socorro/processor/destination.storage0.wrapped_object_class
> socorro.external.postgresql.crashstorage.PostgreSQLCrashStorage
New proposal:
> $ consulate kv set socorro/processor/destination.storage_classes 'socorro.external.postgresql.crashstorage.PostgreSQLBasicCrashStorage, socorro.external.es.crashstorage.ESCrashStorage, socorro.external.boto.crashstorage.BotoS3CrashStorage, socorro.external.statsd.crashstorage.StatsdCrashStorage'
> $ consulate kv set socorro/processor/destination.storage0.wrapped_object_class 'socorro.external.postgresql.crashstorage.PostgreSQLBasicCrashStorage'
Reporter | ||
Comment 4•9 years ago
|
||
I have just applied those changes to stage. I'm watching datadog and logs for the next 30 minutes, after what I'll be gone for the weekend. If things go wrong, here is the config rollback:
consulate kv set socorro/processor/destination.storage_classes 'socorro.external.postgresql.crashstorage.PostgreSQLCrashStorage, socorro.external.es.crashstorage.ESCrashStorage, socorro.external.boto.crashstorage.BotoS3CrashStorage, socorro.external.statsd.crashstorage.StatsdCrashStorage'
consulate kv set socorro/processor/destination.storage0.wrapped_object_class 'socorro.external.postgresql.crashstorage.PostgreSQLCrashStorage'
Comment 5•9 years ago
|
||
Red Alert. We need to undo this. Turns out that the current correlations scripts, which we run on crash-analysis (the old dbaron scripts) RELY on processed_crashes (joined with reports). See
https://github.com/mozilla/socorro/blob/f9a84eb267c457c04adbec7891b7e4039f2da857/scripts/crons/cron_libraries.sh#L56
https://github.com/mozilla/socorro/blob/f9a84eb267c457c04adbec7891b7e4039f2da857/scripts/crons/cron_libraries.sh#L81
Basically, it does a join on `processed_crashes` AND `reports_$DATE`.
To quote Lars (we noticed this today whilst vidyo'ing), "That's what happens when you have too many moving parts."
I think we can leave the source code as is. Just need to make sure our config is set to to continue to write processed crashes to PG.
Comment 6•9 years ago
|
||
Once we've rewritten how correlations are aggregated and accumulated we should ideally not use PG at all. At all! (since we can get product versions from /api/ProductVersions/).
Lars first unfinished prototype of correlations_app.py (which I'm taking over slowly) relies on PG (for UUIDs) and S3 (for processed crash JSON blobs). To start with, we'll get UUIDs out of ES (SuperSearch maybe). Maybe we'll use ES for the processed crashes too and not even involve S3.
Reporter | ||
Comment 7•9 years ago
|
||
I have rolled back the configuration on stage, applying the commands I wrote in comment #4.
Reporter | ||
Comment 8•8 years ago
|
||
Peter, what's the status here? Do we still rely on the old correlation scripts?
Comment 9•8 years ago
|
||
(In reply to Adrian Gaudebert [:adrian] from comment #8)
> Peter, what's the status here? Do we still rely on the old correlation
> scripts?
I don't know. I'm certainly not working on it. For the moment we're still relying on pulling down the correlations from crash-analysis with http so they're not even stored in Postgres.
Comment 10•8 years ago
|
||
If cron_libraries.sh is still running on crash-analysis, then we still rely on this. There are three lines in there that rely on direct postgres access. Replace those with ES backed API calls, update crash-analysis, and we can go ahead with this.
Comment 11•8 years ago
|
||
Shouldn't be possible to replace with some python and requests.get()
https://github.com/mozilla/socorro/blob/394423bbd7da9445b341ef72be60ed602f6367e8/scripts/crons/cron_libraries.sh#L56
Updated•8 years ago
|
Component: Infra → General
Assignee | ||
Comment 12•8 years ago
|
||
What's the status of this bug? Do we still use the old correlation scripts mentioned in comment #6?
Comment 13•8 years ago
|
||
It's messy.
1) https://github.com/mozilla/socorro/blob/master/scripts/crons/cron_libraries.sh is still in git. It still depends on talking directly to postgres to query the `processed_crashes` table.
2) The cron job on crash-analysis is still going. Nobody updates as far as I know. And they shouldn't have a reason to either now since figuring out what the featured versions are is automated. Correlations are still generated daily. E.g. https://crash-analysis.mozilla.com/crash_analysis/20170427/
3) The webapp does NOT depend on crash-analysis.m.c. We used to pull on data from crash-analysis.m.o by requests.get() but we longer do. (:marco replaced this will a query to telemetry)
4) If we re-do the configuration so that our crashstorage stops saving processed crashes in PG, then the output on crash-analysis.m.o will cease. The socorro team certainly "doesn't care" since we neither use or depend on it for anything else in socorro. But other stability people might care.
Rough outline of a plan
-----------------------
1. Ask once and twice the stability list if people are dependent of the output on https://crash-analysis.mozilla.com/crash_analysis/*
2. If someone says they are; Ask :marco how we can guide these people away from it and start relying on Telemetry.
3. If nobody says they are;
3.1. make the change (again) mentioned in https://bugzilla.mozilla.org/show_bug.cgi?id=1257531#c4 (and on prod too)
3.2 remove cron_libraries.sh from our code
3.3 talk to calixte if they still need the crash-analysis server at all and if we can recyle it
:marco, :calixte Do you know of anybody using the old crash-analysis.m.c?
Adrian, did I get this right?
Who wants to do the comms to find out if we can shut this down?
Flags: needinfo?(mcastelluccio)
Flags: needinfo?(cdenizet)
Comment 14•8 years ago
|
||
:peterbe, I'm currently removing the stuff we've on c-a.m.c.
I think Marcia continues to use https://crash-analysis.mozilla.com/release-mgmt/0000.overview.html (Latest Daily Reports) but I'm rewriting this stuff and will put it on Heroku.
I'll ping you when everything will be finished (I hope very soonly).
I already filled a bug for that purpose: https://bugzilla.mozilla.org/show_bug.cgi?id=1315007
Flags: needinfo?(cdenizet)
Comment 15•8 years ago
|
||
(In reply to Calixte Denizet (:calixte) from comment #14)
> :peterbe, I'm currently removing the stuff we've on c-a.m.c.
> I think Marcia continues to use
> https://crash-analysis.mozilla.com/release-mgmt/0000.overview.html (Latest
> Daily Reports) but I'm rewriting this stuff and will put it on Heroku.
> I'll ping you when everything will be finished (I hope very soonly).
> I already filled a bug for that purpose:
> https://bugzilla.mozilla.org/show_bug.cgi?id=1315007
That's nice but not answering the question :)
Do you know if anybody uses https://crash-analysis.mozilla.com/crash_analysis/
(Not; Do you use crash-analysis.mozilla.com)
There's a cron job that generates these .txt and .txt.gz files. We *used* download those from the Socorro webapp and display their content as part of the signature report and/or the report index pages. We don't any more. The question is; do you know anybody else who depends on those .txt (and .txt.gz) files at all?
Comment 16•8 years ago
|
||
:peterbe, sorry I missed that...
As far as I know, nobody use this. Maybe you should check the server logs to see if someone download this stuff.
Comment 17•8 years ago
|
||
The only thing I know of is DLL correlations with versions (see bug 1361396).
Flags: needinfo?(mcastelluccio)
Comment 18•7 years ago
|
||
(In reply to Peter Bengtsson [:peterbe] from comment #15)
>
> There's a cron job that generates these .txt and .txt.gz files. We *used*
> download those from the Socorro webapp and display their content as part of
> the signature report and/or the report index pages. We don't any more. The
> question is; do you know anybody else who depends on those .txt (and
> .txt.gz) files at all?
We at Beijing office scraps these reports to find out if any of our extensions have unusual high correlation with high volumn crashes, it helped a few times in the past.
We just noticed we've been gotten empty results since 5 May, and it turns out since that day only the first character of the extension id was left. [1][2]
[1]: https://crash-analysis.mozilla.com/crash_analysis/20170504/20170504_Firefox_52.0.2-interesting-addons.txt
[2]: https://crash-analysis.mozilla.com/crash_analysis/20170505/20170505_Firefox_52.0.2-interesting-addons.txt
Assignee | ||
Comment 19•7 years ago
|
||
Hector: Can you write up a new bug for that? That's a different (but related) issue.
Reporter | ||
Comment 20•7 years ago
|
||
Hector, that problem very likely comes from us changing the format of the addons field. See bug 1250132 for context.
Comment 21•7 years ago
|
||
(In reply to Will Kahn-Greene [:willkg] ET needinfo? me from comment #19)
> Hector: Can you write up a new bug for that? That's a different (but
> related) issue.
Filed as bug 1371574.
I'm also interested in knowing what can I do after bug 1361396.
Reporter | ||
Updated•7 years ago
|
Assignee: adrian → nobody
Reporter | ||
Updated•7 years ago
|
Assignee | ||
Updated•7 years ago
|
Summary: Stop saving processed crash in PostgreSQL → [tracker] Stop saving processed crash in PostgreSQL
Assignee | ||
Comment 23•7 years ago
|
||
Adjusting the summary a bit since the postgresql crash storage saves raw and processed crash data to a few different places.
Summary: [tracker] Stop saving processed crash in PostgreSQL → [tracker] Stop saving crash data to postgresql
Assignee | ||
Comment 24•6 years ago
|
||
The last changes removing all the bits from the db related to crashstorage landed today. We're all done here!
Assignee: nobody → willkg
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Assignee | ||
Comment 25•6 years ago
|
||
We pushed it all out today in Socorro tag 319. That's the end of that.
You need to log in
before you can comment on or make changes to this bug.
Description
•