Closed Bug 1375434 Opened 7 years ago Closed 6 years ago

Write a cron job to populate the `signatures` table from Super Search data

Categories

(Socorro :: Backend, task, P2)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: adrian, Assigned: willkg)

References

Details

Attachments

(3 files)

Feature URL
-----------

https://crash-stats.mozilla.com/topcrashers/?product=Firefox&version=54.0&days=7

Parts impacted
--------------

 * socorro, cron.jobs

Rationals
---------

 * the `signatures` table is currently populated by the `update_reports_clean` stored procedure, and we want to remove that (but we want to keep the content of the `signatures` table)
 * Elasticsearch, via Super Search, should be the only source of crash data and aggregations
Why do we need the signatures table?

I suspect the topcrashers page needs it. Perhaps it's because of the first_seen column. Is that the only reason to have that table?
Flags: needinfo?(adrian)
Yes, that's the table we use for the SignatureFirstDate service, which is used by topcrashers. I don't remember what else uses it. I was under the impression it was a generally useful service for our users, but you might want to challenge that assumption.
Flags: needinfo?(adrian)
The table looks like this:;

breakpad=> \d signatures
                                          Table "public.signatures"
    Column    |           Type           |                             Modifiers
--------------+--------------------------+-------------------------------------------------------------------
 signature_id | integer                  | not null default nextval('signatures_signature_id_seq'::regclass)
 signature    | text                     |
 first_report | timestamp with time zone |
 first_build  | numeric                  |
Assignee: nobody → peterbe
https://github.com/mozilla-services/socorro/pull/3984 is quite big and non-trivial. But I'm confident it's on the right path. 

We could solve this by a processor rule that has write access to postgres (adrian's idea) but it's a bit sad. The processor needs to be simple and have as few side-effects as possible. Not needing postgres would also mean we can cease to worry about making the database connection etc. Also, this feature (signatures' first date) does feel like something that belongs more towards the web app side of things. Granted, crontabber isn't the webapp but it just generally feels better of there as it's massaging the data. 

The PR needs unit tests and I don't think know the best way to test it but what I had in mind is that we install it on -stage admin's crontabber job list. Then we watch the logs to see that it's spotting signatures and it should also mention that it's not inserting them (because the stored procedure has already done it). But if you manually delete some signatures from the list (of newish ones) and wait, this crontabber app should populate it.
See https://github.com/mozilla-services/socorro/pull/3984#issuecomment-329838697 for why I'm un-assigning this from myself.
Assignee: peterbe → nobody
One thing I was thinking about is that maybe we should think about this as if it were a data flow. We have crashes flowing from Antenna -> Processor => various crash storage systems and by crash id already.

Maybe we should create a flow for signatures? Every time the processor computes a signature, it tosses it in a flow of signatures and we have a consumer pull those off and update Postgres.

We've got pieces of this in place already. We could use RabbitMQ which the processor already uses and write a RabbitMQ crash storage that tosses signatures into a queue. Then we also write a microservice component that consumes from the RabbitMQ queue and updates Postgres.

Seems like this would have similar properties as adding a processor rule. I wonder if there are other problems we could/should solve similarly.
(In reply to Will Kahn-Greene [:willkg] ET needinfo? me from comment #6)
> One thing I was thinking about is that maybe we should think about this as
> if it were a data flow. We have crashes flowing from Antenna -> Processor =>
> various crash storage systems and by crash id already.
> 
> Maybe we should create a flow for signatures? Every time the processor
> computes a signature, it tosses it in a flow of signatures and we have a
> consumer pull those off and update Postgres.
> 
> We've got pieces of this in place already. We could use RabbitMQ which the
> processor already uses and write a RabbitMQ crash storage that tosses
> signatures into a queue. Then we also write a microservice component that
> consumes from the RabbitMQ queue and updates Postgres.
> 
> Seems like this would have similar properties as adding a processor rule. I
> wonder if there are other problems we could/should solve similarly.

I like that a lot more than a non-trivial crontabber app that spans ES and PG. 
However, it might be worth "making a deal". Switch to this app nowish, then you can much more confidently switch off storing anything related to crashes in PG and disable it from the PolyCrashStorage. Once that's in place it'll be easier to innovate on new cool Jansky solutions. In other words, a cryptic "side-step" just to be able to get rid of PG based crashstorage and then add it back in but in a better way.
I can't think of a reason why we couldn't do this now--I don't think we need to wait for Jansky. I think we can implement this and throw it in the mix and it'll work just fine with the existing stuff. Then once we're happy with it, we can remove the existing stuff.
Grabbing this to do now.
Assignee: nobody → willkg
Status: NEW → ASSIGNED
Unassigning myself from bugs I'm not immediately working on and/or have some meaningful progress on.
Assignee: willkg → nobody
Status: ASSIGNED → NEW
Bumping this up to P2 to get done soon.
Priority: -- → P2
Commits pushed to master at https://github.com/mozilla-services/socorro

https://github.com/mozilla-services/socorro/commit/2e477debca41a59270658abb0e2b7c3547326a3e
fix bug 1375434: redo first signature maintenance

This creates a new cron job that looks at data in Elasticsearch and updates
the first_build and first_report data in the signatures table accordingly.

This job is set up to run hourly and look at a 90 minute window. That'll
help it not miss things. It also keeps the Elasticsearch queries small enough
that it can work through them quickly.

This job is set up to be resilient. If it dies in the middle--no big deal.
If there are two running at the same time--no big deal. We get enough crash
data that if we miss some things--no big deal.

https://github.com/mozilla-services/socorro/commit/284a0ef5a1311855f7305237a5fcd22b920de527
Merge pull request #4434 from willkg/1375434-signatures

fix bug 1375434: redo first signature maintenance
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
We landed it, it deployed to stage, and it seems to be erroring out:

https://sentry.prod.mozaws.net/operations/socorro-new-stage/issues/4313552/

Reopening to look into that.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Assignee: nobody → willkg
Status: REOPENED → ASSIGNED
Commits pushed to master at https://github.com/mozilla-services/socorro

https://github.com/mozilla-services/socorro/commit/dd7ef21b273be0f178393fc4b9515e18603ef939
fix bug 1375434: fix build_id storage

build_id is stored as a numeric in the db, so we need to convert it
before saving it.

https://github.com/mozilla-services/socorro/commit/4c21df68b4f76f263d3f14c282f0b4107d631837
Merge pull request #4438 from willkg/1375434-fix-update-signatures

fix bug 1375434: fix build_id storage
Status: ASSIGNED → RESOLVED
Closed: 6 years ago6 years ago
Resolution: --- → FIXED
I messed up. The problem wasn't that it was coercing types but rather that the build id was the empty string. I'll write some code to deal with that since not all crashes have a build id.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Commits pushed to master at https://github.com/mozilla-services/socorro

https://github.com/mozilla-services/socorro/commit/ddb0452680f99cda2dbfc1b1fcd2fc7b0fb65619
fix bug 1375434: handle crashes with no build id

Some crashes don't have a build id which is why they don't "intify" well.
This skips those because we don't consider them when figuring out
first build and first date.

https://github.com/mozilla-services/socorro/commit/51370966b5a54544547999031579f91490b07ed5
Merge pull request #4439 from willkg/1375434-empty-buildid

fix bug 1375434: handle crashes with no build id
Status: REOPENED → RESOLVED
Closed: 6 years ago6 years ago
Resolution: --- → FIXED
Everything is on -stage right now. I checked the crontabber logs and the job is running and updating/inserting signatures. I think we're good. Hooray!
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: