Closed Bug 1528243 Opened 6 years ago Closed 6 years ago

write script to compare raw crashes to processed crashes for a day

Categories

(Socorro :: General, task, P2)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: willkg, Assigned: willkg)

Attachments

(1 file)

A while back, we had an outage with RabbitMQ and we spent some time verifying that incoming crashes were processed. That was a slightly involved operation between Brian and me.

I predict that we're going to want to do this sort of thing again. I think it's worth having a script that we can run against the crash reports bucket. It would build a list of all the incoming crashes for a given day and generate a list of crash ids that were collected but not processed.
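The comparison described above boils down to a set difference between the crash ids found in raw storage and the crash ids found in processed storage. Here is a minimal sketch in Python, assuming the crash id is the last component of each bucket key. The function names `crash_id_from_key` and `find_unprocessed`, the key layouts, and the crash ids in the sample data are hypothetical and are not taken from the script that landed.

```python
from typing import Iterable, List, Set


def crash_id_from_key(key: str) -> str:
    """Return the last path component of a storage key.

    Assumption: the crash id is the final component of the key.
    """
    return key.rstrip("/").rsplit("/", 1)[-1]


def find_unprocessed(
    raw_keys: Iterable[str], processed_keys: Iterable[str]
) -> List[str]:
    """Return sorted crash ids that have a raw crash but no processed crash."""
    raw_ids: Set[str] = {crash_id_from_key(key) for key in raw_keys}
    processed_ids: Set[str] = {crash_id_from_key(key) for key in processed_keys}
    return sorted(raw_ids - processed_ids)


# Illustrative data only: made-up key layout and made-up crash ids.
sample_raw_keys = [
    "raw_crash/20190310/00000000-0000-0000-0000-000000190310",
    "raw_crash/20190310/11111111-1111-1111-1111-111111190310",
]
sample_processed_keys = [
    "processed_crash/00000000-0000-0000-0000-000000190310",
]

for crash_id in find_unprocessed(sample_raw_keys, sample_processed_keys):
    print("collected but not processed:", crash_id)
```

In a real run, the two key lists would come from listing the bucket for the day in question. That listing step is left out here because it depends on the storage layout.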

If we made it a crontabber job, it could additionally send those crashes in for processing.

This bug covers that work.

Making this a P2. It'd be handy to have this script before we try to transition to a different queuing system.

Blocks: 1518281
Priority: -- → P2

I'm mostly done except for the "what do we do with missing crashes?" part. I was thinking the script would send them to the reprocessing queue, but there's no visibility into that. If the crash fails to process again, we'd never know.

Instead, I'm going to toss the crash ids in a Django-managed db. Then we have a log of them. We can reprocess, we can look into issues, etc.

willkg merged PR #4840: "fix bug 1528243: verify crashes are processed" in 9a8f182.

Going to wait for this to deploy to stage and verify it.

Depends on: 1534055

It deployed to stage and had permissions problems. Bug #1534055 covered fixing those. In the process of looking at that, we discovered that the multiprocessing code needs to be wrapped in Sentry error handling. That's covered in bug #1534402.

After the permissions issues were fixed, I checked the logs and we had this:

2019-03-11T19:42:18.854136 INFO socorro.cron.base.VerifyProcessedCronApp: All crashes for 20190310 were processed.

I checked the table in the Django admin and there aren't any missing crashes. I'll check that periodically for the rest of the week and we can see what happens.

But it looks like it's running.

Depends on: 1534402

I fixed error handling (I think). I also tweaked the number of workers and set it to 20. It runs much faster now.
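For illustration, here is a rough sketch of fanning the per-crash check out across 20 workers while keeping failures visible instead of losing them inside a worker. It uses a thread pool from `concurrent.futures` rather than the multiprocessing code mentioned in this bug, and it collects exceptions in a list rather than reporting them to Sentry. The names `check_crash_ids` and `is_processed` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Iterable, List, Tuple


def check_crash_ids(
    crash_ids: Iterable[str],
    is_processed: Callable[[str], bool],
    max_workers: int = 20,
) -> Tuple[List[str], List[Tuple[str, Exception]]]:
    """Check crash ids concurrently.

    Returns (missing, errors): crash ids for which is_processed returned
    False, and (crash_id, exception) pairs for checks that raised.
    """
    missing: List[str] = []
    errors: List[Tuple[str, Exception]] = []
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Map each future back to its crash id so results can be attributed.
        futures = {executor.submit(is_processed, cid): cid for cid in crash_ids}
        for future in as_completed(futures):
            crash_id = futures[future]
            try:
                if not future.result():
                    missing.append(crash_id)
            except Exception as exc:
                # Record the failure rather than letting it vanish in a worker.
                errors.append((crash_id, exc))
    return sorted(missing), sorted(errors, key=lambda item: item[0])
```

The `is_processed` callable is where a storage lookup would go. Returning the errors alongside the missing ids lets the caller decide how to report them.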

I'll check the Django table periodically to see how things behave. Depending on how that goes, we'll figure out what we should do when/if there are missing processed crashes.

This is in production. I tweaked the cron jobs table and had it run for all the days since January 1st. It's working super. Takes about 150 seconds to do a single day.

Marking as FIXED.

Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED