Closed Bug 1403248 Opened 8 years ago Closed 8 years ago

implement reprocess script

Categories

(Socorro :: General, task, P2)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: willkg, Unassigned)

Details

Attachments

(1 file)

We should be able to trivially reprocess batches of crashes and produce a log of what happened. Currently, the webapp interface lets you reprocess a single crash. There's an API endpoint that lets you reprocess a group of crashes, too. Adrian wrote a reprocessor CLI: https://github.com/adngdb/reprocess Peter wrote one, too: https://github.com/peterbe/reprocess-supersearch Both of those deal with creating lists of crashes to reprocess and then submitting them for reprocessing. Neither of them produces a log of what actually got reprocessed. Neither of them is resilient to problems: if the script crashes, you have to start over. That's totally fine for yesterday's needs, but today we need a reprocessing script that is flexible, works with our local development environment, and is a first-class citizen in the Socorro tool chest.
I think it should work like the other scripts: it takes a list of crash ids via stdin or args and reprocesses those. Given that we're doing reprocessing with large batches and it's easier to manipulate batches with include/exclude files, it probably behooves us to also support multiple "include the crashes in this file" and "exclude the crashes in this file" arguments. It should have a mode where it prints to stdout the crash ids that were sent for reprocessing. In this way, we have a super flexible script that works with fetch_crashids.py, is resilient to crashing, and provides a list of what actually got reprocessed. I know I would have loved to have had that when reprocessing devedition crashes last night. I'm making this a P2. With 57 coming, it behooves us to have a solid story for reprocessing large groups of things.
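To make the proposal above concrete, here is a minimal sketch of the include/exclude and "log what was sent" behavior. It is not the script that eventually landed; the function names (`read_ids`, `select_ids`, `reprocess`), the `submit` callback, and the chunk size are all hypothetical. The idea is that the stdout log doubles as an exclude file, so a crashed run can be resumed without resubmitting anything.

```python
import sys
from typing import Callable, Iterable, Iterator, List, Set, TextIO


def read_ids(lines: Iterable[str]) -> Iterator[str]:
    """Yield crash ids, one per line, skipping blank lines and # comments."""
    for line in lines:
        line = line.strip()
        if line and not line.startswith("#"):
            yield line


def select_ids(candidates: Iterable[str], exclude: Set[str]) -> List[str]:
    """Return candidates in order, minus excluded ids and duplicates."""
    seen = set(exclude)
    selected = []
    for crash_id in candidates:
        if crash_id not in seen:
            seen.add(crash_id)
            selected.append(crash_id)
    return selected


def reprocess(
    crash_ids: List[str],
    submit: Callable[[List[str]], None],
    out: TextIO = sys.stdout,
    chunk_size: int = 50,
) -> int:
    """Submit ids in chunks; log each id to `out` only after its chunk is sent.

    `submit` is a hypothetical callback that sends one chunk to the server
    and raises on failure, so ids from a failed chunk are never logged.
    """
    sent = 0
    for start in range(0, len(crash_ids), chunk_size):
        chunk = crash_ids[start:start + chunk_size]
        submit(chunk)
        for crash_id in chunk:
            print(crash_id, file=out)
        out.flush()  # keep the log current in case the next chunk fails
        sent += len(chunk)
    return sent
```

If a run dies partway through, passing the saved stdout log back in as an exclude set on the next run skips everything that was already submitted.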
Priority: -- → P2
At the risk of stating the obvious... Eons ago we had a good idea, but it involved quite a lot of work, so it stalled into nothingness. The ideal tool would combine a supersearch query with scan and scroll, e.g. a button (if you have the permission) next to the supersearch UI results output. Scan and scroll has been available all the way back to version 1 of ES. [0] If it's tied to the supersearch web UI, it'd be possible to implement "in pure Python" without needing to complicate things with /api/SuperSearch/ and /api/Reprocessing/ [0] https://www.elastic.co/guide/en/elasticsearch/guide/1.x/scan-scroll.html
Regardless of whether we have a web UI for it, I want to be able to reprocess from the command line. Right now we have almost everything I need except a reprocess.py script, but I think that should be pretty straightforward. It's essentially what you and Adrian wrote, except it pulls crash ids from stdin or command line args. This doesn't preclude us from doing a web UI or an API endpoint that uses the Elasticsearch scan and scroll, or other options, at some later point.
Commits pushed to master at https://github.com/mozilla-services/socorro
https://github.com/mozilla-services/socorro/commit/52ae8790e1578cedb628ea0d01d8fe5484d65859
fixes bug 1403248 - add reprocess.py script
This adds a reprocess script that works with the other scripts like fetch_crashids.py and sends specified crash ids to a server environment for reprocessing. This lets us reprocess individual crashes as well as groups of crashes that match criteria specified with a Super Search.
https://github.com/mozilla-services/socorro/commit/35bb4d8837cf0a42985fe52cdb4119e454886b58
Merge pull request #4031 from willkg/1403248-reprocess
fixes bug 1403248 - add reprocess.py script
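For readers who have not seen the script, sending a batch of crash ids to a server environment might look roughly like the sketch below. It only builds the HTTP request and does not send it. The /api/Reprocessing/ path comes from the discussion above; the `crash_ids` form field, the `Auth-Token` header, and the function name `build_reprocess_request` are assumptions made for illustration, and are not taken from the commit.

```python
import urllib.parse
import urllib.request
from typing import Iterable


def build_reprocess_request(
    host: str, crash_ids: Iterable[str], api_token: str
) -> urllib.request.Request:
    """Build (but do not send) a POST asking the server to reprocess crash ids.

    Assumptions: the endpoint accepts a repeated `crash_ids` form field and
    authenticates with an `Auth-Token` header.
    """
    url = host.rstrip("/") + "/api/Reprocessing/"
    # doseq=True encodes the list as crash_ids=a&crash_ids=b
    body = urllib.parse.urlencode(
        {"crash_ids": list(crash_ids)}, doseq=True
    ).encode("ascii")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Auth-Token": api_token},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would perform the actual submission. Keeping request construction separate from sending makes it easy to dry-run against a local development environment.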
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED