Closed
Bug 1403248
Opened 8 years ago
Closed 8 years ago
implement reprocess script
Categories
(Socorro :: General, task, P2)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: willkg, Unassigned)
Details
Attachments
(1 file)
We should be able to trivially reprocess batches of crashes and produce a log of what happened. Currently, the webapp interface lets you reprocess a single crash. There's an API endpoint that lets you reprocess a group of crashes, too.
Adrian wrote a reprocessor CLI:
https://github.com/adngdb/reprocess
Peter wrote one, too:
https://github.com/peterbe/reprocess-supersearch
Both of those deal with creating lists of crashes to reprocess and then submitting them for reprocessing. Neither of them produces a log of what actually got reprocessed. Neither of them is resilient to problems--if the script crashes, then you have to start over.
That's totally fine for yesterday's needs, but today, we need a reprocessing script that is flexible, works with our local development environment, and is a first-class citizen in the Socorro tool chest.
Comment 1•8 years ago (Reporter)
I think it should work like the other scripts: takes a list of crash ids via stdin or args and reprocesses those.
Given that we're doing reprocessing with large batches and it's easier to manipulate batches with include/exclude files, it probably behooves us to also support multiple "include the crashes in this file" and "exclude the crashes in this file" arguments. It should have a mode where it spits out to stdout crashids that were sent for reprocessing. In this way, we have a super flexible script that works with fetch_crashids.py and is resilient to crashing and provides a list of what actually got reprocessed.
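A minimal sketch of the selection logic described above, assuming one crash id per line in every input. The names `read_crashids` and `select_crashids` and the `#` comment convention are hypothetical, not taken from the Socorro codebase.

```python
def read_crashids(lines):
    """Yield crash ids from an iterable of lines, skipping blanks and '#' comments."""
    for line in lines:
        line = line.strip()
        if line and not line.startswith("#"):
            yield line


def select_crashids(args, stdin_lines=(), include_files=(), exclude_files=()):
    """Yield each crash id to reprocess once, in first-seen order.

    Ids come from command-line args, then stdin, then every include file.
    Any id listed in an exclude file is dropped.
    """
    # Load all excluded ids up front so lookups are cheap.
    excluded = set()
    for path in exclude_files:
        with open(path) as fp:
            excluded.update(read_crashids(fp))

    def candidates():
        yield from read_crashids(args)
        yield from read_crashids(stdin_lines)
        for path in include_files:
            with open(path) as fp:
                yield from read_crashids(fp)

    seen = set()
    for crashid in candidates():
        if crashid in excluded or crashid in seen:
            continue
        seen.add(crashid)
        yield crashid
```

With this shape, the stdout of an interrupted run can be saved to a file and passed back as an exclude file, so the next run skips whatever was already sent.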
I know I would have loved to have had that when reprocessing devedition crashes last night.
I'm making this a P2. With 57 coming, it behooves us to have a solid story for reprocessing large groups of things.
Priority: -- → P2
Comment 2•8 years ago
At the risk of stating the obvious...
Eons ago we had a good idea but it involved quite a lot of work so it stalled into nothingness.
The ideal tool would combine a supersearch query with scroll, e.g. a button (if you have the permission) next to the supersearch UI results output.
Scan and scroll has been available all the way back to version 1 of ES. [0]
If it's tied to the supersearch web UI, it'd be possible to implement "in pure Python" without needing to complicate things with /api/SuperSearch/ and /api/Reprocessing/.
[0] https://www.elastic.co/guide/en/elasticsearch/guide/1.x/scan-scroll.html
Comment 3•8 years ago (Reporter)
Regardless of whether we have a web ui for it, I want to be able to reprocess from the command line. Right now we have almost everything I need except a reprocess.py script, but I think that should be pretty straightforward. It's essentially what you and Adrian wrote, except it pulls crash ids from stdin or command line args.
This doesn't preclude us from doing a web ui or doing an API endpoint that uses the Elasticsearch scan scroll or other options at some later point.
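A rough sketch of what the submit loop of such a script could look like. Everything here is an assumption for illustration: `send` is a hypothetical callable standing in for whatever submits a group of crash ids to the reprocessing API and raises on failure, and the batch size of 50 is arbitrary.

```python
import sys
from itertools import islice


def chunked(iterable, size):
    """Yield successive lists of at most `size` items from `iterable`."""
    it = iter(iterable)
    while True:
        group = list(islice(it, size))
        if not group:
            return
        yield group


def reprocess(crashids, send, batch_size=50, out=None):
    """Submit crash ids in batches and log each id once its batch is accepted.

    `send` is a hypothetical callable that submits one list of crash ids
    and raises an exception on failure. Returns the number of ids sent.
    """
    out = sys.stdout if out is None else out
    sent = 0
    for group in chunked(crashids, batch_size):
        # If send() raises, nothing from this batch is logged.
        send(group)
        for crashid in group:
            print(crashid, file=out)
        # Flush per batch so the log survives a later crash.
        out.flush()
        sent += len(group)
    return sent
```

Because ids are printed only after their batch goes through, the output is a record of what was actually submitted rather than what was intended.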
Comment 4•8 years ago (Reporter)
Comment 5•8 years ago
Commits pushed to master at https://github.com/mozilla-services/socorro
https://github.com/mozilla-services/socorro/commit/52ae8790e1578cedb628ea0d01d8fe5484d65859
fixes bug 1403248 - add reprocess.py script
This adds a reprocess script that works with the other scripts like
fetch_crashids.py and sends specified crash ids to a server environment for
reprocessing. This lets us reprocess individual crashes as well as groups of
crashes that match criteria specified with a Super Search.
https://github.com/mozilla-services/socorro/commit/35bb4d8837cf0a42985fe52cdb4119e454886b58
Merge pull request #4031 from willkg/1403248-reprocess
fixes bug 1403248 - add reprocess.py script
Updated•8 years ago
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED