Closed Bug 1705469 Opened 4 years ago Closed 4 years ago

faster update reprocessing support

Categories

(Socorro :: Processor, enhancement, P2)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: willkg, Assigned: willkg)

Details

Attachments

(1 file)

The processing pipeline is built of a set of rules that get applied in serial. Of those rules, the BreakpadMinidumpStackwalkerRule2015 takes the bulk of the time for crash reports that have minidumps in them.

We often make changes that require some set of crash reports to be reprocessed in order to pick up signature generation changes. We sometimes reprocess crash reports to pick up minidump-stackwalk changes or newly uploaded symbols files. We sometimes reprocess crash reports because we've made changes to crash storage which happens after processing.

For two of these use cases (signature generation changes and crash storage changes), it'd be nice if we could skip the minidump-stackwalk step.

Another thought I've had is being able to "reprocess" using alternate pipelines that produce additional information we don't want to compute for every crash, for example because it's computationally intensive. I'd like to think about that here, too.

This bug covers figuring out what options we have and then maybe implementing one.

I'm going to make this a P2. I periodically have to reprocess to pick up signature generation changes, and it takes me a day because there are a lot of crash reports to process. Having a fast update would be really helpful. We're also thinking about updating Elasticsearch and have a big migration coming up, so being able to do a fast reprocess could be really handy.

Plus I'm going to grab it.

Assignee: nobody → willkg
Status: NEW → ASSIGNED
Priority: -- → P2

The processing queue consists of crash ids like this:

4b17c70a-02ec-4e6a-ad7e-5c0210210421

I think I want to support multiple processing pipelines, but do it in a way that a pipeline could consist of a transformation of another pipeline. For example, I want to reprocess crash reports to pick up a new signature. The signature generation rule is the second to last rule today, but maybe we add other things later. I would like to specify the regenerate_signature pipeline as "default pipeline skipping to signature generation rule step and continue from there".
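As a rough sketch of "a pipeline defined as a transformation of another pipeline", something like the following could work. The rule names other than BreakpadMinidumpStackwalkerRule2015 are placeholders, and `skip_to` and `PIPELINES` are hypothetical names for illustration, not existing Socorro code.

```python
# Placeholder rule names; only BreakpadMinidumpStackwalkerRule2015 is
# mentioned in this bug. The real pipeline holds configured rule objects.
DEFAULT_PIPELINE = [
    "ExampleMetadataRule",
    "BreakpadMinidumpStackwalkerRule2015",
    "ExampleSignatureGenerationRule",
    "ExampleFinalRule",
]


def skip_to(pipeline, rule_name):
    """Return the tail of `pipeline` starting at `rule_name`.

    Raises ValueError if the rule is not in the pipeline, so a renamed
    rule fails loudly instead of silently producing an empty pipeline.
    """
    index = pipeline.index(rule_name)
    return pipeline[index:]


# Hypothetical registry: the derived pipeline follows the default one, so
# rules added after signature generation later are picked up automatically.
PIPELINES = {
    "default": DEFAULT_PIPELINE,
    "regenerate_signature": skip_to(
        DEFAULT_PIPELINE, "ExampleSignatureGenerationRule"
    ),
}
```

Defining the derived pipeline relative to the default one means there's no second list of rules to keep up to date by hand.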

Then we'd specify which pipeline to use by tacking on the pipeline name to the crash id in the processing queue. Maybe something like this:

CRASHID:PIPELINE

For example:

4b17c70a-02ec-4e6a-ad7e-5c0210210421:regenerate_signature

Having something like this makes it easy to do pipelines that have additional processing later.
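A minimal sketch of splitting a queue item in the CRASHID:PIPELINE format. The function name `parse_queue_item` and the "default" fallback for items with no pipeline suffix are assumptions for illustration.

```python
def parse_queue_item(item, default_pipeline="default"):
    """Split a queue item into (crash_id, pipeline_name).

    Crash ids contain no colons, so the first colon (if any) separates
    the crash id from the pipeline name. Items without a pipeline name
    fall back to `default_pipeline` (an assumed convention).
    """
    crash_id, _, pipeline = item.partition(":")
    return crash_id, pipeline or default_pipeline
```

Falling back to a default keeps plain crash ids that are already in the queue valid.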

What should the processor do if the pipeline doesn't exist? I think the processor should log an error and move on to the next crash id.
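The "log an error and move on" behavior might look roughly like this. `process_queue`, the logger name, and the shape of the `pipelines` mapping are all hypothetical; the actual rule execution is left out.

```python
import logging

logger = logging.getLogger("processor")


def process_queue(items, pipelines):
    """Process queue items, skipping any that name an unknown pipeline.

    `pipelines` maps pipeline name -> list of rules (assumed shape).
    Returns the (crash_id, pipeline_name) pairs that were accepted.
    """
    accepted = []
    for item in items:
        crash_id, _, name = item.partition(":")
        name = name or "default"
        rules = pipelines.get(name)
        if rules is None:
            # Unknown pipeline: log it and continue with the next crash id.
            logger.error(
                "unknown pipeline %r for crash id %s; skipping", name, crash_id
            )
            continue
        # Running `rules` against the crash report would happen here.
        accepted.append((crash_id, name))
    return accepted
```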

How does the Reprocessing API know what pipelines exist and are valid? We could make it a setting, but then we'd have to add each new pipeline in two places. An alternative is to define pipelines somewhere that's importable by both the processor and the webapp. That's really tricky, though: the rules in a pipeline require configuration, so we'd have to make sure that configuration exists in both places, and I don't want to do that. So I think when we add a pipeline, we'd have to add it in two places. Maybe the test harness can catch when the two places are out of sync.
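One way the test harness could catch the two places drifting apart is a test that compares the two sets of names. This is only a sketch: `PROCESSOR_PIPELINES` and `WEBAPP_VALID_PIPELINES` stand in for wherever the processor and the webapp setting actually keep their pipeline names.

```python
# Stand-ins for the two real definitions (assumed names and values).
PROCESSOR_PIPELINES = {"default", "regenerate_signature"}
WEBAPP_VALID_PIPELINES = {"default", "regenerate_signature"}


def pipeline_mismatches(processor_names, webapp_names):
    """Return the pipeline names present on one side but not the other."""
    processor_names = set(processor_names)
    webapp_names = set(webapp_names)
    return {
        "missing_in_webapp": sorted(processor_names - webapp_names),
        "missing_in_processor": sorted(webapp_names - processor_names),
    }


def test_pipelines_in_sync():
    # Fails with a readable diff when a pipeline is added in only one place.
    assert pipeline_mismatches(PROCESSOR_PIPELINES, WEBAPP_VALID_PIPELINES) == {
        "missing_in_webapp": [],
        "missing_in_processor": [],
    }
```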

What's the first alternate pipeline to implement? Definitely the regenerate_signature pipeline. That will be a big win.

I pushed this to prod today in bug #1708188. Marking as FIXED.

Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED