Closed Bug 1117776 Opened 9 years ago Closed 8 years ago

document reprocessing

Categories

(Socorro :: General, task)


Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: rhelmer, Unassigned)

References

Details

There is a reprocessing crontabber job (socorro.cron.jobs.reprocessingjobs.ReprocessingJobsApp|5m) which reads the reprocessing_jobs table.

For instance:
insert into reprocessing_jobs
  select uuid from reports
  where date_processed > now() - '7 days'::interval;
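For queuing an explicit list of crash IDs rather than a query against reports, a small sketch of the same idea (this helper is hypothetical, and it assumes reprocessing_jobs takes one uuid per row, as the query above implies; in real code, prefer a parameterized query over string building):

```python
# Hypothetical helper: build an INSERT for the reprocessing_jobs table from
# an explicit list of crash IDs, mirroring the SELECT-based insert above.
# Assumes reprocessing_jobs has a single uuid column, as the query implies.

def reprocessing_insert(crash_ids):
    """Return a single INSERT statement queuing the given crash IDs."""
    values = ",\n  ".join("('{}')".format(cid) for cid in crash_ids)
    return "insert into reprocessing_jobs values\n  {};".format(values)
```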

This isn't currently documented; it should be.
I agree that this should be documented.  I further propose that a new section be added to the docs for common admin/maintenance/operational tasks.
Depends on: 1117778
Note that this will pull already-processed crashes from the reports table. If you need to process submitted crashes that never went through processing to begin with, you'll need to grab the relevant UUIDs from S3 and enter them that way. The command line is easier than Cyberduck.
To retrieve the list of uuids, I'm running this:

aws s3 ls s3://org.mozilla.crash-stats.production.crashes/v1/raw_crash/ | grep 2015-07-0[6-7] > crashreports

However, it's sure taking quite a long time, so I'm hoping rhelmer may have some trick he can share that he does when he needs a list of uuids.  In the meantime, it's pulling at the rate of about 300 per minute.
(In reply to JP Schneider [:jp] from comment #3)
> However, it's sure taking quite a long time, so I'm hoping rhelmer may have
> some trick he can share that he does when he needs a list of uuids.  In the
> meantime, it's pulling at the rate of about 300 per minute.

Listing the S3 bucket is going to be terribly slow and should be an absolute last resort - if there were crashes that were never written to the reports table that need reprocessing, one place to get them from would be the collector/crashmover logs.

S3 list by date would be much faster and more realistic if crashes were stored with the date in the prefix instead of at the very end, e.g.:

s3://bucket/v2/raw_crash/2015-07-06/...
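If the collector/crashmover logs are available, crash IDs can be mined out of them directly. If I remember right, Socorro crash IDs are UUID-shaped with the YYMMDD submission date as the last six characters, so you can also filter by date without listing S3 at all. A sketch (the regex and sample data are illustrative, not taken from real logs):

```python
# Sketch for mining crash IDs out of collector/crashmover log text.
# Socorro crash IDs look like UUIDs whose last six characters are the
# YYMMDD submission date, so they can be filtered by date in place.
import re

CRASH_ID = re.compile(r"\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-"
                      r"[0-9a-f]{6}(\d{6})\b")

def crash_ids_for_dates(text, yymmdd_suffixes):
    """Return crash IDs whose date suffix is in yymmdd_suffixes."""
    return [m.group(0) for m in CRASH_ID.finditer(text)
            if m.group(1) in yymmdd_suffixes]
```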
Ah, shazbot, OK.  Well, we didn't grab those logs, so I'm doing this from an EC2 node in screen.


[centos@i-c5757232 ~]$ aws s3 --region us-west-2 ls s3://org.mozilla.crash-stats.production.crashes/v1/raw_crash/ | grep 2015-07-0[6-7] > crashreports
Moved the node:
[centos@i-101f1be6 ~]$ aws s3 --region us-west-2 ls s3://org.mozilla.crash-stats.production.crashes/v1/raw_crash/ | grep 2015-07-0[6-7] > crashreports
I don't know where else to put this, but Lars shared this trick. He SSHes into a node with consul and creates this bash script:

#!/usr/bin/bash
. /data/socorro/socorro-virtualenv/bin/activate
envconsul -prefix socorro/common -prefix socorro/processor socorro submitter \
    --destination.crashstorage_class=socorro.external.rabbitmq.crashstorage.RabbitMQCrashStorage \
    --destination.routing_key=socorro.reprocessing \
    --producer_consumer.producer_consumer_class=socorrolib.lib.task_manager.TaskManager \
    --source.temporary_file_system_storage_path=/tmp \
    --new_crash_source.new_crash_source_class=socorro.collector.submitter_app.DBSamplingCrashSource \
    --new_crash_source.crash_id_query="select '$1'"

Then he runs that like this:

./reprocess.sh some-long-uuid-thing
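To run that over a whole file of crash IDs (one per line, e.g. the "crashreports" file from the S3 listing above), a small wrapper like this would do; it's a hypothetical sketch, and the `runner` hook only exists so the loop can be exercised without actually shelling out:

```python
# Hypothetical wrapper around Lars's reprocess.sh: run it once per crash ID
# listed in a file, one ID per line. `runner` defaults to really executing
# the script but can be swapped out for testing.
import subprocess

def reprocess_from_file(path, runner=subprocess.check_call):
    submitted = []
    with open(path) as f:
        for line in f:
            crash_id = line.strip()
            if crash_id:
                runner(["./reprocess.sh", crash_id])
                submitted.append(crash_id)
    return submitted
```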


Talking directly to RabbitMQ, instead of going through the PG reprocessing_jobs table, should be a priority, since it goes straight to the meat rather than having to involve Postgres and a crontabber app.
We now have the "Reprocess" tool on the report index page. A tool is better than documentation :)

Also, I've updated the Mana documentation with the new headlines:
* "To re-process a UUID"
* "To re-process lots of UUIDs"
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED