Closed Bug 1122260 Opened 11 years ago Closed 10 years ago

Add processor rule to make MD5 sum of dumps

Categories

(Socorro :: Backend, task)

x86_64
Linux
task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: lars, Assigned: lars)

Details

Add and deploy a new processor rule that computes an MD5 sum for each dump and saves that info in the raw_crash.
Here's the complication: the processors don't actually have the dumps in memory. Since a dump's primary use is as input to the stackwalker, the dumps get pushed to disk immediately on loading; they're gone from memory before the crash processing algorithm even starts. There are several ways forward:

1) go back to thinking about putting the code into the collector, with the caveat that it will slow down the collector.
2) make a processing rule that invokes the command-line md5sum on the dumps on disk.
3) make a processing rule that reads the dumps from disk and calculates the MD5 in memory.
4) rework the crashstorage API such that fetching the dumps automatically calculates the MD5 as it copies the dumps from external storage into the filesystem for the stackwalker. This method has the advantage for S3 of being able to fetch the hash directly from S3 for that storage scheme. The other storage schemes would fall back to calculating it while the dump is briefly in memory during the write to disk.

I favor method 4: for us it would cost no overhead when using S3, and our other storage methods would experience only a minor cost.
(In reply to K Lars Lohn [:lars] [:klohn] from comment #1)
> 4) rework the crashstorage API such that fetching the dumps automatically
> calculates the MD5 as it copies the dumps from external storage into the
> filesystem for the stackwalker. This method has the advantage for S3 of being
> able to fetch the hash directly from S3 for that storage scheme. The other storage
> schemes would fall back to calculating it while the dump is briefly in memory
> during the write to disk.
>
> I favor method 4: for us it would cost no overhead when using S3, and our
> other storage methods would experience only a minor cost.

#4 sounds good to me.
The collector now creates a hash of the dumps: there is a key in the raw_crash called "dump_checksums" that contains a mapping of dump name to checksum.
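The shape of that raw_crash key can be illustrated as follows. The dump names and byte payloads here are made-up placeholders, and the surrounding raw_crash fields are omitted; only the "dump_checksums" structure (dump name mapped to MD5 hex digest) comes from the comment above.

```python
import hashlib

# Placeholder dump payloads; real minidumps start with the "MDMP" magic.
dumps = {
    "upload_file_minidump": b"MDMP...main dump bytes...",
    "upload_file_minidump_flash1": b"MDMP...plugin dump bytes...",
}

raw_crash = {}
# "dump_checksums": mapping of dump name -> MD5 hex digest of that dump.
raw_crash["dump_checksums"] = {
    name: hashlib.md5(dump_bytes).hexdigest()
    for name, dump_bytes in dumps.items()
}
```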
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED