Closed Bug 1098954 Opened 10 years ago Closed 10 years ago

rearrange crash filenames so they are sortable by prefix

Categories

(Socorro :: Backend, task)

Type: task
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: rhelmer, Assigned: rhelmer)

As far as I have read, storing a large number of crashes in a single bucket causes no performance problem unless you want to list that bucket. If prefixes are used, however, bucket contents can be listed quickly.

Using "/" as delimiter and referring to prefixes as "pseudo-directories" seems common in AWS docs and elsewhere.

Right now our pseudo-directory structure looks like this:

{{bucket}}/{{env}}/{{crash_id}}.{{type}}

For example:

crashstats/stage/fff13cf0-5671-4496-ab89-47a922141114.dump

Currently the date is the suffix of the {{crash_id}} (141114 in the example above).

We should consider instead something like:

{{bucket}}/{{env}}/{{date}}/{{type}}/{{crash_id}}

This would look like:

crashstats/stage/141114/dump/fff13cf0-5671-4496-ab89-47a922

Now it would be relatively quick to list all crashes of a given type on a certain date.
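A minimal sketch of how the proposed path could be derived from an existing crash ID. It assumes the date is always the last six characters of the crash ID, as in the example above; the function name `proposed_path` is hypothetical and not taken from the Socorro code.

```python
def proposed_path(bucket, env, crash_id, crash_type):
    """Build a {{bucket}}/{{env}}/{{date}}/{{type}}/{{crash_id}} style path."""
    # Assumption: the last six characters of the crash ID are the date.
    date = crash_id[-6:]
    # The remainder of the crash ID, with the date suffix removed.
    short_id = crash_id[:-6]
    return "/".join([bucket, env, date, crash_type, short_id])


print(proposed_path("crashstats", "stage",
                    "fff13cf0-5671-4496-ab89-47a922141114", "dump"))
# crashstats/stage/141114/dump/fff13cf0-5671-4496-ab89-47a922
```

Listing everything of one type on one date would then only need the prefix stage/141114/dump/ within the bucket.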

See also http://aws.amazon.com/articles/1904, specifically "Improving PUT and GET Throughput".
Reference for the assertion that keeping everything in a bucket is not a performance problem per se:

http://docs.aws.amazon.com/AmazonS3/latest/dev/BucketRestrictions.html

"There is no limit to the number of objects that can be stored in a bucket and no difference in performance whether you use many buckets or just a few."
PR: https://github.com/mozilla/socorro/pull/2484
Assignee: nobody → rhelmer
Status: NEW → ASSIGNED
We don't want to depend on the date being embedded in the UUID, so this isn't workable as-is. We'll need to use Postgres tables as an index into S3.

If we change our minds later we can restructure the S3 store.
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → WONTFIX
I take that back - we do want to do at least this much, so that we can set an expiration policy per crash type (we store processed crashes longer than raw ones, for instance):

{{bucket}}/{{env}}/{{type}}/{{crash_id}}

This leaves the date alone; it is still at the end of the crash_id.
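A rough sketch of the type-first layout, using a plain list of keys in place of a real bucket. The helper names and the type names "raw_crash" and "processed_crash" are hypothetical; only "dump" appears in the examples above. Keys are shown relative to the bucket.

```python
def crash_key(env, crash_type, crash_id):
    """Build the {{env}}/{{type}}/{{crash_id}} key inside the bucket."""
    return "/".join([env, crash_type, crash_id])


def keys_with_prefix(keys, env, crash_type):
    """Return only the keys under the {{env}}/{{type}}/ pseudo-directory."""
    prefix = env + "/" + crash_type + "/"
    return [key for key in keys if key.startswith(prefix)]


crash_id = "fff13cf0-5671-4496-ab89-47a922141114"
keys = [
    crash_key("stage", "dump", crash_id),
    crash_key("stage", "raw_crash", crash_id),        # hypothetical type name
    crash_key("stage", "processed_crash", crash_id),  # hypothetical type name
]

print(keys_with_prefix(keys, "stage", "dump"))
# ['stage/dump/fff13cf0-5671-4496-ab89-47a922141114']
```

Because each type lives under its own prefix, a per-type expiration rule only has to match that prefix, and the full crash_id (date included) stays intact.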
Status: RESOLVED → REOPENED
Resolution: WONTFIX → ---
Commits pushed to master at https://github.com/mozilla/socorro

https://github.com/mozilla/socorro/commit/f1a92dbf5892fb1213ac6c6b6d58c19dc9164afe
fix bug 1098954 - store crashes in S3 so they can be quickly listed by file type later

https://github.com/mozilla/socorro/commit/86ce6b5f534458ba1826bf27633982cccdace18c
bug 1098954 - also add a version to S3 pseudo-directory structure

https://github.com/mozilla/socorro/commit/72817faf8fb966ca6fd8c73bf0d66ddcec617363
Merge pull request #2485 from rhelmer/bug1098954-rearrange-s3-prefix

fix bug 1098954 - store crashes in S3 so they can be quickly listed by f...
Status: REOPENED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → 111