Bug 1098954
Opened 10 years ago
Closed 10 years ago
rearrange crash filenames so they are sortable by prefix
Product/Component: Socorro :: Backend (task)
Tracking: Not tracked
Status: RESOLVED FIXED
Target Milestone: 111
People: Reporter: rhelmer, Assigned: rhelmer
As far as I have read, there is no performance problem in storing a large number of crashes in a single bucket, unless you want to list that bucket. However, if prefixes are used, bucket contents can be listed quickly. Using "/" as a delimiter and referring to prefixes as "pseudo-directories" seems common in the AWS docs and elsewhere.

Right now our pseudo-directory structure looks like this:

{{bucket}}/{{env}}/{{crash_id}}.{{type}}

For example:

crashstats/stage/fff13cf0-5671-4496-ab89-47a922141114.dump

The date is currently the suffix of the {{crash_id}}. We should consider instead something like:

{{bucket}}/{{env}}/{{date}}/{{type}}/{{crash_id}}

This would look like:

crashstats/stage/141114/dump/fff13cf0-5671-4496-ab89-47a922

Now it would be relatively quick to list all crashes of a given type on a certain date.

See also http://aws.amazon.com/articles/1904, specifically "Improving PUT and GET Throughput".
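To make the two layouts concrete, here is a minimal Python sketch that builds both key forms from a crash ID. The function names are hypothetical, and it assumes the date is exactly the last six characters of the crash ID, as in the example above. The `list_by_prefix` helper only imitates a prefix listing over an in-memory list; it does not call S3.

```python
# Assumption: the date is the final six characters of the crash ID,
# as in the example "...47a922141114" -> "141114".
DATE_SUFFIX_LEN = 6


def current_key(bucket, env, crash_id, crash_type):
    """Current layout: {{bucket}}/{{env}}/{{crash_id}}.{{type}}"""
    return f"{bucket}/{env}/{crash_id}.{crash_type}"


def proposed_key(bucket, env, crash_id, crash_type):
    """Proposed layout: {{bucket}}/{{env}}/{{date}}/{{type}}/{{crash_id}}

    The date suffix is moved out of the crash ID and into the prefix.
    """
    date = crash_id[-DATE_SUFFIX_LEN:]
    remainder = crash_id[:-DATE_SUFFIX_LEN]
    return f"{bucket}/{env}/{date}/{crash_type}/{remainder}"


def list_by_prefix(keys, prefix):
    """Imitate a prefix listing over an in-memory list of keys."""
    return [key for key in keys if key.startswith(prefix)]


crash_id = "fff13cf0-5671-4496-ab89-47a922141114"
old = current_key("crashstats", "stage", crash_id, "dump")
new = proposed_key("crashstats", "stage", crash_id, "dump")
dumps_for_day = list_by_prefix([new], "crashstats/stage/141114/dump/")
```

With the proposed layout, everything of one type on one date shares a single prefix, which is what makes the listing cheap.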
Comment 1 • 10 years ago (Assignee)
Reference for the assertion that keeping everything in a bucket is not a performance problem per se: http://docs.aws.amazon.com/AmazonS3/latest/dev/BucketRestrictions.html "There is no limit to the number of objects that can be stored in a bucket and no difference in performance whether you use many buckets or just a few."
Comment 2 • 10 years ago (Assignee)
PR: https://github.com/mozilla/socorro/pull/2484
Assignee: nobody → rhelmer
Status: NEW → ASSIGNED
Comment 3 • 10 years ago (Assignee)
We don't want to depend on the date being embedded in the UUID so this isn't workable as-is. We'll need to use Postgres tables as an index into S3. If we change our minds later we can restructure the S3 store.
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → WONTFIX
Comment 4 • 10 years ago (Assignee)
I take that back - we do want to do at least this much, so we can set an expiration policy per crash type (we store processed crashes longer than raw, for instance):

{{bucket}}/{{env}}/{{type}}/{{crash_id}}

This leaves the date alone; it's still at the end of the crash_id.
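As a rough illustration of why the type prefix helps with expiration, the sketch below builds type-prefixed keys and one expiration rule per type. The type names and retention periods are hypothetical placeholders, not the values Socorro uses. Keys are written relative to the bucket, so the {{bucket}} component is omitted. The dictionary follows the general shape of an S3 lifecycle configuration (rules with a prefix filter and an expiration in days), and nothing here is sent to S3.

```python
def typed_key(env, crash_type, crash_id):
    """Key within the bucket: {{env}}/{{type}}/{{crash_id}}

    The crash ID is left untouched, so the date stays at its end.
    """
    return f"{env}/{crash_type}/{crash_id}"


def lifecycle_rules(env, retention_days):
    """Build one expiration rule per crash type, keyed on its prefix.

    retention_days maps a crash type to a number of days. The values
    passed in below are illustrative assumptions only.
    """
    return {
        "Rules": [
            {
                "ID": f"expire-{env}-{crash_type}",
                "Filter": {"Prefix": f"{env}/{crash_type}/"},
                "Status": "Enabled",
                "Expiration": {"Days": days},
            }
            for crash_type, days in sorted(retention_days.items())
        ]
    }


# Hypothetical type names and retention periods: processed crashes
# are kept longer than raw ones.
rules = lifecycle_rules("stage", {"raw_crash": 30, "processed_crash": 90})
key = typed_key("stage", "dump", "fff13cf0-5671-4496-ab89-47a922141114")
```

Because each rule matches on a prefix, this kind of per-type policy only works once the type appears ahead of the crash ID in the key.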
Status: RESOLVED → REOPENED
Resolution: WONTFIX → ---
Comment 5•10 years ago
Commits pushed to master at https://github.com/mozilla/socorro

https://github.com/mozilla/socorro/commit/f1a92dbf5892fb1213ac6c6b6d58c19dc9164afe
fix bug 1098954 - store crashes in S3 so they can be quickly listed by file type later

https://github.com/mozilla/socorro/commit/86ce6b5f534458ba1826bf27633982cccdace18c
bug 1098954 - also add a version to S3 pseudo-directory structure

https://github.com/mozilla/socorro/commit/72817faf8fb966ca6fd8c73bf0d66ddcec617363
Merge pull request #2485 from rhelmer/bug1098954-rearrange-s3-prefix
fix bug 1098954 - store crashes in S3 so they can be quickly listed by f...
Updated•10 years ago
Status: REOPENED → RESOLVED
Closed: 10 years ago → 10 years ago
Resolution: --- → FIXED
Updated•10 years ago
Target Milestone: --- → 111