Closed Bug 1448421 Opened 7 years ago Closed 7 years ago

redo keys for S3

Categories

(Socorro :: General, task, P1)

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: willkg, Unassigned)

Details

Our crash ingestion pipeline stores crash data in S3 under a series of keys like this:

    v2/raw_crash/000/20160513/00007bd0-2d1c-4865-af09-80bc02160513
    v1/dump_names/00007bd0-2d1c-4865-af09-80bc02160513
    v1/dump/00007bd0-2d1c-4865-af09-80bc02160513
    v1/processed_crash/00007bd0-2d1c-4865-af09-80bc02160513
    v1/upload_file_minidump_browser/00007bd0-2d1c-4865-af09-80bc02160513
    v1/upload_file_minidump_content/00007bd0-2d1c-4865-af09-80bc02160513
    v1/upload_file_minidump_flash1/00007bd0-2d1c-4865-af09-80bc02160513
    v1/upload_file_minidump_flash2/00007bd0-2d1c-4865-af09-80bc02160513

Our S3 bucket suffers from keys not having enough entropy early in the pseudo-filename. Because of that, our data is concentrated in a few partitions rather than spread evenly across many. That, in turn, *seriously* hampers projects like "let's switch to a new S3 bucket". This bug covers rethinking keys to move entropy toward the beginning of the pseudo-filename and thus spread our data across more partitions.
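To make that concrete, here's a minimal sketch of how these keys are assembled from a crash id (the function names are illustrative, not Socorro's actual API). The thing to notice is that the first characters of every key are a fixed, low-entropy prefix; the random part of the crash id only shows up after it:

    # Sketch only -- helper names are hypothetical, not Socorro's code.
    def raw_crash_key(crash_id):
        # v2 raw crash keys embed the first 3 characters of the crash id
        # plus its date suffix (the last 6 characters are YYMMDD).
        entropy = crash_id[:3]
        date = "20" + crash_id[-6:]
        return "v2/raw_crash/%s/%s/%s" % (entropy, date, crash_id)

    def processed_crash_key(crash_id):
        return "v1/processed_crash/%s" % crash_id

    raw_crash_key("00007bd0-2d1c-4865-af09-80bc02160513")
    # -> "v2/raw_crash/000/20160513/00007bd0-2d1c-4865-af09-80bc02160513"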
I'm making this a P1: the sooner we do it, the better off we are, since we're going to have to either move data or wait for it to expire, both of which take time. When I transitioned the raw crashes to a new key structure, I implemented key builders. We can take advantage of those to easily provide alternate locations for data, which will make the transition painless.
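Roughly how the key builder idea helps (a sketch under assumed names, not the actual implementation): the S3 store asks a configured builder object for each key, so writing new data under a new layout while still being able to read the old one is just a matter of swapping or chaining builders:

    # Sketch; class and method names are illustrative.
    class CurrentKeyBuilder:
        def key_for(self, kind, crash_id):
            return "v1/%s/%s" % (kind, crash_id)

    class ShortPrefixKeyBuilder:
        SHORT = {"processed_crash": "proc",
                 "upload_file_minidump_browser": "dump_browser"}

        def key_for(self, kind, crash_id):
            return "v1/%s/%s" % (self.SHORT.get(kind, kind), crash_id)

    # The store writes with the new builder and falls back to the old
    # one on reads until old data expires or gets migrated.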
Priority: -- → P1
Looking at the above, I think I want to make the following changes (verified with the sketch after this list):

* change "raw_crash" to "raw" -- moves entropy up 6 characters
* change "processed_crash" to "proc" -- moves entropy up 11 characters
* change "upload_file_minidump_*" to "dump_*" -- moves entropy up 16 characters

I think this is easy to implement, it'll improve our partition usage dramatically, and it won't change the bucket structure in ways that force us to rethink how things work.
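A quick way to check those numbers, since the crash id (where the entropy lives) starts right after the prefix:

    # Prints the offset of the first crash-id character for each prefix:
    # 13 v2/raw_crash/
    # 7 v2/raw/
    # 19 v1/processed_crash/
    # 8 v1/proc/
    # 32 v1/upload_file_minidump_browser/
    # 16 v1/dump_browser/
    for prefix in ("v2/raw_crash/", "v2/raw/",
                   "v1/processed_crash/", "v1/proc/",
                   "v1/upload_file_minidump_browser/", "v1/dump_browser/"):
        print(len(prefix), prefix)

That's 13-7=6, 19-8=11, and 32-16=16 characters earlier, respectively.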
There aren't many published resources about this. What I can find are:

https://aws.amazon.com/blogs/aws/amazon-s3-performance-tips-tricks-seattle-hiring-event/
https://docs.aws.amazon.com/AmazonS3/latest/dev/request-rate-perf-considerations.html

It sounds like AWS repartitions based on request rate (ours is low) and key count (ours seems high to me, but I don't know what AWS considers high). So we would start out with one partition, 'v'. As our request rate or key count increased, they would split us into two partitions, 'v1' and 'v2', and so on. So I wonder whether moving the entropy up a bit helps when it's still going to be 6 characters or more into the key. Let's run any proposed changes by AWS support first, to make sure we'll benefit from them.
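To illustrate the worry (toy example; the second crash id below is made up): every key in a family shares a long fixed prefix, so a prefix-based partition split only starts distinguishing keys several characters in:

    import os

    keys = [
        "v1/proc/00007bd0-2d1c-4865-af09-80bc02160513",
        "v1/proc/8f3a91c2-0a1b-4c3d-9e8f-70ab02160513",  # invented id
    ]
    # Every split shallower than this depth separates nothing.
    print(len(os.path.commonprefix(keys)))  # -> 8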
That's the documentation I was looking at when I did the v2 keys (bug #1266827). We were told at the time that more entropy closer to the beginning of the string generally made for better keys. AWS can change their partition algorithms, so going with the general rule was future-proofish. I'm definitely game for running proposals by AWS support first since they can see the partitions. The changes I proposed are pretty easy for us to make.
Tagging Brian to talk to our AWS TAM and run the proposal by them to see if it'll help. We'd go from keys like this:

    v2/raw_crash/000/20160513/00007bd0-2d1c-4865-af09-80bc02160513
    v1/dump_names/00007bd0-2d1c-4865-af09-80bc02160513
    v1/dump/00007bd0-2d1c-4865-af09-80bc02160513
    v1/upload_file_minidump_browser/00007bd0-2d1c-4865-af09-80bc02160513
    v1/upload_file_minidump_content/00007bd0-2d1c-4865-af09-80bc02160513
    v1/upload_file_minidump_flash1/00007bd0-2d1c-4865-af09-80bc02160513
    v1/upload_file_minidump_flash2/00007bd0-2d1c-4865-af09-80bc02160513
    v1/processed_crash/00007bd0-2d1c-4865-af09-80bc02160513

to keys like this:

    v2/raw/000/20160513/00007bd0-2d1c-4865-af09-80bc02160513
    v1/dump_names/00007bd0-2d1c-4865-af09-80bc02160513
    v1/dump/00007bd0-2d1c-4865-af09-80bc02160513
    v1/dump_browser/00007bd0-2d1c-4865-af09-80bc02160513
    v1/dump_content/00007bd0-2d1c-4865-af09-80bc02160513
    v1/dump_flash1/00007bd0-2d1c-4865-af09-80bc02160513
    v1/dump_flash2/00007bd0-2d1c-4865-af09-80bc02160513
    v1/proc/00007bd0-2d1c-4865-af09-80bc02160513

where "00007bd0-2d1c-4865-af09-80bc02160513" is a unique crash id, of which we get roughly 13,000 per hour.
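For scale, a back-of-envelope request rate (assuming every crash writes all eight objects above, which overcounts a bit since not every crash has every minidump):

    # Rough PUT rate across the bucket; 8 objects per crash is an
    # upper bound.
    crashes_per_hour = 13000
    objects_per_crash = 8
    puts_per_second = crashes_per_hour * objects_per_crash / 3600.0
    print(round(puts_per_second, 1))  # -> 28.9

That's well under the ~100 PUT/LIST/DELETE requests per second that the request-rate documentation linked above cited (at the time) as the point where key naming starts to matter, which fits the earlier observation that our request rate is low.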
Flags: needinfo?(bpitts)
We've emailed our TAM for advice.
Flags: needinfo?(bpitts)
The TAM emailed us back:

> So getting back to your proposed changes, shortening the fixed prefix won’t help. Whether the
> prefix is v1/dump_names or v1/d, it’ll have the same issue and need to be repartitioned by the service team.

That suggests we shouldn't go forward with the proposed changes. Further, Brian pointed out that our current key structure doesn't hinder normal operations; the partitioning issue was only a problem when we were switching buckets. Given that, it seems prudent to WONTFIX this.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → WONTFIX