Closed Bug 1271790 Opened 9 years ago Closed 9 years ago

estimate processed crash storage metrics

Categories

(Socorro :: General, task)


Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: lonnen, Assigned: adrian)

References

Details

I'd like to be able to answer the following about our processed crash storage. Some of it we can get precisely; for the hard stuff, sampling is good enough:
* How much are we storing, in TB?
* How many weeks of data does that represent?
* How many objects?
* What is the five-number summary over the size of the objects?
Assigned this to adrian who knows so much more about the ES storage. What does "the five number summary over the size of the objects" mean? Don't we also still store processed crashes in PG?
Assignee: nobody → adrian
The five-number summary (Tukey's five numbers) is a summary of a distribution:
* the sample minimum (smallest observation)
* the lower quartile (first quartile)
* the median (middle value)
* the upper quartile (third quartile)
* the sample maximum (largest observation)

We could do this for a day or a week or whatever, and that would be good enough for me. I'm interested in the whole processed crash, rather than the partially redacted stuff.
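The five numbers listed above can be computed in a few lines. This is a hypothetical sketch (not the script actually used in this bug), assuming the document sizes are available as a plain list of kilobyte values:

```python
# Sketch: Tukey's five-number summary over a sample of document sizes (KB).
import numpy as np

def five_number_summary(sizes):
    """Return (min, Q1, median, Q3, max) for a sample."""
    arr = np.asarray(sizes, dtype=float)
    return (
        arr.min(),                # sample minimum
        np.percentile(arr, 25),   # lower quartile
        np.percentile(arr, 50),   # median
        np.percentile(arr, 75),   # upper quartile
        arr.max(),                # sample maximum
    )
```

This is essentially what `pandas.Series.describe()` reports, minus count, mean, and std.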
I downloaded 10,000 processed crashes from our stage S3 onto my file system on May 12. Hopefully that's a big enough set, because it took a full night to download all that data. :) Note that documents include sensitive data and the full JSON dump. Raw crash data is not included.

Here's an analysis of the size of those documents (numbers in kilobytes):

count  10000
mean     486
std     1170
min        1
25%      177
50%      250
75%      348
max    18944

Then, to find the average number of processed crashes per week, I used this query (count of documents per week between the beginning of the year and now): https://crash-stats.mozilla.com/api/SuperSearch/?_results_number=0&_facets=_histogram.date&_histogram_interval.date=1w&date=%3E2016-01-04&date=%3C2016-05-16

Using the above average document size, I got these results:

Average number of documents per week: 2,533,611
Estimated number of documents for 26 weeks: 65,873,886
Estimated size of 1 week of data: 1,174 GB
Estimated size of 26 weeks of data: 29 TB

Lonnen, is that what you were expecting? Is something missing?
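The estimates above are just the sample mean scaled by the SuperSearch document counts. A back-of-the-envelope sketch of that arithmetic (constants taken from the comment above; the variable names are mine):

```python
# Sketch: reproduce the storage estimate from the sampled mean size
# and the per-week document count reported by SuperSearch.
MEAN_DOC_KB = 486           # mean size from the 10,000-document sample
DOCS_PER_WEEK = 2_533_611   # average from the SuperSearch weekly histogram
WEEKS = 26

week_gb = MEAN_DOC_KB * DOCS_PER_WEEK / 1024**2   # KB -> GB
total_tb = week_gb * WEEKS / 1024                 # GB -> TB

print(f"~{week_gb:,.0f} GB per week, ~{total_tb:.1f} TB for {WEEKS} weeks")
```

Note the heavy right tail (std 1,170 KB against a mean of 486 KB), so an estimate built on the mean is sensitive to a handful of very large documents in the sample.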
Thanks. This is exactly what I was after!
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Blocks: 1273657