Closed Bug 1265015 Opened 5 years ago Closed 5 years ago
Create or confirm ability to generate datasets for e10s memory crashes
Filters: crashes with release channel == "beta" and process type == "content"
Datapoints: crash ID, signature, OS, OOMAllocationSize, HasMemoryReport, SystemMemoryUsePercentage, TotalVirtualMemory, AvailableVirtualMemory, TotalPageFile, AvailablePageFile, TotalPhysicalMemory, AvailablePhysicalMemory, largest_free_vm_block, tiny_block_size, write_combine_size
Starting next Monday/Tuesday, we're going to release FF47b1 which will have extensive additional instrumentation for OOM issues in content crashes. In order to measure things, we need a dataset that includes a bunch of different metrics, some of which may not be currently recorded in supersearch. I'm writing proposed details in the user story and asking mccr8 and kanru to confirm the dataset and note any additional fields.
(Commenting on User Story)
> largest_free_vm_block
> tiny_block_size
> write_combine_size
We won't have these three because we don't have bug 1264242 on beta, and they're derived from data that's included in the minidump.
I'm not exactly sure what you are asking, but those all look like good measures to have for investigating OOM crashes.
We're planning to uplift 1264242 for 47b1.
Apart from the three fields Ted mentioned in comment #1, all of those are available through SuperSearch. For example: https://crash-stats.mozilla.com/api/SuperSearch/?product=Firefox&_columns=uuid&_columns=signature&_columns=platform&_columns=oom_allocation_size&_columns=contains_memory_report&_columns=system_memory_use_percentage&_columns=total_virtual_memory&_columns=available_virtual_memory&_columns=total_page_file&_columns=available_page_file&_columns=total_physical_memory&_columns=available_physical_memory Is that helpful? If not, can you be more precise about what you need?
As noted in comment 4, we can already access the listed data through SuperSearch, so I think we are all set. In bug 1258312 we changed some IPC error signatures to OOM. Some will have the 'OOM | large | mozalloc_abort | mozalloc_handle_oom | moz_xrealloc | Pickle::Resize' signature, and some will become 'OOM | small'; for both cases it would be great if we could aggregate them by IPDL protocol and message.
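The aggregation described above could be sketched roughly as below. The field name `ipc_message_name` is a hypothetical stand-in for whichever crash annotation carries the IPDL protocol/message; the two signature classes come straight from the comment.

```python
from collections import Counter

def classify_signature(signature):
    """Bucket a crash signature into the two OOM classes from bug 1258312."""
    if signature.startswith("OOM | small"):
        return "OOM | small"
    if signature.startswith("OOM | large"):
        return "OOM | large"
    return "other"

def aggregate_by_protocol(crashes):
    """Count crashes per (OOM class, IPDL message).

    Each crash is a plain dict; 'ipc_message_name' is an assumed field
    name, not necessarily what the real annotation is called.
    """
    counts = Counter()
    for crash in crashes:
        key = (classify_signature(crash.get("signature", "")),
               crash.get("ipc_message_name", "unknown"))
        counts[key] += 1
    return counts
```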
Adrian, can we get those three fields into supersearch by next Monday? largest_free_vm_block is especially important. Is it possible to use supersearch to get this list of fields for large numbers of crashes (more than 100 at a time)? Or a script that will re-combine a bunch of pages back into a single list?
By next Monday, I'm afraid it won't be possible. The only way I can see for it to happen that quickly would be for us to start indexing the entire json_dump again, and that's something we cannot afford (it would consume too much disk space in our Elasticsearch cluster). I have filed bug 1266099 to discuss possible solutions to this problem. Right now, the only way to get those three fields is from S3. I don't know much about how to retrieve data from S3 though, so I'll forward this to Peter.
Flags: needinfo?(adrian) → needinfo?(peterbe)
Benjamin, I think I can help you script something to extract the data from S3. But I know you're equally capable with Python in hand. Ping me on IRC if you need any help.
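A script along the lines Peter suggests might look like the sketch below. The bucket name and key scheme are assumptions (check the real Socorro crash storage configuration), and the exact location of the three fields inside the processed crash is also assumed here.

```python
import json

def fetch_processed_crash(crash_id, bucket, key_template="processed_crash/{}"):
    """Download one processed crash JSON from S3 and return it as a dict.

    The bucket name and key layout are hypothetical -- verify them against
    the actual Socorro S3 storage before relying on this.
    """
    import boto3  # third-party AWS SDK, imported lazily so the rest works without it
    s3 = boto3.client("s3")
    obj = s3.get_object(Bucket=bucket, Key=key_template.format(crash_id))
    return json.loads(obj["Body"].read())

def extract_vm_fields(processed_crash):
    """Pull the three VM-block fields out of a processed crash dict.

    Assumes they live under 'json_dump', as the earlier comments suggest;
    missing fields come back as None.
    """
    wanted = ("largest_free_vm_block", "tiny_block_size", "write_combine_size")
    dump = processed_crash.get("json_dump", {})
    return {name: dump.get(name) for name in wanted}
```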
After talking with Lonnen, we decided to add those three keys you need to the processed crash. We are going to change the way we redact the json_dump to let just those three fields pass through. This way you should be able to get them via SuperSearch. That might serve as a stepping stone for bug 1266099. It is very unlikely that it will be done by Monday though.
Assignee: nobody → adrian
Status: NEW → ASSIGNED
can we get tracking-e10s added to this product?
Commit pushed to master at https://github.com/mozilla/socorro https://github.com/mozilla/socorro/commit/f90c71ccb0e20ee3b514bfaa25b80d91874be931 Bug 1265015 - Added a new crashstore to put subset of the json_dump in ES. (#3298) * Bug 1265015 - Added a new crashstore to put subset of the json_dump in ES. * small improvements from review
Commits pushed to master at https://github.com/mozilla/socorro https://github.com/mozilla/socorro/commit/60a6a48330833a265ed39999383e237a5a20cd15 Revert "Bug 1265015 - Added a new crashstore to put subset of the json_dump in ES." https://github.com/mozilla/socorro/commit/0d69e0f4365b61fdecf265d3d98f9a7909abeeda Merge pull request #3300 from mozilla/revert-3298-1265015-json-dump-parts-in-es Revert "Bug 1265015 - Added a new crashstore to put subset of the json_dump in ES."
Commits pushed to master at https://github.com/mozilla/socorro https://github.com/mozilla/socorro/commit/fdf5de670a9c7b49c96f290e3781f82dc671e5df Revert "Revert "Bug 1265015 - Added a new crashstore to put subset of the json_dump in ES."" https://github.com/mozilla/socorro/commit/b6d756814be576040076150cac95d2c58f7837a3 Merge pull request #3303 from mozilla/revert-3300-revert-3298-1265015-json-dump-parts-in-es Revert "Revert "Bug 1265015 - Added a new crashstore to put subset of the json_dump in ES.""
Changes to our stage configuration I have just applied:
> consulate kv set socorro/processor/destination.storage_classes 'socorro.external.postgresql.crashstorage.PostgreSQLCrashStorage, socorro.external.es.crashstorage.ESCrashStorageRedactedJsonDump, socorro.external.boto.crashstorage.BotoS3CrashStorage, socorro.external.statsd.crashstorage.StatsdCrashStorage'
> consulate kv set socorro/processor/destination.storage1.wrapped_object_class 'socorro.external.es.crashstorage.ESCrashStorageRedactedJsonDump'
I have been monitoring this for about 20 minutes now, and everything looks good. I'll check again in a few hours to make sure, but I'm confident. Fun fact: we were still storing the entire JSON dump in our stage Elasticsearch database.
If there ever is a problem, here is what you need to run to roll back the change:
> consulate kv set socorro/processor/destination.storage_classes 'socorro.external.postgresql.crashstorage.PostgreSQLCrashStorage, socorro.external.es.crashstorage.ESCrashStorage, socorro.external.boto.crashstorage.BotoS3CrashStorage, socorro.external.statsd.crashstorage.StatsdCrashStorage'
> consulate kv set socorro/processor/destination.storage1.wrapped_object_class 'socorro.external.es.crashstorage.ESCrashStorage'
(In reply to Jim Mathies [:jimm] from comment #11)
> can we get tracking-e10s added to this product?
Done (extended scope from socorro::general to all socorro components). In future it's best if you file a bug in bugzilla.mozilla.org::administration (I'm moving out of the bmo team, and a bug ensures a speedy response in cases where a single person is on PTO).
This has been turned on in our production system. From today, 14:30 UTC, those 3 values are available in Super Search: https://crash-stats.mozilla.com/search/?product=Firefox&date=%3E2016-04-26T14%3A29%3A25&_facets=write_combine_size&_facets=largest_free_vm_block&_facets=tiny_block_size&_columns=date&_columns=write_combine_size&_columns=largest_free_vm_block&_columns=tiny_block_size#crash-reports Benjamin, does that make this bug fixed?
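With the fields now live in Super Search, Benjamin's earlier question about pulling more than 100 crashes at a time could be answered with a small paging loop like the sketch below. It assumes the SuperSearch API's `_results_offset`/`_results_number` parameters; the paging logic takes a fetch callable so it can be exercised without hitting the network.

```python
def fetch_all_results(fetch_page, page_size=1000):
    """Collect every SuperSearch hit by paging through the result set.

    `fetch_page(offset, limit)` must return a decoded SuperSearch response
    dict with 'hits' (a list) and 'total' (an int) keys.
    """
    hits = []
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        hits.extend(page["hits"])
        offset += page_size
        if offset >= page["total"] or not page["hits"]:
            break
    return hits

def supersearch_page(offset, limit, params=None):
    """One live page from the API (requires the `requests` package).

    The extra query parameters (product, columns, date filters) would be
    passed in via `params`, mirroring the URLs quoted earlier in this bug.
    """
    import requests  # third-party HTTP client, imported lazily
    query = dict(params or {})
    query.update({"_results_offset": offset, "_results_number": limit})
    resp = requests.get("https://crash-stats.mozilla.com/api/SuperSearch/",
                        params=query)
    return resp.json()
```

To fetch everything, one would call `fetch_all_results(supersearch_page)` (or a `functools.partial` of it with the desired filter params).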
ddurst is taking the next step. I've verified that we're getting useful data at small scale. Other fields that would be helpful:
* shutdown progress
* ipc_channel_error
* user comments
Assignee: adrian → ddurst
To clarify, does dataset in the title of the bug refer to a dataset in the s.t.m.o tooling?
That wasn't strictly necessary but it's the best solution and it's what David is implementing.
Is this closed now due to the existence of crash_stats_oom_v1 in Presto?
Let me rephrase: this is now closed due to the existence of the crash_stats_oom_v1 table in Presto, thanks to rvitillo. The data is augmented daily via a Spark job.
Table structure:
uuid varchar
date timestamp
signature varchar
platform varchar
contains_memory_report boolean
oom_allocation_size bigint
system_memory_use_percentage bigint
total_virtual_memory bigint
available_virtual_memory bigint
total_page_file bigint
available_page_file bigint
total_physical_memory bigint
available_physical_memory bigint
largest_free_vm_block varchar
largest_free_vm_block_int bigint
tiny_block_size bigint
write_combine_size bigint
shutdown_progress varchar
ipc_channel_error varchar
user_comments varchar
submission varchar
This should have all the fields requested, plus I added 'largest_free_vm_block_int' (because I thought it might be useful to see that value as an int rather than hex, and easier to convert prior to SQL). Data in 'user_comments' has been modified, replacing \n with |. 'submission' is the partition key, and its format is dd/mm/yy.
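The hex-to-int conversion behind the 'largest_free_vm_block_int' column is a one-liner in Python; this small helper is only an illustration of the mapping between the two columns, not the actual Spark job code.

```python
def vm_block_to_int(value):
    """Convert a largest_free_vm_block hex string (e.g. '0x1fff0000')
    to an integer, mirroring the largest_free_vm_block_int column.

    Returns None for missing values so NULLs pass through unchanged.
    """
    if value is None:
        return None
    # int(..., 16) accepts the value with or without a '0x' prefix.
    return int(value, 16)
```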
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED