Closed
Bug 1265015
Opened 9 years ago
Closed 9 years ago
Create or confirm ability to generate datasets for e10s memory crashes
Categories
(Socorro :: Data request, task)
Tracking
(e10s+)
RESOLVED
FIXED
Tracking | Status | |
---|---|---|
e10s | + | --- |
People
(Reporter: benjamin, Assigned: ddurst)
References
Details
User Story
Filters: crashes with release channel == "beta" and process type == "content"
Datapoints:
* crash ID
* signature
* OS
* OOMAllocationSize
* HasMemoryReport
* SystemMemoryUsePercentage
* TotalVirtualMemory
* AvailableVirtualMemory
* TotalPageFile
* AvailablePageFile
* TotalPhysicalMemory
* AvailablePhysicalMemory
* largest_free_vm_block
* tiny_block_size
* write_combine_size
Attachments
(1 file)
Starting next Monday/Tuesday, we're going to release FF47b1 which will have extensive additional instrumentation for OOM issues in content crashes.
In order to measure things, we need a dataset that includes a bunch of different metrics, some of which may not be currently recorded in supersearch. I'm writing proposed details in the user story and asking mccr8 and kanru to confirm the dataset and note any additional fields.
Flags: needinfo?(kchen)
Flags: needinfo?(continuation)
Comment 1•9 years ago
(Commenting on User Story)
> largest_free_vm_block
> tiny_block_size
> write_combine_size
We won't have these three because we don't have bug 1264242 on beta, and they're derived from data that is included in the minidump.
Comment 2•9 years ago
I'm not exactly sure what you are asking, but those all look like good measures to have for investigating OOM crashes.
Flags: needinfo?(continuation)
Reporter | ||
Comment 3•9 years ago
We're planning to uplift 1264242 for 47b1.
Reporter | ||
Updated•9 years ago
User Story: (updated)
Comment 4•9 years ago
Apart from the 3 fields Ted mentioned in comment#1, all of those are available through SuperSearch. For example:
https://crash-stats.mozilla.com/api/SuperSearch/?product=Firefox&_columns=uuid&_columns=signature&_columns=platform&_columns=oom_allocation_size&_columns=contains_memory_report&_columns=system_memory_use_percentage&_columns=total_virtual_memory&_columns=available_virtual_memory&_columns=total_page_file&_columns=available_page_file&_columns=total_physical_memory&_columns=available_physical_memory
Is that helpful? If not, can you be more precise about what you need?
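For reference, the query above can be assembled programmatically rather than by hand. This is a minimal sketch using only the Python standard library; the endpoint and column names are copied from the URL in this comment, while `build_supersearch_url` and its `extra_filters` argument are hypothetical helpers introduced here for illustration.

```python
from urllib.parse import urlencode

# Endpoint and column names are taken from the example URL in this comment.
BASE_URL = "https://crash-stats.mozilla.com/api/SuperSearch/"

COLUMNS = [
    "uuid",
    "signature",
    "platform",
    "oom_allocation_size",
    "contains_memory_report",
    "system_memory_use_percentage",
    "total_virtual_memory",
    "available_virtual_memory",
    "total_page_file",
    "available_page_file",
    "total_physical_memory",
    "available_physical_memory",
]


def build_supersearch_url(product, columns, extra_filters=None):
    """Build a SuperSearch URL with one repeated `_columns` parameter per column.

    `extra_filters` is a hypothetical hook for additional filter parameters;
    any filter names passed in must be checked against the SuperSearch docs.
    """
    params = [("product", product)]
    params.extend((extra_filters or {}).items())
    params.extend(("_columns", column) for column in columns)
    return BASE_URL + "?" + urlencode(params)


url = build_supersearch_url("Firefox", COLUMNS)
print(url)
```

Passing the parameters as a list of pairs (rather than a dict) is what allows `_columns` to repeat once per requested field.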
Comment 5•9 years ago
As comment 4 says, we can already access the listed data through SuperSearch. I think we are all set.
In bug 1258312 we changed some IPC error signatures to OOM. Some will have the 'OOM | large | mozalloc_abort | mozalloc_handle_oom | moz_xrealloc | Pickle::Resize' signature and some will become 'OOM | small'; in both cases it would be great if we could aggregate them by IPDL Protocol and Message.
Flags: needinfo?(kchen)
Reporter | ||
Comment 6•9 years ago
Adrian, can we get those three fields into supersearch by next Monday? largest_free_vm_block is especially important.
Is it possible to use supersearch to get this list of fields for large numbers of crashes (more than 100 at a time)? Or a script that will re-combine a bunch of pages back into a single list?
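The second option (recombining pages into a single list) could look roughly like the sketch below. It assumes SuperSearch accepts `_results_number` and `_results_offset` paging parameters and returns a JSON object with a `hits` list; those names are assumptions that should be checked against the API documentation, and `fetch_page` / `fetch_all_hits` are hypothetical helpers, not part of Socorro.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_URL = "https://crash-stats.mozilla.com/api/SuperSearch/"


def fetch_page(params):
    """Fetch one page of results over HTTP and decode the JSON body."""
    with urlopen(API_URL + "?" + urlencode(params)) as response:
        return json.load(response)


def fetch_all_hits(base_params, page_size=100, max_hits=1000, fetch=fetch_page):
    """Collect hits across pages until a page comes back empty or max_hits is reached.

    Assumption: `_results_number` and `_results_offset` control paging and the
    response carries its rows under the "hits" key.
    """
    hits = []
    offset = 0
    while len(hits) < max_hits:
        params = list(base_params) + [
            ("_results_number", page_size),
            ("_results_offset", offset),
        ]
        page = fetch(params).get("hits", [])
        if not page:
            break  # no more results
        hits.extend(page)
        offset += len(page)
    return hits[:max_hits]
```

A call such as `fetch_all_hits([("product", "Firefox"), ("_columns", "uuid")])` would then return one flat list. The `fetch` argument is injectable so the paging logic can be exercised without touching the network.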
Flags: needinfo?(adrian)
Comment 7•9 years ago
By next Monday, I'm afraid it won't be possible. The only way I can see for it to happen that quickly would be for us to start indexing the entire json_dump again, and that's something we cannot afford (it would consume too much disk space in our Elasticsearch cluster). I have filed bug 1266099 to discuss possible solutions to this problem.
Right now, the only way to get those three fields is from S3. I don't know much about how to retrieve data from S3 though, so I'll forward this to Peter.
Flags: needinfo?(adrian) → needinfo?(peterbe)
Comment 8•9 years ago
Benjamin, I think I can help you script something to extract from S3 and stuff. But I know you're equally capable with Python in your hand. Ping me on IRC if you need any help.
Flags: needinfo?(peterbe)
Comment 9•9 years ago
After talking with Lonnen, we decided to add those three keys you need to the processed crash. We are going to change the way we redact the json_dump to let just those three fields pass by. This way you should be able to get them via SuperSearch. That might serve as a stepping stone for bug 1266099.
It is very unlikely that it will be done by Monday though.
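The redaction idea described above amounts to a key whitelist. The following is only an illustrative sketch, not the Socorro implementation: it assumes the three fields sit at the top level of the json_dump, which this bug does not confirm, and `redact_json_dump` is a hypothetical name.

```python
# The three fields requested in this bug.
ALLOWED_KEYS = ("largest_free_vm_block", "tiny_block_size", "write_combine_size")


def redact_json_dump(json_dump, allowed_keys=ALLOWED_KEYS):
    """Return a new dict holding only the whitelisted keys that are present.

    Assumption: the keys live at the top level of json_dump. Everything else
    is dropped, so the bulk of the dump never reaches the search index.
    """
    return {key: json_dump[key] for key in allowed_keys if key in json_dump}
```

Keys missing from a given dump are simply skipped, so crashes without the new instrumentation still index cleanly.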
Assignee: nobody → adrian
Status: NEW → ASSIGNED
Comment 10•9 years ago
Comment 12•9 years ago
Commit pushed to master at https://github.com/mozilla/socorro
https://github.com/mozilla/socorro/commit/f90c71ccb0e20ee3b514bfaa25b80d91874be931
Bug 1265015 - Added a new crashstore to put subset of the json_dump in ES. (#3298)
* Bug 1265015 - Added a new crashstore to put subset of the json_dump in ES.
* small improvements from review
Comment 13•9 years ago
Commits pushed to master at https://github.com/mozilla/socorro
https://github.com/mozilla/socorro/commit/60a6a48330833a265ed39999383e237a5a20cd15
Revert "Bug 1265015 - Added a new crashstore to put subset of the json_dump in ES."
https://github.com/mozilla/socorro/commit/0d69e0f4365b61fdecf265d3d98f9a7909abeeda
Merge pull request #3300 from mozilla/revert-3298-1265015-json-dump-parts-in-es
Revert "Bug 1265015 - Added a new crashstore to put subset of the json_dump in ES."
Comment 14•9 years ago
Commits pushed to master at https://github.com/mozilla/socorro
https://github.com/mozilla/socorro/commit/fdf5de670a9c7b49c96f290e3781f82dc671e5df
Revert "Revert "Bug 1265015 - Added a new crashstore to put subset of the json_dump in ES.""
https://github.com/mozilla/socorro/commit/b6d756814be576040076150cac95d2c58f7837a3
Merge pull request #3303 from mozilla/revert-3300-revert-3298-1265015-json-dump-parts-in-es
Revert "Revert "Bug 1265015 - Added a new crashstore to put subset of the json_dump in ES.""
Comment 15•9 years ago
Changes to our stage configuration I have just applied:
> consulate kv set socorro/processor/destination.storage_classes 'socorro.external.postgresql.crashstorage.PostgreSQLCrashStorage, socorro.external.es.crashstorage.ESCrashStorageRedactedJsonDump, socorro.external.boto.crashstorage.BotoS3CrashStorage, socorro.external.statsd.crashstorage.StatsdCrashStorage'
> consulate kv set socorro/processor/destination.storage1.wrapped_object_class 'socorro.external.es.crashstorage.ESCrashStorageRedactedJsonDump'
I have been monitoring this for about 20 minutes now, everything looks good. I'll check again in a few hours to see if everything is fine, but I'm confident. Fun fact: we were still storing the entire JSON dump in our stage Elasticsearch database.
If there is ever a problem, here is what you need to run to roll back the change:
> consulate kv set socorro/processor/destination.storage_classes 'socorro.external.postgresql.crashstorage.PostgreSQLCrashStorage, socorro.external.es.crashstorage.ESCrashStorage, socorro.external.boto.crashstorage.BotoS3CrashStorage, socorro.external.statsd.crashstorage.StatsdCrashStorage'
> consulate kv set socorro/processor/destination.storage1.wrapped_object_class 'socorro.external.es.crashstorage.ESCrashStorage'
Comment 16•9 years ago
(In reply to Jim Mathies [:jimm] from comment #11)
> can we get tracking-e10s added to this product?
done (extended scope from socorro::general to all socorro components).
in future it's best if you file a bug in bugzilla.mozilla.org::administration (i'm moving out of the bmo team, and a bug ensures a speedy response in cases where a single person is on pto).
Flags: needinfo?(glob)
Comment 17•9 years ago
This has been turned on in our production system. From today, 14:30 UTC, those 3 values are available in Super Search: https://crash-stats.mozilla.com/search/?product=Firefox&date=%3E2016-04-26T14%3A29%3A25&_facets=write_combine_size&_facets=largest_free_vm_block&_facets=tiny_block_size&_columns=date&_columns=write_combine_size&_columns=largest_free_vm_block&_columns=tiny_block_size#crash-reports
Benjamin, does that make this bug fixed?
Updated•9 years ago
tracking-e10s:
--- → +
Flags: needinfo?(benjamin)
Reporter | ||
Comment 18•9 years ago
ddurst is taking the next step. I've verified that we're getting useful data at small scale.
Other fields that would be helpful:
* shutdown progress
* ipc_channel_error
* user comments
Assignee: adrian → ddurst
Flags: needinfo?(benjamin)
Comment 19•9 years ago
To clarify, does "dataset" in the title of the bug refer to a dataset in the s.t.m.o tooling?
Flags: needinfo?(benjamin)
Reporter | ||
Comment 20•9 years ago
That wasn't strictly necessary but it's the best solution and it's what David is implementing.
Flags: needinfo?(benjamin)
Assignee | ||
Comment 21•9 years ago
Is this closed now due to the existence of crash_stats_oom_v1 in Presto?
Flags: needinfo?(benjamin)
Assignee | ||
Comment 22•9 years ago
Let me rephrase:
This is now closed due to the existence of the crash_stats_oom_v1 table in Presto, thanks to rvitillo. The data is augmented daily via a spark job.
Table structure:
uuid varchar
date timestamp
signature varchar
platform varchar
contains_memory_report boolean
oom_allocation_size bigint
system_memory_use_percentage bigint
total_virtual_memory bigint
available_virtual_memory bigint
total_page_file bigint
available_page_file bigint
total_physical_memory bigint
available_physical_memory bigint
largest_free_vm_block varchar
largest_free_vm_block_int bigint
tiny_block_size bigint
write_combine_size bigint
shutdown_progress varchar
ipc_channel_error varchar
user_comments varchar
submission varchar
This should have all the fields requested, plus I added 'largest_free_vm_block_int' (just because I thought it might be useful to see that as an int rather than hex and easier to convert prior to SQL).
Data in 'user_comments' have been modified, replacing \n with |.
Submission is the partition key, and its format is dd/mm/yy.
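For anyone reproducing these derived columns outside the Spark job, the three transformations described above can be sketched as follows. This is an illustration only, not the job's actual code; the function names are hypothetical, and it assumes 'largest_free_vm_block' arrives as a hex string (with or without a `0x` prefix).

```python
from datetime import date


def hex_to_int(value):
    """Convert a hex string to an int, as for largest_free_vm_block_int.

    int(..., 16) accepts both "0x..." and bare hex digits; empty or missing
    values pass through as None.
    """
    if value is None or value == "":
        return None
    return int(value, 16)


def flatten_comment(text):
    """Replace newlines with '|', matching the user_comments note above."""
    return None if text is None else text.replace("\n", "|")


def submission_key(day):
    """Format a date as dd/mm/yy, the stated format of the partition key."""
    return day.strftime("%d/%m/%y")


print(hex_to_int("0x1f400000"))          # 524288000
print(flatten_comment("line one\nline two"))
print(submission_key(date(2016, 4, 26)))  # 26/04/16
```

Note that dd/mm/yy strings do not sort chronologically, so range filters on the partition key need care.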
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Flags: needinfo?(benjamin)
Resolution: --- → FIXED