Closed Bug 1265015 Opened 5 years ago Closed 5 years ago

Create or confirm ability to generate datasets for e10s memory crashes

Categories

(Socorro :: Data request, task)

Type: task
Priority: Not set
Severity: normal

Tracking

(e10s+)

RESOLVED FIXED
Tracking Status
e10s + ---

People

(Reporter: benjamin, Assigned: ddurst)

References

Details

User Story

Filters:
crashes with release channel == "beta" and process type == "content"

Datapoints:
crash ID
signature
OS
OOMAllocationSize
HasMemoryReport
SystemMemoryUsePercentage
TotalVirtualMemory
AvailableVirtualMemory
TotalPageFile
AvailablePageFile
TotalPhysicalMemory
AvailablePhysicalMemory
largest_free_vm_block
tiny_block_size
write_combine_size
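The filters and datapoints above map naturally onto a SuperSearch API query. As a minimal sketch (assuming the public crash-stats endpoint; the exact set of supported column names is an assumption and would need to be checked against the API):

```python
from urllib.parse import urlencode

# Public crash-stats SuperSearch endpoint (assumed here; verify before use).
SUPERSEARCH_URL = "https://crash-stats.mozilla.org/api/SuperSearch/"

def build_query(columns, results_number=100, offset=0):
    """Build a SuperSearch query URL for beta-channel content crashes.

    `columns` are the datapoints to return per crash; each becomes a
    repeated `_columns` parameter.
    """
    params = [
        ("release_channel", "beta"),   # filter: release channel == "beta"
        ("process_type", "content"),   # filter: process type == "content"
        ("_results_number", results_number),
        ("_results_offset", offset),
    ]
    params += [("_columns", c) for c in columns]
    return SUPERSEARCH_URL + "?" + urlencode(params)

query = build_query(["uuid", "signature", "platform", "oom_allocation_size"])
```

The returned URL can then be fetched with any HTTP client; the response is JSON with a `hits` list.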

Attachments

(1 file)

Starting next Monday/Tuesday, we're going to release FF47b1, which will have extensive additional instrumentation for OOM issues in content crashes.

In order to measure things, we need a dataset that includes a bunch of different metrics, some of which may not currently be recorded in SuperSearch. I'm writing the proposed details in the user story and asking mccr8 and kanru to confirm the dataset and note any additional fields.
Flags: needinfo?(kchen)
Flags: needinfo?(continuation)
Blocks: e10s-oom
(Commenting on User Story)
> largest_free_vm_block
> tiny_block_size
> write_combine_size

We won't have these three because we don't have bug 1264242 on beta, and they're derived from data included in the minidump.
I'm not exactly sure what you are asking, but those all look like good measures to have for investigating OOM crashes.
Flags: needinfo?(continuation)
We're planning to uplift 1264242 for 47b1.
User Story: (updated)
As noted in comment 4, we can already access the listed data through SuperSearch. I think we are all set.

In bug 1258312 we changed some IPC error signatures to OOM. Some will have the 'OOM | large | mozalloc_abort | mozalloc_handle_oom | moz_xrealloc | Pickle::Resize' signature, and some will become 'OOM | small'. In both cases it would be great if we could aggregate them by IPDL Protocol and Message.
Flags: needinfo?(kchen)
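The aggregation requested above could be sketched as a simple counter over (protocol, message) pairs. This is a hypothetical sketch: `ipc_message_name` stands in for whatever crash annotation actually carries the `PProtocol::Msg_Name` string, and that field name is an assumption.

```python
from collections import Counter

def aggregate_by_ipdl(crashes):
    """Count OOM crashes grouped by IPDL protocol and message.

    Each crash is a dict; "ipc_message_name" is a placeholder for the
    real annotation holding a string like "PContent::Msg_Foo".
    """
    counts = Counter()
    for crash in crashes:
        name = crash.get("ipc_message_name") or "unknown"
        protocol, _, message = name.partition("::")
        counts[(protocol, message)] += 1
    return counts
```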
Adrian, can we get those three fields into SuperSearch by next Monday? largest_free_vm_block is especially important.

Is it possible to use SuperSearch to get this list of fields for large numbers of crashes (more than 100 at a time)? Or is there a script that will recombine a bunch of pages back into a single list?
Flags: needinfo?(adrian)
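Recombining paged results into a single list is straightforward with offset-based paging (SuperSearch exposes `_results_offset`/`_results_number`). A minimal sketch, with the actual HTTP call left as an injectable function:

```python
def fetch_all(fetch_page, page_size=1000):
    """Collect all hits by paging until a short page is returned.

    `fetch_page(offset, size)` should return the list of hits for one
    page -- a thin wrapper around the real SuperSearch HTTP call, which
    is omitted here.
    """
    hits, offset = [], 0
    while True:
        page = fetch_page(offset, page_size)
        hits.extend(page)
        if len(page) < page_size:  # last (possibly empty) page
            return hits
        offset += page_size
```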
By next Monday, I'm afraid it won't be possible. The only way I can see for it to happen that quickly would be for us to start indexing the entire json_dump again, and that's something we cannot afford (it would consume too much disk space in our Elasticsearch cluster). I have filed bug 1266099 to discuss possible solutions to this problem.

Right now, the only way to get those three fields is from S3. I don't know much about how to retrieve data from S3 though, so I'll forward this to Peter.
Flags: needinfo?(adrian) → needinfo?(peterbe)
Benjamin, I think I can help you script something to extract this from S3. But I know you're equally capable with Python in your hand. Ping me on IRC if you need any help.
Flags: needinfo?(peterbe)
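Once a processed-crash JSON has been fetched from S3, pulling out the three fields is a small dictionary walk. A hedged sketch: the S3 bucket/key layout is a Socorro deployment detail not shown in this bug, so only the extraction step is sketched, and the assumption here is that the fields sit at the top level of `json_dump`.

```python
import json

# The three fields this bug cares about.
FIELDS = ("largest_free_vm_block", "tiny_block_size", "write_combine_size")

def extract_memory_fields(processed_crash_json):
    """Pull the three memory fields from a processed-crash JSON string.

    Assumes the fields live at the top level of "json_dump"; adjust to
    the real dump layout if they are nested elsewhere.
    """
    crash = json.loads(processed_crash_json)
    dump = crash.get("json_dump", {})
    return {f: dump.get(f) for f in FIELDS}
```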
After talking with Lonnen, we decided to add those three keys you need to the processed crash. We are going to change the way we redact the json_dump to let just those three fields through. This way you should be able to get them via SuperSearch. That might serve as a stepping stone for bug 1266099.

It is very unlikely that it will be done by Monday though.
Assignee: nobody → adrian
Status: NEW → ASSIGNED
Can we get tracking-e10s added to this product?
Flags: needinfo?(glob)
Commit pushed to master at https://github.com/mozilla/socorro

https://github.com/mozilla/socorro/commit/f90c71ccb0e20ee3b514bfaa25b80d91874be931
Bug 1265015 - Added a new crashstore to put subset of the json_dump in ES. (#3298)

* Bug 1265015 - Added a new crashstore to put subset of the json_dump in ES.

* small improvements from review
Commits pushed to master at https://github.com/mozilla/socorro

https://github.com/mozilla/socorro/commit/60a6a48330833a265ed39999383e237a5a20cd15
Revert "Bug 1265015 - Added a new crashstore to put subset of the json_dump in ES."

https://github.com/mozilla/socorro/commit/0d69e0f4365b61fdecf265d3d98f9a7909abeeda
Merge pull request #3300 from mozilla/revert-3298-1265015-json-dump-parts-in-es

Revert "Bug 1265015 - Added a new crashstore to put subset of the json_dump in ES."
Commits pushed to master at https://github.com/mozilla/socorro

https://github.com/mozilla/socorro/commit/fdf5de670a9c7b49c96f290e3781f82dc671e5df
Revert "Revert "Bug 1265015 - Added a new crashstore to put subset of the json_dump in ES.""

https://github.com/mozilla/socorro/commit/b6d756814be576040076150cac95d2c58f7837a3
Merge pull request #3303 from mozilla/revert-3300-revert-3298-1265015-json-dump-parts-in-es

Revert "Revert "Bug 1265015 - Added a new crashstore to put subset of the json_dump in ES.""
Changes to our stage configuration I have just applied: 

> consulate kv set socorro/processor/destination.storage_classes 'socorro.external.postgresql.crashstorage.PostgreSQLCrashStorage, socorro.external.es.crashstorage.ESCrashStorageRedactedJsonDump, socorro.external.boto.crashstorage.BotoS3CrashStorage, socorro.external.statsd.crashstorage.StatsdCrashStorage'
> consulate kv set socorro/processor/destination.storage1.wrapped_object_class 'socorro.external.es.crashstorage.ESCrashStorageRedactedJsonDump'

I have been monitoring this for about 20 minutes now, everything looks good. I'll check again in a few hours to see if everything is fine, but I'm confident. Fun fact: we were still storing the entire JSON dump in our stage Elasticsearch database. 

If there ever is a problem, here is what you need to run to rollback the change: 

> consulate kv set socorro/processor/destination.storage_classes 'socorro.external.postgresql.crashstorage.PostgreSQLCrashStorage, socorro.external.es.crashstorage.ESCrashStorage, socorro.external.boto.crashstorage.BotoS3CrashStorage, socorro.external.statsd.crashstorage.StatsdCrashStorage'
> consulate kv set socorro/processor/destination.storage1.wrapped_object_class 'socorro.external.es.crashstorage.ESCrashStorage'
(In reply to Jim Mathies [:jimm] from comment #11)
> can we get tracking-e10s added to this product?

done (extended scope from socorro::general to all socorro components).

In future it's best if you file a bug in bugzilla.mozilla.org::administration (I'm moving out of the BMO team, and a bug ensures a speedy response in cases where a single person is on PTO).
Flags: needinfo?(glob)
tracking-e10s: --- → +
Flags: needinfo?(benjamin)
ddurst is taking the next step. I've verified that we're getting useful data in small scale.

Other fields that would be helpful:

* shutdown progress
* ipc_channel_error
* user comments
Assignee: adrian → ddurst
Flags: needinfo?(benjamin)
To clarify, does dataset in the title of the bug refer to a dataset in the s.t.m.o tooling?
Flags: needinfo?(benjamin)
Depends on: 1271450
That wasn't strictly necessary but it's the best solution and it's what David is implementing.
Flags: needinfo?(benjamin)
Is this closed now due to the existence of crash_stats_oom_v1 in Presto?
Flags: needinfo?(benjamin)
Let me rephrase:

This is now closed due to the existence of the crash_stats_oom_v1 table in Presto, thanks to rvitillo. The data is augmented daily via a Spark job.

Table structure:

uuid                           varchar
date                           timestamp
signature                      varchar
platform                       varchar
contains_memory_report         boolean
oom_allocation_size            bigint
system_memory_use_percentage   bigint
total_virtual_memory           bigint
available_virtual_memory       bigint
total_page_file                bigint
available_page_file            bigint
total_physical_memory          bigint
available_physical_memory      bigint
largest_free_vm_block          varchar
largest_free_vm_block_int      bigint
tiny_block_size                bigint
write_combine_size             bigint
shutdown_progress              varchar
ipc_channel_error              varchar
user_comments                  varchar
submission                     varchar

This should have all the fields requested, plus I added 'largest_free_vm_block_int' (I thought it might be useful to see that value as an int rather than hex, and easier to convert before it reaches SQL).

Data in 'user_comments' has been modified, replacing \n with |.

Submission is the partition key, and its format is dd/mm/yy.
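The relationship between the two largest_free_vm_block columns can be checked quickly: the varchar column holds the hex string as reported by the crash, and the derived bigint is simply that string parsed as base-16. A small sketch (the sample value is made up):

```python
def vm_block_to_int(hex_value):
    """Convert a hex string like '0x1f400000' to an integer byte count.

    This mirrors the derivation of 'largest_free_vm_block_int' from
    'largest_free_vm_block' described above.
    """
    return int(hex_value, 16)

# e.g. "0x1f400000" is 500 * 1048576 = 524288000 bytes (500 MiB)
size = vm_block_to_int("0x1f400000")
```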
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Flags: needinfo?(benjamin)
Resolution: --- → FIXED