Closed Bug 1303555 Opened 8 years ago Closed 8 years ago

Add the "modules" field to the json_dump on Telemetry

Categories

(Socorro :: General, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: marco, Assigned: adrian)

Attachments

(1 file)

Like the "addons" field, it would be useful to have a way to query the modules.

This would make it possible to add DLLs to my correlation tool.
That data is in the json_dump field, and we don't store it in our Elasticsearch database. I'll need to weigh the cost of storing such data in ES (the disk space it will take) against the benefit it would bring. Note that such data will be available in Telemetry.
It would definitely be helpful for my correlation tool (currently I can't run correlations on DLLs). It would also be useful for manually analyzing the DLLs occurring in a signature and their versions. We currently have a script to do that, but it has to download every crash report to look at the modules. I've had to do that a few times over the last few weeks (Firefox is, sadly often, affected by crashes caused by external software).
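
For illustration only, here is a minimal sketch of the kind of per-signature module tally described above. It is not the actual script mentioned in the comment. It assumes each processed crash has already been downloaded as a dict, that the module list lives under `json_dump.modules`, and that each module entry carries `filename` and `version` keys; those two key names and the function name `count_modules` are assumptions made for this example.

```python
from collections import Counter


def count_modules(processed_crashes):
    """Count in how many crashes each (filename, version) pair appears.

    `processed_crashes` is an iterable of dicts. The "filename" and
    "version" keys on each module entry are assumed for this sketch.
    """
    counts = Counter()
    for crash in processed_crashes:
        # Tolerate crashes with no json_dump or no modules list.
        modules = (crash.get("json_dump") or {}).get("modules") or []
        # A set ensures a module listed twice in one crash counts once.
        seen = {(m.get("filename"), m.get("version")) for m in modules}
        counts.update(seen)
    return counts
```

Calling `count_modules(crashes).most_common(10)` on the crashes of one signature would then list the modules that show up most often. The expensive part, downloading every report, is exactly what querying the modules directly would avoid.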
(In reply to Marco Castelluccio [:marco] from comment #2)

We should sync up so I can introduce you to the crash data that is available in Telemetry (e.g. https://sql.telemetry.mozilla.org/queries/1194)

But, almost ironically, we don't have json_dump data in Telemetry either. The reason is a PR that has been reviewed but not merged yet. In a couple of days (I hope), you'll be able to write Presto SQL to query EVERYTHING that is in the json_dump, run really fancy queries, and draw conclusions from them.

HOWEVER, this is only enabled in stage. Meaning, since stage gets about 15% of the crashes sent to prod, the Presto database will only have about 15% of the prod crashes. 

I know this sounds like a "weird ask" (considering that it's not even ready yet) but the Socorro team really needs help to test that the crash data we send to Telemetry is good and sane. 

PS. I don't know how to write Presto SQL myself. 

PS2. We only started sending data from stage to telemetry a few short weeks ago and the json_dump should start arriving near the end of this week.
(In reply to Peter Bengtsson [:peterbe] from comment #3)

I think I can move my correlation tool to use the data that you upload to S3.
Do you know when it will be enabled on prod? When will the json_dump be complete (it doesn't seem to have 'modules' for now)?
(In reply to Marco Castelluccio [:marco] from comment #4)

This schema always defines exactly what we send to the Telemetry platform:
https://raw.githubusercontent.com/mozilla/socorro/master/socorro/schemas/crash_report.json

If modules isn't in there, we're not sending it. 

But Adrian might know why we're sending the json_dump blob but filtering out the list of modules.
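
To make the "if it isn't in the schema, we're not sending it" behaviour concrete, here is a rough sketch of schema-driven filtering. This is not Socorro's actual code. The function name `filter_by_schema` and the toy schema in the usage note are made up for the example, and it only handles the `properties` and `items` keywords of JSON Schema.

```python
def filter_by_schema(value, schema):
    """Drop every dict key that the JSON Schema does not declare.

    Only "properties" (objects) and "items" (arrays) are understood;
    anything else is passed through unchanged.
    """
    properties = schema.get("properties")
    if isinstance(value, dict) and isinstance(properties, dict):
        return {
            key: filter_by_schema(sub_value, properties[key])
            for key, sub_value in value.items()
            if key in properties  # undeclared keys are silently dropped
        }
    items = schema.get("items")
    if isinstance(value, list) and isinstance(items, dict):
        return [filter_by_schema(element, items) for element in value]
    return value
```

With a schema whose `json_dump` properties list `status` but not `modules`, a crash containing both keeps `status` and loses `modules`. Adding a `modules` entry to the schema is then enough for the field to pass through.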
Flags: needinfo?(adrian)
Assignee: nobody → adrian
Flags: needinfo?(adrian)
Thanks Adrian. I'm morphing this bug since we're not going to add the `modules` field to SuperSearch, but we're adding it to the crash reports on Telemetry.
Blocks: 1273657
Summary: Add a "modules" parameter to the SuperSearch API → Add the "modules" field to the json_dump on
Summary: Add the "modules" field to the json_dump on → Add the "modules" field to the json_dump on Telemetry
Blocks: 1311648
Commit pushed to master at https://github.com/mozilla/socorro

https://github.com/mozilla/socorro/commit/ceab46345de13dc21450b0953c2d06e85ccdeccd
Fixes bug 1303555 - Added modules to the JSON dump in our crash report schema. (#3541)
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
I have verified that modules now appear in our S3 store.
Status: RESOLVED → VERIFIED
Adrian, if I run 'describe socorro_crash' on re:dash I don't see the modules in json_dump.

Here's the output for the `json_dump` field:
row(crash_info row(address varchar, crashing_thread integer, type varchar), crashing_thread row(frames array(row(file varchar, frame integer, function varchar, function_offset varchar, line integer, module varchar, module_offset varchar, offset varchar)), threads_index integer, total_frames integer), largest_free_vm_block varchar, main_module integer, status varchar, system_info row(cpu_arch varchar, cpu_count integer, cpu_info varchar, os varchar, os_ver varchar), thread_count integer, threads array(row(frame_count integer, frames array(row(file varchar, frame integer, function varchar, function_offset varchar, line integer, module varchar, module_offset varchar, offset varchar)))), tiny_block_size varchar, write_combine_size varchar)
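
The type string above is hard to read by eye, so here is a small helper sketch for listing the direct fields of a Presto `row(...)` type as printed by `describe`. The helper name `top_level_fields` is made up for this example, and it assumes field names contain no spaces, commas, or parentheses.

```python
def top_level_fields(row_type):
    """Return the names of the direct fields of a Presto row(...) type string."""
    text = row_type.strip()
    if not (text.startswith("row(") and text.endswith(")")):
        raise ValueError("expected a row(...) type string")
    body = text[len("row("):-1]

    fields = []
    depth = 0   # nesting depth inside row(...) / array(...)
    start = 0   # index where the current field definition begins
    for index, char in enumerate(body):
        if char == "(":
            depth += 1
        elif char == ")":
            depth -= 1
        elif char == "," and depth == 0:
            # Only commas at depth 0 separate direct fields.
            fields.append(body[start:index])
            start = index + 1
    fields.append(body[start:])

    # Each field definition is "<name> <type>"; keep the name only.
    return [field.strip().split(" ", 1)[0] for field in fields if field.strip()]
```

Running `"modules" in top_level_fields(type_string)` on the `json_dump` type pasted above gives a quick yes/no answer on whether the table schema has picked up the new key.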
Flags: needinfo?(adrian)
I confirm the ``modules`` are in the data in our S3 telemetry bucket (verified on prod and stage). I suppose the problem comes from the other side of the stack. Peter, can I let you handle this?
Status: VERIFIED → REOPENED
Flags: needinfo?(adrian) → needinfo?(peterbe)
Resolution: FIXED → ---
Mark, 

I'll spare you having to read this bug. Here's the problem in bullet point format...

* We can't use "modules" in redash
* Here's a UUID that was processed Nov 2 that definitely has json_dump.modules https://crash-stats.allizom.org/api/ProcessedCrash/?crash_id=e499a1da-4169-4554-af85-247392161102
* Test with https://sql.telemetry.mozilla.org/queries/1591/source (see error when you Execute)
* json_dump.modules key was ADDED to crash_report.json 14 days ago https://github.com/mozilla/socorro/commit/ceab46345de13dc21450b0953c2d06e85ccdeccd

I'm almost done with the code that uploads crash_report.json into the same S3 bucket (awaiting code review) but perhaps you've cached the JSON schema too much or it's not "evolving" with new keys being added.
Flags: needinfo?(peterbe) → needinfo?(mreid)
The handling of "live" schema changes will land with bug 1314120. We can test that the modules are available in the WIP data in the meantime. 

Anthony, would you mind processing the data for 20161102 using the latest crash_report schema, and making sure that the modules for that particular crash show up?
Depends on: 1314120
Flags: needinfo?(mreid) → needinfo?(amiyaguchi)
Here's the notebook [1] that processes the data for 20161102. I've verified that the field `json_dump.modules` exists and have done a simple query on it.

[1] https://gist.github.com/acmiyaguchi/80b4cd75e9783a0e1a748105d8a83fe2
Flags: needinfo?(amiyaguchi)
It's working now.
I'm using it for correlations and in https://github.com/marco-c/missing_symbols/blob/master/modules-with-missing-symbols.ipynb.
Status: REOPENED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED