Provide some way to get data equivalent to Ted's dump-lookup tool

Status: NEW
Assignee: Unassigned
Product: Socorro
Component: General
Reported: 6 years ago
Modified: 10 months ago

People

(Reporter: ted, Unassigned)

Details

(Reporter)

Description

6 years ago
I have this tool: http://hg.mozilla.org/users/tmielczarek_mozilla.com/dump-lookup/

You give it a minidump and it scans the stack of the crashing thread and prints everything that looks like it could possibly be a return address. For crashes with horrible stacks this can be very useful. See bug 817946 comment 6 for an example of the output.
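The scanning idea can be illustrated with a minimal sketch. This is not the dump-lookup tool's code: the function name `scan_stack`, the `(base, size, name)` module tuples, and the little-endian, word-aligned layout are all assumptions made for illustration. The real tool reads the stack and module list from the minidump and also resolves each hit to function and source info using symbols, which this sketch does not attempt.

```python
import struct
from bisect import bisect_right


def scan_stack(stack_bytes, modules, word_size=8):
    """Return (stack_offset, value, module_name, module_offset) for every
    aligned word on the stack whose value falls inside a loaded module.

    modules is a list of (base_address, size, name) tuples, assumed
    non-overlapping. Words are assumed to be little-endian.
    """
    modules = sorted(modules)
    bases = [base for base, _size, _name in modules]
    fmt = "<Q" if word_size == 8 else "<I"
    hits = []
    for offset in range(0, len(stack_bytes) - word_size + 1, word_size):
        (value,) = struct.unpack_from(fmt, stack_bytes, offset)
        # Find the last module whose base address is <= value.
        i = bisect_right(bases, value) - 1
        if i < 0:
            continue
        base, size, name = modules[i]
        if value < base + size:
            hits.append((offset, value, name, value - base))
    return hits
```

Every hit is only a candidate: a word that happens to point into a module may be stale data rather than a real return address, which is why the output is verbose.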

I run this occasionally locally, but it requires downloading the minidump and then downloading the matching symbols, which is quite a pain. It would be awesome if there was some way to have Socorro run this on-demand for me, since it has access to all these things already. I wouldn't want to run it automatically since it's not necessary in most cases, but providing the ability for logged-in users to click a button and get the output would be really handy.

Comment 1

6 years ago
There's nothing private about the results, right? As long as we hid it behind a POST so that webcrawlers didn't hose our server, we could probably do this on-demand in the middleware.
(Reporter)

Comment 2

6 years ago
No, the output is totally safe, it's just module, function, source info. I'd only be worried about people DOSing us.
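One way to address the crawler and DoS concerns is to accept only POST requests from logged-in users and to throttle each user. The sketch below is hypothetical: the class name `LookupRequestGate`, the 60-second interval, and the status codes chosen are assumptions, not anything that exists in Socorro.

```python
import time


class LookupRequestGate:
    """Decide whether a stack-lookup request should be accepted."""

    def __init__(self, min_interval_seconds=60.0, clock=time.monotonic):
        self._min_interval = min_interval_seconds
        self._clock = clock
        self._last_request = {}  # user -> time of last accepted request

    def check(self, method, user):
        """Return an HTTP-style status code for the request."""
        if method != "POST":
            return 405  # crawlers following links issue GETs
        if user is None:
            return 403  # only logged-in users may request a lookup
        now = self._clock()
        last = self._last_request.get(user)
        if last is not None and now - last < self._min_interval:
            return 429  # too many requests from this user
        self._last_request[user] = now
        return 202  # accepted; the lookup itself runs asynchronously
```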
This seems like a priority job for a processor. 

Does that sound ok to you, :lars?

Comment 4

5 years ago
We could hook it up to the raw dump tab in the webapp and run it on demand, so we'd need a middleware service.

The binary needs to run somewhere that has access to HBase and symbols. This could work via a priority job - I like that idea.

Once we do it we should probably save and cache the result, somewhere - PostgreSQL maybe?
Assignee: nobody → sdeckelmann
Target Milestone: --- → 55

Comment 5

5 years ago
We could also just hook this up so it's included in the default output of minidump-stackwalk, either right now or after we have JSON output. How's that JSON output coming?
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #5)
> We could also just hook this up so it's included in the default output of
> minidump-stackwalk, either right now or after we have JSON output. How's
> that JSON output coming?

Patch is on dev!  We've got about 800 raw crashes in there now. Will be on stage next week.
Need to revisit this -- unsure what I'm supposed to be doing on this.

Was the JSON raw_crash enough?
Target Milestone: 55 → 56
(Reporter)

Comment 8

5 years ago
If we switch to the JSON-producing minidump_stackwalk we could pretty easily include this output there, but it doesn't currently exist. I think laura assigned this to you to investigate the feasibility of just stuffing the output of this tool (a wall of text) into Postgres.
Target Milestone: 56 → 57

Comment 9

5 years ago
Perhaps this is an error, but according to "[tools-socorro] Socorro 57 Released" this landed in production for 57. Where is this in the UI? I'm looking at https://crash-stats.mozilla.com/report/index/327c0b39-6655-4ac6-adf6-96a112130829 for example.
Flags: needinfo?
It didn't land with 57.
Flags: needinfo?
Target Milestone: 57 → 58
Target Milestone: 58 → 59
I believe this depends on us turning json_minidump_stackwalk on. Happy to put it into the database once that's done.
Target Milestone: 59 → ---
(Reporter)

Comment 12

5 years ago
Sort of, in that we can shoehorn this data into the JSON output, but I don't know that we want to by default. This tool can be pretty verbose, and it's not necessary for 99+% of crashes.

Comment 13

5 years ago
How big is it? Certainly always including that data would be easier than reprocessing dumps to get the data later.

Comment 14

5 years ago
Well, I guess once Breakpad can flawlessly walk the frames without guessing, the processor could probably omit running this.
That said, we want processing to stay really fast, especially as we look forward to processing 100% of collected crashes to classify them so we can message back to users. (We would still put the full data in the DB for analysis for only 10% of release crashes, and store only the classification, for user feedback, for the rest.)

Comment 15

5 years ago
Here's my UI spec and suggested implementation of this:

* Do not run dump-lookup by default; run it on demand and store the results for a period of time.
* On the report/index page there should be a new tab "Stack Lookup"
* If a stack lookup is available, it should be displayed to everyone
* If a stack lookup has not been run but we still have the minidump, there should be a button for logged-in users "Request Stack Lookup"
* Requested stack lookups should be run asynchronously as a priority job
* Stack lookups should be saved in hbase in a new field such as processed_lookup:txt with an expiration of 90 days
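
The display rules in this spec can be sketched as a small decision function. The names `TabState`, `LOOKUP_TTL`, and `stack_lookup_tab_state` are hypothetical, and the explicit age check is only illustrative: with a storage-level expiration, an expired lookup would presumably just be absent.

```python
from datetime import datetime, timedelta
from enum import Enum

LOOKUP_TTL = timedelta(days=90)  # retention period from the spec


class TabState(Enum):
    SHOW_LOOKUP = "show_lookup"
    SHOW_REQUEST_BUTTON = "show_request_button"
    UNAVAILABLE = "unavailable"


def stack_lookup_tab_state(lookup_saved_at, minidump_available, logged_in, now):
    """Decide what the "Stack Lookup" tab shows for one crash report.

    lookup_saved_at is the datetime the lookup was stored, or None if no
    lookup has been run.
    """
    # A stored, unexpired lookup is shown to everyone.
    if lookup_saved_at is not None and now - lookup_saved_at < LOOKUP_TTL:
        return TabState.SHOW_LOOKUP
    # Otherwise logged-in users can request one while the minidump exists.
    if minidump_available and logged_in:
        return TabState.SHOW_REQUEST_BUTTON
    return TabState.UNAVAILABLE
```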

I realize this involves some moving parts and so it's not a trivial change, but this would help normal Mozilla engineers a lot.
Implementation of the backend of this could be sped up significantly if the output of "dump-lookup" could be saved in the processed_crash itself and share its retention policy.

I'm imagining this implementation:

  1) middleware has method to:
  1.1) fetch a raw_crash with a given 'crash_id'
  1.2) add a 'dump-lookup' flag to it
  1.3) resave the raw_crash
  1.4) put the 'crash_id' into the reprocessing queue


  2) processor
  2.1) reprocess normally with all the standard rules
  2.2) using a new rule, if 'dump-lookup' flag is present in the raw crash, invoke 'dump-lookup' tool and save results in a new 'stack-lookup' key in the processed crash
  2.3) save the processed_crash normally
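
The middleware and processor steps above can be sketched with in-memory stand-ins. Everything here is hypothetical: the `Pipeline` class, its dict-based storage, and the `run_dump_lookup` callable only model the flow and are not Socorro's actual classes or storage APIs. Only the 'dump-lookup' and 'stack-lookup' keys come from the proposal.

```python
from collections import deque


class Pipeline:
    """In-memory model of the proposed middleware and processor flow."""

    def __init__(self, run_dump_lookup, standard_rules=()):
        self.raw_crashes = {}        # crash_id -> raw_crash dict
        self.processed_crashes = {}  # crash_id -> processed_crash dict
        self.reprocessing_queue = deque()
        self._run_dump_lookup = run_dump_lookup  # stand-in for the tool
        self._standard_rules = list(standard_rules)

    def request_dump_lookup(self, crash_id):
        """Middleware side (steps 1.1 to 1.4)."""
        raw_crash = dict(self.raw_crashes[crash_id])  # 1.1 fetch
        raw_crash["dump-lookup"] = True               # 1.2 add the flag
        self.raw_crashes[crash_id] = raw_crash        # 1.3 resave
        self.reprocessing_queue.append(crash_id)      # 1.4 queue

    def process_next(self):
        """Processor side (steps 2.1 to 2.3)."""
        crash_id = self.reprocessing_queue.popleft()
        raw_crash = self.raw_crashes[crash_id]
        processed_crash = {"crash_id": crash_id}
        for rule in self._standard_rules:             # 2.1 standard rules
            rule(raw_crash, processed_crash)
        if raw_crash.get("dump-lookup"):              # 2.2 the new rule
            processed_crash["stack-lookup"] = self._run_dump_lookup(raw_crash)
        self.processed_crashes[crash_id] = processed_crash  # 2.3 save
        return processed_crash
```

Because the flag lives in the raw crash, any later reprocessing of the same crash would run the lookup again without a second request.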

In the UI:
   If "stack-lookup" is present in the processed crash, display it.

This same method can be used to solve Bug 977778.
Depends on: 1121462
Depends on: 1121469
Assignee: sdeckelmann → nobody

Updated

2 years ago
Flags: needinfo?(chris.lonnen)

Updated

2 years ago
Flags: needinfo?(chris.lonnen)

Comment 17

10 months ago
Discussing possible processor changes with the team, sec, etc. If we pursue this, we may need to consider more carefully which processes we run.