Closed Bug 1559223 Opened 6 years ago Closed 6 years ago

Add ability to search crash reports by module

Tracking

(Not tracked)

Status:

RESOLVED FIXED

People

(Reporter: janet, Assigned: willkg)

Details

Attachments

(1 file)

pr 4996: bug 1559223, 1542964: add modules_in_stack to processed crash (Will Kahn-Greene [:willkg], 6 years ago; 53 bytes, text/x-github-pull-request)

janet [:janet]

Reporter

Description


6 years ago

With the impending release of Fenix, we want to be able to easily find and triage crashes reported to Socorro that are caused by the application services shared library.

At the moment, the only way for us to find these is to click through the search results, look at the stack trace of each report, and check whether libfenix.so (the name of the shared library produced by https://github.com/mozilla/application-services for Fenix) appears in the "Module" column of any frame on the crashing thread's stack.

Ideally, we'd be able to search crash reports by module. Alternatively, it would help to be able to filter for crashes originating directly in the module in question; that would likely still catch most of our problems, since we statically link almost everything.

Will Kahn-Greene [:willkg] ET needinfo? me

Assignee

Comment 1


6 years ago

There's a proto_signature field, but that's function names--not modules. There's bug #1542964 which covers something related--being able to reprocess crash reports where a specific module/debugid is in the stack. I think the two are likely related enough that they can be solved together.

Also related is that I'm reworking a bunch of the Elasticsearch code so we can upgrade from a really old Elasticsearch to a current one. That'll definitely affect what we can do search-wise.

Janet: Is this urgent? Could we live without it for a month? Pretend I don't know what the Fenix or application services schedules look like.

Flags: needinfo?(jdragojevic)

Will Kahn-Greene [:willkg] ET needinfo? me

Assignee

Comment 2


6 years ago

Making this a P2 so it's on my radar.

Priority: -- → P2

janet [:janet]

Reporter

Comment 3


6 years ago

I think a month or so seems perfectly reasonable. The big 2.0 release should be October, so if we are getting crashes and have some time before then to get the info we need to find and resolve them, that would be great.

janet [:janet]

Reporter

Updated


6 years ago

Flags: needinfo?(jdragojevic)

Will Kahn-Greene [:willkg] ET needinfo? me

Assignee

Comment 4


6 years ago

Grabbing this to work on now.

Assignee: nobody → willkg

Status: NEW → ASSIGNED

Will Kahn-Greene [:willkg] ET needinfo? me

Assignee

Comment 5


6 years ago

I ran out of time to do the Elasticsearch upgrade. We can't index all the frames for all the stacks for all the crash reports--there are too many problems with doing that and I wouldn't be able to get that to work in the next week. So I'm looking at ways to cheat that get you answers to the questions you want.

When you say "module", are you talking about the same kind of modules as the ones on this wiki page? https://wiki.mozilla.org/Modules/All

If so, what module are you working on and do you have any example crash reports where that module shows up in the stack?

Will Kahn-Greene [:willkg] ET needinfo? me

Assignee

Comment 6


6 years ago

Janet: I messed up. I see what you mean by module now. Sorry about that.

So, I'm thinking of doing something like this:

Write a processor rule that creates a modules_in_stack field. It'd walk the stack of the crashing thread and pull out modules. The value would be a semicolon-separated set of module/debugid strings. Since it's a set, a module that shows up twice in the stack only shows up once in the value.
Add modules_in_stack to the search fields using the semicolon_analyzer. Then each module/debugid would be a term. It should be searchable using begins with, ends with, contains, and the other string operators, and it should be facetable.
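A minimal sketch of the proposed processor rule, in Python since Socorro is written in Python. It assumes a processed-crash layout with `json_dump.crash_info.crashing_thread`, `json_dump.threads[].frames[].module`, and `json_dump.modules[].filename` / `debug_id`; those key names and the `modules_in_stack` function are assumptions for illustration, not the code that landed in PR 4996.

```python
from typing import Any, Dict


def modules_in_stack(processed_crash: Dict[str, Any]) -> str:
    """Return a ";"-separated set of "module/debugid" strings for the crashing thread.

    Hypothetical layout assumed here:
      json_dump.crash_info.crashing_thread -> index into json_dump.threads
      json_dump.threads[i].frames[j].module -> module filename
      json_dump.modules[k].filename / .debug_id -> module metadata
    """
    json_dump = processed_crash.get("json_dump", {})
    crashing_thread = json_dump.get("crash_info", {}).get("crashing_thread")
    threads = json_dump.get("threads", [])
    if crashing_thread is None or not 0 <= crashing_thread < len(threads):
        return ""

    # Map module filename -> debug id so each frame's module can be looked up.
    debug_ids = {
        module.get("filename"): module.get("debug_id", "")
        for module in json_dump.get("modules", [])
    }

    # A dict keeps insertion order, so it works as an ordered set: a module
    # that appears in several frames is recorded only once.
    seen: Dict[str, None] = {}
    for frame in threads[crashing_thread].get("frames", []):
        module = frame.get("module")
        if not module:
            continue  # skip frames with no module (e.g. unknown addresses)
        seen[f"{module}/{debug_ids.get(module, '')}"] = None

    return ";".join(seen)


# Illustrative input; the debug IDs are example values, not real data.
crash = {
    "json_dump": {
        "crash_info": {"crashing_thread": 0},
        "threads": [
            {"frames": [{"module": "libfenix.so"}, {"module": "libc.so"},
                        {"module": "libfenix.so"}, {}]}
        ],
        "modules": [
            {"filename": "libfenix.so", "debug_id": "ABCDEF123456778890"},
            {"filename": "libc.so", "debug_id": "037B12F7F23D7AD7A9262CB5A6ACCDA10"},
        ],
    }
}
print(modules_in_stack(crash))
# libfenix.so/ABCDEF123456778890;libc.so/037B12F7F23D7AD7A9262CB5A6ACCDA10
```

Deduplicating with an insertion-ordered dict keeps the output stable across reprocessing, which is handy when comparing values, while still behaving as the set described above.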

Then we should be able to do searches like:

find all crashes with module libfenix.so (this bug)
find all crashes with libc.so/037B12F7F23D7AD7A9262CB5A6ACCDA10 (bug #1542964)
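The term-level matching implied by these searches can be modeled in plain Python. This is a rough sketch, not Elasticsearch's actual behavior: `tokenize` stands in for a semicolon-splitting analyzer (the real semicolon_analyzer may normalize terms differently), and `matches` with its operator names is hypothetical. The placeholder debug ID `AAAA` is made up.

```python
from typing import List


def tokenize(modules_in_stack: str) -> List[str]:
    """Split a modules_in_stack value into terms, one per module/debugid."""
    return [term for term in modules_in_stack.split(";") if term]


def matches(modules_in_stack: str, operator: str, value: str) -> bool:
    """Return True if any single term satisfies the (hypothetical) operator."""
    terms = tokenize(modules_in_stack)
    if operator == "starts_with":
        return any(term.startswith(value) for term in terms)
    if operator == "ends_with":
        return any(term.endswith(value) for term in terms)
    if operator == "contains":
        return any(value in term for term in terms)
    if operator == "is":
        return value in terms
    raise ValueError(f"unknown operator: {operator}")


example = "libxul.so/AAAA;libfenix.so/ABCDEF123456778890"
print(matches(example, "starts_with", "libfenix.so"))  # True
print(matches(example, "is", "libfenix.so"))  # False: the term includes the debug ID
```

In this model, matching a module regardless of build needs a prefix match, while matching one specific build (the libc.so case) can use the exact module/debugid term. Including the trailing slash, as in `starts_with` "libfenix.so/", would also avoid matching a differently named module such as "libfenix.so.1".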

I looked at 11,000 crash reports across Fenix and Firefox.

median number of module/debugid items in the set: 3
mean number of module/debugid items in the set: 4.7
95th percentile of module/debugid items in the set: 13
max number of module/debugid items in the set: 19

If each module/debugid string is 100 characters, then this adds at most about 2k characters per crash report. I think this is ok. We get a lot of utility out of "reprocess all the crash reports with module/debugid"; that comes up periodically.
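The worst-case estimate works out as follows, taking the 19-item maximum from the sample, the assumed 100 characters per module/debugid string, and one ";" separator between adjacent items (assuming ASCII, so one byte per character):

$$
19 \times 100 + (19 - 1) \times 1 = 1918 \text{ characters} \approx 2\text{k}
$$

At the mean of 4.7 items, the same estimate is roughly $4.7 \times 100 + 3.7 \times 1 \approx 474$ characters per crash report.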

Lonnen, Peter, Adrian, John: What do you think? Is this a good-enough stop-gap fix to cover some use cases now? Does this sound like a bad idea based on experience with previous attempts at similar things?

Flags: needinfo?(peterbe)

Flags: needinfo?(jwhitlock)

Flags: needinfo?(chris.lonnen)

Flags: needinfo?(adrian)

Peter Bengtsson [:peterbe]

Updated


6 years ago

Flags: needinfo?(peterbe)

John Whitlock [:jwhitlock]

Comment 7


6 years ago


This sounds reasonable to me. I wasn't familiar with our semicolon_analyzer, but it is being used already for three other fields (app_init_dlls, topmost_filenames, and useragent_locale), so it should be safe to use it on a new fourth field.

Flags: needinfo?(jwhitlock)

Lonnen :lonnen

Comment 8


6 years ago

I think this sounds useful. If we're concerned about load we could introduce it behind a sample rate, and crank that sample rate up over the course of a week.
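One way to put the rule behind a sample rate, as suggested here, is to bucket each crash deterministically by its crash ID. This is a hypothetical sketch; `in_sample` and the CRC32 bucketing scheme are illustrative choices, not Socorro APIs.

```python
import zlib


def in_sample(crash_id: str, sample_rate: float) -> bool:
    """Return True if this crash falls inside the sample.

    sample_rate is a fraction in [0.0, 1.0]. Hashing the crash ID (rather
    than calling random()) keeps the decision stable, so reprocessing a
    crash at the same rate gives the same answer.
    """
    if sample_rate <= 0.0:
        return False
    if sample_rate >= 1.0:
        return True
    # crc32 is cheap and deterministic; map it onto 10,000 buckets.
    bucket = zlib.crc32(crash_id.encode("utf-8")) % 10_000
    return bucket < int(sample_rate * 10_000)
```

Because a crash's bucket never changes and the threshold only grows with the rate, any crash sampled at 10% is also sampled at 50%, so cranking the rate up over the course of a week (say 0.01, 0.1, 0.5, then 1.0) only ever adds crashes to the sample.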

Flags: needinfo?(chris.lonnen)

[DEACTIVATED] Adrian Gaudebert

Comment 9


6 years ago

I think this sounds quite fine to me. No alarms went up in my head while reading your description. :-)

Flags: needinfo?(adrian)

Will Kahn-Greene [:willkg] ET needinfo? me

Assignee

Comment 10


6 years ago

Attached file pr 4996: bug 1559223, 1542964: add modules_in_stack to processed crash

Will Kahn-Greene [:willkg] ET needinfo? me

Assignee

Comment 11


6 years ago

willkg merged PR #4996: "bug 1559223, 1542964: add modules_in_stack to processed crash" in 03ffe0f.

I just pushed it out to prod in 2019.07.26. I'll keep an eye on Elasticsearch memory usage.

I'm going to keep this open until Monday. We need to wait for a new Elasticsearch index to be created in prod before searching will work and I want to verify it then.
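Why searching has to wait: per this comment, the new field only becomes searchable once a new Elasticsearch index is created with the updated mapping. Assuming, for illustration, that indexes roll over weekly and are named from a strftime template like `socorro%Y%W` (the template and `index_for` are hypothetical here), a crash processed on 2019-07-26, a Friday, still lands in the existing week's index, and the first index with the new mapping starts on Monday:

```python
from datetime import date

# Assumption for illustration: one index per week, named with a strftime
# template where %W is the week of the year, with Monday as its first day.
INDEX_TEMPLATE = "socorro%Y%W"


def index_for(day: date) -> str:
    """Return the (assumed) weekly index name for crashes processed on `day`."""
    return day.strftime(INDEX_TEMPLATE)


print(index_for(date(2019, 7, 26)))  # socorro201929 (Friday: existing index)
print(index_for(date(2019, 7, 29)))  # socorro201930 (Monday: new index)
```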

Will Kahn-Greene [:willkg] ET needinfo? me

Assignee

Comment 12


6 years ago

The processor has created a new Elasticsearch index with the mapping we need, so "modules_in_stack" should be searchable for crashes received and processed today and going forward.

I'm using this query:

https://crash-stats.allizom.org/search/?modules_in_stack=%5Elibfenix.so&date=%3E%3D2019-07-22T00%3A00%3A00.000Z&date=%3C2019-07-29T23%3A59%3A00.000Z&_facets=signature&page=1&_sort=-date&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#facet-signature

It's a "starts_with" query because the term will be something like "libfenix.so/ABCDEF123456778890" and we need to match just the first part.
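The search URL above can be rebuilt from its parameters, which makes the `^` starts-with prefix and the date-range operators easier to see. This is a sketch using only the standard library; `build_search_url` is a hypothetical helper, and the parameter names and values are copied from the URL above.

```python
from urllib.parse import urlencode

SEARCH_URL = "https://crash-stats.allizom.org/search/"
COLUMNS = ("date", "signature", "product", "version", "build_id", "platform")


def build_search_url(prefix: str, start: str, end: str) -> str:
    """Build a search URL for crashes with a modules_in_stack term starting with `prefix`."""
    params = [
        # "^" marks a starts-with match, so "^libfenix.so" matches terms
        # like "libfenix.so/ABCDEF123456778890".
        ("modules_in_stack", "^" + prefix),
        ("date", ">=" + start),
        ("date", "<" + end),
        ("_facets", "signature"),
        ("page", "1"),
        ("_sort", "-date"),
    ]
    params += [("_columns", column) for column in COLUMNS]
    # urlencode percent-encodes "^", ">=", "<", and ":" (e.g. "^" -> "%5E").
    return SEARCH_URL + "?" + urlencode(params) + "#facet-signature"


print(build_search_url("libfenix.so", "2019-07-22T00:00:00.000Z", "2019-07-29T23:59:00.000Z"))
```

Passing "libxul.so" as the prefix gives the comparison search described in the next paragraph.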

That's currently bringing up no results. I think that's because no crash reports with "libfenix.so" in the stack have been submitted and processed in the last 12 hours. If we search for "libxul.so", it brings up tons of stuff. So while I can't verify the scenario underlying this bug, I'm pretty sure it should work fine once the requisite crash reports come in.

I'm going to mark it FIXED. If there are still issues, please reopen. Hope that helps!

Status: ASSIGNED → RESOLVED

Closed: 6 years ago

Resolution: --- → FIXED


Categories

(Socorro :: General, enhancement, P2)


Security

(public)
