Request for changes to Socorro signature filter for modules without symbols

Status

RESOLVED INCOMPLETE
9 years ago
7 years ago

People

(Reporter: smichaud, Unassigned)

Tracking

Trunk
All
Mac OS X

Firefox Tracking Flags

(Not tracked)

Details

The following "signatures" in stack traces tell very little about the
crashes with which they're associated.  (These correspond to commonly
used system libraries (on OS X), so crashes in them can have many
different causes.)

libclient.dylib@0xnnnnnnnn
libawt.jnilib@0xnnnnnnnn
libobjc.A.dylib@0xnnnnnnnn

I'd like to change the Socorro signature filter (specifically the
prefixSignatureRegEx variable) so that any stack trace beginning with
'libclient.dylib@', 'libawt.jnilib@' or 'libobjc.A.dylib@' is given a
compound signature.

https://bugzilla.mozilla.org/show_bug.cgi?id=540870#c10 tells me that
'libobjc.A.dylib@0x1568.' is already in prefixSignatureRegEx.  But
this is too specific -- I'd like it changed to 'libobjc.A.dylib@.'.

So, in summary, I'd like the following changes to
prefixSignatureRegEx:

1) Change 'libobjc.A.dylib@0x1568.' to 'libobjc.A.dylib@.'

2) Add 'libclient.dylib@.'

3) Add 'libawt.jnilib@.'

Thanks very much in advance!
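
To illustrate what a "compound signature" means in the request above, here is a minimal Python sketch. It is not Socorro's actual code: the function name build_signature, the " | " separator and the sample stack are assumptions made for illustration, and only the three library prefixes come from the request.

```python
import re

# Hypothetical stand-in for prefixSignatureRegEx, holding only the three
# entries requested above (dots escaped, any address accepted).
PREFIX_SIGNATURE_RE = re.compile(
    r"^(libobjc\.A\.dylib@.*|libclient\.dylib@.*|libawt\.jnilib@.*)$"
)


def build_signature(frames, prefix_re=PREFIX_SIGNATURE_RE, sep=" | "):
    """Join frames from the top of the stack, stopping after the first
    frame that does not match prefix_re."""
    parts = []
    for frame in frames:
        parts.append(frame)
        if not prefix_re.match(frame):
            break  # a non-prefix frame ends the signature
    return sep.join(parts)


# Made-up stack: two unsymbolized libobjc frames above symbolized ones.
stack = [
    "libobjc.A.dylib@0x15688",
    "libobjc.A.dylib@0xeca0",
    "nsAppShell::ProcessNextNativeEvent",
    "nsBaseAppShell::OnProcessNextEvent",
]
print(build_signature(stack))
```

Under these assumptions the signature keeps growing through the libobjc frames and ends at the first frame that carries a real symbol, rather than stopping at 'libobjc.A.dylib@0x15688'.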
(Reporter)

Updated

9 years ago
Blocks: 564625
my take on this:

from prefixSignatureRegEx in the processor configuration:
remove: "|libobjc.A.dylib@0x1568."
and then add: "|libobjc\.A\.dylib@.*|libclient\.dylib@.*|libawt\.jnilib@.*"
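
As a quick check of the fragment suggested above, here is a small Python sketch. It assumes the processor uses Python-style regular expressions and that the fragment is spliced into a larger alternation. The variable names and sample strings are illustrative only.

```python
import re

# The fragment suggested above, minus the leading '|' that joins it to the
# rest of prefixSignatureRegEx.
escaped = re.compile(
    r"libobjc\.A\.dylib@.*|libclient\.dylib@.*|libawt\.jnilib@.*"
)

# The same entries with the dots left unescaped, for comparison.
unescaped = re.compile(
    r"libobjc.A.dylib@.*|libclient.dylib@.*|libawt.jnilib@.*"
)

# All three library prefixes are accepted, whatever the address.
wanted = [
    "libobjc.A.dylib@0x1568c",
    "libclient.dylib@0x8b560",
    "libawt.jnilib@0x1234",
]
all_wanted_match = all(escaped.fullmatch(sig) for sig in wanted)

# An unescaped '.' matches any character, so a made-up name slips through
# the unescaped version but not the escaped one.
lookalike = "libobjcXAXdylib@0x1"
print(all_wanted_match)
print(bool(unescaped.fullmatch(lookalike)), bool(escaped.fullmatch(lookalike)))
```

Escaping the dots does not change which real library names match. It only stops the pattern from accepting unrelated names that happen to have other characters in those positions.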
(Reporter)

Comment 2

9 years ago
Sounds reasonable to me.

The current libobjc.A.dylib entry (in prefixSignatureRegEx) seems to
"work", even though it doesn't escape its '.'s.  But that's probably
just an accident.

http://crash-stats.mozilla.com/query/query?product=Firefox&version=ALL%3AALL&platform=mac&range_value=1&range_unit=weeks&date=05%2F12%2F2010+12%3A47%3A29&query_search=signature&query_type=contains&query=libobjc&build_id=&process_type=all&do_query=1
(Reporter)

Comment 3

9 years ago
Don't know if the trailing '*' is really necessary, though.
(Reporter)

Comment 4

9 years ago
> Don't know if the trailing '*' is really necessary, though.

Oops, I take that back.  Now I see that it *is* necessary.

The current libobjc.A.dylib entry ('libobjc.A.dylib@0x1568.') only
matches "signatures" of the form 'libobjc.A.dylib@0x1568N'.
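
The difference between the trailing '.' and the trailing '.*' can be seen in a short Python sketch. It assumes the filter has to match a whole frame signature (modeled here with re.fullmatch), which is an assumption about how prefixSignatureRegEx is applied. The helper name compare is made up for the example.

```python
import re

old_entry = re.compile(r"libobjc.A.dylib@0x1568.")  # current entry
new_entry = re.compile(r"libobjc\.A\.dylib@.*")     # suggested replacement


def compare(sig):
    """Return (matches_old, matches_new) for a whole-signature match."""
    return (
        old_entry.fullmatch(sig) is not None,
        new_entry.fullmatch(sig) is not None,
    )


for sig in (
    "libobjc.A.dylib@0x15688",  # 0x1568 plus one more character
    "libobjc.A.dylib@0x1568",   # nothing after 0x1568
    "libobjc.A.dylib@0xeca0",   # a different address
):
    print(sig, compare(sig))
```

With whole-signature matching, the old entry accepts only the first sample, while the suggested entry accepts all three.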
My concern about doing this wholesale for Mac OS X libraries is that we'll end up with signatures that have 5 frames or more of libobjc+address before we finally hit something else that will hopefully be useful.  I don't know if there are limits on the length of signatures or not, but if there are, we open ourselves up to a situation where we exhaust the signature length prepending unresymbolized OS library frames and never end up with a useful frame in the signature.

I think with libobjc we'll be OK, for the most part (though I do know first-hand that libobjc.A.dylib@0xeca0 [aka _objc_error] crashes typically have 2-3 frames before they switch to a framework or hit other code).  libclient crashes look like they could also be in this range.

But I think we should move carefully before we start expanding this to other (larger) unresymbolized Mac OS X core Cocoa and Carbon libraries and frameworks.  If we find certain high-frequency addresses in these other frameworks that have varied stacks below them, we should start by just setting those addresses to prepend (like we did for the handful of libobjc.A.dylib@0x1568* [objc_msgSend] frames two years ago), rather than jumping straight to "prepend any unresymbolized frame in this framework."
(Reporter)

Comment 6

9 years ago
> My concern about doing this wholesale for Mac OS X libraries is that
> we'll end up with signatures that have 5 frames or more of
> libobjc+address before we finally hit something else that will
> hopefully be useful.

That'll make things no worse off than they are now.

The "signatures" of the form 'library@0xnnnnnnnn' are almost
completely useless.  Occasionally we'll find that some of the crashes
at a particular address really do have something in common, but that's
very much the exception.  Even then, the "same" crash will have
different addresses on different platforms (PPC versus Intel) and
different OS versions.  (For example, bug 564625's crash at
libclient.dylib@0x8b560 happens only on OS X 10.5.8 PPC.)

I'm more concerned with the possible impact on Socorro performance of
my requested changes.

Please, let's take my requested changes as soon as possible.  Then, if
they don't cause performance problems, let's keep adding new compound
signatures until we have reasonably complete coverage of the more
commonly used system libraries.
Alternately someone could come up with a fix for bug 422527.
(Reporter)

Comment 8

9 years ago
> Alternately someone could come up with a fix for bug 422527.

But that bug's been open for two years now, and is unlikely to be fixed anytime soon.

Let's take my requested changes now.  They're easy to make ... and easy to back out if/when they're no longer needed.
Sure, I'm just noting that the underlying problem you're working around is "we don't have symbols for OS X system libraries".
And there's another problem:

Even a fix for bug 422527 wouldn't help with Java libraries (like libclient.dylib and libawt.jnilib), which have most of their symbols stripped.
Just to make clear -- by "my requested changes" I mean what Lars suggests in comment #1 (that is, "my requested changes as corrected by Lars").

Comment 12

9 years ago
This seems to align somewhat with my recent exchange with tomcat about Mac modules and skiplists.  He suggested [1] we discuss my question at the June 1 12:30pm PDT crashkill meeting. https://wiki.mozilla.org/CrashKill/2010-06-01 (don't see notes there yet)

[1] posted 2010-05-12 http://groups.google.com/group/mozilla.dev.quality/browse_thread/thread/6c057556c8df8c82 which sadly garnered no responses (perhaps I posted in the wrong newsgroup?) ...

Do we need skiplists to get stack sigs for some Mac crashes which correlate to windows crashes sigs, like ...

https://crash-stats.mozilla.com/report/index/383f6b03-28ff-4a8e-b616-0ed7c2100512
0    libxpcom_core.dylib    libxpcom_core.dylib@0x34c1   
1    thunderbird-bin    thunderbird-bin@0x9a9740   
2    thunderbird-bin    nsMsgCopyService::ClearRequest mailnews/base/src/nsMsgCopyService.cpp:216
3    thunderbird-bin    mailnews/base/src/nsMsgCopyService.cpp:664
4    thunderbird-bin    mailnews/imap/src/nsImapMailFolder.cpp:7960
5    thunderbird-bin    mailnews/imap/src/nsImapMailFolder.cpp:5256

https://crash-stats.mozilla.com/report/index/03c933cb-8fca-43c2-b6d9-958932100511
0        @0x1   
1    thunderbird-bin    thunderbird-bin@0x9a9740   
2    thunderbird-bin    nsMsgCopyService::ClearRequest mailnews/base/src/nsMsgCopyService.cpp:216
3    thunderbird-bin    thunderbird-bin@0x9a9921   
4    thunderbird-bin    nsMsgCopyService::Release mailnews/base/src/nsMsgCopyService.cpp:424
5    libxpcom_core.dylib    nsCOMPtr_base::assign_with_AddRef xpcom/glue/nsCOMPtr.h:456

https://crash-stats.mozilla.com/report/index/f7f7b024-6978-4a05-b937-7bd5f2100512
0        @0x0   
1    thunderbird-bin    thunderbird-bin@0x9a9740   
2    thunderbird-bin    nsMsgCopyService::ClearRequest mailnews/base/src/nsMsgCopyService.cpp:216
3    thunderbird-bin    nsMsgCopyService::NotifyCompletion mailnews/base/src/nsMsgCopyService.cpp:664
4    thunderbird-bin    nsImapMailFolder::OnCopyCompleted mailnews/imap/src/nsImapMailFolder.cpp:7960
Summary: Request for changes to Socorro signature filter → Request for changes to Socorro signature filter for modules without symbols
Wayne, I don't think it's appropriate to skip frames that are in binaries we are shipping, even if they don't have symbols. If they're legitimately missing symbols for some reason then it's either a bug (and we should fix it) or a crash that goes through a non-function address, which is perfectly valid.
(In reply to comment #13)
> crash that goes through a non-function address, which is perfectly valid.

I think in most cases it's this. We regularly have crashes in Camino where we're in some CSS code and a couple of frames are way-out-of-whack hex (bp-e19b7563-f00c-404b-8d4d-7defa2100531 is an example, though it's much worse than usual; you can see from the rest of the frames--as you can from the stacks in comment 12--that we do have symbols for those binaries); early on, smorgan looked at a couple of these and went through the effort to figure out what the values for the frames might be (I'm fuzzy on the details now), so unless we have one of these sorts of stacks that's not involving the CSS code, I don't worry too much. ;)
(Assignee)

Updated

7 years ago
Component: Socorro → General
Product: Webtools → Socorro

Updated

7 years ago
Component: General → Infra

Comment 15

7 years ago
This has been lingering for a long time and bug 422527 has been fixed in the meantime, so I'm resolving this. Please file new bugs for skiplist additions that need to be made, and be precise about what is needed there.
Status: NEW → RESOLVED
Last Resolved: 7 years ago
Resolution: --- → INCOMPLETE