Closed Bug 1716611 Opened 3 years ago Closed 3 years ago

[siglist] Add __pthread_kill to the prefix list

Categories

(Socorro :: Signature, task, P2)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: smichaud, Assigned: willkg)

Attachments

(1 file)

__pthread_kill is called from pthread_kill. When this happens, the reported signature is [@ __pthread_kill | pthread_kill ], which is useless for diagnostic purposes (it should be longer).

Recent examples:

bp-54c86813-e07c-4e41-8642-729520210615
bp-5809e6c9-58fb-405c-840c-5e4e20210615

As of bug 1716603, macOS crash stacks are being reported more accurately, specifically the frames pertaining to macOS system calls. abort calls pthread_kill, which in turn calls __pthread_kill. Previously (and still in some cases), a crash stack would begin with __pthread_kill | abort, with the pthread_kill frame missing. The new (and more correct) behavior seems to be confusing Socorro.

As best I can tell, __pthread_kill is already in the skip list. I'm guessing it should also be in the prefix list. But maybe some other solution is needed.

For a while at least, Socorro will need to continue to also support the "inaccurate" crash stacks -- those that start with __pthread_kill | abort.

Depends on: 1716603
Blocks: 1716603
No longer depends on: 1716603
Crash Signature: [@ __pthread_kill | pthread_kill ]

As I mentioned above, pthread_kill can be called from abort. It can also be called from __abort or raise.

The number of crashes with this bug's "signature" is enormous. And as you can see from the search below, most of them should really have the signature [@ __pthread_kill | abort | gpusGenerateCrashLog.cold.1 ], or something like it.

https://crash-stats.mozilla.org/search/?signature=~__pthread_kill%20%7C%20pthread_kill&platform=Mac%20OS%20X&date=%3E%3D2021-06-10T15%3A17%3A00.000Z&date=%3C2021-06-17T15%3A17%3A00.000Z&_facets=signature&_facets=proto_signature&_sort=-date&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#facet-proto_signature
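
The search URL above is hard to read in its encoded form. A small sketch that decodes its filter parameters with the Python standard library (the `_sort` and `_columns` parameters are left out here for brevity; reading the leading `~` as a "contains" operator is an assumption about Crash Stats search syntax, not something stated in this bug):

```python
from urllib.parse import urlsplit, parse_qs

# The search URL from this comment, minus its _sort and _columns parameters.
SEARCH_URL = (
    "https://crash-stats.mozilla.org/search/"
    "?signature=~__pthread_kill%20%7C%20pthread_kill"
    "&platform=Mac%20OS%20X"
    "&date=%3E%3D2021-06-10T15%3A17%3A00.000Z"
    "&date=%3C2021-06-17T15%3A17%3A00.000Z"
    "&_facets=signature&_facets=proto_signature"
)

# parse_qs percent-decodes each value and groups repeated keys into lists.
params = parse_qs(urlsplit(SEARCH_URL).query)

for key, values in params.items():
    print(key, values)
```

Decoded, the search is restricted to Mac OS X crashes in a one-week window whose signature matches `__pthread_kill | pthread_kill`, faceted on both `signature` and `proto_signature`.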

For some reason, crashes with this bug's (false) signature don't show up in the list of top crashes. So if you look there, and in the "crash data" for bugs whose signatures include [@ __pthread_kill | abort | gpusGenerateCrashLog.cold.1 ], it seems like the number of these crashes has gone down dramatically over the last few days. They haven't!

This is very misleading, and at some point people will notice it. At that point it will become rather urgent to fix this bug.

The person who would normally handle this bug, Will Kahn-Greene, is on PTO for the next few weeks. Gabriele, is there someone else who can deal with it before then, if need be?

Flags: needinfo?(gsvelto)

Another problem is that, in lists where this bug's signature does show up, there's no reference to this bug, even though its "signature" field does include [@ __pthread_kill | pthread_kill ]:

https://crash-stats.mozilla.org/search/?platform=Mac%20OS%20X&date=%3E%3D2021-06-10T15%3A29%3A00.000Z&date=%3C2021-06-17T15%3A29%3A00.000Z&_facets=signature&_facets=proto_signature&_sort=-date&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#facet-signature

Edit: So not only is it difficult for people to find this bug, but we can't work around it by adding [@ __pthread_kill | pthread_kill ] to other bugs' signature fields.

See Also: → 1692399

For some reason, crashes with this bug's (false) signature don't show up in the list of top crashes.

It still doesn't show up in the list of Mac topcrashers on trunk. But now it does show up in the lists on beta and release. So I'll go ahead and add [@ __pthread_kill | pthread_kill ] to the most important bugs with [@ __pthread_kill | abort | gpusGenerateCrashLog.cold.1 ] already in their signature lists.

Edit: So now it's rather less urgent to fix this bug.

Flags: needinfo?(gsvelto)

(In reply to Steven Michaud [:smichaud] (Retired) from comment #3)

Another problem is that, in lists where this bug's signature does show up, there's no reference to this bug, even though its "signature" field does include [@ __pthread_kill | pthread_kill ]:

Now that [@ __pthread_kill | pthread_kill ] has been added to other bugs' signature lists, they do show up in these lists. I suppose the reason this bug doesn't show up is that it's not technically a "bug".

So once again, fixing this bug has become less urgent.

Grabbing this to look at this week.

Assignee: nobody → willkg
Status: NEW → ASSIGNED
Priority: -- → P2

So, __pthread_kill is in the prefix list already. Signature generation hits that frame and continues to the next one. pthread_kill is not in the prefix list. We can add that.

If I add pthread_kill to the prefix list, then signature generation will continue to the next frame. Here's an example run:

app@socorro:/app$ socorro-cmd signature ca9e27c5-0239-479c-a271-033e90210713
Crash id: ca9e27c5-0239-479c-a271-033e90210713
Original: __pthread_kill | pthread_kill
New:      __pthread_kill | pthread_kill | abort | gpusGenerateCrashLog.cold.1
Same?:    False
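
A minimal sketch of the prefix-list behavior described above, not Socorro's actual implementation. It assumes the prefix and skip lists are regular expressions matched against whole frame names, and that signature generation appends frames until it reaches the first one that is not a prefix. The function name, the trimmed lists, and the `hypothetical_caller` frame are all made up for illustration; that abort is a prefix is inferred from the example run, where the signature continues past it.

```python
import re


def generate_signature(frames, prefix_patterns, skip_patterns=()):
    """Join frames with " | ", stopping at the first non-prefix frame."""
    prefix_res = [re.compile(p) for p in prefix_patterns]
    skip_res = [re.compile(p) for p in skip_patterns]
    parts = []
    for frame in frames:
        if any(r.fullmatch(frame) for r in skip_res):
            continue  # skip-listed frames never appear in the signature
        parts.append(frame)
        if not any(r.fullmatch(frame) for r in prefix_res):
            break  # the first non-prefix frame ends the signature
    return " | ".join(parts)


# Hypothetical, heavily trimmed prefix lists (before and after the change).
OLD_PREFIXES = [r"__pthread_kill", r"abort"]
NEW_PREFIXES = OLD_PREFIXES + [r"pthread_kill"]

# The more accurate stack shape, and the older one that must keep working.
ACCURATE_STACK = ["__pthread_kill", "pthread_kill", "abort",
                  "gpusGenerateCrashLog.cold.1", "hypothetical_caller"]
LEGACY_STACK = ["__pthread_kill", "abort",
                "gpusGenerateCrashLog.cold.1", "hypothetical_caller"]

print(generate_signature(ACCURATE_STACK, OLD_PREFIXES))
print(generate_signature(ACCURATE_STACK, NEW_PREFIXES))
print(generate_signature(LEGACY_STACK, NEW_PREFIXES))
```

With the old list, generation stops at pthread_kill because it is the first frame that is not a prefix. Adding it lets generation continue through abort, and the older __pthread_kill | abort stacks are unaffected.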

Steven: Does that look like what you're going for?

Flags: needinfo?(smichaud)

Looks fine to me. Thanks!

Flags: needinfo?(smichaud)

When will this merge take effect? There are still crashes showing up with the signature [@ __pthread_kill | pthread_kill ]:

bp-cd4afdee-5f3c-496a-b93f-ecd7b0210715
bp-65a2f764-c633-4c05-9d76-50cbb0210715

It's on stage, but not on prod, yet. I've got some stuff to do before a prod deploy, so I'll do the prod deploy next week. When I do a prod deploy, I mark it in the comments in the bug and then mark the bug as FIXED.

I deployed this to prod in bug #1721613 just now.

I reprocessed cd4afdee-5f3c-496a-b93f-ecd7b0210715 and the signature changed to __pthread_kill | pthread_kill | abort | gpusGenerateCrashLog.cold.1.

Marking as FIXED.

Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED