Closed
Bug 1402037
Opened 7 years ago
Closed 7 years ago
Improve granularity of shutdownhang signatures
Categories
(Socorro :: Backend, task, P1)
Socorro
Backend
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: jimm, Assigned: willkg)
References
()
Details
Attachments
(3 files)
Calls to add:
0 ntdll.dll KiFastSystemCallRet
1 ntdll.dll ZwWaitForKeyedEvent
2 ntdll.dll RtlSleepConditionVariableCS
3 kernel32.dll SleepConditionVariableCS
Assignee | ||
Comment 1•7 years ago
|
||
Oh wow--that wiki page is super out of date.
To clarify, you want to add these four strings to the prefix list which causes signature generation to continue to the next frame. Is that right?
Is this a good example crash?: https://crash-stats.mozilla.com/report/index/1a796172-3898-4f5b-a699-4da0c0170514
Flags: needinfo?(jmathies)
Reporter | ||
Comment 2•7 years ago
|
||
(In reply to Will Kahn-Greene [:willkg] ET needinfo? me from comment #1)
> Oh wow--that wiki page is super out of date.
>
> To clarify, you want to add these four strings to the prefix list which
> causes signature generation to continue to the next frame. Is that right?
>
> Is this a good example crash?:
> https://crash-stats.mozilla.com/report/index/1a796172-3898-4f5b-a699-
> 4da0c0170514
I believe so yes. In addition to the first list -
KiFastSystemCallRet
ZwWaitForKeyedEvent
RtlSleepConditionVariableCS
SleepConditionVariableCS
from bug 1389021, lets also add:
NtWaitForAlertByThreadId -
https://crash-stats.mozilla.com/report/index/866ff980-2871-4ed9-8b7f-3a5ec0170921
NtWaitForKeyedEvent -
https://crash-stats.mozilla.com/report/index/d7a8fc7d-06b5-4001-99cf-278190170921
ZwWaitForAlertByThreadId -
https://crash-stats.mozilla.com/report/index/9acb7c15-10b5-4a58-a439-bfbaa0170921
Ted, would you mind sanity checking these? Benjamin would normally looked these over. :/
Flags: needinfo?(jmathies) → needinfo?(ted)
Assignee | ||
Comment 3•7 years ago
|
||
We have a fancy-schmantzy new signature generation tester, so I can show you what happens to signatures for x crashes for each string you want to try if that helps. It's super easy.
Assignee | ||
Comment 4•7 years ago
|
||
Looking at the prefix list, we have a .*WaitFor.* rule which should cover:
ZwWaitForKeyedEvent
NtWaitForAlertByThreadId
NtWaitForKeyedEvent
ZwWaitForAlertByThreadId
We also have an Rtl.* rule which should cover:
RtlSleepConditionVariableCS
That leaves:
KiFastSystemCallRet
SleepConditionVariableCS
Crash d7a8fc7d-06b5-4001-99cf-278190170921 is stopped on NtQueryPerformanceCounter which isn't in the list of things to add above.
Adding SleepConditionVariableCS should fix these crashes that you mentioned:
https://crash-stats.mozilla.com/report/index/866ff980-2871-4ed9-8b7f-3a5ec0170921
https://crash-stats.mozilla.com/report/index/9acb7c15-10b5-4a58-a439-bfbaa0170921
https://crash-stats.mozilla.com/report/index/1a796172-3898-4f5b-a699-4da0c0170514
Here's a super search for the last 6 months for KiFastSystemCallRet:
https://crash-stats.mozilla.com/search/?signature=~KiFastSystemCallRet&product=Firefox&date=%3E%3D2017-03-21T23%3A39%3A15.000Z&date=%3C2017-09-21T23%3A39%3A15.000Z&_sort=-date&_facets=signature&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#facet-signature
One group of these has a single frame which won't be affected by these changes. Like this crash:
https://crash-stats.mozilla.com/report/index/fb2c2169-65b6-436a-9c9c-597c20170917
The other group are all shutdown hangs and the ones I looked at have three frames, so those will improve. Like this crash:
https://crash-stats.mozilla.com/report/index/d57ed082-7de4-4b5e-ac2d-4f9350170802
So I suggest only adding these two:
KiFastSystemCallRet
SleepConditionVariableCS
Comment 5•7 years ago
|
||
I would suggest putting KiFastSystemCallRet in irrelevant_signature_re.txt. It doesn't really tell you anything of value, it's just "this thread is in a system call".
Flags: needinfo?(ted)
Comment 6•7 years ago
|
||
(In reply to Will Kahn-Greene [:willkg] ET needinfo? me from comment #1)
> Oh wow--that wiki page is super out of date.
I replaced it with a link to the Socorro signature generation docs.
Reporter | ||
Comment 7•7 years ago
|
||
(In reply to Ted Mielczarek [:ted.mielczarek] (Away Sep 23rd-Oct 1st) from comment #5)
> I would suggest putting KiFastSystemCallRet in irrelevant_signature_re.txt.
> It doesn't really tell you anything of value, it's just "this thread is in a
> system call".
sound good to me, over to Will.
Assignee | ||
Comment 8•7 years ago
|
||
Assignee | ||
Comment 9•7 years ago
|
||
KiFastSystemCallRet was already in the irrelevant list, so I did a PR for adding SleepConditionVariableCS to the prefix list.
Jim, Ted: Does this output look like what you were hoping for?:
https://github.com/mozilla-services/socorro/pull/4013#issuecomment-331439138
Flags: needinfo?(ted)
Flags: needinfo?(jmathies)
Reporter | ||
Comment 10•7 years ago
|
||
(In reply to Will Kahn-Greene [:willkg] ET needinfo? me from comment #9)
> KiFastSystemCallRet was already in the irrelevant list, so I did a PR for
> adding SleepConditionVariableCS to the prefix list.
>
> Jim, Ted: Does this output look like what you were hoping for?:
>
> https://github.com/mozilla-services/socorro/pull/4013#issuecomment-331439138
What does this warning mean?
"SignatureTool: signature truncated due to length"
Are we losing granularity here due to the length of the resulting sagnatures?
Flags: needinfo?(willkg)
Flags: needinfo?(ted)
Flags: needinfo?(jmathies)
Assignee | ||
Comment 11•7 years ago
|
||
It means that the signature length exceeded 255 characters and the signature generation system truncated it.
Yes, we're definitely losing granularity by truncating, but I think beyond 255 characters, the granularity is too small for bucketing to be useful.
If you see other things we can do to make those signatures better, it's definitely worth opening up bugs for those changes.
Does that output look like what you were hoping for?
Flags: needinfo?(willkg) → needinfo?(jmathies)
Reporter | ||
Comment 12•7 years ago
|
||
(In reply to Will Kahn-Greene [:willkg] ET needinfo? me from comment #11)
> It means that the signature length exceeded 255 characters and the signature
> generation system truncated it.
>
> Yes, we're definitely losing granularity by truncating, but I think beyond
> 255 characters, the granularity is too small for bucketing to be useful.
>
> If you see other things we can do to make those signatures better, it's
> definitely worth opening up bugs for those changes.
>
> Does that output look like what you were hoping for?
Don't think it'll do any good as is, we end up with generic signatures here due to the truncation. Can we force the removal of some of these frames to shorten things up? For example:
nsEventQueue::GetEvent
nsThread::nsChainedEventQueue::GetEvent
Currently we end up with:
(generic frames) | +
nsEventQueue::GetEvent | nsThread::nsChainedEventQueue::GetEvent | nsThread::GetEvent ...
nsEventQueue::GetEvent | nsThread::nsChainedEventQueue::GetEvent | nsThread::GetEvent ...
nsEventQueue::GetEvent | nsThread::nsChainedEventQueue::GetEvent | nsThread::GetE...
nsEventQueue::GetEvent | nsThread::nsChainedEventQueue::GetEvent | nsThread::GetEvent ...
nsEventQueue::GetEvent | nsThread::nsChainedEventQueue::GetEvent | nsThread::GetEvent ...
nsEventQueue::GetEvent | nsThread::nsChainedEventQueue::GetEvent | nsThread::GetE...
Flags: needinfo?(jmathies)
Assignee | ||
Comment 13•7 years ago
|
||
I think adding SleepConditionVariableCS to the prefix list is better, but the signatures still aren't helpful.
I'm going to land my PR and get that pushed out. I'm also going to change the scope of this bug to "make shutdownhang signatures better".
I'll keep working on this today. I want to look at shutdownhang bugs and get a feel for what they look like and patterns and a better handle on "what is the interesting part of the stack that makes these actionable".
Summary: [skiplist] Add SleepConditionVariableCS related frames to the skiplist → make shutdownhang signatures better
Comment 14•7 years ago
|
||
Commit pushed to master at https://github.com/mozilla-services/socorro
https://github.com/mozilla-services/socorro/commit/7414176075dd19a6d0db755b02f3b120ba8a49f1
fixes bug 1402037 - add SleepConditionVariableCS to prefix list (#4013)
Updated•7 years ago
|
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Assignee | ||
Comment 15•7 years ago
|
||
The change for SleepConditionVariableCS went to -prod an hour ago.
I'm almost out of time for today. I'll get to more shutdownhang work on Monday.
Assignee | ||
Comment 16•7 years ago
|
||
Reopening this because it's not done, yet. Sorry about that.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Assignee | ||
Updated•7 years ago
|
Assignee: nobody → willkg
Status: REOPENED → ASSIGNED
Reporter | ||
Comment 18•7 years ago
|
||
We're still missing soem granularity here, maybe we can do something to address it. For example this signature:
shutdownhang | NtWaitForAlertByThreadId | RtlSleepConditionVariableCS | SleepConditionVariableCS | mozilla::detail::ConditionVariableImpl::wait | mozilla::CondVar::Wait | mozilla::ThreadEventQueue<T>::GetEvent
https://crash-stats.mozilla.com/signature/?product=Firefox&version=58.0a1&signature=shutdownhang%20%7C%20NtWaitForAlertByThreadId%20%7C%20RtlSleepConditionVariableCS%20%7C%20SleepConditionVariableCS%20%7C%20mozilla%3A%3Adetail%3A%3AConditionVariableImpl%3A%3Await%20%7C%20mozilla%3A%3ACondVar%3A%3AWait%20%7C%20mozilla%3A%3AThreadEventQueue%3CT%3E%3A%3AGetEvent&date=%3E%3D2017-09-16T20%3A20%3A11.000Z&date=%3C2017-09-23T20%3A20%3A11.000Z&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&_columns=install_time&_sort=-date&page=1#reports
Is a bucket of various timeout, for example:
mozilla::layers::CompositorThreadHolder::Shutdown()
https://crash-stats.mozilla.com/report/index/6d5cf3e2-2ab6-4063-b8be-f49fc0170923
mozilla::dom::quota::QuotaManager::ShutdownObserver::Observe()
https://crash-stats.mozilla.com/report/index/e623401f-d13e-4ed5-89ee-f3aeb0170923
mozilla::SharedThreadPoolShutdownObserver::Observe()
https://crash-stats.mozilla.com/report/index/d2024669-7473-4300-a697-2ccbe0170923
We could add to the ignore list I think to help here:
1) conditional variable frames like:
mozilla::detail::ConditionVariableImpl
mozilla::CondVar::Wait
2) default thread event processing
mozilla::ThreadEventQueue
nsThread::ProcessNextEvent
NS_ProcessNextEvent
Reporter | ||
Comment 19•7 years ago
|
||
Or we could ignore the left hand side, frames like
NtWaitForAlertByThreadId | RtlSleepConditionVariableCS | SleepConditionVariableCS
Reporter | ||
Updated•7 years ago
|
Reporter | ||
Comment 20•7 years ago
|
||
Another possible improvement - strip out namespace information to shorten signatures.
Reporter | ||
Updated•7 years ago
|
Summary: make shutdownhang signatures better → Improve granularity of shutdownhang signatures
Assignee | ||
Comment 21•7 years ago
|
||
I wrote up comment #13, but over the weekend became more aware that this is one of a myriad of similar bugs all driving at a similar purpose and that a lot of people might be interested. If I had know that earlier, I would have written a less terse comment #13. Sorry about that!
I purposely didn't use the word "granularity" in the summary of this bug. Shutdown hangs aren't really crashes. The stack for a shutdown hang situation is generally a bunch of uninteresting stuff and then the set of interesting/relevant frames. We're not looking for the point at which the code died, but rather what was going on while it was hanging. I don't think it makes sense to frame this around granularity (increasing the number of things we include) but rather framing this around relevancy (making sure signatures have the interesting/relevant frame as left-most as possible).
The signatures we were talking about prior to comment #13 all suffered from the same problem--there was so much generic stuff that the interesting/relevant bits get truncated. I don't know if that's because this bug was originally about signatures with SleepConditionVariableCS in them or whether this is true of many/most shutdownhang bugs.
I talked to Ted briefly on Friday before writing up comment #13 and we threw around some ideas:
1. maintain a separate irrelevant list for shutdownhang
This would work like the other irrelevant list and frames that were in the list would get skipped and not be part of the signature.
That might work fine, but might be a lot of maintenance.
2. maintain a separate sentinel list for shutdownhang
This would work like the other sentinel list and frames in this list determine the first frame we should look at. Anything before that would get dropped.
Maybe it's possible for us to add an item to this list that fixes shutdownhangs but isn't shutdownhang-specific.
3. some shutdownhang-specific behavior encoded in Python
Jim's idea to strip out namespace information would be in this category.
I need to spend some time looking at all the recent shutdownhang signatures and summarize what we've got. Then I'll be in a better spot to talk about suggestions.
Assignee | ||
Comment 22•7 years ago
|
||
I spent the bulk of today looking at like 16,000 shutdownhang crashes and I think going with something outlined in comment #18 and comment #19 seems most helpful.
I was thinking maybe we had to do something shutdownhang specific, but I looked at these parts and we pretty much only see them in shutdownhang crashes.
I want to add the following to the irrelevant list:
mozilla::CondVar::Wait
mozilla::detail::ConditionVariableImpl.*
mozilla::ThreadEventQueue.*
nsEventQueue::GetEvent
nsThread::GetEvent
nsThread::nsChainedEventQueue::GetEvent
nsThread::ProcessNextEvent
NS_ProcessNextEvent
NtWaitForKeyedEvent
RtlSleepConditionVariableCS
RtlSleepConditionVariableSRW
SleepConditionVariableCS
SleepConditionVariableSRW
ZwWaitForKeyedEvent
I'll attach the change in signatures particularly ones that we want to target which are getting truncated because they're too long.
Does this make the shutdownhangs situation better? Should we make this change, see what happens, and then do another round later this week?
Also, should we reprocess shutdownhang crashes with signatures that got truncated over the last week?
Assignee | ||
Comment 23•7 years ago
|
||
Shows changes in signatures for 1000 crashes from 2017-09-24.
Reporter | ||
Comment 24•7 years ago
|
||
(In reply to Will Kahn-Greene [:willkg] ET needinfo? me from comment #22)
> I spent the bulk of today looking at like 16,000 shutdownhang crashes and I
> think going with something outlined in comment #18 and comment #19 seems
> most helpful.
>
> I was thinking maybe we had to do something shutdownhang specific, but I
> looked at these parts and we pretty much only see them in shutdownhang
> crashes.
>
> I want to add the following to the irrelevant list:
>
> mozilla::CondVar::Wait
> mozilla::detail::ConditionVariableImpl.*
> mozilla::ThreadEventQueue.*
> nsEventQueue::GetEvent
> nsThread::GetEvent
> nsThread::nsChainedEventQueue::GetEvent
> nsThread::ProcessNextEvent
> NS_ProcessNextEvent
> NtWaitForKeyedEvent
> RtlSleepConditionVariableCS
> RtlSleepConditionVariableSRW
> SleepConditionVariableCS
> SleepConditionVariableSRW
> ZwWaitForKeyedEvent
SFTM.
> Does this make the shutdownhangs situation better? Should we make this
> change, see what happens, and then do another round later this week?
Yes.
> Also, should we reprocess shutdownhang crashes with signatures that got
> truncated over the last week?
Na, volume tends to refresh things pretty quickly.
Reporter | ||
Comment 25•7 years ago
|
||
bleh - s/SFTM/SGTM
(In reply to Will Kahn-Greene [:willkg] ET needinfo? me from comment #23)
> Created attachment 8911968 [details]
> new_signatures_20170925.txt
>
> Shows changes in signatures for 1000 crashes from 2017-09-24.
These look great!
Comment 26•7 years ago
|
||
That looks really good!
Could you also add mozilla::SpinEventLoopUntil<T> to the prefix list? There are a bunch of call sites that would be nice to distinguish. http://searchfox.org/mozilla-central/search?q=symbol:_ZN7mozilla18SpinEventLoopUntilEOT0_P9nsIThread&redirect=false
Assignee | ||
Comment 27•7 years ago
|
||
Andrew: Do you have an example crash for that?
Also, just to clarify, do you want to add mozilla::SpinEventLoopUntil<T> to the prefix list (signature continues past that frame) or the irrelevant list (frame is ignored and doesn't show up in the signature)?
Comment 28•7 years ago
|
||
(In reply to Will Kahn-Greene [:willkg] ET needinfo? me from comment #27)
> Andrew: Do you have an example crash for that?
Here's one: bp-fd4655f3-d0f8-47dc-aff5-c9ecc0170926
> Also, just to clarify, do you want to add mozilla::SpinEventLoopUntil<T> to
> the prefix list (signature continues past that frame) or the irrelevant list
> (frame is ignored and doesn't show up in the signature)?
I think the prefix list makes sense. That way you can tell at a glance that the event loop is being spun. Though if you do that, you'd also want mozilla::SharedThreadPool::SpinUntilEmpty() (which is rarer, but is seen for example here bp-182340c8-fbd3-4d94-b2e6-1207f0170924 )
Either that or they could be added to the irrelevant signature list, but nsThread::ProcessNextEvent or some other event thing could be on the prefix list instead of the irrelevant signature list.
Nathan, do you have any suggestions about which frame of the "spinning the event loop" boilerplate is kind of the most canonical? We'd want that on the skip list, and the rest of the boilerplate on the irrelevant list, so we could see at a glance that a hang was happening while spinning the event loop, and where it was happening, but without cluttering the signature too much.
Flags: needinfo?(nfroyd)
Assignee | ||
Comment 30•7 years ago
|
||
Here's a crash that has a truncated signature that isn't fixed by the changes in comment #22:
10767d8d-e9c5-47c0-88cb-66e980170925
shutdownhang | NtWaitForAlertByThreadId | RtlpWaitOnAddressWithTimeout | RtlpWaitOnAddress | RtlpWaitOnCriticalSection | RtlpEnterCriticalSectionContended | RtlEnterCriticalSection | je_calloc | _PR_AttachThread | PR_NativeCreateThread | PR_CreateThrea...
Is that signature truncated before the interesting part?
Here's another one:
b0a9831f-1282-46f3-86b1-745810170925
shutdownhang | NtWaitForAlertByThreadId | RtlpWaitOnAddressWithTimeout | RtlpWaitOnAddress | RtlpWaitOnCriticalSection | RtlpEnterCriticalSectionContended | RtlEnterCriticalSection | PR_Lock | nssTrustDomain_RemoveTokenCertsFromCache | nssToken_NotifyC...
Is that signature truncated before the interesting part?
Comment 31•7 years ago
|
||
(In reply to Andrew McCreight [:mccr8] from comment #28)
> I think the prefix list makes sense. That way you can tell at a glance that
> the event loop is being spun. Though if you do that, you'd also want
> mozilla::SharedThreadPool::SpinUntilEmpty() (which is rarer, but is seen
> for example here bp-182340c8-fbd3-4d94-b2e6-1207f0170924 )
>
> Nathan, do you have any suggestions about which frame of the "spinning the
> event loop" boilerplate is kind of the most canonical? We'd want that on the
> skip list, and the rest of the boilerplate on the irrelevant list, so we
> could see at a glance that a hang was happening while spinning the event
> loop, and where it was happening, but without cluttering the signature too
> much.
Did you mean "skip list" or "prefix list" for the canonical entry point(s)?
I think you want mozilla::SpinEventLoopUntil<T> and NS_ProcessNextEvent on a list that will show up in the signatures, so the event loop spinning (or at least waiting) is pretty obvious, and everything that would occur beneath those (nsThread::ProcessNextEvent, nsEventQueue::GetEvent, etc.) on the irrelevant list.
What would be *splendid* would be for these sorts of annotations to appear in the code themselves, and then processed out of the code for consumption by socorro, so we wouldn't have the problem of renaming things and having people panic because new crashes started appearing out of nowhere, and so people would have an inkling that these functions are important when they get renamed or potentially removed or what have you. If we could produce some sort of build artifact with this sort of information, is it technically feasible for that to be imported into socorro and used for builds with particular build IDs?
Flags: needinfo?(nfroyd) → needinfo?(willkg)
Assignee | ||
Comment 32•7 years ago
|
||
Assignee | ||
Comment 33•7 years ago
|
||
> If we could produce some sort of build artifact with this sort of information, is it technically feasible for that to be imported into socorro and used for builds with particular build IDs?
I don't see why not, but maybe this is a case of "the devil is in the details"? If we want to explore this, we'd definitely want to do it in a separate bug.
Flags: needinfo?(willkg)
Comment 34•7 years ago
|
||
(In reply to Will Kahn-Greene [:willkg] ET needinfo? me from comment #30)
> shutdownhang | NtWaitForAlertByThreadId | RtlpWaitOnAddressWithTimeout |
> RtlpWaitOnAddress | RtlpWaitOnCriticalSection |
> RtlpEnterCriticalSectionContended | RtlEnterCriticalSection | je_calloc |
> _PR_AttachThread | PR_NativeCreateThread | PR_CreateThrea...
>
> Is that signature truncated before the interesting part?
I think a lot of that could be added to the irrelevant list. NtWaitForAlertByThreadId RtlpWaitOnAddressWithTimeout RtlpWaitOnAddress RtlpWaitOnCriticalSection RtlpEnterCriticalSectionContended RtlEnterCriticalSection to start with. Maybe some of the NSPR thread creation stuff.
> shutdownhang | NtWaitForAlertByThreadId | RtlpWaitOnAddressWithTimeout |
> RtlpWaitOnAddress | RtlpWaitOnCriticalSection |
> RtlpEnterCriticalSectionContended | RtlEnterCriticalSection | PR_Lock |
> nssTrustDomain_RemoveTokenCertsFromCache | nssToken_NotifyC...
>
> Is that signature truncated before the interesting part?
This looks like it would benefit from the same list of things being added to the irrelevant signature list.
Comment 35•7 years ago
|
||
(In reply to Will Kahn-Greene [:willkg] ET needinfo? me from comment #33)
> > If we could produce some sort of build artifact with this sort of information, is it technically feasible for that to be imported into socorro and used for builds with particular build IDs?
>
> I don't see why not, but maybe this is a case of "the devil is in the
> details"? If we want to explore this, we'd definitely want to do it in a
> separate bug.
We are in violent agreement on this point. I will file.
Assignee | ||
Comment 36•7 years ago
|
||
(In reply to Andrew McCreight [:mccr8] from comment #34)
> (In reply to Will Kahn-Greene [:willkg] ET needinfo? me from comment #30)
> > shutdownhang | NtWaitForAlertByThreadId | RtlpWaitOnAddressWithTimeout |
> > RtlpWaitOnAddress | RtlpWaitOnCriticalSection |
> > RtlpEnterCriticalSectionContended | RtlEnterCriticalSection | je_calloc |
> > _PR_AttachThread | PR_NativeCreateThread | PR_CreateThrea...
> >
> > Is that signature truncated before the interesting part?
>
> I think a lot of that could be added to the irrelevant list.
> NtWaitForAlertByThreadId RtlpWaitOnAddressWithTimeout RtlpWaitOnAddress
> RtlpWaitOnCriticalSection RtlpEnterCriticalSectionContended
> RtlEnterCriticalSection to start with. Maybe some of the NSPR thread
> creation stuff.
I'll do that in the next pass.
Assignee | ||
Comment 37•7 years ago
|
||
(In reply to Nathan Froyd [:froydnj] from comment #35)
> (In reply to Will Kahn-Greene [:willkg] ET needinfo? me from comment #33)
> > > If we could produce some sort of build artifact with this sort of information, is it technically feasible for that to be imported into socorro and used for builds with particular build IDs?
> >
> > I don't see why not, but maybe this is a case of "the devil is in the
> > details"? If we want to explore this, we'd definitely want to do it in a
> > separate bug.
>
> We are in violent agreement on this point. I will file.
Awesome--thank you!
Assignee | ||
Comment 38•7 years ago
|
||
So, to sum up where things are at here because I like summing things up:
1. PR 4017 (https://github.com/mozilla-services/socorro/pull/4017) covers the changes in comment #22. That'll get reviewed, land, and pushed to -prod soon.
2. In comment #26 and #28, Andrew suggested we add mozilla::SpinEventLoopUntil<T> to the prefix list. That needs to be worked on.
3. In comment #31, Nathan suggested annotating the code and generating a build artifact that Socorro could use which would reduce some of the manual maintenance we're doing to the lists. That'll be worked on in a separate bug.
4. In comment #34, Andrew suggested we add NtWaitForAlertByThreadId RtlpWaitOnAddressWithTimeout RtlpWaitOnAddress RtlpWaitOnCriticalSection RtlpEnterCriticalSectionContended RtlEnterCriticalSection to the irrelevant list. That needs to be worked on.
That comment also says "Maybe some of the NSPR thread creation stuff." That's cool, but I'd need someone to expand on that. We can do that in another pass, too.
Two things I want to mention. First, we're not going to solve all possible problems with this bug--we just want to make the situation better than it is now. I don't think anyone would argue with that, but it leads to my second issue. Second, I got a lot of other things I need to work on and I claim we're going to hit the 80/20 rule soon here. At that point, I think it's prudent to mark this FIXED (hopefully with raucous applause) and defer any outstanding work to new bugs.
Reporter | ||
Comment 39•7 years ago
|
||
Comment on attachment 8912298 [details] [review]
pr 4017 - add irrelevant strings
Review of attachment 8912298 [details] [review]:
-----------------------------------------------------------------
::: socorro/siglists/irrelevant_signature_re.txt
@@ +59,5 @@
> RtlpAdjustHeapLookasideDepth
> +RtlSleepConditionVariableCS
> +RtlSleepConditionVariableSRW
> +SleepConditionVariableCS
> +SleepConditionVariableSRW
These look like thy can be combined assuming these filter rules work:
(Rtl|)SleepConditionVariable(CS|SRW)
Attachment #8912298 -
Attachment is patch: true
Attachment #8912298 -
Attachment mime type: text/x-github-pull-request → text/plain
Reporter | ||
Comment 40•7 years ago
|
||
(In reply to Will Kahn-Greene [:willkg] ET needinfo? me from comment #38)
> So, to sum up where things are at here because I like summing things up:
>
> 1. PR 4017 (https://github.com/mozilla-services/socorro/pull/4017) covers
> the changes in comment #22. That'll get reviewed, land, and pushed to -prod
> soon.
Yeah lets land what you have and resolve this bug out. We've done enough work here.
> 2. In comment #26 and #28, Andrew suggested we add
> mozilla::SpinEventLoopUntil<T> to the prefix list. That needs to be worked
> on.
-> follow up bug
> 3. In comment #31, Nathan suggested annotating the code and generating a
> build artifact that Socorro could use which would reduce some of the manual
> maintenance we're doing to the lists. That'll be worked on in a separate bug.
-> long term project, follow up bug
> 4. In comment #34, Andrew suggested we add NtWaitForAlertByThreadId
> RtlpWaitOnAddressWithTimeout RtlpWaitOnAddress RtlpWaitOnCriticalSection
> RtlpEnterCriticalSectionContended RtlEnterCriticalSection to the irrelevant
> list. That needs to be worked on.
-> low hanging fruit, follow up bug
> That comment also says "Maybe some of the NSPR thread creation stuff."
> That's cool, but I'd need someone to expand on that. We can do that in
> another pass, too.
-> lets see if we need it after we land current changes.
> Two things I want to mention. First, we're not going to solve all possible
> problems with this bug--we just want to make the situation better than it is
> now. I don't think anyone would argue with that, but it leads to my second
> issue. Second, I got a lot of other things I need to work on and I claim
> we're going to hit the 80/20 rule soon here. At that point, I think it's
> prudent to mark this FIXED (hopefully with raucous applause) and defer any
> outstanding work to new bugs.
Sounds good, thank Will!
Assignee | ||
Comment 41•7 years ago
|
||
Jim: We do reviews in github--not in Bugzilla comments. I can reply to it there.
Also, it looks like you broke the github link for that comment by changing the mime type. Can you change that back, please?
Reporter | ||
Updated•7 years ago
|
Attachment #8912298 -
Attachment is patch: false
Attachment #8912298 -
Attachment mime type: text/plain → text/x-github-pull-request
Comment 42•7 years ago
|
||
Commit pushed to master at https://github.com/mozilla-services/socorro
https://github.com/mozilla-services/socorro/commit/072704ba0079c47f63e909119f5ca53035e9662e
bug 1402037 - improve shutdownhang signatures (#4017)
* bug 1402037 - improve shutdownhang signatures
We have a number of shutdownhang signatures where the stack is rife with
"uninteresting" bits such that the signature gets truncated before getting to
something interesting or relevant or actionable.
This adds a bunch of things to the irrelevant list such that they don't show up
in the signature.
I did some regression testing and these strings only show up in shutdownhang
crashes. If that changes, we should rethink them on a string-by-string basis.
* Collapse two lines into one
Comment 43•7 years ago
|
||
this is how the distribution of shutdownhang signatures is looking now :-)
https://crash-stats.mozilla.com/search/?signature=%5Eshutdownhang&date=%3E%3D2017-09-27T18%3A00%3A00.000Z&_facets=signature&_facets=release_channel#facet-signature
Reporter | ||
Comment 44•7 years ago
|
||
(In reply to [:philipp] from comment #43)
> this is how the distribution of shutdownhang signatures is looking now :-)
> https://crash-stats.mozilla.com/search/
> ?signature=%5Eshutdownhang&date=%3E%3D2017-09-27T18%3A00%3A00.
> 000Z&_facets=signature&_facets=release_channel#facet-signature
Filtering out thunderbird -
https://crash-stats.mozilla.com/search/?signature=%5Eshutdownhang&product=Firefox&date=%3E%3D2017-09-27T18%3A00%3A00.000Z&date=%3C2017-09-28T11%3A36%3A00.000Z&_sort=-date&_facets=signature&_facets=release_channel&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#facet-signature
These signatures look &*%@! awesome! Thanks Will!
Assignee | ||
Comment 45•7 years ago
|
||
Bug #1403255 covers Nathan's idea of producing a build artifact that Socorro can use to augment its signature generation related lists.
I think if signatures are much better now, then we should celebrate and close this bug out. The nature of this problem is that things change over time. When things get rough again, we can open up a new bug, see what's going on, and address those observations then.
I wrote up some other bugs to fix issues with the signature generation cli tooling to make analysis and iterating over changes faster/easier. I'll work on those over the next few months. That'll make future work like this easier.
Given that, I'm going to mark this FIXED. We can open up new bugs as needed. Thank you everyone!
Status: ASSIGNED → RESOLVED
Closed: 7 years ago → 7 years ago
Resolution: --- → FIXED
Comment 46•7 years ago
|
||
(In reply to Will Kahn-Greene [:willkg] ET needinfo? me from comment #38)
> 4. In comment #34, Andrew suggested we add NtWaitForAlertByThreadId
> RtlpWaitOnAddressWithTimeout RtlpWaitOnAddress RtlpWaitOnCriticalSection
> RtlpEnterCriticalSectionContended RtlEnterCriticalSection to the irrelevant
> list. That needs to be worked on.
Everything before RtlEnterCriticalSection is definitely uninteresting details of the Windows kernel in that stack you pointed out. I don't know if that wants to be a sentinel or what the best way to deal with that is.
You need to log in
before you can comment on or make changes to this bug.
Description
•