Crash in shutdownhang | RtlAnsiStringToUnicodeString | MsgWaitForMultipleObjectsEx | CCliModalLoop::BlockFn
Categories
(Core :: Disability Access APIs, defect, P3)
Tracking
Release | Tracking | Status
---|---|---
firefox-esr52 | --- | wontfix |
firefox-esr60 | --- | wontfix |
firefox-esr102 | --- | wontfix |
firefox57 | --- | wontfix |
firefox58 | --- | wontfix |
firefox59 | --- | wontfix |
firefox60 | --- | wontfix |
firefox61 | --- | wontfix |
firefox62 | --- | wontfix |
firefox63 | --- | wontfix |
firefox64 | --- | wontfix |
firefox65 | --- | wontfix |
firefox66 | --- | wontfix |
firefox67 | - | wontfix |
firefox68 | + | wontfix |
firefox69 | --- | wontfix |
firefox70 | --- | wontfix |
firefox113 | --- | fixed |
firefox114 | --- | fixed |
firefox115 | --- | fixed |
People
(Reporter: davidb, Unassigned)
References
(Depends on 1 open bug, Blocks 1 open bug)
Details
(Keywords: crash, regression, Whiteboard: a11y:crash-win)
Crash Data
This is #13 in the 57 release. This bug was filed from the Socorro interface and is report bp-25f8e0bb-b99a-4242-a3a2-0fe5a0171120.

Top 10 frames of crashing thread:

0 ntdll.dll NtWaitForMultipleObjects
1 kernelbase.dll RtlAnsiStringToUnicodeString
2 kernel32.dll WaitForMultipleObjectsExImplementation
3 user32.dll RealMsgWaitForMultipleObjectsEx
4 user32.dll MsgWaitForMultipleObjectsEx
5 ole32.dll CCliModalLoop::BlockFn
6 ole32.dll ModalLoop
7 ole32.dll CChannelHandle::RestoreToken
8 ole32.dll `AsyncStubInvoke'::`1'::filt$0
9 ole32.dll ChannelProcessUninitialize
Reporter
Comment 1•6 years ago
NI some usual suspects for awareness.
Reporter
Comment 2•6 years ago
This could be related to a lot of the freezes users are reporting (100.0% in signature vs 16.53% overall): moz_crash_reason = MOZ_CRASH(Shutdown too long, probably frozen, causing a crash.)
Comment 3•6 years ago
Yeah, based on the samples that I've seen with this sig, it's an "unconventional" a11y client that is doing a bunch of stuff, which ends up being so slow that everything just grinds to a halt, hence the forced shutdown.
Comment 4•6 years ago
Aggregating that crash signature on "Accessibility client" shows some useful info.
There are some crashes in 58 as well but none yet in 59.
Comment 6•6 years ago
100% have accessibility turned on. About 60% have an a11y instantiator string set. Nearly 100% are UNKNOWN (MSAA) access based. https://crash-stats.mozilla.com/search/?signature=%3Dshutdownhang%20%7C%20RtlAnsiStringToUnicodeString%20%7C%20MsgWaitForMultipleObjectsEx%20%7C%20CCliModalLoop%3A%3ABlockFn&accessibility=__true__&product=Firefox&date=%3E%3D2017-11-13T12%3A55%3A10.000Z&date=%3C2017-11-20T12%3A55%3A10.000Z&_sort=-date&_facets=signature&_facets=accessibility_client&_facets=accessibility_in_proc_client&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#facet-accessibility_in_proc_client
Comment 7•5 years ago
https://wiki.mozilla.org/Bug_Triage/Projects/Bug_Handling/Bug_Husbandry#Move_fix-optionals
Comment 8•5 years ago
The [@ shutdownhang | NtUserMsgWaitForMultipleObjectsEx | CCliModalLoop::BlockFn ] signature has been jumping up on Windows 10 since 2018-09-11 across release channels. Many user comments indicate that users aren't perceiving this as a crash on shutdown but as a hang/freeze during browsing. The signature accounts for 2.5% of browser crashes on release in the past few days.
Comment 9•5 years ago
Could someone look into that recent crash spike?
Comment 10•5 years ago
Taking NI from davidb, since I manage a11y now.

I actually spent a while looking into this last week, but didn't turn up anything useful. :( It doesn't seem that this is related to a specific third-party accessibility client.

One important point is that it seems many of these stacks have something in msctf.dll (which I think is associated with the Microsoft Text Services Framework) calling IAccessible::accRole on an oleacc proxy for an accessible in a content process. The oleacc proxy then queries for some interface (maybe IAccIdentity? I need to check that), at which point things freeze. Note that this isn't true for all stacks; some have msctf calling AccessibleObjectFromEvent, which in turn calls accChild, which then freezes.

Regardless, these stacks suggest the content process stopped responding when (or before?) an a11y query was made, but what I don't understand is why. And without a stack from the content process (or a way to reproduce this), that's going to be nearly impossible to figure out. Theoretically, content might not even be freezing in the a11y call; it might have frozen before that point.

I also can't see anything that landed in the Gecko accessibility module around the time this spike started that seems like it might have any impact on this kind of thing. I'll keep digging.
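For illustration only, here is a minimal sketch of the client-side MSAA pattern described above (the oleacc APIs are real Win32 calls, but `QueryEventTarget` and its structure are hypothetical, not Firefox or msctf source). It shows where each call can become a blocking cross-process COM round trip:

```cpp
// Sketch (hypothetical QueryEventTarget; not Firefox or msctf code) of the
// MSAA client pattern described above. Both oleacc calls below can marshal
// into the content process; if that process never replies, the caller spins
// in the COM modal loop (CCliModalLoop::BlockFn) seen in these stacks.
// Assumes COM is already initialized on this thread.
#include <windows.h>
#include <oleacc.h>
#pragma comment(lib, "oleacc.lib")
#pragma comment(lib, "oleaut32.lib")

void QueryEventTarget(HWND hwnd, DWORD idObject, DWORD idChild) {
  IAccessible* acc = nullptr;
  VARIANT child;
  VariantInit(&child);

  // Resolves the event target; internally this can call get_accChild on an
  // oleacc proxy, which crosses into the (possibly unresponsive) content process.
  if (FAILED(AccessibleObjectFromEvent(hwnd, idObject, idChild, &acc, &child)) ||
      !acc) {
    return;
  }

  // A follow-up query such as get_accRole is another cross-process round trip.
  VARIANT role;
  VariantInit(&role);
  acc->get_accRole(child, &role);

  VariantClear(&role);
  VariantClear(&child);
  acc->Release();
}
```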
Comment 11•5 years ago
> I also can't see anything that landed in the Gecko accessibility module around the time this spike started that seems like it might have any impact on this kind of thing.

The spike happened across different release channels at the same time, so it's unlikely that a code change on our side caused it. Timing-wise, the spike took place on the same day as the monthly Windows patch day, so there might be some connection, and a change in the platform may have triggered this.

Asian locale builds (zh-cn, ja, ko) seem to be slightly overrepresented in these crash reports, for what it's worth...
Marking this wontfix for 62/63, but we could still potentially take a fix for 64. The crash volume is very high on 62 on release, with over 5000 crashes in the last week.
Marking fix-optional for 64. We could still take a patch for 65, and if it's verified and doesn't seem risky, could still take fixes for 64 as well.
Comment 15•5 years ago
Adding 66 as affected. In the 64 release, shutdownhang | NtUserMsgWaitForMultipleObjectsEx | CCliModalLoop::BlockFn is #10 overall.
Looks like something in the 20181224215145 build may have decreased the crash volume in Nightly.
Comment 17•4 years ago
Not going to make 65 at this point, but leaving it set to fix-optional as a possible dot release ride-along should a low-risk patch be available.
Tracking for 66 to keep an eye on this in beta.
Comment 19•4 years ago
Changing the priority to p2 as the bug is tracked by a release manager for the current nightly.
See How Do You Triage for more information
Comment 20•4 years ago
Marking 67 as affected. This is the #6 overall browser crash in the 65 release.
Comment 21•4 years ago
Adding another signature, Windows 10.
Reporter
Comment 22•4 years ago
NI Aaron in case he has any quick hunches.
Comment 23•4 years ago
It would be really helpful if we also had dumps from the content process side of things.
By virtue of the fact that we already know which child process we're waiting on (since we're resolving a child id that has a content process id encoded into it), it would be nice if we had some kind of RAII mechanism to say, "Yo, hang reporter, if you get any hangs while I'm on the stack, take a paired minidump with this process, okay?"
Sorry I don't have a better answer atm, but we need more data :-)
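To make the proposal concrete, here is a minimal sketch under stated assumptions: the `HangReporterStub` namespace and `AutoPairedMinidumpOnHang` class are invented placeholders for whatever the hang reporter would actually expose, not existing Gecko code.

```cpp
// Hypothetical sketch of the RAII mechanism proposed above. The
// HangReporterStub functions are placeholders (they do not exist in Gecko);
// stub bodies are included so the sketch is self-contained. The idea: while
// the guard is on the stack, a hang report should include a paired minidump
// of the named content process, whose id we already know because it is
// encoded in the MSAA child id being resolved.
#include <cstdint>
#include <cstdio>

namespace HangReporterStub {
// Placeholder registration API a real hang reporter would implement.
inline void RequestPairedMinidump(uint64_t aContentPid) {
  std::printf("pair hangs with content process %llu\n",
              static_cast<unsigned long long>(aContentPid));
}
inline void CancelPairedMinidump(uint64_t /*aContentPid*/) {}
}  // namespace HangReporterStub

class AutoPairedMinidumpOnHang {
 public:
  explicit AutoPairedMinidumpOnHang(uint64_t aContentPid)
      : mContentPid(aContentPid) {
    HangReporterStub::RequestPairedMinidump(mContentPid);
  }
  ~AutoPairedMinidumpOnHang() {
    HangReporterStub::CancelPairedMinidump(mContentPid);
  }

  AutoPairedMinidumpOnHang(const AutoPairedMinidumpOnHang&) = delete;
  AutoPairedMinidumpOnHang& operator=(const AutoPairedMinidumpOnHang&) = delete;

 private:
  uint64_t mContentPid;
};

// Usage at a call site that is about to block on a cross-process a11y call:
//   AutoPairedMinidumpOnHang guard(contentPidDecodedFromChildId);
//   /* ...perform the COM call that might hang... */
```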
Comment 24•4 years ago
Aaron, do you know who might work on getting us that sort of information? Is that something for work on the crash reporter itself?
Comment 25•4 years ago
(In reply to Liz Henry (:lizzard) (use needinfo) from comment #24)
> Aaron, do you know who might work on getting us that sort of information? Is that something for work on the crash reporter itself?
I think it probably involves the hang reporter more than anything. IIRC Gabriele has recently worked on the hang monitor.
Gabriele, is my proposal in Comment 23 reasonable? Do you or somebody you know have cycles to implement such an API?
Comment 26•4 years ago
Yes, we could modify the shutdown terminator with something like that. The only issue that would need to be addressed is that we're currently just calling MOZ_CRASH() to generate the crash report; we would need to issue the action to take a paired minidump manually. It's not a huge deal but I'm swamped with urgent stuff and won't have free cycles before the end of next week.
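A rough sketch of the reordering described here, with invented placeholder names (`sPendingPairTarget`, `TakePairedMinidump`, `OnShutdownWatchdogTimeout`); the real shutdown terminator and crash-reporter plumbing are not shown, and `std::abort()` stands in for the existing MOZ_CRASH() call:

```cpp
// Hypothetical sketch only: the shutdown terminator currently just MOZ_CRASHes
// when shutdown takes too long; the change discussed above would first write a
// paired minidump of the content process that the a11y code flagged. All names
// below are placeholders, not real Gecko APIs.
#include <cstdint>
#include <cstdio>
#include <cstdlib>

static uint64_t sPendingPairTarget = 0;  // set by a guard like the comment 23 sketch

static void TakePairedMinidump(uint64_t aContentPid) {
  // Placeholder: a real implementation would ask the crash reporter to dump
  // the parent process and aContentPid together as one paired report.
  std::printf("pair-dumping content process %llu\n",
              static_cast<unsigned long long>(aContentPid));
}

static void OnShutdownWatchdogTimeout() {
  if (sPendingPairTarget != 0) {
    TakePairedMinidump(sPendingPairTarget);
  }
  // Today this point is MOZ_CRASH("Shutdown too long, probably frozen, causing
  // a crash."); abort() stands in for it in this sketch.
  std::abort();
}
```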
Comment 27•4 years ago
(In reply to Gabriele Svelto [:gsvelto] from comment #26)
> Yes, we could modify the shutdown terminator with something like that. The only issue that would need to be addressed is that we're currently just calling MOZ_CRASH() to generate the crash report; we would need to issue the action to take a paired minidump manually. It's not a huge deal but I'm swamped with urgent stuff and won't have free cycles before the end of next week.
I filed bug 1527668 for this part.
Discussed with Gabriele in email; other projects have priority for this week.
Reporter
Comment 29•4 years ago
Jamie, we need an assignee on tracked bugs.
Comment 30•4 years ago
Since this needs an owner, I'll "own" this, but it's important to note that right now, it isn't actionable without more data or a reproducible test case.
Comment 31•4 years ago
We just had 50+ crashes over the weekend on Nightly 67. Jamie, is there something that landed recently that could explain this sudden spike on Nightly?
Comment 32•4 years ago
This is only a hunch, but next gen storage just got re-enabled on Nightly in bug 1517090. It was disabled originally due to a11y hangs; see bug 1516136. The hang was believed to be fixed (and I couldn't reproduce it any more with brief testing), but perhaps there's still an edge case?
Comment 33•4 years ago
This is now the #2 browser crash on 67 nightly (shutdownhang | NtUserMsgWaitForMultipleObjectsEx | CCliModalLoop::BlockFn).
Comment 34•4 years ago
The deadlock described in comment 32 was fixed in bug 1534208. However, there seems to be another one: bug 1535221.
It'd be interesting to know whether the spike is caused by builds without the fix for bug 1534208 or whether the spike is still occurring even with that fix.
Comment 35•4 years ago
(In reply to James Teh [:Jamie] from comment #34)
> It'd be interesting to know whether the spike is caused by builds without the fix for bug 1534208 or whether the spike is still occurring even with that fix.

The crash volume went down considerably in builds 20190311215435 and later, which have the fix (from 20-50 crashes per build down to 1-10 now).
Comment 36•4 years ago
(In reply to James Teh [:Jamie] from comment #32)
> This is only a hunch, but next gen storage just got re-enabled on Nightly in bug 1517090. It was disabled originally due to a11y hangs; see bug 1516136. The hang was believed to be fixed (and I couldn't reproduce it any more with brief testing), but perhaps there's still an edge case?
The volume is very high on beta with about 100 crashes per day, so setting as P1. LSNG is delayed to 68 and will be deactivated in the next beta (67.0b9), which should tell you if the theory is correct.
Comment 37•4 years ago
Crashes went down by more than an order of magnitude on beta after we deactivated LSNG (310 crashes in beta 8 vs 22 in beta 10), so there's no need to track this anymore for 67; marking as fix-optional for the release, but tracking for 68.
Comment 38•4 years ago
Hi Jamie, looks like this is related to LSNG (which is probably being enabled again for 68). Any updates? (tracking this in release triage for 68). Thanks!
Comment 39•4 years ago
The LSNG issue was fixed in bug 1535221.
Comment 40•4 years ago
I think we're now back to the original situation here; i.e. not LSNG, thus much lower volume, but without repro or data and thus little hope of a diagnosis or fix. :( See comment 10, comment 23, comment 26, comment 27.
Comment 41•4 years ago
I might be able to reproduce.
Comment 42•4 years ago
Martijn: Were you able to reproduce this issue? Thanks.
Comment 43•4 years ago
wontfix for 68 based on comment 40.
Also marking this stalled based on comment 40. Happy to take a patch though or otherwise help out.
Comment 46•11 months ago
Problem with Firefox 101.0: Crash Report [@ shutdownhang | NtUserMsgWaitForMultipleObjectsEx | CCliModalLoop::BlockFn ]
https://crash-stats.mozilla.org/report/index/daf246d5-6b58-4387-b612-5e8340220606#tab-bugzilla
Comment 47•8 months ago
Based on the topcrash criteria, the crash signatures linked to this bug are not in the topcrash signatures anymore.
For more information, please visit auto_nag documentation.
Comment 48•18 days ago
This is resolved by Cache the World, which is enabled by default in Firefox 113.
Comment 49•18 days ago
Since the bug is closed, the stalled keyword is now meaningless.
For more information, please visit BugBot documentation.