<a class="header-button" href="https://bugzilla.mozilla.org/home" title="Go to home page"> Bugzilla

Reporter

Description

•

8 years ago

This is #13 in 57 release. This bug was filed from the Socorro interface and is report bp-25f8e0bb-b99a-4242-a3a2-0fe5a0171120. ============================================================= Top 10 frames of crashing thread: 0 ntdll.dll NtWaitForMultipleObjects 1 kernelbase.dll RtlAnsiStringToUnicodeString 2 kernel32.dll WaitForMultipleObjectsExImplementation 3 user32.dll RealMsgWaitForMultipleObjectsEx 4 user32.dll MsgWaitForMultipleObjectsEx 5 ole32.dll CCliModalLoop::BlockFn 6 ole32.dll ModalLoop 7 ole32.dll CChannelHandle::RestoreToken 8 ole32.dll `AsyncStubInvoke'::`1'::filt$0 9 ole32.dll ChannelProcessUninitialize =============================================================

Reporter

Comment 1

•

8 years ago

NI some usual suspects for awareness.

Flags: needinfo?(jteh)

Flags: needinfo?(jmathies)

Flags: needinfo?(aklotz)

Reporter

Comment 2

•

8 years ago

This could be related to a lot of the freezes users are reporting. 100.0% in signature vs 16.53% overall) moz_crash_reason = MOZ_CRASH(Shutdown too long, probably frozen, causing a crash.)

Comment 3

•

8 years ago

Yeah, based on the samples that I've seen with this sig, it's an "unconventional" a11y client that is doing a bunch of stuff, which ends up being so slow that everything just grinds to a halt, hence the forced shutdown.

Flags: needinfo?(aklotz)

Comment 4

•

8 years ago

Aggregating that crash signature on "Accessibility client" shows some useful info.

Comment 5

•

8 years ago

There are some crashes in 58 as well but none yet in 59.

status-firefox57: --- → affected

status-firefox58: --- → affected

status-firefox59: --- → ?

Keywords: regression

Jim Mathies [:jimm]

Comment 6

•

8 years ago

100% have accessibility turned on. About 60% have an a11y instantiator string set. Nearly 100% are UNKNOWN (MSAA) access based. https://crash-stats.mozilla.com/search/?signature=%3Dshutdownhang%20%7C%20RtlAnsiStringToUnicodeString%20%7C%20MsgWaitForMultipleObjectsEx%20%7C%20CCliModalLoop%3A%3ABlockFn&accessibility=__true__&product=Firefox&date=%3E%3D2017-11-13T12%3A55%3A10.000Z&date=%3C2017-11-20T12%3A55%3A10.000Z&_sort=-date&_facets=signature&_facets=accessibility_client&_facets=accessibility_in_proc_client&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#facet-accessibility_in_proc_client

Flags: needinfo?(jmathies)

Jim Mathies [:jimm]

Updated

•

8 years ago

status-firefox57: affected → wontfix

status-firefox58: affected → fix-optional

status-firefox59: ? → ---

Priority: -- → P3

Firefox Bug Husbandry Bot

Updated

•

8 years ago

Comment 7

•

8 years ago

https://wiki.mozilla.org/Bug_Triage/Projects/Bug_Handling/Bug_Husbandry#Move_fix-optionals

status-firefox58: fix-optional → wontfix

status-firefox59: --- → ?

Trevor Skywalker

Updated

•

8 years ago

Depends on: 1405290

Trevor Skywalker

Updated

•

8 years ago

Depends on: 1437575

Sylvestre Ledru [:Sylvestre]

Updated

•

8 years ago

status-firefox60: --- → affected

status-firefox61: --- → affected

status-firefox-esr52: --- → wontfix

Marion Daly [:mdaly]

Updated

•

8 years ago

status-firefox60: affected → fix-optional

status-firefox61: affected → fix-optional

alexander :surkov (:asurkov)

Updated

•

8 years ago

Whiteboard: a11y:crash-win

Comment 8

•

7 years ago

the [@ shutdownhang | NtUserMsgWaitForMultipleObjectsEx | CCliModalLoop::BlockFn ] signature is jumping up on windows 10 since 2018-09-11 across release channels. many user comments indicate that users aren't perceiving this as a crash on shutdown but a hang/freeze during browsing. the signature accounts for 2.5% of browser crashes on release in the past few days.

Comment 9

•

7 years ago

could someone look into that recent crash spike?

status-firefox59: ? → wontfix

status-firefox60: fix-optional → wontfix

status-firefox61: fix-optional → wontfix

status-firefox62: --- → affected

status-firefox63: --- → affected

status-firefox64: --- → affected

status-firefox-esr60: --- → affected

Flags: needinfo?(dbolter)

OS: Windows 7 → All

Hardware: Unspecified → All

Updated

•

7 years ago

OS: All → Windows

Comment 10

•

7 years ago

Taking NI from davidb, since I manage a11y now. I actually spent a while looking into this last week, but didn't turn up anything useful. :( It doesn't seem that this is related to a specific third party accessibility client. One important point is that it seems many of these stacks have something in msctf.dll (which I think is associated with Microsoft Text Services Framework) calling IAccessible::accRole on an oleacc proxy for an accessible in a content process. The oleacc proxy then queries for some interface (maybe IAccIdentity? I need to check that), at which point things freeze. Note that this isn't true for all stacks; some have msctf calling AccessibleObjectFromEvent, which in turns calls accChild, which then freezes. Regardless, these stacks suggest the content process stopped responding when (or before?) an a11y query was made, but what I don't understand is why. And without a stack from the content process (or a way to reproduce this), that's going to be nearly impossible to figure out. Theoretically, content might not even be freezing in the a11y call; it might have frozen before that point. I also can't see anything that landed in the Gecko accessibility module around the time this spike started that seems like it might have any impact on this kind of thing. I'll keep digging.

Flags: needinfo?(dbolter)

Comment 11

•

7 years ago

>I also can't see anything that landed in the Gecko accessibility module around the time this spike started that seems >like it might have any impact on this kind of thing. the spike happened across different release channels at the same time, so it's unlikely that some code-change from our side caused it. timing wise the spike took place on the same day as the monthly windows patchday, so there might be some connection and a change in the platform triggered this. asian locale builds (zh-cn, ja, ko) seem to be slightly overrepresented in these crash reports for what it's worth...

Comment 12

•

7 years ago

Marking this wontfix for 62/63, but we could still potentially take a fix for 64. The crash volume is very high on 62 on release, with over 5000 crashes in the last week.

Updated

•

7 years ago

status-firefox62: affected → wontfix

Comment 13

•

7 years ago

Marking fix-optional for 64. We could still take a patch for 65, and if it's verified and doesn't seem risky, could still take fixes for 64 as well.

status-firefox64: affected → fix-optional

Updated

•

7 years ago

status-firefox63: affected → wontfix

status-firefox65: --- → affected

Updated

•

7 years ago

Comment 15

•

7 years ago

Adding 66 as affected. In 64 release shutdownhang | NtUserMsgWaitForMultipleObjectsEx | CCliModalLoop::BlockFn is #10 overall.

status-firefox66: --- → affected

Ryan VanderMeulen [:RyanVM]

Comment 16

•

7 years ago

Looks like something in the 20181224215145 build may have decreased the crash volume in Nightly.

Comment 17

•

7 years ago

Not going to make 65 at this point, but leaving it set to fix-optional as a possible dot release ride-along should a low-risk patch be available.

status-firefox64: fix-optional → wontfix

status-firefox65: affected → fix-optional

BugBot [:suhaib / :marco/ :calixte]

Comment 18

•

7 years ago

Tracking for 66 to keep an eye on this in beta.

tracking-firefox66: --- → +

Comment 19

•

7 years ago

Changing the priority to p2 as the bug is tracked by a release manager for the current nightly.
See How Do You Triage for more information

Priority: P3 → P2

Comment 20

•

7 years ago

Marking 67 as affected. #6 overall browser crash in 65 release.

status-firefox67: --- → affected

Comment 21

•

7 years ago

Adding another signature, Windows 10.

Reporter

Comment 22

•

7 years ago

NI Aaron in case he has any quick hunches.

Flags: needinfo?(aklotz)

Comment 23

•

7 years ago

•

Edited

It would be really helpful if we also had dumps from the content process side of things.

By virtue of the fact that we already know which child process we're waiting on (since we're resolving a child id that has a content process id encoded into it), it would be nice if we had some kind of RAII mechanism to say, "Yo, hang reporter, if you get any hangs while I'm on the stack, take a paired minidump with this process, okay?"

Sorry I don't have a better answer atm, but we need more data :-)

Flags: needinfo?(aklotz)

Comment 24

•

7 years ago

Aaron, do you know who might work on getting us that sort of information? is that something for work on the crash reporter itself?

Flags: needinfo?(aklotz)

Gabriele Svelto [:gsvelto]

Comment 25

•

7 years ago

(In reply to Liz Henry (:lizzard) (use needinfo) from comment #24)

Aaron, do you know who might work on getting us that sort of information? is that something for work on the crash reporter itself?

I think it probably involves the hang reporter more than anything. IIRC Gabriele has recently worked on the hang monitor.

Gabriele, is my proposal in Comment 23 reasonable? Do you or somebody you know have cycles to implement such an API?

Flags: needinfo?(aklotz) → needinfo?(gsvelto)

Comment 26

•

7 years ago

Yes, we could modify the shutdown terminator with something like that. The only issue that would need to be addressed is that we're currently just calling MOZ_CRASH() to generate the crash report; we would need to issue the action to take a paired minidump manually. It's not a huge deal but I'm swamped with urgent stuff and won't have free cycles before the end of next week.

Flags: needinfo?(gsvelto)

Updated

•

7 years ago

Depends on: 1527668

Comment 27

•

7 years ago

(In reply to Gabriele Svelto [:gsvelto] from comment #26)

Yes, we could modify the shutdown terminator with something like that. The only issue that would need to be addressed is that we're currently just calling MOZ_CRASH() to generate the crash report; we would need to issue the action to take a paired minidump manually. It's not a huge deal but I'm swamped with urgent stuff and won't have free cycles before the end of next week.

I filed bug 1527668 for this part.

Comment 28

•

7 years ago

Discussed with Gabriele in email; other projects have priority for this week.

Updated

•

7 years ago

tracking-firefox67: --- → ?

Updated

•

7 years ago

tracking-firefox67: ? → +

Reporter

Comment 29

•

7 years ago

Jamie we need an assignee on tracked bugs.

Comment 30

•

7 years ago

Since this needs an owner, I'll "own" this, but it's important to note that right now, it isn't actionable without more data/a reproduceable test case.

Assignee: nobody → jteh

Tim Spurway [:tspurway]

Updated

•

7 years ago

status-firefox66: affected → fix-optional

tracking-firefox66: + → ---

Comment 31

•

7 years ago

We just had 50+ crashes over the week end on Nightly 67, Jamie is there something that landed recently that could explain this sudden spike on Nightly?

Comment 32

•

7 years ago

This is only a hunch, but next gen storage just got re-enabled on Nightly in bug 1517090. It was disabled originally due to a11y hangs; see bug 1516136. The hang was believed to be fixed (and I couldn't reproduce it any more with brief testing), but perhaps there's still an edge case?

Flags: needinfo?(jteh)

Comment 33

•

7 years ago

This is now the #2 browser crash on 67 nightly (shutdownhang | NtUserMsgWaitForMultipleObjectsEx | CCliModalLoop::BlockFn).

Keywords: topcrash

Comment 34

•

7 years ago

The deadlock described in comment 32 was fixed in bug 1534208. However, there seems to be another one: bug 1535221.

It'd be interesting to know whether the spike is caused by builds without the fix for bug 1534208 or whether the spike is still occurring even with that fix.

Comment 35

•

7 years ago

(In reply to James Teh [:Jamie] from comment #34)

It'd be interesting to know whether the spike is caused by builds without the fix for bug 1534208 or whether the spike is still occurring even with that fix.

the crash volume went down considerably in builds 20190311215435 & later that have the fix (from 20-50 crashes per build to 1-10 now)

Comment 36

•

7 years ago

(In reply to James Teh [:Jamie] from comment #32)

This is only a hunch, but next gen storage just got re-enabled on Nightly in bug 1517090. It was disabled originally due to a11y hangs; see bug 1516136. The hang was believed to be fixed (and I couldn't reproduce it any more with brief testing), but perhaps there's still an edge case?

The volume is very high on beta with about 100 crashes per day, so setting as P1. LSNG is delayed to 68 and will be deactivated in the next beta (67.0b9), which should tell you if the theory is correct.

status-firefox65: fix-optional → wontfix

status-firefox68: --- → affected

Priority: P2 → P1

Comment 37

•

7 years ago

Crashes went down by more than an order of magnitude on beta after we deactivated LSNG (310 crashes in beta 8 vs 22 in beta 10) so no need to track it anymore for 67 and marking as fix-optional for the release, but tracking for 68.

status-firefox66: fix-optional → wontfix

status-firefox67: affected → fix-optional

tracking-firefox67: + → -

tracking-firefox68: --- → +

Patricia Lawless

Comment 38

•

7 years ago

Hi Jamie, looks like this is related to LSNG (which is probably being enabled again for 68). Any updates? (tracking this in release triage for 68). Thanks!

Flags: needinfo?(jteh)

Jan Varga [:janv]

Comment 39

•

7 years ago

LSNG issue was fixed in bug 1535221.

Comment 40

•

7 years ago

I think we're now back to the original situation here; i.e. not LSNG, thus much lower volume, but without repro or data and thus little hope of a diagnosis or fix. :( See comment 10, comment 23, comment 26, comment 27.

Flags: needinfo?(jteh)

Updated

•

7 years ago

status-firefox69: --- → affected

Martijn Wargers (dead)

Comment 41

•

7 years ago

I might be able to reproduce.

Updated

•

7 years ago

status-firefox67: fix-optional → wontfix

Julien Cristau [:jcristau]

Comment 42

•

7 years ago

Martijn: Were you able to reproduce this issue? Thanks.

Flags: needinfo?(martijn.martijn)

Comment 43

•

6 years ago

wontfix for 68 based on comment 40.

status-firefox68: affected → wontfix

Comment 44

•

6 years ago

Also marking this stalled based on comment 40. Happy to take a patch though or otherwise help out.

status-firefox69: affected → fix-optional

status-firefox70: --- → fix-optional

Keywords: stalled

Martijn Wargers (dead)

Updated

•

6 years ago

Flags: needinfo?(martijn.martijn)

Updated

•

6 years ago

Assignee: jteh → nobody

Updated

•

6 years ago

Priority: P1 → P3

Sean Feng [:sefeng211]

Updated

•

5 years ago

BugBot [:suhaib / :marco/ :calixte]

Updated

•

4 years ago

Depends on: 1737193

GerardoPcp04

Comment 46

•

3 years ago

Problem with Firefox 101.0: Crash Report [@ shutdownhang | NtUserMsgWaitForMultipleObjectsEx | CCliModalLoop::BlockFn ]
https://crash-stats.mozilla.org/report/index/daf246d5-6b58-4387-b612-5e8340220606#tab-bugzilla

Comment 47

•

3 years ago

Based on the topcrash criteria, the crash signatures linked to this bug are not in the topcrash signatures anymore.

For more information, please visit auto_nag documentation.

Keywords: topcrash

BMO Automation

Updated

•

3 years ago

Severity: critical → S2

Jens Stutte [:jstutte]

Updated

•

3 years ago

Blocks: RunWatchdogShutdownhang