Closed Bug 1336478 Opened 8 years ago Closed 3 years ago

Crash in [@ mozilla::CycleCollectedJSContext::ProcessMetastableStateQueue]

Categories

(Core :: XPCOM, defect, P3)

51 Branch
defect

Tracking

()

RESOLVED WORKSFORME
Tracking Status
firefox-esr45 --- unaffected
firefox51 --- wontfix
firefox52 + wontfix
firefox-esr52 --- wontfix
firefox53 --- wontfix
firefox54 --- wontfix
firefox55 --- wontfix
firefox56 --- wontfix
firefox57 --- wontfix
firefox59 --- wontfix
firefox60 --- ?
firefox61 --- ?

People

(Reporter: philipp, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: crash, regression)

Crash Data

This bug was filed from the Socorro interface and is report bp-eac07850-af35-4e34-9040-963d72170203.
=============================================================
Crashing Thread (74)
Frame Module Signature Source
0 xul.dll mozilla::CycleCollectedJSContext::ProcessMetastableStateQueue(unsigned int) xpcom/base/CycleCollectedJSContext.cpp:1351
1 xul.dll mozilla::CycleCollectedJSContext::AfterProcessTask(unsigned int) xpcom/base/CycleCollectedJSContext.cpp:1387
2 xul.dll nsThread::ProcessNextEvent(bool, bool*) xpcom/threads/nsThread.cpp:1083
3 xul.dll NS_ProcessNextEvent(nsIThread*, bool) xpcom/glue/nsThreadUtils.cpp:311
4 xul.dll mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate*) ipc/glue/MessagePump.cpp:338
5 xul.dll MessageLoop::RunHandler() ipc/chromium/src/base/message_loop.cc:225
6 xul.dll MessageLoop::Run() ipc/chromium/src/base/message_loop.cc:205
7 xul.dll nsThread::ThreadFunc(void*) xpcom/threads/nsThread.cpp:465
8 nss3.dll _PR_NativeRunThread nsprpub/pr/src/threads/combined/pruthr.c:397
9 nss3.dll pr_root nsprpub/pr/src/md/windows/w95thred.c:95
10 ucrtbase.dll _o__CIpow
11 kernel32.dll BaseThreadInitThunk
12 ntdll.dll __RtlUserThreadStart
13 ntdll.dll _RtlUserThreadStart

This cross-platform crash signature has been regressing since Firefox 51, with MOZ_RELEASE_ASSERT(!mDoingStableStates) as the crash reason; that assert was added with bug 1292892. It's a rather low-volume crash, though (<0.1% of crashes in 51.0.1).
Correlations for Firefox Release
(99.12% in signature vs 00.08% overall) moz_crash_reason = MOZ_RELEASE_ASSERT(!mDoingStableStates)
(95.15% in signature vs 36.54% overall) reason = EXCEPTION_BREAKPOINT
(17.18% in signature vs 00.24% overall) Addon "ascsurfingprotectionnew@iobit.com" = true [61.90% vs 00.32% if startup_crash = 0]
(15.42% in signature vs 00.23% overall) Addon "ascsurfingprotectionnew@iobit.com" Version = 2.1.3 [55.56% vs 00.30% if startup_crash = 0]
(33.92% in signature vs 02.95% overall) GFX_ERROR "Failed 2 buffer db=" = true [46.95% vs 05.29% if process_type = content]
(25.11% in signature vs 00.58% overall) GFX_ERROR "Failed to create DIB section for a bitmap of size " = true [34.76% vs 01.89% if process_type = content]
(79.74% in signature vs 28.82% overall) Module "qasf.dll" = true [72.22% vs 37.14% if platform_pretty_version = Windows 10]
(31.28% in signature vs 03.12% overall) contains_memory_report = 1
Component: General → XPCOM
See Also: → 1312623
(In reply to [:philipp] from comment #0)
> this cross-platform crash signature is regressing since firefox 51 and later
> with MOZ_RELEASE_ASSERT(!mDoingStableStates) that was added with bug
> 1292892. it's a rather low volume crash though (<0.1% of crashes in 51.0.1).

Bug 1292892 did not add that assert. CC'ing some people who may know more. bz, smaug: see also bug 1312623.
Oops, sorry, I looked at the wrong line...
See Also: 1292892
I don't understand the stack trace from the initial comment. We don't have ProcessMetastableStateQueue more than once on the stack there. Bug 1312623 looks like the issue that was fixed in http://searchfox.org/mozilla-central/rev/f5077ad52f8b90183e73038869f6140f0afbf427/dom/media/MediaStreamGraph.cpp#1720-1723
Tracking for 52 as a new crash.
This crash signature jumped up in volume again after the March release date. Is there anything more we could do about it?
Mass wontfix for bugs affecting firefox 52.
Nathan, can you help get this to someone who can investigate? This is a pretty dramatic jump, I think in 52. Too late to fix for 53 but we could still shoot for 54. You can see the graph here https://crash-stats.mozilla.com/signature/?signature=mozilla%3A%3ACycleCollectedJSContext%3A%3AProcessMetastableStateQueue&date=%3E%3D2017-01-15T02%3A34%3A06.000Z&date=%3C2017-04-15T02%3A34%3A06.000Z#graphs
Flags: needinfo?(nfroyd)
Andrew, do you have any ideas here? I'm kind of with smaug's comment 3: I don't understand how mDoingStableStates can be true here...at least, assuming that the stack is correct. We're not doing something weird, like using the same JS context on multiple threads, are we? If you don't have any ideas, I guess I'll try looking at minidumps to see whether that stack is reasonable and what the value of mDoingStableStates might be. Maybe we're just looking at some weird memory corruption...?
Flags: needinfo?(nfroyd) → needinfo?(continuation)
Sorry, I don't know anything about this metastable state stuff.
Flags: needinfo?(continuation)
OK, I took a look at the crash from comment 0. The unwound stack looks reasonable given the stack memory in the crashdump, but I can't examine what the CycleCollectedJSContext looks like, as the crashdump doesn't include the range of memory containing the JS context. I also took a look at some other crashes, which looked a little more promising:
https://crash-stats.mozilla.com/report/index/6acb7112-e2d3-41ac-9294-80c672170411
https://crash-stats.mozilla.com/report/index/54daad5e-bbf8-4c9c-8997-ac58f2170412
https://crash-stats.mozilla.com/report/index/d3d99b23-19d2-4969-afaa-4a6642170417
In these it looks like we're processing from microtask checkpoints. Unfortunately, we're getting called from JIT code, so we can't see full stacks, but is it possible this code is getting called reentrantly?
Flags: needinfo?(bugs)
That is what the crash seems to hint at. Media code recently had a bug where it re-entered metastable state handling. But the stack traces still look odd. What is that @0x0?
Flags: needinfo?(bugs)
I can currently trigger this consistently in 53.0.2 by trying to open a link to thewrap.com from r/movies:
bp-96f7ea7e-8fed-421c-aed1-9c0a51170515
bp-4f6d823c-843b-4015-a544-6cd5a0170515
bp-06ddbcf1-05e1-478d-bbc6-6ebf00170515
Is there anything I can do to help debug?
Flags: needinfo?(nfroyd)
Flags: needinfo?(bugs)
Regression range would be really nice.
Flags: needinfo?(bugs)
Do you have _exact_ steps to reproduce? What does r/movies mean? So far I haven't managed to reproduce using FF 53.0.2 on Linux.
You do seem to have quite a few addons. Can you reproduce the issue without those addons?
(In reply to Olli Pettay [:smaug] from comment #14)
> What does r/movies mean?

Presumably he means https://www.reddit.com/r/movies/

Knowing if the issue reproduces without addons would be helpful.
Flags: needinfo?(nfroyd) → needinfo?(moz-ian)
(In reply to Olli Pettay [:smaug] from comment #14)
> Do you have _exact_ steps to reproduce?
>
> What does r/movies mean?
>
> So far I haven't managed to reproduce using FF 53.0.2 on linux

0. Have 53.0.2 with NoScript installed.
1. Go to https://www.reddit.com/r/movies/
2. (Middle) Click on the current link to http://www.thewrap.com/powers-boothe-emmy-winning-character-actor-dead-68/

(In reply to Olli Pettay [:smaug] from comment #15)
> You do seem to have quite a few addons. Can you reproduce the issue without
> those addons?

Safe mode: cannot reproduce.
Disabling just NoScript: cannot reproduce.
In fact, just allowing scripts globally with NoScript still enabled seems to stop it from happening.

The experience is basically identical to bug 1235183, where opening a Facebook (or Gfycat?) tab would crash the browser, and which again seemed to be NoScript-related.
Flags: needinfo?(moz-ian)
Still no luck reproducing with FF 53.0.2 on Linux + NoScript.
Oh, but I see it in the stack: http://searchfox.org/mozilla-central/rev/484d2b7f51b7aed035147bbb4a565061659d9278/netwerk/protocol/http/nsHttpChannel.cpp#6268
Looks like NoScript spins the event loop during http-on-modify-request. That is not good at all.
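The failure mode suspected here (an observer callback spinning a nested event loop while stable-state work is still being drained) can be illustrated with a small C++ model. This is a hypothetical sketch, not Gecko code: StableStateQueue, Dispatch, ProcessQueue and NestedSpinTripsGuard are invented names, and the guard throws std::logic_error so the outcome can be observed, whereas the real MOZ_RELEASE_ASSERT(!mDoingStableStates) aborts the process.

```cpp
#include <functional>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical, simplified model of a stable-state queue guarded by a
// reentrancy flag in the spirit of mDoingStableStates. NOT the Gecko
// implementation; names and structure are illustrative only.
class StableStateQueue {
 public:
  void Dispatch(std::function<void()> aRunnable) {
    mQueue.push_back(std::move(aRunnable));
  }

  // Drains the queue after a task, loosely mirroring the role of
  // ProcessMetastableStateQueue being called from AfterProcessTask.
  void ProcessQueue() {
    // Stand-in for MOZ_RELEASE_ASSERT(!mDoingStableStates). Gecko would
    // abort here; this sketch throws so the failure is testable.
    if (mDoingStableStates) {
      throw std::logic_error("reentrant stable-state processing");
    }
    FlagGuard guard(mDoingStableStates);  // sets the flag, clears it on exit
    std::vector<std::function<void()>> pending;
    pending.swap(mQueue);  // runnables queued during the drain run next time
    for (auto& runnable : pending) {
      runnable();
    }
  }

 private:
  // RAII helper so the flag is reset even if a runnable throws.
  struct FlagGuard {
    explicit FlagGuard(bool& aFlag) : mFlag(aFlag) { mFlag = true; }
    ~FlagGuard() { mFlag = false; }
    bool& mFlag;
  };

  bool mDoingStableStates = false;
  std::vector<std::function<void()>> mQueue;
};

// Simulates a runnable (e.g. an observer callback) that spins a nested event
// loop: the nested loop finishes a task and tries to drain the queue again
// while the outer drain is still in progress. Returns true if the guard fired.
bool NestedSpinTripsGuard() {
  StableStateQueue queue;
  queue.Dispatch([&queue] {
    queue.ProcessQueue();  // hypothetical nested AfterProcessTask
  });
  try {
    queue.ProcessQueue();
  } catch (const std::logic_error&) {
    return true;
  }
  return false;
}
```

In this model NestedSpinTripsGuard() returns true: the nested drain finds the flag still set by the outer one, so the failure is reported at the queue-processing function rather than in the callback that spun the loop. Swapping pending runnables into a local vector is just one simple design choice; it means work dispatched during a drain waits for the next call.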
Giorgio, this looks like a NoScript bug to me. Is it spinning the event loop at an unsafe time?
Flags: needinfo?(g.maone)
(In reply to Olli Pettay [:smaug] from comment #20)
> Giorgio, this looks like a NoScript bug to me.
> Is it spinning event loop at unsafe time?

No, it should not, unless very obscure, deprecated ABE options (disabled by default, of course, and going away very soon) are enabled.

Reporter, could you please try disabling ABE and/or sending me your NoScript Options>Export file privately? Thanks.
Flags: needinfo?(g.maone)
(In reply to Giorgio Maone [:mao] from comment #21)
> (In reply to Olli Pettay [:smaug] from comment #20)
> > Giorgio, this looks like a NoScript bug to me.
> > Is it spinning event loop at unsafe time?
>
> No it should not, unless very obscure, deprecated (disabled by default of
> course and going away very soon) ABE options are enabled.

Ah. That sounds very much like it. I have... a few ABE rules.

> Reporter, could you please try disabling ABE and/or sending me privately
> your NoScript Options>Export file?
> Thanks.

Disabling ABE does seem to stop it, though re-enabling it doesn't bring the crash back unless I restart the browser. (The same thing happened when enabling scripts globally.) Emailed you the file.
(In reply to Ian Moody [:Kwan] from comment #22)
> Emailed you the file.

Thank you. Please reset your "abe.siteEnabled" about:config preference (AKA NoScript Options>Advanced>ABE>Allow sites to push their own rulesets) to its default "false" value.
BTW, I don't think any production website has ever implemented its own server-side dynamic ABE rulesets, and that's exactly the deprecated option I was talking about ;)
I don't see any crashes on 57 in the past 2 weeks... and like, 12 for 56.
(In reply to Giorgio Maone [:mao] from comment #23)
> (In reply to Ian Moody [:Kwan] from comment #22)
>
> > Emailed you the file.
>
> Thank you, please reset your "abe.siteEnabled" about:config preference (AKA
> NoScript Options>Advanced>ABE>Allow sites to push their own rulesets) to its
> default "false" value.
> BTW, I don't think any production website has ever implemented its own
> server-side dynamic ABE rulesets, and that's exactly the deprecated option I
> was talking about ;)

For the record, this did seem to fix it, though since the STR had stopped working it was hard to be certain. But when I turned the preference back on for a while, I'd occasionally get the crash, and once I turned it off again, I never did. I'm not entirely sure what my reasoning was for turning it on in the first place.
Severity: critical → normal
Priority: -- → P3
Signature report for mozilla::CycleCollectedJSContext::ProcessMetastableStateQueue
Showing results from 7 days ago

Operating system       Count   Share
  Windows 10           957     30.9%
  Windows 7            951     30.7%
  Windows Vista        630     20.4%
  Windows XP           497     16.1%
  Windows 8.1          39      1.3%
  Windows Serv03       10      0.3%
  Linux                4       0.1%
  OS X 10.13           2       0.1%
  Windows 8            2       0.1%
  Android              1       0.0%

Version                Count   Share   Installs
  Firefox 52.7.3esr    2212    71.4%   1960
  Firefox 52.6.0esr    385     12.4%   284
  Firefox 52.7.2esr    144     4.6%    116
  Firefox 52.3.0esr    49      1.6%    17
  Firefox 52.7.0esr    37      1.2%    37
  Firefox 52.5.0esr    29      0.9%    22
  Firefox 52.4.0esr    20      0.6%    13
  Firefox 52.5.3esr    19      0.6%    19
  Firefox 52.0.2esr    15      0.5%    6
  Firefox 52.1.2esr    15      0.5%    13
  Firefox 52.2.0esr    15      0.5%    13
  Firefox 59.0.2       3       0.1%    3
  Firefox 56.0.2       3       0.1%    3
  Firefox 56.0.1       1       0.0%    1
  Firefox 56.0         1       0.0%    1
  Firefox 55.0.3       1       0.0%    1
  Firefox 54.0b6       1       0.0%    1
  Firefox 54.0.1       4       0.1%    3
  Firefox 54.0         1       0.0%    1
  Firefox 53.0b8       3       0.1%    1
  Firefox 53.0b2       1       0.0%    1
  FennecAndroid 58.0.1 1       0.0%    1

Uptime                 Count   Share
  > 1 hour             2351    76.0%
  15-60 min            480     15.5%
  5-15 min             168     5.4%
  1-5 min              61      2.0%
  < 1 min              33      1.1%

Architecture           Count   Share
  x86                  3077    99.5%
  amd64                15      0.5%
  arm                  1       0.0%
Summary: Crash in mozilla::CycleCollectedJSContext::ProcessMetastableStateQueue → Crash in [@ mozilla::CycleCollectedJSContext::ProcessMetastableStateQueue]
This appears to still be lurking around on release at a much lower frequency. Bug 1452416 should take care of the high crash rate on ESR52.

Closing because no crashes reported for 12 weeks.

Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → WORKSFORME