
Crash in mozilla::CycleCollectedJSContext::ProcessMetastableStateQueue

Status: NEW
Assignee: Unassigned
Product: Core
Component: XPCOM
Priority: --
Severity: critical
Opened: 8 months ago
Updated: 9 days ago
Reporter: philipp
Version: 51 Branch
Keywords: crash, regression
Points: ---
Firefox Tracking Flags: firefox-esr45 unaffected, firefox51 wontfix, firefox52+ wontfix, firefox-esr52 affected, firefox53 wontfix, firefox54 wontfix, firefox55 wontfix, firefox56 wontfix, firefox57 ?

(Reporter)

Description

8 months ago
This bug was filed from the Socorro interface and is report bp-eac07850-af35-4e34-9040-963d72170203.
=============================================================
Crashing Thread (74)
Frame 	Module 	Signature 	Source
0 	xul.dll 	mozilla::CycleCollectedJSContext::ProcessMetastableStateQueue(unsigned int) 	xpcom/base/CycleCollectedJSContext.cpp:1351
1 	xul.dll 	mozilla::CycleCollectedJSContext::AfterProcessTask(unsigned int) 	xpcom/base/CycleCollectedJSContext.cpp:1387
2 	xul.dll 	nsThread::ProcessNextEvent(bool, bool*) 	xpcom/threads/nsThread.cpp:1083
3 	xul.dll 	NS_ProcessNextEvent(nsIThread*, bool) 	xpcom/glue/nsThreadUtils.cpp:311
4 	xul.dll 	mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate*) 	ipc/glue/MessagePump.cpp:338
5 	xul.dll 	MessageLoop::RunHandler() 	ipc/chromium/src/base/message_loop.cc:225
6 	xul.dll 	MessageLoop::Run() 	ipc/chromium/src/base/message_loop.cc:205
7 	xul.dll 	nsThread::ThreadFunc(void*) 	xpcom/threads/nsThread.cpp:465
8 	nss3.dll 	_PR_NativeRunThread 	nsprpub/pr/src/threads/combined/pruthr.c:397
9 	nss3.dll 	pr_root 	nsprpub/pr/src/md/windows/w95thred.c:95
10 	ucrtbase.dll 	_o__CIpow 	
11 	kernel32.dll 	BaseThreadInitThunk 	
12 	ntdll.dll 	__RtlUserThreadStart 	
13 	ntdll.dll 	_RtlUserThreadStart

This cross-platform crash signature has been regressing since Firefox 51, with MOZ_RELEASE_ASSERT(!mDoingStableStates), which was added with bug 1292892. It's a rather low-volume crash, though (<0.1% of crashes in 51.0.1).

Correlations for Firefox Release
(99.12% in signature vs 00.08% overall) moz_crash_reason = MOZ_RELEASE_ASSERT(!mDoingStableStates)
(95.15% in signature vs 36.54% overall) reason = EXCEPTION_BREAKPOINT
(17.18% in signature vs 00.24% overall) Addon "ascsurfingprotectionnew@iobit.com" = true [61.90% vs 00.32% if startup_crash = 0]
(15.42% in signature vs 00.23% overall) Addon "ascsurfingprotectionnew@iobit.com" Version = 2.1.3 [55.56% vs 00.30% if startup_crash = 0]
(33.92% in signature vs 02.95% overall) GFX_ERROR "Failed 2 buffer db=" = true [46.95% vs 05.29% if process_type = content]
(25.11% in signature vs 00.58% overall) GFX_ERROR "Failed to create DIB section for a bitmap of size " = true [34.76% vs 01.89% if process_type = content]
(79.74% in signature vs 28.82% overall) Module "qasf.dll" = true [72.22% vs 37.14% if platform_pretty_version = Windows 10]
(31.28% in signature vs 03.12% overall) contains_memory_report = 1
Component: General → XPCOM
See Also: → bug 1312623
(In reply to [:philipp] from comment #0)
> this cross-platform crash signature is regressing since firefox 51 and later
> with MOZ_RELEASE_ASSERT(!mDoingStableStates) that was added with bug
> 1292892. it's a rather low volume crash though (<0.1% of crashes in 51.0.1).

Bug 1292892 did not add that assert. CC'ing some people who may know more.

bz, smaug: see also bug 1312623.
(Reporter)

Comment 2

8 months ago
Oops, sorry, I looked at the wrong line...
See Also: bug 1292892

Comment 3

8 months ago
I don't understand the stack trace from the initial comment. We don't have ProcessMetastableStateQueue on the stack more than once there.

Bug 1312623 looks like the issue which was fixed in
http://searchfox.org/mozilla-central/rev/f5077ad52f8b90183e73038869f6140f0afbf427/dom/media/MediaStreamGraph.cpp#1720-1723
Tracking for 52 as a new crash.
tracking-firefox52: --- → +
(Reporter)

Comment 5

7 months ago
This crash signature jumped up in volume again after the March release date. Is there anything more we could do about it?
Mass wontfix for bugs affecting firefox 52.
status-firefox52: affected → wontfix
Nathan, can you help get this to someone who can investigate? This is a pretty dramatic jump, I think in 52.
Too late to fix for 53 but we could still shoot for 54. 

You can see the graph here https://crash-stats.mozilla.com/signature/?signature=mozilla%3A%3ACycleCollectedJSContext%3A%3AProcessMetastableStateQueue&date=%3E%3D2017-01-15T02%3A34%3A06.000Z&date=%3C2017-04-15T02%3A34%3A06.000Z#graphs
status-firefox53: affected → wontfix
Flags: needinfo?(nfroyd)
Andrew, do you have any ideas here?  I'm kind of with smaug's comment 3: I don't understand how mDoingStableStates can be true here...at least, assuming that the stack is correct.  We're not doing something weird, like using the same JS context on multiple threads, are we?

If you don't have any ideas, I guess I'll try looking at minidumps to see whether that stack is reasonable and what the value of mDoingStableStates might be.  Maybe we're just looking at some weird memory corruption...?
Flags: needinfo?(nfroyd) → needinfo?(continuation)
Sorry, I don't know anything about this metastable state stuff.
Flags: needinfo?(continuation)
OK, I took a look at the crash from comment 0.  The unwound stack looks reasonable from the stack memory in the crashdump, but I can't examine what the CycleCollectedJSContext looks like, as the crashdump doesn't have the range of memory containing the JS context.

I also took a look at some other crashes, which looked a little more promising:

https://crash-stats.mozilla.com/report/index/6acb7112-e2d3-41ac-9294-80c672170411
https://crash-stats.mozilla.com/report/index/54daad5e-bbf8-4c9c-8997-ac58f2170412
https://crash-stats.mozilla.com/report/index/d3d99b23-19d2-4969-afaa-4a6642170417

where it looks like we're processing from microtask checkpoints.  Unfortunately, we're getting called from JIT code, so we can't see full stacks, but is it possible this code is getting called reentrantly?
Flags: needinfo?(bugs)

Comment 11

5 months ago
That is what the crash seems to hint at. Media code recently had a bug where it re-entered metastable state handling.

But the stack traces still look odd. What is that @0x0?
Flags: needinfo?(bugs)
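
For readers unfamiliar with this kind of assertion, here is a minimal sketch of a re-entrancy guard of the sort discussed above. It is an illustration only, not the Gecko implementation: StableStateQueue, Add, Process and ReentrantCallTripsGuard are hypothetical names, and only the flag name mDoingStableStates is taken from the crash reports. The sketch throws an exception where the real code would hit MOZ_RELEASE_ASSERT and crash the process.

```cpp
#include <functional>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical stand-in for the queue in this bug; not the Gecko class.
class StableStateQueue {
public:
  void Add(std::function<void()> aTask) { mTasks.push_back(std::move(aTask)); }

  // Runs every queued task once. If a task ends up calling Process() again,
  // the flag is still set and the guard fires. Throwing here stands in for
  // MOZ_RELEASE_ASSERT(!mDoingStableStates) crashing the process.
  void Process() {
    if (mDoingStableStates) {
      throw std::logic_error("Process() re-entered");
    }
    mDoingStableStates = true;
    FlagReset reset(mDoingStableStates);  // clears the flag on every exit path

    // Move the tasks out first so tasks queued while running wait for the
    // next call instead of invalidating the loop below.
    std::vector<std::function<void()>> tasks;
    tasks.swap(mTasks);
    for (auto& task : tasks) {
      task();
    }
  }

  bool DoingStableStates() const { return mDoingStableStates; }

private:
  // Small RAII helper that resets the flag when Process() returns or throws.
  struct FlagReset {
    explicit FlagReset(bool& aFlag) : mFlag(aFlag) {}
    ~FlagReset() { mFlag = false; }
    bool& mFlag;
  };

  std::vector<std::function<void()>> mTasks;
  bool mDoingStableStates = false;
};

// Returns true if a task that calls back into Process() trips the guard.
bool ReentrantCallTripsGuard() {
  StableStateQueue queue;
  queue.Add([&queue] { queue.Process(); });  // simulated re-entrant caller
  try {
    queue.Process();
  } catch (const std::logic_error&) {
    return true;
  }
  return false;
}
```

In this model a single Process() frame on the stack is enough to set the flag, so a second call from anywhere beneath it trips the guard even if the unwinder cannot show the outer frame.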

Comment 12

4 months ago
I can currently trigger this consistently in 53.0.2 by trying to open a link to thewrap.com from r/movies

bp-96f7ea7e-8fed-421c-aed1-9c0a51170515
bp-4f6d823c-843b-4015-a544-6cd5a0170515
bp-06ddbcf1-05e1-478d-bbc6-6ebf00170515

Anything I can do to help debug?
Flags: needinfo?(nfroyd)
Flags: needinfo?(bugs)

Comment 13

4 months ago
Regression range would be really nice.
Flags: needinfo?(bugs)

Comment 14

4 months ago
Do you have _exact_ steps to reproduce?

What does r/movies mean?

So far I haven't managed to reproduce using FF 53.0.2 on linux

Comment 15

4 months ago
You do seem to have quite a few addons. Can you reproduce the issue without those addons?
(In reply to Olli Pettay [:smaug] from comment #14)
> What does r/movies mean?

Presumably he means https://www.reddit.com/r/movies/

Knowing if the issue reproduces without addons would be helpful.
Flags: needinfo?(nfroyd) → needinfo?(moz-ian)

Comment 17

4 months ago
(In reply to Olli Pettay [:smaug] from comment #14)
> Do you have _exact_ steps to reproduce?
> 
> What does r/movies mean?
> 
> So far I haven't managed to reproduce using FF 53.0.2 on linux
0. Have 53.0.2 with NoScript installed.
1. Go to https://www.reddit.com/r/movies/
2. (Middle) Click on the current link to http://www.thewrap.com/powers-boothe-emmy-winning-character-actor-dead-68/

(In reply to Olli Pettay [:smaug] from comment #15)
> You do seem to have quite a few addons. Can you reproduce the issue without
> those addons?
Safe-mode: cannot reproduce.
Disable just NoScript: cannot reproduce.
In fact just allowing scripts globally with NoScript still enabled seems to stop it happening.

The experience is basically identical to bug 1235183, which used to mean opening a facebook (or gfycat?) tab would crash the browser, and again seemed to be NoScript related.
Flags: needinfo?(moz-ian)

Comment 18

4 months ago
Still no luck reproducing, FF 53.0.2 on linux + NoScript.

Comment 19

4 months ago
Oh, but I see it in the stack.
http://searchfox.org/mozilla-central/rev/484d2b7f51b7aed035147bbb4a565061659d9278/netwerk/protocol/http/nsHttpChannel.cpp#6268
Looks like NoScript spins the event loop during http-on-modify-request. That is not good at all.
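
To illustrate why spinning the event loop at the wrong time is dangerous, here is a rough sketch of one way a nested loop can re-enter stable-state processing. This thread does not establish the actual call chain in these crashes, so treat the scenario as an assumption. MiniLoop and all of its methods are hypothetical and are not Gecko APIs; the sketch records the re-entry in a flag where a release assertion would crash.

```cpp
#include <deque>
#include <functional>
#include <utility>

// Hypothetical miniature event loop; none of these names are Gecko APIs.
class MiniLoop {
public:
  void Dispatch(std::function<void()> aEvent) {
    mEvents.push_back(std::move(aEvent));
  }
  void AddStableStateTask(std::function<void()> aTask) {
    mStableTasks.push_back(std::move(aTask));
  }

  // Runs one event, then drains the stable-state tasks, loosely mirroring
  // ProcessNextEvent -> AfterProcessTask in the crashing stack.
  // Returns false when there was no event to run.
  bool ProcessNextEvent() {
    if (mEvents.empty()) {
      return false;
    }
    std::function<void()> event = std::move(mEvents.front());
    mEvents.pop_front();
    event();
    ProcessStableStateQueue();
    return true;
  }

  bool SawReentry() const { return mSawReentry; }

private:
  void ProcessStableStateQueue() {
    if (mDoingStableStates) {
      mSawReentry = true;  // a release assertion would crash at this point
      return;
    }
    mDoingStableStates = true;
    while (!mStableTasks.empty()) {
      std::function<void()> task = std::move(mStableTasks.front());
      mStableTasks.pop_front();
      task();
    }
    mDoingStableStates = false;
  }

  std::deque<std::function<void()>> mEvents;
  std::deque<std::function<void()>> mStableTasks;
  bool mDoingStableStates = false;
  bool mSawReentry = false;
};

// Assumed scenario: a stable-state task spins a nested loop until it is idle.
bool SpinningInsideStableStateTaskReenters() {
  MiniLoop loop;
  loop.Dispatch([&loop] {
    loop.AddStableStateTask([&loop] {
      loop.Dispatch([] {});                // some pending work
      while (loop.ProcessNextEvent()) {}   // nested spin of the same loop
    });
  });
  loop.ProcessNextEvent();
  return loop.SawReentry();
}
```

The nested ProcessNextEvent() call finishes its event and then tries to drain the stable-state tasks while the outer drain is still in progress, which is the state the assertion forbids.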

Comment 20

4 months ago
Giorgio, this looks like a NoScript bug to me.
Is it spinning event loop at unsafe time?
Flags: needinfo?(g.maone)

Comment 21

4 months ago
(In reply to Olli Pettay [:smaug] from comment #20)
> Giorgio, this looks like a NoScript bug to me.
> Is it spinning event loop at unsafe time?

No it should not, unless very obscure, deprecated (disabled by default of course and going away very soon) ABE options are enabled.
Reporter, could you please try disabling ABE and/or sending me privately your NoScript Options>Export file?
Thanks.
Flags: needinfo?(g.maone)

Comment 22

4 months ago
(In reply to Giorgio Maone [:mao] from comment #21)
> (In reply to Olli Pettay [:smaug] from comment #20)
> > Giorgio, this looks like a NoScript bug to me.
> > Is it spinning event loop at unsafe time?
> 
> No it should not, unless very obscure, deprecated (disabled by default of
> course and going away very soon) ABE options are enabled.
Ah.  That sounds very much like it.  I have... a few ABE rules.

> Reporter, could you please try disabling ABE and/or sending me privately
> your NoScript Options>Export file?
> Thanks.
Disabling ABE does seem to stop it, though re-enabling it doesn't bring the crash back (unless I restart the browser.  Same thing happened when enabling scripts globally).
Emailed you the file.

Comment 23

4 months ago
(In reply to Ian Moody [:Kwan] from comment #22)

> Emailed you the file.

Thank you, please reset your "abe.siteEnabled" about:config preference (AKA NoScript Options>Advanced>ABE>Allow sites to push their own rulesets) to its default "false" value.
BTW, I don't think any production website has ever implemented its own server-side dynamic ABE rulesets, and that's exactly the deprecated option I was talking about ;)

Comment 24

3 months ago
4 failures in 814 pushes (0.005 failures/push) were associated with this bug in the last 7 days.   

Repository breakdown:
* mozilla-inbound: 2
* mozilla-central: 1
* autoland: 1

Platform breakdown:
* android-4-3-armv7-api15: 4

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1336478&startday=2017-06-12&endday=2017-06-18&tree=all

Comment 25

3 months ago
1 failures in 892 pushes (0.001 failures/push) were associated with this bug in the last 7 days.   

Repository breakdown:
* autoland: 1

Platform breakdown:
* android-4-3-armv7-api15: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1336478&startday=2017-06-19&endday=2017-06-25&tree=all

Comment 26

3 months ago
1 failures in 718 pushes (0.001 failures/push) were associated with this bug in the last 7 days.   

Repository breakdown:
* mozilla-inbound: 1

Platform breakdown:
* android-4-3-armv7-api15: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1336478&startday=2017-06-26&endday=2017-07-02&tree=all
status-firefox54: affected → wontfix
status-firefox55: --- → affected
status-firefox56: --- → ?
I don't see any crashes on 57 in the past 2 weeks... and like, 12 for 56.
status-firefox55: affected → wontfix
status-firefox56: ? → wontfix
status-firefox57: --- → ?

Comment 28

9 days ago
(In reply to Giorgio Maone [:mao] from comment #23)
> (In reply to Ian Moody [:Kwan] from comment #22)
> 
> > Emailed you the file.
> 
> Thank you, please reset your "abe.siteEnabled" about:config preference (AKA
> NoScript Options>Advanced>ABE>Allow sites to push their own rulesets) to its
> default "false" value.
> BTW, I don't think any production website has ever implemented its own
> server-side dynamic ABE rulesets, and that's exactly the deprecated option I
> was talking about ;)

For the record, this did seem to fix it, though as the STR stopped working it was hard to be certain.  But when I turned it back on again for a while I'd occasionally get the crash, and after turning it off again I never did.
Not entirely sure what my reasoning would have been behind turning it on in the first place.