Closed Bug 1235183 Opened 8 years ago Closed 8 years ago

Assertion failure: !mDoingStableStates, at xpcom/base/CycleCollectedJSRuntime.cpp:1057

Categories

(Core :: Audio/Video: Playback, defect, P1)

46 Branch
defect

Tracking

()

RESOLVED FIXED
mozilla50
Tracking Status
firefox47 --- affected
firefox48 --- affected
firefox49 --- affected
firefox-esr45 --- affected
firefox50 --- fixed

People

(Reporter: unmobile+mozbugs, Assigned: jwwang)

References

Details

(Whiteboard: dom-triaged)

Crash Data

Attachments

(6 files, 1 obsolete file)

User Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.80 Safari/537.36

Steps to reproduce:

Running packaged iceweasel 43.0.1-1 from Debian Experimental on Debian Sid.

Using the NoScript add-on (up to date, AFAIK)

1. Browse to http://www.theatlantic.com/magazine/archive/2016/01/the-great-republican-revolt/419118/
2. Temporarily enable scripts from cdn.theatlantic.com from address bar drop-down menu


Actual results:

Assertion failure, as follows:

Assertion failure: !mDoingStableStates, at /tmp/buildd/iceweasel-43.0.1/xpcom/base/CycleCollectedJSRuntime.cpp:1057
#01: ???[/usr/lib/iceweasel/libxul.so +0xdc94e8]
#02: ???[/usr/lib/iceweasel/libxul.so +0x9d2c60]
#03: NS_InvokeByIndex[/usr/lib/iceweasel/libxul.so +0x9dc443]
#04: ???[/usr/lib/iceweasel/libxul.so +0xdd6a7b]
#05: ???[/usr/lib/iceweasel/libxul.so +0xddd950]
#06: ???[/usr/lib/iceweasel/libxul.so +0x25b06a6]
#07: ???[/usr/lib/iceweasel/libxul.so +0x25a3bad]
#08: ???[/usr/lib/iceweasel/libxul.so +0x25b0341]
#09: ???[/usr/lib/iceweasel/libxul.so +0x25b0603]
#10: ???[/usr/lib/iceweasel/libxul.so +0x25b0d43]
#11: ???[/usr/lib/iceweasel/libxul.so +0x271ed18]
#12: ??? (???:???)

In gdb, I get the following from this:

Program received signal SIGSEGV, Segmentation fault.
mozilla::CycleCollectedJSRuntime::ProcessMetastableStateQueue (this=this@entry=0x7fffe6889800, aRecursionDepth=<optimized out>)
    at /tmp/buildd/iceweasel-43.0.1/xpcom/base/CycleCollectedJSRuntime.cpp:1057
1057	/tmp/buildd/iceweasel-43.0.1/xpcom/base/CycleCollectedJSRuntime.cpp: No such file or directory.

(gdb) bt
#0  mozilla::CycleCollectedJSRuntime::ProcessMetastableStateQueue (this=this@entry=0x7fffe6889800, aRecursionDepth=<optimized out>) at /tmp/buildd/iceweasel-43.0.1/xpcom/base/CycleCollectedJSRuntime.cpp:1057
#1  0x00007ffff26ff63d in mozilla::CycleCollectedJSRuntime::AfterProcessTask (this=0x7fffe6889800, aRecursionDepth=<optimized out>) at /tmp/buildd/iceweasel-43.0.1/xpcom/base/CycleCollectedJSRuntime.cpp:1091
#2  0x00007ffff2b2c4e8 in XPCJSRuntime::AfterProcessTask (this=0x7fffe6889800, aNewRecursionDepth=2) at /tmp/buildd/iceweasel-43.0.1/js/xpconnect/src/XPCJSRuntime.cpp:3638
#3  0x00007ffff2735c60 in nsThread::ProcessNextEvent (this=0x7ffff6b70940, aMayWait=<optimized out>, aResult=0x7fffffff9ef8) at /tmp/buildd/iceweasel-43.0.1/xpcom/threads/nsThread.cpp:965
#4  0x00007ffff273f443 in NS_InvokeByIndex (that=<optimized out>, methodIndex=<optimized out>, paramCount=<optimized out>, params=<optimized out>)
    at /tmp/buildd/iceweasel-43.0.1/xpcom/reflect/xptcall/md/unix/xptcinvoke_x86_64_unix.cpp:176
#5  0x00007ffff2b39a7b in CallMethodHelper::Invoke (this=0x7fffffff9e98) at /tmp/buildd/iceweasel-43.0.1/js/xpconnect/src/XPCWrappedNative.cpp:2095
#6  CallMethodHelper::Call (this=0x7fffffff9e98) at /tmp/buildd/iceweasel-43.0.1/js/xpconnect/src/XPCWrappedNative.cpp:1415
#7  XPCWrappedNative::CallMethod (ccx=..., mode=mode@entry=XPCWrappedNative::CALL_METHOD) at /tmp/buildd/iceweasel-43.0.1/js/xpconnect/src/XPCWrappedNative.cpp:1382
#8  0x00007ffff2b40950 in XPC_WN_CallMethod (cx=0x7fffdc314800, argc=1, vp=0x7fffd8c1a318) at /tmp/buildd/iceweasel-43.0.1/js/xpconnect/src/XPCWrappedNativeJSOps.cpp:1145
#9  0x00007ffff43136a6 in js::CallJSNative (args=..., native=<optimized out>, cx=0x7fffdc314800) at /tmp/buildd/iceweasel-43.0.1/js/src/jscntxtinlines.h:235
#10 js::Invoke (cx=0x7fffdc314800, args=..., construct=<optimized out>) at /tmp/buildd/iceweasel-43.0.1/js/src/vm/Interpreter.cpp:765
#11 0x00007ffff4306bad in Interpret (cx=0x7fffdc314800, state=...) at /tmp/buildd/iceweasel-43.0.1/js/src/vm/Interpreter.cpp:3068
#12 0x00007ffff4313341 in js::RunScript (cx=cx@entry=0x7fffdc314800, state=...) at /tmp/buildd/iceweasel-43.0.1/js/src/vm/Interpreter.cpp:706
#13 0x00007ffff4313603 in js::Invoke (cx=cx@entry=0x7fffdc314800, args=..., construct=construct@entry=js::NO_CONSTRUCT) at /tmp/buildd/iceweasel-43.0.1/js/src/vm/Interpreter.cpp:783
#14 0x00007ffff4313d43 in js::Invoke (cx=cx@entry=0x7fffdc314800, thisv=..., fval=..., argc=argc@entry=2, argv=argv@entry=0x7fffffffadf0, rval=..., rval@entry=...) at /tmp/buildd/iceweasel-43.0.1/js/src/vm/Interpreter.cpp:820
#15 0x00007ffff4481d18 in js::jit::DoCallFallback (cx=0x7fffdc314800, frame=0x7fffffffae78, stub_=0x7fff9e144358, argc=2, vp=0x7fffffffade0, res=...) at /tmp/buildd/iceweasel-43.0.1/js/src/jit/BaselineIC.cpp:8900
#16 0x00007ffff7fd3b90 in ?? ()
#17 0x00007fff99ff1fb8 in ?? ()
#18 0x00007fffffffad98 in ?? ()
#19 0x00007fffffffaeb8 in ?? ()
#20 0xfff9000000000000 in ?? ()
#21 0x00007ffff686c6a0 in js::jit::DoSpreadCallFallbackInfo () from /usr/lib/iceweasel/libxul.so
#22 0x00007fffd8d53b80 in ?? ()
#23 0x00007fffe69617c3 in ?? ()
[snip more unsymbolized addresses]
#54 0x00007fffffffaf00 in ?? ()
#55 0x0000000000000000 in ?? ()



Expected results:

Page reloads with additional JS enabled
Component: Untriaged → XPCOM
OS: Unspecified → Linux
Product: Firefox → Core
Hardware: Unspecified → x86_64
On advice of KWierso on IRC, set to Core :: XPCOM, adding CC to khuey and froydnj
Just tried on "Build identifier: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Firefox/38.0 Iceweasel/38.5.0" and the crash did not reproduce.
OS: Linux → All
Hardware: x86_64 → All
Attached patch Patch (obsolete) — Splinter Review
The problem here is that the HTMLMediaElement code runs at a point where it's not safe to spin the event loop.  It then does a bunch of processing and calls into Necko, which eventually calls into addons (via the observer service), one of which spins the event loop and triggers an assertion in CycleCollectedJSRuntime.

This is sort of a bandaid, in that it only fixes a code path where Necko is *known* to call into addons that spin the event loop.  Calling AsyncOpen triggers other observer service notifications that *could* call into addons that spin the event loop.  I'm not sure whether the long term solution here is to simply not open channels in this situation or what, but I think we should take this patch.
Attachment #8703895 - Flags: review?(jduell.mcbugs)
Oh, and there was previous discussion of this in bug 1191713, which I couldn't find earlier.
Comment on attachment 8703895 [details] [diff] [review]
Patch

Review of attachment 8703895 [details] [diff] [review]:
-----------------------------------------------------------------

IRC chat about this patch:

jcduell: khuey: looking at https://bugzilla.mozilla.org/show_bug.cgi?id=1235183.  So even with this patch we’ll still call on-opening-request, which I was hoping was only used by Firebug (we added the flag for firebug some few years ago because it needed to be called back synchronously from within AsyncOpen(), IIRC because of some temporary variable or something like that).  But for some reasons it's gotten popular--there are now 95 addons in mxr that are hooking this:
jduell:   https://mxr.mozilla.org/addons/search?string=on-opening-request
jduell: khuey: How awful will things be if some users have these addons and we wind up being in JS-land despite this patch?  Is the right fix to stop necko channels from being opened during this state?
khuey: jduell: if we just end up in JS land it's not a huge deal, if the addon spins the event loop there we'll crash
khuey: in the same way we're crashing today
jduell: khuey: ok, if you’re good with that, I’ll +r
khuey: jduell: my plan is to take this fix, and if that ever becomes an issue, deal with it then
Attachment #8703895 - Flags: review?(jduell.mcbugs) → review+
Comment on attachment 8703895 [details] [diff] [review]
Patch

Unfortunately the other things are a problem.  I think we need to fix this in HTMLMediaElement.
Attachment #8703895 - Attachment is obsolete: true
(Picking cpearce semirandomly)

We need to defer the actual Necko calls (e.g. NS_NewChannel) in HTMLMediaElement::LoadResource when it's called from a stable state (i.e. via SelectResourceWrapper).  Is that safe?

The motivation for this is that we *cannot* spin the event loop from stable state, and Necko calls into a bunch of addons under NS_NewChannel that unfortunately do :/
Assignee: khuey → nobody
Component: XPCOM → DOM
Flags: needinfo?(cpearce)
Whiteboard: dom-triaged
In theory, we can run HTMLMediaElement::LoadResource() in a new runnable, provided it captures all the JS-modifiable state that it needs before exiting the synchronous section.
Flags: needinfo?(cpearce)
The issue can be found in Seamonkey 2.40 too:

# seamonkey &
[5] 14841
(seamonkey-bin:14848): Gtk-WARNING **: Theme directory  of theme oxygen has no size field


(seamonkey-bin:14848): Gtk-WARNING **: Theme directory  of theme oxygen has no size field

enigmail.js: Registered components
Assertion failure: !mDoingStableStates, at /home/abuild/rpmbuild/BUILD/seamonkey/mozilla/xpcom/base/CycleCollectedJSRuntime.cpp:1057
#01: ???[/usr/lib64/seamonkey/libxul.so +0x1056524]
#02: ???[/usr/lib64/seamonkey/libxul.so +0xc89615]
#03: NS_InvokeByIndex[/usr/lib64/seamonkey/libxul.so +0xc911fe]
#04: ???[/usr/lib64/seamonkey/libxul.so +0x10640ba]
#05: ???[/usr/lib64/seamonkey/libxul.so +0x1069b00]
#06: ??? (???:???)
Seen on Firefox 46.0.1 (Debian packaged 46.0.1-1) on Linux x86_64
Version: 43 Branch → 46 Branch
Ever since 46.0.1 this seems to have become a regular issue for me for some reason, happens about once a day when opening a gfycat link on both Windows and Linux, but didn't at all before hand.
Crash Signature: [@ mozilla::CycleCollectedJSRuntime::ProcessMetastableStateQueue]
I have exactly the same behaviour. It seems to be worse on Linux, but maybe because I use Windows less.
We've had 600+ occurrences of this in the past week.
Is it already clear, which add-ons trigger this bug?

Since I disabled most of my add-ons I never saw this bug again. Testing is a bit time-consuming, because sometimes you have to wait some hours until you see the bug. So for testing 5 add-ons you may need 5 days.
I suspect Reddit Enhancement Suite, since this has occurred in reddit a couple of times specifically when expanding the inline image viewer. Haven't yet been able to accurately detect if it happens only on specific links, as the crash doesn't always occur. I also suspect it might be Disconnect, Privacy Badger and uBlock Origin, as when expanding those elements one of these addons might be trying to block something from the domain where the element comes from. It's possible that these assumptions are entirely wrong as I'm but a novice in this field. But I'll try to disable these mentioned addons one by one to see if the behaviour remains.
ajones, could you perhaps try to find someone to take a look at this?
See https://bugzilla.mozilla.org/show_bug.cgi?id=1235183&mark=7-9#c7
Flags: needinfo?(ajones)
Component: DOM → Audio/Video: Playback
Priority: -- → P1
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) (Away until 6/13) from comment #8)
> The motivation for this is that we *cannot* spin the event loop from stable
> state

Is that true all the time or only after xpcom-shutdown starts?
Flags: needinfo?(khuey)
All the time. "Stable state" in the HTML spec is supposed to guarantee that no script can run, which means we can't execute arbitrary nsIRunnables.
Flags: needinfo?(khuey)
It is a bit tricky to move the entire LoadResource() to a new runnable. It should be safer to move the call to NS_NewChannel() to a new runnable because it is just about implementation details. I will give it a try.
Yes, that should be fine.  The problem is that NS_NewChannel calls into addon script, which can do crazy things.  So deferring that from stable state should be sufficient.  Thanks for looking into this.
Hi Phil,
Can you try if this patch fix the problem (which applied to c2da34d96746)?
Thanks.
Assignee: nobody → jwwang
Flags: needinfo?(unmobile+mozbugs)
Review commit: https://reviewboard.mozilla.org/r/60962/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/60962/
Attachment #8765715 - Flags: review?(cpearce)
Attachment #8765716 - Flags: review?(cpearce)
Attachment #8765717 - Flags: review?(cpearce)
Attachment #8765718 - Flags: review?(cpearce)
Attachment #8765715 - Flags: review?(cpearce) → review+
Attachment #8765716 - Flags: review?(cpearce) → review+
Comment on attachment 8765717 [details]
Bug 1235183. Part 3 - move mChannel to ChannelLoader so it will be easier to create the channel asynchronously.

https://reviewboard.mozilla.org/r/60966/#review57988
Attachment #8765717 - Flags: review?(cpearce) → review+
Comment on attachment 8765718 [details]
Bug 1235183. Part 4 - create channel asynchronously.

https://reviewboard.mozilla.org/r/60968/#review57990
Attachment #8765718 - Flags: review?(cpearce) → review+
Running on Firefox 47.0 with NoScript 2.9.0.12rc1, the site I originally offered as a repro case doesn't crash any more. So, testing the patched version against that won't be directly informative. Same on Youtube as linked from some of the dupes. Some page with targeted scripts to trigger this exact issue would make testing much easier.
Flags: needinfo?(unmobile+mozbugs)
https://treeherder.mozilla.org/logviewer.html#?job_id=22992495&repo=try#L6176
Hit an assertion when running dom/media/test/crashtests/459439-1.html.

stack:
[1833] ###!!! ASSERTION: Not loading during source selection?: 'mNetworkState == NETWORK_LOADING', file /builds/slave/try-lx-d-000000000000000000000/build/src/dom/html/HTMLMediaElement.cpp, line 960
#01: mozilla::dom::HTMLMediaElement::NoSupportedMediaSourceError [dom/html/HTMLMediaElement.cpp:959]
#02: mozilla::dom::HTMLMediaElement::ChannelLoader::LoadInternal [dom/html/HTMLMediaElement.cpp:518]

This is because an empty string [1] is not a valid URI. Therefore [2] LoadResource() won't be called to cancel the existing |mChannelLoader|. We need to cancel the existing |mChannelLoader| in AbortExistingLoads() as well.

[1] https://hg.mozilla.org/mozilla-central/file/e45890951ce77c3df05575bd54072b9f300d77b0/dom/media/test/crashtests/459439-1.html#l27
[2] https://hg.mozilla.org/mozilla-central/file/e45890951ce77c3df05575bd54072b9f300d77b0/dom/html/HTMLMediaElement.cpp#l956
Attachment #8766159 - Flags: review?(cpearce) → review+
Comment on attachment 8766159 [details]
Bug 1235183. Part 5 - per comment 35, cancel existing |mChannelLoader| in AbortExistingLoads().

https://reviewboard.mozilla.org/r/61166/#review58028
I'm not sure this helps, but it happened often when:

* browsing reddit, using reddit enhancement suite extension, click to load a video from streamable.com, or a gif from gfycat (not sure if both, or just one anymore - I have been avoiding this as the crashes are extremely frustrating)

* loading www.bancobest.pt
Pushed by jwwang@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/87ff4b8ecd9a
Part 1 - fix coding styles. r=cpearce
https://hg.mozilla.org/integration/autoland/rev/c88734b5bf9b
Part 2 - remove unnecessary null-checks. r=cpearce
https://hg.mozilla.org/integration/autoland/rev/f03f55222c0d
Part 3 - move mChannel to ChannelLoader so it will be easier to create the channel asynchronously. r=cpearce
https://hg.mozilla.org/integration/autoland/rev/d80a90575e2d
Part 4 - create channel asynchronously. r=cpearce
https://hg.mozilla.org/integration/autoland/rev/9eea8021987a
Part 5 - per comment 35, cancel existing |mChannelLoader| in AbortExistingLoads(). r=cpearce
Crash volume for signature 'mozilla::CycleCollectedJSRuntime::ProcessMetastableStateQueue':
 - nightly (version 50): 2 crashes from 2016-06-06.
 - aurora  (version 49): 17 crashes from 2016-06-07.
 - beta    (version 48): 223 crashes from 2016-06-06.
 - release (version 47): 2449 crashes from 2016-05-31.
 - esr     (version 45): 149 crashes from 2016-04-07.

Crash volume on the last weeks:
             Week N-1   Week N-2   Week N-3   Week N-4   Week N-5   Week N-6   Week N-7
 - nightly          1          0          0          0          1          0          0
 - aurora           2          5          2          3          5          0          0
 - beta            33         26         18         24         44         61         14
 - release        311        270        372        380        400        416        163
 - esr              3          1          9         30         26         21          9

Affected platforms: Windows, Mac OS X, Linux
Flags: needinfo?(ajones)
Wasn't this bug supposed to be fixed in v50? I'm still getting it.

https://crash-stats.mozilla.com/report/index/08497341-310e-44b9-9656-bd5d22161119
Flags: needinfo?(jwwang)
(In reply to ReggX from comment #42)
> Wasn't this bug supposed to be fixed in v50? I'm still getting it.
> 
> https://crash-stats.mozilla.com/report/index/08497341-310e-44b9-9656-
> bd5d22161119

It looks like there are still other callsites spinning the event loop in the stable state. However, I can't tell which from the call stack. Maybe a DOM or JS peer should check this issue.
Flags: needinfo?(jwwang) → needinfo?(bugs)
I filed another bug recently.


https://crash-stats.mozilla.com/report/index/08497341-310e-44b9-9656-bd5d22161119 doesn't seem to be unfortunately too useful.
Flags: needinfo?(bugs)
(In reply to Olli Pettay [:smaug] from comment #44)
> I filed another bug recently.
> 
> 
> https://crash-stats.mozilla.com/report/index/08497341-310e-44b9-9656-
> bd5d22161119 doesn't seem to be unfortunately too useful.

Bug number?
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: