Closed Bug 877708 Opened 11 years ago Closed 11 years ago

[A/V] ###!!! ABORT: ChannelEventQueue::Suspend called recursively: '!mSuspended', file ../../../dist/include/mozilla/net/ChannelEventQueue.h, line 124

Categories

(Firefox OS Graveyard :: General, defect)

ARM
Gonk (Firefox OS)
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED DUPLICATE of bug 870564

People

(Reporter: mikeh, Assigned: jduell.mcbugs)

Details

Attachments

(3 files)

Observed while trying to reproduce bug 870564 in a DEBUG build.

STR:
1. open browser to link in bug 870564 comment 0
2. play the video

Expected results: no aborts/crashing!

Observed results: video plays for some amount of time, then assertion failure triggers a SIGSEGV:

05-30 12:05:04.903   963   963 I PRLog   : 1074922744[425c92b0]: [Child 963] ###!!! ABORT: ChannelEventQueue::Suspend called recursively: '!mSuspended', file ../../../dist/include/mozilla/net/ChannelEventQueue.h, line 124
05-30 12:05:04.903   963   963 I Gecko   : [Child 963] ###!!! ABORT: ChannelEventQueue::Suspend called recursively: '!mSuspended', file ../../../dist/include/mozilla/net/ChannelEventQueue.h, line 124
  ...
05-30 12:05:04.973   963   963 E Gecko   : mozalloc_abort: [Child 963] ###!!! ABORT: ChannelEventQueue::Suspend called recursively: '!mSuspended', file ../../../dist/include/mozilla/net/ChannelEventQueue.h, line 124
  ...
05-30 12:05:04.983   963   963 F libc    : Fatal signal 11 (SIGSEGV) at 0x00000000 (code=1)

Observed on Inari with:
- gecko: b2g18:ceee454a90b5
- gaia: c51082d92f18c4c7cdbc4f96c977be735500562a
The abort in question is NS_ABORT_IF_FALSE()[1] which is removed in non-DEBUG builds[2].

1. http://mxr.mozilla.org/mozilla-central/source/netwerk/ipc/ChannelEventQueue.h#123
2. http://mxr.mozilla.org/mozilla-central/source/xpcom/glue/nsDebug.h#142

Looks like this assertion/code was added in http://hg.mozilla.org/mozilla-central/rev/4a33bc1f772d

jduell, any idea on how to resolve this? (For now I'm going to comment out the macro.)
Flags: needinfo?(jduell.mcbugs)
A backtrace from the failing Suspend call would be a useful place to start.
Following code could make different decision between video read and audio read. audio track read more ahead(1 sec) than video track. This causes a lot of consecutive Suspend() and Resume() calls in some situations. This might affect to this bug.

> enableReading = predictedNewDataUse < latestNextUse;

http://mxr.mozilla.org/mozilla-b2g18/source/content/media/nsMediaCache.cpp#1191
jdm, as requested:

Program received signal SIGSEGV, Segmentation fault.
0x417f00b4 in mozalloc_abort (msg=<value optimized out>)
    at /home/mikeh/dev/mozilla/m-c/b2g18/memory/mozalloc/mozalloc_abort.cpp:30
30	    MOZ_CRASH();
(gdb) bt
#0  0x417f00b4 in mozalloc_abort (msg=<value optimized out>)
    at /home/mikeh/dev/mozilla/m-c/b2g18/memory/mozalloc/mozalloc_abort.cpp:30
#1  0x411c36b6 in Abort (aSeverity=<value optimized out>, aStr=<value optimized out>, 
    aExpr=<value optimized out>, aFile=<value optimized out>, aLine=124)
    at /home/mikeh/dev/mozilla/m-c/b2g18/xpcom/base/nsDebugImpl.cpp:423
#2  NS_DebugBreak_P (aSeverity=<value optimized out>, aStr=<value optimized out>, 
    aExpr=<value optimized out>, aFile=<value optimized out>, aLine=124)
    at /home/mikeh/dev/mozilla/m-c/b2g18/xpcom/base/nsDebugImpl.cpp:380
#3  0x404e849e in mozilla::net::ChannelEventQueue::Suspend (this=0x44811c00)
    at ../../../dist/include/mozilla/net/ChannelEventQueue.h:123
#4  mozilla::net::HttpChannelChild::Suspend (this=0x44811c00)
    at /home/mikeh/dev/mozilla/m-c/b2g18/netwerk/protocol/http/HttpChannelChild.cpp:915
#5  0x40bdbb6a in mozilla::ChannelMediaResource::PossiblySuspend (this=<value optimized out>)
    at /home/mikeh/dev/mozilla/m-c/b2g18/content/media/MediaResource.cpp:1126
#6  0x40bdd2b8 in mozilla::ChannelMediaResource::Suspend (this=0x44bb0000, 
    aCloseImmediately=<value optimized out>)
    at /home/mikeh/dev/mozilla/m-c/b2g18/content/media/MediaResource.cpp:813
#7  0x40bdba56 in mozilla::ChannelMediaResource::CacheClientSuspend (this=0xb5)
    at /home/mikeh/dev/mozilla/m-c/b2g18/content/media/MediaResource.cpp:1031
#8  0x40bf7b68 in nsMediaCache::Update (this=0x44c3cc90)
    at /home/mikeh/dev/mozilla/m-c/b2g18/content/media/nsMediaCache.cpp:1278
#9  0x40bf7cd4 in UpdateEvent::Run (this=<value optimized out>)
    at /home/mikeh/dev/mozilla/m-c/b2g18/content/media/nsMediaCache.cpp:1301
#10 0x411ba6aa in nsThread::ProcessNextEvent (this=0x424c9940, mayWait=<value optimized out>, 
    result=0xbeccedc7) at /home/mikeh/dev/mozilla/m-c/b2g18/xpcom/threads/nsThread.cpp:620
#11 0x41181df0 in NS_ProcessNextEvent_P (thread=0xb5, mayWait=false)
    at /home/mikeh/dev/mozilla/btg024/objdir-gecko-b2g18-debug/xpcom/build/nsThreadUtils.cpp:237
#12 0x40ff100e in mozilla::ipc::MessagePump::Run (this=0x424c5340, aDelegate=0xbeccf8b8)
    at /home/mikeh/dev/mozilla/m-c/b2g18/ipc/glue/MessagePump.cpp:82
#13 0x40ff1196 in mozilla::ipc::MessagePumpForChildProcess::Run (this=0x424c5340, aDelegate=0xbeccf8b8)
    at /home/mikeh/dev/mozilla/m-c/b2g18/ipc/glue/MessagePump.cpp:231
#14 0x411f0c86 in MessageLoop::RunInternal (this=0xbeccf8b8)
    at /home/mikeh/dev/mozilla/m-c/b2g18/ipc/chromium/src/base/message_loop.cc:219
#15 0x411f0ce6 in MessageLoop::RunHandler (this=0xbeccf8b8)
    at /home/mikeh/dev/mozilla/m-c/b2g18/ipc/chromium/src/base/message_loop.cc:212
#16 MessageLoop::Run (this=0xbeccf8b8)
    at /home/mikeh/dev/mozilla/m-c/b2g18/ipc/chromium/src/base/message_loop.cc:186
#17 0x40f27c1e in nsBaseAppShell::Run (this=0x4428d040)
    at /home/mikeh/dev/mozilla/m-c/b2g18/widget/xpwidgets/nsBaseAppShell.cpp:163
#18 0x404278e6 in XRE_RunAppShell ()
    at /home/mikeh/dev/mozilla/m-c/b2g18/toolkit/xre/nsEmbedFunctions.cpp:646
#19 0x40ff1100 in mozilla::ipc::MessagePumpForChildProcess::Run (this=0x424c5340, aDelegate=0xbeccf8b8)
    at /home/mikeh/dev/mozilla/m-c/b2g18/ipc/glue/MessagePump.cpp:198
#20 0x411f0c86 in MessageLoop::RunInternal (this=0xbeccf8b8)
    at /home/mikeh/dev/mozilla/m-c/b2g18/ipc/chromium/src/base/message_loop.cc:219
#21 0x411f0ce6 in MessageLoop::RunHandler (this=0xbeccf8b8)
    at /home/mikeh/dev/mozilla/m-c/b2g18/ipc/chromium/src/base/message_loop.cc:212
#22 MessageLoop::Run (this=0xbeccf8b8)
    at /home/mikeh/dev/mozilla/m-c/b2g18/ipc/chromium/src/base/message_loop.cc:186
#23 0x40428104 in XRE_InitChildProcess (aArgc=3, aArgv=0x424e8800, aProcess=GeckoProcessType_Content)
    at /home/mikeh/dev/mozilla/m-c/b2g18/toolkit/xre/nsEmbedFunctions.cpp:485
#24 0x00008662 in main (argc=5, argv=0xbeccfa34)
    at /home/mikeh/dev/mozilla/m-c/b2g18/ipc/app/MozillaRuntimeMain.cpp:60
Jason, do you remember why we made suspending the channel queue a binary operation while the regular HTTP channel supports multiple suspensions?
We make the channel queue non-recursive because we can "easily enough" keep track of recursive calls in the child channel, and then only issue a Suspend IPDL message (and also suspend the eventQueue) when we actually transition to/from suspension. It's worth not sending the extra IPDL traffic around, but there's no inherent reason ChannelEventQueue couldn't handle recursive suspend/resume itself.

Alas there's a race in the implementation: at Resume() we launch a new event which flushed the EventQueue (we want to do that async, so new events aren't dispatched during the caller's Resume() stack).  But if something happens to call Suspend() before that event runs, than we hit this assertion.  And more seriously, we will currently flush the queue when the event is finally dispatched, even though we're supposed to be suspended.
Flags: needinfo?(jduell.mcbugs)
I also had to get rid of the assert in ChannelEventQueue::Resume, because if we get a series of suspend/resumes in a row, we can stack up a series of HttpChannelChild::CompleteResume events, the 2nd and following of which will see !mSuspended (as the 1st will set it to false).

This means ChannelEventQueue can't handle recursive suspend/resume, but we also can't have any asserts to check it :(  Sad, but fine for now.  Fancier patch for m-c coming that will fix this.
Assignee: nobody → jduell.mcbugs
Status: NEW → ASSIGNED
Attachment #757784 - Flags: review?(josh)
OK, here's a version of ChannelEventQueue that handles recursive suspend/resume.  This makes eventQ more like the cache/network pumps used in nsHttpChannel.

Most of the changes are because in order to have it handle its own resume asynchronously it has to be AddRef/Release-able, so mEventQ is now a nsRefPtr.

Try server run:

    https://tbpl.mozilla.org/?tree=Try&rev=dfece2c8069a
Attachment #757787 - Flags: review?(josh)
Comment on attachment 757787 [details] [diff] [review]
v1: better fix for m-c

Review of attachment 757787 [details] [diff] [review]:
-----------------------------------------------------------------

Nice.
Attachment #757787 - Flags: review?(josh) → review+
Comment on attachment 757784 [details] [diff] [review]
Fast and simple patch for b2g18 branch

Review of attachment 757784 [details] [diff] [review]:
-----------------------------------------------------------------

You should make an equivalent change to FTPChannelChild as well.
Attachment #757784 - Flags: review?(josh) → review+
Comment on attachment 757784 [details] [diff] [review]
Fast and simple patch for b2g18 branch

NOTE: Please see https://wiki.mozilla.org/Release_Management/B2G_Landing to better understand the B2G approval process and landings.

[Approval Request Comment]
Bug caused by (feature/regressing bug #): none
User impact if declined: Video playback can freeze after a while
Testing completed: not much
Risk to taking this patch (and alternatives if risky): Jason says it's fast and simple :-)
String or UUID changes made by this patch: none
Attachment #757784 - Flags: approval-mozilla-b2g18?
Comment on attachment 757787 [details] [diff] [review]
v1: better fix for m-c

Review of attachment 757787 [details] [diff] [review]:
-----------------------------------------------------------------

::: netwerk/ipc/ChannelEventQueue.h
@@ +142,5 @@
>  
>  inline void
>  ChannelEventQueue::Resume()
>  {
> +  if (!--mSuspendCount) {

Would it be useful to have an assertion here that would catch mSuspendCount < 0 ?
Comment on attachment 757784 [details] [diff] [review]
Fast and simple patch for b2g18 branch

See bug 870564 comment 103.
> Would it be useful to have an assertion here that would catch mSuspendCount < 0 ?
>
> You should make an equivalent change to FTPChannelChild as well.

Thanks--fixed both of these.

Dupe-ing this to bug 870564 and moving patches there (so don't have to ask for leo+ here).
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → DUPLICATE
Attachment #757784 - Flags: approval-mozilla-b2g18?
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: