Closed Bug 634407 Opened 9 years ago Closed 9 years ago
Going to a canvas based tetris game will cause the content not to redraw
Mozilla/5.0 (Android; Linux armv71; rv2.0b12pre) Gecko/20110215 Firefox/4.0b12pre Fennec/4.0b5pre Device: Droid 2 OS: Android 2.2 1. go to http://aduros.emufarmers.com/easel/ 2. try to go to www.google.com Expected: content gets redrawn Actual: content will not redraw. Note: 1. also expected when going to the canvas based tetris game, the game will start playing, but it does not. 2. game does play on Firefox/Minefield
I see this as well on build: Mozilla/5.0 (Android; Linux armv71; rv:2.0b12pre) Gecko/20110228 Firefox/4.0b13pre Fennec/4.0b6pre using the Samsung Vibrant.
tracking-fennec: --- → ?
Here is the output from a debug build while loading this. Lots of warnings, assertions and failures... don't really know which is responsible. Since things load pretty normally until the end, I think suspicion needs to start at the end of the log and work back (so the lib editor failures or the mozstorage failures perhaps?)
Event profiling shows that the child begins an |nsRunnableMethodImpl|, and never finishes it. It apparently gets stuck in there and does not respond to anything, logcat shows that from that point the child process is completely silent (this is a debugging+profiling build, so there is normally plenty of output). And in fact everything that needs a response from the child will not work at this point - for example, closing (remote) tabs, rendering remote content, etc. I am writing more fine-tuned code for event profiling so I can see exactly which method the hang occurs in. I suspect it's a nested event loop with sleeping, as the CPU is not maxed out (seem to recall we had some of those somewhere in networking code).
not that its particularly pertinent to this bug, but shouldn't the chrome process kill a hung content process?
It should. My suspicion is that an IPC loop is still running here, so it is not detected as hanging. But the loop is not actually executing the messages it receives (except for the 'alive' message), it is buffering them, looking for a message that never arrives. Just a guess, will know for sure shortly.
So, we are hanging on nsAutoMonitor monitor(mMonitor); in nsWaveStateMachine::GetNextFrameStatus. So we wait forever to grab a monitor. I guess we aren't detected as having crashed since the IPC receiving loop is running in another thread. 1. I'll figure out why this monitor is misused. 2. We should probably have the parent process be able to detect hangs of this form in a separate bug. If that makes sense I'll file it.
So, looks like 1. We grab the monitor in http://hg.mozilla.org/mozilla-central/file/e56ecd8b3a68/content/media/wave/nsWaveDecoder.cpp#l563 which is on a side thread. 2. Right after that, a few lines below, we call ChangeState, which also grabs it and releases it (but that thread had it anyhow, so nothing happens). 3. Later on the main thread tries to grab the monitor in GetNextFrameStatus and hangs forever.
If REMOTE_AUDIO is enabled in nsAudioStream.cpp (it's conditional on Android by default), it's possible to reproduce this hang with a desktop Fennec build. The main thread has this stack: #5 0x00007fa2b2438bb5 in nsAutoMonitor::nsAutoMonitor (this=0x7fffb0f20a50, mon=0x7fa2a5471300, _notifier=...) at ../../../dist/include/nsAutoLock.h:310 #6 0x00007fa2b2f4ca74 in nsWaveStateMachine::GetNextFrameStatus (this=0x7fa2a2c61200) at /home/kinetik/fennec/mozilla-central/content/media/wave/nsWaveDecoder.cpp:528 #7 0x00007fa2b2f5027d in nsWaveDecoder::UpdateReadyStateForData (this=0x7fa2b065bdc0) at /home/kinetik/fennec/mozilla-central/content/media/wave/nsWaveDecoder.cpp:1636 #8 0x00007fa2b2f4fe6c in nsWaveDecoder::NotifyBytesDownloaded (this=0x7fa2b065bdc0) at /home/kinetik/fennec/mozilla-central/content/media/wave/nsWaveDecoder.cpp:1545 Thread 2 has this stack: #9 0x00007fa2b3987857 in NS_DispatchToMainThread_P (event=0x7fa2a2c27800, dispatchFlags=1) at nsThreadUtils.cpp:174 #10 0x00007fa2b2ef4998 in nsAudioStreamRemote::Init (this=0x7fa2a4329e00, aNumChannels=1, aRate=44100, aFormat=nsAudioStream::FORMAT_S16_LE) at /home/kinetik/fennec/mozilla-central/content/media/nsAudioStream.cpp:642 #11 0x00007fa2b2f4df6f in nsWaveStateMachine::OpenAudioStream (this=0x7fa2a2c61200) at /home/kinetik/fennec/mozilla-central/content/media/wave/nsWaveDecoder.cpp:918 #12 0x00007fa2b2f4d005 in nsWaveStateMachine::Run (this=0x7fa2a2c61200) at /home/kinetik/fennec/mozilla-central/content/media/wave/nsWaveDecoder.cpp:601 The monitor is held by thread 2 in nsWaveStateMachine::Run, and is dispatching a SYNC event to the main thread in nsAudioStreamRemote::Init. Since the main thread is waiting on the same monitor, we have a deadlock.
So, removing the NS_DISPATCH_SYNC from nsAudioStreamRemote::Init makes things work again. dougt, is the NS_DISPATCH_SYNC there needed?
I made it SYNC in bug 614160 comment 5. In other places where we dispatch SYNC events in the media code we drop the monitor, which is easy to do because they're dispatched by the owner of the monitor... in this case, the dispatcher is deeper in the callstack. That's still probably the easiest fix here, maybe by passing the monitor in to nsAudioStream::Init so the Remote variant can drop it temporarily.
Patch adds an additional place to drop the monitor in the audio thread's main loop, when calling OpenAudioStream. Now that it doesn't hang, the tetris game is actually playable with this patch on phones :) (tap left or right to move, up to rotate, at bottom to drop)
Patch passes try successfully.
I initially thought the nsBuiltinDecoder would have the same bug, but it avoids this deadlock because it has a separate monitor for the audio stream. This monitor is only ever held by the audio and state machine threads and is never held at the same time as the main decoder monitor, so it's safe to hold this monitor while sending a SYNC message to the main thread. For nsWaveDecoder, there's one place where dropping the monitor outside OpenAudioStream could cause a problem: SetVolume, which is called from the main thread. It needs to see either a null mAudioStream or a fully initialized mAudioStream and no intermediate states. Based on Alon's patch, this patch drops the monitor inside OpenAudioStream so it can deal with SetVolume safely.
Pushed the r+'d patch to try.
It looked good on try pushed http://hg.mozilla.org/mozilla-central/rev/89429f5cbf81
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
VERIFIED FIXED on: Build ID: Mozilla /5.0 (Android;Linux armv7l;rv:2.0b13pre) Gecko/20110316 Firefox/4.0b13pre Fennec /4.0b6pre Device: HTC Desire
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.