Going to a canvas based tetris game will cause the content not to redraw

VERIFIED FIXED

Status

Firefox for Android Graveyard
General
P2
normal
VERIFIED FIXED
7 years ago
7 years ago

People

(Reporter: nhirata, Assigned: kinetik)

Tracking

Trunk
ARM
Android

Details

Attachments

(2 attachments, 1 obsolete attachment)

Mozilla/5.0 (Android; Linux armv71; rv2.0b12pre) Gecko/20110215 Firefox/4.0b12pre Fennec/4.0b5pre
Device: Droid 2 
OS: Android 2.2

1. go to http://aduros.emufarmers.com/easel/
2. try to go to www.google.com

Expected: content gets redrawn
Actual: content will not redraw.

Note:
1. also expected when going to the canvas based tetris game, the game will start playing, but it does not.
2. game does play on Firefox/Minefield
Priority: -- → P2
I see this as well on build:

Mozilla/5.0 (Android; Linux armv71; rv:2.0b12pre) Gecko/20110228
Firefox/4.0b13pre Fennec/4.0b6pre

using the Samsung Vibrant.
tracking-fennec: --- → ?
Created attachment 516112 [details]
log while loading in debug build

Here is the output from a debug build while loading this. Lots of warnings, assertions and failures... don't really know which is responsible.

Since things load pretty normally until the end, I think suspicion needs to start at the end of the log and work back (so the lib editor failures or the mozstorage failures perhaps?)

Updated

7 years ago
Assignee: nobody → azakai
tracking-fennec: ? → 2.0+

Comment 3

7 years ago
Event profiling shows that the child begins an |nsRunnableMethodImpl|, and never finishes it. It apparently gets stuck in there and does not respond to anything, logcat shows that from that point the child process is completely silent (this is a debugging+profiling build, so there is normally plenty of output). And in fact everything that needs a response from the child will not work at this point - for example, closing (remote) tabs, rendering remote content, etc.

I am writing more fine-tuned code for event profiling so I can see exactly which method the hang occurs in. I suspect it's a nested event loop with sleeping, as the CPU is not maxed out (seem to recall we had some of those somewhere in networking code).
not that its particularly pertinent to this bug, but shouldn't the chrome process kill a hung content process?

Comment 5

7 years ago
It should. My suspicion is that an IPC loop is still running here, so it is not detected as hanging. But the loop is not actually executing the messages it receives (except for the 'alive' message), it is buffering them, looking for a message that never arrives. Just a guess, will know for sure shortly.

Comment 6

7 years ago
So, we are hanging on

  nsAutoMonitor monitor(mMonitor);

in nsWaveStateMachine::GetNextFrameStatus. So we wait forever to grab a monitor. 

I guess we aren't detected as having crashed since the IPC receiving loop is running in another thread.

1. I'll figure out why this monitor is misused.
2. We should probably have the parent process be able to detect hangs of this form in a separate bug. If that makes sense I'll file it.

Comment 7

7 years ago
So, looks like

1. We grab the monitor in

http://hg.mozilla.org/mozilla-central/file/e56ecd8b3a68/content/media/wave/nsWaveDecoder.cpp#l563

   which is on a side thread.

2. Right after that, a few lines below, we call ChangeState, which also grabs it and releases it (but that thread had it anyhow, so nothing happens).

3. Later on the main thread tries to grab the monitor in GetNextFrameStatus and hangs forever.
If REMOTE_AUDIO is enabled in nsAudioStream.cpp (it's conditional on Android by default), it's possible to reproduce this hang with a desktop Fennec build.

The main thread has this stack:

#5  0x00007fa2b2438bb5 in nsAutoMonitor::nsAutoMonitor (this=0x7fffb0f20a50, mon=0x7fa2a5471300, _notifier=...) at ../../../dist/include/nsAutoLock.h:310
#6  0x00007fa2b2f4ca74 in nsWaveStateMachine::GetNextFrameStatus (this=0x7fa2a2c61200) at /home/kinetik/fennec/mozilla-central/content/media/wave/nsWaveDecoder.cpp:528
#7  0x00007fa2b2f5027d in nsWaveDecoder::UpdateReadyStateForData (this=0x7fa2b065bdc0) at /home/kinetik/fennec/mozilla-central/content/media/wave/nsWaveDecoder.cpp:1636
#8  0x00007fa2b2f4fe6c in nsWaveDecoder::NotifyBytesDownloaded (this=0x7fa2b065bdc0) at /home/kinetik/fennec/mozilla-central/content/media/wave/nsWaveDecoder.cpp:1545


Thread 2 has this stack:

#9  0x00007fa2b3987857 in NS_DispatchToMainThread_P (event=0x7fa2a2c27800, dispatchFlags=1) at nsThreadUtils.cpp:174
#10 0x00007fa2b2ef4998 in nsAudioStreamRemote::Init (this=0x7fa2a4329e00, aNumChannels=1, aRate=44100, aFormat=nsAudioStream::FORMAT_S16_LE)
    at /home/kinetik/fennec/mozilla-central/content/media/nsAudioStream.cpp:642
#11 0x00007fa2b2f4df6f in nsWaveStateMachine::OpenAudioStream (this=0x7fa2a2c61200) at /home/kinetik/fennec/mozilla-central/content/media/wave/nsWaveDecoder.cpp:918
#12 0x00007fa2b2f4d005 in nsWaveStateMachine::Run (this=0x7fa2a2c61200) at /home/kinetik/fennec/mozilla-central/content/media/wave/nsWaveDecoder.cpp:601


The monitor is held by thread 2 in nsWaveStateMachine::Run, and is dispatching a SYNC event to the main thread in nsAudioStreamRemote::Init.

Since the main thread is waiting on the same monitor, we have a deadlock.

Comment 9

7 years ago
So, removing the NS_DISPATCH_SYNC from nsAudioStreamRemote::Init makes things work again. dougt, is the NS_DISPATCH_SYNC there needed?
I made it SYNC in bug 614160 comment 5.  In other places where we dispatch SYNC events in the media code we drop the monitor, which is easy to do because they're dispatched by the owner of the monitor... in this case, the dispatcher is deeper in the callstack.  That's still probably the easiest fix here, maybe by passing the monitor in to nsAudioStream::Init so the Remote variant can drop it temporarily.
Created attachment 517038 [details] [diff] [review]
patch

Patch adds an additional place to drop the monitor in the audio thread's main loop, when calling OpenAudioStream.

Now that it doesn't hang, the tetris game is actually playable with this patch on phones :) (tap left or right to move, up to rotate, at bottom to drop)
Attachment #517038 - Flags: review?(kinetik)
Patch passes try successfully.
Whiteboard: [needs-review (kinetik)]
Created attachment 517549 [details] [diff] [review]
patch v0

I initially thought the nsBuiltinDecoder would have the same bug, but it avoids this deadlock because it has a separate monitor for the audio stream.  This monitor is only ever held by the audio and state machine threads and is never held at the same time as the main decoder monitor, so it's safe to hold this monitor while sending a SYNC message to the main thread.

For nsWaveDecoder, there's one place where dropping the monitor outside OpenAudioStream could cause a problem: SetVolume, which is called from the main thread.  It needs to see either a null mAudioStream or a fully initialized mAudioStream and no intermediate states.

Based on Alon's patch, this patch drops the monitor inside OpenAudioStream so it can deal with SetVolume safely.
Assignee: azakai → kinetik
Attachment #517038 - Attachment is obsolete: true
Status: NEW → ASSIGNED
Attachment #517549 - Flags: review?(chris.double)
Attachment #517038 - Flags: review?(kinetik)
Whiteboard: [needs-review (kinetik)]

Updated

7 years ago
Attachment #517549 - Flags: review?(chris.double) → review+
Whiteboard: [needs landing]

Updated

7 years ago
Keywords: checkin-needed
Whiteboard: [needs landing]
Pushed the r+'d patch to try.
It looked good on try

pushed http://hg.mozilla.org/mozilla-central/rev/89429f5cbf81
Status: ASSIGNED → RESOLVED
Last Resolved: 7 years ago
Resolution: --- → FIXED
Keywords: checkin-needed

Comment 16

7 years ago
VERIFIED FIXED on:
Build ID: Mozilla /5.0 (Android;Linux armv7l;rv:2.0b13pre) Gecko/20110316
Firefox/4.0b13pre Fennec /4.0b6pre
Device: HTC Desire
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.