Chrome process intermittently aborts in content/media/test/crashtests/459439-1.html and crashes in content/media/test/crashtests/466607-1.html and content/media/test/crashtests/466945-1.html

VERIFIED FIXED

Status

()

VERIFIED FIXED
8 years ago
8 years ago

People

(Reporter: cjones, Assigned: kinetik)

Tracking

unspecified
Points:
---

Firefox Tracking Flags

(fennec2.0b4+)

Details

Attachments

(1 attachment, 1 obsolete attachment)

This is in a debug desktop build, running crashtest-ipc (bug 615386).


REFTEST INFO | START file:///home/cjones/mozilla/mozilla-central/content/media/test/crashtests/459439-1.html
WARNING: Subdocument container has no content: file /home/cjones/mozilla/mozilla-central/layout/base/nsDocumentViewer.cpp, line 2383
++DOMWINDOW == 112 (0x1243b308) [serial = 753] [outer = 0x256d1b0]
************************************************************
* Call to xpconnect wrapped JSObject produced this error:  *
[Exception... "Component returned failure code: 0x80004005 (NS_ERROR_FAILURE) [nsIProperties.get]"  nsresult: "0x80004005 (NS_ERROR_FAILURE)"  location: "JS frame :: file:///home/cjones/mozilla/ff-dbg/dist/bin/components/nsHandlerService.js :: <TOP_LEVEL> :: line 909"  data: no]
************************************************************
REFTEST INFO | [CONTENT] OnDocumentLoad triggering WaitForTestEnd
REFTEST INFO | Initializing canvas snapshot
REFTEST INFO | [CONTENT] MakeProgress: STATE_WAITING_TO_FIRE_INVALIDATE_EVENT
REFTEST INFO | [CONTENT] MakeProgress: waiting for MozAfterPaint
REFTEST INFO | [CONTENT] AttrModifiedListener fired
************************************************************
* Call to xpconnect wrapped JSObject produced this error:  *
[Exception... "Component returned failure code: 0x80004005 (NS_ERROR_FAILURE) [nsIProperties.get]"  nsresult: "0x80004005 (NS_ERROR_FAILURE)"  location: "JS frame :: file:///home/cjones/mozilla/ff-dbg/dist/bin/components/nsHandlerService.js :: <TOP_LEVEL> :: line 909"  data: no]
************************************************************
REFTEST INFO | [CONTENT] AfterPaintListener in file:///home/cjones/mozilla/mozilla-central/content/media/test/crashtests/459439-1.html
REFTEST INFO | Updating canvas for invalidation
REFTEST INFO | No canvas to update!
REFTEST INFO | [CONTENT] MakeProgress: STATE_WAITING_TO_FIRE_INVALIDATE_EVENT
REFTEST INFO | [CONTENT] MakeProgress: waiting for MozAfterPaint
REFTEST INFO | [CONTENT] AttrModifiedListener fired
REFTEST INFO | [CONTENT] MakeProgress: STATE_WAITING_TO_FIRE_INVALIDATE_EVENT
REFTEST INFO | [CONTENT] MakeProgress: waiting for MozAfterPaint
Assertion 'pa_atomic_load(&(c)->_ref) >= 1' failed at pulse/context.c:1032, function pa_context_get_state(). Aborting.
Assertion 'pthread_mutex_destroy(&m->mutex) == 0' failed at pulsecore/mutex-posix.c:83, function pa_mutex_free(). Aborting.

###!!! [Child][SyncChannel] Error: Channel error: cannot send/recv

WARNING: content process _exit()ing: file /home/cjones/mozilla/mozilla-central/dom/ipc/ContentChild.cpp, line 471
TEST-UNEXPECTED-FAIL | file:///home/cjones/mozilla/mozilla-central/content/media/test/crashtests/459439-1.html | Exited with code 1 during test run
INFO | automation.py | Application ran for: 0:01:11.345283
INFO | automation.py | Reading PID log: /tmp/tmpEX2xsUpidlog
==> process 30987 launched child process 30998
INFO | automation.py | Checking for orphan process with PID: 30998
PROCESS-CRASH | file:///home/cjones/mozilla/mozilla-central/content/media/test/crashtests/459439-1.html | application crashed (minidump found)
Neither MINIDUMP_STACKWALK nor MINIDUMP_STACKWALK_CGI is set, can't process dump.


Fairly easy to reproduce for me, ~1/5 or so.  Will see what valgrind has to say in a bit.
This bug likely affects maemo and probably android too, if it's a race condition or something like that in the remote audio code.
tracking-fennec: --- → ?
==16501== Thread 11:
==16501== Invalid read of size 8
==16501==    at 0xE335913: snd_pcm_poll_descriptors_revents (in /usr/lib/libasound.so.2.0.0)
==16501==    by 0xE3396B8: ??? (in /usr/lib/libasound.so.2.0.0)
==16501==    by 0xE33998C: ??? (in /usr/lib/libasound.so.2.0.0)
==16501==    by 0xE3784D9: ??? (in /usr/lib/libasound.so.2.0.0)
==16501==    by 0x6457D78: sa_stream_write (sydney_audio_alsa.c:260)
==16501==    by 0x63F98F2: nsAudioStreamLocal::Write(void const*, unsigned int, int) (nsAudioStream.cpp:455)
==16501==    by 0x6D91564: mozilla::dom::AudioWriteEvent::Run() (AudioParent.cpp:59)
==16501==    by 0x702C554: nsThread::ProcessNextEvent(int, int*) (nsThread.cpp:626)
==16501==    by 0x6FB90E3: NS_ProcessNextEvent_P(nsIThread*, int) (nsThreadUtils.cpp:250)
==16501==    by 0x702B3E0: nsThread::ThreadFunc(void*) (nsThread.cpp:278)
==16501==    by 0x983057A: _pt_root (ptthread.c:187)
==16501==    by 0x4E339C9: start_thread (pthread_create.c:300)
==16501==  Address 0x1cbb1dd8 is 392 bytes inside a block of size 440 free'd
==16501==    at 0x4C270BD: free (vg_replace_malloc.c:366)
==16501==    by 0x706613B: free (nsTraceMalloc.c:1303)
==16501==    by 0xE339D1E: ??? (in /usr/lib/libasound.so.2.0.0)
==16501==    by 0xE33AB22: snd_pcm_close (in /usr/lib/libasound.so.2.0.0)
==16501==    by 0x6457C66: sa_stream_destroy (sydney_audio_alsa.c:223)
==16501==    by 0x63F9490: nsAudioStreamLocal::Shutdown() (nsAudioStream.cpp:380)
==16501==    by 0x6D90F64: mozilla::dom::AudioParent::Recv__delete__() (AudioParent.cpp:196)
==16501==    by 0x6E0CEFD: mozilla::dom::PAudioParent::OnMessageReceived(IPC::Message const&) (PAudioParent.cpp:128)
==16501==    by 0x6E1E86F: mozilla::dom::PContentParent::OnMessageReceived(IPC::Message const&) (PContentParent.cpp:628)
==16501==    by 0x6DC9FFE: mozilla::ipc::AsyncChannel::OnDispatchMessage(IPC::Message const&) (AsyncChannel.cpp:262)
==16501==    by 0x6DD30EB: mozilla::ipc::RPCChannel::OnMaybeDequeueOne() (RPCChannel.cpp:438)
==16501==    by 0x6DD8FCF: void DispatchToMethod<mozilla::ipc::RPCChannel, bool (mozilla::ipc::RPCChannel::*)()>(mozilla::ipc::RPCChannel*, bool (mozilla::ipc::RPCChannel::*)(), Tuple0 const&) (tuple.h:383)
==16501==
It looks to me like we RecvWrite(), fire off a Write task to thread 11 here, then soon thereafter happily destroy the libsydney stuff on the main thread from Recv__delete__(), while the write task is still being processed.  How are those supposed to synchronize?
one would have hoped that the stream shutdown (sa_stream_destroy()) would have prevented this.  I suppose libsydney isn't as threadsafe as you'd would hope.  

Basically what we have is:

Thread A calling sa_stream_destroy(i)
Thread B calling sa_stream_write(i)

without any synchronization.
(Assignee)

Comment 5

8 years ago
Created attachment 501215 [details] [diff] [review]
patch v0

Stream writes are requeued on the parent audio thread, but the shutdown calls are not.  This makes it possible to have pending writes queued and racing with shutdown.  The fix is to also shunt AudioStream::Shutdown to the audio thread to restore the natural synchronization expected by the callers.
Assignee: nobody → kinetik
Status: NEW → ASSIGNED
Attachment #501215 - Flags: review?(doug.turner)
tracking-fennec: ? → 2.0b4+
Comment on attachment 501215 [details] [diff] [review]
patch v0



>+class AudioShutdownEvent : public nsRunnable
>+{
>+ public:
>+  AudioShutdownEvent(nsAudioStream* owner)
>+  {
>+    mOwner = owner;
>+  }
>+
>+  NS_IMETHOD Run()
>+  {
>+    mOwner->Shutdown();
>+    return NS_OK;
>+  }
>+
>+ private:
>+    nsRefPtr<nsAudioStream> mOwner;
>+};

have you seen roc's patch?  It adds a similar event for shutting down the audio thread.  Would you mind changing the name of this class to something like |AudioStreamShutdownEvent|?

see http://hg.mozilla.org/mozilla-central/rev/29867a7bccf2

>-  if (mStream)
>-    mStream->SetVolume(aVolume);
>+  mStream->SetVolume(aVolume);
>   return true;
> }

...

>   mStream = nsAudioStream::AllocateStream();
>   if (mStream)

I am fine with not testing for mStream, however.... we do have methods called from various places.  If any place calls after the shutdown event is called, but before AudioParent is destroyed, we will crash. Quickly looking, I don't think that will happen.  However, I'd feel more comfortable with NS_ASSERTIONS() or NS_ENSURE so that we can catch any new case and remove it.

Also, remove/assert here too:

http://mxr.mozilla.org/mozilla-central/source/dom/ipc/AudioParent.cpp#211
Attachment #501215 - Flags: review?(doug.turner) → review-
(Assignee)

Comment 7

8 years ago
Created attachment 501511 [details] [diff] [review]
patch v1

Renamed the stream shutdown event.  I've also restored the mStream tests (and added them in more places), and generally reviewed what happens when there's an initialization error.  The old code, and my earlier patch, both could've had problems.  The right way to deal with initialization problems is by fixing bug 612799, but this at least makes a failed stream safe.
Attachment #501215 - Attachment is obsolete: true
Attachment #501511 - Flags: review?(doug.turner)
(Assignee)

Comment 8

8 years ago
Oh, and I changed modelines to match the style of the file, rather than reformatting the entire file.
Attachment #501511 - Flags: review?(doug.turner) → review+
I also saw this on windows tinderbox:

http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTry/1294818191.1294818480.24265.gz
s: try-w32-slave30
TEST-UNEXPECTED-FAIL | file:///e:/builds/moz2_slave/test/build/reftest/tests/content/media/test/crashtests/466607-1.html | Exited with code -1073741819 during test run
PROCESS-CRASH | file:///e:/builds/moz2_slave/test/build/reftest/tests/content/media/test/crashtests/466607-1.html | application crashed (minidump found)
Thread 23 (crashed)
TEST-UNEXPECTED-FAIL | automationutils.processLeakLog() | missing output line for total leaks!
TEST-UNEXPECTED-FAIL | tab process 4064 | automationutils.processLeakLog() | missing output line for total leaks!


http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTry/1294823249.1294823532.15996.gz
s: try-w32-slave30
TEST-UNEXPECTED-FAIL | file:///e:/builds/moz2_slave/test/build/reftest/tests/content/media/test/crashtests/466607-1.html | Exited with code -1073741819 during test run
PROCESS-CRASH | file:///e:/builds/moz2_slave/test/build/reftest/tests/content/media/test/crashtests/466607-1.html | application crashed (minidump found)
Thread 23 (crashed)
TEST-UNEXPECTED-FAIL | automationutils.processLeakLog() | missing output line for total leaks!
TEST-UNEXPECTED-FAIL | tab process 4064 | automationutils.processLeakLog() | missing output line for total leaks!

Looks like the same issue from a quick glance.
OS: Linux → All
Hardware: x86_64 → All
Summary: Chrome process intermittently aborts in content/media/test/crashtests/459439-1.html → Chrome process intermittently aborts in content/media/test/crashtests/459439-1.html and crashes in content/media/test/crashtests/466607-1.html and content/media/test/crashtests/466945-1.html
http://hg.mozilla.org/mozilla-central/rev/ae4ee53297b0
Status: ASSIGNED → RESOLVED
Last Resolved: 8 years ago
Resolution: --- → FIXED
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.