Closed Bug 571798 Opened 14 years ago Closed 14 years ago

Invalid read [@ snd_config_searcha_hooks] during shutdown

Categories

(Core :: Audio/Video, defect)

x86_64
Linux
defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: jruderman, Assigned: kinetik)

References

Details

(Keywords: valgrind)

Attachments

(2 files, 2 obsolete files)

Attached file valgrind output
This happens sometimes during shutdown.  I don't have steps to reproduce, but I'm pretty sure there's an ogg vorbis file involved.
Looking at the implementation in libasound, snd_pcm_open/snd_pcm_close are not thread-safe.

I can't find any documentation that states what (if any) ALSA functionality *is* thread-safe, so sydneyaudio could be stepping on toes all over the place.
This general problem is probably largely responsible for the random orange in bug 562004, bug 557432, and maybe others.
Adding two locks to sydneyaudio may be sufficient: a library-level lock for snd_pcm_open/snd_pcm_close, and a per-stream lock for everything else.  This would require changing nsBuiltinDecoderStateMachine's logic back to the emulated-non-blocking approach we used to use, otherwise we'll be forced to hold locks around long-running calls to snd_pcm_writei.

Digging around more, mAudioMonitor should already protect most calls into sydneyaudio, so perhaps we only need a lock for the library and any other threading bugs can be attributed to incorrect use of mAudioMonitor (such as bug 560784).
Blocks: 562004, 557432
Attached patch patch v0 (obsolete) — Splinter Review
(In reply to comment #3)
> Digging around more, mAudioMonitor should already protect most calls into
> sydneyaudio, so perhaps we only need a lock for the library and any other
> threading bugs can be attributed to incorrect use of mAudioMonitor (such as bug
> 560784).

The OS X and Win32 sydneyaudio implementations seem to already be thread-safe with respect to concurrent calls to write/get_position, so I think it actually does make sense to make the per-stream stuff thread safe.

This patch does that, but doesn't address the potential problem with blocking snd_pcm_writei calls vs calls to get_position.  The OS X and Win32 backends won't suffer from this problem, so I think I'll need to address it before asking for review.
Assignee: nobody → kinetik
Status: NEW → ASSIGNED
Attached patch patch v1 (obsolete) — Splinter Review
Attachment #453634 - Attachment is obsolete: true
Same as the last patch, but addresses the blocking snd_pcm_writei issue.  The key differences are:

Open the pcm device as non-blocking:

   if (snd_pcm_open(&s->output_unit, 
                    "default", 
                    SND_PCM_STREAM_PLAYBACK, 
-                   0) < 0) {
+                   SND_PCM_NONBLOCK) < 0) {

Handle non-blocking writes by dropping the stream lock, waiting for the pcm to become ready, and reacquiring the lock:

+      if (frames == -EAGAIN && snd_pcm_state(s->output_unit) == SND_PCM_STATE_RUNNING) {
+        pthread_mutex_unlock(&s->mutex);
+        snd_pcm_wait(s->output_unit, -1);
+        pthread_mutex_lock(&s->mutex);
+        continue;
+      }

I think this patch is good to go, but needs testing by more people on more machines.  I've tested Fedora 12 with PulseAudio and direct ALSA and it works fine there (under VMWare Fusion 3.1 and VirtualBox 3.2).
Confirmation that it's not safe to call ALSA library functions for the same device concurrently: http://mailman.alsa-project.org/pipermail/alsa-devel/2010-June/028349.html

Unfortunately, I've found a problem with the patch: in my F12/VMWare Fusion 3.1 setup I'm seeing a livelock where snd_pcm_writei returns EAGAIN, snd_pcm_wait returns 1 (ready), and snd_pcm_writei still returns EAGAIN, so the write makes no progress.  I can usually reproduce this within 2-3 runs of 8 parallel sets of test_playback running.
It looks like that hang is fixed in the current git head of alsa-plugins, most likely fixed by the patch discussed in bug 573924 comment 2.
Blocks: 560784
Still seeing this on m-c of two days back
TEST_PATH=content/media/test/test_playback.html

I get the same valgrind errors as Jesse (comment #1).  Native
runs randomly either succeed, terminate with "out of memory" (!)
or terminate with a double free detected by glibc.
This is on Ubuntu 10.04 x86_64.
Depends on: 574190
(In reply to comment #5)
> Created attachment 454312 [details] [diff] [review]
> patch v1

This patch isn't right.  It looks like it's not safe to call snd_pcm_wait on one thread while another thread calls other snd_pcm_* functions.  I think the current patch also breaks drain, since in non-blocking mode the drain will return immediately.  The drain call needs to be modified to wait on the PCM device and check its status until it moves from DRAINING to whatever state it ends up in after draining.

Rather than try to fix everything at once and end up blocked on updating PulseAudio/ALSA on the test machines, I'll attach a patch that fixes this particular bug.
Attachment #454312 - Attachment is obsolete: true
Attached patch patch v2Splinter Review
Comment on attachment 468249 [details] [diff] [review]
patch v2

This adds a library-global lock that's used for creating and destroying streams.  It doesn't solve the general thread-safety problems we have with the current design, but it will prevent a bunch of crashes caused by asserts like the following:

Assertion 'pa_atomic_load(&(s)->_ref) >= 1' failed at pulse/stream.c:253, function pa_stream_unref(). Aborting.
Attachment #468249 - Flags: review?(chris.double)
Attachment #468249 - Flags: review?(chris.double) → review+
Attachment #468249 - Flags: approval2.0?
Keywords: checkin-needed
http://hg.mozilla.org/mozilla-central/rev/2ba7ee65686d
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: