Closed Bug 469635 Opened 11 years ago Closed 11 years ago

ALSA error messages from esound daemon on Linux build machines

Categories

(Release Engineering :: General, defect)

x86
Linux
defect
Not set

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: cajbir, Unassigned)

References

Details

Attachments

(2 files, 6 obsolete files)

The landing of bug 419275 is causing random oranges on Linux tinderboxes:

http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1229140245.1229142959.23926.gz
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1229322185.1229324135.13138.gz

The errors seem to be libcanberra causing alsa to report errors caused by lack of a sound card in the tinderbox machine.

These errors are similar to ones I use to get in the video code that used the
libsydneyaudio alsa backend. The alsa library outputs error messages in a
'chatty' fashion and we had to turn them off in a change to libsydneyaudio.
Attached patch Patch (obsolete) — Splinter Review
libcanberra only provides an API for backends, not actual device detection. The only way I see to solve this is to dlopen ALSA and set a blank error handler like we do for sydney audio.
Assignee: nobody → ventnor.bugzilla
Attachment #353126 - Flags: superreview?(roc)
Attachment #353126 - Flags: review?(roc)
+    if (!libasound) {
+        libasound = PR_LoadLibrary("libasound.so.2");
+        if (libasound) {
+            snd_lib_error_set_handler_fn snd_lib_error_set_handler =
+                 (snd_lib_error_set_handler_fn) PR_FindFunctionSymbol(libasound, "snd_lib_error_set_handler");
+            if (snd_lib_error_set_handler) {
+                snd_lib_error_set_handler(quiet_error_handler);
+            } else {
+                PR_UnloadLibrary(libasound);
+                libasound = nsnull;
             }
         }

Can't we unconditionally unload the library here, after we've called the function we care about?
(In reply to comment #2)
> Can't we unconditionally unload the library here, after we've called the
> function we care about?

doublec says that once we unload the library we risk losing that change.
If you set the error handler then unload the shared library, I suspect that change would be lost unless we have loaded it already for some other reason (ie. libcanberra loaded it, or we're linked against it for video/audio support). I'm not 100% sure though.
I'm hoping that something we depend on has linked against it, yes. If not, then it doesn't really matter what we do here, right?
Attached patch Patch 2 (obsolete) — Splinter Review
Now that I think about it, considering this is only necessary if ALSA is the canberra backend then this may still work.
Attachment #353126 - Attachment is obsolete: true
Attachment #353140 - Flags: superreview?(roc)
Attachment #353140 - Flags: review?(roc)
Attachment #353126 - Flags: superreview?(roc)
Attachment #353126 - Flags: review?(roc)
Attachment #353140 - Flags: superreview?(roc)
Attachment #353140 - Flags: superreview+
Attachment #353140 - Flags: review?(roc)
Attachment #353140 - Flags: review+
Keywords: checkin-needed
Whiteboard: [needs landing]
Pushed bb18fee38c6a to trunk.
Blocks: 419275
Status: NEW → RESOLVED
Closed: 11 years ago
No longer depends on: 419275
Keywords: checkin-needed
Resolution: --- → FIXED
Whiteboard: [needs landing] → [needs 191 landing]
this build on changeset 79136a417a3a:
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1229506263.1229508801.972.gz
still shows errors:
ALSA lib conf.c:3479:(_snd_config_evaluate) function snd_func_card_driver returned error: No such device
ALSA lib conf.c:3479:(_snd_config_evaluate) function snd_func_concat returned error: No such device
ALSA lib conf.c:3479:(_snd_config_evaluate) function snd_func_refer returned error: No such device
ALSA lib conf.c:3947:(snd_config_expand) Evaluate error: No such device
ALSA lib conf.c:3479:(_snd_config_evaluate) function snd_func_card_driver returned error: No such device
ALSA lib conf.c:3479:(_snd_config_evaluate) function snd_func_concat returned error: No such device
ALSA lib conf.c:3479:(_snd_config_evaluate) function snd_func_refer returned error: No such device
ALSA lib conf.c:3947:(snd_config_expand) Evaluate error: No such device
ALSA lib conf.c:3479:(_snd_config_evaluate) function snd_func_card_driver returned error: No such device
ALSA lib conf.c:3479:(_snd_config_evaluate) function snd_func_concat returned error: No such device
ALSA lib conf.c:3479:(_snd_config_evaluate) function snd_func_refer returned error: No such device
ALSA lib conf.c:3947:(snd_config_expand) Evaluate error: No such device

is this expected?
I haven't backed out the patch, but it didn't work. The build logs still show these messages.
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1229500040.1229503150.15781.gz#err0
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Is it because of the unloading of the library after making the call? Easy to test locally by disabling the sound card - or run in a VM with no sound hardware set.
Attached patch Patch 3 (obsolete) — Splinter Review
I tested this patch by removing my sound module and yes, it indeed works.
Attachment #353140 - Attachment is obsolete: true
Attachment #353549 - Flags: superreview?(roc)
Attachment #353549 - Flags: review?(roc)
This patch is applied on top of patch 2, seeing as patch 2 wasn't backed out.
Attachment #353549 - Flags: superreview?(roc)
Attachment #353549 - Flags: superreview+
Attachment #353549 - Flags: review?(roc)
Attachment #353549 - Flags: review+
Keywords: checkin-needed
Whiteboard: [needs 191 landing] → [needs landing]
Pushed d92586bc82f6 to trunk
Status: REOPENED → RESOLVED
Closed: 11 years ago11 years ago
Resolution: --- → FIXED
Whiteboard: [needs landing] → [needs 191 landing]
I'm defeated then, since I get nothing at all on my box when I disable my sound card and run the mochitests that cause these messages.

I'm wondering what the tests run that call a system sound. Do these builds also have special compile options?
And I verified that the dummy error-handling function was called when I added a printf into it.
Attached patch Patch 4 (obsolete) — Splinter Review
roc gave me a suggestion for a new strategy; the library may not be given that file name on the tinderboxes so we should lookup via symbol instead of filename.
Attachment #353549 - Attachment is obsolete: true
Attachment #353612 - Flags: superreview?(roc)
Attachment #353612 - Flags: review?(roc)
Attachment #353612 - Flags: superreview?(roc)
Attachment #353612 - Flags: superreview+
Attachment #353612 - Flags: review?(roc)
Attachment #353612 - Flags: review+
Pushed that as 545936b5bbe4. I'll leave the bug open though until we're sure it's fixed :-)
Still happening:
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1229574617.1229577197.5534.gz

I noticed that there are two error handlers in libasound (one seems to be for internal use).  Depending on how the library is built, you can only override one of them using snd_lib_error_set_handler:

int snd_lib_error_set_handler(snd_lib_error_handler_t handler)
{                                        
        snd_lib_error = handler == NULL ? snd_lib_error_default : handler;
#ifndef NDEBUG                                
        if (snd_lib_error != snd_lib_error_default)
                snd_err_msg = snd_lib_error;
#endif                                
        return 0;                     
}

I booted up my CentOS 5 reference VM and confirmed via disassembly that snd_lib_error_set_handler in libasound.so.2 is compiled such that it allows both handlers to be overridden.  From a quick scan of the errors we're seeing, they're all logged via the snd_lib_error handler anyway.

Do the test machines have libcanberra installed?  My reference VM doesn't, and I haven't done anything with it since I downloaded it from ftp.mozilla.org.  If the test machines don't, then I wonder if this is being caused through using libesd (which does exist and links against libasound).  Maybe just moving the snd_lib_error_set_handler call above the load and initialization of libesd would be sufficient to silence these warnings.
Duplicate of this bug: 476032
(In reply to comment #19)
> Do the test machines have libcanberra installed?  My reference VM doesn't, and
> I haven't done anything with it since I downloaded it from ftp.mozilla.org. 

It appears not based on lack of matches to /usr/lib/libcan*. The downloadable reference VM will be slightly behind the current state of the art, it'll be one of the version here
  https://wiki.mozilla.org/ReferencePlatforms/Linux-CentOS-5.0
but there's nothing there modifying sound libraries.
Anyone want to give this another shot in light of comment 19 and comment 21? These extra error messages make it a lot harder to see real test failures.
Matthew, can you create a patch for this? Even if you can't reproduce the problem, we can try to land it and see if it fixes things.
Attached patch patch v5 (obsolete) — Splinter Review
Not sure this'll work, but let's see.  Move the call to snd_lib_error_set_handler above the opening of ESD.  Add some debug logging in a bunch of places to help see what's going on.  We can remove the logging after one cycle on the build machines.
Attachment #353612 - Attachment is obsolete: true
Attachment #363047 - Flags: superreview?(roc)
Attachment #363047 - Flags: review?(roc)
Attachment #363047 - Flags: superreview?(roc)
Attachment #363047 - Flags: superreview+
Attachment #363047 - Flags: review?(roc)
Attachment #363047 - Flags: review+
Keywords: checkin-needed
Whiteboard: [needs landing]
Attached patch revert debug logging (obsolete) — Splinter Review
If patch v5 is successful at suppressing the errors we're seeing on the test machines, this patch can be applied to remove the debug logging I added in patch v5.  No need for review because it's just a straight revert of the logging parts of patch v5.
Comment on attachment 363047 [details] [diff] [review]
patch v5

This was pushed

http://hg.mozilla.org/mozilla-central/rev/03b4b372bacc

but had no effect and so was backed out.
Attachment #363234 - Attachment is obsolete: true
The logging produced:

asound_hack: 1 lib_err 442ab40, err_msg 442a9f0
asound_hack: quiet_error_handler installed 2 lib_err 157e0c4, err_msg 157e0c4
asound_hack: 3 lib_err 157e0c4, err_msg 157e0c4, can (nil)
ALSA lib confmisc.c:670:(snd_func_card_driver) cannot find card '0'
NEXT ERROR ALSA lib conf.c:3479:(_snd_config_evaluate) function snd_func_card_driver returned error: No such device
ALSA lib confmisc.c:391:(snd_func_concat) error evaluating strings
NEXT ERROR ALSA lib conf.c:3479:(_snd_config_evaluate) function snd_func_concat returned error: No such device
ALSA lib confmisc.c:1070:(snd_func_refer) error evaluating name
NEXT ERROR ALSA lib conf.c:3479:(_snd_config_evaluate) function snd_func_refer returned error: No such device
ALSA lib conf.c:3947:(snd_config_expand) Evaluate error: No such device
ALSA lib pcm.c:2143:(snd_pcm_open_noupdate) Unknown PCM default

http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1235101197.1235107655.3423.gz#err0
Keywords: checkin-needed
Whiteboard: [needs landing]
Thanks Karl.  The extra logging at least proves that we successfully install our quiet error handler before the messages are logged (and that they're still logged despite that).  It also confirms that we're not finding libcanberra, so we can probably rule that out of the equation.

The errors are logged when attempting to open a sound device.  The last error (from snd_pcm_open_noupdate) is logged when the configuration phase has failed.
Uh, I'm sure I was going somewhere with the last paragraph of comment #28, but I don't know where.

Anyway, I spent this evening setting up a build environment on my reference VM and I can now reproduce this locally.

It turns out that it's the call to esd_open_sound that's causing the ALSA errors, but they're not logged by any code executing in our process.  esd_open_sound will spawn a (local, temporary) copy of the ESD daemon if there isn't a daemon already running.  The daemon process attempts to initialize ALSA and ends up printing the messages we're seeing.

Later versions of ESD (starting with 0.2.39, see http://bugzilla.gnome.org/buglist.cgi?query=528718) call snd_lib_error_set_handler to suppress error logging.

I'm not sure why we only just started noticing this.  Maybe something was installed on the build hosts that pulled in ESD?

I'm not sure why we call esd_open_sound anyway--the esdref we get back is never used again (except to close ESD again), and the message about it being needed for streams but not playing sounds is curious.

Removing the call wouldn't help, anyway, because we'll eventually call esd_play_stream (which calls esd_open_sound internally).  But removing both of those and just using esd_audio_open (and calling snd_lib_error_set_handler ourselves beforehand) would work, since that calls into the audio backend directly, without spawning a daemon.

Removing ESD from the build machines would stop the errors, but also causes libmozgnome.so (which is linked to libesd.so.0) to fail to load.

Also, after calling esd_open_sound we should be testing for esdref < 0, not !esdref, since it's returning a file descriptor.
It sounds like we need to call esd_open_sound to initialize the library, but it's hard to tell, the docs (http://www.tux.org/~ricdude/dbdocs/esound_api.html) are not particularly enlightening.  Given that esd_play_stream calls it internally anyway, I'm not convinced.  Trawling through CVS history doesn't shed much light on the situation, either.

CentOS is still shipping ESD 0.2.36 in 5.1, so we can't easily pull an updated package to solve this.

It'd be bad to always use esd_audio_open, since then we'd never use an existing ESD daemon, and might run into problems sharing the audio hardware that would've been avoided had we used the daemon.

We could work around it on the build machines: rename /usr/bin/esd to /usr/bin/esd.real and put an esd script in place that execs esd.real with stderr redirected.
Overkill for this bug perhaps, but could we remove the use of ESD completely by refactoring our content/media Wave code and reimplementing nsSound on top of that?
Yup, I think we should do that eventually.  My one concern would be that simple sound playback might break for some users since ESD supports more backends (and, I think, can be configured to use different backends without a recompile) than libsydneyaudio.  Obviously the same users wouldn't have working audio with the new video/audio stuff, so we'd want to fix it anyway, which might not make this much of a concern.
I'm not sure what our best option is, short of that.
Given that this is only a problem on the build machines, I think the best option is silencing ESD on those machines:

% mv /usr/bin/esd /usr/bin/esd.real
% cat > /usr/bin/esd
#! /bin/sh
exec /usr/bin/esd.real "$@" 2>/dev/null
^D
% chmod +x /usr/bin/esd

I've tested that on my reference VM and it does the trick.
This fixes the esd_sound_open error handling case, and removes all of the libasound hackery now that we know it's unnecessary.
Attachment #363047 - Attachment is obsolete: true
Attachment #363521 - Flags: superreview?(roc)
Attachment #363521 - Flags: review?(roc)
Attachment #363521 - Flags: superreview?(roc)
Attachment #363521 - Flags: superreview+
Attachment #363521 - Flags: review?(roc)
Attachment #363521 - Flags: review+
Pushed http://hg.mozilla.org/mozilla-central/rev/57ace2f00d62.

Retitled and reassigning to build machine component for someone to look at comment #30 and comment #34.
Assignee: ventnor.bugzilla → nobody
Component: Widget: Gtk → Release Engineering
Product: Core → mozilla.org
QA Contact: gtk → release
Summary: gnome sound support via libcanberra outputs alsa error messages to stdout/stderr in machines with no sound card → ALSA error messages from esound daemon on Linux build machines
Version: Trunk → other
I backed out that patch (http://hg.mozilla.org/mozilla-central/rev/c984d8c6bca4) for two reasons.  There was a mochitest timeout (which shouldn't be related), and it made the ALSA error logging even worse (from 4 sets of errors to 8 pages).  Without the quiet handler installed, end up seeing a set of errors every time we go through the non-daemon ESD fallback path as well.

So we either need a combination of the quiet handler installed and ESD silenced per comment #34, or an upgraded ESD on the build machines.  Or give up on the ESD approach and spent the time converting nsSound to use the Wave decoder as Robert suggested in comment #31.
I take it that the nsSound route isn't the preferred one?
We should do both IMHO, but replacing nsSound is a fair bit of work that wasn't on our list of things to do in the 1.9.1 timeframe.
Okay, in the meantime I'll go ahead and deploy the esd wrapper in our pool.
I've deployed the esd wrapper on all of the 1.9.1, tm, and m-c slave as well as the try slaves. Only place left is the ref platform itself.

Roc/Matthew, do you want to take this bug back or file a new one for nsSound?
Thanks Ben!

I've filed a new bug for nsSound: bug 479822.
Alright, this has been deployed on the ref platform as well. We're all done here.
Status: REOPENED → RESOLVED
Closed: 11 years ago11 years ago
Resolution: --- → FIXED
Product: mozilla.org → Release Engineering
You need to log in before you can comment on or make changes to this bug.