Closed Bug 340314 Opened 18 years ago Closed 18 years ago

Sound is played to random file descriptors, corrupting mailboxes, address books, cert DBs

Categories

(Core :: Widget, defect)

x86
Linux
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla1.8.1beta2

People

(Reporter: bugs+mozilla, Assigned: mwu)

References

Details

(Keywords: dataloss, fixed1.8.0.7, fixed1.8.1)

Attachments

(1 file, 4 obsolete files)

User-Agent:       Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.0.3) Gecko/20060504 Fedora/1.5.0.3-1.1.fc5 Firefox/1.5.0.3
Build Identifier: Thunderbird 1.5.0.2 (X11/20060501)

Sometimes (about 5% of occurrences), when new mail or RSS items arrive, Thunderbird doesn't play the new mail sound, instead, the raw WAV RIFF data is written to a random file descriptor, like the standard output or another open file. This causes address books and mailboxes to corrupt.

When the sound is sent to the standard output or some other socket it is not a problem, just my .xsession_errors becomes filled with junk or I lose a connection; but sometimes the sound is written over my address book or mailbox files, causing data loss.

I have been observing this behavior since Thunderbird 1.0, with different frequencies. I was almost sure that someone should have reported it already, but I couldn't find any bug related to this. I'm setting severity to Critical since it causes me to lose data, change it if it is not appropriate.

Reproducible: Sometimes




Yesterday, I noticed that I haven't heard the new mail song, but ok... Today, when I launched Thunderbird again, it displayed a message saying that my abook.mab was corrupt, that it was backuped to abook.mab.bak and a new one was created. Then, this is what I found inside the abook.mab.bak file (hexdump):

0019190: 2020 2020 285e 4245 3d29 285e 4246 3d29      (^BE=)(^BF=)
00191a0: 285e 4330 3d29 285e 4331 3d29 285e 4332  (^C0=)(^C1=)(^C2
00191b0: 3d29 285e 4333 3d29 285e 4139 5e35 4143  =)(^C3=)(^A9^5AC
00191c0: 295d 0a40 2424 7d31 367d 400a 5249 4646  )].@$$}16}@.RIFF
00191d0: 248c 0000 5741 5645 666d 7420 1000 0000  $...WAVEfmt ....
00191e0: 0100 0100 112b 0000 2256 0000 0200 1000  .....+.."V......
00191f0: 6461 7461 008c 0000 0000 8dff 8eff 8fff  data............
0019200: feff 0100 8eff 8dff ffff 0100 0000 0000  ................
0019210: 0100 ffff 9907 2507 48f5 7200 b406 6009  ......%.H.r...`.
0019220: 80de fcfb ff14 0ff7 f60c b9f5 3b02 9d0b  ............;...

Notice the RIFF WAVE header and the following junk, that matches perfectly the contents of the bell.wav file I use as new mail sound.
Mark, wasn't this happening to you as well, on Linux?
(In reply to comment #1)
> Mark, wasn't this happening to you as well, on Linux?
> 

Yes, I think I've also seen reports of it happening on Windows.
Really? I was hoping it was specific to linux, and maybe even specific to a particular version of linux :-( It just feels like an os-specific thing.
(In reply to comment #3)
> Really? I was hoping it was specific to linux, and maybe even specific to a
> particular version of linux :-( It just feels like an os-specific thing.
> 

See http://groups.google.com/group/mozilla.support.thunderbird/browse_thread/thread/6efafec3896b745d/402c9fac2dbf68d0?q=How+to+salvage+address+book%3F&rnum=1

Especially the first message there. The last message confirms the os as Win XP sp2.
That sounds like a disk file corruption due to loss of power, not due to the new mail sound getting written to the end of the AB. But who knows...
I've just reproduced this on Linux, with today's cvs build of SeaMonkey 1.1a (1.8 branch). So I'm moving this to core mailnews/backend as we don't quite know where the problem is, but its more representative to both TB & SeaMonkey.

The debug output isn't that helpful unfortunately (see below, from beginning of mail delivery to first line of wav file output).

Begin mail message delivery.
GetDiskSpaceAvailable returned: 1221783552 bytes
Incorporate message begin:
uidl string: 43c8d21f000007cc
Incorporate message complete.
Incorporate message begin:
uidl string: 43c8d21f000007cd
Incorporate message complete.
--DOMWINDOW == 20
Incorporate message begin:
uidl string: 43c8d21f000007ce
Incorporate message complete.
Incorporate message begin:
uidl string: 43c8d21f000007cf
Incorporate message complete.
End mail message delivery.
WARNING: NS_ENSURE_TRUE(aEntryName) failed, file /mozilla/branch/mozilla/xpcom/components/nsCategoryManager.cpp, line 560
++WEBSHELL == 9
++DOMWINDOW == 21
++DOMWINDOW == 22
RIFF&WAVEfmt 0u0data~~~~~~~~~~~~~~~~~~}}}~~~}}||}}}|||}}}}}~~~~~~~~~~~~~~~~~~}}}~~~~~~~~~~}}}~~}}}~~~~~~~
Assignee: mscott → nobody
Status: UNCONFIRMED → NEW
Component: General → MailNews: Backend
Ever confirmed: true
Product: Thunderbird → Core
QA Contact: general → backend
Version: unspecified → Trunk
I'm adding the dataloss keyword, as this bug will probably make most users think their address book is lost if it happens to them (even though they could remove the sound data from the end of the backup file and then rename to the original - not exactly obvious).

Also, not sure if we should block TB 2.0 (and SM 1.1?) on this, but it does appear to happen quite a lot on Linux with SeaMonkey 1.0 (I assume therefore the same with Thunderbird 1.5).
Flags: blocking-thunderbird2?
Keywords: dataloss
I agree.
Flags: blocking-thunderbird2? → blocking-thunderbird2+
I've found a problem, though not the exact cause of it. Here's some debug from widget/src/gtk2/nsSound.cpp for a successful play:

PLAY
STREAM Loader spec: file:///all/wavs/icky_small.wav
Play() RESULT 0
...
URI Spec is: file:///all/wavs/icky_small.wav
First checks ok
fd is 59
Writing
Written
Closed

On an unsuccessful play (where in this case the output is standard out or error (?))

PLAY
STREAM Loader spec: file:///all/wavs/ni.wav
Play() RESULT 0
URI Spec is: file:///all/wavs/ni.wav
First checks ok
fd is 0
Writing
RIFFWAVEfmt +data}~zy{~}zy{}{{|~||~}|{}}|}~}zy{~~~}{yxy|~~~~|z{|{yx{}~~~~{z{{zz{}~{wx{}}~~|}~|}}||}}}~}zy

So, for some reason we're getting an fd of zero.

I've also seen this with other sounds, not just mailnews ones (e.g. chatzilla).

I'm moving this to Core/GFX as its looking more like a core sound problem than just mailnews.
Assignee: nobody → general
Component: MailNews: Backend → GFX
Flags: blocking-thunderbird2+
QA Contact: backend → ian
This lost its blocking thunderbird 2.0 status in moving from Core:Mailnews to Core:GFX, so therefore I'm requesting blocking 1.8.1.

In summary, when we play a sound, at least on Linux/gtk2, there is the chance that the sound will actually be written to a random(?) file descriptor. We've seen it on the console output and being written into address books.
Flags: blocking1.8.1?
Summary: new mail sound is played to random file descriptors, corrupting mailboxes and address books → Sound is played to random file descriptors, corrupting mailboxes and address books
(In reply to comment #9)
> I'm moving this to Core/GFX as its looking more like a core sound problem than
> just mailnews.
> 
Sorry, I've just been informed that this should be Core/widget - I'm not 100% sure at this stage if this is just limited to gtk.
Assignee: general → general
Component: GFX → Widget
QA Contact: ian
The GTK2 sound code is in http://lxr.mozilla.org/seamonkey/source/widget/src/gtk2/nsSound.cpp

It might be relevant to ask what version of libesd.so.0 you have, although if this is the _moz_mailbeep sound, then the only question is what implementation of gdk_beep you have.
(In reply to comment #12)
> The GTK2 sound code is in
> http://lxr.mozilla.org/seamonkey/source/widget/src/gtk2/nsSound.cpp
> 
> It might be relevant to ask what version of libesd.so.0 you have, although if
> this is the _moz_mailbeep sound, then the only question is what implementation
> of gdk_beep you have.
> 

It is libesd.so.0.2.36 (have you got a link to the esd website? - I'm struggling to find it).

This is not the _moz_mailbeep sound - this is any custom wav file that you can play through mozilla. I've seen problems with the ones from chatzilla and the custom mail sound.

I should have been more explicit with my debug output in comment 9 - its all in the nsSound.cpp that you quoted.

This is some output I've added to nsSound::Play 

PLAY
STREAM Loader spec: file:///all/wavs/icky_small.wav
Play() RESULT 0

This is the output from nsSound::OnStreamComplete:

URI Spec is: file:///all/wavs/ni.wav
First checks ok
fd is 0
Writing
Written
Closed

Except in some of the failure cases, where fd = 0, I get the output filled with the raw data of the sound file.

From my logs I can see that the fd has recovered to something "sensible" e.g. 45 or 70, but typically once its got set to 0 it stays at zero.

This implies that the non null pointer (I checked) to esd_play_stream_fallback is being supplied with the wrong details or is returning an error (?) to us.

Whichever way, writing to a file descriptor of zero doesn't sound like a good idea to me, but I don't know enough about the sound code to know if that's acceptable or not.
Flags: blocking1.8.1? → blocking1.8.1+
(In reply to comment #13)
> Except in some of the failure cases, where fd = 0, I get the output filled with
> the raw data of the sound file.

I've not got too much time to look further at this at the moment - some help from sound experts would be useful. All I've seen so far is writing to fd = 0 sometimes works for sound output, but sometimes also produces output to stdout, or I guess other open file descriptors. I did look around on the internet, but couldn't find any documentation that might suggest a problem with the specific versions I'm using.
No owner; no patch in sight; not going to block FF2 beta1 for this.
Flags: blocking1.8.1+ → blocking1.8.1-
Whiteboard: [ff2b2]
(In reply to comment #15)
> No owner; no patch in sight; not going to block FF2 beta1 for this.
> 
Erm, why isn't blocking1.8.1 for the final release of the gecko, e.g. the equivalent of FF2 final?

Ok, but this is probably a gecko bug that will affect Thunderbird 2 and SeaMonkey 1.1 so how do we get a) someone to own it and b) show that it should block those releases?
Flags: blocking1.8.1- → blocking1.8.1+
Whiteboard: [ff2b2]
Target Milestone: --- → mozilla1.8.1beta2
*** Bug 343467 has been marked as a duplicate of this bug. ***
*** Bug 322113 has been marked as a duplicate of this bug. ***
This CERTAINLY explains the corrupted (partially overwritten with a sound file) 
cert8.db file documented and analyzed in bug 322113 !
Summary: Sound is played to random file descriptors, corrupting mailboxes and address books → Sound is played to random file descriptors, corrupting mailboxes, address books, cert DBs
Assignee: general → michael.wu
So, the problem is basically this: esd_play_stream_fallback is retarded. It is not safe to use. This is because when the fallback is used, we *can't* just directly write to the fd returned from esd_play_stream_fallback. We have to use the esd_audio_write and esd_audio_close functions. Of course, there's no way you can figure out the fallback was used from the return value. We have to implement our own fallback to do this right. Otherwise, if esd_play_stream_fallback decides to use a hardware fallback such as, say, alsa 0.9, the alsa implementation of esd_audio_open will return 0 for a fd because you don't directly use a fd to send sound out to alsa.  And of course, any file that happens to be open for writing on fd 0 is now overwritten with our sound. Writing to fd 0 normally wouldn't be a problem since the file open on fd 0 is usually /dev/null, but fd 0 is closed after writing, allowing the next file that is opened to be on fd 0.
Attachment #231972 - Flags: superreview?(darin)
Attachment #231972 - Flags: review?(pavlov)
Forgot to check the return value of esd_audio_open.
Attachment #231972 - Attachment is obsolete: true
Attachment #231973 - Flags: superreview?(darin)
Attachment #231973 - Flags: review?(pavlov)
Attachment #231972 - Flags: superreview?(darin)
Attachment #231972 - Flags: review?(pavlov)
Attachment #231973 - Flags: superreview?(darin) → superreview?(roc)
Attachment #231973 - Flags: review?(pavlov) → review?(roc)
Comment on attachment 231973 [details] [diff] [review]
Remove use of esd_play_stream_fallback, v2

+    PRUint8 *buf = nsnull;

Make buf an nsAutoArrayPtr.

After filling buf, make data point to buf + WAV_MIN_LENGTH, that will simplify the later code.

Can we always use the fallback? What's the advantage of using the stream API?

Otherwise looks good.
(In reply to comment #23)
> Can we always use the fallback? What's the advantage of using the stream API?
> 
We could, but if there is a sound server, we should use it. ESD allows alsa systems w/o dmix and linux oss audio devices to play more sounds at once than normally allowed by the hardware. ESD also allows sounds to be sent over the network (in theory).
Attachment #231973 - Attachment is obsolete: true
Attachment #231973 - Flags: superreview?(roc)
Attachment #231973 - Flags: review?(roc)
Attachment #232510 - Flags: superreview?(roc)
Attachment #232510 - Flags: review?(roc)
+      write(fd, data, dataLen);

'write' can sometimes only perform a partial write, so I think here you really need a loop, something like

       while (dataLen > 0) {
         size_t written = write(fd, data, dataLen);
         if (written <= 0)
           break;
         data += written;
         dataLen -= written;
       }
Correctly use write()
Attachment #232510 - Attachment is obsolete: true
Attachment #232510 - Flags: superreview?(roc)
Attachment #232510 - Flags: review?(roc)
Attachment #232581 - Flags: superreview?(roc)
Attachment #232581 - Flags: review?(roc)
Attachment #232581 - Flags: superreview?(roc)
Attachment #232581 - Flags: superreview+
Attachment #232581 - Flags: review?(roc)
Attachment #232581 - Flags: review+
Whiteboard: [at risk] → [checkin needed]
Err, I screwed up the last patch. hold on
Whiteboard: [checkin needed] → [at risk]
Put those ifdefs back..
Attachment #232581 - Attachment is obsolete: true
Whiteboard: [at risk] → [checkin needed]
I think I hit this yesterday, so I checked in attachment 232627 [details] [diff] [review] (v5).

/cvsroot/mozilla/widget/src/gtk2/nsSound.cpp,v  <--  nsSound.cpp
new revision: 1.14; previous revision: 1.13
done
Status: NEW → RESOLVED
Closed: 18 years ago
Resolution: --- → FIXED
Comment on attachment 232627 [details] [diff] [review]
Remove use of esd_play_stream_fallback, v5

data corruption bug...
Attachment #232627 - Flags: approval1.8.0.7?
Whiteboard: [checkin needed] → [need-a]
Attachment #232627 - Flags: approval1.8.1?
Comment on attachment 232627 [details] [diff] [review]
Remove use of esd_play_stream_fallback, v5

a=drivers, please land on the branch
Attachment #232627 - Flags: approval1.8.1? → approval1.8.1+
Whiteboard: [need-a] → [checkin needed]
Whiteboard: [checkin needed] → [checkin needed (1.8 branch)]
*** Bug 301555 has been marked as a duplicate of this bug. ***
mozilla/widget/src/gtk2/nsSound.cpp 	1.12.20.1
Keywords: fixed1.8.1
Whiteboard: [checkin needed (1.8 branch)]
we have a fix, and 1.8.1 has taken this fix.
Flags: blocking1.8.0.7?
Flags: blocking1.8.0.7? → blocking1.8.0.7+
Comment on attachment 232627 [details] [diff] [review]
Remove use of esd_play_stream_fallback, v5

approved for 1.8.0 branch, a=dveditz for drivers
Attachment #232627 - Flags: approval1.8.0.7? → approval1.8.0.7+
Whiteboard: [checkin needed (1.8.0 branch)]
Checking in widget/src/gtk2/nsSound.cpp;
/cvsroot/mozilla/widget/src/gtk2/nsSound.cpp,v  <--  nsSound.cpp
new revision: 1.12.26.1; previous revision: 1.12
done
Keywords: fixed1.8.0.7
Whiteboard: [checkin needed (1.8.0 branch)]
does this bug at all address the fact that even though i use alsa for all my sound that mozilla is unable to play sound if another app is playing sound?  and vice versa?  is mozilla even using alsa?  and if not, how to i get it to fall back to alsa?
(In reply to comment #38)
> does this bug at all address the fact that even though i use alsa for all my
> sound that mozilla is unable to play sound if another app is playing sound? 
> and vice versa?
Note that flash uses OSS, and alsa OSS emulation isn't set up to use dmix properly on most systems, thus blocking the sound.

>  is mozilla even using alsa
Not directly. It uses ESD, which supports and defaults to ALSA.

>  and if not, how to i get it to
> fall back to alsa?
> 
I think moving /usr/bin/esd somewhere else might do the trick by disabling the daemon but not the library, but I haven't tried it. It's also likely to break other apps that don't have an ALSA compatible fallback working. There's probably an env var somewhere to tame esd, but I know of any ATM.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: