Last Comment Bug 515217 - Crash [@ oggplay_buffer_set_last_data ] in test_media_selection.html
: Crash [@ oggplay_buffer_set_last_data ] in test_media_selection.html
Status: RESOLVED FIXED
[ccbr][sg:dos] null deref
: intermittent-failure
Product: Core
Classification: Components
Component: Audio/Video (show other bugs)
: Trunk
: x86 Mac OS X
: -- normal (vote)
: mozilla1.9.3a1
Assigned To: Matthew Gregan [:kinetik]
:
:
Mentors:
Depends on:
Blocks: 438871
  Show dependency treegraph
 
Reported: 2009-09-08 12:01 PDT by Ted Mielczarek [:ted.mielczarek]
Modified: 2012-11-25 19:31 PST (History)
9 users (show)
dveditz: wanted1.9.0.x-
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
beta3-fixed
.8-fixed


Attachments
patch v0 (3.08 KB, patch)
2009-09-29 17:09 PDT, Matthew Gregan [:kinetik]
cajbir.bugzilla: review+
Details | Diff | Splinter Review
patch v1 (7.83 KB, patch)
2009-10-01 22:54 PDT, Matthew Gregan [:kinetik]
cajbir.bugzilla: review+
roc: approval1.9.2+
dveditz: approval1.9.1.8+
Details | Diff | Splinter Review

Description Ted Mielczarek [:ted.mielczarek] 2009-09-08 12:01:36 PDT
See URL for full log, but this is the relevant output and the crashing thread stack:

29336 INFO TEST-PASS | /tests/content/media/test/test_media_selection.html | 320x240.ogv video width
29337 INFO TEST-PASS | /tests/content/media/test/test_media_selection.html | 320x240.ogv video height
NEXT ERROR TEST-UNEXPECTED-FAIL | automation.py | Exited with code 1 during test run
INFO | automation.py | Application ran for: 0:03:24.962890
NEXT ERROR TEST-UNEXPECTED-FAIL | automation.py | application crashed (minidump found)
Operating system: Mac OS X
                  10.5.2 9C31
CPU: x86
     GenuineIntel family 6 model 15 stepping 6
     4 CPUs

Crash reason:  EXC_BAD_ACCESS / KERN_PROTECTION_FAILURE
Crash address: 0xc

Thread 28 (crashed)
 0  XUL!oggplay_buffer_set_last_data [oggplay_buffer.c:bdec1ecda678 : 150 + 0x0]
    eip = 0x026c54a0   esp = 0xb0dcddd4   ebp = 0xb0dcddd8   ebx = 0x02be2bc0
    esi = 0x0e951100   edi = 0x0e951100   eax = 0x00000000   ecx = 0x10155200
    edx = 0x00000000   efl = 0x00010246
 1  XUL!oggplay_step_decoding [oggplay.c:bdec1ecda678 : 716 + 0xb]
    eip = 0x026c2d22   esp = 0xb0dcdde0   ebp = 0xb0dcde38
 2  XUL!nsOggStepDecodeEvent::Run() [nsOggDecoder.cpp:bdec1ecda678 : 667 + 0xd]
    eip = 0x026f5121   esp = 0xb0dcde40   ebp = 0xb0dcde78
 3  XUL!nsThread::ProcessNextEvent(int, int*) [nsThread.cpp:bdec1ecda678 : 527 + 0x9]
    eip = 0x02be2cbe   esp = 0xb0dcde80   ebp = 0xb0dcdec8
 4  XUL!NS_ProcessNextEvent_P(nsIThread*, int) [nsThreadUtils.cpp : 230 + 0x15]
    eip = 0x02b9faba   esp = 0xb0dcded0   ebp = 0xb0dcdef8
 5  XUL!nsThread::ThreadFunc(void*) [nsThread.cpp:bdec1ecda678 : 254 + 0xf]
    eip = 0x02be2e74   esp = 0xb0dcdf00   ebp = 0xb0dcdf48
 6  libnspr4.dylib!_pt_root [ptthread.c:bdec1ecda678 : 228 + 0x8]
    eip = 0x00068cb8   esp = 0xb0dcdf50   ebp = 0xb0dcdf78
 7  libSystem.B.dylib + 0x31c54
    eip = 0x9353cc55   esp = 0xb0dcdf80   ebp = 0xb0dcdfc8
 8  libSystem.B.dylib + 0x31b11
    eip = 0x9353cb12   esp = 0xb0dcdfd0   ebp = 0xb0dcdfec
Comment 1 Serge Gautherie (:sgautherie) 2009-09-25 06:53:41 PDT
(In reply to comment #0)
> See URL for full log

http://tinderbox.mozilla.org/showlog.cgi?tree=Firefox&errorparser=unittest&logfile=1252430439.1252432786.28120.gz&buildtime=1252430439&buildname=OS%20X%2010.5.2%20mozilla-central%20unit%20test&fulltext=1

***

http://tinderbox.mozilla.org/showlog.cgi?log=Firefox3.6-Unittest/1253885814.1253886255.12523.gz&fulltext=1
OS X 10.5.2 mozilla-1.9.2 test mochitests on 2009/09/25 06:36:54
{
29466 INFO TEST-PASS | /tests/content/media/test/test_media_selection.html | bug498855-1.ogv duration should be around 0.2
TEST-UNEXPECTED-FAIL | automation.py | Exited with code 1 during test run
INFO | automation.py | Application ran for: 0:05:06.873648
TEST-UNEXPECTED-FAIL | automation.py | application crashed (minidump found)
Operating system: Mac OS X
                  10.5.2 9C31
CPU: x86
     GenuineIntel family 6 model 15 stepping 2
     2 CPUs

Crash reason:  EXC_BAD_ACCESS / KERN_PROTECTION_FAILURE
Crash address: 0x18105d0

Thread 45 (crashed)
 0  XUL + 0x70c5d0
}
Comment 2 Ted Mielczarek [:ted.mielczarek] 2009-09-25 13:15:18 PDT
I think this is probably the same crash, in test_load.html:
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox-Unittest/1253906896.1253907324.24793.gz

No symbols in the stack though, so I can't tell. :-/
Comment 3 Matthew Gregan [:kinetik] 2009-09-28 20:24:33 PDT
I think I can reproduce this.  It looks like a thread safety issue.  If I add a one second sleep in oggplay_step_decoding just before oggplay_buffer_set_last_data is called, I crash with the same stacks that Ted saw in the original crash:

Thread 18:
163	    p->stream_info = OGGPLAY_STREAM_LAST_DATA;
#0  0x12d86610 in oggplay_buffer_set_last_data (me=0x1f0a59f0, buffer=0x1f0a6340) at /Users/kinetik/work/mozilla-central/media/liboggplay/src/liboggplay/oggplay_buffer.c:163
#1  0x12d82d99 in oggplay_step_decoding (me=0x1f0a59f0) at /Users/kinetik/work/mozilla-central/media/liboggplay/src/liboggplay/oggplay.c:766
#2  0x12dceda0 in nsOggStepDecodeEvent::Run (this=0x1a67f6a0) at /Users/kinetik/work/mozilla-central/content/media/ogg/nsOggDecoder.cpp:667

Thread 17:
#5  0x12c70019 in nsAutoMonitor::Wait (this=0xb0795b8c, interval=4294967295) at nsAutoLock.h:340
#6  0x12dcbf9c in nsOggDecodeStateMachine::PlayFrame (this=0x1a6809c0) at /Users/kinetik/work/mozilla-central/content/media/ogg/nsOggDecoder.cpp:968
#7  0x12dcca60 in nsOggDecodeStateMachine::Run (this=0x1a6809c0) at /Users/kinetik/work/mozilla-central/content/media/ogg/nsOggDecoder.cpp:1557
Comment 4 Matthew Gregan [:kinetik] 2009-09-28 21:50:24 PDT
I wonder if oggplay_buffer_set_last_data should be using buffer_mirror instead of buffer_list.  Making that change is sufficient to run the video mochitests without crashing, which would previously crash during the first test every time (with the sleep hack applied as described above).
Comment 5 Matthew Gregan [:kinetik] 2009-09-28 21:55:38 PDT
Earlier bug in the exact same area of code: bug 464007.
Comment 6 Matthew Gregan [:kinetik] 2009-09-29 17:09:53 PDT
Created attachment 403638 [details] [diff] [review]
patch v0

Viktor, do you have an opinion about this?
Comment 7 cajbir (:cajbir) 2009-09-30 21:02:49 PDT
Comment on attachment 403638 [details] [diff] [review]
patch v0

r+ from me as long as wiking doesn't come up with an objection.
Comment 8 Viktor Gal 2009-09-30 21:48:30 PDT
if you set the last data into the mirror list instead of the 'real' list, there's no way you are going to be able to retrieve that from the buffer, as oggplay_buffer_retrieve_next operates on buffer_list only and not on buffer_mirror... so how do you get a reference on it later?
Comment 9 Matthew Gregan [:kinetik] 2009-09-30 22:15:35 PDT
It doesn't matter.  The presentation thread might already have read an incorrect value between the callback and set_last_data, which is the same end result, except without crashes.

Some context from #annodex:
17:44 < wiking> kinetik: the problem with that patch is that you might as well
                just simply ignore that last data because you won't be able
                to retrieve that data...
17:47 < kinetik> well, it's marking all of the buffers
17:47 < kinetik> so you might see some later one
17:47 < kinetik> but the problem is really that it is fundamentally not thread
                 safe
17:47 < wiking> kinetik: yes i know :S
17:47 < kinetik> it's modifying buffers after they've been published to the
                 presentation thread
17:48 < kinetik> not sure what to do about that
17:50 < kinetik> it could keep using buffer_list but skip nulls, but there's
                 still a window between retrieve and release where it will be
                 non-null but already in use by the pres thread
17:58 < kinetik> or perhaps get rid of set_last_data and pass the EOF
                 condition into the callback, then have it mark it correctly
                 before publishing it
18:09 < kinetik> ignore my earlier stuff about marking all buffers/skipping
                 nulls, i forgot that it's iterating over the buffer's tracks,
                 not the buffers
18:09 < kinetik> so it does seem like it has to have the correct stream_info
                 value before it's visible to the presentation thread
Comment 10 Matthew Gregan [:kinetik] 2009-10-01 22:54:22 PDT
Created attachment 404193 [details] [diff] [review]
patch v1

Another approach: mark tracks at LAST_DATA before calling the buffer callback.  This basically just does the work of set_last_data before calling the callback, which should be safe as the data isn't visible to the presentation thread until published by the buffer callback.
Comment 11 Matthew Gregan [:kinetik] 2009-10-06 16:15:46 PDT
Comment on attachment 404193 [details] [diff] [review]
patch v1

I can't get any feedback from Viktor on this bug, so I'll ask Chris to take a look and then go with this fix if he's happy with it.
Comment 12 Viktor Gal 2009-10-06 21:47:30 PDT
(In reply to comment #11)
> (From update of attachment 404193 [details] [diff] [review])
> I can't get any feedback from Viktor on this bug, so I'll ask Chris to take a
> look and then go with this fix if he's happy with it.

this patch should be fine
Comment 13 Matthew Gregan [:kinetik] 2009-10-14 19:17:28 PDT
http://hg.mozilla.org/mozilla-central/rev/03cf4d377f03
Comment 14 Matthew Gregan [:kinetik] 2009-10-26 18:48:06 PDT
Comment on attachment 404193 [details] [diff] [review]
patch v1

We should take this on 1.9.2 to fix a possibly difficult to track down after the fact crash.

Is it worth taking this on 1.9.1 as well?
Comment 15 Matthew Gregan [:kinetik] 2009-11-03 14:58:54 PST
Comment on attachment 404193 [details] [diff] [review]
patch v1

People are crashing on this in 1.9.1, so requesting approval.

http://crash-stats.mozilla.com/report/index/1f014bad-9a6b-408f-a39d-366862091103
Comment 16 Samuel Sidler (old account; do not CC) 2009-11-04 17:11:19 PST
Comment on attachment 404193 [details] [diff] [review]
patch v1

We'll consider this for 1.9.1.7 after it's sufficiently baked on 1.9.2.

(Note that there were only 4 crashes of this in the last week on the 1.9.1 branch.)
Comment 17 Matthew Gregan [:kinetik] 2009-11-11 18:27:23 PST
http://hg.mozilla.org/releases/mozilla-1.9.2/rev/da88c8220c15
Comment 18 Daniel Veditz [:dveditz] 2009-12-18 11:57:02 PST
Comment on attachment 404193 [details] [diff] [review]
patch v1

Approved for 1.9.1.8, a=dveditz for release-drivers
Comment 19 Chris Pearce (:cpearce) 2009-12-20 18:16:40 PST
http://hg.mozilla.org/releases/mozilla-1.9.1/rev/db244e6aa78b

Note You need to log in before you can comment on or make changes to this bug.