Closed Bug 515217 Opened 15 years ago Closed 15 years ago

Crash [@ oggplay_buffer_set_last_data ] in test_media_selection.html

Categories

(Core :: Audio/Video, defect)

x86
macOS
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla1.9.3a1
Tracking Status
status1.9.2 --- beta3-fixed
status1.9.1 --- .8-fixed

People

(Reporter: ted, Assigned: kinetik)

References

Details

(Keywords: intermittent-failure, Whiteboard: [ccbr][sg:dos] null deref)

Attachments

(1 file, 1 obsolete file)

See URL for full log, but this is the relevant output and the crashing thread stack:

29336 INFO TEST-PASS | /tests/content/media/test/test_media_selection.html | 320x240.ogv video width
29337 INFO TEST-PASS | /tests/content/media/test/test_media_selection.html | 320x240.ogv video height
NEXT ERROR TEST-UNEXPECTED-FAIL | automation.py | Exited with code 1 during test run
INFO | automation.py | Application ran for: 0:03:24.962890
NEXT ERROR TEST-UNEXPECTED-FAIL | automation.py | application crashed (minidump found)
Operating system: Mac OS X
                  10.5.2 9C31
CPU: x86
     GenuineIntel family 6 model 15 stepping 6
     4 CPUs

Crash reason:  EXC_BAD_ACCESS / KERN_PROTECTION_FAILURE
Crash address: 0xc

Thread 28 (crashed)
 0  XUL!oggplay_buffer_set_last_data [oggplay_buffer.c:bdec1ecda678 : 150 + 0x0]
    eip = 0x026c54a0   esp = 0xb0dcddd4   ebp = 0xb0dcddd8   ebx = 0x02be2bc0
    esi = 0x0e951100   edi = 0x0e951100   eax = 0x00000000   ecx = 0x10155200
    edx = 0x00000000   efl = 0x00010246
 1  XUL!oggplay_step_decoding [oggplay.c:bdec1ecda678 : 716 + 0xb]
    eip = 0x026c2d22   esp = 0xb0dcdde0   ebp = 0xb0dcde38
 2  XUL!nsOggStepDecodeEvent::Run() [nsOggDecoder.cpp:bdec1ecda678 : 667 + 0xd]
    eip = 0x026f5121   esp = 0xb0dcde40   ebp = 0xb0dcde78
 3  XUL!nsThread::ProcessNextEvent(int, int*) [nsThread.cpp:bdec1ecda678 : 527 + 0x9]
    eip = 0x02be2cbe   esp = 0xb0dcde80   ebp = 0xb0dcdec8
 4  XUL!NS_ProcessNextEvent_P(nsIThread*, int) [nsThreadUtils.cpp : 230 + 0x15]
    eip = 0x02b9faba   esp = 0xb0dcded0   ebp = 0xb0dcdef8
 5  XUL!nsThread::ThreadFunc(void*) [nsThread.cpp:bdec1ecda678 : 254 + 0xf]
    eip = 0x02be2e74   esp = 0xb0dcdf00   ebp = 0xb0dcdf48
 6  libnspr4.dylib!_pt_root [ptthread.c:bdec1ecda678 : 228 + 0x8]
    eip = 0x00068cb8   esp = 0xb0dcdf50   ebp = 0xb0dcdf78
 7  libSystem.B.dylib + 0x31c54
    eip = 0x9353cc55   esp = 0xb0dcdf80   ebp = 0xb0dcdfc8
 8  libSystem.B.dylib + 0x31b11
    eip = 0x9353cb12   esp = 0xb0dcdfd0   ebp = 0xb0dcdfec
Blocks: 438871
Whiteboard: [orange]
Whiteboard: [orange] → [orange][ccbr]
(In reply to comment #0)
> See URL for full log

http://tinderbox.mozilla.org/showlog.cgi?tree=Firefox&errorparser=unittest&logfile=1252430439.1252432786.28120.gz&buildtime=1252430439&buildname=OS%20X%2010.5.2%20mozilla-central%20unit%20test&fulltext=1

***

http://tinderbox.mozilla.org/showlog.cgi?log=Firefox3.6-Unittest/1253885814.1253886255.12523.gz&fulltext=1
OS X 10.5.2 mozilla-1.9.2 test mochitests on 2009/09/25 06:36:54
{
29466 INFO TEST-PASS | /tests/content/media/test/test_media_selection.html | bug498855-1.ogv duration should be around 0.2
TEST-UNEXPECTED-FAIL | automation.py | Exited with code 1 during test run
INFO | automation.py | Application ran for: 0:05:06.873648
TEST-UNEXPECTED-FAIL | automation.py | application crashed (minidump found)
Operating system: Mac OS X
                  10.5.2 9C31
CPU: x86
     GenuineIntel family 6 model 15 stepping 2
     2 CPUs

Crash reason:  EXC_BAD_ACCESS / KERN_PROTECTION_FAILURE
Crash address: 0x18105d0

Thread 45 (crashed)
 0  XUL + 0x70c5d0
}
I think this is probably the same crash, in test_load.html:
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox-Unittest/1253906896.1253907324.24793.gz

No symbols in the stack though, so I can't tell. :-/
I think I can reproduce this.  It looks like a thread safety issue.  If I add a one second sleep in oggplay_step_decoding just before oggplay_buffer_set_last_data is called, I crash with the same stacks that Ted saw in the original crash:

Thread 18:
163	    p->stream_info = OGGPLAY_STREAM_LAST_DATA;
#0  0x12d86610 in oggplay_buffer_set_last_data (me=0x1f0a59f0, buffer=0x1f0a6340) at /Users/kinetik/work/mozilla-central/media/liboggplay/src/liboggplay/oggplay_buffer.c:163
#1  0x12d82d99 in oggplay_step_decoding (me=0x1f0a59f0) at /Users/kinetik/work/mozilla-central/media/liboggplay/src/liboggplay/oggplay.c:766
#2  0x12dceda0 in nsOggStepDecodeEvent::Run (this=0x1a67f6a0) at /Users/kinetik/work/mozilla-central/content/media/ogg/nsOggDecoder.cpp:667

Thread 17:
#5  0x12c70019 in nsAutoMonitor::Wait (this=0xb0795b8c, interval=4294967295) at nsAutoLock.h:340
#6  0x12dcbf9c in nsOggDecodeStateMachine::PlayFrame (this=0x1a6809c0) at /Users/kinetik/work/mozilla-central/content/media/ogg/nsOggDecoder.cpp:968
#7  0x12dcca60 in nsOggDecodeStateMachine::Run (this=0x1a6809c0) at /Users/kinetik/work/mozilla-central/content/media/ogg/nsOggDecoder.cpp:1557
Assignee: nobody → kinetik
I wonder if oggplay_buffer_set_last_data should be using buffer_mirror instead of buffer_list.  Making that change is sufficient to run the video mochitests without crashing, which would previously crash during the first test every time (with the sleep hack applied as described above).
Earlier bug in the exact same area of code: bug 464007.
Attached patch patch v0 (obsolete) — Splinter Review
Viktor, do you have an opinion about this?
Attachment #403638 - Flags: review?(chris.double)
Comment on attachment 403638 [details] [diff] [review]
patch v0

r+ from me as long as wiking doesn't come up with an objection.
Attachment #403638 - Flags: review?(chris.double) → review+
if you set the last data into the mirror list instead of the 'real' list, there's no way you are going to be able to retrieve that from the buffer, as oggplay_buffer_retrieve_next operates on buffer_list only and not on buffer_mirror... so how do you get a reference on it later?
It doesn't matter.  The presentation thread might already have read an incorrect value between the callback and set_last_data, which is the same end result, except without crashes.

Some context from #annodex:
17:44 < wiking> kinetik: the problem with that patch is that you might as well
                just simply ignore that last data because you won't be able
                to retrieve that data...
17:47 < kinetik> well, it's marking all of the buffers
17:47 < kinetik> so you might see some later one
17:47 < kinetik> but the problem is really that it is fundamentally not thread
                 safe
17:47 < wiking> kinetik: yes i know :S
17:47 < kinetik> it's modifying buffers after they've been published to the
                 presentation thread
17:48 < kinetik> not sure what to do about that
17:50 < kinetik> it could keep using buffer_list but skip nulls, but there's
                 still a window between retrieve and release where it will be
                 non-null but already in use by the pres thread
17:58 < kinetik> or perhaps get rid of set_last_data and pass the EOF
                 condition into the callback, then have it mark it correctly
                 before publishing it
18:09 < kinetik> ignore my earlier stuff about marking all buffers/skipping
                 nulls, i forgot that it's iterating over the buffer's tracks,
                 not the buffers
18:09 < kinetik> so it does seem like it has to have the correct stream_info
                 value before it's visible to the presentation thread
Attached patch patch v1Splinter Review
Another approach: mark tracks at LAST_DATA before calling the buffer callback.  This basically just does the work of set_last_data before calling the callback, which should be safe as the data isn't visible to the presentation thread until published by the buffer callback.
Attachment #403638 - Attachment is obsolete: true
Comment on attachment 404193 [details] [diff] [review]
patch v1

I can't get any feedback from Viktor on this bug, so I'll ask Chris to take a look and then go with this fix if he's happy with it.
Attachment #404193 - Flags: review?(chris.double)
(In reply to comment #11)
> (From update of attachment 404193 [details] [diff] [review])
> I can't get any feedback from Viktor on this bug, so I'll ask Chris to take a
> look and then go with this fix if he's happy with it.

this patch should be fine
Attachment #404193 - Flags: review?(chris.double) → review+
http://hg.mozilla.org/mozilla-central/rev/03cf4d377f03
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla1.9.3a1
Comment on attachment 404193 [details] [diff] [review]
patch v1

We should take this on 1.9.2 to fix a possibly difficult to track down after the fact crash.

Is it worth taking this on 1.9.1 as well?
Attachment #404193 - Flags: approval1.9.2?
Comment on attachment 404193 [details] [diff] [review]
patch v1

People are crashing on this in 1.9.1, so requesting approval.

http://crash-stats.mozilla.com/report/index/1f014bad-9a6b-408f-a39d-366862091103
Attachment #404193 - Flags: approval1.9.1.6?
Attachment #404193 - Flags: approval1.9.2? → approval1.9.2+
Whiteboard: [orange][ccbr] → [orange][ccbr][sg:dos] null deref
Comment on attachment 404193 [details] [diff] [review]
patch v1

We'll consider this for 1.9.1.7 after it's sufficiently baked on 1.9.2.

(Note that there were only 4 crashes of this in the last week on the 1.9.1 branch.)
Attachment #404193 - Flags: approval1.9.1.6? → approval1.9.1.7?
http://hg.mozilla.org/releases/mozilla-1.9.2/rev/da88c8220c15
Comment on attachment 404193 [details] [diff] [review]
patch v1

Approved for 1.9.1.8, a=dveditz for release-drivers
Attachment #404193 - Flags: approval1.9.1.8? → approval1.9.1.8+
Flags: wanted1.9.0.x-
Whiteboard: [orange][ccbr][sg:dos] null deref → [ccbr][sg:dos] null deref
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: