Closed Bug 496581 Opened 15 years ago Closed 15 years ago

crash [@ oggplay_buffer_set_last_data ]

Categories

(Core :: Audio/Video, defect)

1.9.1 Branch
defect
Not set
critical

Tracking

()

VERIFIED FIXED
mozilla1.9.2a1
Tracking Status
blocking1.9.1 --- .4+
status1.9.1 --- .4-fixed

People

(Reporter: kbrosnan, Assigned: roc)

References

()

Details

(4 keywords, Whiteboard: [3.5Beta99testday])

Crash Data

Attachments

(4 files)

Load http://tinyvid.tv/show/4yy6letux3ct watch the video. After the video ends drag the playback marker to the right of the end. Firefox will crash after dragging for a couple seconds.

0  	libxul.so  	oggplay_buffer_set_last_data  	 media/liboggplay/src/liboggplay/oggplay_buffer.c:150
1 	libxul.so 	oggplay_step_decoding 	media/liboggplay/src/liboggplay/oggplay.c:715
2 	libxul.so 	nsOggStepDecodeEvent::Run() 	content/media/video/src/nsOggDecoder.cpp:664
3 	libxul.so 	nsThread::ProcessNextEvent(int, int*) 	xpcom/threads/nsThread.cpp:510
4 	libxul.so 	NS_ProcessNextEvent_P(nsIThread*, int) 	nsThreadUtils.cpp:230
5 	libxul.so 	nsThread::ThreadFunc(void*) 	xpcom/threads/nsThread.cpp:254
6 	libnspr4.so 	_pt_root 	nsprpub/pr/src/pthreads/ptthread.c:228
7 	libpthread-2.9.so 	libpthread-2.9.so@0x64fe

http://crash-stats.mozilla.com/report/index/a48086c5-f4d4-4ece-af8f-01cdf2090605
http://crash-stats.mozilla.com/report/index/5dbb8902-c178-40d3-82a7-9926d2090605
Flags: blocking1.9.1?
Whiteboard: [3.5Beta99testday]
Can you please specify what do you mean with dragging a couple of seconds? For me the scrubber jumps to the beginning of the file when dragging after the end.
Linux only? I can't reproduce on OSX.
Confirmed on w32, can't reproduce on OSX.
OS: Linux → All
Hardware: x86 → All
This regressed between nightly builds of 2009053004 and 2009053104
sorry, I just got 0530 build to crash... let me go back further.
it's at least as old as 5/26 continually messing with the slider eventually causes a crash.
so examining this more closely....  There seems to be an older underlying crash that was exposed on 6/2 with the quickly wiggling timeline slider when it is dragged to the right momentarily

20090601 http://hg.mozilla.org/releases/mozilla-1.9.1/rev/0e772546d16b does not exhibit the problem, but
20090602 http://hg.mozilla.org/releases/mozilla-1.9.1/rev/18046f6a7cb1 does crash with wiggling slider

The underlying crash manually messing with the slider as tmyoung posted in his screencast has been around much longer.  at least 5/26  (I didn't check further back)
Attached video video of crash
By staying to the right of the end of the video I get Firefox to crash much quicker.
Given by Tracys changeset the following checkins happened in this timeframe:

http://hg.mozilla.org/releases/mozilla-1.9.1/pushloghtml?fromchange=0e772546d16b&tochange=18046f6a7cb1

Eventually a regression from the fix on bug 495300 or bug 495319? If Roc cannot say it right away, I could start a bisect.
Ok, windbg doesn't help me much here. I only get memory access errors. I don't have an idea if that could be the root cause here?

me	0x10655e82 struct _OggPlay *
buffer	0x00000000 struct OggPlayBuffer *
 buffer_list	<Memory access error>
  *	<Memory access error>
   *	
 buffer_mirror	<Memory access error>
 buffer_size	<Memory access error>
 last_filled	<Memory access error>
 last_emptied	<Memory access error>
 frame_sem	<Memory access error>

That would mean with the following assignment we will have a bad pointer and accessing a erroneous child will cause us to crash:

http://mxr.mozilla.org/mozilla1.9.1/source/media/liboggplay/src/liboggplay/oggplay_buffer.c#147

I cannot reproduce the crash in my Windows Vista environment where the build system has been setup. I now have to setup everything on XP before I could give more information and bisect the range.
> buffer    0x00000000 struct OggPlayBuffer *

that's a simple null pointer. a stack trace will show the caller, the callee (oggplay_buffer_set_last_data) clearly doesn't approve of null pointers.

as to the line number, keep in mind that MSVC optimizes heavily, it can favor

147   p = (OggPlayCallbackInfo *)buffer->buffer_list

over
143   if (buffer->last_filled == -1) {

under the assumptions that:
1. it'll crash doing buffer-> in either case
2. buffer->buffer_list is more useful
3. it will need to check [buffer->last_filled];
4. it wants the array index after it has buffer_list
I should have been added a bit more to my last comment. Sorry Timeless. When checking oggplay_step_decoding in oggplay.c I can see only one call to oggplay_buffer_set_last_data which happens here:

http://mxr.mozilla.org/mozilla1.9.1/source/media/liboggplay/src/liboggplay/oggplay.c#709

Comparing with the locals output in oggplay_step_decoding 'me' is a null pointer there but not in oggplay_buffer_set_last_data. That's why I think something is corrupt. Am I wrong?

me	0x00000000 struct _OggPlay *
 reader	<Memory access error>
 oggz	<Memory access error>
 decode_data	<Memory access error>
 callback_info	<Memory access error>
 num_tracks	<Memory access error>
 all_tracks_initialised	<Memory access error>
 callback_period	<Memory access error>
 callback	<Memory access error>
 callback_user_ptr	<Memory access error>
 target	<Memory access error>
 active_tracks	<Memory access error>
 buffer	<Memory access error>
 presentation_time	<Memory access error>
 trash	<Memory access error>
 shutdown	<Memory access error>
 pt_update_valid	<Memory access error>
 duration	<Memory access error>
chunk_count	1
i	1
info	0x00000000
 *	<Memory access error>
r	4414770
remaining	1
Everything looks fine two frames higher in nsOggStepDecodeEvent::Run()

http://mxr.mozilla.org/mozilla1.9.1/source/content/media/video/src/nsOggDecoder.cpp#664


initialDownloadPosition	18961292836974540
mon	class nsAutoMonitor
r	62913648 (No matching enumerant)
this	0x06cdab50 class nsOggStepDecodeEvent *
 nsRunnable	class nsRunnable
 mDecodeStateMachine	0x045ea200 class nsOggDecodeStateMachine *
 mPlayer	0x03b7d940 struct _OggPlay *
  reader	0x03b7bc70 struct _OggPlayReader *
  oggz	0x03b91400
  decode_data	0x03b7e160
  callback_info	0x03b0f5b0 struct _OggPlayCallbackInfo *
  num_tracks	3
  all_tracks_initialised	1
  callback_period	172018876
  callback	0x103bc4b3
  callback_user_ptr	0x00000000
  target	272305880707
  active_tracks	0
  buffer	0x01eb7040 struct OggPlayBuffer *
   buffer_list	0x040229c0
   buffer_mirror	0x04022ab0
   buffer_size	20
   last_filled	6
   last_emptied	6
   frame_sem	0x000005a4
  presentation_time	247019105935
  trash	0x00000000 struct OggPlaySeekTrash *
  shutdown	0
  pt_update_valid	0
  duration	64164
Attached file wingdb session
I spent some time together with timeless on debugging this problem with windbg. Attached you can find our findings. Looks like we wanna access an buffer element outside of its boundary (20).
Chris, could you have a look at it?
I don't think we should block, but we should still take a fix.
Flags: blocking1.9.1? → wanted1.9.1+
Attached file stack
With a debug build I get an assertion when dragging the scrubber to the right when it is near the end position:

ASSERTION: Can only seek in range [0,duration]: 'mSeekTime >= 0 && mSeekTime <= duration'

See the attached wingdb stack.
On trunk this regressed between the following two builds from 090531 and 090601.

Checkins in this timeframe: http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=f014c39be5dc&tochange=ab395a1916be

Due to there are only two check-ins which match the regression range on 1.9.1 and the patch on bug 487519 was backed out again, bug 495300 has to be the reason for this regression.
Attached patch fixSplinter Review
OK, this bug is because we seek to the end, decode one frame, then decode another frame to see if we're at the end. In this case we are, so DecodeFrame returns E_OGGPLAY_OK. Then we go back to the DECODING state and try to decode another frame. This triggers a crash in oggplay, but it's really our fault because we shouldn't try to do oggplay_step_decoding after E_OGGPLAY_OK was returned without seeking in between.

Obviously we should go to the COMPLETED state when we know we've got the last frame.

We also shouldn't be reading mDecodedFrames while not holding the lock, that was bad.
Assignee: nobody → roc
Attachment #382085 - Flags: review?(chris.double)
While seeking to the end a lot in this testcase, we hit a lot of
###!!! ASSERTION: Can only seek in range [0,duration]: 'mSeekTime >= 0 && mSeekTime <= duration', file /home/roc/shared/mozilla-checkin/content/media/video/src/nsOggDecoder.cpp, line 1236
But I'm pretty sure this is due to the first frame presentation time being reported incorrectly by liboggplay; see https://bugzilla.mozilla.org/show_bug.cgi?id=496051#c3.
Attachment #382085 - Flags: review?(chris.double) → review+
Comment on attachment 382085 [details] [diff] [review]
fix

Avoids a crash while seeking to the end of some videos.
Comment on attachment 382085 [details] [diff] [review]
fix

How well tested are we here? Let's definitely start by getting it on trunk.
Oh, I was not aware that we could land on trunk without 1.9.1 approval now
Whiteboard: [3.5Beta99testday] → [3.5Beta99testday][needs landing]
Sadly, looks like this will miss the 191 cutoff, though :(
Flags: wanted1.9.1.x+
Keywords: relnote
http://hg.mozilla.org/mozilla-central/rev/54be845ea550
Status: NEW → RESOLVED
Closed: 15 years ago
Flags: in-testsuite?
Resolution: --- → FIXED
Whiteboard: [3.5Beta99testday][needs landing] → [3.5Beta99testday][needs 191 approval]
Target Milestone: --- → mozilla1.9.2a1
Verified fixed on trunk with Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2a1pre) Gecko/20090613 Minefield/3.6a1pre ID:20090613042114
Status: RESOLVED → VERIFIED
Comment on attachment 382085 [details] [diff] [review]
fix

a191=beltzner
Attachment #382085 - Flags: approval1.9.1? → approval1.9.1+
This patch appears to have gotten 1.9.1 approval but didn't land? Or was it fixed as part of some other bug and forgotten to be marked here?
blocking1.9.1: --- → ?
Comment on attachment 382085 [details] [diff] [review]
fix

Removing the stale 1.9.1 approval -- we're code-frozen for 1.9.1.3 and don't want this landing right at the moment. If this has not been fixed on that branch we could land it next time, or if you no longer want to land this patch then just remove the approval request.
Attachment #382085 - Flags: approval1.9.1+ → approval1.9.1.4?
blocking1.9.1: ? → .4+
Comment on attachment 382085 [details] [diff] [review]
fix

Approved for 1.9.1.4, a=dveditz for release-drivers

Please make sure this gets in before the code-freeze for the 1.9.1 branch.
Attachment #382085 - Flags: approval1.9.1.4? → approval1.9.1.4+
Keywords: checkin-needed
Whiteboard: [3.5Beta99testday][needs 191 approval] → [3.5Beta99testday][needs 191 landing]
http://hg.mozilla.org/releases/mozilla-1.9.1/rev/ae0289c62cfb
Keywords: checkin-needed
Whiteboard: [3.5Beta99testday][needs 191 landing] → [3.5Beta99testday]
Verified fixed on 1.9.1 with Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.4pre) Gecko/20090922 Shiretoko/3.5.4pre (.NET CLR 3.5.30729) ID:20090922041550
Keywords: verified1.9.1
Removing relnote
Keywords: relnote
Crash Signature: [@ oggplay_buffer_set_last_data ]
You need to log in before you can comment on or make changes to this bug.