Closed Bug 480063 Opened 12 years ago Closed 12 years ago

Infinite loop in oggplay_step_decoding

Categories

(Core :: Audio/Video, defect)

x86
macOS
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla1.9.2a1

People

(Reporter: roc, Assigned: cajbir)

References

Details

(Keywords: fixed1.9.1)

Attachments

(1 file, 2 obsolete files)

I noticed that my Firefox build was spinning, so I attached gdb and found an Ogg decoder thread looping in oggplay_step_decoding while we were trying to shut it down.

Starting at read_more_data:

(gdb) p *me
$12 = {
  reader = 0x317d6a60, 
  oggz = 0xe856200, 
  decode_data = 0x10902180, 
  callback_info = 0x96f1700, 
  num_tracks = 1, 
  all_tracks_initialised = 1, 
  callback_period = 0, 
  callback = 0x3d9e94d <oggplay_buffer_callback>, 
  callback_user_ptr = 0x0, 
  target = 0, 
  active_tracks = 0, 
  buffer = 0x3150d9e0, 
  presentation_time = 0, 
  trash = 0x0, 
  shutdown = 1, 
  pt_update_valid = 0

(gdb) p *me->decode_data[0]
$9 = {
  serialno = 692291755, 
  content_type = 12, 
  content_type_name = 0x40514c6 "Unknown", 
  decoded_type = OGGPLAY_TYPE_UNKNOWN, 
  granuleperiod = 4294967296, 
  last_granulepos = 0, 
  offset = 0, 
  current_loc = -1, 
  active = 0, 
  final_granulepos = -1, 
  player = 0x2ab030b0, 
  data_list = 0x0, 
  end_of_data_list = 0x0, 
  untimed_data_list = 0x0, 
  stream_info = OGGPLAY_STREAM_UNINITIALISED, 
  preroll = -1008088133

remaining gets set to 1, which seems suspect.
        if (me->decode_data[i]->current_loc +
                     me->decode_data[i]->granuleperiod >= me->target + me->decode_data[i]->offset) {
Surely this logic shouldn't be operating on current_loc == -1? Should we just add a current_loc != -1 check here here?

After that, need_data gets set to 0. Then we break out of the while loop. oggplay_callback_info_prepare returns num_records == 1, info == null, and
r gets set to 0. And we take the goto read_more_data.
This is pretty bad since it makes the browser slow until you restart it but is otherwise mostly invisible.
Flags: blocking1.9.1+
I've hit this again, but this time not while shutting down. I've got two tracks --- Theora and Vorbis --- but current_loc is -1 in each track, and we're doing arithmetic on it while we compute 'remaining'.
Bug 476089 is related, or maybe even the same bug.
I think this blocks my cache work in bug 475441. With the patches in that bug applied, we hang in a situation like this every time I try to play an HTTP video.
Blocks: 475441
Bug 476089 no longer reproduces on trunk for me. I'll close this as WFM too until/unless I can reproduce it again.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → WORKSFORME
OK, this happened to me again in a build with revision 9dc4b85b3470, April 15.

gdb stack trace:

(gdb) where
#0  0x942c0549 in szone_free ()
#1  0x942c02cd in free ()
#2  0x03ccd38b in oggplay_callback_info_prepare (me=0x2ab681f0, info=0xb1c2bbf0) at /Users/roc/mozilla-checkin/media/liboggplay/src/liboggplay/oggplay_callback_info.c:315
#3  0x03ccad2e in oggplay_step_decoding (me=0x2ab681f0) at /Users/roc/mozilla-checkin/media/liboggplay/src/liboggplay/oggplay.c:696
#4  0x03cba196 in nsOggDecodeStateMachine::DecodeFrame (this=0x26250520) at /Users/roc/mozilla-checkin/content/media/video/src/nsOggDecoder.cpp:538
#5  0x03cbe293 in nsOggDecodeStateMachine::Run (this=0x26250520) at /Users/roc/mozilla-checkin/content/media/video/src/nsOggDecoder.cpp:971
#6  0x00549cda in nsThread::ProcessNextEvent (this=0x2e284fc0, mayWait=1, result=0xb1c2becc) at /Users/roc/mozilla-checkin/xpcom/threads/nsThread.cpp:510
#7  0x004d2674 in NS_ProcessNextEvent_P (thread=0x2e284fc0, mayWait=1) at nsThreadUtils.cpp:230
#8  0x00549ee9 in nsThread::ThreadFunc (arg=0x2e284fc0) at /Users/roc/mozilla-checkin/xpcom/threads/nsThread.cpp:254
#9  0x00a963c5 in _pt_root (arg=0x2e75aa30) at /Users/roc/mozilla-checkin/nsprpub/pr/src/pthreads/ptthread.c:228
#10 0x942eb095 in _pthread_start ()
#11 0x942eaf52 in thread_start ()

Here's me stepping through oggplay_step_decode showing one iteration of the infinite loop:

(gdb) n
609	    if (me->active_tracks == 0) {
(gdb) n
610	      int remaining = 0;
(gdb) n
611	      for (i = 0; i < me->num_tracks; i++) {
(gdb) n
612	        if (me->decode_data[i]->current_loc +
(gdb) n
611	      for (i = 0; i < me->num_tracks; i++) {
(gdb) n
612	        if (me->decode_data[i]->current_loc +
(gdb) n
614	          remaining++;
(gdb) n
611	      for (i = 0; i < me->num_tracks; i++) {
(gdb) n
612	        if (me->decode_data[i]->current_loc +
(gdb) n
611	      for (i = 0; i < me->num_tracks; i++) {
(gdb) n
617	      if (remaining == 0) {
(gdb) n
626	    need_data = 0;
(gdb) n
627	    for (i = 0; i < me->num_tracks; i++) {
(gdb) n
628	      if (me->decode_data[i]->active == 0)
(gdb) n
627	    for (i = 0; i < me->num_tracks; i++) {
(gdb) n
628	      if (me->decode_data[i]->active == 0)
(gdb) n
627	    for (i = 0; i < me->num_tracks; i++) {
(gdb) n
628	      if (me->decode_data[i]->active == 0)
(gdb) n
627	    for (i = 0; i < me->num_tracks; i++) {
(gdb) n
646	    if (!need_data) {
(gdb) n
696	  num_records = oggplay_callback_info_prepare (me, &info);
(gdb) n
697	  if (info != NULL) {
(gdb) n
701	    r = 0;
(gdb) n
707	  for (i = 0; i < me->num_tracks; i++) {
(gdb) n
708	    oggplay_data_clean_list (me->decode_data[i]);
(gdb) n
707	  for (i = 0; i < me->num_tracks; i++) {
(gdb) n
708	    oggplay_data_clean_list (me->decode_data[i]);
(gdb) n
707	  for (i = 0; i < me->num_tracks; i++) {
(gdb) n
708	    oggplay_data_clean_list (me->decode_data[i]);
(gdb) n
707	  for (i = 0; i < me->num_tracks; i++) {
(gdb) n
711	  if (info == NULL) {
(gdb) n
609	    if (me->active_tracks == 0) {
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
Does the patch here help? It fixes an infinite loop I found while working on the HD performance bug.

https://trac.annodex.net/ticket/466
I forgot to check me->shutdown in that last case, but in the other cases it was true, so I strongly expect your patch to fix this. Thanks!
Whiteboard: [Depends on external bug/fix: see comment 8]
When might we pick up this fix? This seems to be causing bug 471085, and disabling the only test for videocontrols to prevent the random oranges makes me sad.
Blocks: 474540
Attached patch update to liboggz git head (obsolete) — Splinter Review
The liboggplay fix requires an update to liboggz. This patch is that update. There are issues with the current liboggplay git head which I'm working with the maintainer to resolve and then I'll attach the patch for that.
Attached patch liboggz and liboggplay patch (obsolete) — Splinter Review
waiting for tryserver builds to see if this works on other platforms.
Attachment #376375 - Attachment is obsolete: true
There was a build error on Mac and ref test failures on Linux. I've not updated liboggplay to the latest release, instead cherry picked the relevant patch to fix this bug until I work out the cause of the ref test failures.
Attachment #376646 - Attachment is obsolete: true
Pushed to mozilla-central:
http://hg.mozilla.org/mozilla-central/rev/ed38105c9c2a
Status: REOPENED → RESOLVED
Closed: 12 years ago12 years ago
Resolution: --- → FIXED
Whiteboard: [Depends on external bug/fix: see comment 8] → [baking for 1.9.1]
Does this show up as a video that stops playing and then the browser is spinning, and other than that the browser works?  If so I've seen this as well on the 1.9.1 branch.  I'll keep an eye out for that symptom post this being checked into 1.9.1 as well.
This shows up as the video stops playing, the browser takes up 100% cpu, but otherwise works normally.
OK - I'll keep an eye out on 1.9.1 once this lands.
Assignee: nobody → chris.double
Blocks: 483912
Target Milestone: --- → mozilla1.9.2a1
You need to log in before you can comment on or make changes to this bug.