Closed Bug 466045 Opened 11 years ago Closed 11 years ago

Intermittent failure of video seek mochitests

Categories

(Core :: Audio/Video, defect)

x86
All
defect
Not set

Tracking

()

RESOLVED FIXED
mozilla1.9.1b3

People

(Reporter: cpearce, Assigned: cpearce)

References

Details

(Keywords: fixed1.9.1, intermittent-failure)

Attachments

(3 files, 2 obsolete files)

Video seek mochitests (is() ok()) fail intermittently.

STR:
1. Run mochitest for content/media/video/test/test_seek1.html
2. While all tests pass, reload the page.
3. Eventually mochitests will fail. For test_seek1, typically this is on line 60: |ok(t >= 2 && t <= 3, "Video currentTime should be around 2: " + t);| t==0 in failure cases.

Note the tests still run to completion, they've just failed.

We're seeing this on all platforms.

See also:
Bug 465921 -  httpd.js fails to deliver after about 35 reloads of test_seek1.html
Bug 464302 -  test_seek* tests timing out during mochitest
Steps to reproduce. Works on WinXP through LigHTTPd and MacOS and Vista through mochitest server.
(We need to use LigHTTPd on WinXP because httpd.js on WinXP suffers from bug 465921).

On Windows XP:
1. Install LightHTTPd for Windows: http://en.wlmp-project.net/downloads.php?cat=lighty
2. Setup the server to serve seek.ogg on localhost @ http://localhost/seek.ogg
3. Run mozilla-central Firefox build.
4. Load seek_hang.html
5. Wait while test runs, when "Seeks Remaining" counts down to 240, liboggplay will be caught in a "live lock".

On MacOS:
1. Edit seek_hang.html to have the video's src="http://localhost:8888/tests/content/media/video/test/seek.ogg".
2. Run mochitest server (without --autorun).
3. Load seek_hang.html in the mochitest browser instance.
4. Wait while test runs, when "Seeks Remaining" counts down to 220, liboggplay will be caught in a "live lock".

liboggplay gets caught in oggplay_set_decoding(). Here's a typical stack:

gklayout.dll!oggplay_callback_info_prepare(_OggPlay * me=0x06547e80, _OggPlayCallbackInfo * * * info=0x0748fdd8)  Line 278 + 0xc bytes	C
gklayout.dll!oggplay_step_decoding(_OggPlay * me=0x06547e80)  Line 538 + 0xd bytes	C
gklayout.dll!nsOggDecodeStateMachine::DecodeFrame()  Line 495 + 0xc bytes	C++
gklayout.dll!nsOggDecodeStateMachine::Run()  Line 976 + 0x8 bytes	C++
xpcom_core.dll!nsThread::ProcessNextEvent(int mayWait=0x00000001, int * result=0x0748ff08)  Line 511	C++
xpcom_core.dll!NS_ProcessNextEvent_P(nsIThread * thread=0x06550e18, int mayWait=0x00000001)  Line 227 + 0x16 bytes	C++
xpcom_core.dll!nsThread::ThreadFunc(void * arg=0x06550e18)  Line 254 + 0xb bytes	C++
nspr4.dll!_PR_NativeRunThread(void * arg=0x06551018)  Line 436 + 0xf bytes	C
nspr4.dll!pr_root(void * arg=0x06551018)  Line 122 + 0xf bytes	C
msvcr80d.dll!_callthreadstartex()  Line 348 + 0xf bytes	C
msvcr80d.dll!_threadstartex(void * ptd=0x06551940)  Line 331	C
kernel32.dll!_BaseThreadStart@8()  + 0x37 bytes	

The call to oggplay_callback_info_prepare() @ oggplay.c:538 sets info to NULL, which triggers the NULL test at oggplay.c:553 to cause us to jump back to :read_more_data. But the "if (me->active_tracks == 0)" block @ oggplay.c:453 triggers us not to exit, because "we might need to return some data left over at the end of a once-active track that has had all of its data processed.". The need_data check decides we don't need to read data, and we break and then end up at the oggplay_callback_info_prepare() @ oggplay.c:538 and we repeat the loop.

Note the decoder state machine is in seeking state.
(In reply to comment #1)
> Created an attachment (id=349861) [details]
> Testcase for WinXP and MacOS

This testcase is set to read from mochitest BTW, to read from your own http server, change the src of the video element.
I can reproduce this on Vista when serving video from httpd.js, but it's slower than WinXP + LigHTTPd.
Chris Double, do you think we're doing anything that causes this? Or is it a bug in liboggplay?
Running this testcase, particularly in multiple tabs simultaneously, will cause the livelock in liboggplay. Works for me on Vista. Based on earlier testcases. Loads video file off disk, so doesn't require HTTP server.
Attached file Testcase 3
This is pretty much testcase 2, but it actually causes the same problem the bug is filed for! I'll file another bug for the liboggplay livelock/infinite loop.

The bug only happens in my testcase when I've got a slow version of my log() function in (testcase 2 had a faster one). My original log() function gets slower and slower as time goes on (because it's going "log_div.innerHTML+= log_message" a lot, and log_div.innerHTML gets very very big). Eventually it takes so long that for whatever reason, it causes the currentTime to be reported incorrectly. So in summary, the problem is that long running event handlers can cause problems with video stuff which is reliant on events. This may or may not be the same problem that's causing the intermittent failure.

To reproduce, run the testcase until it counts down to from 10000 to about 9920. It'll fail the test, then crash.
Attachment #351837 - Attachment is obsolete: true
Ok, there are two problems here:

1. When liboggplay reads to the end of a stream, it deactivates the track for which it was reading. This is bad in the following situation:
(In reply to comment #7)

Grr TAB+space...

---

Ok, there are two problems here:
 
1. When liboggplay reads to the end of a stream, it deactivates the track for which it was reading. This is bad in the following situation: If we start to decode data for a video, and then the main thread goes off into a slow JS event handler, and a seek is initiated, the seek can come in *just* before the decode thread is finished, when all the data on the stream has been read, but not all of it has been decoded (an "overhang"). This means the decode state machine won't have transitioned to SHUTDOWN state, so the state machine and ogg player won't be reinstantiated on seek. But when the end of stream is reached, the tracks were deactivated. When the seek completes, it tries to read the next frame to get the time stamp, but the track's not active (because although ogg resets the stream's read cursor on seek, it doesn't reset the active status), so reads fail, no frames will be decoded, so we can't update the current time to reflect the new seek position, and all hell breaks loose.

The fix is to just reactivate tracks immediately after we seek, just like we do in LoadOggHeaders(). That seems to work a treat.

Also...

2. Seek events can get lost if long running JS event handlers are present:
* If while you're in a long-running event handler which is going to fire a seek, and if while you're in the event handler, but before you've triggered the seek, the decode thread thread decodes and plays the last frame, it will queue playbackEnded and that will run before the seek event it actioned, and the seek event ends up getting lost somehow. I'll file another bug for this.
Attached patch Patch v1 (obsolete) — Splinter Review
Patch v1 - reactivate tracks on seek, and some assertions to make sure things are sane after seek.
Attachment #352042 - Flags: review?(chris.double)
+          NS_ASSERTION(frame->mDecodedFrameTime >= mSeekTime &&
+                       frame->mDecodedFrameTime <= (mSeekTime+1.0),
+                       "Time not about right after seek.");

This could fail on some valid video's depending on how the frames are structured. For example, a single frame, no audio, displayed for more than one second. I think these assertions should be removed (or at least remove the <= part.

+        {
+          float time = mDecoder->GetCurrentTime();
+          NS_ASSERTION(time >= mSeekTime && time <= (mSeekTime+1.0),
+            "Time not about right after seek.");
+        }

Ditto. Also mDecoder->GetCurrentTime() accesses mDecoder->mCurrentTime which is only allowed to be read/written on the main thread.
I removed the time assertions after seek. I left in the assertion that we should have a frame after seek, I assume that's ok? Even if we seek right to the end of media, we should still display the last frame right?
Attachment #352042 - Attachment is obsolete: true
Attachment #352190 - Flags: review?(chris.double)
Attachment #352042 - Flags: review?(chris.double)
Comment on attachment 352190 [details] [diff] [review]
Patch v2
[Checkin: Comment 14 & 15]

As far as I know the assertion is fine. Although liboggplay documentation doesn't say one way or the other hence the guard.
Attachment #352190 - Flags: review?(chris.double) → review+
Attachment #352190 - Flags: superreview?(roc)
Attachment #352190 - Flags: superreview?(roc) → superreview+
Flags: blocking1.9.1?
May I have blocking on this? It solves some intermittent test failures in video.
Flags: blocking1.9.1? → blocking1.9.1+
Keywords: checkin-needed
Whiteboard: [needs landing]
Comment on attachment 352190 [details] [diff] [review]
Patch v2
[Checkin: Comment 14 & 15]

http://hg.mozilla.org/mozilla-central/rev/28f45bf33ba9
Attachment #352190 - Attachment description: Patch v2 → Patch v2 [Checkin: Comment 14]
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Whiteboard: [needs landing] → [c-n: baking for 1.9.1] [needs landing]
Target Milestone: --- → mozilla1.9.2a1
Comment on attachment 352190 [details] [diff] [review]
Patch v2
[Checkin: Comment 14 & 15]

http://hg.mozilla.org/releases/mozilla-1.9.1/rev/8ace5327c486
Attachment #352190 - Attachment description: Patch v2 [Checkin: Comment 14] → Patch v2 [Checkin: Comment 14 & 15]
Whiteboard: [c-n: baking for 1.9.1] [needs landing]
Target Milestone: mozilla1.9.2a1 → mozilla1.9.1b3
Whiteboard: [orange]
Whiteboard: [orange]
You need to log in before you can comment on or make changes to this bug.