Closed Bug 1120741 Opened 9 years ago Closed 9 years ago

Intermittent test_eme_playback.html | Test timed out

Categories

(Core :: Audio/Video, defect, P2)

x86_64
macOS
defect

Tracking

()

RESOLVED FIXED
mozilla40
Tracking Status
firefox38 --- wontfix
firefox38.0.5 --- wontfix
firefox39 --- wontfix
firefox40 --- fixed
firefox-esr31 --- unaffected
firefox-esr38 --- disabled

People

(Reporter: RyanVM, Assigned: cpearce)

References

(Blocks 1 open bug)

Details

(Keywords: intermittent-failure)

14:28:50 INFO - 168 INFO TEST-PASS | dom/media/test/test_eme_playback.html | [14:23:25.759] short-cenc.mp4-0 got ended event
14:28:50 INFO - 169 INFO TEST-PASS | dom/media/test/test_eme_playback.html | [14:23:25.759] short-cenc.mp4-0 encrypted event should have fired
14:28:50 INFO - 170 INFO TEST-PASS | dom/media/test/test_eme_playback.html | [14:23:25.759] short-cenc.mp4-0 playing event should have fired
14:28:50 INFO - 171 INFO TEST-PASS | dom/media/test/test_eme_playback.html | [14:23:25.759] short-cenc.mp4-0 Duration of video should be corrrect
14:28:50 INFO - 172 INFO TEST-PASS | dom/media/test/test_eme_playback.html | [14:23:25.759] short-cenc.mp4-0 Current time should be same as duration
14:28:50 INFO - 173 INFO TEST-PASS | dom/media/test/test_eme_playback.html | [14:23:25.759] short-cenc.mp4-0 should have 1 session
14:28:50 INFO - 174 INFO TEST-PASS | dom/media/test/test_eme_playback.html | [14:23:25.759] short-cenc.mp4-0 should have received at least one keychange event
14:28:50 INFO - 175 INFO TEST-PASS | dom/media/test/test_eme_playback.html | [14:23:25.760] short-cenc.mp4-0 key with id 7e571d017e571d017e571d017e571d01 was usable as expected
14:28:50 INFO - 176 INFO TEST-PASS | dom/media/test/test_eme_playback.html | [14:23:25.760] short-cenc.mp4-0 key with id 7e571d027e571d027e571d027e571d02 was usable as expected
14:28:50 INFO - 177 INFO [finished short-cenc.mp4-0] remaining= gizmo-frag-cencinit.mp4-1
14:28:50 INFO - 178 INFO TEST-PASS | dom/media/test/test_eme_playback.html | [finished short-cenc.mp4-0] Length of array should match number of running tests
14:28:50 INFO - 179 INFO TEST-FAIL | dom/media/test/test_eme_playback.html | The author of the test has indicated that flaky timeouts are expected. Reason: untriaged
14:28:50 INFO - 180 INFO [14:23:25.873] gizmo-frag-cencinit.mp4-1 progress
14:28:50 INFO - 181 INFO [14:23:28.333] gizmo-frag-cencinit.mp4-1 stalled
14:28:50 INFO - 182 INFO Watchdog remaining tests= gizmo-frag-cencinit.mp4-1
14:28:50 INFO - 183 INFO TEST-UNEXPECTED-FAIL | dom/media/test/test_eme_playback.html | Test timed out. - expected PASS
14:28:50 INFO - 184 INFO MEMORY STAT vsize after test: 3558993920
14:28:50 INFO - 185 INFO MEMORY STAT residentFast after test: 286715904
14:28:50 INFO - 186 INFO MEMORY STAT heapAllocated after test: 71530864
14:28:50 INFO - 187 INFO TEST-OK | dom/media/test/test_eme_playback.html | took 325544ms
I believe the modifications to MediaSourceReader that are part of bug 1096089, fixed this issue.

Under some circumstances, it was possible for the MDSM to treats data not even added via appendBuffer as EOF rather than WAITING_FOR_DATA and as such playback never completed (and so ended event was never fired).

I will keep this one opened for a few days and close it if it the issue doesn't occur anymore
The huge spike in this bug goes back to jya's push from earlier today.
https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=5a745cf431cb
Flags: needinfo?(jyavenard)
I can't explain why my changes has any impact on the EME stuff; the new code path isn't hit.
just a timing issue I believe.
edwin or cpearce would be the one to know
Flags: needinfo?(jyavenard)
Flags: needinfo?(edwin)
Flags: needinfo?(cpearce)
That link to inbound showed that my push caused no failures at all !
You're making that conclusion off a single result on a failure that happens *intermittently* ?
I don't understand jya's changes, but that test can only fail if the "ended" event doesn't fire, and I see from the logs that playback is stalling. My guess would be jya's changes tickled some timing issue that causes this failure to occur. We've had other tests that fail due to "ended" not firing.

This failure also isn't limited to MacOSX.
And also it appears all fragments are being appended and ending the stream. So we're not failing to hit ended state because of the XHR failing.
Flags: needinfo?(cpearce)
Given the frequency of this failure, I intend to disable the test on Friday if progress isn't made fixing it.
Flags: needinfo?(cpearce)
I found a bunch of crashes in gmp-clearkey in Bug 1136984 that will help here. Patch up for review there. I was also able to reproduce this failure on Anthony Jone's Win7 machine. I will try to debug this tomorrow.
I can repro the failure without the patches in bug 1136986 applied, and not with those patches applied. So I think bug 1136986 will fix this.
Blocks: EME
Flags: needinfo?(edwin)
Flags: needinfo?(cpearce)
Pushing some logging at Ryan's request to see if we can make headway: https://treeherder.mozilla.org/#/jobs?repo=try&revision=b03eef62e919
(In reply to Bobby Holley (:bholley) from comment #270)
> Pushing some logging at Ryan's request to see if we can make headway:
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=b03eef62e919

It's certainly not perma-orange with the logging, but I did finally manage to get a failure:

http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/bobbyholley@gmail.com-b53d9e809886/try-macosx64/try_snowleopard_test-mochitest-3-bm108-tests1-macosx-build479.txt.gz
Flags: needinfo?(edwin)
(In reply to Bobby Holley (:bholley) from comment #302)
> (In reply to Bobby Holley (:bholley) from comment #270)
> > Pushing some logging at Ryan's request to see if we can make headway:
> > https://treeherder.mozilla.org/#/jobs?repo=try&revision=b03eef62e919
> 
> It's certainly not perma-orange with the logging, but I did finally manage
> to get a failure:
> 
> http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/bobbyholley@gmail.
> com-b53d9e809886/try-macosx64/try_snowleopard_test-mochitest-3-bm108-tests1-
> macosx-build479.txt.gz

10:44:01     INFO -  346 INFO bipbop-cenc-videoinit.mp4-3 error fetching http://test2.mochi.test:8888/tests/dom/media/test/allowed.sjs?bipbop-cenc-audio2.m4s
10:44:01     INFO -  347 INFO bipbop-cenc-videoinit.mp4-3 error fetching http://test2.mochi.test:8888/tests/dom/media/test/allowed.sjs?bipbop-cenc-video2.m4s

that could explain the stall !
(In reply to Jean-Yves Avenard [:jya] from comment #307)
> (In reply to Bobby Holley (:bholley) from comment #302)
> > (In reply to Bobby Holley (:bholley) from comment #270)
> > > Pushing some logging at Ryan's request to see if we can make headway:
> > > https://treeherder.mozilla.org/#/jobs?repo=try&revision=b03eef62e919
> > 
> > It's certainly not perma-orange with the logging, but I did finally manage
> > to get a failure:
> > 
> > http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/bobbyholley@gmail.
> > com-b53d9e809886/try-macosx64/try_snowleopard_test-mochitest-3-bm108-tests1-
> > macosx-build479.txt.gz
> 
> 10:44:01     INFO -  346 INFO bipbop-cenc-videoinit.mp4-3 error fetching
> http://test2.mochi.test:8888/tests/dom/media/test/allowed.sjs?bipbop-cenc-
> audio2.m4s
> 10:44:01     INFO -  347 INFO bipbop-cenc-videoinit.mp4-3 error fetching
> http://test2.mochi.test:8888/tests/dom/media/test/allowed.sjs?bipbop-cenc-
> video2.m4s
> 
> that could explain the stall !

It could be, though know that, from the timestamps, this happens _after_ the test times out. I guess it's also possible that the network is just stalling the whole time? I didn't see this message in the failures on m-c though, so this failure may also just be a fluke and not the actual problem.
(In reply to Bobby Holley (:bholley) from comment #311)

> It could be, though know that, from the timestamps, this happens _after_ the
> test times out. I guess it's also possible that the network is just stalling
> the whole time? I didn't see this message in the failures on m-c though, so
> this failure may also just be a fluke and not the actual problem.

I didn't see any of those failures in the usual timeout runs (like https://treeherder.mozilla.org/#/jobs?repo=try&revision=2594583fc312)
(In reply to Treeherder Robot from comment #598)
> log:
> https://treeherder.mozilla.org/logviewer.html#?repo=mozilla-
> inbound&job_id=7909429
> repository: mozilla-inbound
> start_time: 2015-03-22T11:14:39
> who: philringnalda[at]gmail[dot]com
> machine: t-snow-r4-0083
> buildname: Rev4 MacOSX Snow Leopard 10.6 mozilla-inbound debug test
> mochitest-3
> revision: c3638d994edd
> 
> 208 INFO TEST-UNEXPECTED-FAIL | dom/media/test/test_eme_playback.html | Test
> timed out. - expected PASS

bipbop-cenc-videoinit.mp4-3 video: addSourceBuffer(video/mp4; codecs="avc1.64000d")
bipbop-cenc-videoinit.mp4-3 video: addNextFragment() fetching next fragment http://test2.mochi.test:8888/tests/dom/media/test/allowed.sjs?bipbop-cenc-videoinit.mp4
bipbop-cenc-videoinit.mp4-3 video: fetch of http://test2.mochi.test:8888/tests/dom/media/test/allowed.sjs?bipbop-cenc-videoinit.mp4 complete, appending

It looks like 'updateend' is not fired and the video track can never be ended.

ni jya who might have some clues.
Flags: needinfo?(jyavenard)
At this stage it is still a bit of a mystery. Bobby's comment 302 indicates a failure fetching the video segments (segment 1 and 2).

Also, in all those logs, you find no:
addNextFragment() fetching next fragment bipbop-cenc-video2.m4s
nor addNextFragment() fetching next fragment bipbop-cenc-video3.m4s

so the video can never completes.

So now is this an issue with the JS that doesn't fetch the data it should, or the local http server failing... is up for debate
Flags: needinfo?(jyavenard)
Trying something on https://treeherder.mozilla.org/#/jobs?repo=try&revision=4f902c54a371
1. Bug 1134434 and bug 1144409 move the 'encrypted' event handling in a separate thread instead of blocking appendBuffer and preventing updateend, which can block some EME tests.
2. Bug 1142379 and bug 1138294 introduce more and better tests relying on the proper encrypted event.
3. A potential patch https://hg.mozilla.org/try/rev/94a3c4865626 adds better handling of XHR issues with a short timeout (so we can hopefully see the issues sooner instead of waiting 5 minutes), better loading error-handling in tests, and a sprinkle of SimpleTest.requestCompleteLog() so we don't risk losing valuable logging.

As of this comment, 50+ test runs have passed, but I've been this "lucky" before, and the platform with the most issues (Mac 10.6) hasn't built yet, so I'll keep trying...
Priority: -- → P1
Priority: P1 → P2
I declare victory. I think this was fixed by bug 1154133.
Status: NEW → RESOLVED
Closed: 9 years ago
Depends on: 1154133
Flags: needinfo?(edwin)
Resolution: --- → FIXED
Assignee: nobody → cpearce
Target Milestone: --- → mozilla40
Disabled on esr38 (EME is disabled on that branch anyway, so meh). Hopefully we can still backport the fix to Fx39, though.

https://hg.mozilla.org/releases/mozilla-esr38/rev/1ab9708d0ed7
HTTP404 error in the last one. Sounds more like an infrastructure issue.
You need to log in before you can comment on or make changes to this bug.