Closed Bug 615134 Opened 14 years ago Closed 14 years ago

Intermittent timeout of content/media/test/test_seek.html on Linux64

Categories

(Core :: Audio/Video, defect)

x86_64
Linux
defect
Not set
normal

Tracking

()

VERIFIED FIXED

People

(Reporter: cpearce, Assigned: kinetik)

References

Details

(Keywords: intermittent-failure)

Attachments

(1 file)

test_seek failed on Linux64, 1 test-run after the landing of bug 610570. The failure was in seek5.js on bug516323.indexed.ogv, which relies on playing the media through after a seek and ending the test when we receive an ended event. Since bug 610570 changed the buffering logic, it's *possible* we were stuck in buffering during playback after seek, and so never got the ended event. Or it could be something else. This is the first time this test has failed in a while, so I'm filing a new bug for it. http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1290981667.1290984079.21600.gz Build Log (Brief) Rev3 Fedora 12x64 mozilla-central debug test mochitests-1/5 on 2010/11/28 14:01:07 SEEK-TEST: Finished bug516323.indexed.ogv seek test 3 57892 INFO TEST-PASS | /tests/content/media/test/test_seek.html | bug516323.indexed.ogv seek test 4: Video currentTime should be around 1.052: 1.0520000457763672 57893 INFO TEST-PASS | /tests/content/media/test/test_seek.html | bug516323.indexed.ogv seek test 4: Second seek on target: 1.0520000457763672 SEEK-TEST: Finished bug516323.indexed.ogv seek test 4 SEEK-TEST: Started bug516323.indexed.ogv seek test 5 SEEK-TEST: Started bug516323.indexed.ogv seek test 6 WARNING: NS_ENSURE_TRUE(mIsPending) failed: file /builds/slave/mozilla-central-linux64-debug/build/netwerk/protocol/http/nsHttpChannel.cpp, line 3544 WARNING: NS_ENSURE_TRUE(mIsPending) failed: file /builds/slave/mozilla-central-linux64-debug/build/netwerk/protocol/http/nsHttpChannel.cpp, line 3544 WARNING: NS_ENSURE_TRUE(mSuspendCount > 0) failed: file /builds/slave/mozilla-central-linux64-debug/build/netwerk/protocol/http/nsHttpChannel.cpp, line 3561 57894 INFO TEST-PASS | /tests/content/media/test/test_seek.html | bug516323.indexed.ogv seek test 5: Video currentTime should be around 2.104: 2.1040000915527344 57895 INFO TEST-PASS | /tests/content/media/test/test_seek.html | bug516323.indexed.ogv seek test 6: Seek 1 57896 INFO TEST-PASS | /tests/content/media/test/test_seek.html | bug516323.indexed.ogv seek test 6: Seek 2 57897 INFO TEST-PASS | /tests/content/media/test/test_seek.html | bug516323.indexed.ogv seek test 6: Seek 3 Seek test 6 time sum:23.735999822616577 SEEK-TEST: Finished bug516323.indexed.ogv seek test 6 57898 ERROR TEST-UNEXPECTED-FAIL | /tests/content/media/test/test_seek.html | Test timed out.
I had nthomas spool up 6 extra mochitest1/5 runs on my push, and 1 of the 6 failed with this. So It's failed on Linux x64 in 2 of 8 runs so far. It hasn't failed on any other platforms yet.
Hardware: x86 → x86_64
Assignee: nobody → chris
Two more failures, reported as bug 503623, but actually this bug. dao%mozilla.com http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1291036660.1291038945.15767.gz Rev3 Fedora 12x64 mozilla-central debug test mochitests-1/5 on 2010/11/29 05:17:40 s: talos-r3-fed64-034 57911 ERROR TEST-UNEXPECTED-FAIL | /tests/content/media/test/test_seek.html | Test timed out. Comment 283 TinderboxPushlog Robot 2010-11-29 07:58:55 PST ehsan%mozilla.com http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1291043777.1291046140.21083.gz Rev3 Fedora 12x64 mozilla-central debug test mochitests-1/5 on 2010/11/29 07:16:17 s: talos-r3-fed64-039 57886 ERROR TEST-UNEXPECTED-FAIL | /tests/content/media/test/test_seek.html | Test timed out.
Summary: Intermittent failure of test_seek.html on Linux64 → Intermittent timeout of content/media/test/test_seek.html on Linux64
With the ALSA backend, there's a data start threshold that must be reached before stream playback begins. It defaults to ~500ms on my machine (not 375ms as I guessed in bug 615452). It looks like what's happening here is that after seeking we write a small amount of audio, then enter buffering state, then eventually resume writing audio. Between writing the first bit of audio and the second, the stream underruns and, post recovery, the amount of audio written before waiting for the clock to advance isn't sufficient to start the stream playing. It's necessary to write sufficient audio to reach the start threshold when the stream is first opened, and any time the stream suffers an underrun for any reason. Note that the current design doesn't provide a way to signal underruns back to the playback engine. This timeout can be avoided by changing AMPLE_AUDIO_MS back to 2000 (it was changed to 1000 in bug 610570, which revealed this problem). That's not a guaranteed fix, and won't help in any case where we need to recover from an underrun. An ideal short-term solution would be to set the start threshold to 1, which would provide similar behaviour to the Win32 sydneyaudio backend. It's possible to reduce the start threshold, but the minimum it can be set to (on my machine, at least) is ~125ms. Anything lower results in cascading underruns, as the start threshold must be not smaller than the stream's period. This is also not a guaranteed fix, but it makes recovery more likely in any case where it's necessary. With this change, I can run 5 tabs of test_seek in parallel for an hour without a hang, where they would previously hang within minutes. I'll need to test this change on the N900 to make sure it doesn't regress bug 607200. A further possible improvement would be to expose the start threshold from sydneyaudio, and modify the playback engines to write at least that much audio data before expecting the stream to begin playback. The post FF4 audio rewrite can deal with this correctly by design, but that doesn't solve the problem now.
Attached patch patch v0Splinter Review
This ran 5 tabs of test_seek running continuously overnight without hanging. It's not a complete fix, but a complete fix is difficult and will have to wait for the post FF4 audio rewrite. This patch also removes the underrun recovery from get_position, because it's no longer necessary since we can return last_pos (and would after recovery anyway). Since this function is called without the audio monitor held and called from the playback thread, recovering from underrun here is dangerous because it can race with whatever the audio thread is doing. Will request review once I've tested on Maemo5/N900.
Assignee: chris → kinetik
Status: NEW → ASSIGNED
Comment on attachment 494530 [details] [diff] [review] patch v0 Tested on the N900 and with direct ALSA (no PA) on my desktop, works fine. Mochitests pass on desktop.
Attachment #494530 - Flags: review?(chris)
Attachment #494530 - Flags: approval2.0?
Attachment #494530 - Flags: review?(chris) → review+
Whiteboard: [orange] → [orange][needs landing]
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Whiteboard: [orange][needs landing] → [orange]
Status: RESOLVED → VERIFIED
Whiteboard: [orange]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: