Closed Bug 786539 Opened 13 years ago Closed 12 years ago

Intermittent test_timeupdate_small_files.html | Test timed out

Categories

(Core :: Audio/Video, defect)

x86
Windows 7
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla23
Tracking Status
firefox18 --- disabled
firefox19 --- disabled
firefox21 --- disabled
firefox22 + fixed

People

(Reporter: philor, Assigned: roc)

References

Details

(Keywords: intermittent-failure)

Attachments

(3 files, 1 obsolete file)

+++ This bug was initially created as a clone of Bug #687972 +++ https://tbpl.mozilla.org/php/getParsedLog.php?id=14792926&tree=Firefox Rev3 WINNT 6.1 mozilla-central opt test mochitests-1/5 on 2012-08-28 18:53:57 PDT for push 0b4bc3760d51 slave: talos-r3-w7-024 130056 INFO TEST-INFO | /tests/content/media/test/test_timeupdate_small_files.html | http://mochi.test:8888/tests/content/media/test/detodos.opus got timeupdate nsBuiltinDecoderStateMachine::RunStateMachine queuing nsBuiltinDecoder::PlaybackEnded 130057 INFO TEST-PASS | /tests/content/media/test/test_timeupdate_small_files.html | detodos.opus - shouldn't get timeupdate after ended 130058 INFO TEST-INFO | /tests/content/media/test/test_timeupdate_small_files.html | http://mochi.test:8888/tests/content/media/test/detodos.opus got timeupdate 130059 INFO TEST-INFO | /tests/content/media/test/test_timeupdate_small_files.html | http://mochi.test:8888/tests/content/media/test/detodos.opus got pause 130060 INFO TEST-PASS | /tests/content/media/test/test_timeupdate_small_files.html | detodos.opus - shouldn't get timeupdate after ended 130061 INFO TEST-INFO | /tests/content/media/test/test_timeupdate_small_files.html | http://mochi.test:8888/tests/content/media/test/detodos.opus got timeupdate 130062 INFO TEST-PASS | /tests/content/media/test/test_timeupdate_small_files.html | detodos.opus - should see at least one timeupdate: 2.9135 130063 INFO TEST-INFO | /tests/content/media/test/test_timeupdate_small_files.html | http://mochi.test:8888/tests/content/media/test/detodos.opus got ended 130064 INFO TEST-PASS | /tests/content/media/test/test_timeupdate_small_files.html | [finished detodos.opus-4] Length of array should match number of running tests - 1 should equal 1 130065 ERROR TEST-UNEXPECTED-FAIL | /tests/content/media/test/test_timeupdate_small_files.html | Test timed out. (screenshot) 130066 INFO TEST-END | /tests/content/media/test/test_timeupdate_small_files.html | finished in 304592ms 130067 INFO TEST-START | /tests/content/media/test/test_too_many_elements.html nsBuiltinDecoderStateMachine::RunStateMachine queuing nsBuiltinDecoder::PlaybackEnded
Blocks: 761917
Bug 761917 was Linux only, this is a different failure. There are probably a bunch of bugs like this; the existing bugs were a mess of different similar looking failures. I closed all of them ones that bug 761917 fixed a Linux variant of the failure on, so now we'll get new bugs for each of the existing failures masked previously masked by that problem.
No longer blocks: 761917
Toporange with several hundred failures not telling us anything new. Disabled for now on Windows, as part of the drive to reduce our OrangeFactor: https://hg.mozilla.org/integration/mozilla-inbound/rev/47ae2d6ba237
Whiteboard: [orange] → [orange][test disabled on Windows][leave open]
Whiteboard: [orange][test disabled on Windows][leave open] → [test disabled on Windows][leave open]
Any idea why this has spiked so significantly on OSX over the last day?
Flags: needinfo?(paul)
Flags: needinfo?(cpearce)
Paul, I'm wondering if bug 861936 is to blame for the spike.
Paul's suggestion seems the most likely.
Flags: needinfo?(cpearce)
Going back on inbound, bug 834835 appears to be responsible for the recent OSX spike.
(In reply to Ryan VanderMeulen [:RyanVM] from comment #395) > Going back on inbound, bug 834835 appears to be responsible for the recent > OSX spike. So I had a backout for this in my patch queue which I inadvertently pushed along with some other patches. No failures on OSX since (even with numerous retriggers).
cpearce.
Assignee: nobody → cpearce
Attached patch Patch v1: Fix (obsolete) — Splinter Review
I have been able to reproduce this test reliably locally. The problem seems to be that we're firing "canplay" but not autoplaying under some conditions. test_timeupdate relies on autoplay working. I think the problem is that HTMLMediaElement::CanActivateAutoplay() is returning false because the resource's download is "suspended", and so we never reach HAVE_ENOUGH_DATA readyState and so never start playing. I think this can happen when the decoder/resource is cloned. This patch also re-enables the test on Windows. Looks greenish, modulo known orange: https://tbpl.mozilla.org/?tree=Try&pusher=cpearce@mozilla.com
Attachment #740522 - Flags: review?(roc)
A cloned resource should reach HAVE_ENOUGH_DATA just like any other resource ... cloning should not affect the observable behavior.
My debugging seems to show that HTMLMediaElement::UpdateReadyStateForData with a cloned decoder is going wrong. We get to if (stats.mTotalBytes < 0 ? stats.mDownloadRateReliable : stats.mTotalBytes == stats.mDownloadPosition || mDecoder->CanPlayThrough()) which stays false. stats.mTotalBytes is correct, but stats.mDownloadPosition is short of mTotalBytes. Also, mDecoder->CanPlayThrough() returns false.
More information about what's going on: 1) An HTTP resource is opened and we read from it for a while. 2) For some reason (cache pressure probably) we suspend reading from the channel temporarily. 3) Then we resume reading. This causes MediaResource::mReopenOnError to be set, since the server might have timed out our connection while we were suspended and we want to silently reopen in that case. 4) We reach the end of the stream and get OnStopRequest from the HTTP stack. 5) This comment kicks in: // Note that aStatus might have succeeded --- this might be a normal close // --- even in situations where the server cut us off because we were // suspended. So we need to "reopen on error" in that case too. and we seek to the end of the stream and try to read some more, instead of sending NotifyDataEnded as you'd expect. 6) Meanwhile, we've cloned the resource for another media element. 7) Some of the underlying data has been evicted from the cache. We read some data for the new element, but then the cache is full, so we suspend reading. 8) We call MediaCacheStream::AreAllStreamsForResourceSuspended to see if we should just fire autoplay now to avoid the "never autoplay because there isn't room in the cache" problem. 9) That method sees that the original resource is still open and able to read from its HTTP channel, so it returns false. 10) We're deadlocked. One possible fix here is to make AreAllStreamsForResourceSuspended a bit smarter so streams that are reading ahead of the data we need are ignored ... after all, they're not going to provide the data we need. This does fix the bug. However, I also want to look into what happens to the seek triggered in step 5. It seems to me that seek should eventually result in another OnStopRequest, this time with mReopenOnError false, but it doesn't seem to.
Attachment #741164 - Flags: review?(cpearce) → review+
Comment on attachment 741165 [details] [diff] [review] Part 2: AreAllStreamsForResourceSuspended should ignore streams that can't read the resource data we need Review of attachment 741165 [details] [diff] [review]: ----------------------------------------------------------------- ::: content/media/MediaCache.cpp @@ +1829,5 @@ > return mIsTransportSeekable; > } > > bool > MediaCacheStream::AreAllStreamsForResourceSuspended(MediaResource** aActiveStream) AFAICT aActiveStream isn't used by any of the caller of this function, so we should remove aActiveStream sometime.
Attachment #741165 - Flags: review?(cpearce) → review+
Attachment #741166 - Flags: review?(cpearce) → review+
Tryserver showed an intermittent failure in test_timeupdate_small_files on Windows XP in 1 run out of 10, but it's not a test timeout. https://hg.mozilla.org/integration/mozilla-inbound/rev/8668a6b76ef2 https://hg.mozilla.org/integration/mozilla-inbound/rev/3d8b98f8a48c https://hg.mozilla.org/integration/mozilla-inbound/rev/756845df5849
Flags: needinfo?(paul)
Whiteboard: [test disabled on Windows][leave open]
Status: NEW → RESOLVED
Closed: 12 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Target Milestone: --- → mozilla23
Comment on attachment 741164 [details] [diff] [review] Part 1: When receiving an HTTP 416 in response to a seek, tell the media cache that we reached the end of the resource cpearce, roc correct any of this if it's wrong, thanks! [Approval Request Comment] Bug caused by (feature/regressing bug #): don't know User impact if declined: intermittent problems with small files (which was why it triggered oranges). Blocks bug 834835 from landing in Aurora, which is needed for WebRTC. Testing completed (on m-c, etc.): On m-c and tests (re)enabled Risk to taking this patch (and alternatives if risky): very low risk String or IDL/UUID changes made by this patch: none Will need to land with the other two parts
Attachment #741164 - Flags: approval-mozilla-aurora?
Comment on attachment 741165 [details] [diff] [review] Part 2: AreAllStreamsForResourceSuspended should ignore streams that can't read the resource data we need [Approval Request Comment] See patch 1 in this bug
Attachment #741165 - Flags: approval-mozilla-aurora?
Comment on attachment 741166 [details] [diff] [review] Part 3: Reenable test on Windows [Approval Request Comment] See patch 1 in this bug
Attachment #741166 - Flags: approval-mozilla-aurora?
Approving in support of a tracked bug.
Attachment #741164 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Attachment #741165 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Attachment #741166 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: