Closed Bug 518659 Opened 15 years ago Closed 15 years ago

mochitest-plain: intermittent "test_mozLoadFrom.html | Test timed out"

Categories

(Core :: Audio/Video, defect)

x86
All
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla1.9.3a1
Tracking Status
status1.9.2 --- beta4-fixed

People

(Reporter: cpearce, Assigned: roc)

References

(Blocks 1 open bug)

Details

(Keywords: intermittent-failure)

Attachments

(7 files, 6 obsolete files)

233 bytes, text/html
Details
1.47 KB, patch
kinetik
: review+
Details | Diff | Splinter Review
3.90 KB, patch
kinetik
: review+
Details | Diff | Splinter Review
36.14 KB, patch
cajbir
: review+
Details | Diff | Splinter Review
4.85 KB, patch
jst
: review+
Details | Diff | Splinter Review
44.50 KB, patch
cajbir
: review+
Details | Diff | Splinter Review
900 bytes, patch
cajbir
: review+
Details | Diff | Splinter Review
content/media/test/test_mozLoadFrom.html | Test timed out.


29416 INFO Running /tests/content/media/test/test_mozLoadFrom.html...
29417 INFO TEST-PASS | /tests/content/media/test/test_mozLoadFrom.html | Clone source OK
29418 INFO TEST-PASS | /tests/content/media/test/test_mozLoadFrom.html | Clone source OK
29419 INFO TEST-PASS | /tests/content/media/test/test_mozLoadFrom.html | Clone source OK
29420 INFO TEST-PASS | /tests/content/media/test/test_mozLoadFrom.html | Clone source OK
29421 INFO TEST-PASS | /tests/content/media/test/test_mozLoadFrom.html | Clone source OK
29422 INFO TEST-PASS | /tests/content/media/test/test_mozLoadFrom.html | Clone source OK
29423 INFO TEST-PASS | /tests/content/media/test/test_mozLoadFrom.html | Clone source OK
29424 INFO TEST-PASS | /tests/content/media/test/test_mozLoadFrom.html | Clone source OK
29425 INFO TEST-PASS | /tests/content/media/test/test_mozLoadFrom.html | Clone loaded OK
29426 INFO TEST-PASS | /tests/content/media/test/test_mozLoadFrom.html | Clone loaded OK
29427 INFO TEST-PASS | /tests/content/media/test/test_mozLoadFrom.html | Clone loaded OK
29428 INFO TEST-PASS | /tests/content/media/test/test_mozLoadFrom.html | Clone loaded OK
29429 INFO TEST-PASS | /tests/content/media/test/test_mozLoadFrom.html | Clone loaded OK
29430 INFO TEST-PASS | /tests/content/media/test/test_mozLoadFrom.html | Clone loaded OK
29431 INFO TEST-PASS | /tests/content/media/test/test_mozLoadFrom.html | Clone loaded OK
29432 INFO TEST-PASS | /tests/content/media/test/test_mozLoadFrom.html | Clone loaded OK
29433 INFO TEST-PASS | /tests/content/media/test/test_mozLoadFrom.html | Clone source OK
29434 ERROR TEST-UNEXPECTED-FAIL | /tests/content/media/test/test_mozLoadFrom.html | Test timed out.

Only one instance so far:

Linux mozilla-1.9.2 unit test on 2009/09/24 02:43:15
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox3.6/1253785395.1253794129.16697.gz

Note test_media_selection.html timed out in the above run too.

This first occurred after bug 512328 landed, it *may* be a regression from that bug.
Whiteboard: [orange]
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox3.6-Unittest/1253864500.1253867033.16082.gz
Linux mozilla-1.9.2 test mochitests on 2009/09/25 00:41:40

"Same".
Summary: content/media/test/test_mozLoadFrom.html timed out → mochitest-plain: intermittent "test_mozLoadFrom.html | Test timed out"
Version: 1.9.1 Branch → 1.9.2 Branch
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox-Unittest/1253927251.1253929169.8220.gz
Linux mozilla-central test opt mochitests on 2009/09/25 18:07:31

Branch --> Trunk, per this failure & the one in comment 2.
Version: 1.9.2 Branch → Trunk
Some of these videos just aren't reaching loadeddata. In several cases we don't reach loadeddata for the source video, which suggests some kind of problem unrelated to mozLoadFrom. We really need to catch this in a debugger.
Attached patch fix (obsolete) — Splinter Review
I think I figured out what's going on here and probably in some of the other testcases (test_media_selection.html).

In this testcase we create some video elements, attach some event listeners, and start them loading. These elements aren't in the document and we don't hold JS references to them. There is in fact nothing keeping them alive. I can reproduce test failures in test_mozLoadFrom where we garbage collect one or more of the video elements while they're still loading.

This patch addresses the problem by having the media element add a reference to itself under certain conditions, which are outlined in the method AddRemoveSelfReference. Basically if the media element could fire an event in the future due to network or decoder activity, we keep it alive.

Note that if the element's owner document is torn down, so ContentDestroyed is called, then the element unconditionally drops its self-reference, so this should not be able to lead to leaks under normal circumstances. If a long-lived document starts playing a streaming media resource, and drops all references to that element, then the stream will keep playing indefinitely, but that's just what has to happen. We could do something heroic to allow GC of playing elements that have no sound track, or are muted, and have no relevant event listeners, but that's pretty crazy.
Assignee: nobody → roc
Attachment #403102 - Flags: review?(chris.double)
Comment on attachment 403102 [details] [diff] [review]
fix

>+  // Under certain conditions there may be no-one holding references to
>+  // this element from script, DOM parent, etc, but the element may still
>+  // fire meaningful events in the future so we can't destroy it yet:
> ...
>+  // It's very important that any change in state which could change the value of
>+  // needSelfReference be followed by  call to AddRemoveSelfReference before
>+  // this element could die!
>+  // It's especially important if needSelfReference would change to 'true',
>+  // since if we neglect to add a self-reference, this element might be
>+  // garbage collected while there are still event listeners that should
>+  // receive events.

This comment, or something like it, needs to be in the header file I think. Any implementer of a backend needs to know what it does and when they need to call it.
I'll move the comment near the top of nsHTMLMediaElement, but I deliberately designed this patch to not affect the decoders (or anything else outside nsHTMLMediaElement) at all.
Attachment #403102 - Flags: review?(chris.double) → review+
There's a compilation error on Windows on the try servers that I need to figure out.

I've also been wondering whether this can leak after all for elements that aren't in a document, where I don't think DestroyContent is called. Need to make sure it's tested.
Attached file testcase
Yeah, this leaks if you quit while the element is running.

Also, if you close the window, we don't stop playing the media. Again, DestroyContent isn't being called. We don't Freeze the element since it doesn't go into bfcache.
If we finish decoding in the metadata stage, go directly to COMPLETED since calling oggplay_step_decode again can be very bad.
Attachment #403102 - Attachment is obsolete: true
Attachment #404545 - Flags: review?(kinetik)
All these quirky Ogg file tests should be in gPlayTests instead of gSmallTests, since we test more things in gPlayTests. In gSmallTests we just need to make sure we exercise each decoder.
Attachment #404546 - Flags: review?(kinetik)
Attachment #404545 - Flags: review?(kinetik) → review+
Comment on attachment 404546 [details] [diff] [review]
Part 2: fix test manifest

This'll conflict with trunk because I fixed the comment for gPlayTests when I landed bug 518941.
Attachment #404546 - Flags: review?(kinetik) → review+
I'm not sure if these names are the best. IsActive corresponds to HTML5 terminology. IsVisible should perhaps be IsFullyActive to correspond to HTML5 too, but I don't like that name much.
Attachment #404547 - Flags: superreview?(jst)
Attachment #404547 - Flags: review?(jonas)
Attached patch Part 3: real patch (obsolete) — Splinter Review
Oops, attached the wrong patch.

I'm not sure if these names are the best. IsActive corresponds to HTML5
terminology. IsVisible should perhaps be IsFullyActive to correspond to HTML5
too, but I don't like that name much.
Attachment #404547 - Attachment is obsolete: true
Attachment #404548 - Flags: superreview?(jst)
Attachment #404548 - Flags: review?(jonas)
Attachment #404547 - Flags: superreview?(jst)
Attachment #404547 - Flags: review?(jonas)
Rework the way we handle media elements when the document is frozen, thawed, or removed from its content viewer (becomes inactive). In all these situations we call NotifyOwnerDocumentActivityChanged on the element, which then pauses or resumes itself. We also handle the case where an element is reassigned from an inactive document to an active document, and I've added a test for this. (Trunk would fail this test because currently ContentDestroyed destroys the decoder instead of just pausing it.)

This also has the AddRemoveSelfReference work from the previous attempt and ensures that we drop the self-reference if the owner document becomes inactive (no loading or playback will occur while the owner document is inactive). This fixes the leak.
Attachment #404552 - Flags: review?(chris.double)
Comment on attachment 404552 [details] [diff] [review]
Part 4: Fix media element lifecycle

Adding jst for review on the content parts.
Attachment #404552 - Attachment description: Part 4: → Part 4: Fix media element lifecycle
Attachment #404552 - Flags: review?(jst)
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox-Unittest/1254968020.1254970293.18299.gz
Linux mozilla-central test opt mochitests on 2009/10/07 19:13:40  
40840 ERROR TEST-UNEXPECTED-FAIL | /tests/content/media/test/test_mozLoadFrom.html | Test timed out.
Patch 4 actually causes a deadlock. nsMediaChannelStream::CacheClientResume (and CacheClientSeek) should not grab any locks (because we're currently holding the media cache lock), but it can get into nsHTMLMediaElement::AddRemoveSelfReference which calls IsEnded on the decoder which can get the decoder lock. I need to figure out the best way to fix this...
This fixes the problem. Basically in nsMediaCache::Update we hold the lock while we figure out what we're going to do to each stream's channel, then we drop the lock while we actually tell the stream what we want to do. This lets those stream methods do pretty much whatever they want without risking deadlocks. We don't introduce races because the stream data members we use when calling CacheClient* are only written by the main thread, and we're on the main thread.

This does mean that while we're implementing our decisions, other threads can be issuing seeks or reads that make our decisions obsolete. But that's OK, because another Update will run soon that will make new decisions.
Attachment #405379 - Flags: review?(chris.double)
Attached patch Part 3 v2: Fix IsActive (obsolete) — Splinter Review
Turns out the definition of IsActive isn't quite right. Checking for the script global object returns false after the document's SetContainer has put it in a content viewer, but before the script global object is set up, and this confuses nsVideoDocument when it creates the video element for a stand-alone video, which happens during that interval. Checking the container directly seems more accurate.
Attachment #405395 - Flags: superreview?(jst)
Attachment #405395 - Flags: review?
Attachment #405395 - Flags: review? → review?(jonas)
Fire NotifyOwnerDocumentActivityChanged from SetContainer now that we're actually using the container to determine IsActive.

Also updated the patch so that media elements with no owner document (which might be created from C++, say) behave as if their document is active.
Attachment #404548 - Attachment is obsolete: true
Attachment #404552 - Attachment is obsolete: true
Attachment #405398 - Flags: superreview?(jst)
Attachment #405398 - Flags: review?(chris.double)
Attachment #404552 - Flags: review?(jst)
Attachment #404552 - Flags: review?(chris.double)
Attachment #404548 - Flags: superreview?(jst)
Attachment #404548 - Flags: review?(jonas)
What we use for XMLHttpRequest is to check something like this:

w = document->scriptglobal->QI(window);
w == w->getOuter->getInner

if the test on the second line succeeds then we consider the document active.

This way does seem a lot simpler though. But I'm always wary of using two methods to test the same thing. Smaug, could we use the simpler method from attachment 405395 [details] [diff] [review]?
See comment #26, relying on the script global object doesn't work quite right because we need to know the document is active before that's been set up.
Jonas, which method are you talking about? IsVisible()?
That could work. Would have to test all the related security bugs.
And depends on how we want to handle XHR when document.open()
creates new inner window.
And because of document.open, inner window check isn't the same as
visibility check.
Sigh, the last patch wasn't quite right either. We get into a state where the document has a container but mRemovedFromDocshell has been set, and in that state it's not active.
Attachment #405395 - Attachment is obsolete: true
Attachment #405449 - Flags: superreview?(jst)
Attachment #405449 - Flags: review?(jonas)
Attachment #405395 - Flags: superreview?(jst)
Attachment #405395 - Flags: review?(jonas)
Updated to match part 3 v3 --- we need to notify freezable elements when a document becomes inactive because mRemovedFromDocshell has been set.
Attachment #405449 - Attachment is obsolete: true
Attachment #405450 - Flags: superreview?(jst)
Attachment #405450 - Flags: review?(chris.double)
Attachment #405449 - Flags: superreview?(jst)
Attachment #405449 - Flags: review?(jonas)
Attachment #405450 - Flags: superreview?(jst) → superreview+
Attachment #405450 - Flags: review?(chris.double) → review+
Attachment #405398 - Flags: review?(chris.double)
Comment on attachment 405398 [details] [diff] [review]
Part 4 v2: Fix media element lifecycle

This patch is obsolete, right?
Attachment #405379 - Flags: review?(chris.double) → review+
Attachment #405398 - Attachment is obsolete: true
Attachment #405398 - Flags: superreview?(jst)
Comment on attachment 405449 [details] [diff] [review]
Part 3 v3: IsActive

Oops, this patch was accidentally obsoleted. We need API to determine whether a document is "active" and "fully active" as per HTML5. Here, "fully active" can be implemented as doc->IsActive() && doc->IsVisible().
Attachment #405449 - Attachment is obsolete: false
Attachment #405449 - Flags: superreview?(jst)
Attachment #405449 - Flags: review?(jonas)
Attachment #405449 - Flags: superreview?(jst)
Attachment #405449 - Flags: superreview+
Attachment #405449 - Flags: review?(jonas)
Attachment #405449 - Flags: review+
Depends on: 522418
Linux mozilla-1.9.2 test mochitests on 2009/10/22 23:51:12
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox3.6-Unittest/1256280672.1256283723.21051.gz

(bug 515153 also occurred during this run)
Blocks: 524333
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox-Unittest/1256467621.1256468543.13756.gz
Linux mozilla-central test opt mochitests-1/5 on 2009/10/25 03:47:01
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox-Unittest/1256580445.1256581281.827.gz
Linux mozilla-central test opt mochitests-1/5 on 2009/10/26 11:07:25
Flags: in-testsuite+
Target Milestone: --- → mozilla1.9.3a1
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox-Unittest/1256791548.1256792519.31917.gz
WINNT 5.2 mozilla-central test mochitests-1/5 on 2009/10/28 21:45:48  
40322 ERROR TEST-UNEXPECTED-FAIL | /tests/content/media/test/test_mozLoadFrom.html | Test timed out.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Depends on: 525319
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox-Unittest/1256940759.1256942090.28313.gz
Linux mozilla-central test opt mochitests-1/5 on 2009/10/30 15:12:39
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox3.6-Unittest/1257876913.1257880183.8997.gz
Linux mozilla-1.9.2 test mochitests Started 13:15, finished 14:10
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox-Unittest/1258147324.1258148557.18770.gz
Linux mozilla-central opt test mochitests-1/5 on 2009/11/13 13:22:04
s: moz2-linux-slave24
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox3.6-Unittest/1258147839.1258152678.885.gz#err0
Linux mozilla-1.9.2 test mochitests on 2009/11/13 13:30:39
s: moz2-linux-slave06
Your blessed +5 debugger named VMWare hits!

In nsMediaCache::Update, when we are checking to see if some other stream is already loading the data the current stream wants to load, we're comparing other->mChannelOffset to stream->mChannelOffset. But stream->mChannelOffset is the offset that the stream *was* loading from, not the offset that the stream wants to load from now.

This was causing a hang because occasionally we'd get into a situation where stream A was positioned at EOF but wanted to read some data earlier in the resource, and another stream B for the same resource was positioned at EOF and was happy to stay there (suspended). So because of this bug, stream A's read would be canceled and it would never get the data it needed.
Attachment #412753 - Flags: review?(chris.double)
Attachment #412753 - Flags: review?(chris.double) → review+
Whiteboard: [orange] → [orange][needs landing]
Is anyone keeping track of how many of our intermittent test failures are actually bugs in the product? Maybe that would be some motivation to fix orange...
That data would be pretty easy to collect. In my experience intermittent failures usually are bugs in the product. I think the problem is not lack of motivation to fix them, but substantial de-motivation due to the difficulty (both real and imagined) of reproducing and fixing them. If you have a fixed amount of time, and you can choose to work on bugs that are easy to reproduce or bugs that aren't, which ones are you going to choose?
It's quite true. We need to get the forge working overtime on replicating your Sword +5 against Orange.
This is not the right place to discuss about this, but...

(In reply to comment #57)
> In my experience intermittent
> failures usually are bugs in the product.
Well there have been quite a few bugs in the testcases too.
For example bug 487817/bug 513707 fixed one class of such bugs.
http://hg.mozilla.org/mozilla-central/rev/8548c0ac4a93

I did 1700 runs of test_mozLoadFrom.html on our test machine with no problems, so I hope we're done here.

These are real bug fixes which can lead to hangs, dead spinning threads, etc. I think we should land this stuff on 1.9.2. Chris, do you agree?
Status: REOPENED → RESOLVED
Closed: 15 years ago15 years ago
Resolution: --- → FIXED
Whiteboard: [orange][needs landing] → [orange]
Flags: wanted1.9.2?
Flags: wanted1.9.2? → blocking1.9.2+
Whiteboard: [orange] → [orange][needs 192 landing]
(In reply to comment #58)
> It's quite true. We need to get the forge working overtime on replicating your
> Sword +5 against Orange.

I have documented my replay debugging setup here: https://developer.mozilla.org/En/Debugging/Record_and_Replay_Debugging_Firefox
Depends on: 541465
Depends on: 557393
Whiteboard: [orange]
Blocks: 1319725
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: