mochitest-plain: intermittent "test_mozLoadFrom.html | Test timed out"

RESOLVED FIXED in mozilla1.9.3a1

Status

()

Core
Audio/Video
RESOLVED FIXED
8 years ago
7 months ago

People

(Reporter: cpearce, Assigned: roc)

Tracking

(Blocks: 2 bugs, {intermittent-failure})

Trunk
mozilla1.9.3a1
x86
All
intermittent-failure
Points:
---
Dependency tree / graph
Bug Flags:
blocking1.9.2 +
in-testsuite +

Firefox Tracking Flags

(status1.9.2 beta4-fixed)

Details

Attachments

(7 attachments, 6 obsolete attachments)

233 bytes, text/html
Details
1.47 KB, patch
kinetik
: review+
Details | Diff | Splinter Review
3.90 KB, patch
kinetik
: review+
Details | Diff | Splinter Review
36.14 KB, patch
cajbir
: review+
Details | Diff | Splinter Review
4.85 KB, patch
jst
: review+
Details | Diff | Splinter Review
44.50 KB, patch
cajbir
: review+
Details | Diff | Splinter Review
900 bytes, patch
cajbir
: review+
Details | Diff | Splinter Review
(Reporter)

Description

8 years ago
content/media/test/test_mozLoadFrom.html | Test timed out.


29416 INFO Running /tests/content/media/test/test_mozLoadFrom.html...
29417 INFO TEST-PASS | /tests/content/media/test/test_mozLoadFrom.html | Clone source OK
29418 INFO TEST-PASS | /tests/content/media/test/test_mozLoadFrom.html | Clone source OK
29419 INFO TEST-PASS | /tests/content/media/test/test_mozLoadFrom.html | Clone source OK
29420 INFO TEST-PASS | /tests/content/media/test/test_mozLoadFrom.html | Clone source OK
29421 INFO TEST-PASS | /tests/content/media/test/test_mozLoadFrom.html | Clone source OK
29422 INFO TEST-PASS | /tests/content/media/test/test_mozLoadFrom.html | Clone source OK
29423 INFO TEST-PASS | /tests/content/media/test/test_mozLoadFrom.html | Clone source OK
29424 INFO TEST-PASS | /tests/content/media/test/test_mozLoadFrom.html | Clone source OK
29425 INFO TEST-PASS | /tests/content/media/test/test_mozLoadFrom.html | Clone loaded OK
29426 INFO TEST-PASS | /tests/content/media/test/test_mozLoadFrom.html | Clone loaded OK
29427 INFO TEST-PASS | /tests/content/media/test/test_mozLoadFrom.html | Clone loaded OK
29428 INFO TEST-PASS | /tests/content/media/test/test_mozLoadFrom.html | Clone loaded OK
29429 INFO TEST-PASS | /tests/content/media/test/test_mozLoadFrom.html | Clone loaded OK
29430 INFO TEST-PASS | /tests/content/media/test/test_mozLoadFrom.html | Clone loaded OK
29431 INFO TEST-PASS | /tests/content/media/test/test_mozLoadFrom.html | Clone loaded OK
29432 INFO TEST-PASS | /tests/content/media/test/test_mozLoadFrom.html | Clone loaded OK
29433 INFO TEST-PASS | /tests/content/media/test/test_mozLoadFrom.html | Clone source OK
29434 ERROR TEST-UNEXPECTED-FAIL | /tests/content/media/test/test_mozLoadFrom.html | Test timed out.

Only one instance so far:

Linux mozilla-1.9.2 unit test on 2009/09/24 02:43:15
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox3.6/1253785395.1253794129.16697.gz

Note test_media_selection.html timed out in the above run too.

This first occurred after bug 512328 landed, it *may* be a regression from that bug.
(Reporter)

Updated

8 years ago
Whiteboard: [orange]
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox3.6-Unittest/1253864500.1253867033.16082.gz
Linux mozilla-1.9.2 test mochitests on 2009/09/25 00:41:40

"Same".
Summary: content/media/test/test_mozLoadFrom.html timed out → mochitest-plain: intermittent "test_mozLoadFrom.html | Test timed out"
Version: 1.9.1 Branch → 1.9.2 Branch
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox-Unittest/1253866022.1253869751.14326.gz
Linux mozilla-central test mochitests on 2009/09/25 01:07:02
Blocks: 438871
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox3.6-Unittest/1253881459.1253883421.10813.gz
WINNT 5.2 mozilla-1.9.2 test mochitests on 2009/09/25 05:24:19
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox-Unittest/1253927251.1253929169.8220.gz
Linux mozilla-central test opt mochitests on 2009/09/25 18:07:31

Branch --> Trunk, per this failure & the one in comment 2.
Version: 1.9.2 Branch → Trunk
Some of these videos just aren't reaching loadeddata. In several cases we don't reach loadeddata for the source video, which suggests some kind of problem unrelated to mozLoadFrom. We really need to catch this in a debugger.
Created attachment 403102 [details] [diff] [review]
fix

I think I figured out what's going on here and probably in some of the other testcases (test_media_selection.html).

In this testcase we create some video elements, attach some event listeners, and start them loading. These elements aren't in the document and we don't hold JS references to them. There is in fact nothing keeping them alive. I can reproduce test failures in test_mozLoadFrom where we garbage collect one or more of the video elements while they're still loading.

This patch addresses the problem by having the media element add a reference to itself under certain conditions, which are outlined in the method AddRemoveSelfReference. Basically if the media element could fire an event in the future due to network or decoder activity, we keep it alive.

Note that if the element's owner document is torn down, so ContentDestroyed is called, then the element unconditionally drops its self-reference, so this should not be able to lead to leaks under normal circumstances. If a long-lived document starts playing a streaming media resource, and drops all references to that element, then the stream will keep playing indefinitely, but that's just what has to happen. We could do something heroic to allow GC of playing elements that have no sound track, or are muted, and have no relevant event listeners, but that's pretty crazy.
Assignee: nobody → roc
Attachment #403102 - Flags: review?(chris.double)

Comment 7

8 years ago
Comment on attachment 403102 [details] [diff] [review]
fix

>+  // Under certain conditions there may be no-one holding references to
>+  // this element from script, DOM parent, etc, but the element may still
>+  // fire meaningful events in the future so we can't destroy it yet:
> ...
>+  // It's very important that any change in state which could change the value of
>+  // needSelfReference be followed by  call to AddRemoveSelfReference before
>+  // this element could die!
>+  // It's especially important if needSelfReference would change to 'true',
>+  // since if we neglect to add a self-reference, this element might be
>+  // garbage collected while there are still event listeners that should
>+  // receive events.

This comment, or something like it, needs to be in the header file I think. Any implementer of a backend needs to know what it does and when they need to call it.
I'll move the comment near the top of nsHTMLMediaElement, but I deliberately designed this patch to not affect the decoders (or anything else outside nsHTMLMediaElement) at all.

Updated

8 years ago
Attachment #403102 - Flags: review?(chris.double) → review+
There's a compilation error on Windows on the try servers that I need to figure out.

I've also been wondering whether this can leak after all for elements that aren't in a document, where I don't think DestroyContent is called. Need to make sure it's tested.
Created attachment 404189 [details]
testcase

Yeah, this leaks if you quit while the element is running.

Also, if you close the window, we don't stop playing the media. Again, DestroyContent isn't being called. We don't Freeze the element since it doesn't go into bfcache.
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox-Unittest/1254513178.1254516380.12149.gz
Created attachment 404545 [details] [diff] [review]
Part 1: fix oggplay loop issue

If we finish decoding in the metadata stage, go directly to COMPLETED since calling oggplay_step_decode again can be very bad.
Attachment #403102 - Attachment is obsolete: true
Attachment #404545 - Flags: review?(kinetik)
Created attachment 404546 [details] [diff] [review]
Part 2: fix test manifest

All these quirky Ogg file tests should be in gPlayTests instead of gSmallTests, since we test more things in gPlayTests. In gSmallTests we just need to make sure we exercise each decoder.
Attachment #404546 - Flags: review?(kinetik)
Attachment #404545 - Flags: review?(kinetik) → review+
Comment on attachment 404546 [details] [diff] [review]
Part 2: fix test manifest

This'll conflict with trunk because I fixed the comment for gPlayTests when I landed bug 518941.
Attachment #404546 - Flags: review?(kinetik) → review+
Created attachment 404547 [details] [diff] [review]
Part 3: add nsIDocument::IsActive/IsVisible

I'm not sure if these names are the best. IsActive corresponds to HTML5 terminology. IsVisible should perhaps be IsFullyActive to correspond to HTML5 too, but I don't like that name much.
Attachment #404547 - Flags: superreview?(jst)
Attachment #404547 - Flags: review?(jonas)
Created attachment 404548 [details] [diff] [review]
Part 3: real patch

Oops, attached the wrong patch.

I'm not sure if these names are the best. IsActive corresponds to HTML5
terminology. IsVisible should perhaps be IsFullyActive to correspond to HTML5
too, but I don't like that name much.
Attachment #404547 - Attachment is obsolete: true
Attachment #404548 - Flags: superreview?(jst)
Attachment #404548 - Flags: review?(jonas)
Attachment #404547 - Flags: superreview?(jst)
Attachment #404547 - Flags: review?(jonas)
Created attachment 404552 [details] [diff] [review]
Part 4: Fix media element lifecycle

Rework the way we handle media elements when the document is frozen, thawed, or removed from its content viewer (becomes inactive). In all these situations we call NotifyOwnerDocumentActivityChanged on the element, which then pauses or resumes itself. We also handle the case where an element is reassigned from an inactive document to an active document, and I've added a test for this. (Trunk would fail this test because currently ContentDestroyed destroys the decoder instead of just pausing it.)

This also has the AddRemoveSelfReference work from the previous attempt and ensures that we drop the self-reference if the owner document becomes inactive (no loading or playback will occur while the owner document is inactive). This fixes the leak.
Attachment #404552 - Flags: review?(chris.double)
Comment on attachment 404552 [details] [diff] [review]
Part 4: Fix media element lifecycle

Adding jst for review on the content parts.
Attachment #404552 - Attachment description: Part 4: → Part 4: Fix media element lifecycle
Attachment #404552 - Flags: review?(jst)
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox-Unittest/1254761201.1254763927.26954.gz
Linux mozilla-central test mochitests on 2009/10/05 17:03:39
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox-Unittest/1254787419.1254790119.2999.gz
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox3.6-Unittest/1254888916.1254891576.32363.gz
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox-Unittest/1254968020.1254970293.18299.gz
Linux mozilla-central test opt mochitests on 2009/10/07 19:13:40  
40840 ERROR TEST-UNEXPECTED-FAIL | /tests/content/media/test/test_mozLoadFrom.html | Test timed out.
Blocks: 520417
Patch 4 actually causes a deadlock. nsMediaChannelStream::CacheClientResume (and CacheClientSeek) should not grab any locks (because we're currently holding the media cache lock), but it can get into nsHTMLMediaElement::AddRemoveSelfReference which calls IsEnded on the decoder which can get the decoder lock. I need to figure out the best way to fix this...
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox-Unittest/1255044802.1255047319.6358.gz
Linux mozilla-central test mochitests on 2009/10/08 16:33:22
Created attachment 405379 [details] [diff] [review]
Part 2.5: Relax requirements on CacheClientSeek/Suspend/Resume

This fixes the problem. Basically in nsMediaCache::Update we hold the lock while we figure out what we're going to do to each stream's channel, then we drop the lock while we actually tell the stream what we want to do. This lets those stream methods do pretty much whatever they want without risking deadlocks. We don't introduce races because the stream data members we use when calling CacheClient* are only written by the main thread, and we're on the main thread.

This does mean that while we're implementing our decisions, other threads can be issuing seeks or reads that make our decisions obsolete. But that's OK, because another Update will run soon that will make new decisions.
Attachment #405379 - Flags: review?(chris.double)
Created attachment 405395 [details] [diff] [review]
Part 3 v2: Fix IsActive

Turns out the definition of IsActive isn't quite right. Checking for the script global object returns false after the document's SetContainer has put it in a content viewer, but before the script global object is set up, and this confuses nsVideoDocument when it creates the video element for a stand-alone video, which happens during that interval. Checking the container directly seems more accurate.
Attachment #405395 - Flags: superreview?(jst)
Attachment #405395 - Flags: review?
Attachment #405395 - Flags: review? → review?(jonas)
Created attachment 405398 [details] [diff] [review]
Part 4 v2: Fix media element lifecycle

Fire NotifyOwnerDocumentActivityChanged from SetContainer now that we're actually using the container to determine IsActive.

Also updated the patch so that media elements with no owner document (which might be created from C++, say) behave as if their document is active.
Attachment #404548 - Attachment is obsolete: true
Attachment #404552 - Attachment is obsolete: true
Attachment #405398 - Flags: superreview?(jst)
Attachment #405398 - Flags: review?(chris.double)
Attachment #404552 - Flags: review?(jst)
Attachment #404552 - Flags: review?(chris.double)
Attachment #404548 - Flags: superreview?(jst)
Attachment #404548 - Flags: review?(jonas)
What we use for XMLHttpRequest is to check something like this:

w = document->scriptglobal->QI(window);
w == w->getOuter->getInner

if the test on the second line succeeds then we consider the document active.

This way does seem a lot simpler though. But I'm always wary of using two methods to test the same thing. Smaug, could we use the simpler method from attachment 405395 [details] [diff] [review]?
See comment #26, relying on the script global object doesn't work quite right because we need to know the document is active before that's been set up.
Jonas, which method are you talking about? IsVisible()?
That could work. Would have to test all the related security bugs.
And depends on how we want to handle XHR when document.open()
creates new inner window.
And because of document.open, inner window check isn't the same as
visibility check.
Created attachment 405449 [details] [diff] [review]
Part 3 v3: IsActive

Sigh, the last patch wasn't quite right either. We get into a state where the document has a container but mRemovedFromDocshell has been set, and in that state it's not active.
Attachment #405395 - Attachment is obsolete: true
Attachment #405449 - Flags: superreview?(jst)
Attachment #405449 - Flags: review?(jonas)
Attachment #405395 - Flags: superreview?(jst)
Attachment #405395 - Flags: review?(jonas)
Created attachment 405450 [details] [diff] [review]
Part 4 v3 updated

Updated to match part 3 v3 --- we need to notify freezable elements when a document becomes inactive because mRemovedFromDocshell has been set.
Attachment #405449 - Attachment is obsolete: true
Attachment #405450 - Flags: superreview?(jst)
Attachment #405450 - Flags: review?(chris.double)
Attachment #405449 - Flags: superreview?(jst)
Attachment #405449 - Flags: review?(jonas)

Updated

8 years ago
Attachment #405450 - Flags: superreview?(jst) → superreview+

Updated

8 years ago
Attachment #405450 - Flags: review?(chris.double) → review+

Updated

8 years ago
Attachment #405398 - Flags: review?(chris.double)

Comment 34

8 years ago
Comment on attachment 405398 [details] [diff] [review]
Part 4 v2: Fix media element lifecycle

This patch is obsolete, right?

Updated

8 years ago
Attachment #405379 - Flags: review?(chris.double) → review+
Attachment #405398 - Attachment is obsolete: true
Attachment #405398 - Flags: superreview?(jst)
Comment on attachment 405449 [details] [diff] [review]
Part 3 v3: IsActive

Oops, this patch was accidentally obsoleted. We need API to determine whether a document is "active" and "fully active" as per HTML5. Here, "fully active" can be implemented as doc->IsActive() && doc->IsVisible().
Attachment #405449 - Attachment is obsolete: false
Attachment #405449 - Flags: superreview?(jst)
Attachment #405449 - Flags: review?(jonas)

Updated

8 years ago
Attachment #405449 - Flags: superreview?(jst)
Attachment #405449 - Flags: superreview+
Attachment #405449 - Flags: review?(jonas)
Attachment #405449 - Flags: review+
Depends on: 522418
Checked in parts 1 and 2:
http://hg.mozilla.org/mozilla-central/rev/17f4a71519b4
http://hg.mozilla.org/mozilla-central/rev/f992a1ed3e78
Linux mozilla-central test mochitests
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox-Unittest/1255867386.1255869994.16563.gz
Linux mozilla-1.9.2 test mochitests on 2009/10/22 23:51:12
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox3.6-Unittest/1256280672.1256283723.21051.gz

(bug 515153 also occurred during this run)
Blocks: 524333
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox-Unittest/1256467621.1256468543.13756.gz
Linux mozilla-central test opt mochitests-1/5 on 2009/10/25 03:47:01
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox-Unittest/1256580445.1256581281.827.gz
Linux mozilla-central test opt mochitests-1/5 on 2009/10/26 11:07:25
Checked in the rest:
http://hg.mozilla.org/mozilla-central/rev/daf8284f97ff
http://hg.mozilla.org/mozilla-central/rev/d71007568c24
http://hg.mozilla.org/mozilla-central/rev/e79927f90d0b
And a bustage fix:
http://hg.mozilla.org/mozilla-central/rev/66a8b356b511
Status: NEW → RESOLVED
Last Resolved: 8 years ago
Resolution: --- → FIXED
Flags: in-testsuite+
Target Milestone: --- → mozilla1.9.3a1
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox-Unittest/1256791548.1256792519.31917.gz
WINNT 5.2 mozilla-central test mochitests-1/5 on 2009/10/28 21:45:48  
40322 ERROR TEST-UNEXPECTED-FAIL | /tests/content/media/test/test_mozLoadFrom.html | Test timed out.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
(Reporter)

Updated

8 years ago
Depends on: 525319
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox-Unittest/1256940759.1256942090.28313.gz
Linux mozilla-central test opt mochitests-1/5 on 2009/10/30 15:12:39
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox3.6-Unittest/1257551939.1257554961.30589.gz
Linux mozilla-1.9.2 test mochitests on 2009/11/06 15:58:59
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox3.6-Unittest/1257557487.1257561437.17535.gz
Linux mozilla-1.9.2 test mochitests on 2009/11/06 17:31:27
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox3.6-Unittest/1257693373.1257696071.28338.gz
Linux mozilla-1.9.2 test mochitests on 2009/11/08 07:16:13
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox3.6-Unittest/1257799560.1257802675.26883.gz
OS X 10.5.2 mozilla-1.9.2 test mochitests on 2009/11/09 12:46:00
OS: Linux → All
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox3.6-Unittest/1257876913.1257880183.8997.gz
Linux mozilla-1.9.2 test mochitests Started 13:15, finished 14:10
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox3.6-Unittest/1257996148.1257999433.536.gz
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox3.6-Unittest/1258080137.1258084283.26840.gz
Linux mozilla-1.9.2 test mochitests on 2009/11/12 18:42:17
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox-Unittest/1258147324.1258148557.18770.gz
Linux mozilla-central opt test mochitests-1/5 on 2009/11/13 13:22:04
s: moz2-linux-slave24
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox3.6-Unittest/1258147839.1258152678.885.gz#err0
Linux mozilla-1.9.2 test mochitests on 2009/11/13 13:30:39
s: moz2-linux-slave06
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox-Unittest/1258405602.1258406957.20361.gz
Linux mozilla-central opt test mochitests-1/5
Depends on: 529105
Created attachment 412753 [details] [diff] [review]
Part 5: use desiredOffset instead of the old channel offset

Your blessed +5 debugger named VMWare hits!

In nsMediaCache::Update, when we are checking to see if some other stream is already loading the data the current stream wants to load, we're comparing other->mChannelOffset to stream->mChannelOffset. But stream->mChannelOffset is the offset that the stream *was* loading from, not the offset that the stream wants to load from now.

This was causing a hang because occasionally we'd get into a situation where stream A was positioned at EOF but wanted to read some data earlier in the resource, and another stream B for the same resource was positioned at EOF and was happy to stay there (suspended). So because of this bug, stream A's read would be canceled and it would never get the data it needed.
Attachment #412753 - Flags: review?(chris.double)

Updated

8 years ago
Attachment #412753 - Flags: review?(chris.double) → review+
Whiteboard: [orange] → [orange][needs landing]
Is anyone keeping track of how many of our intermittent test failures are actually bugs in the product? Maybe that would be some motivation to fix orange...
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox3.6-Unittest/1258457486.1258461439.15360.gz
That data would be pretty easy to collect. In my experience intermittent failures usually are bugs in the product. I think the problem is not lack of motivation to fix them, but substantial de-motivation due to the difficulty (both real and imagined) of reproducing and fixing them. If you have a fixed amount of time, and you can choose to work on bugs that are easy to reproduce or bugs that aren't, which ones are you going to choose?
It's quite true. We need to get the forge working overtime on replicating your Sword +5 against Orange.
This is not the right place to discuss about this, but...

(In reply to comment #57)
> In my experience intermittent
> failures usually are bugs in the product.
Well there have been quite a few bugs in the testcases too.
For example bug 487817/bug 513707 fixed one class of such bugs.
http://hg.mozilla.org/mozilla-central/rev/8548c0ac4a93

I did 1700 runs of test_mozLoadFrom.html on our test machine with no problems, so I hope we're done here.

These are real bug fixes which can lead to hangs, dead spinning threads, etc. I think we should land this stuff on 1.9.2. Chris, do you agree?
Status: REOPENED → RESOLVED
Last Resolved: 8 years ago8 years ago
Resolution: --- → FIXED
Whiteboard: [orange][needs landing] → [orange]
Flags: wanted1.9.2?
Flags: wanted1.9.2? → blocking1.9.2+
Whiteboard: [orange] → [orange][needs 192 landing]
http://hg.mozilla.org/releases/mozilla-1.9.2/rev/11933a68b1f7
http://hg.mozilla.org/releases/mozilla-1.9.2/rev/dcce9bb370f1
http://hg.mozilla.org/releases/mozilla-1.9.2/rev/330b5fc9e19d
http://hg.mozilla.org/releases/mozilla-1.9.2/rev/de966a75c503
http://hg.mozilla.org/releases/mozilla-1.9.2/rev/145dfa80911d
http://hg.mozilla.org/releases/mozilla-1.9.2/rev/58c0a7b93b4f
http://hg.mozilla.org/releases/mozilla-1.9.2/rev/ecd586c766c4
status1.9.2: --- → final-fixed
Whiteboard: [orange][needs 192 landing] → [orange]
(Reporter)

Comment 62

8 years ago
(In reply to comment #58)
> It's quite true. We need to get the forge working overtime on replicating your
> Sword +5 against Orange.

I have documented my replay debugging setup here: https://developer.mozilla.org/En/Debugging/Record_and_Replay_Debugging_Firefox
Depends on: 541465
Depends on: 557393
Keywords: intermittent-failure
Whiteboard: [orange]
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)

Updated

4 years ago
Depends on: 962986

Updated

7 months ago
Blocks: 1319725
You need to log in before you can comment on or make changes to this bug.