Closed Bug 687972 Opened 8 years ago Closed 7 years ago

Intermittent test_timeupdate_small_files.html | Test timed out, often followed by a shutdown timeout and crash

Categories

(Core :: Audio/Video, defect)

x86
Windows XP
defect
Not set

Tracking

()

RESOLVED FIXED

People

(Reporter: graememcc, Assigned: roc)

References

Details

(Keywords: intermittent-failure)

Attachments

(8 files, 1 obsolete file)

This occurred on mozilla-inbound, testing my push https://hg.mozilla.org/integration/mozilla-inbound/rev/812cea5c372e

https://tbpl.mozilla.org/php/getParsedLog.php?id=6470454&full=1

TEST-UNEXPECTED-FAIL | /tests/content/media/test/test_timeupdate_small_files.html | application timed out after 330 seconds with no output
INFO | automation.py | Application ran for: 0:14:05.875000
INFO | automation.py | Reading PID log: c:\docume~1\cltbld\locals~1\temp\tmp4blrrkpidlog
==> process 3416 launched child process 3720
INFO | automation.py | Checking for orphan process with PID: 3720
PROCESS-CRASH | /tests/content/media/test/test_timeupdate_small_files.html | application crashed (minidump found)
Whiteboard: [orange]
Looks like the crash was fixed.
Summary: Intermittent test_timeupdate_small_files.html | Test timed out. followed by crash → Intermittent test_timeupdate_small_files.html | Test timed out.
CC'ing test-author cpearce.

cpearce: In the past few days (starting 11/24-25), this test began randomly failing > 10 times per day, which (I think) puts it on track to be one of the most painful current randomoranges.

Any chance you could look into this? (possibly adding instrumentation or something to help figure out what's going on)
The increase in frequency in this test failing coincides with the landing of bug 703379. Roc, any ideas?
Yup, I was just about to say that exact same thing. :)

More details:
===
Comment 2 (where this got bad) was right after this media-related m-i push (1 push later):
  http://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?changeset=09d5d6e6dd28
(and I don't see any media-related changes in the day or so of pushlog before that).

Moreover, Comment 3 was *on* the mi --> m-c merge that included that push (09d5d6e6dd28). 

So I suspect that something from the 09d5d6e6dd28 push was involved in making this go highly-randomorange.  Likely bug 703379, based on media-relatedness.

Tentatively marking as blocking bug 703379 (for the recent spike from comment 2 onwards).
===
Blocks: 703379
It's a bit confusing.

The test logs all contain the lines
77941 INFO TEST-PASS | /tests/content/media/test/test_timeupdate_small_files.html | 320x240.ogv - should see at least one timeupdate: 0.266664
77947 INFO TEST-PASS | /tests/content/media/test/test_timeupdate_small_files.html | r11025_s16_c1.wav - should see at least one timeupdate: 1
77950 INFO TEST-PASS | /tests/content/media/test/test_timeupdate_small_files.html | small-shot.ogg - should see at least one timeupdate: 0.276085
which indicates that those three media elements ended. So why didn't MediaTestManager spawn off the rest of the tests?
I retriggered a bunch of M1 on the push before Roc, and all of those are green so far.
https://tbpl.mozilla.org/?tree=Mozilla-Inbound&rev=a4492c6d02b0

I also triggered a Win PGO build on Roc's push, and waiting for it to finish to trigger some M1 there, as well. That should give some hint whether that changeset is involved.
https://tbpl.mozilla.org/?tree=Mozilla-Inbound&rev=74fed050ea42
I think my retriggers leave no doubts that Roc's push is involved in the random orange increase, even if indirectly: all green on one, all orange on the other one.
It looks like this mostly (only?) happens on PGO or opt builds, and given roc's comment 42 I'm suspcious of bug in the JS array code (possibly caused by compiler optimizations, we had a similar problem before causing media test orange).

So this patch adds some logging to verify the MediaTestManager's array storage is correctly maintained.
Attachment #577389 - Flags: review?(roc)
Attachment #577389 - Flags: review?(roc) → review+
Landed verification for MediaTestManager. Note this won't stop the orange!
https://hg.mozilla.org/integration/mozilla-inbound/rev/26597b0e3681
There seems to be a new issue here.  This seems to be showing up way more often on recent builds.
(In reply to Bill Gianopoulos from comment #74)
> There seems to be a new issue here.  This seems to be showing up way more
> often on recent builds.

see comment 41, comment 42 and comment 60.
(In reply to TinderboxPushlog Robot from comment #78)
> philor
> https://tbpl.mozilla.org/php/getParsedLog.php?id=7621836&tree=Mozilla-Inbound
> Rev3 WINNT 5.1 mozilla-inbound opt test mochitests-1/5 on 2011-11-28 17:49:26
> 
> 78857 ERROR TEST-UNEXPECTED-FAIL |
> /tests/content/media/test/test_timeupdate_small_files.html | Test timed out.

So this includes:

78835 INFO TEST-START | /tests/content/media/test/test_timeupdate_small_files.html
78836 INFO TEST-INFO | /tests/content/media/test/test_timeupdate_small_files.html | Started Mon Nov 28 2011 18:02:17 GMT-0800 (Pacific Standard Time) (1322532137.548s)
78837 INFO TEST-PASS | /tests/content/media/test/test_timeupdate_small_files.html | [started] Length of array should match number of running tests - 1 should equal 1
78838 INFO TEST-PASS | /tests/content/media/test/test_timeupdate_small_files.html | [started] Length of array should match number of running tests - 2 should equal 2
78839 INFO TEST-PASS | /tests/content/media/test/test_timeupdate_small_files.html | 320x240.ogv - shouldn't get timeupdate after ended
78840 INFO TEST-PASS | /tests/content/media/test/test_timeupdate_small_files.html | r11025_s16_c1.wav - shouldn't get timeupdate after ended
78841 INFO TEST-PASS | /tests/content/media/test/test_timeupdate_small_files.html | 320x240.ogv - shouldn't get timeupdate after ended
78842 INFO TEST-PASS | /tests/content/media/test/test_timeupdate_small_files.html | 320x240.ogv - should see at least one timeupdate: 0.266664
78843 INFO TEST-PASS | /tests/content/media/test/test_timeupdate_small_files.html | r11025_s16_c1.wav - shouldn't get timeupdate after ended
78844 INFO TEST-PASS | /tests/content/media/test/test_timeupdate_small_files.html | r11025_s16_c1.wav - shouldn't get timeupdate after ended
78845 INFO TEST-PASS | /tests/content/media/test/test_timeupdate_small_files.html | [finished] Length of array should match number of running tests - 1 should equal 1
78846 INFO TEST-PASS | /tests/content/media/test/test_timeupdate_small_files.html | [started] Length of array should match number of running tests - 2 should equal 2
78847 INFO TEST-PASS | /tests/content/media/test/test_timeupdate_small_files.html | r11025_s16_c1.wav - shouldn't get timeupdate after ended
78848 INFO TEST-PASS | /tests/content/media/test/test_timeupdate_small_files.html | small-shot.ogg - shouldn't get timeupdate after ended
78849 INFO TEST-PASS | /tests/content/media/test/test_timeupdate_small_files.html | r11025_s16_c1.wav - shouldn't get timeupdate after ended
78850 INFO TEST-PASS | /tests/content/media/test/test_timeupdate_small_files.html | r11025_s16_c1.wav - should see at least one timeupdate: 1
78851 INFO TEST-PASS | /tests/content/media/test/test_timeupdate_small_files.html | small-shot.ogg - shouldn't get timeupdate after ended
78852 INFO TEST-PASS | /tests/content/media/test/test_timeupdate_small_files.html | small-shot.ogg - shouldn't get timeupdate after ended
78853 INFO TEST-PASS | /tests/content/media/test/test_timeupdate_small_files.html | small-shot.ogg - should see at least one timeupdate: 0.276085
78854 INFO TEST-PASS | /tests/content/media/test/test_timeupdate_small_files.html | [finished] Length of array should match number of running tests - 1 should equal 1
78855 INFO TEST-PASS | /tests/content/media/test/test_timeupdate_small_files.html | [started] Length of array should match number of running tests - 2 should equal 2
78856 INFO TEST-PASS | /tests/content/media/test/test_timeupdate_small_files.html | [finished] Length of array should match number of running tests - 1 should equal 1
78857 ERROR TEST-UNEXPECTED-FAIL | /tests/content/media/test/test_timeupdate_small_files.html | Test timed out.

So that looks like we're starting test_timeupate_small_files for seek.webm, but not receiving ended or timeupdate events for that media for some reason.
Attached patch probable fix (obsolete) — Splinter Review
This bug could have manifested before my patch, if one somehow happened to load a resource over HTTP (thus through the media cache) but really really fast. So hopefully this will fix the orange that existed before my patch.

I still can't reproduce so we'll have to check this in and hope it fixed tinderbox.
Assignee: nobody → roc
Attachment #577736 - Flags: review?(chris)
Attachment #577736 - Attachment is obsolete: true
Attachment #577736 - Flags: review?(chris)
(In reply to TinderboxPushlog Robot from comment #114)
> chris%pearce.org.nz
> https://tbpl.mozilla.org/php/getParsedLog.php?id=7645541&tree=Mozilla-Inbound
> Rev3 WINNT 5.1 mozilla-inbound opt test mochitests-1/5 on 2011-11-29 16:36:20
> 
> 78946 ERROR TEST-UNEXPECTED-FAIL |
> /tests/content/media/test/test_timeupdate_small_files.html | Test timed out.

Gah, we didn't fix it.
Note: if at least one stream for the resource is not suspended, then in the future the decoder should receive either a NotifyBytesDownloaded or NotifySuspendedStatusChanged notification, giving us a chance to fire canplaythrough then.
Attachment #577844 - Flags: review?(chris) → review+
This patch didn't seem to fix it either :-(.

Pushed a little bit more logging:
https://hg.mozilla.org/integration/mozilla-inbound/rev/72cb519c581a
Attached patch try 3Splinter Review
Maybe this'll fix it :-)
Attachment #578101 - Flags: review?(chris)
Attachment #578101 - Flags: review?(chris) → review+
Looks fixed. Four cycles with no test failure.

Pushed a patch to remove the logging code.
https://hg.mozilla.org/integration/mozilla-inbound/rev/a6db0c6c94a3

At the next inbound merge to central, if there are still no test failures, this can be closed.
Not sure about all of those, but most of those were after 4679da64ba71
Yes. I'll reland the logging code.
Actually Brad Lassey already backed it out. So we just have to wait to get more logs.
8519 INFO TEST-PASS | /tests/content/media/test/test_timeupdate_small_files.html | http://mochi.test:8888/tests/content/media/test/seek.webm got loadedmetadata
78520 INFO TEST-PASS | /tests/content/media/test/test_timeupdate_small_files.html | http://mochi.test:8888/tests/content/media/test/seek.webm got loadeddata
78521 INFO TEST-PASS | /tests/content/media/test/test_timeupdate_small_files.html | http://mochi.test:8888/tests/content/media/test/seek.webm got canplay
*** nsBuiltinDecoder::NotifySuspendedStatusChanged(0x30303920), suspended=0
78522 INFO TEST-PASS | /tests/content/media/test/test_timeupdate_small_files.html | http://mochi.test:8888/tests/content/media/test/seek.webm got progress
78523 INFO TEST-PASS | /tests/content/media/test/test_timeupdate_small_files.html | http://mochi.test:8888/tests/content/media/test/seek.webm got suspend
78524 INFO TEST-PASS | /tests/content/media/test/test_timeupdate_small_files.html | [finished small-shot.ogg-2] Length of array should match number of running tests - 1 should equal 1
78525 INFO TEST-PASS | /tests/content/media/test/test_timeupdate_small_files.html | http://mochi.test:8888/tests/content/media/test/small-shot.ogg got timeupdate
78526 INFO TEST-PASS | /tests/content/media/test/test_timeupdate_small_files.html | http://mochi.test:8888/tests/content/media/test/small-shot.ogg got pause
78527 INFO TEST-PASS | /tests/content/media/test/test_timeupdate_small_files.html | http://mochi.test:8888/tests/content/media/test/seek.webm got progress
78528 ERROR TEST-UNEXPECTED-FAIL | /tests/content/media/test/test_timeupdate_small_files.html | Test timed out.

Need more logging...
The latest logging patch (09a0517358d9, 7b22bc7c9847) seems to have introduced a frequent leak in Linux debug crashtests (bug 706883).
Depends on: 706883
Even more logging.
https://hg.mozilla.org/integration/mozilla-inbound/rev/828c928eae5b

(In reply to Matt Brubeck (:mbrubeck) from comment #189)
> The latest logging patch (09a0517358d9, 7b22bc7c9847) seems to have
> introduced a frequent leak in Linux debug crashtests (bug 706883).

That seems .... surprising.

These will get backed out soon hopefully anyway.
Attached patch try 4Splinter Review
I spotted a bug in the previous patch.
Attachment #578489 - Flags: review?(chris.double)
Comment on attachment 578489 [details] [diff] [review]
try 4

Just nesting nits.

>+  for (PRUint32 i = 0; i < mStreams.Length(); ++i) {
>+    nsMediaCacheStream* stream = mStreams[i];
>+	nsresult rv;
>+    switch (actions[i]) {

Nesting of 'nsresult rv' line needs adjusting.

>     default:
>+	  rv = NS_OK;
>       break;
>     }

Nesting of 'rv' line.
Attachment #578489 - Flags: review?(chris.double) → review+
Frequency seems to have dropped considerably but there's still one instance of this after my latest landing :-(.
I'll start pushing more logging to try.
Attached patch try 5Splinter Review
Found another bug. Not sure whether this is it, but the code change should be correct.
Attachment #581025 - Flags: review?(chris)
Attachment #581025 - Flags: review?(chris) → review+
Attached patch remove loggingSplinter Review
Patch to remove the logging code added for this bug.
https://hg.mozilla.org/mozilla-central/rev/b4a7eba05092
https://hg.mozilla.org/mozilla-central/rev/24bde89bd5a9
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Whiteboard: [orange][inbound] → [orange]
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Attachment #593685 - Flags: review?(roc)
Attachment #593685 - Flags: review?(roc) → review+
http://hg.mozilla.org/integration/mozilla-inbound/rev/f586c1a6e632
Whiteboard: [orange] → [orange][leave open after merge]
Whiteboard: [orange][leave open after merge] → [orange]