Closed Bug 687972 Opened 14 years ago Closed 13 years ago

Intermittent test_timeupdate_small_files.html | Test timed out, often followed by a shutdown timeout and crash

Categories

(Core :: Audio/Video, defect)

x86
Windows XP
defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: graememcc, Assigned: roc)

References

Details

(Keywords: intermittent-failure)

Attachments

(8 files, 1 obsolete file)

This occurred on mozilla-inbound, testing my push https://hg.mozilla.org/integration/mozilla-inbound/rev/812cea5c372e https://tbpl.mozilla.org/php/getParsedLog.php?id=6470454&full=1 TEST-UNEXPECTED-FAIL | /tests/content/media/test/test_timeupdate_small_files.html | application timed out after 330 seconds with no output INFO | automation.py | Application ran for: 0:14:05.875000 INFO | automation.py | Reading PID log: c:\docume~1\cltbld\locals~1\temp\tmp4blrrkpidlog ==> process 3416 launched child process 3720 INFO | automation.py | Checking for orphan process with PID: 3720 PROCESS-CRASH | /tests/content/media/test/test_timeupdate_small_files.html | application crashed (minidump found)
Whiteboard: [orange]
Looks like the crash was fixed.
Summary: Intermittent test_timeupdate_small_files.html | Test timed out. followed by crash → Intermittent test_timeupdate_small_files.html | Test timed out.
CC'ing test-author cpearce. cpearce: In the past few days (starting 11/24-25), this test began randomly failing > 10 times per day, which (I think) puts it on track to be one of the most painful current randomoranges. Any chance you could look into this? (possibly adding instrumentation or something to help figure out what's going on)
The increase in frequency in this test failing coincides with the landing of bug 703379. Roc, any ideas?
Yup, I was just about to say that exact same thing. :) More details: === Comment 2 (where this got bad) was right after this media-related m-i push (1 push later): http://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?changeset=09d5d6e6dd28 (and I don't see any media-related changes in the day or so of pushlog before that). Moreover, Comment 3 was *on* the mi --> m-c merge that included that push (09d5d6e6dd28). So I suspect that something from the 09d5d6e6dd28 push was involved in making this go highly-randomorange. Likely bug 703379, based on media-relatedness. Tentatively marking as blocking bug 703379 (for the recent spike from comment 2 onwards). ===
Blocks: 703379
It's a bit confusing. The test logs all contain the lines 77941 INFO TEST-PASS | /tests/content/media/test/test_timeupdate_small_files.html | 320x240.ogv - should see at least one timeupdate: 0.266664 77947 INFO TEST-PASS | /tests/content/media/test/test_timeupdate_small_files.html | r11025_s16_c1.wav - should see at least one timeupdate: 1 77950 INFO TEST-PASS | /tests/content/media/test/test_timeupdate_small_files.html | small-shot.ogg - should see at least one timeupdate: 0.276085 which indicates that those three media elements ended. So why didn't MediaTestManager spawn off the rest of the tests?
I retriggered a bunch of M1 on the push before Roc, and all of those are green so far. https://tbpl.mozilla.org/?tree=Mozilla-Inbound&rev=a4492c6d02b0 I also triggered a Win PGO build on Roc's push, and waiting for it to finish to trigger some M1 there, as well. That should give some hint whether that changeset is involved. https://tbpl.mozilla.org/?tree=Mozilla-Inbound&rev=74fed050ea42
I think my retriggers leave no doubts that Roc's push is involved in the random orange increase, even if indirectly: all green on one, all orange on the other one.
It looks like this mostly (only?) happens on PGO or opt builds, and given roc's comment 42 I'm suspcious of bug in the JS array code (possibly caused by compiler optimizations, we had a similar problem before causing media test orange). So this patch adds some logging to verify the MediaTestManager's array storage is correctly maintained.
Attachment #577389 - Flags: review?(roc)
Attachment #577389 - Flags: review?(roc) → review+
Landed verification for MediaTestManager. Note this won't stop the orange! https://hg.mozilla.org/integration/mozilla-inbound/rev/26597b0e3681
There seems to be a new issue here. This seems to be showing up way more often on recent builds.
(In reply to Bill Gianopoulos from comment #74) > There seems to be a new issue here. This seems to be showing up way more > often on recent builds. see comment 41, comment 42 and comment 60.
(In reply to TinderboxPushlog Robot from comment #78) > philor > https://tbpl.mozilla.org/php/getParsedLog.php?id=7621836&tree=Mozilla-Inbound > Rev3 WINNT 5.1 mozilla-inbound opt test mochitests-1/5 on 2011-11-28 17:49:26 > > 78857 ERROR TEST-UNEXPECTED-FAIL | > /tests/content/media/test/test_timeupdate_small_files.html | Test timed out. So this includes: 78835 INFO TEST-START | /tests/content/media/test/test_timeupdate_small_files.html 78836 INFO TEST-INFO | /tests/content/media/test/test_timeupdate_small_files.html | Started Mon Nov 28 2011 18:02:17 GMT-0800 (Pacific Standard Time) (1322532137.548s) 78837 INFO TEST-PASS | /tests/content/media/test/test_timeupdate_small_files.html | [started] Length of array should match number of running tests - 1 should equal 1 78838 INFO TEST-PASS | /tests/content/media/test/test_timeupdate_small_files.html | [started] Length of array should match number of running tests - 2 should equal 2 78839 INFO TEST-PASS | /tests/content/media/test/test_timeupdate_small_files.html | 320x240.ogv - shouldn't get timeupdate after ended 78840 INFO TEST-PASS | /tests/content/media/test/test_timeupdate_small_files.html | r11025_s16_c1.wav - shouldn't get timeupdate after ended 78841 INFO TEST-PASS | /tests/content/media/test/test_timeupdate_small_files.html | 320x240.ogv - shouldn't get timeupdate after ended 78842 INFO TEST-PASS | /tests/content/media/test/test_timeupdate_small_files.html | 320x240.ogv - should see at least one timeupdate: 0.266664 78843 INFO TEST-PASS | /tests/content/media/test/test_timeupdate_small_files.html | r11025_s16_c1.wav - shouldn't get timeupdate after ended 78844 INFO TEST-PASS | /tests/content/media/test/test_timeupdate_small_files.html | r11025_s16_c1.wav - shouldn't get timeupdate after ended 78845 INFO TEST-PASS | /tests/content/media/test/test_timeupdate_small_files.html | [finished] Length of array should match number of running tests - 1 should equal 1 78846 INFO TEST-PASS | /tests/content/media/test/test_timeupdate_small_files.html | [started] Length of array should match number of running tests - 2 should equal 2 78847 INFO TEST-PASS | /tests/content/media/test/test_timeupdate_small_files.html | r11025_s16_c1.wav - shouldn't get timeupdate after ended 78848 INFO TEST-PASS | /tests/content/media/test/test_timeupdate_small_files.html | small-shot.ogg - shouldn't get timeupdate after ended 78849 INFO TEST-PASS | /tests/content/media/test/test_timeupdate_small_files.html | r11025_s16_c1.wav - shouldn't get timeupdate after ended 78850 INFO TEST-PASS | /tests/content/media/test/test_timeupdate_small_files.html | r11025_s16_c1.wav - should see at least one timeupdate: 1 78851 INFO TEST-PASS | /tests/content/media/test/test_timeupdate_small_files.html | small-shot.ogg - shouldn't get timeupdate after ended 78852 INFO TEST-PASS | /tests/content/media/test/test_timeupdate_small_files.html | small-shot.ogg - shouldn't get timeupdate after ended 78853 INFO TEST-PASS | /tests/content/media/test/test_timeupdate_small_files.html | small-shot.ogg - should see at least one timeupdate: 0.276085 78854 INFO TEST-PASS | /tests/content/media/test/test_timeupdate_small_files.html | [finished] Length of array should match number of running tests - 1 should equal 1 78855 INFO TEST-PASS | /tests/content/media/test/test_timeupdate_small_files.html | [started] Length of array should match number of running tests - 2 should equal 2 78856 INFO TEST-PASS | /tests/content/media/test/test_timeupdate_small_files.html | [finished] Length of array should match number of running tests - 1 should equal 1 78857 ERROR TEST-UNEXPECTED-FAIL | /tests/content/media/test/test_timeupdate_small_files.html | Test timed out. So that looks like we're starting test_timeupate_small_files for seek.webm, but not receiving ended or timeupdate events for that media for some reason.
Attached patch probable fix (obsolete) — — Splinter Review
This bug could have manifested before my patch, if one somehow happened to load a resource over HTTP (thus through the media cache) but really really fast. So hopefully this will fix the orange that existed before my patch. I still can't reproduce so we'll have to check this in and hope it fixed tinderbox.
Assignee: nobody → roc
Attachment #577736 - Flags: review?(chris)
Attached patch use mBegun instead — — Splinter Review
Attachment #577736 - Attachment is obsolete: true
Attachment #577736 - Flags: review?(chris)
(In reply to TinderboxPushlog Robot from comment #114) > chris%pearce.org.nz > https://tbpl.mozilla.org/php/getParsedLog.php?id=7645541&tree=Mozilla-Inbound > Rev3 WINNT 5.1 mozilla-inbound opt test mochitests-1/5 on 2011-11-29 16:36:20 > > 78946 ERROR TEST-UNEXPECTED-FAIL | > /tests/content/media/test/test_timeupdate_small_files.html | Test timed out. Gah, we didn't fix it.
Attached patch try 2 — — Splinter Review
Attachment #577844 - Flags: review?(chris)
Note: if at least one stream for the resource is not suspended, then in the future the decoder should receive either a NotifyBytesDownloaded or NotifySuspendedStatusChanged notification, giving us a chance to fire canplaythrough then.
Attachment #577844 - Flags: review?(chris) → review+
This patch didn't seem to fix it either :-(. Pushed a little bit more logging: https://hg.mozilla.org/integration/mozilla-inbound/rev/72cb519c581a
Attached patch try 3 — — Splinter Review
Maybe this'll fix it :-)
Attachment #578101 - Flags: review?(chris)
Attachment #578101 - Flags: review?(chris) → review+
Looks fixed. Four cycles with no test failure. Pushed a patch to remove the logging code. https://hg.mozilla.org/integration/mozilla-inbound/rev/a6db0c6c94a3 At the next inbound merge to central, if there are still no test failures, this can be closed.
Not sure about all of those, but most of those were after 4679da64ba71
Yes. I'll reland the logging code.
Actually Brad Lassey already backed it out. So we just have to wait to get more logs.
8519 INFO TEST-PASS | /tests/content/media/test/test_timeupdate_small_files.html | http://mochi.test:8888/tests/content/media/test/seek.webm got loadedmetadata 78520 INFO TEST-PASS | /tests/content/media/test/test_timeupdate_small_files.html | http://mochi.test:8888/tests/content/media/test/seek.webm got loadeddata 78521 INFO TEST-PASS | /tests/content/media/test/test_timeupdate_small_files.html | http://mochi.test:8888/tests/content/media/test/seek.webm got canplay *** nsBuiltinDecoder::NotifySuspendedStatusChanged(0x30303920), suspended=0 78522 INFO TEST-PASS | /tests/content/media/test/test_timeupdate_small_files.html | http://mochi.test:8888/tests/content/media/test/seek.webm got progress 78523 INFO TEST-PASS | /tests/content/media/test/test_timeupdate_small_files.html | http://mochi.test:8888/tests/content/media/test/seek.webm got suspend 78524 INFO TEST-PASS | /tests/content/media/test/test_timeupdate_small_files.html | [finished small-shot.ogg-2] Length of array should match number of running tests - 1 should equal 1 78525 INFO TEST-PASS | /tests/content/media/test/test_timeupdate_small_files.html | http://mochi.test:8888/tests/content/media/test/small-shot.ogg got timeupdate 78526 INFO TEST-PASS | /tests/content/media/test/test_timeupdate_small_files.html | http://mochi.test:8888/tests/content/media/test/small-shot.ogg got pause 78527 INFO TEST-PASS | /tests/content/media/test/test_timeupdate_small_files.html | http://mochi.test:8888/tests/content/media/test/seek.webm got progress 78528 ERROR TEST-UNEXPECTED-FAIL | /tests/content/media/test/test_timeupdate_small_files.html | Test timed out. Need more logging...
The latest logging patch (09a0517358d9, 7b22bc7c9847) seems to have introduced a frequent leak in Linux debug crashtests (bug 706883).
Depends on: 706883
Even more logging. https://hg.mozilla.org/integration/mozilla-inbound/rev/828c928eae5b (In reply to Matt Brubeck (:mbrubeck) from comment #189) > The latest logging patch (09a0517358d9, 7b22bc7c9847) seems to have > introduced a frequent leak in Linux debug crashtests (bug 706883). That seems .... surprising. These will get backed out soon hopefully anyway.
Attached patch try 4 — — Splinter Review
I spotted a bug in the previous patch.
Attachment #578489 - Flags: review?(chris.double)
Comment on attachment 578489 [details] [diff] [review] try 4 Just nesting nits. >+ for (PRUint32 i = 0; i < mStreams.Length(); ++i) { >+ nsMediaCacheStream* stream = mStreams[i]; >+ nsresult rv; >+ switch (actions[i]) { Nesting of 'nsresult rv' line needs adjusting. > default: >+ rv = NS_OK; > break; > } Nesting of 'rv' line.
Attachment #578489 - Flags: review?(chris.double) → review+
Frequency seems to have dropped considerably but there's still one instance of this after my latest landing :-(.
I'll start pushing more logging to try.
Attached patch try 5 — — Splinter Review
Found another bug. Not sure whether this is it, but the code change should be correct.
Attachment #581025 - Flags: review?(chris)
Attachment #581025 - Flags: review?(chris) → review+
Attached patch remove logging — — Splinter Review
Patch to remove the logging code added for this bug.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Whiteboard: [orange][inbound] → [orange]
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Attached patch remove more logging — — Splinter Review
Attachment #593685 - Flags: review?(roc)
Attachment #593685 - Flags: review?(roc) → review+
Whiteboard: [orange] → [orange][leave open after merge]
Whiteboard: [orange][leave open after merge] → [orange]
Summary: Intermittent test_timeupdate_small_files.html | Test timed out. → Intermittent test_timeupdate_small_files.html | Test timed out, often followed by a shutdown timeout and crash
Ftr/Fwiw, (In reply to TinderboxPushlog Robot from comment #623) > https://tbpl.mozilla.org/php/getParsedLog.php?id=10589787&tree=Try > Rev3 MacOSX Leopard 10.5.8 try opt test mochitests-1/5 on 2012-04-02 23:25:32 (In reply to TinderboxPushlog Robot from comment #625) > https://tbpl.mozilla.org/php/getParsedLog.php?id=10606048&tree=Mozilla- > Inbound > Rev3 MacOSX Leopard 10.5.8 mozilla-inbound opt test mochitests-1/5 on > 2012-04-03 11:46:35 (In reply to TinderboxPushlog Robot from comment #628) > https://tbpl.mozilla.org/php/getParsedLog.php?id=10631006&tree=Profiling > Rev4 MacOSX Snow Leopard 10.6 profiling opt test mochitests-1/5 on > 2012-04-04 03:39:59 These screenshots (for example) were taken too late: bug 637101. (In reply to TinderboxPushlog Robot from comment #629) > https://tbpl.mozilla.org/php/getParsedLog.php?id=10643302&tree=Try > Rev3 MacOSX Leopard 10.5.8 try opt test mochitests-1/5 on 2012-04-04 12:28:58 This screenshot (with bug 637101 patch Bv1) is correct.
Depends on: 637101
test_timeupdate_small_files.html | Test timed out
No longer blocks: 703379
Blocks: 703379
https://tbpl.mozilla.org/php/getParsedLog.php?id=12815219&full=1&branch=mozilla-inbound#error0 This log shows that we fail to ever get an "ended" event for seek.webm. These are the events we get: 96664 INFO TEST-INFO | /tests/content/media/test/test_timeupdate_small_files.html | http://mochi.test:8888/tests/content/media/test/seek.webm got loadstart 96665 INFO TEST-INFO | /tests/content/media/test/test_timeupdate_small_files.html | http://mochi.test:8888/tests/content/media/test/detodos.opus got loadstart 96666 INFO TEST-INFO | /tests/content/media/test/test_timeupdate_small_files.html | http://mochi.test:8888/tests/content/media/test/seek.webm got progress 96667 INFO TEST-INFO | /tests/content/media/test/test_timeupdate_small_files.html | http://mochi.test:8888/tests/content/media/test/detodos.opus got progress 96668 INFO TEST-INFO | /tests/content/media/test/test_timeupdate_small_files.html | http://mochi.test:8888/tests/content/media/test/seek.webm got loadedmetadata 96669 INFO TEST-INFO | /tests/content/media/test/test_timeupdate_small_files.html | http://mochi.test:8888/tests/content/media/test/seek.webm got loadeddata 96670 INFO TEST-INFO | /tests/content/media/test/test_timeupdate_small_files.html | http://mochi.test:8888/tests/content/media/test/seek.webm got canplay 96671 INFO TEST-INFO | /tests/content/media/test/test_timeupdate_small_files.html | http://mochi.test:8888/tests/content/media/test/seek.webm got play 96672 INFO TEST-INFO | /tests/content/media/test/test_timeupdate_small_files.html | http://mochi.test:8888/tests/content/media/test/seek.webm got playing 96673 INFO TEST-INFO | /tests/content/media/test/test_timeupdate_small_files.html | http://mochi.test:8888/tests/content/media/test/seek.webm got canplaythrough 96674 INFO TEST-INFO | /tests/content/media/test/test_timeupdate_small_files.html | http://mochi.test:8888/tests/content/media/test/seek.webm got progress 96675 INFO TEST-INFO | /tests/content/media/test/test_timeupdate_small_files.html | http://mochi.test:8888/tests/content/media/test/seek.webm got suspend ... then loads of timeupdates, and then it stops. We do print: nsBuiltinDecoderStateMachine::RunStateMachine queuing nsBuiltinDecoder::PlaybackEnded But I think that is probably during shutdown, after the test has already timed out. I thought maybe we are running out of data and unable to play through to the end of the resource, but we never fire the "waiting" event (according to the logging), which I would have expected to see when we start trying to buffer. I don't understand how we can just get a series of timeupdate events and then ... nothing.
Whiteboard: [orange] → [orange][test disabled on linux][leave open]
Target Milestone: mozilla11 → ---
Blocks: 780492
This was fixed on mozilla-central at 2012-08-18 00:26:45 NZST by bug 761917. This test has now been re-enabled: https://hg.mozilla.org/integration/mozilla-inbound/rev/7a9ef73422de Closing this bug, bug 780490 will track merging the changeset to mozilla-central.
Status: REOPENED → RESOLVED
Closed: 14 years ago13 years ago
Resolution: --- → FIXED
Whiteboard: [orange][test disabled on linux][leave open] → [orange]
Whiteboard: [orange]
See Also: → 1388612
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: