Closed Bug 916135 Opened 11 years ago Closed 10 years ago

Investigate webaudio mochitest timeouts on B2G

Categories

(Core :: Web Audio, defect, P2)

ARM
Gonk (Firefox OS)
defect

Tracking

()

RESOLVED FIXED
mozilla29

People

(Reporter: emorley, Assigned: jwwang)

References

Details

Attachments

(3 files, 6 obsolete files)

The webaudio tests are currently extremely unreliable on B2G (see bug 820739 comment 787).

Disabling until the failures are resolved.
remote:   https://hg.mozilla.org/integration/mozilla-inbound/rev/7e2551feeddd
Whiteboard: [test disabled on B2G][leave open]
Bug 916773 may be at least part of the problem here.
Depends on: 916773
(In reply to comment #3)
> Bug 916773 may be at least part of the problem here.

Should we re-enable these tests now that bug 916773 is fixed?
How about enabling them on Try with 20+ retriggers and then we can decide.
Oh shoot, this is mochitest-3, not crashtest.  :(
(In reply to :Ehsan Akhgari (needinfo? me!) from comment #8)
> https://tbpl.mozilla.org/?tree=Try&rev=b1a2dc56e30b

lulz
Alright, I give up.  jwwang, any chance you could take a look at this please?
Flags: needinfo?(jwwang)
Depends on: 904839
Depends on: 919353
Sure, I am on it.
Flags: needinfo?(jwwang)
command timed out: 1200 seconds without output, attempting to kill
Bug 820739 - Intermittent B2G emulator "command timed out: 1200 seconds without output, attempting to kill" or "command timed out: 3600 seconds without output, attempting to kill" during run-marionette or b2g_emulator_unittest step
Bug 820811 - Intermittent Android "command timed out: 1200 seconds without output, attempting to kill" during "Install App on Device" step
Bug 873053 - Intermittent B2G xpcshell command timed out: 1200 seconds without output, attempting to kill from "AttributeError: 'NoneType' object has no attribute 'pid'"

Btw, what are the lines following the "command timed out: 1200" line in the light blue box?
I don't see them in the full log. Do they even matter?
(In reply to jwwang from comment #12)
> Btw, what are the lines following the "command timed out: 1200" line in the
> light blue box?
> I don't see them in the full log. Do they even matter?

Anything that's a link (and indented) in the TBPL blue box is a bug suggestion by TBPL, to help match up known intermittent failures with failures in automation. They are just bugzilla search results based on scraping the log for failure lines.
https://tbpl.mozilla.org/?tree=Try&rev=1932306fae09
20 successes in a row.

As content/media/test/* tests are disabled in my test, I wonder if there are some leftovers after running content/media/test which cause deadlock or something likewise in running content/media/webaudio/test.

Btw, test_audioBufferSourceNodeLazyLoopParam.html, test_audioBufferSourceNodeLoopStartEnd.html and test_audioBufferSourceNodeLoop.html are not found in my pushlog. Are they disabled elsewhere other than testing/mochitest/b2g.json?

Is it possible to change the order of tests? I would like to see how it plays out if we run content/media/webaudio/test first and then content/media/test.
(In reply to comment #14)
> https://tbpl.mozilla.org/?tree=Try&rev=1932306fae09
> 20 successes in a row.
> 
> As content/media/test/* tests are disabled in my test, I wonder if there are
> some leftovers after running content/media/test which cause deadlock or
> something likewise in running content/media/webaudio/test.

That's possible.

> Btw, test_audioBufferSourceNodeLazyLoopParam.html,
> test_audioBufferSourceNodeLoopStartEnd.html and
> test_audioBufferSourceNodeLoop.html are not found in my pushlog. Are they
> disabled elsewhere other than testing/mochitest/b2g.json?
> 
> Is it possible to change the order of tests? I would like to see how it plays
> out if we run content/media/webaudio/test first and then content/media/test.

You can try moving directories around.  AFAIK we run the tests based on the alphabetical order of their names (sub-directories first.)
https://tbpl.mozilla.org/?tree=Try&rev=5a93c866e2d2
21 runs, each with 5 duplicates of content/media/webaudio/test (webaudio/test2~6).

2 of them timed out in convolverNode (is this a known issuue?), but none of them exhibit the same symptom as the previous command timed out.

I really suspect the command timed out is resulted from content/media/test. I will try to duplicate content/media/test and see if it will show the same results.
Looks like test_convolverNodeChannelCount.html and subsequent tests are timing out waiting for the online AudioContext, which makes that a different issue from bug 921197, but perhaps related to bug 920338 (which is also unexplained).
Blocks: 896587
https://tbpl.mozilla.org/?tree=Try&rev=24ebdb331c6d
https://tbpl.mozilla.org/?tree=Try&rev=59dbdeddd772
where content/media/test/test_timeupdate_small_files.html and content/media/test/test_video_to_canvas.html are disabled.

https://tbpl.mozilla.org/?tree=Try&rev=71c827590484
https://tbpl.mozilla.org/?tree=Try&rev=67947612ee00
where content/media/test/test_timeupdate_small_files.html and content/media/test/test_video_to_canvas.html are enabled.

It looks like disabling the 2 files would help reduce the chance of command timeout right after test_preload_actions.html. However, test_preload_actions.html is run before the 2 files. I am confused about how files run later can affect the former one.

Btw, there is no timeout in webaudio test cases in my test. Once bug 919353 is landed, we will have stack trace to give a better clue.
https://tbpl.mozilla.org/?tree=Try&rev=e3e36c223293
66 successes in a row except 2 timeouts caused by app crashed.

There are several root causes here:
1. Only 1 omx.h264 codec available concurrently. Request for the 2nd one has to wait indefinitely until the 1st one is released. You might have to wait for a long time until GC is triggered since this is not like memory low condition.
2. Media elements not receiving loadedmetadata event in some test cases. The reason is unknown.
3. Implementation of OMXCodec::pause is broken. Calling OMXCodec::start later will return error and can't leave the paused state.

Fixes:
1. Setting the src attribute to an empty string to force releasing the underlying codec.
2. Setting the preload attribute to "auto".
3. Don't call OMXCodec::pause in OmxDecoder::Pause. However, this will break the fix for Bug 860760 and Bug 872462.
Priority: -- → P2
Attached patch Enable tests in B2G. (obsolete) — Splinter Review
Attachment #8348562 - Flags: review?(karlt)
JW, karlt is away until 3rd of January, maybe you want to redirect those review requests.
Flags: needinfo?(jwwang)
Attachment #8348555 - Flags: review?(karlt) → review?(paul)
Flags: needinfo?(jwwang)
Attachment #8348559 - Flags: review?(karlt) → review?(paul)
Attachment #8348562 - Flags: review?(karlt) → review?(paul)
Comment on attachment 8348555 [details] [diff] [review]
Fix timeout in acquiring omx codec and waiting for loadedmetadata event.

Review of attachment 8348555 [details] [diff] [review]:
-----------------------------------------------------------------

I'm not sure if removing event listener matters that much. Does that keep the element alive?

::: content/media/test/manifest.js
@@ +561,5 @@
>  }
>  
> +function removeNodeAndSource(n) {
> +  if (n.parentNode) {
> +    n.parentNode.removeChild(n);

n.remove()

@@ +563,5 @@
> +function removeNodeAndSource(n) {
> +  if (n.parentNode) {
> +    n.parentNode.removeChild(n);
> +  }
> +  // force release of underlying codec

I'd rather say "decoder" than codec, here.

@@ +567,5 @@
> +  // force release of underlying codec
> +  n.src = "";
> +  while (n.firstChild) {
> +    n.removeChild(n.firstChild);
> +  }

I'm not sure why this is necessary. Even if the node has <source> children, they won't load until someone calls load() on the media element.

Also, if you remove a node, its children will get collected anyway (assuming we don't have a reference to the children, which should be fine).

::: content/media/test/reactivate_helper.html
@@ +21,5 @@
>    var test = parent.gSmallTests[i];
>    var elemType = /^audio/.test(test.type) ? "audio" : "video";
>    // Associate these elements with the subframe's document
>    var e = document.createElement(elemType);
> +  e.setAttribute("preload", "auto");

e.preload = "auto";

I see you've used this pattern a lot in this patch, please fix all occurrences.

::: content/media/test/test_bug654550.html
@@ +17,5 @@
>  
>    /* Test for Bug 654550 */
>  
> +  // Parallel test must be disabled for media.video_stats.enabled is a global setting
> +  // to prevent the setting from chaning unexpectedly in the middle of the test.

s/chaning/changing/
Comment on attachment 8348559 [details] [diff] [review]
Don't call OMXCodec::pause to avoid state error.

Review of attachment 8348559 [details] [diff] [review]:
-----------------------------------------------------------------

I'm not really knowledgeable on this code, can you have a look, Chris?

::: content/media/omx/OmxDecoder.cpp
@@ +461,5 @@
>    }
>    return false;
>  }
>  
> +static bool isInEmulator()

s/isInEmulator/IsInEmulator/
Attachment #8348559 - Flags: review?(paul) → review?(chris.double)
Comment on attachment 8348562 [details] [diff] [review]
Enable tests in B2G.

Review of attachment 8348562 [details] [diff] [review]:
-----------------------------------------------------------------

Looking at the try push, r+.
Attachment #8348562 - Flags: review?(paul) → review+
Comment on attachment 8348559 [details] [diff] [review]
Don't call OMXCodec::pause to avoid state error.

Review of attachment 8348559 [details] [diff] [review]:
-----------------------------------------------------------------

r+ with Paul Adenot's change.
Attachment #8348559 - Flags: review?(chris.double) → review+
Comment on attachment 8348555 [details] [diff] [review]
Fix timeout in acquiring omx codec and waiting for loadedmetadata event.

Review of attachment 8348555 [details] [diff] [review]:
-----------------------------------------------------------------

::: content/media/test/manifest.js
@@ +561,5 @@
>  }
>  
> +function removeNodeAndSource(n) {
> +  if (n.parentNode) {
> +    n.parentNode.removeChild(n);

https://developer.mozilla.org/en-US/docs/Web/API/Node
I can't find the method in the DOM API. Where is it documented?

@@ +567,5 @@
> +  // force release of underlying codec
> +  n.src = "";
> +  while (n.firstChild) {
> +    n.removeChild(n.firstChild);
> +  }

The idea is to force releasing underlying decoder. So we have to clean the "src" attribute and all <source> child nodes.

Since it is not the memory low case, we might wait indefinitely for garbage collection. Therefore, I try to empty the "src" attribute and all <source> child nodes to force releasing the decoder without waiting for garbage collection to kick in.

::: content/media/test/reactivate_helper.html
@@ +21,5 @@
>    var test = parent.gSmallTests[i];
>    var elemType = /^audio/.test(test.type) ? "audio" : "video";
>    // Associate these elements with the subframe's document
>    var e = document.createElement(elemType);
> +  e.setAttribute("preload", "auto");

Will do.
Attachment #8348555 - Attachment is obsolete: true
Attachment #8348555 - Flags: review?(paul)
Attachment #8355972 - Flags: review?(paul)
(In reply to Paul Adenot (:padenot) from comment #25)
> Comment on attachment 8348555 [details] [diff] [review]
> Fix timeout in acquiring omx codec and waiting for loadedmetadata event.
> 
> Review of attachment 8348555 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> I'm not sure if removing event listener matters that much. Does that keep
> the element alive?

Changing the "src" attribute will trigger events. We have to remove event listeners to prevent unexpected events that cause tests to fail.
Assign to JW since he is currently working on this fix
Assignee: nobody → jwwang
(In reply to JW Wang[:jwwang] from comment #29)
> Comment on attachment 8348555 [details] [diff] [review]
> Fix timeout in acquiring omx codec and waiting for loadedmetadata event.
> 
> Review of attachment 8348555 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: content/media/test/manifest.js
> @@ +561,5 @@
> >  }
> >  
> > +function removeNodeAndSource(n) {
> > +  if (n.parentNode) {
> > +    n.parentNode.removeChild(n);
> 
> https://developer.mozilla.org/en-US/docs/Web/API/Node
> I can't find the method in the DOM API. Where is it documented?

http://dxr.mozilla.org/mozilla-central/source/dom/interfaces/core/nsIDOMElement.idl#81
http://dom.spec.whatwg.org/#dfnReturnLink-0
Attachment #8355972 - Flags: review?(paul) → review+
Keywords: checkin-needed
And every other platform.
(In reply to Ryan VanderMeulen [:RyanVM UTC-5] from comment #36)
> Backed out for Android mochitest orange.
> https://hg.mozilla.org/integration/b2g-inbound/rev/bcba9e0700e8
> 
> https://tbpl.mozilla.org/php/getParsedLog.php?id=32639777&tree=B2g-Inbound

The failure is not about timeout. Should I also disable the tests to keep green?
It would be better if you understood why the test started failing before blindly disabling it. After all, tests are there to catch regressions :)
Attached patch Enable tests in B2G. (obsolete) — Splinter Review
Attachment #8348562 - Attachment is obsolete: true
Attachment #8357044 - Flags: review+
(In reply to Ryan VanderMeulen [:RyanVM UTC-5] from comment #40)
> It would be better if you understood why the test started failing before
> blindly disabling it. After all, tests are there to catch regressions :)

https://tbpl.mozilla.org/?tree=Try&rev=082d10686313

Disable test_invalid_reject.html and test_load_same_resource.html and file bugs for them. Since this patch enables more tests than what it disables, we can still benefit from catching more regressions. For the failed tests, I would like to deal with them in other bugs.
Keywords: checkin-needed
Depends on: 957443
Comment on attachment 8357044 [details] [diff] [review]
Enable tests in B2G.

The test was failing across all platforms. This will only disable it on B2G.

Paul - Are you OK with disabling these tests on all platforms so the B2G tests can be re-enabled?
Attachment #8357044 - Flags: review+ → review-
Flags: needinfo?(paul)
If JW promises to fix the two disabled tests, we can land this, yes.
Flags: needinfo?(paul)
Sorry, I apparently misread comment 43. This can't land as-is, as it break tests on all platforms, whereas it should only affect b2g, which shows that we are clearly missing something here.
No longer depends on: 957443
Fix test failures caused by event handlers (onerror) not removed before calling removeNodeAndSource().
Attachment #8356384 - Attachment is obsolete: true
Attachment #8361571 - Flags: review?(padenot)
Attached patch Enable tests in B2G. (obsolete) — Splinter Review
Attachment #8357044 - Attachment is obsolete: true
Attachment #8361572 - Flags: review?(padenot)
There was an intermittent failure on one of the retriggers:
https://tbpl.mozilla.org/php/getParsedLog.php?id=33141611&tree=Try

{
891 ERROR TEST-UNEXPECTED-FAIL | /tests/content/media/test/test_progress.html | Test timed out.
913 ERROR TEST-UNEXPECTED-FAIL | /tests/content/media/test/test_reactivate.html | Test timed out.
1058 ERROR TEST-UNEXPECTED-FAIL | /tests/content/media/test/test_replay_metadata.html | Test timed out.
1159 ERROR TEST-UNEXPECTED-FAIL | /tests/content/media/test/test_seek_out_of_range.html | Test timed out.
1160 ERROR TEST-UNEXPECTED-FAIL | (SimpleTest/TestRunner.js) | 4 test timeouts, giving up.
}
I noticed that too. It was suspicious the 4 failures were temporally close to each other. There might be some other root causes for the timeouts. Given the failure was one in the hundred, I would like to check in the patches first to enable webaudio tests again and then trace other media test timeouts in other bugs so that we can make progress.
Comment on attachment 8361571 [details] [diff] [review]
Fix timeout in acquiring omx codec and waiting for loadedmetadata event.

Review of attachment 8361571 [details] [diff] [review]:
-----------------------------------------------------------------

r+ if you investigate the error.
Attachment #8361571 - Flags: review?(paul) → review+
Attachment #8361572 - Flags: review?(paul) → review+
Keywords: checkin-needed
https://tbpl.mozilla.org/?tree=Try&rev=4f6739947691
It looks like test_mediarecorder_creation_fail.html didn't show up in my push log (Neither in mochitest-2 or 3 on B2G ICS Emulator Opt). Did I miss something to include the test?
It landed in bug 926684 on 10-Jan, so maybe your parent rev was just too old to include it?
(In reply to Ryan VanderMeulen [:RyanVM UTC-5] from comment #54)
> Backed out for B2G mochitest-2 failures.
> https://hg.mozilla.org/integration/b2g-inbound/rev/1a95ee2442c8
> 
> https://tbpl.mozilla.org/php/getParsedLog.php?id=33344919&tree=B2g-Inbound

The failure existed on b2g without my patch. It looks like SpecialPowers.setBoolPref is async and it fails to disable Ogg in time when recorder starts.

Since the whole content/media/test/* and content/media/webaudio/test/* are disabled on b2g, there are some new tests which fail on b2g on their own not caught.

I will rebase my patch, disable the test on b2g, and run tests on try again. I would like to check in this patch as soon as possible in case there are more test failures not caught on b2g. Or is it preferred to fix the failures before this patch can check in?
I see no problem with disabling individual failing tests (on B2G *only*) in order to get the rest re-enabled.
Disable content/media/test/test_mediarecorder_creation_fail.html which fails on B2G on its own without these patches. I will file another bug to fix the failure.
Attachment #8361572 - Attachment is obsolete: true
Attachment #8363615 - Flags: review+
https://tbpl.mozilla.org/?tree=Try&rev=7782e7b3b910
all green except one orange which is not related to media/webaudio.
Parent rev. = 8f4ecbf938cd on 1/21.

Hi Ryan, can you try to land the patch again? Thanks.
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/e4a5cc40929c
https://hg.mozilla.org/mozilla-central/rev/21c8ab0245dc
https://hg.mozilla.org/mozilla-central/rev/65a5026da078

*cues up the Hallelujah chorus* :)
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla29
FYI - I filed a bunch of followups that I noticed from this bug in these bugs:

* bug 962871 - get these tests running on debug emulators
* bug 962874 - get these tests running on b2g desktop builds
* bug 962878 - fix test_mediarecord_4ch_audiocontext.html
* bug 962883 - fix test_mediarecorder_creation_fail.html
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: