Open Bug 1305136 Opened 9 years ago Updated 3 years ago

Intermittent dom/media/webaudio/test/test_audioContextSuspendResumeClose.html | Test timed out.

Categories

(Core :: Web Audio, defect, P5)

defect

Tracking

()

Tracking Status
firefox52 --- wontfix
firefox-esr52 --- unaffected
firefox53 --- unaffected
firefox54 --- disabled
firefox55 --- disabled

People

(Reporter: intermittent-bug-filer, Unassigned)

References

Details

(Keywords: intermittent-failure, leave-open, Whiteboard: [stockwell disabled])

Attachments

(1 file, 2 obsolete files)

Rank: 35
Priority: -- → P3
:mreavy, this bug has increased in frequency around Feb 8th/9th, it looks to be primarily on linux- can we get someone to look into this sometime this week. Our goal is for all high frequency bugs to be resolved within 2 weeks (in the case end of the month)
Flags: needinfo?(mreavy)
Paul, Dan -- Looks like something landed in the last 2 weeks to bump the frequency of this. Thoughts? Dan -- Can I ask you to take this one? (At least to get the frequency back to where it was -- ideally I'd love to fix the root problem, but I'm not asking for that.)
Assignee: nobody → dminor
Rank: 35 → 21
Flags: needinfo?(padenot)
Flags: needinfo?(mreavy)
Flags: needinfo?(dminor)
Priority: P3 → P2
I don't know off hand what happened in this period. I would turn on MSG logging on (I think you need to modify part of the harness to do that). I recently landed a patch that makes MSG logging more useful. Look for GraphDriver messages and other driver things. Maybe add logging in `ApplyAudioCOntextOperation{Impl,Completed,}.
Flags: needinfo?(padenot)
Will do.
Flags: needinfo?(dminor)
All failures for the past two weeks have been here [1], which is inside testScriptProcessNodeSuspended, which tests that a ScriptProcessorNode does not receive events when it's context is suspended. I'll spend some time to try to identify a root cause for this failure, but ScriptProcessorNode is a deprecated feature of Web Audio and so at some point it might make more sense to remove this particular test case than to sink a bunch of time into a fix. That assumes, of course, that the problem doesn't just move to another test case once this one is removed. [1] http://searchfox.org/mozilla-central/rev/d3307f19d5dac31d7d36fc206b00b686de82eee4/dom/media/webaudio/test/test_audioContextSuspendResumeClose.html#234
Actually no, when I had a closer look at the log, there are multiple failures, they just aren't showing up in the Treeherder summary.
I realize my credibility might be a little reduced after Comment 16, but it appears the problem is actually here [1]. I ran it on try with full logging here [2] and I'm only seeing "0 Offline" in the log. A successful run would contain both "0 Offline" and "O Realtime". I added some logging for finish() calls and it looks like we're still waiting on one last call to finish() when the test times out. [1] http://searchfox.org/mozilla-central/rev/d3307f19d5dac31d7d36fc206b00b686de82eee4/dom/media/webaudio/test/test_audioContextSuspendResumeClose.html#77 [2] https://treeherder.mozilla.org/#/jobs?repo=try&revision=71b7fe75f8337cf4fd9bd0cac2f4c53540196a5a&selectedJob=77331849
FWIW, the spike on Trunk looks like it has a pretty clear correlation to when bug 1336098 landed.
Blocks: 1336098
Flags: needinfo?(padenot)
Much to my annoyance, it's still a low-frequency intermittent on Beta (but not Aurora). Looks like something fixed it in mid/late-November until the recent spike. Anyway, I'm probably just going to disable this test on 52 soon since we're running out of time to backport any real fixes at this point.
There are a number of other media failures that stopped on trunk in mid-November but live on in 52. See my comment in bug 1293170 and some others.
(In reply to Ryan VanderMeulen [:RyanVM] from comment #22) > FWIW, the spike on Trunk looks like it has a pretty clear correlation to > when bug 1336098 landed. The only reasonable explanation for that correlation is timing changes due to the added test, increasing exposure of a latent bug.
(In reply to Dan Minor [:dminor] from comment #19) > I realize my credibility might be a little reduced after Comment 16, but it > appears the problem is actually here [1]. I ran it on try with full logging > here [2] and I'm only seeing "0 Offline" in the log. A successful run would > contain both "0 Offline" and "O Realtime". I added some logging for finish() > calls and it looks like we're still waiting on one last call to finish() > when the test times out. > > [1] > http://searchfox.org/mozilla-central/rev/ > d3307f19d5dac31d7d36fc206b00b686de82eee4/dom/media/webaudio/test/ > test_audioContextSuspendResumeClose.html#77 > [2] > https://treeherder.mozilla.org/#/ > jobs?repo=try&revision=71b7fe75f8337cf4fd9bd0cac2f4c53540196a5a&selectedJob=7 > 7331849 Have you had a chance to log in the functions that deal with the promises, so we have the log from the test interleaved with the log from gecko ?
Flags: needinfo?(padenot) → needinfo?(dminor)
Flags: needinfo?(dminor)
It looks like the extra logging may be causing this to not reproduce. Here's a try run with the level reduced from 5 to 4: https://treeherder.mozilla.org/#/jobs?repo=try&revision=26dc18abfa4df7c4bb1d308597277afefc0878b0
Here's a try run with some simple fprintf logging (look for '>>>>') that reproduces which makes it look the like the decode never even started for the normal AudioContext case: https://treeherder.mozilla.org/#/jobs?repo=try&author=dminor@mozilla.com&selectedJob=79134145 I tried adding some extra "todos" to the loadFile function in this try run here: https://treeherder.mozilla.org/#/jobs?repo=try&revision=d29386a3d674cfb8dd91af6088ec95af3fd27300 to see if we were actually starting the decode process, but that seems to make this never reproduce. This might be a workaround if we were confident this was a test harness issue and not a real bug, but I'm not sure that is the case here. Here's a try run with MediaStreamGraph logging enabled that reproduces: https://treeherder.mozilla.org/#/jobs?repo=try&author=dminor@mozilla.com&selectedJob=79094920. Unfortunately, the log is not interleaved with the test execution or timestamped. I'll try diffing against a successful run to see if anything shows up.
Do we decode this with MediaDecoder? Have you tried adding logs there?
Attached patch Add extra "todos" (obsolete) — — Splinter Review
Based on this try run: https://treeherder.mozilla.org/#/jobs?repo=try&revision=eca47218b9740defcb8677e8468270318ef8f069 it looks like adding these extra "todos" might reduce the frequency of this intermittent while we investigate the root cause.
Attachment #8840068 - Flags: review?(padenot)
(In reply to Andreas Pehrson [:pehrsons] (Telenor) from comment #33) > Do we decode this with MediaDecoder? Have you tried adding logs there? I've been adding logs to the MediaDecodeTask stuff in MediaBufferDecoder.cpp, but so far it seems like either it all succeeds or it doesn't even start, so maybe the problem is before that point. Suggestions welcome :)
Attachment #8840068 - Flags: review?(padenot) → review+
Keywords: leave-open
Pushed by dminor@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/7d15bc99dbf1 Add additional status messages to loadFile function; r=padenot
In this run [1], the last statement in the log is produced here [2] which makes it look like the ac.resume() just below is not completing. I've kicked off another try run with the MediaStreamGraph logs mixed in with the test output rather than in a separate file here [3] but so far no luck with getting it to reproduce with the extra logs in place. [1] https://treeherder.mozilla.org/#/jobs?repo=try&author=dminor@mozilla.com&selectedJob=79646925 [2] http://searchfox.org/mozilla-central/source/dom/media/webaudio/test/test_audioContextSuspendResumeClose.html#276 [3] https://treeherder.mozilla.org/#/jobs?repo=try&revision=0af8b59e282d7f78a1b4cebf3ebd191b34100e97
Whiteboard: [stockwell needswork]
Attached patch Disable the test on Linux only (obsolete) — — Splinter Review
I discussed this with Maire and we thought it made sense to disable this test on Linux only as that seems to be the only place where we are running into trouble. I've had no luck getting this to reproduce locally or on Try with any sort of logging enabled.
Attachment #8840068 - Attachment is obsolete: true
Attachment #8842524 - Flags: review?(karlt)
(In reply to Dan Minor [:dminor] from comment #39) > In this run [1], the last statement in the log is produced here [2] which > makes it look like the ac.resume() just below is not completing. [1] https://treeherder.mozilla.org/#/jobs?repo=try&revision=eaae566d5f462a4f44e29f8dc3d56f967cf547b5&selectedJob=79646925 [2] http://searchfox.org/mozilla-central/rev/31b6089ce26fa76459642765115605d50a6c67b4/dom/media/webaudio/test/test_audioContextSuspendResumeClose.html#276
Comment on attachment 8842524 [details] [diff] [review] Disable the test on Linux only Are all the subtests having intermittent timeouts? Can you point to evidence of this, or can logging be pushed to m-c to more clearly identify the remaining subtests before timeout? Is there a reason why all these tests run concurrently? Would it be reasonable to move the tests that are timing out to a different file? If not, I suggest removing only the problem subtests from the tests array when running on Linux.
Flags: needinfo?(dminor)
Attachment #8842524 - Flags: review?(karlt)
Depends on: 1285290
All of the failures I've seen are in the "testAudioContext" test case, and on Linux only, so I think it makes sense to disable just that test on Linux for now.
Attachment #8842524 - Attachment is obsolete: true
Flags: needinfo?(dminor)
Attachment #8843916 - Flags: review?(karlt)
Comment on attachment 8843916 [details] [diff] [review] Disable the testAudioContext test case on Linux only Thanks! I agree that there is no obvious indication that the order of the tests should matter (from a quick skim through).
Attachment #8843916 - Flags: review?(karlt) → review+
Whiteboard: [stockwell needswork] → [stockwell disabled]
Pushed by dminor@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/0060ab5eefa8 Disable testAudioContext test in audioContextSuspendResumeClose.html on Linux; r=karlt
It looks like the patch to disable the test never made it to Aurora. It should be fine to land this a=test-only.
Keywords: checkin-needed
Mass change P2->P3 to align with new Mozilla triage process.
Priority: P2 → P3
Bulk priority update of open intermittent test failure bugs. P3 => P5 https://bugzilla.mozilla.org/show_bug.cgi?id=1381960
Priority: P3 → P5
Unassigning myself, this continues to occur at low frequency on esr-52.
Assignee: dminor → nobody
Depends on: 1644647
No longer depends on: 1285290
Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: