Intermittent test_bfcache.html | Test timed out

RESOLVED FIXED in Firefox 48

Status

()

defect
RESOLVED FIXED
3 years ago
3 years ago

People

(Reporter: nigelb, Assigned: eeejay)

Tracking

({intermittent-failure})

unspecified
mozilla48
Points:
---

Firefox Tracking Flags

(firefox45 unaffected, firefox46 disabled, firefox47 disabled, firefox48 fixed, firefox-esr45 unaffected)

Details

(URL)

Attachments

(1 attachment, 1 obsolete attachment)

(Reporter)

Description

3 years ago
https://treeherder.mozilla.org/logviewer.html#?job_id=19349166&repo=mozilla-inbound


 21:33:21     INFO -  [Parent 2131] WARNING: Audio Buffer is not full by the end of the callback.: 'Available() == 0 || mSampleWriteOffset == 0', file /builds/slave/m-in-l64-d-0000000000000000000/build/src/dom/media/AudioBufferUtils.h, line 87
 21:33:28     INFO -  --DOMWINDOW == 16 (0x7f8345f88400) [pid = 2131] [serial = 8] [outer = (nil)] [url = about:blank]
 21:33:28     INFO -  --DOMWINDOW == 15 (0x7f8346bd7c00) [pid = 2131] [serial = 12] [outer = (nil)] [url = about:blank]
 21:33:28     INFO -  --DOMWINDOW == 14 (0x7f8345d09400) [pid = 2131] [serial = 9] [outer = (nil)] [url = about:blank]
 21:33:28     INFO -  --DOMWINDOW == 13 (0x7f8356d7d000) [pid = 2131] [serial = 2] [outer = (nil)] [url = about:blank]
 21:33:28     INFO -  --DOMWINDOW == 12 (0x7f833eb74400) [pid = 2131] [serial = 14] [outer = (nil)] [url = about:blank]
 21:33:32     INFO -  --DOMWINDOW == 10 (0x7f738aa18000) [pid = 2185] [serial = 2] [outer = (nil)] [url = about:blank]
 21:33:32     INFO -  --DOMWINDOW == 9 (0x7f7389c90400) [pid = 2185] [serial = 5] [outer = (nil)] [url = about:blank]
 21:33:32     INFO -  --DOMWINDOW == 8 (0x7f73892ebc00) [pid = 2185] [serial = 6] [outer = (nil)] [url = about:blank]
 21:33:32     INFO -  --DOMWINDOW == 7 (0x7f7388f25400) [pid = 2185] [serial = 9] [outer = (nil)] [url = about:blank]
 21:33:36     INFO -  --DOMWINDOW == 6 (0x7f7388f2b400) [pid = 2185] [serial = 10] [outer = (nil)] [url = http://mochi.test:8888/tests/dom/media/webspeech/synth/test/file_bfcache_frame.html]
 21:38:18     INFO -  TEST-INFO | started process screentopng
 21:38:19     INFO -  TEST-INFO | screentopng: exit 0
 21:38:19     INFO -  24 INFO SimpleTest START
 21:38:19     INFO -  25 INFO TEST-UNEXPECTED-FAIL | dom/media/webspeech/synth/test/test_bfcache.html | Test timed out.
 21:38:19     INFO -      reportError@SimpleTest/TestRunner.js:114:7
 21:38:19     INFO -      TestRunner._checkForHangs@SimpleTest/TestRunner.js:134:7
 21:38:19     INFO -      setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:155:5
 21:38:19     INFO -      setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:155:5
 21:38:19     INFO -      setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:155:5
 21:38:19     INFO -      setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:155:5
 21:38:19     INFO -      setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:155:5
 21:38:19     INFO -      setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:155:5
 21:38:19     INFO -      setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:155:5
 21:38:19     INFO -      setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:155:5
 21:38:19     INFO -      setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:155:5
 21:38:19     INFO -      setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:155:5
 21:38:19     INFO -      TestRunner.runTests@SimpleTest/TestRunner.js:366:5
 21:38:19     INFO -      RunSet.runtests@SimpleTest/setup.js:188:3
 21:38:19     INFO -      RunSet.runall@SimpleTest/setup.js:167:5
 21:38:19     INFO -      hookupTests@SimpleTest/setup.js:260:5
 21:38:19     INFO -  parseTestManifest@http://mochi.test:8888/manifestLibrary.js:36:5
 21:38:19     INFO -  getTestManifest/req.onload@http://mochi.test:8888/manifestLibrary.js:49:11
 21:38:19     INFO -  EventHandlerNonNull*getTestManifest@http://mochi.test:8888/manifestLibrary.js:45:3
 21:38:19     INFO -      hookup@SimpleTest/setup.js:240:5
 21:38:19     INFO -  EventHandlerNonNull*@http://mochi.test:8888/tests?autorun=1&closeWhenDone=1&consoleLevel=INFO&hideResultsTable=1&manifestFile=tests.json&dumpOutputDirectory=%2Ftmp:11:1
 21:38:19     INFO -  MEMORY STAT vsizeMaxContiguous not supported in this build configuration.
 21:38:19     INFO -  MEMORY STAT | vsize 508MB | residentFast 89MB | heapAllocated 15MB
 21:38:19     INFO -  26 INFO TEST-OK | dom/media/webspeech/synth/test/test_bfcache.html | took 301050ms
 21:38:19     INFO -  ++DOMWINDOW == 7 (0x7f73892de800) [pid = 2185] [serial = 12] [outer = 0x7f7389c8f800]
 21:38:19     INFO -  27 INFO TEST-START | dom/media/webspeech/synth/test/test_global_queue.html
 21:38:19     INFO -  ++DOMWINDOW == 8 (0x7f73892e0800) [pid = 2185] [serial = 13] [outer = 0x7f7389c8f800]
 21:38:19     INFO -  ++DOCSHELL 0x7f7389161800 == 4 [pid = 2185] [id = 4]
 21:38:19     INFO -  ++DOMWINDOW == 9 (0x7f7388e4d000) [pid = 2185] [serial = 14] [outer = (nil)]
 21:38:19     INFO -  ++DOMWINDOW == 10 (0x7f7389c93800) [pid = 2185] [serial = 15] [outer = 0x7f7388e4d000]
 21:38:20     INFO -  --DOCSHELL 0x7f7389158800 == 3 [pid = 2185] [id = 3]
 21:38:20     INFO -  ++DOMWINDOW == 11 (0x7f738a755000) [pid = 2185] [serial = 16] [outer = 0x7f7388e4d000]
 21:38:20     INFO -  ++DOCSHELL 0x7f738a79b000 == 4 [pid = 2185] [id = 5]
 21:38:20     INFO -  ++DOMWINDOW == 12 (0x7f738a75bc00) [pid = 2185] [serial = 17] [outer = (nil)]
 21:38:20     INFO -  ++DOCSHELL 0x7f738a7a5800 == 5 [pid = 2185] [id = 6]
 21:38:20     INFO -  ++DOMWINDOW == 13 (0x7f738a7cc000) [pid = 2185] [serial = 18] [outer = (nil)]
 21:38:20     INFO -  ++DOMWINDOW == 14 (0x7f7388e5b000) [pid = 2185] [serial = 19] [outer = 0x7f738a75bc00]
 21:38:20     INFO -  ++DOMWINDOW == 15 (0x7f738a7cf800) [pid = 2185] [serial = 20] [outer = 0x7f738a7cc000]
 21:38:20     INFO -  [Child 2185] WARNING: No inner window available!: file /builds/slave/m-in-l64-d-0000000000000000000/build/src/dom/base/nsGlobalWindow.cpp, line 9258
 21:38:21     INFO -  MEMORY STAT | vsize 509MB | residentFast 90MB | heapAllocated 17MB
 21:38:21     INFO -  28 INFO TEST-OK | dom/media/webspeech/synth/test/test_global_queue.html | took 1986ms
 21:38:21     INFO -  ++DOMWINDOW == 16 (0x7f7388e56c00) [pid = 2185] [serial = 21] [outer = 0x7f7389c8f800]
 21:38:21     INFO -  29 INFO TEST-START | dom/media/webspeech/synth/test/test_global_queue_cancel.html
 21:38:21     INFO -  [Parent 2131] WARNING: Audio Buffer is not full by the end of the callback.: 'Available() == 0 || mSampleWriteOffset == 0', file /builds/slave/m-in-l64-d-0000000000000000000/build/src/dom/media/AudioBufferUtils.h, line 87
 21:38:22     INFO -  ++DOMWINDOW == 17 (0x7f738a9c9c00) [pid = 2185] [serial = 22] [outer = 0x7f7389c8f800]
 21:38:22     INFO -  ++DOCSHELL 0x7f7389cb1800 == 6 [pid = 2185] [id = 7]
 21:38:22     INFO -  ++DOMWINDOW == 18 (0x7f738a9cc400) [pid = 2185] [serial = 23] [outer = (nil)]
 21:38:22     INFO -  ++DOMWINDOW == 19 (0x7f738bbcfc00) [pid = 2185] [serial = 24] [outer = 0x7f738a9cc400]
 21:38:22     INFO -  ++DOMWINDOW == 20 (0x7f73892e6000) [pid = 2185] [serial = 25] [outer = 0x7f738a9cc400]
 21:38:22     INFO -  ++DOCSHELL 0x7f738a7a0000 == 7 [pid = 2185] [id = 8]
 21:38:22     INFO -  ++DOMWINDOW == 21 (0x7f7388f2e800) [pid = 2185] [serial = 26] [outer = (nil)]
 21:38:22     INFO -  ++DOCSHELL 0x7f738aa42800 == 8 [pid = 2185] [id = 9]
 21:38:22     INFO -  ++DOMWINDOW == 22 (0x7f7389c90400) [pid = 2185] [serial = 27] [outer = (nil)]
 21:38:22     INFO -  ++DOMWINDOW == 23 (0x7f738a75b800) [pid = 2185] [serial = 28] [outer = 0x7f7388f2e800]
 21:38:22     INFO -  ++DOMWINDOW == 24 (0x7f738a91c000) [pid = 2185] [serial = 29] [outer = 0x7f7389c90400]
 21:38:22     INFO -  MEMORY STAT | vsize 510MB | residentFast 91MB | heapAllocated 18MB
Comment hidden (Intermittent Failures Robot)
(Assignee)

Updated

3 years ago
Assignee: nobody → eitan
Comment hidden (Intermittent Failures Robot)
Comment hidden (Intermittent Failures Robot)
(Assignee)

Comment 4

3 years ago
I suspect that the pageshow listener on the iframe is not getting called. Investigating further...
Comment hidden (Intermittent Failures Robot)
Comment hidden (Intermittent Failures Robot)
Comment hidden (Intermittent Failures Robot)
I have a timeout of this test recorded in rr. I'll look into it shortly.
Comment hidden (Intermittent Failures Robot)
(Assignee)

Comment 10

3 years ago
Looks like bageshow events don't always work. onload is not an option because it makes pages not get cached. Simply putting the script in the body of the document is good enough for these tests.
Attachment #8707960 - Flags: review?(bugs)
Adding event listener for load doesn't affect to bfcache.
It is  beforeunload and unload listeners which do.
(Assignee)

Comment 12

3 years ago
(In reply to Robert O'Callahan (:roc) (offline Jan 16-24)  (Mozilla Corporation) from comment #8)
> I have a timeout of this test recorded in rr. I'll look into it shortly.

I missed that. Thanks. My fix is mostly a misundereducated guess. Logs show no speech ever starting, so I assume the JS call to speech are not executed.

(In reply to Olli Pettay [:smaug] from comment #11)
> Adding event listener for load doesn't affect to bfcache.
> It is  beforeunload and unload listeners which do.

Ah, right. Either way, no need for onload or onpageshow.
oh, silly me. I reviewed this and wasn't thinking the actual test is in iframe. That prevents all bfache. bfcache is for top level navigations only. But anyhow, the test is useful test, files just named wrongly.
Comment on attachment 8707960 [details] [diff] [review]
Robustify synth/test/test_bfcache.html to avoid intermittent timeouts.

But still, the existing code should work.
Did you perhaps push to try a version which logs when various events are handled.
If pageshow is not being dispatched, something is really broken.

So could you explain why the change is supposed to fix something.
Attachment #8707960 - Flags: review?(bugs) → review-
Comment hidden (Intermittent Failures Robot)
(In reply to Robert O'Callahan (:roc) (offline Jan 16-24)  (Mozilla Corporation) from comment #8)
> I have a timeout of this test recorded in rr. I'll look into it shortly.

I didn't get to this before I go on PTO for a week. I'll get back to it after that.
Comment hidden (Intermittent Failures Robot)
Comment hidden (Intermittent Failures Robot)
Comment hidden (Intermittent Failures Robot)
(Assignee)

Comment 20

3 years ago
OK, did some digging. Managed to reproduce this fairly reliably with cpulimit and capture it in rr. It looks like the MediaStreamListener does not get notified in a timely fashion after SourceMediaStream::EndAllTrackAndFinish is called.

A quick fix for this test would be to use an indirect audio service (tested locally, and I think it fixes this timeout). But, I imagine there may be something in MSG worth fixing?
Comment hidden (Intermittent Failures Robot)
(Assignee)

Comment 22

3 years ago
Disregard the previous comment :P
(Assignee)

Comment 23

3 years ago
At least in the rr recording it looks like the timeout happens between the FakeDirectAudioSynth::Speak and the time it takes the nested runnable to actually run. I'm not convinced this is the same orange failure.
Comment hidden (Intermittent Failures Robot)
Comment hidden (Intermittent Failures Robot)
Comment hidden (Intermittent Failures Robot)
We got a crash with a stack in this test over here:

  https://bugzilla.mozilla.org/show_bug.cgi?id=1242113

Related?
See Also: → 1242113
Comment hidden (Intermittent Failures Robot)
Comment hidden (Intermittent Failures Robot)
Comment hidden (Intermittent Failures Robot)
Comment hidden (Intermittent Failures Robot)
Comment hidden (Intermittent Failures Robot)
Comment hidden (Intermittent Failures Robot)
Comment hidden (Intermittent Failures Robot)
Congratulations on making it to the top of the intermittent-failure list by a considerable margin. Your award is a skip-if.
Whiteboard: [test disabled on Linux][test disabled on Android]
Keywords: leave-open
"Oops."
Comment hidden (Intermittent Failures Robot)
With the help of rr chaos mode, I reproduced a timeout which happens because the "hi!" utterance never fires its end event. That happens like so:

-- nsFakeSynthServices calls into nsSpeechTask::Setup which calls mStream->AddListener. This queues a message to be sent to the graph at the next main-thread stable state.
-- nsFakeSynthServices calls into nsSpeechTask::SendAudioImpl with aDataLen > 0. We stuff some data into the SourceMediaStream queue.
-- nsFakeSynthServices calls into nsSpeechTask::SendAudioImpl with aDataLen == 0. We mark the SourceMediaStream tracks and stream ended from the producer's point of view.
-- The next main-thread stable state occurs. An MSG iteration has already started and is updating stream order, i.e. messages have already been processed, but SourceMediaStream::ExtractPendingInput has not yet been called.
-- So, SourceMediaStream::ExtractPendingInput runs and the AddListener message has not yet been processed.
-- In the same MSG iteration, MediaStreamGraphImpl::UpdateCurrentTimeForStreams sets stream->mNotifiedFinished and stream->mNotifiedBlocked (I guess because the fake utterance is very short and we're able to play it out in a single MSG iteration).
-- In the next MSG iteration, we process AddListener in MediaStream::AddListenerImpl. We call listener->NotifyBlockingChanged(BLOCKED). SynthStreamListener::NotifyBlockingChanged ignores that so nsSpeechTask never changes to STATE_SPEAKING!
-- MediaStream::AddListenerImpl calls listener->NotifyEvent(EVENT_FINISHED) which dispatches a DispatchEndInner. 
DispatchEndImpl runs and sees the nsSpeechTask is still in STATE_PENDING, and sets it to NONE without dispatching an "end" event.
I guess the core issue here is around the expectation that adding a listener can take effect after the MSG has processed the audio data, even though the audio data is put into the SourceMediaStream after adding the listener. Perhaps the best way to fix that would be to make SourceMediaStream's data-sending methods queue messages for the next stable state, like MSG control messages, when called on the main thread. But that's a bit complicated and potentially ugly.

Overall it might be a better fix just say that expectation is unjustified, and have SynthStreamListener::NotifyEvent(EVENT_FINISHED) enter the mStarted state and dispatch a DoNotifyStarted if NotifyBlockingChanged hasn't already done so. Sound OK, Eitan?

BTW I have some other timed-out rr runs which I'll take a look at to see if they have the same cause.
Flags: needinfo?(eitan)
All 4 other "Test timed out" runs are because of the same issue.

I don't know for sure that this issue is what caused the timeouts in test automation, but it's worth fixing.
Comment hidden (Intermittent Failures Robot)
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #42)
> I guess the core issue here is around the expectation that adding a listener
> can take effect after the MSG has processed the audio data, even though the
> audio data is put into the SourceMediaStream after adding the listener.

Oops, I mixed this up.

The expectation of the nsSpeechTask code is that if you add a listener and then insert audio data into the SourceMediaStream, the listener will be added before the data is processed. That expectation is currently invalid.
(Assignee)

Comment 46

3 years ago
MozReview-Commit-ID: BJSQOiyXmRh
Attachment #8707960 - Attachment is obsolete: true
Attachment #8717640 - Flags: review?(roc)
(Assignee)

Comment 47

3 years ago
Thanks for the suggestion roc. I think this is what I saw in comment #20 in my own rr recording. Wasn't entirely convinced it was the same failure as the one on treeherder, because inserting LOG_NSPR_MODULES=SpeechSYnthesis:5 into the treeherder tests did not show output that would suggest this.

OTOH, using the indirect speech service that doesn't use MSG seemed to fix this. So maybe it is.

Anyway, it is still worth a fix as you say. Patch above.
Flags: needinfo?(eitan)
Comment on attachment 8717640 [details] [diff] [review]
Notify synth start if we get a finished event without a blocking-changed event. r?roc

Review of attachment 8717640 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/media/webspeech/synth/nsSpeechTask.cpp
@@ +60,5 @@
>          {
> +          if (!mStarted) {
> +            nsCOMPtr<nsIRunnable> startRunnable =
> +              NS_NewRunnableMethod(this, &SynthStreamListener::DoNotifyStarted);
> +              aGraph->DispatchToMainThreadAfterStreamStateUpdate(startRunnable.forget());

I'd set mStarted to true here just for sanity's sake.
Attachment #8717640 - Flags: review?(roc) → review+
Comment hidden (Intermittent Failures Robot)
Comment hidden (Intermittent Failures Robot)
Comment hidden (Intermittent Failures Robot)
Looks like you never re-enabled this test on trunk? Also, can you request Aurora approval for this uplift assuming the risk is low enough to warrant it?
Flags: needinfo?(eitan)
Keywords: leave-open
Whiteboard: [test disabled on Linux][test disabled on Android]
Comment hidden (Intermittent Failures Robot)
(Assignee)

Comment 57

3 years ago
Did a try run. It looks good, so I am re-enabling the test.

https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=53538223cf05

I missed the train for the previous Aurora, so there is no point in uplifting the fix.
Flags: needinfo?(eitan)

Comment 58

3 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/53538223cf05
Status: NEW → RESOLVED
Last Resolved: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla48
Comment hidden (Intermittent Failures Robot)
Comment hidden (Intermittent Failures Robot)
You need to log in before you can comment on or make changes to this bug.