Closed Bug 1440169 Opened 2 years ago Closed 2 years ago

Intermittent TEST-UNEXPECTED-TIMEOUT | dom/media/tests/mochitest/test_peerConnection_verifyAudioAfterRenegotiation.html | application timed out after 370 seconds with no output

Categories

(Core :: WebRTC: Audio/Video, defect, P2)

defect

Tracking

()

RESOLVED FIXED
mozilla60
Tracking Status
firefox-esr52 --- unaffected
firefox58 --- unaffected
firefox59 --- unaffected
firefox60 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: pehrsons)

References

(Depends on 1 open bug)

Details

(Keywords: intermittent-failure, regression)

Attachments

(2 files)

Filed by: ncsoregi [at] mozilla.com

https://treeherder.mozilla.org/logviewer.html#?job_id=163548191&repo=mozilla-central

https://queue.taskcluster.net/v1/task/WMCGAvmCQLGOVQtsTBnrRA/runs/0/artifacts/public/logs/live_backing.log

[task 2018-02-21T22:49:33.115Z] 22:49:33     INFO - TEST-FAIL | dom/media/tests/mochitest/test_peerConnection_verifyAudioAfterRenegotiation.html | The author of the test has indicated that flaky timeouts are expected.  Reason: WebRTC inherently depends on timeouts 
[task 2018-02-21T22:49:33.115Z] 22:49:33     INFO - TEST-FAIL | dom/media/tests/mochitest/test_peerConnection_verifyAudioAfterRenegotiation.html | The author of the test has indicated that flaky timeouts are expected.  Reason: WebRTC inherently depends on timeouts 
[task 2018-02-21T22:49:33.116Z] 22:49:33     INFO - Checking remote audio disabled
[task 2018-02-21T22:49:33.116Z] 22:49:33     INFO - TEST-FAIL | dom/media/tests/mochitest/test_peerConnection_verifyAudioAfterRenegotiation.html | The author of the test has indicated that flaky timeouts are expected.  Reason: WebRTC inherently depends on timeouts 
[task 2018-02-21T22:49:33.116Z] 22:49:33     INFO - TEST-FAIL | dom/media/tests/mochitest/test_peerConnection_verifyAudioAfterRenegotiation.html | The author of the test has indicated that flaky timeouts are expected.  Reason: WebRTC inherently depends on timeouts 
[task 2018-02-21T22:49:33.117Z] 22:49:33     INFO - Run step 55: PC_LOCAL_ADD_SECOND_STREAM
[task 2018-02-21T22:49:33.117Z] 22:49:33     INFO - Get 1 local streams
[task 2018-02-21T22:49:33.117Z] 22:49:33     INFO - Call getUserMedia for {"audio":{"autoGainControl":false,"echoCancellation":false,"noiseSuppression":false}}
[task 2018-02-21T22:49:33.118Z] 22:49:33     INFO - Buffered messages finished
[task 2018-02-21T22:49:33.118Z] 22:49:33    ERROR - TEST-UNEXPECTED-TIMEOUT | dom/media/tests/mochitest/test_peerConnection_verifyAudioAfterRenegotiation.html | application timed out after 370 seconds with no output
[task 2018-02-21T22:49:33.119Z] 22:49:33    ERROR - Force-terminating active process(es).
[task 2018-02-21T22:49:33.120Z] 22:49:33     INFO - Determining child pids from psutil...
[task 2018-02-21T22:49:33.121Z] 22:49:33     INFO - [1141]
[task 2018-02-21T22:49:33.122Z] 22:49:33     INFO - ==> process 1055 launched child process 1075
[task 2018-02-21T22:49:33.123Z] 22:49:33     INFO - ==> process 1055 launched child process 1111
[task 2018-02-21T22:49:33.124Z] 22:49:33     INFO - ==> process 1055 launched child process 1141
[task 2018-02-21T22:49:33.125Z] 22:49:33     INFO - ==> process 1055 launched child process 1710
[task 2018-02-21T22:49:33.126Z] 22:49:33     INFO - Found child pids: set([1075, 1141, 1710, 1111])
[task 2018-02-21T22:49:33.127Z] 22:49:33     INFO - Failed to get child procs
[task 2018-02-21T22:49:33.127Z] 22:49:33     INFO - Killing process: 1075
From https://treeherder.mozilla.org/logviewer.html#?job_id=164249653&repo=mozilla-inbound&lineNumber=11068

[task 2018-02-26T00:08:47.402Z] 00:08:47     INFO - TEST-OK | dom/media/tests/mochitest/test_peerConnection_verifyAudioAfterRenegotiation.html | took 12746ms
[task 2018-02-26T00:08:48.515Z] 00:08:48     INFO - GECKO(1046) | ++DOMWINDOW == 8 (0x7f3635b84400) [pid = 1102] [serial = 105] [outer = 0x7f364c8c5000]
[task 2018-02-26T00:08:48.612Z] 00:08:48     INFO - GECKO(1046) | Assertion failure: i != mAllocations.NoIndex (Cannot stop track that we don't know about), at /builds/worker/workspace/build/src/dom/media/webrtc/MediaEngineWebRTCAudio.cpp:723
[task 2018-02-26T00:09:21.967Z] 00:09:21     INFO - GECKO(1046) | #01: mozilla::MediaDevice::Stop [dom/media/MediaManager.cpp:1001]
[task 2018-02-26T00:09:21.969Z] 00:09:21     INFO - 
[task 2018-02-26T00:09:21.970Z] 00:09:21     INFO - GECKO(1046) | #02: mozilla::media::LambdaTask<mozilla::SourceListener::SetEnabledFor(mozilla::TrackID, bool)::<lambda(bool)> mutable::<lambda()> >::Run [dom/media/MediaManager.cpp:4057]
[task 2018-02-26T00:09:21.971Z] 00:09:21     INFO - 
[task 2018-02-26T00:09:21.972Z] 00:09:21     INFO - GECKO(1046) | #03: nsThread::ProcessNextEvent [mfbt/Maybe.h:445]
[task 2018-02-26T00:09:21.972Z] 00:09:21     INFO - 
[task 2018-02-26T00:09:21.973Z] 00:09:21     INFO - GECKO(1046) | #04: NS_ProcessNextEvent [xpcom/threads/nsThreadUtils.cpp:517]
[task 2018-02-26T00:09:21.974Z] 00:09:21     INFO - 
[task 2018-02-26T00:09:21.974Z] 00:09:21     INFO - GECKO(1046) | #05: mozilla::ipc::MessagePumpForNonMainThreads::Run [ipc/glue/MessagePump.cpp:364]
[task 2018-02-26T00:09:21.975Z] 00:09:21     INFO - 
[task 2018-02-26T00:09:21.976Z] 00:09:21     INFO - GECKO(1046) | #06: MessageLoop::RunInternal [ipc/chromium/src/base/message_loop.cc:327]
[task 2018-02-26T00:09:21.976Z] 00:09:21     INFO - 
[task 2018-02-26T00:09:21.977Z] 00:09:21     INFO - GECKO(1046) | #07: MessageLoop::Run [ipc/chromium/src/base/message_loop.cc:298]
[task 2018-02-26T00:09:21.978Z] 00:09:21     INFO - 
[task 2018-02-26T00:09:21.978Z] 00:09:21     INFO - GECKO(1046) | #08: base::Thread::ThreadMain [ipc/chromium/src/base/thread.cc:184]
[task 2018-02-26T00:09:21.979Z] 00:09:21     INFO - 
[task 2018-02-26T00:09:21.980Z] 00:09:21     INFO - GECKO(1046) | #09: ThreadFunc [ipc/chromium/src/base/platform_thread_posix.cc:40]
[task 2018-02-26T00:09:21.980Z] 00:09:21     INFO - 
[task 2018-02-26T00:09:21.980Z] 00:09:21     INFO - GECKO(1046) | #10: libpthread.so.0 + 0x76ba
[task 2018-02-26T00:09:21.980Z] 00:09:21     INFO - 
[task 2018-02-26T00:09:21.981Z] 00:09:21     INFO - GECKO(1046) | #11: libc.so.6 + 0x10741d
[task 2018-02-26T00:09:21.982Z] 00:09:21     INFO - 
[task 2018-02-26T00:09:21.982Z] 00:09:21     INFO - GECKO(1046) | #12: ??? (???:???)
[task 2018-02-26T00:09:21.983Z] 00:09:21     INFO - GECKO(1046) | --DOMWINDOW == 9 (0x7f02e4c4d400) [pid = 1046] [serial = 51] [outer = (nil)] [url = about:blank]
[task 2018-02-26T00:09:21.984Z] 00:09:21     INFO - GECKO(1046) | --DOMWINDOW == 8 (0x7f02e4c45400) [pid = 1046] [serial = 53] [outer = (nil)] [url = about:blank]
[task 2018-02-26T00:09:21.985Z] 00:09:21     INFO - GECKO(1046) | [Parent 1046, Gecko_IOThread] WARNING: pipe error (78): Connection reset by peer: file /builds/worker/workspace/build/src/ipc/chromium/src/chrome/common/ipc_channel_posix.cc, line 353
[task 2018-02-26T00:09:21.986Z] 00:09:21     INFO - GECKO(1046) | ###!!! [Parent][MessageChannel] Error: (msgtype=0x15007F,name=PBrowser::Msg_Destroy) Channel error: cannot send/recv
[task 2018-02-26T00:09:21.986Z] 00:09:21     INFO - GECKO(1046) | ++DOCSHELL 0x7f02e6972800 == 5 [pid = 1046] [id = {0bde128e-9179-4e38-a408-8284f8b17ccd}]
[task 2018-02-26T00:09:21.987Z] 00:09:21     INFO - GECKO(1046) | ++DOMWINDOW == 9 (0x7f02e29c7400) [pid = 1046] [serial = 56] [outer = (nil)]
[task 2018-02-26T00:09:21.988Z] 00:09:21     INFO - GECKO(1046) | ++DOMWINDOW == 10 (0x7f02e29ca000) [pid = 1046] [serial = 57] [outer = 0x7f02e29c7400]
[task 2018-02-26T00:09:21.988Z] 00:09:21     INFO - GECKO(1046) | [Parent 1046, Main Thread] WARNING: NS_ENSURE_TRUE(aSecondURI) failed: file /builds/worker/workspace/build/src/dom/base/ThirdPartyUtil.cpp, line 98
[task 2018-02-26T00:09:21.988Z] 00:09:21     INFO - GECKO(1046) | ++DOMWINDOW == 11 (0x7f02e29cb000) [pid = 1046] [serial = 58] [outer = 0x7f02e29c7400]
[task 2018-02-26T00:09:21.989Z] 00:09:21     INFO - GECKO(1046) | --DOCSHELL 0x7f02e7115000 == 4 [pid = 1046] [id = {f27538a9-26e0-4da5-900c-334e3e17932c}]
[task 2018-02-26T00:09:21.990Z] 00:09:21     INFO - GECKO(1046) | --DOMWINDOW == 10 (0x7f02e4c16000) [pid = 1046] [serial = 54] [outer = (nil)] [url = chrome://browser/content/webrtcIndicator.xul]
[task 2018-02-26T00:09:21.990Z] 00:09:21     INFO - GECKO(1046) | --DOMWINDOW == 9 (0x7f02e29ca000) [pid = 1046] [serial = 57] [outer = (nil)] [url = about:blank]
[task 2018-02-26T00:09:21.990Z] 00:09:21     INFO - GECKO(1046) | --DOMWINDOW == 8 (0x7f02e4c18800) [pid = 1046] [serial = 55] [outer = (nil)] [url = about:blank]
[task 2018-02-26T00:15:31.984Z] 00:15:31     INFO - Buffered messages finished
[task 2018-02-26T00:15:31.985Z] 00:15:31    ERROR - TEST-UNEXPECTED-TIMEOUT | dom/media/tests/mochitest/test_peerConnection_verifyAudioAfterRenegotiation.html | application timed out after 370 seconds with no output
Depends on: gUM_timeout
I'm taking this bug for fixing SineWaveGenerator::generate.

It's an implicit cast of the number of samples to generate from TrackTicks (int64_t) to the argument type int16_t. When this overflows ... boom.
Assignee: nobody → apehrson
Blocks: 1436694
Status: NEW → ASSIGNED
Rank: 15
Priority: P5 → P2
Comment on attachment 8955439 [details]
Bug 1440169 - Take TrackTicks samples in SineWaveGenerator::generate.

https://reviewboard.mozilla.org/r/224604/#review230600

stealing the review.
Attachment #8955439 - Flags: review+
Comment on attachment 8955439 [details]
Bug 1440169 - Take TrackTicks samples in SineWaveGenerator::generate.

https://reviewboard.mozilla.org/r/224604/#review230604

::: dom/media/webrtc/SineWaveGenerator.h:40
(Diff revision 1)
>  
>      while (remaining) {
> -      int16_t processSamples = 0;
> +      TrackTicks processSamples = 0;
>  
>        if (mTotalLength - mReadLength >= remaining) {
>          processSamples = remaining;

Now you have the same problem here (and later in the method). I think you need to apply the same change to `mTotalLength`and `mReadLength` in order to stay in the safe side.
Comment on attachment 8955439 [details]
Bug 1440169 - Take TrackTicks samples in SineWaveGenerator::generate.

https://reviewboard.mozilla.org/r/224604/#review230606

::: dom/media/webrtc/SineWaveGenerator.h:40
(Diff revision 1)
>  
>      while (remaining) {
> -      int16_t processSamples = 0;
> +      TrackTicks processSamples = 0;
>  
>        if (mTotalLength - mReadLength >= remaining) {
>          processSamples = remaining;

Sorry I put my pointer in the wrong line.
Comment on attachment 8955439 [details]
Bug 1440169 - Take TrackTicks samples in SineWaveGenerator::generate.

https://reviewboard.mozilla.org/r/224604/#review230604

> Now you have the same problem here (and later in the method). I think you need to apply the same change to `mTotalLength`and `mReadLength` in order to stay in the safe side.

No, mTotalLength is the length of mAudioBuffer, which is fixed (one sine wave period). mReadLength is always in the span [0, mTotalLength] so there's no risk of overflowing those.
Comment on attachment 8955439 [details]
Bug 1440169 - Take TrackTicks samples in SineWaveGenerator::generate.

https://reviewboard.mozilla.org/r/224604/#review230610

::: dom/media/webrtc/SineWaveGenerator.h:47
(Diff revision 1)
>          processSamples = mTotalLength - mReadLength;
>        }
>        memcpy(aBuffer, &mAudioBuffer[mReadLength], processSamples * bytesPerSample);
>        aBuffer += processSamples;
>        mReadLength += processSamples;
>        remaining -= processSamples;

That's the correct line and I am referring to the two lines above. I see that `processSamples` is gaurded to be be `mTotalLehgth` at most which is good. 

A second question is if that produces a compiler warning.
Attachment #8955439 - Flags: review?(achronop) → review+
I found the root of this. It's a regression from bug 1299515. Now when the fake audio device sees the sequence Start(); Stop(); Start(); we risk hitting this. On the second Start() we reset the internal counter for how much data has been appended so on the next Pull() we try to double it up, which is too much for the sine generator.
Blocks: 1299515
Component: WebRTC → WebRTC: Audio/Video
Keywords: regression
Comment on attachment 8955439 [details]
Bug 1440169 - Take TrackTicks samples in SineWaveGenerator::generate.

https://reviewboard.mozilla.org/r/224604/#review230610

> That's the correct line and I am referring to the two lines above. I see that `processSamples` is gaurded to be be `mTotalLehgth` at most which is good. 
> 
> A second question is if that produces a compiler warning.

I changed `processSamples` back to int16_t as it only says how many samples within mTotalLength that we should copy.

No compiler warnings, no static-analysis warnings.
Comment on attachment 8955456 [details]
Bug 1440169 - Don't reset mLastNotify in resuming Start().

https://reviewboard.mozilla.org/r/224618/#review230614

Please keep the other one as a normal width type, it has no reason to be shorter, though.
Attachment #8955456 - Flags: review?(padenot) → review+
Pushed by pehrsons@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/df7b724956eb
Take TrackTicks samples in SineWaveGenerator::generate. r=achronop,padenot
https://hg.mozilla.org/integration/autoland/rev/91d6515d47cd
Don't reset mLastNotify in resuming Start(). r=padenot
https://hg.mozilla.org/mozilla-central/rev/df7b724956eb
https://hg.mozilla.org/mozilla-central/rev/91d6515d47cd
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla60
Recent failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=165899607&repo=mozilla-inbound&lineNumber=11303
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
(In reply to Andreea Pavel [:apavel] from comment #19)
> Recent failure log:
> https://treeherder.mozilla.org/logviewer.html#?job_id=165899607&repo=mozilla-
> inbound&lineNumber=11303

That log contains "Assertion failure: i != mAllocations.NoIndex". That would be bug 1439886 or bug 1441798.
Status: REOPENED → RESOLVED
Closed: 2 years ago2 years ago
Resolution: --- → FIXED
:pehrsons thank you, those are duplicates according to the failure log. 

For this bug however, there is this: https://treeherder.mozilla.org/logviewer.html#?job_id=166430160&repo=mozilla-inbound&lineNumber=11273

noticed that [@ mozilla::ThreadSafeAutoRefCnt::get] is now [@ mozilla::MediaEngineWebRTCMicrophoneSource::Stop], however the log matches the initial log from when Natalia filled the bug. 

Can you please take a look?
Flags: needinfo?(apehrson)
Sure. Could we get this symptom filed against a new bug in the meantime though?

One failure mode down, another to go.
Flags: needinfo?(apehrson)
Blocks: 1443774
(In reply to Andreea Pavel [:apavel] from comment #22)
> :pehrsons thank you, those are duplicates according to the failure log. 
> 
> For this bug however, there is this:
> https://treeherder.mozilla.org/logviewer.html#?job_id=166430160&repo=mozilla-
> inbound&lineNumber=11273

This looks like the opt version of bug 1439886.


> noticed that [@ mozilla::ThreadSafeAutoRefCnt::get] is now [@
> mozilla::MediaEngineWebRTCMicrophoneSource::Stop], however the log matches
> the initial log from when Natalia filled the bug.

The initial log looks like a mid-test crash, whereas the one you linked now looks more like a shutdown issue. The initial one looks nasty but I believe it was fixed in bug 1440347.
No longer blocks: 1443774
See Also: → 1443774
You need to log in before you can comment on or make changes to this bug.