Closed Bug 1286041 Opened 4 years ago Closed 3 years ago

1,700 instances of "data callback fires before cubeb_stream_start() is called" emitted from dom/media/AudioStream.cpp during linux64 debug testing

Categories

(Core :: Audio/Video: cubeb, defect, P1)

defect

Tracking

()

RESOLVED FIXED
mozilla52
Tracking Status
firefox52 --- fixed

People

(Reporter: erahm, Assigned: achronop)

References

(Blocks 2 open bugs)

Details

Attachments

(6 files, 7 obsolete files)

2.02 KB, patch
jwwang
: review+
Details | Diff | Splinter Review
36.10 KB, patch
kinetik
: review+
Details | Diff | Splinter Review
1.25 KB, patch
Details | Diff | Splinter Review
1.28 KB, patch
padenot
: review+
Details | Diff | Splinter Review
688 bytes, text/html
Details
803 bytes, patch
jwwang
: review+
Details | Diff | Splinter Review
> 1657 WARNING: data callback fires before cubeb_stream_start() is called: file dom/media/AudioStream.cpp, line 597

This warning [1] shows up in the following test suites:

>    593 - [TC] Linux64 mochitest-media-e10s mda
>    583 - [TC] Linux64 mochitest-media mda
>     63 - [TC] Linux64 web-platform-tests-5 5
>     58 - [TC] Linux64 web-platform-tests-e10s-5 5
>     46 - [TC] Linux64 crashtest e10s C
>     42 - [TC] Linux64 crashtest C
>     42 - [TC] Linux64 mochitest-browser-chrome-2 bc2
>     41 - [TC] Linux64 mochitest-browser-chrome-e10s-4 bc4
>     40 - [TC] Linux64 mochitest-2 2
>     23 - [TC] Linux64 mochitest-browser-chrome-e10s-7 bc7
>     23 - [TC] Linux64 mochitest-browser-chrome-7 bc7
>     20 - [TC] Linux64 mochitest-gl gl
>     20 - [TC] Linux64 mochitest-gl-e10s gl
>     15 - [TC] Linux64 mochitest-clipboard cl
>     11 - [TC] Linux64 web-platform-tests-7 7
>     10 - [TC] Linux64 mochitest-plain-e10s-2 2
>      9 - [TC] Linux64 web-platform-tests-e10s-7 7
>      6 - [TC] Linux64 mochitest-plain-e10s-10 10
>      6 - [TC] Linux64 mochitest-10 10
>      1 - [TC] Linux64 mochitest-chrome-3 c3
>      1 - [TC] Linux64 mochitest-browser-chrome-3 bc3
>      1 - [TC] Linux64 mochitest-a11y a11y
>      1 - [TC] Linux64 mochitest-browser-chrome-e10s-5 bc5
>      1 - [TC] Linux64 mochitest-3 3
>      1 - [TC] Linux64 mochitest-plain-e10s-3 3

It shows up in 260 tests. A few of the most prevalent:

>     81 - [e10s] dom/media/test/test_playback.html
>     78 -        dom/media/test/test_playback.html
>     63 -        dom/media/test/test_played.html
>     62 - [e10s] dom/media/test/test_played.html
>     39 - [e10s] dom/media/test/test_fragment_play.html
>     38 -        dom/media/test/test_fragment_play.html
>     37 - [e10s] file:///home/worker/workspace/build/tests/reftest/tests/dom/media/test/crashtests/691096-1.html
>     33 -        dom/media/test/test_loop.html
>     32 - [e10s] dom/media/test/test_bug495145.html
>     32 - [e10s] dom/media/test/test_loop.html

[1] https://hg.mozilla.org/mozilla-central/annotate/214884d507ee/dom/media/AudioStream.cpp#l597
Paul -- can you prioritize this when you reply to the needinfo (setting priority and rank)?  Thanks.
Whiteboard: [needinfo 2016/07/11 to Padenot]
I'll try to fix that today.
Assignee: nobody → padenot
Rank: 15
Flags: needinfo?(padenot)
Priority: -- → P1
This is going to be fixed upstream: https://github.com/kinetiknz/cubeb/pull/125
Assignee: padenot → achronop
The upstream contains the fix for this issue. The patch is r+ by kinetik in cubeb github repo
Comment on attachment 8777778 [details] [diff] [review]
AudioStream crash when callback is called before sream start

Review of attachment 8777778 [details] [diff] [review]:
-----------------------------------------------------------------

Can you explain how this patch fixes the crash and what crash it is?
(In reply to JW Wang [:jwwang] from comment #14)
> Can you explain how this patch fixes the crash and what crash it is?

Cubeb data callback circulating on different thread and might start firing before `InvokeCubeb(cubeb_stream_start)` returns and check the return value. By moving to starting state indicating that case which is acceptable by the callback (asserts only when state == initialized).
I still don't get it. How do adding a new state and an assertion prevent data callbacks from happening before |InvokeCubeb(cubeb_stream_start)|?
There is a second patch attached which does the rest of the job in the PaulseAudio backend. PulseAudio used to fire a pre buffer callback during the setup. After consulting with PulseAudio developers we find a way to fix that. This patch is already review in cubeb repo. 

The second patch is the one I want you to review and the explanation is the above.
Let me make sure if I understand it correctly:
The patch I review add an assertion |MOZ_ASSERT(mState != INITIALIZED)| which is used to verify the fix in PaulseAudio is correctly, right?

So data callbacks should never come before calls to |InvokeCubeb(cubeb_stream_start)|. However, they might come before |InvokeCubeb(cubeb_stream_start)| returns, right?
Yeah that's exactly the case.
Comment on attachment 8777778 [details] [diff] [review]
AudioStream crash when callback is called before sream start

Review of attachment 8777778 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/media/AudioStream.cpp
@@ +593,5 @@
>  
>    auto writer = AudioBufferWriter(
>      reinterpret_cast<AudioDataValue*>(aBuffer), mOutChannels, aFrames);
>  
> +  MOZ_ASSERT(mState != INITIALIZED);

Adding a state "STARTING" for the assertion adds unnecessary complexity for each use of mState must be reviewed to make sure it takes the new state "STARTING" into account.

I would prefer to add a bool "mBeforeStartCubeb" for the debugging/assertion purpose.
Attachment #8777778 - Flags: review?(jwwang) → review-
If we add a flag we need to maintain the state of the flag in every case of start/stop/error. One solutiuon to avoid that and the new state is to set state to STARTED before the invoke and (in the same place STARTING is used now) and keep checking again after the invoke. This is safe and I know because I have tested manually and on try.

A second option would be to set the STARTED state on StateCallback. I am sure it's safe for PulseAudio but it affects all other backends and I need to try test it. I will do it happily if this is an acceptable solution.
Option 1 looks good to me.
Instead of creating a new state set the STARTED state before the invoke and check again after.
Attachment #8777778 - Attachment is obsolete: true
Attachment #8778188 - Flags: review?(jwwang)
Comment on attachment 8778188 [details] [diff] [review]
AudioStream crash when callback is called before sream start

Review of attachment 8778188 [details] [diff] [review]:
-----------------------------------------------------------------

The commit message doesn't match what you do in this patch. Please add comments to explain it.
Attachment #8778188 - Flags: review?(jwwang) → review+
Updated with comments
Attachment #8778188 - Attachment is obsolete: true
Whiteboard: [needinfo 2016/07/11 to Padenot]
Keywords: checkin-needed
For tree sheriff on duty: Both patches need to be checked in. 
Review on 1st patch exist on the following Pull Request:
https://github.com/kinetiknz/cubeb/pull/142
Pushed by cbook@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/14d4b1e011ad
Cubeb PulseAudio backend stop calling callback before stream start. r=kinetik
https://hg.mozilla.org/integration/mozilla-inbound/rev/72e5792fe990
Assert audio callback is not called in initialized state. r=jwwang
Keywords: checkin-needed
One more case discussed on the cubeb repo and can be found in the following links
https://github.com/kinetiknz/cubeb/issues/147#issuecomment-241928950
https://github.com/kinetiknz/cubeb/issues/147#issuecomment-241928950

I will try to solve it by setting the mState value for start in the state change callback. I will push a new try with that since it affects all the platforms.
This is an updated version of the previous patch. I moved the setting of mState value to state change callback for the case of start. The reason I did that is described in comment 30 (https://bugzilla.mozilla.org/show_bug.cgi?id=1286041#c30). I executed all try runs and looks green for every platform.
Attachment #8778199 - Attachment is obsolete: true
Attachment #8785255 - Flags: review?(jwwang)
This patch solves the back out issue.
Flags: needinfo?(achronop)
Comment on attachment 8785255 [details] [diff] [review]
AudioStream crash when callback is called before sream start

Review of attachment 8785255 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/media/AudioStream.cpp
@@ +400,5 @@
>  {
>    MonitorAutoLock mon(mMonitor);
>    MOZ_ASSERT(mState == INITIALIZED);
>    auto r = InvokeCubeb(cubeb_stream_start);
> +  if (r != CUBEB_OK) {

I think you can do |mState = STARTED| before calling cubeb_stream_start. We just need to change mState to ERRORED only if r is not CUBEB_OK. Then you don't need to change StateCallback().
Attachment #8785255 - Flags: review?(jwwang) → review-
> I think you can do |mState = STARTED| before calling cubeb_stream_start. We
> just need to change mState to ERRORED only if r is not CUBEB_OK. Then you
> don't need to change StateCallback().

Indeed, it's another way to do the same thing, since it is more preferable I will update the patch.
Updated patch to set the mState = STARTED before the invoke and remove the StateCallback changes.
Attachment #8785255 - Attachment is obsolete: true
Attachment #8785832 - Flags: review?(jwwang)
Attachment #8785832 - Flags: review?(jwwang) → review+
For Sheriff:
Please check in all 3 patches.

Review of 1st patch on the following Pull Request:
https://github.com/kinetiknz/cubeb/pull/142
Review of 2nd patch on the following PR: 
https://github.com/kinetiknz/cubeb/pull/148
Keywords: checkin-needed
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/48bb5e3170a0
Cubeb PulseAudio backend stop calling callback before stream start. r=kinetik
https://hg.mozilla.org/integration/mozilla-inbound/rev/25dd85b13da9
Assert audio callback is not called in initialized state. r=jwwang
https://hg.mozilla.org/integration/mozilla-inbound/rev/5e08455b6011
Cubeb PulseAudio wait to drain when stop during draining. r=kinetik
Keywords: checkin-needed
Backout by ryanvm@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/fb93ad7b6307
Backed out 3 changesets for crashtest failures.
Comment on attachment 8787610 [details] [diff] [review]
AudioContext prevent from chage state to Suspended when in Closed

Review of attachment 8787610 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/media/webaudio/AudioContext.cpp
@@ +842,5 @@
>      return;
>    }
>  
> +  // This can happen if shutdown close comes before driver close
> +  if (mAudioContextState == AudioContextState::Closed &&

I'd rephrase the comment like so:

> This can happen if this is called in reaction to a 
> MediaStreamGraph shutdown, and a AudioContext was being 
> suspended at the same time, for example if a page was being 
> closed.

if it actually represent what is happening here.
Attachment #8787610 - Flags: review?(padenot) → review+
The patch includes all updates of cubeb up to commit Update cubeb library on commit 2f3b9eb. All individuals commits have been r+ in the cubeb github repo. I set it for review just for a second check if everything is in place.
Attachment #8777774 - Attachment is obsolete: true
Attachment #8785257 - Attachment is obsolete: true
Attachment #8788209 - Flags: review?(kinetik)
Updated with comments from padenot
Attachment #8787610 - Attachment is obsolete: true
Comment on attachment 8788209 [details] [diff] [review]
Update cubeb 2f3b9eb including the patch

LGTM
Attachment #8788209 - Flags: review?(kinetik) → review+
For the sheriff: Please check in all 3 patches.
Keywords: checkin-needed
Pushed by paul@paul.cx:
https://hg.mozilla.org/integration/mozilla-inbound/rev/71b9d48efac6
Assert audio callback is not called in initialized state. r=jwwang
https://hg.mozilla.org/integration/mozilla-inbound/rev/232069c62626
Ignore AudioContext state change to Suspended when in Closed. r=padenot
https://hg.mozilla.org/integration/mozilla-inbound/rev/42eb493a3791
Update cubeb library on commit 2f3b9eb. r=kinetik
Keywords: checkin-needed
Pushed by paul@paul.cx:
https://hg.mozilla.org/integration/mozilla-inbound/rev/bb04b5569ab9
Disable an assert when using winmm, on a CLOSED TREE. r=achronop
I changed a little bit your patch to do exactly what used to do in the case of winmm. I have a green try for that:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=d25954187f89
Attachment #8789800 - Flags: review?(padenot)
Comment on attachment 8789800 [details] [diff] [review]
AudioStream disable the assert when backend ois winmm

Review of attachment 8789800 [details] [diff] [review]:
-----------------------------------------------------------------

I think that's an ok fix for now. It would be good to implement what Matthew says in [0].

[0]: https://github.com/kinetiknz/cubeb/issues/150#issuecomment-245473198
Attachment #8789800 - Flags: review?(padenot) → review+
Alex, FYI: I've backed out the two changesets (b095fb7c464fc61adef90c15f82cc1014071d17d and eedae4c591d7e194d9a231c9afe87e71a01e73e8) for this bug from libcubeb's upstream, landed the resulting libcubeb version update as part of bug 1274479, then restored your two patches to upstream.  When you're ready to reland the patches in this bug, make sure you update your libcubeb patch to a317ba01a7b9afa0ba31b77ab9419244ebc4cf23 (or later).
The problem I am seeing (and backed out for) is shown in the following link:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=a1c467c647d1&selectedJob=27386825

The test start playing a media file with vid.currentTime > 0 (5 or 5.5 in the link). On vid.onseeking() event checks again the vid.currentTime and expects to be the same with the one set previously. With my patch this fails and the current time has a slightly bigger value. 

The failure is sporadic and executing the mochitest locally never fails. I have though a code sample that repro the case. Attached file "fragment_onseeked.html" repro the problem (the correct expected value is 2.23 when bigger the problem exists). I should mention that testing now after some days and with an updated repo I get much less failures than initially but the problem still exists.

By removing my patches one by one I realized that the cubeb patch creates the issue. This patch changed the cubeb_stream_start() method which is called by AudioStream::Start(). A small description of that change is that makes the PulseAudio fires the ouput audio callback only after the call of cubeb_stream_start() method.

The currentTime of MediaElement is given by MediaDecoder::mLogicalPosition data member and the update of the latter happens in main thread by MediaDecoder::UpdateLogicalPositionInternal() method. The onseeking even is raised from MediaDecoder::Seek() (if I do not miss something) also from the main thread. This somehow is affected by AudioStream::Start and my patch but it's very hard for me to find how.
Flags: needinfo?(achronop) → needinfo?(jwwang)
This is not an issue.

Sometimes v.currentTime is observed to be slightly greater than the seek target in the 'seeked' handler if the video is not paused. Because the state machine continues playback immediately after seek is completed and there is a race between the MediaDecoder::mLogicalPosition (which is updated by mCurrentPosition which is mirrored from the state machine thread) being updated on the main thread and 'seeked' being fired on the main thread.

Btw, Chrome also present the same issue.

We should use fuzzy compare (+-0.1) instead of exact compare in the test case.
Flags: needinfo?(jwwang)
Well, that's good news. The fact that it appears more often than before isn't something to worry? 

One more minor question, I guess you include the minus sign (-0.1) in the fuzzy compare because you do not know which is minuend and which the subtrahend? My understanding is that the current time inside seeked event can only be greater than the seek target.
Flags: needinfo?(jwwang)
1. I guess your code kinda changes the contention between threads.

2. Yes.
Flags: needinfo?(jwwang)
Attachment #8794095 - Flags: review?(jwwang) → review+
This made bug 1256079 a new frequent intermittent for Linux x64 opt and pgo. This didn't get backed out because these jobs are Tier 2, but despite that it should be fixed soon, else these mda jobs would get hidden.
Flags: needinfo?(achronop)
Depends on: 1305601
Bug 1256079 landed and merged in m-c.
Flags: needinfo?(achronop)
Depends on: 1325215
Ryan: I believe we can remove part of the patch and not all of it. According to the comments Bug 1325215 leak is caused by this specific push of the patch: 

https://hg.mozilla.org/mozilla-central/rev/b3fe9952f090. 

This change and a small part of the first push https://hg.mozilla.org/mozilla-central/rev/1db5e766a1ec (change inside pulse_stream_stop) can be removed independently of the rest of the patch. It was originally added to prevent an independent sporadic asan failure described in #c28.

If you agree I can create a push for Aurora.
Flags: needinfo?(ryanvm)
I'll defer to your judgement here. I certainly am not opposed to landing the most minimal patch we can, though :)
Flags: needinfo?(ryanvm)
Please also reenable the leak checking for Aurora crash tests when you do that.
Depends on: 1403043
You need to log in before you can comment on or make changes to this bug.