Closed Bug 952893 Opened 6 years ago Closed 6 years ago

[B2G] AudioStream (from AudioDestinationNode) Continues to Write Silent Samples until Killing the Dialer App

Categories

(Core :: Web Audio, defect, P1, critical)

ARM
Gonk (Firefox OS)
defect

Tracking

()

RESOLVED FIXED
mozilla29
blocking-b2g 1.3+
Tracking Status
firefox27 --- wontfix
firefox28 --- fixed
firefox29 --- fixed
b2g-v1.3 --- fixed
b2g-v1.3T --- fixed
b2g-v1.4 --- fixed

People

(Reporter: mchen, Assigned: roc)

References

Details

(Keywords: perf, power, regression, Whiteboard: [c=power p= s= u=1.3] [CR 595623] [CR 608536] [CR 606932])

Attachments

(6 files, 8 obsolete files)

361 bytes, text/html
Details
5.64 KB, patch
padenot
: review+
Details | Diff | Splinter Review
2.05 KB, patch
padenot
: review+
Details | Diff | Splinter Review
1.06 KB, patch
Details | Diff | Splinter Review
25.16 KB, patch
karlt
: review+
Details | Diff | Splinter Review
32.00 KB, patch
Details | Diff | Splinter Review
Device: Inari

Build: Gecko : HG changeset:   159455:85196889c598
       Gaia : e2e7add56f545f2086ab15525ba20b111e5e386f

STR
1. Launch dialer app.
2. No matter dialer app is on foreground or background.

Expected
There is no any PCM data output to audio backend if user didn't press the key.

Actual
There is always silent PCM data output to backend.

Impact
1. This will prevent device from going to deep suspend mode. The normal sleep current will be about 2mA without SIM card attached but it will be more then 40mA in this case.

2. Also impact the idle current from about 150mA to 160mA.


Additional Notes
Hi Ehsan,

I found this issue caused by Gaia to use Web Audio but maybe you can provide some suggestion from perspective of Gecko.

Thanks.
Flags: needinfo?(ehsan)
That depends on what the dialer app is doing in order to generate the sound.  For example, do you create the AudioContext on demand?
Flags: needinfo?(ehsan) → needinfo?(etienne)
Nominate to 1.3+.

User Impact:
  Once dialer app is launched in foreground or background, the suspend time of device will become shorter (ex: 1400 (mA/H) /  40 (mA/H) = 37 hours).

  (ideal case : 1400 / 10 = 140 Hours)
blocking-b2g: --- → 1.3?
(In reply to :Ehsan Akhgari (needinfo? me!) from comment #2)
> That depends on what the dialer app is doing in order to generate the sound.
> For example, do you create the AudioContext on demand?

We create/trash it based on visibility changes, just tested and the methods [1] are properly called.
Is there something else we need to do to make sure the context gets properly trashed?

[1] https://github.com/mozilla-b2g/gaia/blob/master/apps/communications/dialer/js/tone_player.js#L19-29
Flags: needinfo?(etienne)
One more information.

I added log into AudioContext's constructor, deconstruct and shutdown API.

  1. Launch dialer app : Constructor is called.
  2. Put dialer app into background : nothing.
  3. put dialer app into foreground : Constructor called again.
  4. repeat step 2 and step 3 : see constructor called again and again.
  5. kill dialer app : only shutdown is called by multiple times for these AudioContext created by step 2 to 4.

There are also memory leakage there I suspect it is related to strong reference between AudioContext, media stream and mediaStreamGraphic.
Hmm, then I'm not sure what keeps these objects alive without debugging.
This will impact the battery life seriously so I set it to p1 for v1.3. 

Hi Michael,

Did you notice this power issue on your side?
Severity: normal → critical
Flags: needinfo?(mvines)
Priority: -- → P1
Flags: needinfo?(mvines)
Flags: needinfo?(bhargavg1)
Reg power, only Rx PCM will get logged when dialer app is launched, for playing DTMF tones AFE port (RX Path) is opened and when there is no key pressed silent will get logged. And this shouldn't prevent any low power modes being reached. This behavior is same as in LA
Flags: needinfo?(bhargavg1)
Depends on: 955961
Blocks: 955961
No longer depends on: 955961
(In reply to bhargavg1 from comment #8)
> Reg power, only Rx PCM will get logged when dialer app is launched, for
> playing DTMF tones AFE port (RX Path) is opened and when there is no key
> pressed silent will get logged. And this shouldn't prevent any low power
> modes being reached. This behavior is same as in LA

This is not related to LPA.

What I checked with kernel message is listed as below.

  <6>[01-08 09:08:20.861] [2455: kworker/u:1]active wake lock adsp
  <6>[01-08 09:08:20.861] [2455: kworker/u:1]active wake lock audio_pcm

It shows that when dialer app is launched without any additional action and user presses power key to suspend.
The system can't go into power collapse mode. (the measured current report is on description)
Maybe you can also check it from power monitor.
Flags: needinfo?(bhargavg1)
does not seem like a dialer bug
change component to general
Component: Gaia::Dialer → General
(In reply to Joe Cheng [:jcheng] from comment #10)
> does not seem like a dialer bug
> change component to general

If this isn't a Gaia bug, then it's likely a Web Audio bug.
Component: General → Web Audio
Product: Firefox OS → Core
blocking+ also, as this has severe impact to power consumption.
blocking-b2g: 1.3? → 1.3+
Keywords: perf, power, regression
Adding regression keyword this is considered a fallout from switching over to Web Audio.
Rob -- We need a fix for this pretty quickly, and Karl and Paul are pretty swamped with other bugs.  Can I give this bug to you?  If you think you won't have time for this bug this week, please let me know so I can reassign it to someone else.  Thanks.
Assignee: nobody → roc
Flags: needinfo?(roc)
(In reply to Marco Chen [:mchen] from comment #9)
> (In reply to bhargavg1 from comment #8)
> > Reg power, only Rx PCM will get logged when dialer app is launched, for
> > playing DTMF tones AFE port (RX Path) is opened and when there is no key
> > pressed silent will get logged. And this shouldn't prevent any low power
> > modes being reached. This behavior is same as in LA
> 
> This is not related to LPA.
> 
> What I checked with kernel message is listed as below.
> 
>   <6>[01-08 09:08:20.861] [2455: kworker/u:1]active wake lock adsp
>   <6>[01-08 09:08:20.861] [2455: kworker/u:1]active wake lock audio_pcm
> 
> It shows that when dialer app is launched without any additional action and
> user presses power key to suspend.
> The system can't go into power collapse mode. (the measured current report
> is on description)
> Maybe you can also check it from power monitor.

Yes I see that we aren't hitting the low power modes.

FYI, Another quick way to check low power modes w/o power monitor is to check for,

cat /sys/kernel/debug/rpm_stats 

vmin ctr should be incrementing, (ofcourse USB has to be disconnected for entering into low power)
Flags: needinfo?(bhargavg1)
(In reply to Marco Chen [:mchen] from comment #5)
> One more information.
> 
> I added log into AudioContext's constructor, deconstruct and shutdown API.
> 
>   1. Launch dialer app : Constructor is called.
>   2. Put dialer app into background : nothing.
>   3. put dialer app into foreground : Constructor called again.
>   4. repeat step 2 and step 3 : see constructor called again and again.
>   5. kill dialer app : only shutdown is called by multiple times for these
> AudioContext created by step 2 to 4.
> 
> There are also memory leakage there I suspect it is related to strong
> reference between AudioContext, media stream and mediaStreamGraphic.

Did a cycle collection run at any point during this process? Cycle collection has to run before any AudioContexts will be destroyed, and we can run for quite a long time between cycle collections. Can you try triggering one (about:memory can trigger one via "minimize memory usage", but I don't know how to do that on b2g)?

Maybe we should "minimize memory usage" automatically when (or soon after) an app has gone into the background? I don't know what GC heuristics we're using in B2G.
Flags: needinfo?(roc)
> Did a cycle collection run at any point during this process? Cycle
> collection has to run before any AudioContexts will be destroyed, and we can
> run for quite a long time between cycle collections. Can you try triggering
> one (about:memory can trigger one via "minimize memory usage", but I don't
> know how to do that on b2g)?

https://github.com/mozilla-b2g/B2G/blob/master/tools/get_about_memory.py is the script used to get memory reports on B2G, and you can pass it the --minimize flag, which triggers minimization before measuring.

> Maybe we should "minimize memory usage" automatically when (or soon after)
> an app has gone into the background?

I'm pretty sure we do that, and if we don't, we should. mwu, do you know?
Flags: needinfo?(mwu)
We also cycle collect roughly every 10 seconds when the browser is active.
Not sure. (I'd normally ask jlebar)
Flags: needinfo?(mwu)
I performed "./get_about_memory.py --minimize" during this issue happened and dialer app on the background.
This issue is still there.
Flags: needinfo?(mchen)
After hooking to addref and release functions, the ref count will be remained to 1 after app on the background.

The sequence of addref and release when launching is
  1. addref by AudioDestinationNode
  2. addref by AudioContext::Constructor() returned a already_AddRefed<AudioContext>
  3. addref by AudioContextBinding::Wrap()
  
  4. release by 2 is out of scope in AudioContextBinding::_construct.

  5. addref by AudioContext::CreateGain()

The sequence when app falls to background is
  6. release by GC to triggered from "memory-pressure"
  7. release by CC to reclaim AudioDestinationNode -> AudioNode::unlink().
  5.
Thank you! It sounds like we were able to collect the DestinationNode but the GainNode leaked.
Although we create a lot more nodes than just a GainNode and DestinationNode, so I wonder why you didn't see AddRefs for those.
What buttons are you pressing to make this bug happen? Are you pressing buttons that invoke _startAt() with duration == 0? Does the bug go away if you don't press such buttons?

Also, it seems to me that if you call _startAt() twice with duration == 0, we leak the GainNode and oscillators and never stop playing them. But I guess that's not happening here since it should be audible.

What we really need to do here is dump the CC graph so we can see why CC isn't collecting the AudioContext. njn, do you know how to do that with B2G?
Flags: needinfo?(n.nethercote)
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #24)
> Although we create a lot more nodes than just a GainNode and
> DestinationNode, so I wonder why you didn't see AddRefs for those.
The reproduces steps of this bug is
  1. Launch dialer app in the foreground.
  2. Then just press home to put dialer app into background.

In this cause, just destination and gain nodes are created.
https://github.com/mozilla-b2g/gaia/blob/7e84e2342b81cca00694efa5245d6e0790228f27/apps/communications/dialer/js/tone_player.js#L24
OK, that's interesting. Thanks.

We can definitely collect a simple setup like that in a simple test. So the CC dump would be a very useful way to proceed.

An addref/release log for the GainNode might also be enough.
The suggestion from Rob in comment 25 will try to figure out why AudioContext didn't be collected when dialer app in background.

But there is still an issue we need to solve when dialer app is on the foreground.
Now once dialer app is on the foreground (we expected AudioContext alive in gaia code now), the silent PCM frames will be pushed from MediaStreamGraph even user didn't press any buttons. This will be a problem also.
We guess we can pause the cubeb output stream when the destination node has nothing connected to it.
... but that might hurt latency when a new sound is started.
If we allow silent frames to keep going then system can't go into power collapse mode due to AudioFlinger held a wakelock for PCM coming. We must avoid such situation or power consumption will be impacted deeply.
> What we really need to do here is dump the CC graph so we can see why CC
> isn't collecting the AudioContext. njn, do you know how to do that with B2G?

The GC/CC logs are collected by default by the get_about_memory.py script: https://github.com/mozilla-b2g/B2G/blob/master/tools/get_about_memory.py#L9
Flags: needinfo?(n.nethercote)
I know.

I think we should do these things:
1) Block the DestinationNode's MediaStream while it has no inputs connected. To ensure audioContext's currentTime keeps running, we'll measure real time while the DestinationNode is blocked and add that to the currentTime.
This probably won't help unless we fix the leak of the GainNode.
2) While a MediaStream with audio output is blocked, we'll pause the cubeb stream.
3) When all MediaStreams in a MediaStreamGraph are blocked, we'll allow the entire graph to go to sleep. Right now we keep it running to keep feeding silence to the audio output streams.
These two patches should fix this bug *if* we also change TonePlayer to create a fresh GainNode in every start() instead of sharing a single GainNode and keeping it around, and we make sure that those GainNodes don't leak (for which we probably still need to collect a CC graph for debugging).

We could expand these optimizations to detect more "no audio is currently being produced" cases, but I think doing that in the MSG itself makes more sense and I want to fix bug 897796 first.
Attachment #8359045 - Attachment is obsolete: true
Attachment #8359045 - Flags: review?(padenot)
Attachment #8359046 - Attachment is obsolete: true
Attachment #8359046 - Flags: review?(padenot)
Attachment #8359235 - Attachment description: Part 2: Block the AudioDestinationNode when it's the only node in the AudioContext (rebased) → Part 2: When all streams in the MSG are blocked, allow the MSG to suspend indefinitely and pause AudioStream outputs while suspended (rebased)
Comment on attachment 8359236 [details] [diff] [review]
Part 1: Block the AudioDestinationNode when it's the only node in the AudioContext (rebased)

Review of attachment 8359236 [details] [diff] [review]:
-----------------------------------------------------------------

Consider this scenario:

- The author creates an AudioContext at the start of the page
- compressed data are loaded from XHR, this is quite long
- decodeAudioData is then used to decode the assets for playback
- the real application starts after the loading time: the first AudioNode is created

At this point, ExtraCurrentTime() returns the amount of time we spent loading the assets, and everything is delayed by ExtraCurrentTime() seconds.

Maybe we should special-case the time between the creation of the context and the creation of the first node.

Also, trying to troubleshot the delay, I found that for some reason, we always have another graph iteration scheduled when we would like to pause the graph because there is only the destination node, so we don't pause the AudioStream.

Attached is the minimal testcase that does _approximately_ what the dialer do.
Attachment #8359236 - Flags: review?(padenot)
Attached file testcase
(In reply to Paul Adenot (:padenot) from comment #39)
> - The author creates an AudioContext at the start of the page
> - compressed data are loaded from XHR, this is quite long
> - decodeAudioData is then used to decode the assets for playback
> - the real application starts after the loading time: the first AudioNode is
> created
> 
> At this point, ExtraCurrentTime() returns the amount of time we spent
> loading the assets, and everything is delayed by ExtraCurrentTime() seconds.
> 
> Maybe we should special-case the time between the creation of the context
> and the creation of the first node.

You mean all audio output is delayed by ExtraCurrentTime() seconds? I don't see anything like that. My testcase:

<script>
var ac = new AudioContext();
function addNode() {
  var buf = ac.createBuffer(1, 48000, 48000);
  var d = buf.getChannelData(0);
  for (var i = 0; i < d.length; ++i) {
    d[i] = Math.sin((i/48000)*4500*Math.PI)*0.3;
  }
  var beep = ac.createBufferSource();
  beep.buffer = buf;
  beep.connect(ac.destination);
  beep.start(0);
}
</script>
<p> Wait some time, and then hit the button. </p>
<button onclick="addNode()"> add node </button>

The sounds plays immediately when I click.

> Also, trying to troubleshot the delay, I found that for some reason, we always have another graph iteration
> scheduled when we would like to pause the graph because there is only the destination node, so we don't pause the
> AudioStream.

Yes, I'm also seeing a problem like that. Looking into it.
Attached patch Part 1 v2 (obsolete) — Splinter Review
Fixed a bug where we kept adding the elapsed time since blocking started to mExtraCurrentTime, so if you query AudioContext.currentTime regularly it grows quadratically.
Attachment #8359236 - Attachment is obsolete: true
Attachment #8360328 - Flags: review?(padenot)
Attached patch Part 1 v3 (obsolete) — Splinter Review
Rebased
Attachment #8360328 - Attachment is obsolete: true
Attachment #8360328 - Flags: review?(padenot)
Attachment #8360337 - Flags: review?(padenot)
Attached patch Part 2 v2Splinter Review
My rebase :-). Patch is unchanged.
Attachment #8359235 - Attachment is obsolete: true
Attachment #8359235 - Flags: review?(padenot)
Attachment #8360338 - Flags: review?(padenot)
In my test, this is the bug that caused the MSG to not actually sleep. We weren't setting allBlockedForever because the MediaStream's mBlocked had mCurrent true but also had an mChange entry setting the value to true, so we thought the mBlocked value was going to change.
Attachment #8360339 - Flags: review?(padenot)
Comment on attachment 8360337 [details] [diff] [review]
Part 1 v3

Review of attachment 8360337 [details] [diff] [review]:
-----------------------------------------------------------------

::: content/media/webaudio/AudioDestinationNode.h
@@ +92,5 @@
>    nsCOMPtr<nsIAudioChannelAgent> mAudioChannelAgent;
>  
>    // Audio Channel Type.
>    AudioChannel mAudioChannel;
> +<<<<<<< /home/roc/mozilla-central/content/media/webaudio/AudioDestinationNode.h

One of the rebase didn't go that well apparently :)
Status: NEW → ASSIGNED
Whiteboard: [c=power p= s= u=1.3]
Assignee: roc → paul
Assignee: paul → roc
Comment on attachment 8360337 [details] [diff] [review]
Part 1 v3

Review of attachment 8360337 [details] [diff] [review]:
-----------------------------------------------------------------

r+ with the rebase fixed.
Attachment #8360337 - Flags: review?(paul) → review+
Attachment #8360338 - Flags: review?(paul) → review+
Attachment #8360339 - Flags: review?(paul) → review+
Duplicate of this bug: 963220
Blocks: 942267
Duplicate of this bug: 964026
Ryan

We need an uplift here for 1.3
Flags: needinfo?(ryanvm)
There's nothing landed on mozilla-central here to uplift. As usual, I'll take care of it once there is.
Flags: needinfo?(ryanvm)
Whiteboard: [c=power p= s= u=1.3] → [c=power p= s= u=1.3] [CR 595623]
(In reply to Ryan VanderMeulen [:RyanVM UTC-5] from comment #52)
> There's nothing landed on mozilla-central here to uplift. As usual, I'll
> take care of it once there is.

Hi Paul, can you help move this along?  This is a critical fix for us on 1.3.

Thanks,

-Greg
Flags: needinfo?(paul)
roc was on vacations. I believe he's back, now. I'll rebase and land those patches tomorrow if it's not already done.
Flags: needinfo?(paul)
Could we please get a v1.3 version of these patches today so we can check if the issues we attribute to this bug are resolved?
Flags: needinfo?(paul)
I have a rebased patch for mozilla-central but I can't land it there right now because inbound is closed.
Can we get a v1.3 patch attached to this bug in parallel please?  Apparently the m-c patch doesn't apply cleanly.
Attached patch Rollup patch for b2g28 branch (obsolete) — Splinter Review
Someone could can try this patch on the b2g28 branch.
Ta, :tk is on it
Flags: needinfo?(paul) → needinfo?(tkundu)
Whiteboard: [c=power p= s= u=1.3] [CR 595623] → [c=power p= s= u=1.3] [CR 595623] [CR 608536]

(In reply to Michael Vines [:m1] [:evilmachines] from comment #59)
> Ta, :tk is on it

It works fine . Communication app is not running AudioTrack thread after ending call on phone.
Flags: needinfo?(tkundu)
Whiteboard: [c=power p= s= u=1.3] [CR 595623] [CR 608536] → [c=power p= s= u=1.3] [CR 595623] [CR 608536] [CR 606932]
(In reply to Tapas Kumar Kundu from comment #60)
> 
> (In reply to Michael Vines [:m1] [:evilmachines] from comment #59)
> > Ta, :tk is on it
> 
> It works fine . Communication app is not running AudioTrack thread after
> ending call on phone.

It sounds like we have a confirmed fix, and inbound is now open.  I'll ask Paul (:padenot) to land this on inbound and Ryan can then uplift it to Aurora.
Summary: AudioStream (from AudioDestinationNode) Continues to Write Silent Samples until Killing the Dialer App → [B2G] AudioStream (from AudioDestinationNode) Continues to Write Silent Samples until Killing the Dialer App
Backed out for test failures on all platforms.
https://hg.mozilla.org/integration/mozilla-inbound/rev/27c8e496895a

https://tbpl.mozilla.org/php/getParsedLog.php?id=33755339&tree=Mozilla-Inbound

Please make sure the Aurora/b2g28 patch gets updated too :)
Fixed those errors with two simple changes:
-- Disabled part 1 for offline audio contexts since it's just wrong. We shouldn't be accumulating ExtraCurrentTime before StartRendering is called, and we shouldn't be suspending the destination node's stream anyway while there are no input nodes, since the resulting silence must be accumulated in the OfflineAudioContext result buffer instead of being thrown away. That fixes the test_mediaDecoding failures.
-- The test_audioParamTimelineDestinationOffset failures are due to AudioParam setter methods not compensating for ExtraCurrentTime.
Attached patch Part 1 v4 (obsolete) — Splinter Review
Attachment #8360337 - Attachment is obsolete: true
Attachment #8367784 - Flags: review?(karlt)
Attached patch Part 1 v5Splinter Review
Attachment #8367784 - Attachment is obsolete: true
Attachment #8367784 - Flags: review?(karlt)
Attachment #8367785 - Flags: review?(karlt)
Comment on attachment 8367785 [details] [diff] [review]
Part 1 v5

I reviewed the changes from v3.
Attachment #8367785 - Flags: review?(karlt) → review+
Try is taking forever to finish, but previously failing tests now succeed.  Since this fix is very much wanted by our B2G folks ASAP, I've asked Jesup to land the patches on inbound now and see if they stick.
https://tbpl.mozilla.org/?tree=Try&rev=eeb07dafc25b is looking successful on a few platforms.  I don't know whether or not that depends on changes from bug 961996 and bug 941873.
That try push is looking good so I pushed the lot to inbound.
Thanks, Rob -- Can you create an updated rollup patch for b2g28 (v1.3)?
Flags: needinfo?(roc)
Attached patch rollup for b2g28Splinter Review
Attachment #8366994 - Attachment is obsolete: true
Flags: needinfo?(roc)
Ryan - Can you land the rollup patch for us once we're on mozilla-central?
Flags: needinfo?(ryanvm)
(In reply to Maire Reavy [:mreavy] from comment #76)
> Ryan - Can you land the rollup patch for us once we're on mozilla-central?

As always ;)
This hasn't merged to m-c yet, but it's going to stick.

https://hg.mozilla.org/releases/mozilla-aurora/rev/f98bf9a4df97
Flags: needinfo?(ryanvm)
https://hg.mozilla.org/mozilla-central/rev/1424d2e16349
https://hg.mozilla.org/mozilla-central/rev/62a1f3ee8c28
https://hg.mozilla.org/mozilla-central/rev/8d4c383026cd
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Target Milestone: --- → mozilla29
Duplicate of this bug: 955961
Depends on: 970773
Duplicate of this bug: 974844
in-testsuite? here to track that we need a test for the regression in bug 988760.
(I would normally mark the regression bug but 988760 is not in the Web Audio component, so I would never find it.)
Depends on: 988760
Flags: in-testsuite+ → in-testsuite?
Depends on: 1053011
You need to log in before you can comment on or make changes to this bug.