Closed Bug 1152439 Opened 9 years ago Closed 9 years ago

[Crash] [@ mozilla::StreamBuffer::TrackIter::FindMatch | mozilla::MediaStreamGraphImpl::UpdateStreamOrder ]

Categories

(Firefox OS Graveyard :: Stability, defect)

ARM
Gonk (Firefox OS)
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: ntroast, Assigned: slee)

References

Details

(Keywords: crash, Whiteboard: [b2g-crash][caf-crash 566][caf priority: p3][CR 819442])

Attachments

(10 files, 3 obsolete files)

We observed the following crash signature during testing.

[@ mozilla::StreamBuffer::TrackIter::FindMatch | mozilla::MediaStreamGraphImpl::UpdateStreamOrder | mozilla::MediaStreamGraphImpl::UpdateGraph | mozilla::MediaStreamGraphImpl::OneIteration ]

Cafbot will upload the decoded minidump and extra file.

this crash was produced during stability tests which involves monkey testing for several hours and there is no clear STR for this. If we are not able to identify the issue using provided logs then please feel free to provide us a debug patch with additional logging to identify the issue.
Whiteboard: [CR 819442]
Whiteboard: [CR 819442] → [caf priority: p1][CR 819442]
Whiteboard: [caf priority: p1][CR 819442] → [b2g-crash][caf-crash 566][caf priority: p1][CR 819442]
Keywords: crash
Hi Maire,

Please have Paul or someone else on your team pick this up. It's affecting our ability to reach the fxOS 2.2 MTBF goal per CAF's tests. If this isn't code your team handles please help route this to the team that does.

Thanks!
Mike
Flags: needinfo?(mreavy)
Please see bug 1152431 comment 7: https://bugzilla.mozilla.org/show_bug.cgi?id=1152431#c7. We currently believe that this bug and that one have the same root cause.

We really need a regression range from someone who can easily repro this.  This crash does not appear to reproduce on Nightly.  (I asked for help getting a regression range in the other bug.)
Flags: needinfo?(mreavy)
Thanks Maire.

(In reply to Maire Reavy [:mreavy] (Plz needinfo me) from comment #7)
> 
> We really need a regression range from someone who can easily repro this. 
> This crash does not appear to reproduce on Nightly.

Nick,
Can CAF provide a regression range for this issue?

Thanks,
Mike
Flags: needinfo?(ntroast)
blocking-b2g: 2.2? → 2.2+
(In reply to Mike Lee [:mlee] from comment #8)
> Thanks Maire.
> 
> (In reply to Maire Reavy [:mreavy] (Plz needinfo me) from comment #7)
> > 
> > We really need a regression range from someone who can easily repro this. 
> > This crash does not appear to reproduce on Nightly.
> 
> Nick,
> Can CAF provide a regression range for this issue?
> 
> Thanks,
> Mike

We have seen this issue once on March 13th and once on April 8th
Flags: needinfo?(ntroast)
Thanks Nick. Is it possible to provide more detailed build and commit information for last known good and first failed runs similar to what's provided in comment 3?

(In reply to cafbot (PoC: ggrisco) from comment #3)
> Observed on: 
> 
> Device: msm8909
> Gonk Version: AU_LINUX_GECKO_LF.BR.1.2.3.00.00.00.000.123
> Moz BuildID: 20150405002503
> Manifest:
> https://www.codeaurora.org/cgit/quic/lf/b2g/manifest/tree/
> caf_AU_LINUX_GECKO_LF.BR.1.2.3.00.00.00.000.123.xml?h=release
> Gecko Version: 37.0
> Gaia: 
> http://git.mozilla.org/?p=releases/gaia.git;a=commit;
> h=a6351e1197d54f8624523c2db9ba1418f2aa046f
> Gecko:
> http://git.mozilla.org/?p=releases/gecko.git;a=commit;
> h=6bb2afcce9872a7cbc65b4a58f752e2d5ac02345
> Patches: bug 1145724, bug 1143694, bug 1146987, bug 1133398, bug 1152095,
> bug 1150924, bug 1133147, bug 1150271, bug 1150916

Thanks,
Mike
Flags: needinfo?(ntroast)
I'm sorry. The tester did provide a note that they were randomly changing the ringtones when this crash occurred.

We saw this on AU 100 back on March 13th.
Flags: needinfo?(ntroast)
Could bug 1107534 be related?
What do you think? is bug 1107534 related to this bug?
Flags: needinfo?(slee)
(In reply to Ken Chang[:ken] from comment #13)
> What do you think? is bug 1107534 related to this bug?

It looks like dangling pointer. For the minidump below, it crashed at 0x5a5a5a62 and r2 is 0x5a5a5a5a. It means that r2 has already been deleted somewhere. 

Hi ntroast,

Why do you think this bug is related to bug 1107534?

Crash address: 0x5a5a5a62
Thread 22 (crashed)
 0  libxul.so!mozilla::StreamBuffer::TrackIter::FindMatch [StreamBuffer.h : 95 + 0x0]
     r0 = 0xafd83b48    r1 = 0xb37dc680    r2 = 0x5a5a5a5a    r3 = 0x00000001
Flags: needinfo?(slee) → needinfo?(ntroast)
I see the Ringtones app in all of the logs, and that was the most recent bug that I know of that dealt with the Ringtones app. Just trying to look at the big picture here since we are seeing a lot of Ringtones app in these MSG bugs.

(In reply to StevenLee[:slee] from comment #17)
> (In reply to Ken Chang[:ken] from comment #13)
> > What do you think? is bug 1107534 related to this bug?
> 
> It looks like dangling pointer. For the minidump below, it crashed at
> 0x5a5a5a62 and r2 is 0x5a5a5a5a. It means that r2 has already been deleted
> somewhere. 
> 
> Hi ntroast,
> 
> Why do you think this bug is related to bug 1107534?
> 
> Crash address: 0x5a5a5a62
> Thread 22 (crashed)
>  0  libxul.so!mozilla::StreamBuffer::TrackIter::FindMatch [StreamBuffer.h :
> 95 + 0x0]
>      r0 = 0xafd83b48    r1 = 0xb37dc680    r2 = 0x5a5a5a5a    r3 = 0x00000001
Flags: needinfo?(ntroast)
We were seeing crashes like this even before bug 1107534 was fixed.  I used to be able to crash it manually by simply changing the ringtone selection multiple times.  But then at some point I wasn't able to reproduce it manually anymore.  Now we are seeing multiple crashes in ringtone with several different signatures during stability test runs.  I'll open a couple new bugs to share those logs.
Regression-window was added to a specific party that's not QAnalysts (see comment 8). Also, this seems to be happening on a device that we don't have. Currently we have ringtone related bugs on Flame device: bug 1147386 and bug 1139157.

Adding keyword to exclude this in our queries.
QA Whiteboard: QAExclude
Flags: needinfo?(ktucker)
QA Whiteboard: QAExclude → [QAnalyst-Triage+] QAExclude
Flags: needinfo?(ktucker)
(In reply to Greg Grisco from comment #19)
> We were seeing crashes like this even before bug 1107534 was fixed.  I used
> to be able to crash it manually by simply changing the ringtone selection
> multiple times.  But then at some point I wasn't able to reproduce it
About when? Maybe I can checkout the corresponsive code and see if I can reproduce on nexus.

> manually anymore.  Now we are seeing multiple crashes in ringtone with
> several different signatures during stability test runs.  I'll open a couple
> new bugs to share those logs.
Can you cc me in these bugs?
(In reply to StevenLee[:slee] from comment #21)
> (In reply to Greg Grisco from comment #19)
> > We were seeing crashes like this even before bug 1107534 was fixed.  I used
> > to be able to crash it manually by simply changing the ringtone selection
> > multiple times.  But then at some point I wasn't able to reproduce it
> About when? Maybe I can checkout the corresponsive code and see if I can
> reproduce on nexus.

I think I was seeing this on AU77 and then it went away for a long time.  You could try AU77 which had these gaia/gecko revisions:

gaia: da509caa7395d3d090ce973e8de082b4680a590d
gecko: 782bc95dce066e2b262f556adf947a9a1409e6a0

> > manually anymore.  Now we are seeing multiple crashes in ringtone with
> > several different signatures during stability test runs.  I'll open a couple
> > new bugs to share those logs.
> Can you cc me in these bugs?

Sure, I'll cc: you on these:

bug 1154030, bug 1154042, bug 1154051, bug 1152439, bug 1150918
Attached file logcat
Hi roc, 
I removed some unused log(should be from the vendor) for making the log easier to read. I found that some MediaStream was deleted more than once, ex 0xb094e280 is deleted at least thrice. Is that correct? Could it cause some problem?
Flags: needinfo?(roc)
(In reply to StevenLee[:slee] from comment #23)
> I removed some unused log(should be from the vendor) for making the log
> easier to read. I found that some MediaStream was deleted more than once, ex
> 0xb094e280 is deleted at least thrice. Is that correct? Could it cause some
> problem?

Sorry, it's hard to tell because between two destruction calls for a particular address, the object might have been constructed at that address. I guess we should log the constructor too.

The mention of bug 1107534 and the Ringtones app makes me wonder whether this is a bug where we connect streams from different MSGs. I can't exactly see how the Ringtones app would trigger this, but it does create perhaps more than one new AudioContext and connect the same element to each one. I'm pretty sure that we don't have anything preventing an element from being connected to two different MSGs and then crashing (since the element's decoder MediaStream would be created in one MSG and then connected to a MediaStream in a different graph). Alastor, can you test creating two AudioContexts with different channel types and connecting the same element to both of them? We need to block that!
Flags: needinfo?(roc) → needinfo?(alwu)
Attached patch more assertions, more logging (obsolete) — Splinter Review
This patch supercedes the previous patch in bug 1149494. It adds logging for MediaStream constructors as well as destructors. It also adds some asserts to check that we don't connect MediaStreams between different MediaStreamGraphs.

Can we get this into the CodeAurora tests?
Flags: needinfo?(ntroast)
Attached file patch (obsolete) —
Hi roc,

As I know, CAF does not enable DEBUG. Do you think MOZ_CRASH help?
Attachment #8592836 - Flags: feedback?(roc)
Comment on attachment 8592836 [details]
patch

I uploaded the wrong patch.
Attachment #8592836 - Attachment is obsolete: true
Attachment #8592836 - Flags: feedback?(roc)
Comment on attachment 8592739 [details] [diff] [review]
more assertions, more logging

Review of attachment 8592739 [details] [diff] [review]:
-----------------------------------------------------------------

As comment 26, CAF doesn't enable DEBUG. We should use other way to log.
OK, I will test it.
keep ni for tracking.
Attached patch msgTest.patchSplinter Review
I write a test app to connect two differences MSG, but it didn't crash.
I will take a look for more details.
Flags: needinfo?(alwu)
Here's an updated patch using MOZ_RELEASE_ASSERT that should work in non-DEBUG.
Attachment #8592739 - Attachment is obsolete: true
See Also: → 1152431
I added attachment 8593277 [details] [diff] [review] to our build. I also removed the patch from bug 1150923 due to conflict, and this one I believe supersedes it.

It should be included within the next two AUs
Flags: needinfo?(ntroast)
I found some operations that could cause memory pressure. I'm not sure whether it could cause the problem. 

Repeat doing the 3 steps. 
1. Open ringtone app from settings
2. Select a ringtone to preview
3. go to homescreen

In the ringtone app, it creates an AudioElement in [1]. In step 2, it creates an AudioContext and connect to the AudioElement, [2]. In step 3, AudioContext is set to null, [3], but cycle collector finds that it cannot delete the AudioContext. That's because the AudioContext is pointed by the AudioElement. 

[1] https://git.mozilla.org/?p=releases/gaia.git;a=blob;f=apps/ringtones/js/tone_player.js;h=aa92cd42d894dc991a23ba99f9282c5cc7ca8df8;hb=refs/heads/v2.2#l17
[2] https://git.mozilla.org/?p=releases/gaia.git;a=blob;f=apps/ringtones/js/tone_player.js;h=aa92cd42d894dc991a23ba99f9282c5cc7ca8df8;hb=refs/heads/v2.2#l146
[3] https://git.mozilla.org/?p=releases/gaia.git;a=blob;f=apps/ringtones/js/tone_player.js;h=aa92cd42d894dc991a23ba99f9282c5cc7ca8df8;hb=refs/heads/v2.2#l151
Whiteboard: [b2g-crash][caf-crash 566][caf priority: p1][CR 819442] → [b2g-crash][caf-crash 566][caf priority: p3][CR 819442]
(In reply to StevenLee[:slee] from comment #34)
> I found some operations that could cause memory pressure. I'm not sure
> whether it could cause the problem. 
> 
> Repeat doing the 3 steps. 
> 1. Open ringtone app from settings
> 2. Select a ringtone to preview
> 3. go to homescreen
> 
> In the ringtone app, it creates an AudioElement in [1]. In step 2, it
> creates an AudioContext and connect to the AudioElement, [2]. In step 3,
> AudioContext is set to null, [3], but cycle collector finds that it cannot
> delete the AudioContext. That's because the AudioContext is pointed by the
> AudioElement. 

Are we permanently leaking? What keeps the AudioElement alive?
Flags: needinfo?(slee)
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #35)
> Are we permanently leaking? What keeps the AudioElement alive?
No, but as long as the AudioElement is alive, all the MediaStreams(held by AudioContext) will not be deleted. In a simplified situation is that Gaia developers may create an AudioElement and input this AudioElement to many AudioContext. As long as the AudioElement is alive, all the AudioContexts cannot be deleted.

In this case, TonePlayer is created in a page and this page is always alive(except we kill ringtone app).
1. TonePlayer creates one AudioElement. 
2. When users select one ringtone to preview, it creates an AudioContext and link this AudioElement to the AudioContext. 
3. When it detects the window is hidden, it sets AudioContext to null.

If we repeat 2 and 3(go to homescreen then go back to ringtone app and preview), we get many AudioContexts that are related to an AudioElement.
Flags: needinfo?(slee)
OK, how does the AudioElement keep links to many AudioContexts?
Flags: needinfo?(slee)
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #37)
> OK, how does the AudioElement keep links to many AudioContexts?
In [1], this._player is the AudioElement. Since TonePlayer keeps the same AudioElement and TonePlayer keeps alive, the reference count of AudioContext will never be zero. 

[1] https://git.mozilla.org/?p=releases/gaia.git;a=blob;f=apps/ringtones/js/tone_player.js;h=aa92cd42d894dc991a23ba99f9282c5cc7ca8df8;hb=refs/heads/v2.2#l147
Flags: needinfo?(slee)
Hmm, that doesn't answer what I asked. I think the answer to my question is that HTMLMediaElement::mOutputStreams never has elements removed from it (except for when they have mFinishWhenEnded set, which they won't here), so that array will just keep growing with new streams every time createMediaElementSource is called.

We should fix that, but I think a fix will be quite risky, so it would be good to find a workaround as well.

It seems to me that _setExclusiveMode(false) is supposed to cause the ringer AudioContext to be (eventually) destroyed, letting background music play again. However, currently it seems to me the ringer AudioContext will never be destroyed so that doesn't work. Am I right?

If so, then we might fix the crashes just by making _setExclusiveMode(false) not do anything and _setExclusiveMode(true) not creating a new AudioContext and MediaElementAudioSourceNode, if they already exist. How does that sound?
Flags: needinfo?(slee)
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #39)
> Hmm, that doesn't answer what I asked. I think the answer to my question is
> that HTMLMediaElement::mOutputStreams never has elements removed from it
> (except for when they have mFinishWhenEnded set, which they won't here), so
> that array will just keep growing with new streams every time
> createMediaElementSource is called.
Sorry for late response, I spent some time tracing and verifying the scenario. Yes, I found that if I force mFinishWhenEnded to true, AudioContext will be deleted on the same operations. 

> It seems to me that _setExclusiveMode(false) is supposed to cause the ringer
> AudioContext to be (eventually) destroyed, letting background music play
> again. However, currently it seems to me the ringer AudioContext will never
> be destroyed so that doesn't work. Am I right?
> 
> If so, then we might fix the crashes just by making _setExclusiveMode(false)
> not do anything and _setExclusiveMode(true) not creating a new AudioContext
> and MediaElementAudioSourceNode, if they already exist. How does that sound?
Sure, I also want to fix the problem from the app side. I will upload a patch later. Thanks for the suggestion.
Flags: needinfo?(slee)
Attached file prevent creating multiple AudioContext (obsolete) —
Hi David,

Per comment 36 and comment 39, we want ringtone app not to create multiple AudioContext. Please help review this patch. Thanks.
Attachment #8596519 - Flags: review?(dflanagan)
Sorry, I pasted the wrong link.
Attachment #8596519 - Attachment is obsolete: true
Attachment #8596519 - Flags: review?(dflanagan)
Attachment #8596520 - Flags: review?(dflanagan)
Comment on attachment 8596520 [details] [review]
prevent creating multiple AudioContext

Passing this review to Jim Porter who is in charge of the Ringtones app.
Attachment #8596520 - Flags: review?(dflanagan) → review?(squibblyflabbetydoo)
I think this is a UX change (the AudioContext stuff is designed to mute all background audio once you start previewing a ringtone, and then let the other audio resume as soon as the ringtones app is minimized), so you'll need UX feedback before I can do much with this. Personally, if things are so busted that we can't just fix the Gecko issue for 2.2, I'd rather we drop the AudioContext code entirely.

Tiffanie Shakespeare would probably be a good person to ask about the UX side.
Assignee: nobody → slee
Hi, Jim,
After landing the bug1027172, the audio context would be muted when its destination node doesn't have input.
Therefore, the UX designed feature doesn't take effect now. I think we need other methods to implement this feature.
If that's the case (I haven't tested it), let's just drop the AudioContext stuff entirely, since it doesn't do anything. I guess we could feed the AudioContext some empty audio data after the ringtone stops, but I've never been quite sure how to do that with AudioContext, and I'm not sure it's worth the effort anyway.

For 3.0, it sounds like this will be fixed with the new audio competing policy code.
Alastor, are you or Steven working on what Jim said in comment 46? Or do you have any other idea?
Hi, Ken,
We don't involve on this preview feature. I think that might be solved by Jim's proposal which is to send the AudioContext some empty audio data after the ringtone stops. 
As I know, the audio competing policy is not related with solving that problem. Because we just move the decision logic from the Gecko to Gaia, the background music would still be resume in the new competing rule when the ringer previews ended.
Hi Nicholas,

As I know, Greg resolved many crash problem as "WORKSFORME". Can you still reproduce this issue? If not, I will remove the blocker flag. 

Thanks.
Flags: needinfo?(ntroast)
(In reply to StevenLee[:slee] from comment #49)
> Hi Nicholas,
> 
> As I know, Greg resolved many crash problem as "WORKSFORME". Can you still
> reproduce this issue? If not, I will remove the blocker flag. 
> 
> Thanks.

Hi Steven, we haven't seen this crash since AU 126.  I didn't close this bug in particular because it looked like there was a patch being worked on, but I'm ok if you want to close it for now until it reproduces again.
Flags: needinfo?(ntroast)
(In reply to Greg Grisco from comment #50)
> Hi Steven, we haven't seen this crash since AU 126.  I didn't close this bug
> in particular because it looked like there was a patch being worked on, but
> I'm ok if you want to close it for now until it reproduces again.
Greg, thanks for the information.
blocking-b2g: 2.2+ → ---
No longer blocks: CAF-v2.2-metabug
Steven & Jim,

Are you actively working on patch attachment 8596520 [details] [review] [1]? If not since the reported issue isn't appearing in CAF testing, please close this.

Thanks,
Mike
Flags: needinfo?(squibblyflabbetydoo)
Flags: needinfo?(slee)
(In reply to Mike Lee [:mlee] from comment #52)
> Steven & Jim,
> 
> Are you actively working on patch attachment 8596520 [details] [review]? If not
> since the reported issue isn't appearing in CAF testing, please close this.
OK. We will fix the AudioContext-cannot-be-released problem in bug 1156586.
Status: NEW → RESOLVED
Closed: 9 years ago
Flags: needinfo?(slee)
Resolution: --- → WONTFIX
Flags: needinfo?(squibblyflabbetydoo)
Comment on attachment 8596520 [details] [review]
prevent creating multiple AudioContext

Clearing review since this bug is WONTFIX.
Attachment #8596520 - Flags: review?(squibblyflabbetydoo)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: