[Crash] [@ mozilla::StreamBuffer::TrackIter::FindMatch | mozilla::MediaStreamGraphImpl::UpdateStreamOrder ]

RESOLVED WONTFIX

Status

Firefox OS
Stability
RESOLVED WONTFIX
3 years ago
3 years ago

People

(Reporter: ntroast, Assigned: slee)

Tracking

({crash})

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [b2g-crash][caf-crash 566][caf priority: p3][CR 819442])

Attachments

(10 attachments, 3 obsolete attachments)

144.18 KB, text/plain
Details
221.63 KB, text/plain
Details
144.18 KB, text/plain
Details
221.63 KB, text/plain
Details
141.15 KB, text/plain
Details
218.23 KB, text/plain
Details
94.11 KB, text/plain
Details
2.62 KB, patch
Details | Diff | Splinter Review
3.55 KB, patch
Details | Diff | Splinter Review
46 bytes, text/x-github-pull-request
Details | Review | Splinter Review
(Reporter)

Description

3 years ago
We observed the following crash signature during testing.

[@ mozilla::StreamBuffer::TrackIter::FindMatch | mozilla::MediaStreamGraphImpl::UpdateStreamOrder | mozilla::MediaStreamGraphImpl::UpdateGraph | mozilla::MediaStreamGraphImpl::OneIteration ]

Cafbot will upload the decoded minidump and extra file.

this crash was produced during stability tests which involves monkey testing for several hours and there is no clear STR for this. If we are not able to identify the issue using provided logs then please feel free to provide us a debug patch with additional logging to identify the issue.
Created attachment 8589774 [details]
EXTRA file attachment -
Created attachment 8589775 [details]
decoded minidump -

Updated

3 years ago
Whiteboard: [CR 819442]

Updated

3 years ago
Whiteboard: [CR 819442] → [caf priority: p1][CR 819442]

Updated

3 years ago
Whiteboard: [caf priority: p1][CR 819442] → [b2g-crash][caf-crash 566][caf priority: p1][CR 819442]

Updated

3 years ago
Keywords: crash
Created attachment 8589821 [details]
EXTRA file attachment - AU_LINUX_GECKO_LF.BR.1.2.3.00.00.00.000.123
Created attachment 8589822 [details]
decoded minidump - AU_LINUX_GECKO_LF.BR.1.2.3.00.00.00.000.123

Comment 6

3 years ago
Hi Maire,

Please have Paul or someone else on your team pick this up. It's affecting our ability to reach the fxOS 2.2 MTBF goal per CAF's tests. If this isn't code your team handles please help route this to the team that does.

Thanks!
Mike
Flags: needinfo?(mreavy)
Please see bug 1152431 comment 7: https://bugzilla.mozilla.org/show_bug.cgi?id=1152431#c7. We currently believe that this bug and that one have the same root cause.

We really need a regression range from someone who can easily repro this.  This crash does not appear to reproduce on Nightly.  (I asked for help getting a regression range in the other bug.)
Flags: needinfo?(mreavy)

Comment 8

3 years ago
Thanks Maire.

(In reply to Maire Reavy [:mreavy] (Plz needinfo me) from comment #7)
> 
> We really need a regression range from someone who can easily repro this. 
> This crash does not appear to reproduce on Nightly.

Nick,
Can CAF provide a regression range for this issue?

Thanks,
Mike
Flags: needinfo?(ntroast)
Keywords: regressionwindow-wanted

Updated

3 years ago
blocking-b2g: 2.2? → 2.2+
(Reporter)

Comment 9

3 years ago
(In reply to Mike Lee [:mlee] from comment #8)
> Thanks Maire.
> 
> (In reply to Maire Reavy [:mreavy] (Plz needinfo me) from comment #7)
> > 
> > We really need a regression range from someone who can easily repro this. 
> > This crash does not appear to reproduce on Nightly.
> 
> Nick,
> Can CAF provide a regression range for this issue?
> 
> Thanks,
> Mike

We have seen this issue once on March 13th and once on April 8th
Flags: needinfo?(ntroast)

Comment 10

3 years ago
Thanks Nick. Is it possible to provide more detailed build and commit information for last known good and first failed runs similar to what's provided in comment 3?

(In reply to cafbot (PoC: ggrisco) from comment #3)
> Observed on: 
> 
> Device: msm8909
> Gonk Version: AU_LINUX_GECKO_LF.BR.1.2.3.00.00.00.000.123
> Moz BuildID: 20150405002503
> Manifest:
> https://www.codeaurora.org/cgit/quic/lf/b2g/manifest/tree/
> caf_AU_LINUX_GECKO_LF.BR.1.2.3.00.00.00.000.123.xml?h=release
> Gecko Version: 37.0
> Gaia: 
> http://git.mozilla.org/?p=releases/gaia.git;a=commit;
> h=a6351e1197d54f8624523c2db9ba1418f2aa046f
> Gecko:
> http://git.mozilla.org/?p=releases/gecko.git;a=commit;
> h=6bb2afcce9872a7cbc65b4a58f752e2d5ac02345
> Patches: bug 1145724, bug 1143694, bug 1146987, bug 1133398, bug 1152095,
> bug 1150924, bug 1133147, bug 1150271, bug 1150916

Thanks,
Mike
Flags: needinfo?(ntroast)
(Reporter)

Comment 11

3 years ago
I'm sorry. The tester did provide a note that they were randomly changing the ringtones when this crash occurred.

We saw this on AU 100 back on March 13th.
Flags: needinfo?(ntroast)
(Reporter)

Comment 12

3 years ago
Could bug 1107534 be related?

Comment 13

3 years ago
What do you think? is bug 1107534 related to this bug?
Flags: needinfo?(slee)
Created attachment 8591510 [details]
EXTRA file attachment - AU_LINUX_GECKO_LF.BR.1.2.3.00.00.00.000.126
Created attachment 8591511 [details]
decoded minidump - AU_LINUX_GECKO_LF.BR.1.2.3.00.00.00.000.126
(Assignee)

Comment 17

3 years ago
(In reply to Ken Chang[:ken] from comment #13)
> What do you think? is bug 1107534 related to this bug?

It looks like dangling pointer. For the minidump below, it crashed at 0x5a5a5a62 and r2 is 0x5a5a5a5a. It means that r2 has already been deleted somewhere. 

Hi ntroast,

Why do you think this bug is related to bug 1107534?

Crash address: 0x5a5a5a62
Thread 22 (crashed)
 0  libxul.so!mozilla::StreamBuffer::TrackIter::FindMatch [StreamBuffer.h : 95 + 0x0]
     r0 = 0xafd83b48    r1 = 0xb37dc680    r2 = 0x5a5a5a5a    r3 = 0x00000001
Flags: needinfo?(slee) → needinfo?(ntroast)
(Reporter)

Comment 18

3 years ago
I see the Ringtones app in all of the logs, and that was the most recent bug that I know of that dealt with the Ringtones app. Just trying to look at the big picture here since we are seeing a lot of Ringtones app in these MSG bugs.

(In reply to StevenLee[:slee] from comment #17)
> (In reply to Ken Chang[:ken] from comment #13)
> > What do you think? is bug 1107534 related to this bug?
> 
> It looks like dangling pointer. For the minidump below, it crashed at
> 0x5a5a5a62 and r2 is 0x5a5a5a5a. It means that r2 has already been deleted
> somewhere. 
> 
> Hi ntroast,
> 
> Why do you think this bug is related to bug 1107534?
> 
> Crash address: 0x5a5a5a62
> Thread 22 (crashed)
>  0  libxul.so!mozilla::StreamBuffer::TrackIter::FindMatch [StreamBuffer.h :
> 95 + 0x0]
>      r0 = 0xafd83b48    r1 = 0xb37dc680    r2 = 0x5a5a5a5a    r3 = 0x00000001
Flags: needinfo?(ntroast)

Comment 19

3 years ago
We were seeing crashes like this even before bug 1107534 was fixed.  I used to be able to crash it manually by simply changing the ringtone selection multiple times.  But then at some point I wasn't able to reproduce it manually anymore.  Now we are seeing multiple crashes in ringtone with several different signatures during stability test runs.  I'll open a couple new bugs to share those logs.
Regression-window was added to a specific party that's not QAnalysts (see comment 8). Also, this seems to be happening on a device that we don't have. Currently we have ringtone related bugs on Flame device: bug 1147386 and bug 1139157.

Adding keyword to exclude this in our queries.
QA Whiteboard: QAExclude
Flags: needinfo?(ktucker)
QA Whiteboard: QAExclude → [QAnalyst-Triage+] QAExclude
Flags: needinfo?(ktucker)
(Assignee)

Comment 21

3 years ago
(In reply to Greg Grisco from comment #19)
> We were seeing crashes like this even before bug 1107534 was fixed.  I used
> to be able to crash it manually by simply changing the ringtone selection
> multiple times.  But then at some point I wasn't able to reproduce it
About when? Maybe I can checkout the corresponsive code and see if I can reproduce on nexus.

> manually anymore.  Now we are seeing multiple crashes in ringtone with
> several different signatures during stability test runs.  I'll open a couple
> new bugs to share those logs.
Can you cc me in these bugs?

Comment 22

3 years ago
(In reply to StevenLee[:slee] from comment #21)
> (In reply to Greg Grisco from comment #19)
> > We were seeing crashes like this even before bug 1107534 was fixed.  I used
> > to be able to crash it manually by simply changing the ringtone selection
> > multiple times.  But then at some point I wasn't able to reproduce it
> About when? Maybe I can checkout the corresponsive code and see if I can
> reproduce on nexus.

I think I was seeing this on AU77 and then it went away for a long time.  You could try AU77 which had these gaia/gecko revisions:

gaia: da509caa7395d3d090ce973e8de082b4680a590d
gecko: 782bc95dce066e2b262f556adf947a9a1409e6a0

> > manually anymore.  Now we are seeing multiple crashes in ringtone with
> > several different signatures during stability test runs.  I'll open a couple
> > new bugs to share those logs.
> Can you cc me in these bugs?

Sure, I'll cc: you on these:

bug 1154030, bug 1154042, bug 1154051, bug 1152439, bug 1150918
(Assignee)

Comment 23

3 years ago
Created attachment 8592609 [details]
logcat

Hi roc, 
I removed some unused log(should be from the vendor) for making the log easier to read. I found that some MediaStream was deleted more than once, ex 0xb094e280 is deleted at least thrice. Is that correct? Could it cause some problem?
Flags: needinfo?(roc)
(In reply to StevenLee[:slee] from comment #23)
> I removed some unused log(should be from the vendor) for making the log
> easier to read. I found that some MediaStream was deleted more than once, ex
> 0xb094e280 is deleted at least thrice. Is that correct? Could it cause some
> problem?

Sorry, it's hard to tell because between two destruction calls for a particular address, the object might have been constructed at that address. I guess we should log the constructor too.

The mention of bug 1107534 and the Ringtones app makes me wonder whether this is a bug where we connect streams from different MSGs. I can't exactly see how the Ringtones app would trigger this, but it does create perhaps more than one new AudioContext and connect the same element to each one. I'm pretty sure that we don't have anything preventing an element from being connected to two different MSGs and then crashing (since the element's decoder MediaStream would be created in one MSG and then connected to a MediaStream in a different graph). Alastor, can you test creating two AudioContexts with different channel types and connecting the same element to both of them? We need to block that!
Flags: needinfo?(roc) → needinfo?(alwu)
Created attachment 8592739 [details] [diff] [review]
more assertions, more logging

This patch supercedes the previous patch in bug 1149494. It adds logging for MediaStream constructors as well as destructors. It also adds some asserts to check that we don't connect MediaStreams between different MediaStreamGraphs.

Can we get this into the CodeAurora tests?
Flags: needinfo?(ntroast)
(Assignee)

Comment 26

3 years ago
Created attachment 8592836 [details]
patch

Hi roc,

As I know, CAF does not enable DEBUG. Do you think MOZ_CRASH help?
Attachment #8592836 - Flags: feedback?(roc)
(Assignee)

Comment 27

3 years ago
Comment on attachment 8592836 [details]
patch

I uploaded the wrong patch.
Attachment #8592836 - Attachment is obsolete: true
Attachment #8592836 - Flags: feedback?(roc)
(Assignee)

Comment 28

3 years ago
Comment on attachment 8592739 [details] [diff] [review]
more assertions, more logging

Review of attachment 8592739 [details] [diff] [review]:
-----------------------------------------------------------------

As comment 26, CAF doesn't enable DEBUG. We should use other way to log.
OK, I will test it.
keep ni for tracking.
Created attachment 8593184 [details] [diff] [review]
msgTest.patch

I write a test app to connect two differences MSG, but it didn't crash.
I will take a look for more details.
Flags: needinfo?(alwu)
Created attachment 8593277 [details] [diff] [review]
more assertions, more logging

Here's an updated patch using MOZ_RELEASE_ASSERT that should work in non-DEBUG.
Attachment #8592739 - Attachment is obsolete: true
(Assignee)

Updated

3 years ago
See Also: → bug 1152431
(Reporter)

Comment 32

3 years ago
I added attachment 8593277 [details] [diff] [review] to our build. I also removed the patch from bug 1150923 due to conflict, and this one I believe supersedes it.

It should be included within the next two AUs
Flags: needinfo?(ntroast)
(Assignee)

Comment 34

3 years ago
I found some operations that could cause memory pressure. I'm not sure whether it could cause the problem. 

Repeat doing the 3 steps. 
1. Open ringtone app from settings
2. Select a ringtone to preview
3. go to homescreen

In the ringtone app, it creates an AudioElement in [1]. In step 2, it creates an AudioContext and connect to the AudioElement, [2]. In step 3, AudioContext is set to null, [3], but cycle collector finds that it cannot delete the AudioContext. That's because the AudioContext is pointed by the AudioElement. 

[1] https://git.mozilla.org/?p=releases/gaia.git;a=blob;f=apps/ringtones/js/tone_player.js;h=aa92cd42d894dc991a23ba99f9282c5cc7ca8df8;hb=refs/heads/v2.2#l17
[2] https://git.mozilla.org/?p=releases/gaia.git;a=blob;f=apps/ringtones/js/tone_player.js;h=aa92cd42d894dc991a23ba99f9282c5cc7ca8df8;hb=refs/heads/v2.2#l146
[3] https://git.mozilla.org/?p=releases/gaia.git;a=blob;f=apps/ringtones/js/tone_player.js;h=aa92cd42d894dc991a23ba99f9282c5cc7ca8df8;hb=refs/heads/v2.2#l151

Updated

3 years ago
Whiteboard: [b2g-crash][caf-crash 566][caf priority: p1][CR 819442] → [b2g-crash][caf-crash 566][caf priority: p3][CR 819442]
(In reply to StevenLee[:slee] from comment #34)
> I found some operations that could cause memory pressure. I'm not sure
> whether it could cause the problem. 
> 
> Repeat doing the 3 steps. 
> 1. Open ringtone app from settings
> 2. Select a ringtone to preview
> 3. go to homescreen
> 
> In the ringtone app, it creates an AudioElement in [1]. In step 2, it
> creates an AudioContext and connect to the AudioElement, [2]. In step 3,
> AudioContext is set to null, [3], but cycle collector finds that it cannot
> delete the AudioContext. That's because the AudioContext is pointed by the
> AudioElement. 

Are we permanently leaking? What keeps the AudioElement alive?
Flags: needinfo?(slee)
(Assignee)

Comment 36

3 years ago
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #35)
> Are we permanently leaking? What keeps the AudioElement alive?
No, but as long as the AudioElement is alive, all the MediaStreams(held by AudioContext) will not be deleted. In a simplified situation is that Gaia developers may create an AudioElement and input this AudioElement to many AudioContext. As long as the AudioElement is alive, all the AudioContexts cannot be deleted.

In this case, TonePlayer is created in a page and this page is always alive(except we kill ringtone app).
1. TonePlayer creates one AudioElement. 
2. When users select one ringtone to preview, it creates an AudioContext and link this AudioElement to the AudioContext. 
3. When it detects the window is hidden, it sets AudioContext to null.

If we repeat 2 and 3(go to homescreen then go back to ringtone app and preview), we get many AudioContexts that are related to an AudioElement.
Flags: needinfo?(slee)
OK, how does the AudioElement keep links to many AudioContexts?
Flags: needinfo?(slee)
(Assignee)

Comment 38

3 years ago
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #37)
> OK, how does the AudioElement keep links to many AudioContexts?
In [1], this._player is the AudioElement. Since TonePlayer keeps the same AudioElement and TonePlayer keeps alive, the reference count of AudioContext will never be zero. 

[1] https://git.mozilla.org/?p=releases/gaia.git;a=blob;f=apps/ringtones/js/tone_player.js;h=aa92cd42d894dc991a23ba99f9282c5cc7ca8df8;hb=refs/heads/v2.2#l147
Flags: needinfo?(slee)
Hmm, that doesn't answer what I asked. I think the answer to my question is that HTMLMediaElement::mOutputStreams never has elements removed from it (except for when they have mFinishWhenEnded set, which they won't here), so that array will just keep growing with new streams every time createMediaElementSource is called.

We should fix that, but I think a fix will be quite risky, so it would be good to find a workaround as well.

It seems to me that _setExclusiveMode(false) is supposed to cause the ringer AudioContext to be (eventually) destroyed, letting background music play again. However, currently it seems to me the ringer AudioContext will never be destroyed so that doesn't work. Am I right?

If so, then we might fix the crashes just by making _setExclusiveMode(false) not do anything and _setExclusiveMode(true) not creating a new AudioContext and MediaElementAudioSourceNode, if they already exist. How does that sound?
Flags: needinfo?(slee)
(Assignee)

Comment 40

3 years ago
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #39)
> Hmm, that doesn't answer what I asked. I think the answer to my question is
> that HTMLMediaElement::mOutputStreams never has elements removed from it
> (except for when they have mFinishWhenEnded set, which they won't here), so
> that array will just keep growing with new streams every time
> createMediaElementSource is called.
Sorry for late response, I spent some time tracing and verifying the scenario. Yes, I found that if I force mFinishWhenEnded to true, AudioContext will be deleted on the same operations. 

> It seems to me that _setExclusiveMode(false) is supposed to cause the ringer
> AudioContext to be (eventually) destroyed, letting background music play
> again. However, currently it seems to me the ringer AudioContext will never
> be destroyed so that doesn't work. Am I right?
> 
> If so, then we might fix the crashes just by making _setExclusiveMode(false)
> not do anything and _setExclusiveMode(true) not creating a new AudioContext
> and MediaElementAudioSourceNode, if they already exist. How does that sound?
Sure, I also want to fix the problem from the app side. I will upload a patch later. Thanks for the suggestion.
Flags: needinfo?(slee)
(Assignee)

Comment 41

3 years ago
Created attachment 8596519 [details] [review]
prevent creating multiple AudioContext

Hi David,

Per comment 36 and comment 39, we want ringtone app not to create multiple AudioContext. Please help review this patch. Thanks.
Attachment #8596519 - Flags: review?(dflanagan)
(Assignee)

Comment 42

3 years ago
Created attachment 8596520 [details] [review]
prevent creating multiple AudioContext

Sorry, I pasted the wrong link.
Attachment #8596519 - Attachment is obsolete: true
Attachment #8596519 - Flags: review?(dflanagan)
Attachment #8596520 - Flags: review?(dflanagan)
Comment on attachment 8596520 [details] [review]
prevent creating multiple AudioContext

Passing this review to Jim Porter who is in charge of the Ringtones app.
Attachment #8596520 - Flags: review?(dflanagan) → review?(squibblyflabbetydoo)

Comment 44

3 years ago
I think this is a UX change (the AudioContext stuff is designed to mute all background audio once you start previewing a ringtone, and then let the other audio resume as soon as the ringtones app is minimized), so you'll need UX feedback before I can do much with this. Personally, if things are so busted that we can't just fix the Gecko issue for 2.2, I'd rather we drop the AudioContext code entirely.

Tiffanie Shakespeare would probably be a good person to ask about the UX side.

Updated

3 years ago
Assignee: nobody → slee
Hi, Jim,
After landing the bug1027172, the audio context would be muted when its destination node doesn't have input.
Therefore, the UX designed feature doesn't take effect now. I think we need other methods to implement this feature.

Comment 46

3 years ago
If that's the case (I haven't tested it), let's just drop the AudioContext stuff entirely, since it doesn't do anything. I guess we could feed the AudioContext some empty audio data after the ringtone stops, but I've never been quite sure how to do that with AudioContext, and I'm not sure it's worth the effort anyway.

For 3.0, it sounds like this will be fixed with the new audio competing policy code.

Comment 47

3 years ago
Alastor, are you or Steven working on what Jim said in comment 46? Or do you have any other idea?
Hi, Ken,
We don't involve on this preview feature. I think that might be solved by Jim's proposal which is to send the AudioContext some empty audio data after the ringtone stops. 
As I know, the audio competing policy is not related with solving that problem. Because we just move the decision logic from the Gecko to Gaia, the background music would still be resume in the new competing rule when the ringer previews ended.
(Assignee)

Comment 49

3 years ago
Hi Nicholas,

As I know, Greg resolved many crash problem as "WORKSFORME". Can you still reproduce this issue? If not, I will remove the blocker flag. 

Thanks.
Flags: needinfo?(ntroast)

Comment 50

3 years ago
(In reply to StevenLee[:slee] from comment #49)
> Hi Nicholas,
> 
> As I know, Greg resolved many crash problem as "WORKSFORME". Can you still
> reproduce this issue? If not, I will remove the blocker flag. 
> 
> Thanks.

Hi Steven, we haven't seen this crash since AU 126.  I didn't close this bug in particular because it looked like there was a patch being worked on, but I'm ok if you want to close it for now until it reproduces again.
Flags: needinfo?(ntroast)
(Assignee)

Comment 51

3 years ago
(In reply to Greg Grisco from comment #50)
> Hi Steven, we haven't seen this crash since AU 126.  I didn't close this bug
> in particular because it looked like there was a patch being worked on, but
> I'm ok if you want to close it for now until it reproduces again.
Greg, thanks for the information.
blocking-b2g: 2.2+ → ---

Updated

3 years ago
No longer blocks: 1063044

Comment 52

3 years ago
Steven & Jim,

Are you actively working on patch attachment 8596520 [details] [review] [1]? If not since the reported issue isn't appearing in CAF testing, please close this.

Thanks,
Mike
Flags: needinfo?(squibblyflabbetydoo)
Flags: needinfo?(slee)
(Assignee)

Comment 53

3 years ago
(In reply to Mike Lee [:mlee] from comment #52)
> Steven & Jim,
> 
> Are you actively working on patch attachment 8596520 [details] [review]? If not
> since the reported issue isn't appearing in CAF testing, please close this.
OK. We will fix the AudioContext-cannot-be-released problem in bug 1156586.
Status: NEW → RESOLVED
Last Resolved: 3 years ago
Flags: needinfo?(slee)
Resolution: --- → WONTFIX

Updated

3 years ago
Blocks: 1063044

Updated

3 years ago
Flags: needinfo?(squibblyflabbetydoo)

Comment 54

3 years ago
Comment on attachment 8596520 [details] [review]
prevent creating multiple AudioContext

Clearing review since this bug is WONTFIX.
Attachment #8596520 - Flags: review?(squibblyflabbetydoo)
Keywords: regressionwindow-wanted
You need to log in before you can comment on or make changes to this bug.