Closed Bug 1152431 Opened 10 years ago Closed 10 years ago

[Crash] [@ ? | mozilla::MediaStreamGraphImpl::EnsureRunInStableState ]

Tracking

(blocking-b2g:2.2+)

Status:

RESOLVED WORKSFORME

Project Flags:

blocking-b2g

2.2+

People

(Reporter: ntroast, Unassigned)

References

Details

(Keywords: crash, Whiteboard: [b2g-crash][caf-crash 606][caf priority: p3][CR 819433])

Attachments

(4 files)

EXTRA file attachment - 10 years ago cafbot (PoC: ggrisco) 141.31 KB, text/plain		Details
decoded minidump - 10 years ago cafbot (PoC: ggrisco) 207.31 KB, text/plain		Details
EXTRA file attachment - AU_LINUX_GECKO_LF.BR.1.2.3.00.00.00.000.123 10 years ago cafbot (PoC: ggrisco) 141.41 KB, text/plain		Details
decoded minidump - AU_LINUX_GECKO_LF.BR.1.2.3.00.00.00.000.123 10 years ago cafbot (PoC: ggrisco) 207.31 KB, text/plain		Details

Nicholas Troast [:ntroast]

Reporter

Description

•

10 years ago

We observed the following crash signature during testing:

[@ ? | mozilla::MediaStreamGraphImpl::EnsureRunInStableState | mozilla::MediaStreamGraphImpl::AppendMessage | mozilla::MediaStreamGraphImpl::UpdateConsumptionState ]

Cafbot will upload the decoded minidump and extra file.

This crash was produced during stability tests, which involve monkey testing for several hours, and there is no clear STR for it. If we are not able to identify the issue using the provided logs, please feel free to provide us with a debug patch with additional logging to identify the issue.

cafbot (PoC: ggrisco)

Comment 1

•

10 years ago

Attached file EXTRA file attachment - — Details

cafbot (PoC: ggrisco)

Comment 2

•

10 years ago

Attached file decoded minidump - — Details

cafbot (PoC: ggrisco)

Updated

•

10 years ago

Whiteboard: [CR 819433]

cafbot (PoC: ggrisco)

Updated

•

10 years ago

Whiteboard: [CR 819433] → [caf priority: p1][CR 819433]

cafbot (PoC: ggrisco)

Updated

•

10 years ago

Whiteboard: [caf priority: p1][CR 819433] → [b2g-crash][caf-crash 606][caf priority: p1][CR 819433]

cafbot (PoC: ggrisco)

Updated

•

10 years ago

Keywords: crash

cafbot (PoC: ggrisco)

Comment 3

•

10 years ago

Observed on:

Device: msm8909
Gonk Version: AU_LINUX_GECKO_LF.BR.1.2.3.00.00.00.000.123
Moz BuildID: 20150405002503
Manifest: https://www.codeaurora.org/cgit/quic/lf/b2g/manifest/tree/caf_AU_LINUX_GECKO_LF.BR.1.2.3.00.00.00.000.123.xml?h=release
Gecko Version: 37.0
Gaia: http://git.mozilla.org/?p=releases/gaia.git;a=commit;h=a6351e1197d54f8624523c2db9ba1418f2aa046f
Gecko: http://git.mozilla.org/?p=releases/gecko.git;a=commit;h=6bb2afcce9872a7cbc65b4a58f752e2d5ac02345
Patches: bug 1145724, bug 1143694, bug 1146987, bug 1133398, bug 1152095, bug 1150924, bug 1133147, bug 1150271, bug 1150916

cafbot (PoC: ggrisco)

Comment 4

•

10 years ago

Attached file EXTRA file attachment - AU_LINUX_GECKO_LF.BR.1.2.3.00.00.00.000.123 — Details

cafbot (PoC: ggrisco)

Comment 5

•

10 years ago

Attached file decoded minidump - AU_LINUX_GECKO_LF.BR.1.2.3.00.00.00.000.123 — Details

Mike Lee [:mlee]

Comment 6

•

10 years ago

Hi Maire,

Please have Paul or someone else on your team pick this up. It's affecting our ability to reach the fxOS 2.2 MTBF goal per CAF's tests. If this isn't code your team handles please help route this to the team that does.

Thanks!
Mike

Flags: needinfo?(mreavy)

Maire Reavy [:mreavy]

Comment 7

•

10 years ago

Paul and I talked about the MSG crashes in our 1:1 earlier today. The current theory (per my conversation with Paul) is that all the crashes have the same root cause. These include bug 1152439, bug 1152431, bug 1152439 and this bug. Paul -- if you have new info that changes this theory, please chime in.

What's not clear is whether Rob (roc) or Paul should take the lead on fixing the issue. I know they have been talking intently about the possible root cause.

Paul, Rob -- Who makes the most sense to take the lead on this? I was thinking Rob, but I don't know what else Rob is working on. Also, can we dupe the other 3 MSG crasher bugs to one bug or is that premature?

Mike -- We REALLY need a regression range if we can get it. Per Paul, this issue does NOT repro in Nightly -- so it is likely something that got fixed (perhaps when we did a refactor). And it is not easy for Paul to repro. I believe he was able to repro it twice on tbpl/treeherder. If it is happening regularly for someone else, can we get a regression range?

Flags: needinfo?(roc)

Flags: needinfo?(padenot)

Flags: needinfo?(mreavy)

Flags: needinfo?(mlee)

Mike Lee [:mlee]

Comment 8

•

10 years ago

Thanks Maire.

(In reply to Maire Reavy [:mreavy] (Plz needinfo me) from comment #7)
> Mike -- We REALLY need a regression range if we can get it. Per Paul, this
> issue does NOT repro in Nightly -- so it is likely something that got fixed
> (perhaps when we did a refactor). And it is not easy for Paul to repro. I
> believe he was able to repro it twice on tbpl/treeherder. If it is
> happening regularly for someone else, can we get a regression range?

Nick,
Can CAF provide a regression range for this issue?

Thanks,
Mike

Flags: needinfo?(mlee) → needinfo?(ntroast)

Keywords: regressionwindow-wanted

bhavana bajaj [:bajaj]

Updated

•

10 years ago

blocking-b2g: 2.2? → 2.2+

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 9

•

10 years ago

(In reply to Maire Reavy [:mreavy] (Plz needinfo me) from comment #7)
> Mike -- We REALLY need a regression range if we can get it. Per Paul, this
> issue does NOT repro in Nightly -- so it is likely something that got fixed
> (perhaps when we did a refactor). And it is not easy for Paul to repro. I
> believe he was able to repro it twice on tbpl/treeherder.

How? If we can ever reproduce anything like this on our test machines, that would be a big help, but I'll need to know how.

(In reply to cafbot (PoC: ggrisco) from comment #5)
> Created attachment 8589818 [details]
> decoded minidump - AU_LINUX_GECKO_LF.BR.1.2.3.00.00.00.000.123

This decoded stack looks corrupt. MediaStreamGraphImpl::UpdateConsumptionState doesn't call AppendMessage, and in fact AppendMessage should not be called on the MSG thread along any path.

(In reply to Maire Reavy [:mreavy] (Plz needinfo me) from comment #7)
> What's not clear is whether Rob (roc) or Paul should take the lead on fixing
> the issue. I know they have been talking intently about the possible root
> cause.
>
> Paul, Rob -- Who makes the most sense to take the lead on this? I was
> thinking Rob, but I don't know what else Rob is working on. Also, can we
> dupe the other 3 MSG crasher bugs to one bug or is that premature?

It's probably premature.

I don't really care who takes the lead. The problem is that there is very little data to go on. It looks like memory corruption, maybe limited to corruption of the MSG graph, maybe something more general.

ntroast: it would be helpful to know if memory gets low before we crash. It would also be helpful to know which process crashed, i.e. which FirefoxOS app. I can't see that in any of the crash dumps.

Flags: needinfo?(roc)

Nicholas Troast [:ntroast]

Reporter

Comment 10

•

10 years ago

(In reply to Mike Lee [:mlee] from comment #8)
> Thanks Maire.
>
> (In reply to Maire Reavy [:mreavy] (Plz needinfo me) from comment #7)
> >
> > Mike -- We REALLY need a regression range if we can get it. Per Paul, this
> > issue does NOT repro in Nightly -- so it is likely something that got fixed
> > (perhaps when we did a refactor). And it is not easy for Paul to repro. I
> > believe he was able to repro it twice on tbpl/treeherder. If it is
> > happening regularly for someone else, can we get a regression range?
>
> Nick,
> Can CAF provide a regression range for this issue?
>
> Thanks,
> Mike

This particular issue was first seen on April 8th. Taking into consideration all of the MSG issues I see critical mass around March 16th, but please take that with a grain of salt.

Flags: needinfo?(ntroast)

Mike Lee [:mlee]

Comment 11

•

10 years ago

Thanks Nick. Is it possible to provide more detailed build and commit information for last known good and first failed runs similar to what's provided in comment 3?

(In reply to cafbot (PoC: ggrisco) from comment #3)
> Observed on:
>
> Device: msm8909
> Gonk Version: AU_LINUX_GECKO_LF.BR.1.2.3.00.00.00.000.123
> Moz BuildID: 20150405002503
> Manifest:
> https://www.codeaurora.org/cgit/quic/lf/b2g/manifest/tree/
> caf_AU_LINUX_GECKO_LF.BR.1.2.3.00.00.00.000.123.xml?h=release
> Gecko Version: 37.0
> Gaia:
> http://git.mozilla.org/?p=releases/gaia.git;a=commit;
> h=a6351e1197d54f8624523c2db9ba1418f2aa046f
> Gecko:
> http://git.mozilla.org/?p=releases/gecko.git;a=commit;
> h=6bb2afcce9872a7cbc65b4a58f752e2d5ac02345
> Patches: bug 1145724, bug 1143694, bug 1146987, bug 1133398, bug 1152095,
> bug 1150924, bug 1133147, bug 1150271, bug 1150916

Flags: needinfo?(ntroast)

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 12

•

10 years ago

ntroast: it would be helpful to know if memory gets low before we crash. It would also be helpful to know which process crashed, i.e. which FirefoxOS app. I can't see that in any of the crash dumps.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 13

•

10 years ago

> It would also be helpful to know which process crashed, i.e. which FirefoxOS app. I can't see that in
> any of the crash dumps.

Right now this is the most useful thing I can think of. Anyone know how to get it?

Nicholas Troast [:ntroast]

Reporter

Comment 14

•

10 years ago

Unfortunately, there were no additional logs collected other than what I already provided. If more logs become available I will make sure to put them here.

Also, for Mike's question, the info in comment 3 is the first failed. The last known good would be AU 122, which is the AU just before 123.

Flags: needinfo?(ntroast)

Maire Reavy [:mreavy]

Comment 15

•

10 years ago

Nick - (Referencing comment 13 from Rob) Do you know which FirefoxOS app was running when you saw this crash? Is it the ringtone app? (Same as bug 1152439?)

Flags: needinfo?(ntroast)

Pi Wei Cheng [:piwei] (inactive)

Comment 16

•

10 years ago

Regression-window was added to a specific party that's not QAnalysts (see comment 8). Also, this seems to be happening on a device that we don't have. Currently we have ringtone related bugs on Flame device: bug 1147386 and bug 1139157. Adding keyword to exclude this in our queries.

QA Whiteboard: QAExclude

Flags: needinfo?(ktucker)

KTucker [:KTucker][Inactive 3/4/2016]

Updated

•

10 years ago

QA Whiteboard: QAExclude → [QAnalyst-Triage+] QAExclude

Flags: needinfo?(ktucker)

Nicholas Troast [:ntroast]

Reporter

Comment 17

•

10 years ago

(In reply to Maire Reavy [:mreavy] (Plz needinfo me) from comment #15)
> Nick - (Referencing comment 13 from Rob) Do you know which FirefoxOS app was
> running when you saw this crash? Is it the ringtone app? (Same as bug
> 1152439?)

Sorry, there was no additional information included about this crash. It happened in automation, so I don't know which app was running.

Flags: needinfo?(ntroast)

Paul Adenot (:padenot)

Updated

•

10 years ago

Flags: needinfo?(padenot)

StevenLee[:slee]

Comment 18

•

10 years ago

Hi roc,

If the process is killed by the low memory killer or OOM killer, it won't have a minidump on FxOS.

I am guessing that it could be related to bug 1152439. As you mentioned in [1], if a MediaStream is connected between different MSGs, there could be a problem. Could it be possible that the MediaStream in [2] has been deleted and for some reason the memory address is allocated for another object, another MSG? Then we get the weird call stack. I also found that on the main thread, it is doing some releasing jobs.

[1] https://bugzilla.mozilla.org/show_bug.cgi?id=1152439#c24
[2] http://git.mozilla.org/?p=releases/gecko.git;a=blob;f=dom/media/MediaStreamGraph.cpp;h=4a50ea3185be2f8f01ea44d2e52444e119c2e11a;hb=6bb2afcce9872a7cbc65b4a58f752e2d5ac02345#l149

Updated

•

10 years ago

Whiteboard: [b2g-crash][caf-crash 606][caf priority: p1][CR 819433] → [b2g-crash][caf-crash 606][caf priority: p3][CR 819433]

Bobby Chien

Comment 19

•

10 years ago

ni :roc per comment 18.

Flags: needinfo?(roc)

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 20

•

10 years ago

(In reply to StevenLee[:slee] from comment #18)
> If the process is killed by low memory killer or OOM killer, it won't have
> minidump on FxOS.

You mean, this crash can't be caused by OOM?

> I am guessing that it could be related to bug 1152439. As
> you mentioned in [1], if a MediaStream is connected between different MSGs,
> there could be problem. Could it possible that the MediaStream in [2] has
> been deleted and for some reason the memory address is allocated for other
> object, another MSG? Then we get the weird call stack.

Which weird call stack?

> I also found that on
> main-thread, it is doing some releasing jobs.

I'm not sure what you mean by that.

Flags: needinfo?(roc)

bhavana bajaj [:bajaj]

Comment 21

•

10 years ago

NI :greg, to confirm if he is still hitting this.

Greg, I saw you closing a couple of related crashes and wanted to check if you guys are still hitting this one?

Flags: needinfo?(ggrisco)

Greg Grisco

Comment 22

•

10 years ago

(In reply to bhavana bajaj [:bajaj] from comment #21)
> NI :greg, to confirm if he is still hitting this.
>
> Greg, I saw you closing a couple of related crashes and wanted to check if
> you guys are still hitting this one?

Thanks for the ni? on this. We haven't seen this crash in the past few builds either. I'm ok with closing it if you are.

Flags: needinfo?(bbajaj)

Greg Grisco

Updated

•

10 years ago

Flags: needinfo?(ggrisco)

cafbot (PoC: ggrisco)

Updated

•

10 years ago

Status: NEW → RESOLVED

Closed: 10 years ago

Resolution: --- → WORKSFORME

cafbot (PoC: ggrisco)

Comment 23

•

10 years ago

"Closing issue which has not been seen since 04/05/15 18:41"

bhavana bajaj [:bajaj]

Updated

•

10 years ago

Flags: needinfo?(bbajaj)

KTucker [:KTucker][Inactive 3/4/2016]

Updated

•

10 years ago

Keywords: regressionwindow-wanted
