Closed
Bug 1152431
Opened 10 years ago
Closed 10 years ago
[Crash] [@ ? | mozilla::MediaStreamGraphImpl::EnsureRunInStableState ]
Categories
(Firefox OS Graveyard :: Stability, defect)
Tracking
(blocking-b2g:2.2+)
RESOLVED
WORKSFORME
People
(Reporter: ntroast, Unassigned)
References
Details
(Keywords: crash, Whiteboard: [b2g-crash][caf-crash 606][caf priority: p3][CR 819433])
Attachments
(4 files)
We observed the following crash signature during testing.
[@ ? | mozilla::MediaStreamGraphImpl::EnsureRunInStableState | mozilla::MediaStreamGraphImpl::AppendMessage | mozilla::MediaStreamGraphImpl::UpdateConsumptionState ]
Cafbot will upload the decoded minidump and extra file.
This crash was produced during stability tests, which involve monkey testing for several hours, so there is no clear STR. If we are not able to identify the issue using the provided logs, please feel free to provide us with a debug patch that adds logging to help identify the issue.
Comment 1•10 years ago
Comment 2•10 years ago
Updated•10 years ago
Whiteboard: [CR 819433]
Updated•10 years ago
Whiteboard: [CR 819433] → [caf priority: p1][CR 819433]
Updated•10 years ago
Whiteboard: [caf priority: p1][CR 819433] → [b2g-crash][caf-crash 606][caf priority: p1][CR 819433]
Comment 3•10 years ago
Observed on:
Device: msm8909
Gonk Version: AU_LINUX_GECKO_LF.BR.1.2.3.00.00.00.000.123
Moz BuildID: 20150405002503
Manifest: https://www.codeaurora.org/cgit/quic/lf/b2g/manifest/tree/caf_AU_LINUX_GECKO_LF.BR.1.2.3.00.00.00.000.123.xml?h=release
Gecko Version: 37.0
Gaia: http://git.mozilla.org/?p=releases/gaia.git;a=commit;h=a6351e1197d54f8624523c2db9ba1418f2aa046f
Gecko: http://git.mozilla.org/?p=releases/gecko.git;a=commit;h=6bb2afcce9872a7cbc65b4a58f752e2d5ac02345
Patches: bug 1145724, bug 1143694, bug 1146987, bug 1133398, bug 1152095, bug 1150924, bug 1133147, bug 1150271, bug 1150916
Comment 4•10 years ago
Comment 5•10 years ago
Comment 6•10 years ago
Hi Maire,
Please have Paul or someone else on your team pick this up. It's affecting our ability to reach the fxOS 2.2 MTBF goal per CAF's tests. If this isn't code your team handles please help route this to the team that does.
Thanks!
Mike
Flags: needinfo?(mreavy)
Comment 7•10 years ago
Paul and I talked about the MSG crashes in our 1:1 earlier today. The current theory (per my conversation with Paul) is that all the crashes have the same root cause. These include bug 1152439, bug 1152431, bug 1152439 and this bug. Paul -- if you have new info that changes this theory, please chime in.
What's not clear is whether Rob (roc) or Paul should take the lead on fixing the issue. I know they have been talking intently about the possible root cause.
Paul, Rob -- Who makes the most sense to take the lead on this? I was thinking Rob, but I don't know what else Rob is working on. Also, can we dupe the other 3 MSG crasher bugs to one bug or is that premature?
Mike -- We REALLY need a regression range if we can get it. Per Paul, this issue does NOT repro in Nightly -- so it is likely something that got fixed (perhaps when we did a refactor). And it is not easy for Paul to repro. I believe he was able to repro it twice on tbpl/treeherder. If it is happening regularly for someone else, can we get a regression range?
Flags: needinfo?(roc)
Flags: needinfo?(padenot)
Flags: needinfo?(mreavy)
Flags: needinfo?(mlee)
Comment 8•10 years ago
Thanks Maire.
(In reply to Maire Reavy [:mreavy] (Plz needinfo me) from comment #7)
>
> Mike -- We REALLY need a regression range if we can get it. Per Paul, this
> issue does NOT repro in Nightly -- so it is likely something that got fixed
> (perhaps when we did a refactor). And it is not easy for Paul to repro. I
> believe he was able to repro it twice on tbpl/treeherder. If it is
> happening regularly for someone else, can we get a regression range?
Nick,
Can CAF provide a regression range for this issue?
Thanks,
Mike
Flags: needinfo?(mlee) → needinfo?(ntroast)
Keywords: regressionwindow-wanted
Updated•10 years ago
blocking-b2g: 2.2? → 2.2+
(In reply to Maire Reavy [:mreavy] (Plz needinfo me) from comment #7)
> Mike -- We REALLY need a regression range if we can get it. Per Paul, this
> issue does NOT repro in Nightly -- so it is likely something that got fixed
> (perhaps when we did a refactor). And it is not easy for Paul to repro. I
> believe he was able to repro it twice on tbpl/treeherder.
How? If we can ever reproduce anything like this on our test machines, that would be a big help, but I'll need to know how.
(In reply to cafbot (PoC: ggrisco) from comment #5)
> Created attachment 8589818 [details]
> decoded minidump - AU_LINUX_GECKO_LF.BR.1.2.3.00.00.00.000.123
This decoded stack looks corrupt. MediaStreamGraphImpl::UpdateConsumptionState doesn't call AppendMessage, and in fact AppendMessage should not be called on the MSG thread along any path.
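To make the thread rule above concrete, here is a minimal sketch of a message queue that only accepts messages on the thread that created it. It is not Gecko code: `ToyGraph`, `OnOwningThread`, `PendingCount` and `AppendFromOtherThread` are hypothetical names invented for illustration, and the real MediaStreamGraphImpl may enforce this differently.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a graph whose control messages may only be
// appended from the thread that created it (not the real Gecko class).
class ToyGraph {
public:
  ToyGraph() : mOwner(std::this_thread::get_id()) {}

  bool OnOwningThread() const { return std::this_thread::get_id() == mOwner; }

  // Queues the message and returns true on the owning thread;
  // returns false and queues nothing on any other thread.
  bool AppendMessage(std::function<void()> aMessage) {
    if (!OnOwningThread()) {
      return false;
    }
    mPending.push_back(std::move(aMessage));
    return true;
  }

  std::size_t PendingCount() const { return mPending.size(); }

private:
  const std::thread::id mOwner;
  std::vector<std::function<void()>> mPending;
};

// Attempts an append from a freshly spawned thread and reports whether
// the graph accepted it.
bool AppendFromOtherThread(ToyGraph& aGraph) {
  bool accepted = true;
  std::thread worker([&] { accepted = aGraph.AppendMessage([] {}); });
  worker.join();
  return accepted;
}
```

With a check like this in place, a stack that shows an append on the graph thread would indicate either a violated invariant or an unreliable stack, which is the distinction being drawn here.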
(In reply to Maire Reavy [:mreavy] (Plz needinfo me) from comment #7)
> What's not clear is whether Rob (roc) or Paul should take the lead on fixing
> the issue. I know they have been talking intently about the possible root
> cause.
>
> Paul, Rob -- Who makes the most sense to take the lead on this? I was
> thinking Rob, but I don't know what else Rob is working on. Also, can we
> dupe the other 3 MSG crasher bugs to one bug or is that premature?
It's probably premature.
I don't really care who takes the lead. The problem is that there is very little data to go on. It looks like memory corruption, maybe limited to corruption of the MSG graph, maybe something more general.
ntroast: it would be helpful to know if memory gets low before we crash. It would also be helpful to know which process crashed, i.e. which FirefoxOS app. I can't see that in any of the crash dumps.
Flags: needinfo?(roc)
Reporter | ||
Comment 10•10 years ago
(In reply to Mike Lee [:mlee] from comment #8)
> Thanks Maire.
>
> (In reply to Maire Reavy [:mreavy] (Plz needinfo me) from comment #7)
> >
> > Mike -- We REALLY need a regression range if we can get it. Per Paul, this
> > issue does NOT repro in Nightly -- so it is likely something that got fixed
> > (perhaps when we did a refactor). And it is not easy for Paul to repro. I
> > believe he was able to repro it twice on tbpl/treeherder. If it is
> > happening regularly for someone else, can we get a regression range?
>
> Nick,
> Can CAF provide a regression range for this issue?
>
> Thanks,
> Mike
This particular issue was first seen on April 8th.
Taking all of the MSG issues into consideration, I see critical mass around March 16th, but please take that with a grain of salt.
Flags: needinfo?(ntroast)
Comment 11•10 years ago
Thanks Nick. Is it possible to provide more detailed build and commit information for the last known good and first failed runs, similar to what's provided in comment 3?
(In reply to cafbot (PoC: ggrisco) from comment #3)
> Observed on:
>
> Device: msm8909
> Gonk Version: AU_LINUX_GECKO_LF.BR.1.2.3.00.00.00.000.123
> Moz BuildID: 20150405002503
> Manifest:
> https://www.codeaurora.org/cgit/quic/lf/b2g/manifest/tree/
> caf_AU_LINUX_GECKO_LF.BR.1.2.3.00.00.00.000.123.xml?h=release
> Gecko Version: 37.0
> Gaia:
> http://git.mozilla.org/?p=releases/gaia.git;a=commit;
> h=a6351e1197d54f8624523c2db9ba1418f2aa046f
> Gecko:
> http://git.mozilla.org/?p=releases/gecko.git;a=commit;
> h=6bb2afcce9872a7cbc65b4a58f752e2d5ac02345
> Patches: bug 1145724, bug 1143694, bug 1146987, bug 1133398, bug 1152095,
> bug 1150924, bug 1133147, bug 1150271, bug 1150916
Flags: needinfo?(ntroast)
ntroast: it would be helpful to know if memory gets low before we crash. It would also be helpful to know which process crashed, i.e. which FirefoxOS app. I can't see that in any of the crash dumps.
> It would also be helpful to know which process crashed, i.e. which FirefoxOS app. I can't see that in
> any of the crash dumps.
Right now this is the most useful thing I can think of. Anyone know how to get it?
Reporter | ||
Comment 14•10 years ago
Unfortunately, no additional logs were collected beyond what I already provided. If more logs become available I will make sure to put them here.
Also, for Mike's question, the info in comment 3 is the first failed run. The last known good would be AU 122, which is the AU just before 123.
Flags: needinfo?(ntroast)
Comment 15•10 years ago
Nick - (Referencing comment 13 from Rob) Do you know which FirefoxOS app was running when you saw this crash? Is it the ringtone app? (Same as bug 1152439?)
Flags: needinfo?(ntroast)
Comment 16•10 years ago
The regression-window request was addressed to a specific party other than QAnalysts (see comment 8). Also, this seems to be happening on a device that we don't have. Currently we have ringtone-related bugs on the Flame device: bug 1147386 and bug 1139157.
Adding keyword to exclude this in our queries.
QA Whiteboard: QAExclude
Flags: needinfo?(ktucker)
Updated•10 years ago
QA Whiteboard: QAExclude → [QAnalyst-Triage+] QAExclude
Flags: needinfo?(ktucker)
Reporter | ||
Comment 17•10 years ago
(In reply to Maire Reavy [:mreavy] (Plz needinfo me) from comment #15)
> Nick - (Referencing comment 13 from Rob) Do you know which FirefoxOS app was
> running when you saw this crash? Is it the ringtone app? (Same as bug
> 1152439?)
Sorry, there was no additional information included about this crash. It happened in automation, so I don't know which app was running.
Flags: needinfo?(ntroast)
Updated•10 years ago
Flags: needinfo?(padenot)
Comment 18•10 years ago
Hi roc,
If the process is killed by the low memory killer or the OOM killer, it won't have a minidump on FxOS. I am guessing that it could be related to bug 1152439. As you mentioned in [1], if a MediaStream is connected between different MSGs, there could be a problem. Could it be possible that the MediaStream in [2] has been deleted and for some reason its memory address has been allocated to another object, another MSG? Then we get the weird call stack. I also found that the main thread is doing some release work.
[1] https://bugzilla.mozilla.org/show_bug.cgi?id=1152439#c24
[2] http://git.mozilla.org/?p=releases/gecko.git;a=blob;f=dom/media/MediaStreamGraph.cpp;h=4a50ea3185be2f8f01ea44d2e52444e119c2e11a;hb=6bb2afcce9872a7cbc65b4a58f752e2d5ac02345#l149
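The address-reuse hypothesis above can be sketched without invoking undefined behaviour by modelling the heap as a pool that recycles freed slots. Everything here (`SlotPool`, `ObserveThroughStaleSlot`, the labels) is hypothetical and only illustrates the general mechanism; it makes no claim about what actually happened in this crash.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical pool that recycles freed slots, standing in for a heap
// allocator that hands a freed address to the next allocation.
class SlotPool {
public:
  // Stores a value, reusing the most recently freed slot when one exists.
  std::size_t Allocate(const std::string& aLabel) {
    if (!mFree.empty()) {
      std::size_t slot = mFree.back();
      mFree.pop_back();
      mSlots[slot] = aLabel;
      return slot;
    }
    mSlots.push_back(aLabel);
    return mSlots.size() - 1;
  }

  // Marks a slot as reusable; holders of the old index are not notified.
  void Free(std::size_t aSlot) { mFree.push_back(aSlot); }

  const std::string& At(std::size_t aSlot) const { return mSlots[aSlot]; }

private:
  std::vector<std::string> mSlots;
  std::vector<std::size_t> mFree;
};

// Returns what a stale reference observes after its slot has been recycled.
std::string ObserveThroughStaleSlot() {
  SlotPool pool;
  std::size_t stream = pool.Allocate("stream owned by graph A");
  pool.Free(stream);                                   // stream is released
  pool.Allocate("unrelated object owned by graph B");  // slot is recycled
  return pool.At(stream);                              // stale index, new occupant
}
```

A caller still holding the old index ends up operating on whatever now occupies the slot, which is one way a call stack could appear to mix unrelated objects.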
See Also: → 1152439
Updated•10 years ago
Whiteboard: [b2g-crash][caf-crash 606][caf priority: p1][CR 819433] → [b2g-crash][caf-crash 606][caf priority: p3][CR 819433]
(In reply to StevenLee[:slee] from comment #18)
> If the process is killed by the low memory killer or the OOM killer, it won't
> have a minidump on FxOS.
You mean, this crash can't be caused by OOM?
> I am guessing that it could be related to bug 1152439. As
> you mentioned in [1], if a MediaStream is connected between different MSGs,
> there could be a problem. Could it be possible that the MediaStream in [2] has
> been deleted and for some reason its memory address has been allocated to
> another object, another MSG? Then we get the weird call stack.
Which weird call stack?
> I also found that the main thread is doing some release work.
I'm not sure what you mean by that.
Flags: needinfo?(roc)
Comment 21•10 years ago
NI :greg, to confirm if he is still hitting this.
Greg, I saw you closing a couple of related crashes and wanted to check whether you guys are still hitting this one.
Flags: needinfo?(ggrisco)
Comment 22•10 years ago
(In reply to bhavana bajaj [:bajaj] from comment #21)
> NI :greg, to confirm if he is still hitting this.
>
> Greg, I saw you closing a couple of related crashes and wanted to check
> whether you guys are still hitting this one.
Thanks for the ni? on this. We haven't seen this crash in the past few builds either. I'm OK with closing it if you are.
Flags: needinfo?(bbajaj)
Updated•10 years ago
Flags: needinfo?(ggrisco)
Updated•10 years ago
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → WORKSFORME
Comment 23•10 years ago
"Closing issue which has not been seen since 04/05/15 18:41"
Updated•10 years ago
Flags: needinfo?(bbajaj)
Updated•9 years ago
Keywords: regressionwindow-wanted