Closed
Bug 832109
Opened 12 years ago
Closed 12 years ago
crash in mozilla::dom::telephony::Telephony::RILTelephonyCallback::CallStateChanged
Categories
(Firefox OS Graveyard :: General, defect)
Tracking
(blocking-b2g:tef+, firefox19 wontfix, firefox20 wontfix, firefox21 fixed, b2g18 verified, b2g18-v1.0.0 wontfix, b2g18-v1.0.1 fixed)
People
(Reporter: m1, Assigned: bent.mozilla)
Details
(Keywords: crash, Whiteboard: [b2g-crash][BTG-1000][LOE:M])
Crash Data
Attachments
(5 files)
See multiple times during stability test.
Test Steps:
1. Start Airplane mode, SMS, Call script.
Top frames:
Crash reason: SIGSEGV
Crash address: 0x414566c0
Thread 0 (crashed)
0 0x414566c0
r4 = 0x414566c0 r5 = 0xbeeaa108 r6 = 0xbeeaa108 r7 = 0xbeeaa028
r8 = 0x00000005 r9 = 0x411aab78 r10 = 0xbeeaa330 fp = 0x43ffe8e0
sp = 0xbeea9fe0 lr = 0x408300cf pc = 0x414566c0
Found by: given as instruction pointer in context
1 libxul.so!mozilla::dom::telephony::Telephony::RILTelephonyCallback::CallStateChanged [Telephony.h : 109 + 0x11]
sp = 0xbeea9ff0 pc = 0x408300b9
Found by: stack scanning
2 0xbeeaa31e
r4 = 0xbeeaa108 r5 = 0x00000001 sp = 0xbeeaa008 pc = 0xbeeaa320
Found by: call frame info
3 libxul.so!XPCWrappedNative::CallMethod [XPCWrappedNative.cpp : 3083 + 0xd]
sp = 0xbeeaa058 pc = 0x408f17fb
Found by: stack scanning
Reporter | ||
Comment 1•12 years ago
|
||
(minidump is from AU 177)
Comment 2•12 years ago
|
||
(In reply to Michael Vines [:m1] from comment #0)
> Created attachment 703696 [details]
> decoded minidump of crash
>
> See multiple times during stability test.
>
> Test Steps:
> 1. Start Airplane mode, SMS, Call script.
>
Hi Michael,
Would you please provide more dump details, such as 'r0' address? That would be great help for us. It would be also very helpful that you could clarify the detailed test steps here. Thanks!
Comment 3•12 years ago
|
||
(In reply to Michael Vines [:m1] from comment #1)
> (minidump is from AU 177)
Hi Michael,
Is it possible we can also get minidump and simple instruction to operate this tool? Thanks.
Reporter | ||
Comment 4•12 years ago
|
||
(In reply to Hsin-Yi Tsai [:hsinyi] from comment #2)
> Would you please provide more dump details, such as 'r0' address? That would
> be great help for us. It would be also very helpful that you could clarify
> the detailed test steps here. Thanks!
Decoded minidump is attached to this bug. I think you will need to examine the minidump and corresponding source in the backtrace to attempt to determine why we could crash here.
Comment 5•12 years ago
|
||
(In reply to Michael Vines [:m1] from comment #4)
> (In reply to Hsin-Yi Tsai [:hsinyi] from comment #2)
> > Would you please provide more dump details, such as 'r0' address? That would
> > be great help for us. It would be also very helpful that you could clarify
> > the detailed test steps here. Thanks!
>
> Decoded minidump is attached to this bug. I think you will need to examine
> the minidump and corresponding source in the backtrace to attempt to
> determine why we could crash here.
Yes, we already examined the minidump and we still think it would be helpful to have info about r0 but sadly, the info is missing in the minidump.
Reporter | ||
Comment 6•12 years ago
|
||
Bug in the minidump code? Sorry, I'm certainly not withholding r0! :)
Updated•12 years ago
|
Severity: normal → critical
Crash Signature: [@ mozilla::dom::telephony::Telephony::RILTelephonyCallback::CallStateChanged]
Keywords: crash
Whiteboard: [BTG-1000] → [b2g-crash][BTG-1000]
Comment 7•12 years ago
|
||
Hsin-Yi, I'm assigning to you since it seems like you're investigating here.
Assignee: nobody → htsai
blocking-b2g: tef? → tef+
Updated•12 years ago
|
Whiteboard: [b2g-crash][BTG-1000] → [b2g-crash][BTG-1000][LOE:M]
Comment 8•12 years ago
|
||
Triage with Moz Taipei team:
minidump code without r0 is not helping the team to debug further.
Can mvines either provide STR so that Taipei can reproduce locally? or provide another minidump with r0? Thanks
Flags: needinfo?(mvines)
Reporter | ||
Comment 9•12 years ago
|
||
Let's leave this open for another couple days to see if we can catch this again in automation with hopefully more helpful logs
Flags: needinfo?(mvines)
Comment 10•12 years ago
|
||
Hi, Michael,
If you reproduce this issue again, can you please also provide the RIL binary code of the test image to us?
Reporter | ||
Comment 11•12 years ago
|
||
Haven't seen this come up in a couple days of testing so optimistically withdrawing
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → WORKSFORME
Reporter | ||
Comment 12•12 years ago
|
||
Crash came back in AU 185. App that's crashing is communications.
Top frames (will attach full minidump too...sorry no r0 still)
-----------
Crash reason: SIGSEGV
Crash address: 0x0
Thread 0 (crashed)
0 0x0
r4 = 0x00000000 r5 = 0xbe8df108 r6 = 0xbe8df108 r7 = 0xbe8df028
r8 = 0x00000005 r9 = 0x41219d98 r10 = 0xbe8df330 fp = 0x44767450
sp = 0xbe8defe0 lr = 0x40870877 pc = 0x00000000
Found by: given as instruction pointer in context
1 libxul.so!mozilla::dom::telephony::Telephony::RILTelephonyCallback::CallStateChanged [Telephony.h : 109 + 0x11]
sp = 0xbe8deff0 pc = 0x40870861
Found by: stack scanning
2 libxul.so!js::PropertyTree::insertChild [Utility.h : 153 + 0x3]
r4 = 0xbe8df108 r5 = 0x432e5dd8 sp = 0xbe8df008 pc = 0x40e8bae9
Found by: call frame info
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
Reporter | ||
Comment 13•12 years ago
|
||
Comment 14•12 years ago
|
||
This might be an issue with commercial RIL. Will reopen if we see the issue again.
Status: REOPENED → RESOLVED
Closed: 12 years ago → 12 years ago
Resolution: --- → INVALID
Comment 15•12 years ago
|
||
Crash came back in AU 189.
rash reason: SIGSEGV
Crash address: 0x0
Thread 0 (crashed)
0 0x0
r4 = 0x00000000 r5 = 0xbeb07100 r6 = 0xbeb07100 r7 = 0xbeb07020
r8 = 0x00000005 r9 = 0x412834c8 r10 = 0xbeb07328 fp = 0x445d9f60
sp = 0xbeb06fd8 lr = 0x408d8b67 pc = 0x00000000
Found by: given as instruction pointer in context
1 libxul.so!mozilla::dom::telephony::Telephony::RILTelephonyCallback::CallStateChanged [Telephony.h : 109 + 0x11]
sp = 0xbeb06fe8 pc = 0x408d8b51
Found by: stack scanning
2 0xbeb07316
r4 = 0xbeb07100 r5 = 0x00000001 sp = 0xbeb07000 pc = 0xbeb07318
Found by: call frame info
3 libxul.so!XPCWrappedNative::CallMethod [XPCWrappedNative.cpp : 3083 + 0xd]
sp = 0xbeb07050 pc = 0x4099ab87
Found by: stack scanning
4 libxul.so!XPC_WN_CallMethod [XPCWrappedNativeJSOps.cpp : 1469 + 0x7]
r4 = 0x4260bf10 r5 = 0xbeb07400 r6 = 0x4441d700 r7 = 0x00000001
r8 = 0x42948108 r9 = 0x00000004 r10 = 0x414fbb58 fp = 0xbeb078d8
sp = 0xbeb073e0 pc = 0x4099f3e5
Found by: call frame info
Status: RESOLVED → REOPENED
Resolution: INVALID → ---
Comment 16•12 years ago
|
||
Comment 17•12 years ago
|
||
Anshul,
Sadly this happened again. To further investigate this issue, we need more information from you. Would you please help provide us the following?
1) complete logcat of the test run
2) RIL binary code of the test image
3) code version you were testing with
4) better with the detailed test steps
We do need them for debugging and resolving this issue. Thank you.
Updated•12 years ago
|
Flags: needinfo?(anshulj)
Updated•12 years ago
|
status-b2g18:
--- → affected
status-b2g18-v1.0.0:
--- → affected
Comment 18•12 years ago
|
||
Hsin-Yi, please find the information you requested.
- Log file attached
- You can get the RIL binary from AU 189
- Steps to reproduce
- Start Camera, Call, Camcorder, Call
- Start music at background
- Repeat above steps
The crash happened after the above test case was run over night by our stability team.
Flags: needinfo?(anshulj)
Comment 19•12 years ago
|
||
Comment 20•12 years ago
|
||
(In reply to Anshul from comment #18)
> Hsin-Yi, please find the information you requested.
>
> - Log file attached
> - You can get the RIL binary from AU 189
> - Steps to reproduce
> - Start Camera, Call, Camcorder, Call
> - Start music at background
> - Repeat above steps
>
> The crash happened after the above test case was run over night by our
> stability team.
Hi Anshul,
Thanks for providing what we requested. We investigated the full log, however we didn't see more clues about communications app crash there. Our QA is helping reproduce this issue with QC RIL, and hopefully to gather more hints. We've been following your steps over dozens of iterations but still not made it yet. We are wondering whether your team is testing this issue by a test script. If so, would it be possible to provide us your test script? That would be very much appreciated and a great help. Thanks again :)
Flags: needinfo?(anshulj)
Comment 21•12 years ago
|
||
Anshul,
Me again ;-) Not sure whether it's possible you could help us to get more info when you run the next stability test. In case it is, here are the steps to get more debug messages.
1) edit |B2G/.userconfig|, add this line "export B2G_DEBUG=1" Then delete the obj directory & re-build.
2) In B2G/, run this line |./run-gdb.sh attach [pid communications app]|. Once the seg. fault appears again, type 'bt' to get stack.
![]() |
||
Updated•12 years ago
|
QA Contact: atsai
Comment 22•12 years ago
|
||
Anshul,
Though having 3 people to do this test during these days, we still can not reproduce this bug...:(
Could you also help to reproduce this bug and get logs of what Hsinyi said to us? Thanks.
Comment 23•12 years ago
|
||
According to the minidump, the crash resulted from the null r4. After analyzing the assembly code, we, some RIL team members, conjectured something wrong when RIL gets RILTelephonyCallback that tried calling mTelephony->CallStateChanged() [1]. According to the assembly, RILTelephonyCallback looked fine but mTelephony [2] seemd null. I said 'seemed' because the minidump didn't have enough information and that's we reasoned from the assembly and null r4.
We are discussing whether it's possible that mTelephony got freed due to cycle collection even when RILTelephonyCallback is valid, and whether we should make mTelephony reference counted. May I have your feedback of this, Ben? Thanks!
[1] http://mxr.mozilla.org/mozilla-central/source/dom/telephony/Telephony.h#113
[2] http://mxr.mozilla.org/mozilla-central/source/dom/telephony/Telephony.h#109
Flags: needinfo?(bent.mozilla)
Assignee | ||
Comment 24•12 years ago
|
||
I don't see any place where we set RILTelephonyCallback::mTelephony could become null, even during cycle collection.
However, it seems entirely possible that if any component (RIL, something else?) holds RILTelephonyCallback alive after we call UnregisterTelephonyCallback that we would end up crashing later once the Telephony object has been deleted. The Telephony code is doing the right thing here so we would want to fix the RIL. That may not be possible, though, so we could be safer by guarding against this case.
Flags: needinfo?(bent.mozilla)
Assignee | ||
Comment 25•12 years ago
|
||
Here is a patch that would guard against a buggy RIL (or some other component) holding our callback alive longer than it should. Please give this a try and see if it fixes the crash?
Assignee | ||
Comment 26•12 years ago
|
||
Oh, and we definitely do not want to make RILTelephonyCallback::mTelephony into a strong reference since it would create a cycle between the RIL and the Telephony objects. The only way to fix that would be to make the RIL cycle-collected, which is a) hard, and b) maybe impossible if we are using custom RIL implementations.
Reporter | ||
Comment 27•12 years ago
|
||
Is this potential cycle wholly contained within a content process? It looks like it but I only skimmed the code. Known custom RIL implementations are only running code in the main process.
Assignee | ||
Comment 28•12 years ago
|
||
I actually don't know, it depends on how the remoting for the telephony stuff was done. The cycle I'm talking about would be in the content process for sure and maybe also in the parent process.
Reporter | ||
Comment 29•12 years ago
|
||
Anshul, can you please take a closer look at this possible cycle from the parent process POV?
Assignee | ||
Comment 30•12 years ago
|
||
Well, to be clear, the cycle I'm talking about is the reason why we can't make a strong ref from RIL -> RILTelephonyCallback -> Telephony. We don't have that now so this isn't an actual problem.
The thing I would check for in custom RIL implementations is any place where nsIRILTelephonyCallback methods are called after the callback has been unregistered. That's my best guess for what's causing the current crash.
Comment 31•12 years ago
|
||
Ben, I will make a change in custom RIL to make sure the callstatecange notification is not sent after the telephony callback is unregistered. For now I am marking this change invalid and will reopen if I see it again.
Thanks for helping out getting to the bottom of the issue.
Flags: needinfo?(anshulj)
Comment 32•12 years ago
|
||
(In reply to Anshul from comment #31)
> Ben, I will make a change in custom RIL to make sure the callstatecange
> notification is not sent after the telephony callback is unregistered. For
> now I am marking this change invalid and will reopen if I see it again.
Did you mean to resolve this bug as INVALID?
Flags: needinfo?(anshulj)
Reporter | ||
Updated•12 years ago
|
Status: REOPENED → RESOLVED
Closed: 12 years ago → 12 years ago
Flags: needinfo?(anshulj)
Resolution: --- → INVALID
Comment 33•12 years ago
|
||
(In reply to ben turner [:bent] from comment #25)
> Created attachment 711836 [details] [diff] [review]
> Bandaid patch
>
> Here is a patch that would guard against a buggy RIL (or some other
> component) holding our callback alive longer than it should. Please give
> this a try and see if it fixes the crash?
Ben, thanks for your insights and the patch!
Comment 34•12 years ago
|
||
Batch edit: Bugs still affected on b2g18 after 2/13 merge to v1.0.1 branch are affected on v1.0.1 branch.
status-b2g18-v1.0.1:
--- → affected
Assignee | ||
Comment 35•12 years ago
|
||
(In reply to Anshul from comment #31)
> Ben, I will make a change in custom RIL to make sure the callstatecange
> notification is not sent after the telephony callback is unregistered.
Well, I'd still probably like to check this patch in since it guards against this kind of problem. It won't be high priority if your custom RIL is changed in the meantime, of course, but some day this could come back without further protection.
Andrew, can you set whatever flags that should require? :)
Status: RESOLVED → REOPENED
Resolution: INVALID → ---
Assignee | ||
Comment 36•12 years ago
|
||
Comment on attachment 711836 [details] [diff] [review]
Bandaid patch
Justin, we chatted about this last week, feel comfortable reviewing it?
Attachment #711836 -
Flags: review?(justin.lebar+bug)
Comment 37•12 years ago
|
||
> Justin, we chatted about this last week, feel comfortable reviewing it?
I have zero recollection of this conversation, but your patch looks good to me. :)
Updated•12 years ago
|
Attachment #711836 -
Flags: review?(justin.lebar+bug) → review+
Comment 38•12 years ago
|
||
(In reply to Justin Lebar [:jlebar] from comment #37)
> > Justin, we chatted about this last week, feel comfortable reviewing it?
>
> I have zero recollection of this conversation, but your patch looks good to
> me. :)
Hi Ben, is it ready for m-c?
Assignee | ||
Comment 39•12 years ago
|
||
Comment 40•12 years ago
|
||
Status: REOPENED → RESOLVED
Closed: 12 years ago → 12 years ago
Resolution: --- → FIXED
Comment 41•12 years ago
|
||
Changing ownership to Bent since he gave the last push! Thanks again for helping this out :)
Assignee: htsai → bent.mozilla
Comment 42•12 years ago
|
||
https://hg.mozilla.org/releases/mozilla-b2g18/rev/7fc827339cc7
https://hg.mozilla.org/releases/mozilla-b2g18_v1_0_1/rev/98354c0298ab
status-firefox19:
--- → wontfix
status-firefox20:
--- → wontfix
status-firefox21:
--- → fixed
Target Milestone: --- → B2G C4 (2jan on)
Comment 44•12 years ago
|
||
verified with 1.1 build of 20130322070202
Comment 45•12 years ago
|
||
(In reply to Tracy Walker [:tracy] from comment #44)
> verified with 1.1 build of 20130322070202
Btw, if you end up verifying on b2g18, I'd mark the general workflow flag as verified as well.
Status: RESOLVED → VERIFIED
You need to log in
before you can comment on or make changes to this bug.
Description
•