Closed Bug 939373 Opened 6 years ago Closed 6 years ago

[DSDS] Outgoing and incoming call on secondary subscription is causing a crash

Categories

(Firefox OS Graveyard :: RIL, defect, P1, critical)

ARM
Gonk (Firefox OS)
defect

Tracking

(blocking-b2g:1.3+, firefox26 wontfix, firefox27 wontfix, firefox28 fixed, b2g-v1.2 fixed, b2g-v1.3 fixed, b2g-v1.3T fixed)

RESOLVED FIXED
1.3 Sprint 6 - 12/6
blocking-b2g 1.3+
Tracking Status
firefox26 --- wontfix
firefox27 --- wontfix
firefox28 --- fixed
b2g-v1.2 --- fixed
b2g-v1.3 --- fixed
b2g-v1.3T --- fixed

People

(Reporter: anshulj, Assigned: khu)

References

()

Details

Attachments

(4 files, 1 obsolete file)

Hsin-Yi, when I am trying to make an outgoing call or receiving an incoming call on the secondary subscription I see a message on the phone that says "b2g recovered from crash" and I don't see dialer starting at all.

Wondering if you guys have tested this scenario. By the way SMS is going out fine on secondary subscription.
Can we attach the minidump backtrace from the crash here?
Attached file Minidump stack trace
(In reply to Anshul from comment #0)
> Hsin-Yi, when I am trying to make an outgoing call or receiving an incoming
> call on the secondary subscription I see a message on the phone that says
> "b2g recovered from crash" and I don't see dialer starting at all.
> 
> Wondering if you guys have tested this scenario. By the way SMS is going out
> fine on secondary subscription.

I could dial out a call on 2nd SIM card. I can also receive an incoming call on 2nd sim card while Dialer app is launched as expected.

Per minidump bt, seems something wrong in DOMRequest.cpp. However, telephony code path doesn't touch DOMRequest at all, so I don't see any obvious clues. Maybe you could "./run-gdb.sh" to get more information when crash?
Hsin-Yi, I am having trouble causing gdb to break when there is a crash. I am investigating that but I am still seeing the issue consistently even after trying the latest gecko/gaia from Moz central. Is there someone who is familiar with DOMRequest that can help look at it? Or are there any logs that I can enable to help confirm that this is not a Telephony issue?
Is this a dup of bug 934563?
(In reply to Anshul from comment #5)
> Is this a dup of bug 934563?

Based on the symptom of minidump, it's highly possible.

Are you seeing this bug during automatic testing?
(In reply to Anshul from comment #4)
> Hsin-Yi, I am having trouble causing gdb to break when there is a crash. I
> am investigating that but I am still seeing the issue consistently even
> after trying the latest gecko/gaia from Moz central. Is there someone who is
> familiar with DOMRequest that can help look at it? Or are there any logs
> that I can enable to help confirm that this is not a Telephony issue?

[1] shows 3 possible places touching FireSuccessAsync and one is in BT code. 
As BT listens events from TelephonyProvider, I am CCing BT developers to see if they have seen some clues here.

Also, it could help if you provide detailed STR, Anshul. Thanks!

[1] https://bugzilla.mozilla.org/show_bug.cgi?id=934563#c9
Hsin-Yi, I am seeing this issue during manual testing.


STR
- Insert two SIM cards, one on each slot
- Enable SIM1 and SIM2 (Gaia setting not available yet so we are hacking our code to achieve this)
- Make an incoming call destined for SIM2.

Expected: The dialer starts up and shows the incoming call
Observed: Dialer doesn't start and I see a message that says "B2G recovered from crash".
Hsin-Yi, I confirmed that the crash is in the communications app. I am able to attach gdb specifically to the communications app and get the back trace. Back trace below is almost identical to the one I pulled from the minidump.

#0  ~FireSuccessAsyncTask (this=0xb035ad40, __in_chrg=<optimized out>) at /local/mnt/workspace/jb.b2g_mr2/gecko/dom/base/DOMRequest.cpp:312
#1  FireSuccessAsyncTask::~FireSuccessAsyncTask (this=0xb035ad40, __in_chrg=<optimized out>)
    at /local/mnt/workspace/jb.b2g_mr2/gecko/dom/base/DOMRequest.cpp:302
#2  0xb5c49860 in FireSuccessAsyncTask::~FireSuccessAsyncTask (this=0xb035ad40, __in_chrg=<optimized out>)
    at /local/mnt/workspace/jb.b2g_mr2/gecko/dom/base/DOMRequest.cpp:316
#3  0xb57177f8 in Release (this=0xb035ad40) at /local/mnt/workspace/jb.b2g_mr2/gecko/xpcom/glue/nsThreadUtils.cpp:32
#4  nsRunnable::Release (this=0xb035ad40) at /local/mnt/workspace/jb.b2g_mr2/gecko/xpcom/glue/nsThreadUtils.cpp:32
#5  0xb567c852 in nsCOMPtr_base::~nsCOMPtr_base (this=0xbeb65de4, __in_chrg=<optimized out>) at ../../../../dist/include/nsCOMPtr.h:430
#6  0xb5682cb0 in nsCOMPtr<nsIPrefBranch>::~nsCOMPtr (this=0xbeb65de4, __in_chrg=<optimized out>) at ../../../../dist/include/nsCOMPtr.h:469
#7  0xb5740eec in nsThread::ProcessNextEvent (this=0xb48d6b80, mayWait=<optimized out>, result=0xbeb65e17)
    at /local/mnt/workspace/jb.b2g_mr2/gecko/xpcom/threads/nsThread.cpp:618
#8  0xb5717d02 in NS_ProcessNextEvent (thread=<optimized out>, mayWait=<optimized out>)
    at /local/mnt/workspace/jb.b2g_mr2/gecko/xpcom/glue/nsThreadUtils.cpp:251
#9  0xb58619d8 in mozilla::ipc::MessagePump::Run (this=0xb4801b80, aDelegate=0xbeb65f18)
    at /local/mnt/workspace/jb.b2g_mr2/gecko/ipc/glue/MessagePump.cpp:85
#10 0xb5856c5e in MessageLoop::RunInternal (this=<optimized out>) at /local/mnt/workspace/jb.b2g_mr2/gecko/ipc/chromium/src/base/message_loop.cc:220
#11 0xb5856d10 in RunHandler (this=0xbeb65f18) at /local/mnt/workspace/jb.b2g_mr2/gecko/ipc/chromium/src/base/message_loop.cc:213
#12 MessageLoop::Run (this=0xbeb65f18) at /local/mnt/workspace/jb.b2g_mr2/gecko/ipc/chromium/src/base/message_loop.cc:187
#13 0xb5bd4a02 in nsBaseAppShell::Run (this=0xb37b08e0) at /local/mnt/workspace/jb.b2g_mr2/gecko/widget/xpwidgets/nsBaseAppShell.cpp:161
#14 0xb617423a in XRE_RunAppShell () at /local/mnt/workspace/jb.b2g_mr2/gecko/toolkit/xre/nsEmbedFunctions.cpp:679
#15 0xb5856c5e in MessageLoop::RunInternal (this=<optimized out>) at /local/mnt/workspace/jb.b2g_mr2/gecko/ipc/chromium/src/base/message_loop.cc:220
#16 0xb5856d10 in RunHandler (this=0xbeb65f18) at /local/mnt/workspace/jb.b2g_mr2/gecko/ipc/chromium/src/base/message_loop.cc:213
#17 MessageLoop::Run (this=0xbeb65f18) at /local/mnt/workspace/jb.b2g_mr2/gecko/ipc/chromium/src/base/message_loop.cc:187
#18 0xb61745c2 in XRE_InitChildProcess (aArgc=5, aArgv=<optimized out>, aProcess=<optimized out>)
    at /local/mnt/workspace/jb.b2g_mr2/gecko/toolkit/xre/nsEmbedFunctions.cpp:516
#19 0x0000864a in main (argc=7, argv=0xbeb66894) at /local/mnt/workspace/jb.b2g_mr2/gecko/ipc/app/MozillaRuntimeMain.cpp:137

Please let me know if this helps. This really blocking most of our DSDS testing so appreciate your help.
Flags: needinfo?(khu)
Flags: needinfo?(htsai)
Joe, can you help to find someone to check the Comms app crash issue? Thanks.
Flags: needinfo?(khu) → needinfo?(jcheng)
Flags: needinfo?(jcheng)
Etienne, does you believe this crash is in the Dialer app? Thanks
Flags: needinfo?(etienne)
I just checked my local codes. Although I am not sure if my codes are the same as what we had when the crash happened. But, I have a very quick look in DOMRequest.cpp(Line 309 - Line 313): 

    nsresult rv;
    nsIScriptContext* sc = mReq->GetContextForEventHandlers(&rv);
    MOZ_ASSERT(NS_SUCCEEDED(rv));
    AutoPushJSContext cx(sc->GetNativeContext());
    MOZ_ASSERT(cx);

Would it be better to check if sc pointer is valid or not before calling sc->GetNativeContext()? What if GetContextForEventHandlers() return NULL?...
(In reply to Kevin Hu [:khu] from comment #12)
> I just checked my local codes. Although I am not sure if my codes are the
> same as what we had when the crash happened. But, I have a very quick look
> in DOMRequest.cpp(Line 309 - Line 313): 
> 
>     nsresult rv;
>     nsIScriptContext* sc = mReq->GetContextForEventHandlers(&rv);
>     MOZ_ASSERT(NS_SUCCEEDED(rv));
>     AutoPushJSContext cx(sc->GetNativeContext());
>     MOZ_ASSERT(cx);
> 
> Would it be better to check if sc pointer is valid or not before calling
> sc->GetNativeContext()? What if GetContextForEventHandlers() return NULL?...

We had the same issue for Bluetooth as well and the null check was added in Bug 934552. The rate of returning nullptr seemed to be raised for some reasons.
(In reply to Joe Cheng [:jcheng] from comment #11)
> Etienne, does you believe this crash is in the Dialer app? Thanks

Looks like it's happening at a lower level, clearing my ni.
Flags: needinfo?(etienne)
I reviewed the whole source file and found we have huge amount of similar issues...anyway, for this one, because it's in the destructor of FireSuccessAsyncTask, looks like we would like to call JS_RemoveValueRoot() with AutoPushJSContext object. If sc is a NULL pointer, it's meaningless to call JS_RemoveValueRoot(), IMO. So, maybe we can fix it just like: 

    nsresult rv;
    nsIScriptContext* sc = mReq->GetContextForEventHandlers(&rv);
    if(!sc){
        return;
    }
    MOZ_ASSERT(NS_SUCCEEDED(rv));
    AutoPushJSContext cx(sc->GetNativeContext());
    MOZ_ASSERT(cx);

Would it be okay? Or, we have other better ways? Thanks.
If you can aren't able to test this fix, please let me know and I can test the patch for you as I am able to very easily reproduce this issue.
Thanks, Anshul. If my understanding is correct, you can't reproduce this bug always, right? In this case, I think it might be possible that it is a timing issue, which means, that, sometimes, the context from mReq was deleted before the line of code is executed. Adding error handling that I suggested should fix the crash, but not sure if there will have other memory leak issues or other issues due to lack of calling JS_RemoveValueRoot(). No matter what, I think NULL pointer check is the right thing to do.
Attached file DOMRequest.cpp (obsolete) —
Anshul, I am thinking to fix it like the attachment. Not sure if you could help to verify? Sorry for the inconvenience. Thank you.
Attachment #8337500 - Flags: feedback?(anshulj)
Anshul, I made a patch based on what Eric taught me. Could you help to verify? Thanks!
Attachment #8337520 - Flags: feedback?(anshulj)
Attachment #8337500 - Flags: feedback?(anshulj)
Attachment #8337500 - Attachment is obsolete: true
Anshul said no crash anymore after using this patch.
Clearing ni due to comment 20.
Flags: needinfo?(htsai)
Kevin, as we talked on the phone the original crash is resolved with the patch you provided but I see two different crashes.

1. Crash when making an outgoing call on SIM 2 (stack_2.txt attached)
2. Crash when simply opening the dialer (stack_3.txt attached)

Neither of these two crashes are 100% reproducible.
Attached file stack_2.txt
Attached file stack_3.txt
Attachment #8337520 - Flags: feedback?(anshulj) → feedback+
Thanks, Anshul. 

Would you mind submitting different bugs for these crashes? Thank you.
Comment on attachment 8337520 [details] [diff] [review]
0001-Kevin-s-patch.patch

Hi, Kyle, I heard that you could do the review for my patch here. Could you help? Thank you!
Attachment #8337520 - Flags: review?(khuey)
blocking-b2g: 1.3? → 1.3+
Assignee: nobody → khu
Kevin, could you please help to move this bug along?
Severity: normal → critical
Priority: -- → P1
Comment on attachment 8337520 [details] [diff] [review]
0001-Kevin-s-patch.patch

I don't think this should be null ... it seems like there are deeper problems here.

This appears to be related to bug 934563.
Attachment #8337520 - Flags: review?(khuey)
Anshul, can you test with the patchin bug 934563?
Flags: needinfo?(anshulj)
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #28)
> Comment on attachment 8337520 [details] [diff] [review]
> 0001-Kevin-s-patch.patch
> 
> I don't think this should be null ... it seems like there are deeper
> problems here.
> 
> This appears to be related to bug 934563.

Thanks, Kyle. I agreed that this should not be NULL, and we should fix other parts to avoid this problem. Even though, we can also add additional protection here, isn't it? :)
(In reply to Kevin Hu [:khu] from comment #30)
> (In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #28)
> > Comment on attachment 8337520 [details] [diff] [review]
> > 0001-Kevin-s-patch.patch
> > 
> > I don't think this should be null ... it seems like there are deeper
> > problems here.
> > 
> > This appears to be related to bug 934563.
> 
> Thanks, Kyle. I agreed that this should not be NULL, and we should fix other
> parts to avoid this problem. Even though, we can also add additional
> protection here, isn't it? :)

If mIsSetup is true, we have to call JS_RemoveValueRoot at the end of that function, so returning early is not an option.  The patch I posted in bug 934563 fixes this by using a JSContext* that can never be null.
I am not seeing a crash on outgoing/incoming call on SIM 2 anymore.
Flags: needinfo?(anshulj)
Seems like this is resolved.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Depends on: 934563
Target Milestone: --- → 1.3 Sprint 6 - 12/6
Flags: in-moztrap?
Flags: in-moztrap? → in-moztrap+
You need to log in before you can comment on or make changes to this bug.