Closed
Bug 993954
Opened 11 years ago
Closed 11 years ago
dom::bluetooth::BluetoothChild::RecvNotify crashes during monkey testing
Categories
(Firefox OS Graveyard :: Bluetooth, defect)
Tracking
(blocking-b2g:1.4+, firefox30 wontfix, firefox31 wontfix, firefox32 fixed, b2g-v1.4 fixed, b2g-v2.0 fixed)
People
(Reporter: tkundu, Assigned: shawnjohnjr)
References
Details
(Keywords: crash, Whiteboard: [b2g-crash][qa-])
Attachments
(3 files, 3 obsolete files)
No description provided.
Reporter | ||
Updated•11 years ago
|
blocking-b2g: --- → 1.4?
Assignee | ||
Comment 1•11 years ago
|
||
Any log or back trace? And on which device?
Flags: needinfo?(tkundu)
Reporter | ||
Comment 2•11 years ago
|
||
stack trace for settings app crash.
STR:
1) Run monkey testing on device.
Reporter | ||
Comment 3•11 years ago
|
||
Flags: needinfo?(tkundu)
Assignee | ||
Comment 4•11 years ago
|
||
back trace pattern is very similar with bug 795458.
Reporter | ||
Comment 5•11 years ago
|
||
Remove v1.4? blocking tag for now as this issue was seen only on msm8610 .
blocking-b2g: 1.4? → ---
Assignee | ||
Updated•11 years ago
|
Assignee: nobody → shuang
Assignee | ||
Comment 6•11 years ago
|
||
I suspect that sBluetoothService had been deleted during Bluetooth on/off which leads SIGSEGV.
1 libxul.so!mozilla::dom::bluetooth::BluetoothChild::RecvNotify(mozilla::dom::bluetooth::BluetoothSignal const&) [BluetoothChild.cpp : 78 + 0x9]
sp = 0xbee4aa98 pc = 0xb5bf368d
Found by: stack scanning
2 libxul.so!mozilla::dom::bluetooth::PBluetoothChild::OnMessageReceived(IPC::Message const&) [PBluetoothChild.cpp : 364 + 0x9]
sp = 0xbee4aaa0 pc = 0xb5724811
Found by: call frame info
Updated•11 years ago
|
blocking-b2g: --- → 1.4?
Comment 7•11 years ago
|
||
Sorry, that was my re-nom for 1.4. 8610 applies for 1.4 as well.
Comment 8•11 years ago
|
||
(In reply to Greg Grisco from comment #7)
> Sorry, that was my re-nom for 1.4. 8610 applies for 1.4 as well.
Can you clarify what you mean by 8610? Is this supposed to be CS blocking?
Flags: needinfo?(ggrisco)
Comment 9•11 years ago
|
||
Jason that's referring to the chipset in question. Per triage yes it blocks QC.
Flags: needinfo?(ggrisco)
Updated•11 years ago
|
blocking-b2g: 1.4? → 1.4+
Assignee | ||
Updated•11 years ago
|
Target Milestone: --- → 1.4 S6 (25apr)
Assignee | ||
Comment 10•11 years ago
|
||
Based on the stack, Setting app crashed in BluetoothChild, and probably access nullptr (sBluetoothService).
#0 mozilla::dom::bluetooth::BluetoothChild::RecvNotify (this=0x496bdc70, aSignal=...) at /home/shawnjohnjr/b2g-latest/B2G/gecko/dom/bluetooth/ipc/BluetoothChild.cpp:78
#1 0x4084f886 in mozilla::dom::bluetooth::PBluetoothChild::OnMessageReceived (this=0x496bdc70, __msg=...) at /home/shawnjohnjr/b2g-latest/B2G/objdir-gecko/ipc/ipdl/PBluetoothChild.cpp:364
#2 0x40870fb4 in mozilla::dom::PContentChild::OnMessageReceived (this=0x4033f018, __msg=...) at /home/shawnjohnjr/b2g-latest/B2G/objdir-gecko/ipc/ipdl/PContentChild.cpp:3417
#3 0x40833be2 in mozilla::ipc::MessageChannel::DispatchAsyncMessage (this=0x4033f048, aMsg=...) at /home/shawnjohnjr/b2g-latest/B2G/gecko/ipc/glue/MessageChannel.cpp:1142
Assignee | ||
Comment 11•11 years ago
|
||
Hi Thomas,
What do you think if we can add protection here?
Attachment #8410742 -
Flags: feedback?(tzimmermann)
Assignee | ||
Comment 12•11 years ago
|
||
The frame 0 seems occurpt based on attachment 0. The PC is 0x00000000.
0 0x0
r4 = 0x00000002 r5 = 0xbee4aab8 r6 = 0xb2337c9c r7 = 0x00000000
r8 = 0x00000002 r9 = 0x00000001 r10 = 0x00000000 fp = 0xbee4b79c
sp = 0xbee4aa80 lr = 0xb5bf2fe9 pc = 0x00000000
Found by: given as instruction pointer in context
Maybe PC=0 causes by function pointer is null, and maybe sBluetoothService is not null.
Assignee | ||
Comment 13•11 years ago
|
||
I think this bug only crashes Settings application instead of the whole b2g process, since it's at child process side.
Assignee | ||
Comment 14•11 years ago
|
||
I guess when |BluetoothServiceChildProcess::NoteDeadActor| gets called, BluetoothChild destructor called, and |BluetoothServiceChildProcess* gBluetoothService| can be nullptr, at the same time |PBluetoothChild::OnMessageReceived| gets message from MessageLoop, which calls |RecvNotify()|. But I cannot explain why PC=0 in frame 0.
Comment 15•11 years ago
|
||
IRC log:
>>>>
<shawnjohnjr> hello
<tzimmermann> hi
<shawnjohnjr> busy now?
<tzimmermann> kind of, but go on
<tzimmermann> are you asking about bug 993954?
<shawnjohnjr> Any chance to feedback bug 993954 today?
<shawnjohnjr> yap
<tzimmermann> i'll take a look immediately
<shawnjohnjr> it's monkey testing, so i cannot reproduce it easily.
<tzimmermann> there are already assertions for sBluetoothService
<tzimmermann> does that happen on bluez or bluedroid?
<shawnjohnjr> bluez
<shawnjohnjr> Because frame 0 PC = 0, I can only guess gBluetoothService->DistributeSignal is null
<shawnjohnjr> But it shouldn't
<tzimmermann> one more second...
<shawnjohnjr> ok
<tzimmermann> (pc == 0) happened when calling BluetoothChild::RecvNotify, right?
<tzimmermann> that method is virtual, if BluetoothChild has been free'd, you might call access and invalid vtable and jump to address 0
<shawnjohnjr> Around gBluetoothService->DistributeSignal
<tzimmermann> oh, ok
<shawnjohnjr> ok
<tzimmermann> and sBluetoothService has probably been free'd but the variable has not been cleared to nullptr
<tzimmermann> that happens in the destructor of BluetoothService
<tzimmermann> sorry, that happens in the desctuctor of BluetoothChild!
<shawnjohnjr> yap
<tzimmermann> but i couldn't find any call to the destructor of BluetoothChild
<shawnjohnjr> I wrote in Comment 14...
<shawnjohnjr> NoteDeadActor
<tzimmermann> but NoteDeadActor only clears sBluetoothChild!
<tzimmermann> it does not call the destructor afaics
<tzimmermann> sBluetoothChild is not ref-counted
<shawnjohnjr> ok
<tzimmermann> or auto_ptr'ed, or something similar
<shawnjohnjr> you are right
<shawnjohnjr> it's not
<tzimmermann> so i'd guess that the destructor of BluetoothChild never runs and never clears sBluetoothService
<shawnjohnjr> Then my assumption is not correct :(
<tzimmermann> can you reproduce the bug?
<shawnjohnjr> i cannot reproduce this bug.
<tzimmermann> :/
<shawnjohnjr> In theory, monkey testing just randomly click
<tzimmermann> i'd try to delete sBluetoothChild, and give it back to monkey testers for verification.
<shawnjohnjr> ok
<shawnjohnjr> this is something helpful
<tzimmermann> glad to help :) i'll add the conversion to the bug report
<shawnjohnjr> i don't have better idea currently.
<shawnjohnjr> Thanks
Assignee | ||
Updated•11 years ago
|
Attachment #8410742 -
Attachment is obsolete: true
Attachment #8410742 -
Flags: feedback?(tzimmermann)
Comment 16•11 years ago
|
||
(In reply to Thomas Zimmermann [:tzimmermann] [:tdz] from comment #15)
> <tzimmermann> there are already assertions for sBluetoothService
> <tzimmermann> does that happen on bluez or bluedroid?
> <shawnjohnjr> bluez
> <shawnjohnjr> Because frame 0 PC = 0, I can only guess
Please note we have switched over bluedroid stack
Comment 17•11 years ago
|
||
(In reply to bhargavg1 from comment #16)
> (In reply to Thomas Zimmermann [:tzimmermann] [:tdz] from comment #15)
> > <tzimmermann> there are already assertions for sBluetoothService
> > <tzimmermann> does that happen on bluez or bluedroid?
> > <shawnjohnjr> bluez
> > <shawnjohnjr> Because frame 0 PC = 0, I can only guess
>
> Please note we have switched over bluedroid stack
What we've found is unrelated to the BT driver. And at some point, BT will probably be supported on the desktop. And then we'll need BlueZ.
Assignee | ||
Comment 18•11 years ago
|
||
There is any chance I can download symbol file of libxul.so?
Flags: needinfo?(tkundu)
Assignee | ||
Updated•11 years ago
|
Flags: needinfo?(tkundu)
Assignee | ||
Comment 19•11 years ago
|
||
I still think sBluetoothService could be nullptr, and at the same time, called RecvNotify() from MessageLoop, which leads content process (Settings app) crashes. I don't have other explanation to fit the back trace.
And I found |BluetoothRequestChild::~BluetoothRequestChild()| will be called if we power off the device manually on Nexus4.
Assignee | ||
Comment 20•11 years ago
|
||
Attachment #8412539 -
Flags: review?(echou)
Comment 21•11 years ago
|
||
Comment on attachment 8412539 [details] [diff] [review]
Bug 993954 - Check sBluetoothChild is null when closing Settings application
Review of attachment 8412539 [details] [diff] [review]:
-----------------------------------------------------------------
I don't think it's correct to delete sBluetoothChild and sBluetoothService manually. First, without settings these two pointers to nullptr, it might cause a crash when RecvNotify() is called fater delete, which also means the additional if-check would become useless. In my opinion, you could declare nsRefPtr<BluetoothService> as a member variable of BluetoothChild so you don't need to worry about memory free/allocate. Second, please add protection for RecvEnabled() as well.
There is a bug which is very similar to our case and Ben Turner left a comment which might be helpful. Please see https://bugzilla.mozilla.org/show_bug.cgi?id=767082 for more details. To be honest I'm not sure how this happened even after investigating related code. I'd like to propose that let's add if-protection and not change the null out behavior to delete. Leak is better than crash to me.
Attachment #8412539 -
Flags: review?(echou) → review-
Assignee | ||
Updated•11 years ago
|
Attachment #8412539 -
Attachment is obsolete: true
Assignee | ||
Comment 22•11 years ago
|
||
Attachment #8414264 -
Flags: review?(echou)
Assignee | ||
Updated•11 years ago
|
Attachment #8414264 -
Attachment is obsolete: true
Attachment #8414264 -
Flags: review?(echou)
Assignee | ||
Comment 23•11 years ago
|
||
Attachment #8414267 -
Flags: review?(echou)
Comment 24•11 years ago
|
||
Comment on attachment 8414267 [details] [diff] [review]
Bug 993954 - Check sBluetoothChild is null when closing Settings application
Review of attachment 8414267 [details] [diff] [review]:
-----------------------------------------------------------------
Shawn, sorry but I have to r- this. In this patch, you add the protection for sBluetoothService, which may be good, but definitely will not solve the original issue.
Attachment #8414267 -
Flags: review?(echou) → review-
Assignee | ||
Updated•11 years ago
|
Target Milestone: 1.4 S6 (25apr) → 2.0 S1 (9may)
Assignee | ||
Comment 25•11 years ago
|
||
(In reply to Eric Chou [:ericchou] [:echou] from comment #24)
> Comment on attachment 8414267 [details] [diff] [review]
> Bug 993954 - Check sBluetoothChild is null when closing Settings application
>
> Review of attachment 8414267 [details] [diff] [review]:
> -----------------------------------------------------------------
>
> Shawn, sorry but I have to r- this. In this patch, you add the protection
> for sBluetoothService, which may be good, but definitely will not solve the
> original issue.
ok. I extend my ETA a bit since currently I don't have better solution.
Comment 26•11 years ago
|
||
(In reply to Preeti Raghunath(:Preeti) from comment #9)
> Jason that's referring to the chipset in question. Per triage yes it blocks
> QC.
Hi Preeti,
Can you elaborate on the triage decision on why this blocks? There's no clear repro on this, and the settings app should be recoverable in this case.
Flags: needinfo?(praghunath)
Updated•11 years ago
|
Group: qualcomm-confidential
Comment 29•11 years ago
|
||
Comment on attachment 8414267 [details] [diff] [review]
Bug 993954 - Check sBluetoothChild is null when closing Settings application
ok. After discussed with Shawn, I decided to accept his protection patch. Let's see if QC side could reproduce this after this patch applied.
Shawn, please keep investigating, and please open a followup for this once the patch is proven to work by QC's QA.
Attachment #8414267 -
Flags: review- → review+
Comment 30•11 years ago
|
||
Hi Tapas,
Please apply the r+ patch and see if this can still be reproduced. Thanks.
Flags: needinfo?(tkundu)
Comment 32•11 years ago
|
||
(In reply to Michael Vines [:m1] [:evilmachines] from comment #31)
> (In reply to Eric Chou [:ericchou] [:echou] from comment #29)
> > ... QC ...
> >
> > ... QC's ...
>
> We are CAF, not QC :)
ok, sure.
Assignee | ||
Updated•11 years ago
|
Keywords: checkin-needed
Comment 33•11 years ago
|
||
Marking 2 comments as private, rather than the whole bug; we should aim to have as few closed bugs as possible.
Landed as:
remote: https://hg.mozilla.org/integration/b2g-inbound/rev/b908160de1ac
Group: qualcomm-confidential
Keywords: checkin-needed
Reporter | ||
Comment 34•11 years ago
|
||
(In reply to Ed Morley [:edmorley UTC+0] from comment #33)
> Marking 2 comments as private, rather than the whole bug; we should aim to
> have as few closed bugs as possible.
>
> Landed as:
> remote: https://hg.mozilla.org/integration/b2g-inbound/rev/b908160de1ac
I will ask our team to test with this patch and update here soon
Comment 35•11 years ago
|
||
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Updated•11 years ago
|
status-b2g-v1.4:
--- → affected
status-b2g-v2.0:
--- → fixed
Comment 36•11 years ago
|
||
Updated•11 years ago
|
Whiteboard: [b2g-crash] → [b2g-crash][qa-]
Updated•11 years ago
|
Flags: needinfo?(praghunath)
Comment 37•11 years ago
|
||
When I try to reproduce this bug, I don't ever see |BluetoothServiceChildProcess::NoteDeadActor| being executed.
How do I make this happen?
Assignee | ||
Comment 38•11 years ago
|
||
(In reply to Thomas Zimmermann [:tzimmermann] [:tdz] from comment #37)
> When I try to reproduce this bug, I don't ever see
> |BluetoothServiceChildProcess::NoteDeadActor| being executed.
>
> How do I make this happen?
Comment 15 mentioned NoteDeadActor (will be executed) might not correct. But on Nexus 4, I added logs in |~BluetoothChild()|, the destructor will be called via killing app (long press Home button and kill).
Comment 39•11 years ago
|
||
I don't have any luck reproducing this.
Comment 40•11 years ago
|
||
Shawn
Next steps here. Do we need anything from partners to get ahead?
Flags: needinfo?(shawnjohnjr)
Assignee | ||
Comment 41•11 years ago
|
||
Recently partners run auto tests, so I think let's wait for their feedback if they can hit the problem again.
Flags: needinfo?(shawnjohnjr)
Reporter | ||
Comment 42•11 years ago
|
||
We didn't see it on v1.4 for long time in our stability run. It seems like it is fixed in v1.4
Flags: needinfo?(tkundu)
Comment 43•11 years ago
|
||
(In reply to Tapas Kumar Kundu from comment #42)
> We didn't see it on v1.4 for long time in our stability run. It seems like
> it is fixed in v1.4
Thanks!
Comment 44•10 years ago
|
||
Not enough information with current STR to write test case to address bug.
QA Whiteboard: [QAnalyst-Triage?]
Flags: needinfo?(ktucker)
Flags: in-moztrap?(rmead)
Updated•10 years ago
|
QA Whiteboard: [QAnalyst-Triage?] → [QAnalyst-Triage+]
Flags: needinfo?(ktucker)
Flags: in-moztrap?(rmead)
Flags: in-moztrap-
You need to log in
before you can comment on or make changes to this bug.
Description
•