Closed
Bug 1074419
Opened 10 years ago
Closed 10 years ago
b2g crashes in nsAString_internal::Assign
Categories
(Firefox OS Graveyard :: Bluetooth, defect, P1)
Tracking
(blocking-b2g:2.1+, firefox34 wontfix, firefox35 wontfix, firefox36 fixed, b2g-v2.1 fixed, b2g-v2.2 fixed)
People
(Reporter: tkundu, Assigned: tzimmermann)
References
Details
(Keywords: crash, Whiteboard: [b2g-crash][caf-crash 370][caf priority: p1][CR 729510])
Crash Data
Attachments
(3 files, 2 obsolete files)
594.15 KB,
text/plain
|
Details | |
573.54 KB,
text/plain
|
Details | |
5.35 KB,
patch
|
shawnjohnjr
:
review+
fabrice
:
approval-mozilla-b2g34+
|
Details | Diff | Splinter Review |
STR:
1) Run random stability testing for 24 hours
We are seeing this crash . I attached stack traces and logcat logs :
Reporter | ||
Comment 1•10 years ago
|
||
[Blocking Requested - why for this release]:
blocking-b2g: --- → 2.1?
Reporter | ||
Comment 2•10 years ago
|
||
Component: Stability → Bluetooth
Reporter | ||
Comment 4•10 years ago
|
||
Do we know why we are seeing this error in .extra file log ?
09-27 19:16:20.454 202 706 I Gecko : [Parent 202] WARNING: Message requires an excessive number of descriptors channel:a897e000 message-type:-1 header()->num_fds:10131 num_fds:0 fds_i:0: file ../../../../../../../../gecko/ipc/chromium/src/chrome/common/ipc_channel_posix.cc, line 571
Please also look into bug 1074434 comment 4
Reporter | ||
Comment 5•10 years ago
|
||
Please take a look into Comment 4 and bug 1074434 comment 4. It seems to be something is going wrong during IPC. lsof command does not show 10131 fds.. we are seeing ~300fds for b2g process in lsof command.
Flags: needinfo?(benjamin)
Updated•10 years ago
|
Whiteboard: [CR 729510] → [caf priority: p1][CR 729510]
Updated•10 years ago
|
Whiteboard: [caf priority: p1][CR 729510] → [b2g-crash][caf-crash 370][caf priority: p1][CR 729510]
Comment 6•10 years ago
|
||
Observed on:
Device: msm8226
Gonk Version: AU_LINUX_GECKO_B2G_KK_2.0.01.04.00.114.093
Moz BuildID: 20140924000243
B2G Version: 2.1
Gecko Version: 34.0a2
Gaia: http://git.mozilla.org/?p=releases/gaia.git;a=commit;h=93a99bea0b40d81bd063f7d8b1964dc1ba35ba7b
Gecko: http://git.mozilla.org/?p=releases/gecko.git;a=commit;h=c896bc5d04b8c08dcddd78195901503a3578a08f
Patches: bug 1055299, bug 1068978, bug 1065511, bug 1068394, bug 1067205, bug 1070431, bug 1073088, bug 1059573, bug 1051636, bug 1051633, bug 1063568
Assignee | ||
Comment 7•10 years ago
|
||
It there a better STR? Something we can do locally?
Assignee | ||
Comment 8•10 years ago
|
||
Hi Tapas,
we suspect bug 1063066 to cause this problem. Is it possible for you to backout the patch of bug 1063066 and test again? We could provide you with a backout patch, if necessary.
Flags: needinfo?(tkundu)
Assignee | ||
Comment 9•10 years ago
|
||
Here's the backout patch for bug 1063066.
Reporter | ||
Comment 11•10 years ago
|
||
(In reply to Thomas Zimmermann [:tzimmermann] [:tdz] from comment #9)
> Created attachment 8497455 [details] [diff] [review]
> backout_bug_1063066.patch
>
> Here's the backout patch for bug 1063066.
Could you please rebase this patch for v2.1 ?
Flags: needinfo?(tkundu) → needinfo?(tzimmermann)
Reporter | ||
Comment 12•10 years ago
|
||
(In reply to Tapas[:tkundu on #b2g/gaia/memshrink/gfx] (always NI me) from comment #11)
> (In reply to Thomas Zimmermann [:tzimmermann] [:tdz] from comment #9)
> > Created attachment 8497455 [details] [diff] [review]
> > backout_bug_1063066.patch
> >
> > Here's the backout patch for bug 1063066.
>
> Could you please rebase this patch for v2.1 ?
I just found that our test build (which has reproduced this issue originally), does not have any fix from bug 1063066 . Our test team also is not using gallery app to reproduce this issue .
I would rather suspect this issue because of closing many fds wrongly (see comment 5) .
@bhavna: Could you please ask some IPC expert to take a look into it ?
Flags: needinfo?(bbajaj)
Reporter | ||
Comment 13•10 years ago
|
||
Comment 14•10 years ago
|
||
(In reply to Tapas[:tkundu on #b2g/gaia/memshrink/gfx] (always NI me) from comment #12)
> (In reply to Tapas[:tkundu on #b2g/gaia/memshrink/gfx] (always NI me) from
> comment #11)
> > (In reply to Thomas Zimmermann [:tzimmermann] [:tdz] from comment #9)
> > > Created attachment 8497455 [details] [diff] [review]
> > > backout_bug_1063066.patch
> > >
> > > Here's the backout patch for bug 1063066.
> >
> > Could you please rebase this patch for v2.1 ?
>
> I just found that our test build (which has reproduced this issue
> originally), does not have any fix from bug 1063066 . Our test team also is
> not using gallery app to reproduce this issue .
>
> I would rather suspect this issue because of closing many fds wrongly (see
> comment 5) .
>
> @bhavna: Could you please ask some IPC expert to take a look into it ?
Ni :cyu, :khuey to see if they have any other ideas here or give you a patch to get more logs. Hopefully they can help or pass it to the right person.
blocking-b2g: 2.1? → 2.1+
Flags: needinfo?(khuey)
Flags: needinfo?(cyu)
Flags: needinfo?(bbajaj)
The stack is in bluetooth code, so the bluetooth team needs to look at this.
Flags: needinfo?(khuey)
Flags: needinfo?(cyu)
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #15)
> The stack is in bluetooth code, so the bluetooth team needs to look at this.
Thomas and I are working on this bug.
Assignee | ||
Comment 17•10 years ago
|
||
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #15)
> The stack is in bluetooth code, so the bluetooth team needs to look at this.
Right, but I don't know the available logging options for memory leaks. I remember you were working on this topic. Can you provide some information or pointers to documentation?
Flags: needinfo?(tzimmermann) → needinfo?(khuey)
Assignee | ||
Comment 18•10 years ago
|
||
(In reply to Tapas[:tkundu on #b2g/gaia/memshrink/gfx] (always NI me) from comment #12)
> (In reply to Tapas[:tkundu on #b2g/gaia/memshrink/gfx] (always NI me) from
> comment #11)
> > (In reply to Thomas Zimmermann [:tzimmermann] [:tdz] from comment #9)
> > > Created attachment 8497455 [details] [diff] [review]
> > > backout_bug_1063066.patch
> > >
> > > Here's the backout patch for bug 1063066.
> >
> > Could you please rebase this patch for v2.1 ?
>
> I just found that our test build (which has reproduced this issue
> originally), does not have any fix from bug 1063066 . Our test team also is
> not using gallery app to reproduce this issue .
>
> I would rather suspect this issue because of closing many fds wrongly (see
> comment 5) .
In this case, could you test _with_ the v2.1 patch from bug 1063066? Bug 1063066 fixes a problem with file descriptors and is unrelated to the Gallery app.
Flags: needinfo?(tkundu)
Assignee | ||
Comment 19•10 years ago
|
||
I found a potential problem with stale file descriptors when the Socket setup gets interrupted. Could you check if the patch makes a difference, maybe together with bug 1063066?
Attachment #8498037 -
Flags: feedback?(tkundu)
(In reply to Thomas Zimmermann [:tzimmermann] [:tdz] from comment #17)
> (In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #15)
> > The stack is in bluetooth code, so the bluetooth team needs to look at this.
>
> Right, but I don't know the available logging options for memory leaks. I
> remember you were working on this topic. Can you provide some information or
> pointers to documentation?
We have tools to log the JS and parts of the C++ heap, and tools to find untracked memory. But I don't see any evidence that this is caused by a memory leak. The pointer we're crashing at is close to 0x5a5a5a5a which is the mark for freed memory, so this looks like a use-after-free.
Flags: needinfo?(khuey)
Reporter | ||
Comment 21•10 years ago
|
||
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #20)
> (In reply to Thomas Zimmermann [:tzimmermann] [:tdz] from comment #17)
> > (In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #15)
> > > The stack is in bluetooth code, so the bluetooth team needs to look at this.
> >
> > Right, but I don't know the available logging options for memory leaks. I
> > remember you were working on this topic. Can you provide some information or
> > pointers to documentation?
>
> We have tools to log the JS and parts of the C++ heap, and tools to find
> untracked memory. But I don't see any evidence that this is caused by a
> memory leak. The pointer we're crashing at is close to 0x5a5a5a5a which is
> the mark for freed memory, so this looks like a use-after-free.
We are seeing following output in b2g-info when crash happened.
==- 00011713, 2014-09-29 04:46:55, 2014-09-28 23:16:55 UTC, up 56665 -==
| megabytes |
NAME PID PPID CPU(s) NICE USS PSS RSS SWAP VSIZE OOM_ADJ USER
b2g 15283 1 10233.0 0 187.7 193.2 212.3 0.0 479.1 0 root
(Preallocated a 2334 15294 0.4 18 5.8 7.8 19.2 0.0 60.9 1 u0_a2334
(Nuwa) 15294 15283 94.6 0 2.1 4.2 14.0 0.0 54.1 -16 root
Built-in Keyboa 15492 15294 53.2 18 6.7 9.1 21.7 0.0 63.5 10 u0_a15492
Homescreen 15666 15283 173.1 18 17.5 21.8 37.2 0.0 83.7 8 u0_a15666
Communications 27803 15294 4.8 18 8.9 11.6 25.0 0.0 65.7 10 u0_a27803
FM Radio 28061 15294 0.9 1 7.6 10.3 23.3 0.0 68.5 2 u0_a28061
plugin-containe 30892 15283 0.3 1 4.3 6.1 12.5 0.0 53.0 2 u0_a30892
System memory info:
Total 847.5 MB
SwapTotal 0.0 MB
Used - cache 492.6 MB
B2G procs (PSS) 264.0 MB
Non-B2G procs 228.6 MB
Free + cache 354.8 MB
Free 62.0 MB
Cache 292.9 MB
SwapFree 0.0 MB
Low-memory killer parameters:
notify_trigger 14336 KB
oom_adj min_free
0 4096 KB
58 5120 KB
117 6144 KB
352 7168 KB
470 8192 KB
588 20480 KB
b2g process VSize 479 is very big but we still have enough memory in system when crash happened. I am also in touch with our internal team to collect gecko memory report when it happens again !
Flags: needinfo?(tkundu)
Comment 22•10 years ago
|
||
(In reply to Fabrice Desré [:fabrice] from comment #3)
> Eric, looks like this comes from BT code.
Thomas is digging into it.
Flags: needinfo?(echou)
Assignee | ||
Comment 23•10 years ago
|
||
(In reply to Eric Chou [:ericchou] [:echou] from comment #22)
> (In reply to Fabrice Desré [:fabrice] from comment #3)
> > Eric, looks like this comes from BT code.
>
> Thomas is digging into it.
Well, I had a look at the BT socket code and found a possible cause for all these file descriptors mentioned in comment #4, but in general 'lsof' looks normal here. If this bug is instead a reference-counting problem, I don't see why. I'd need a better STR to reproduce it locally. And I'm waiting for Tapas to return with some feedback on the possible fixes in bug 1063066 and the attached patch.
Comment 24•10 years ago
|
||
Tapas, Thomas is waiting for your feedback on this bug
Flags: needinfo?(tkundu)
Reporter | ||
Comment 25•10 years ago
|
||
I asked our test team to provide feedback with these two patches but our test team is in holidays now. So please expect 3 days delay for response from me . Sorry for the delayed response.
Flags: needinfo?(tkundu)
Comment 26•10 years ago
|
||
Thomas is working on this. If I am wrong, please reassign it.
Assignee: nobody → tzimmermann
Assignee | ||
Comment 27•10 years ago
|
||
Well, I'm not actually _actively_ working on the bug. The current status is still as described in comments 23 and 25.
Assignee | ||
Comment 28•10 years ago
|
||
Hi,
Can you provide some more information on the actual test? What was it doing when it crashed? Does the problem only happen after hours of testing?
Flags: needinfo?(tkundu)
Comment 30•10 years ago
|
||
Hi Bhavana, I wonder if you could talk with Tapas to get his update. Thanks.
Flags: needinfo?(bbajaj)
Reporter | ||
Comment 31•10 years ago
|
||
(In reply to Tapas[:tkundu on #b2g/gaia/memshrink/gfx] (always NI me) from comment #25)
> I asked our test team to provide feedback with these two patches but our
> test team is in holidays now. So please expect 3 days delay for response
> from me . Sorry for the delayed response.
I am still waiting for our test team update on this. Sorry for unexpected delays.
Comment 32•10 years ago
|
||
Touchbased with CAF, they do not have the latest report yet but are expected to have one this week.
Flags: needinfo?(bbajaj)
Reporter | ||
Comment 33•10 years ago
|
||
Comment on attachment 8498037 [details] [diff] [review]
Close received socket file descriptors on errors
Review of attachment 8498037 [details] [diff] [review]:
-----------------------------------------------------------------
We tested with fix from both Bug 1074419 attachment 8498037 [details] [diff] [review] & bug 1063066 and we are not seeing this issue anymore .
Thanks for your help :)
Attachment #8498037 -
Flags: feedback?(tkundu) → feedback+
Reporter | ||
Updated•10 years ago
|
Flags: needinfo?(tkundu) → needinfo?(tzimmermann)
Assignee | ||
Comment 34•10 years ago
|
||
(In reply to Tapas[:tkundu on #b2g/gaia/memshrink/gfx] (always NI me) from comment #33)
> Comment on attachment 8498037 [details] [diff] [review]
> Close received socket file descriptors on errors
>
> Review of attachment 8498037 [details] [diff] [review]:
> -----------------------------------------------------------------
>
> We tested with fix from both Bug 1074419 attachment 8498037 [details] [diff] [review]
> [review] & bug 1063066 and we are not seeing this issue anymore .
>
> Thanks for your help :)
Great, thank you! :) I'll cleanup the attached patch and land it in the next days.
Flags: needinfo?(tzimmermann)
No longer blocks: CAF-v2.1-FC-metabug
Blocks: CAF-v2.1-CC-metabug
Assignee | ||
Updated•10 years ago
|
Status: NEW → ASSIGNED
Updated•10 years ago
|
Blocks: CAF-v2.1-FC-metabug
Updated•10 years ago
|
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → WORKSFORME
Comment 35•10 years ago
|
||
Crash 370 (issue 370) has not been seen for 21 days. Closing
Reporter | ||
Comment 36•10 years ago
|
||
Sorry for wrong comment 35. Please cleanup existing patch and land a fix for this issue.
Sorry for confusion.
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
Assignee | ||
Comment 37•10 years ago
|
||
Attachment #8497455 -
Attachment is obsolete: true
Attachment #8498037 -
Attachment is obsolete: true
Attachment #8507753 -
Flags: review?(shuang)
Attachment #8507753 -
Flags: review?(shuang) → review+
Assignee | ||
Comment 38•10 years ago
|
||
Comment on attachment 8507753 [details] [diff] [review]
[01] Bug 1074419: Close received socket file descriptors on errors
NOTE: Please see https://wiki.mozilla.org/Release_Management/B2G_Landing to better understand the B2G approval process and landings.
[Approval Request Comment]
Bug caused by (feature/regressing bug #):
The bug is probably caused by bug 1029387, patches [05][06][07]; where we refactored the Bluetooth socket code.
User impact if declined:
Not cleaning up resources leads to DOS after running Bluetooth for a long time (days). The affected code is for error handling and usually only triggered by corner cases, such as pairing problems or closing Bluetooth before pairings have been established.
Testing completed:
Tested by CAF testers.
Risk to taking this patch (and alternatives if risky):
The patch adds some cleanup code for errors. It shouldn't affect normal operation.
String or UUID changes made by this patch:
None
Attachment #8507753 -
Flags: approval-mozilla-b2g34?
Assignee | ||
Comment 39•10 years ago
|
||
Thanks, Shawn. Tapas, thanks to you and the testing team.
https://hg.mozilla.org/integration/b2g-inbound/rev/735159446aef
https://treeherder.mozilla.org/ui/#/jobs?repo=b2g-inbound&revision=735159446aef
Updated•10 years ago
|
Attachment #8507753 -
Flags: approval-mozilla-b2g34? → approval-mozilla-b2g34+
Status: REOPENED → RESOLVED
Closed: 10 years ago → 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → 2.1 S7 (24Oct)
Comment 41•10 years ago
|
||
status-b2g-v2.1:
--- → fixed
status-b2g-v2.2:
--- → fixed
status-firefox34:
--- → wontfix
status-firefox35:
--- → wontfix
status-firefox36:
--- → fixed
Comment 42•10 years ago
|
||
Unable to verify, requires stability test for 24 hours.
QA Whiteboard: [QAnalyst-Triage?][QAnalyst-verify-]
Flags: needinfo?(ktucker)
Updated•10 years ago
|
QA Whiteboard: [QAnalyst-Triage?][QAnalyst-verify-] → [QAnalyst-Triage+][QAnalyst-verify-]
Flags: needinfo?(ktucker)
You need to log in
before you can comment on or make changes to this bug.
Description
•