Closed
Bug 1044322
Opened 11 years ago
Closed 11 years ago
b2g crashes during closing IPC channel
Categories
(Core :: IPC, defect, P1)
Tracking
()
People
(Reporter: tkundu, Assigned: bent.mozilla)
References
Details
(Keywords: crash, Whiteboard: [b2g-crash][caf-crash 291][caf priority: p1][CR 696984])
Crash Data
Attachments
(2 files, 2 obsolete files)
6.06 KB,
text/plain
|
Details | |
3.98 KB,
patch
|
benjamin
:
review+
|
Details | Diff | Splinter Review |
STR :
1) Run stability test for call,sms,video,music,camera,camcorder.
2) b2g crashes at some point.
Logs https://drive.google.com/file/d/0B1cSMS8_GuAEUUtESW9fN0wxQ0k/edit?usp=sharing
Video app (pid 26788) has closed IPC channel and we are seeing following logs in logcat just before crash:
07-23 23:46:51.544 I/Gecko (26788): ###!!! [Child][MessageChannel] Error: Channel closing: too late to send/recv, messages will be lost
and b2g crashes at [1] while cleaning IPC request.
[1] http://dxr.mozilla.org/mozilla-central/source/ipc/chromium/src/chrome/common/ipc_channel_posix.cc#991
Crash timestamps is : 07-23 23:46:51.544
We are also seeing 49 FDs are opened with "/data/local/webapps/marketplace.firefox.com/application.zip" just before crash has happened (check lsof logs for crash timestamp)
Reporter | ||
Comment 1•11 years ago
|
||
[Blocking Requested - why for this release]:
blocking-b2g: --- → 2.0?
Updated•11 years ago
|
Whiteboard: [CR 696984] → [caf priority: p1][CR 696984]
Updated•11 years ago
|
Whiteboard: [caf priority: p1][CR 696984] → [b2g-crash][caf-crash 291][caf priority: p1][CR 696984]
Comment 2•11 years ago
|
||
Observed on:
Device: msm8610
Gonk Version: AU_LINUX_GECKO_B2G_KK_3.6.01.04.00.000.040
Moz BuildID: 20140716000201
B2G Version: 2.0
Gecko Version: 32.0a2
Gaia: http://git.mozilla.org/?p=releases/gaia.git;a=commit;h=5f8b1b8a2da9e3b531eee817a669f57fa4d9b9c6
Gecko: http://git.mozilla.org/?p=releases/gecko.git;a=commit;h=e00f7e464333689fcf54edb4945ece94f97f930b
Updated•11 years ago
|
Component: Gaia::Video → IPC
Product: Firefox OS → Core
Version: unspecified → 32 Branch
Comment 3•11 years ago
|
||
This is very probably fixed by this part of the commit for bug 1013571:
https://hg.mozilla.org/mozilla-central/diff/fc4d6b5ee8ce/ipc/glue/MessageLink.cpp
The ProcessLink destructor was very racy and I had to fix it for bug 1013571. Unfortunately its not in a separate patch that can be uplifted with a simple merge.
Tapas, can you try manually applying the changes from the linked diff to see if it helps?
Flags: needinfo?(tkundu)
Comment 4•11 years ago
|
||
Actually, here is the partial uplift patch. Not sure on the rules for reviewing this sort of thing, so flagging for review just to be on the safe side.
Attachment #8463389 -
Flags: review?(bent.mozilla)
Comment 5•11 years ago
|
||
Comment on attachment 8463389 [details] [diff] [review]
Partial uplift of bug 1013571 (fc4d6b5ee8ce) to fix ProcessLink destructor race.
And.... I'm an idiot. I see now this patch is for ThreadLink, not ProcessLink. Sorry for the noise! *Goes to drink more coffee*
Attachment #8463389 -
Attachment is obsolete: true
Attachment #8463389 -
Flags: review?(bent.mozilla)
Flags: needinfo?(tkundu)
Comment 6•11 years ago
|
||
It would be interesting to know if we are hitting the delay path in OnNotifyMaybeChannelError:
http://dxr.mozilla.org/mozilla-central/source/ipc/glue/MessageChannel.cpp#1588
This seems a little scary to me as the delayed task doesn't check to see if the MessageChannel is still alive before running. It seems perhaps this delayed task could be queued up, the MessageChannel deleted, and then the queued task run on a dead object.
Comment 7•11 years ago
|
||
NI, can you confirm you the right assignee here, didn't want to get a wrong assignee ;) ?
blocking-b2g: 2.0? → 2.0+
Flags: needinfo?(bkelly)
Comment 8•11 years ago
|
||
Bhavana, I'll try to help, but I don't have the time available to take point on this. I'm currently booked to work another effort with an aggressive deadline.
So I'd recommend finding a different person to assign this to. Sorry.
Flags: needinfo?(bkelly)
Comment 9•11 years ago
|
||
Anyone who works this, though, will need some way to reproduce it. Tapas, can you provide information on how to run the test referenced in comment 0 to get this crash to occur?
Flags: needinfo?(tkundu)
Reporter | ||
Comment 10•11 years ago
|
||
(In reply to Ben Kelly [:bkelly] from comment #9)
> Anyone who works this, though, will need some way to reproduce it. Tapas,
> can you provide information on how to run the test referenced in comment 0
> to get this crash to occur?
I am sorry to tell you that it will be very difficult to reproduce this crash as we are running stability test for 24 hours. But if you can enable some logs from IPC layer (You can give us patch) then we can run stability test with that log and provide you logs.
Flags: needinfo?(tkundu)
Reporter | ||
Updated•11 years ago
|
Flags: needinfo?(bkelly)
Comment 11•11 years ago
|
||
reached out to overholt offline, he is trying to help get an assignee here given this is a critical p1 blocker for MTBF so passing this onto him until then.Thanks Andrew!
Assignee: nobody → overholt
Comment 12•11 years ago
|
||
Andrea spent some time on this today but is having a hard time getting started given the lack of a nice STR. He's going to sync with Ben Turner and figure something out.
Assignee: overholt → amarchesini
Assignee | ||
Comment 14•11 years ago
|
||
Ok, I know what's happening here. Patch in a bit.
Assignee: amarchesini → bent.mozilla
Assignee | ||
Comment 15•11 years ago
|
||
Let's see how this works:
https://tbpl.mozilla.org/?tree=Try&rev=7603be0ae86c
Assignee | ||
Comment 16•11 years ago
|
||
https://hg.mozilla.org/try/rev/7603be0ae86c is the changeset
Assignee | ||
Comment 17•11 years ago
|
||
Tapas, can you give the patch in comment 16 a try?
Flags: needinfo?(tkundu)
Assignee | ||
Comment 18•11 years ago
|
||
(Well specifically https://hg.mozilla.org/try/raw-diff/7603be0ae86c/ipc/glue/MessageLink.cpp)
Reporter | ||
Comment 19•11 years ago
|
||
(In reply to ben turner [:bent] (use the needinfo? flag!) from comment #17)
> Tapas, can you give the patch in comment 16 a try?
Sure . We asked out internal team to verify it. Thanks a lot for your help
Flags: needinfo?(tkundu)
Assignee | ||
Comment 20•11 years ago
|
||
While we're waiting for verification let's get this reviewed too.
Attachment #8467867 -
Flags: review?(benjamin)
Comment 21•11 years ago
|
||
Comment on attachment 8467867 [details] [diff] [review]
Clear channel listener on correct thread, v1
I could use a little more documentation here: in what situations would the transport for a processlink have a previous listener at all? Can we document the listener behavior in a doccomment on Open()?
Attachment #8467867 -
Flags: review?(benjamin) → review-
Comment 22•11 years ago
|
||
:bent this needs to get into CAF's build tonight before they kick off stability testing so can we get a patch up addressing comment #21 asap and a patch up that can be applied to 2.0 so CAF can pick it up?
Flags: needinfo?(bent.mozilla)
Comment 23•11 years ago
|
||
:bent/bhanvana -- if the changes in the patch are just documentation related and there are no functional update then we can continue to try the previous patch. Please let us know if that's the case.
Assignee | ||
Comment 24•11 years ago
|
||
I don't believe there are any issues here besides additional documentation. The patch should be fine for testing.
Flags: needinfo?(bent.mozilla)
Assignee | ||
Comment 25•11 years ago
|
||
Attachment #8467867 -
Attachment is obsolete: true
Attachment #8469598 -
Flags: review?(benjamin)
Updated•11 years ago
|
Attachment #8469598 -
Flags: review?(benjamin) → review+
Assignee | ||
Comment 26•11 years ago
|
||
Comment 27•11 years ago
|
||
Comment 28•11 years ago
|
||
Turns out bug 1035454 (landed at the same time) was causing the leaks. Re-landed.
https://hg.mozilla.org/integration/mozilla-inbound/rev/5832624a0c03
Comment 29•11 years ago
|
||
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla34
Comment 30•11 years ago
|
||
status-b2g-v2.0:
--- → fixed
status-b2g-v2.1:
--- → fixed
status-firefox32:
--- → wontfix
status-firefox33:
--- → wontfix
status-firefox34:
--- → fixed
Assignee | ||
Comment 31•11 years ago
|
||
Tapas, any updates here? Did this fix the problem?
Flags: needinfo?(tkundu)
Reporter | ||
Comment 32•11 years ago
|
||
(In reply to ben turner [:bent] (use the needinfo? flag!) from comment #31)
> Tapas, any updates here? Did this fix the problem?
I think that your patch has fixed issue as we are not able to reproduce this issue anymore in last 20 days. Thanks for your work :)
Flags: needinfo?(tkundu)
Updated•11 years ago
|
Flags: in-moztrap?(ychung)
Comment 33•11 years ago
|
||
No STR is present to create test case to address bug.
QA Whiteboard: [QAnalyst-Triage?]
Flags: needinfo?(ktucker)
Updated•11 years ago
|
QA Whiteboard: [QAnalyst-Triage?] → [QAnalyst-Triage+]
Flags: needinfo?(ktucker)
Flags: in-moztrap?(ychung)
Flags: in-moztrap-
You need to log in
before you can comment on or make changes to this bug.
Description
•