Last Comment Bug 622992 - Fennec crash in base::CloseSuperfluousFds at startup
: Fennec crash in base::CloseSuperfluousFds at startup
Status: RESOLVED FIXED
[mobile-crash]
: crash, topcrash
Product: Core
Classification: Components
Component: IPC (show other bugs)
: Trunk
: ARM Android
: -- critical (vote)
: mozilla10
Assigned To: Gian-Carlo Pascutto [:gcp]
:
Mentors:
: 663494 683799 (view as bug list)
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2011-01-04 14:23 PST by Scoobidiver (away)
Modified: 2013-01-23 09:26 PST (History)
13 users (show)
See Also:
Crash Signature:
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
fixed
-
+


Attachments
Patch 1. Avoid malloc and locking after fork (19.90 KB, patch)
2011-10-06 06:55 PDT, Gian-Carlo Pascutto [:gcp]
cjones.bugs: review+
christian: approval‑mozilla‑aurora-
akeybl: approval‑mozilla‑beta-
Details | Diff | Splinter Review
Patch 2. Remove the hacks from the previous fix (5.36 KB, patch)
2011-10-06 06:56 PDT, Gian-Carlo Pascutto [:gcp]
no flags Details | Diff | Splinter Review
Patch 2. Remove the hacks from the previous fix (1.56 KB, patch)
2011-10-11 07:37 PDT, Gian-Carlo Pascutto [:gcp]
cjones.bugs: review+
christian: approval‑mozilla‑aurora-
akeybl: approval‑mozilla‑beta-
Details | Diff | Splinter Review
Patch 3. Folded patch of previous 2 patches (19.93 KB, patch)
2011-11-02 04:48 PDT, Gian-Carlo Pascutto [:gcp]
christian: approval‑mozilla‑aurora+
Details | Diff | Splinter Review

Description Scoobidiver (away) 2011-01-04 14:23:17 PST
It is #2 top crasher in Fennec 4.0b3 for the last week.

Signature	libc.so@0xb538
UUID	f03fb650-fa24-4202-a089-f0e512110104
Time 	2011-01-04 11:59:57.565009
Uptime	8
Install Age	438247 seconds (5.1 days) since version was first installed.
Product	Fennec
Version	4.0b3
Build ID	20101221205132
Branch	1.9
OS	Linux
OS Version	0.0.0 Linux 2.6.32.9 #1 Tue Nov 9 00:50:41 KST 2010 armv7l
CPU	arm
CPU Info	
Crash Reason	SIGSEGV
Crash Address	0x2b7fdc
App Notes 	samsung GT-P1000
samsung/GT-P1000/GT-P1000/GT-P1000:2.2/FROYO/JPJK2:user/release-keys

Frame 	Module 	Signature [Expand] 	Source
0 	libc.so 	libc.so@0xb538 	
1 	libc.so 	libc.so@0xc94c 	
2 	libxul.so 	std::_Rb_tree<int, int, std::_Identity<int>, std::less<int>, std::allocator<int> >::_M_insert_ 	stl_tree.h:914
3 	libc.so 	libc.so@0xcfe8 	
4 	libc.so 	libc.so@0x13484 	
5 	libxul.so 	base::CloseSuperfluousFds 	ipc/chromium/src/base/process_util_posix.cc:138
6 	libxul.so 	base::LaunchApp 	stl_tree.h:243
7 	libxul.so 	mozilla::ipc::GeckoChildProcessHost::PerformAsyncLaunch 	ipc/glue/GeckoChildProcessHost.cpp:564
8 	libxul.so 	RunnableMethod<mozilla::ipc::GeckoChildProcessHost, bool , Tuple2<std::vector<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, base::ProcessArchitecture> >::Run 	ipc/chromium/src/base/tuple.h:400
9 	libxul.so 	MessageLoop::RunTask 	ipc/chromium/src/base/message_loop.cc:344
10 	libxul.so 	MessageLoop::DeferOrRunPendingTask 	ipc/chromium/src/base/message_loop.cc:354
11 	libxul.so 	MessageLoop::DoWork 	ipc/chromium/src/base/message_loop.cc:451
12 	libxul.so 	base::MessagePumpLibevent::Run 	ipc/chromium/src/base/message_pump_libevent.cc:310
13 	libxul.so 	MessageLoop::RunInternal 	ipc/chromium/src/base/message_loop.cc:220
14 	libxul.so 	MessageLoop::Run 	ipc/chromium/src/base/message_loop.cc:512
15 	libxul.so 	base::Thread::ThreadMain 	ipc/chromium/src/base/thread.cc:159
16 	libxul.so 	ThreadFunc 	ipc/chromium/src/base/platform_thread_posix.cc:28
17 	libc.so 	libc.so@0x111b3 	
18 	libc.so 	libc.so@0x10ca3 

More reports at:
http://crash-stats.mozilla.com/report/list?range_value=4&range_unit=weeks&signature=libc.so%400xb538
Comment 1 Scoobidiver (away) 2011-01-04 14:30:08 PST
With combined signatures, it is #1 top crasher in Fennec 4.0b3 for the last week.
Comment 2 Brad Lassey [:blassey] (use needinfo?) 2011-01-04 15:31:03 PST
(In reply to comment #1)
> With combined signatures, it is #1 top crasher in Fennec 4.0b3 for the last
> week.

Can you find it in the nightlies? I couldn't when I looked.
Comment 3 Scoobidiver (away) 2011-01-05 01:41:55 PST
> Can you find it in the nightlies? I couldn't when I looked.
You can sort out crash reports by version (click on the arrow on the right of version column). See: bp-9fbea702-7245-4de3-9972-ad3c52101221

According to the low crash daily rate in nightlies, it could be fixed in recent nightlies. But I don't think so because there are only 220 ADU (Active Daily User) in nightlies and 22,000 ones in Beta 3 (ratio of 100).
Comment 5 Brad Lassey [:blassey] (use needinfo?) 2011-01-05 09:23:29 PST
Ahh... my previous search in the nightly crash stats didn't go back far enough. The last crash report from a nightly build with CloseSuperfluousFds in the stack was from Dec 19th. I'm optimistically thinking that this has solved itself.
Comment 6 Brad Lassey [:blassey] (use needinfo?) 2011-01-05 12:31:08 PST
marking as fixed since we haven't seen a crash since the 12/19 build. Please reopen if you see it again.
Comment 7 Scoobidiver (away) 2011-01-14 05:05:38 PST
I saw one in Fennec 4.0b4pre/20110109: bp-c6c92ae0-ce08-40c5-9f7b-60ca82110109
Comment 8 Benjamin Smedberg [:bsmedberg] 2011-01-14 07:27:59 PST
Is this currently a topcrasher? cjones, can you think of why this might be crashing? It appears to be calling opendir("/self/proc/fd").
Comment 9 Scoobidiver (away) 2011-01-14 07:37:35 PST
> Is this currently a topcrasher?
No. There have been only one crash in Fennec 4.0b4pre for the last week.
Comment 10 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2011-01-14 12:21:39 PST
I don't know which part of the backtrace to believe.  Crashing at std::_Rb_tree::_M_insert_ suggests OOM to me, and nothing else really.  Otherwise, no ideas.
Comment 11 Scoobidiver (away) 2011-03-06 07:30:02 PST
4.0b6pre crash:
bp-84f4fa86-3634-4d1c-b8e6-02cb92110302
bp-e4181d9b-031b-4339-a606-152142110305
Comment 12 Ludovic Hirlimann [:Usul] 2011-03-27 09:07:07 PDT
(In reply to comment #10)
> I don't know which part of the backtrace to believe.  Crashing at
> std::_Rb_tree::_M_insert_ suggests OOM to me, and nothing else really. 
> Otherwise, no ideas.

Just hit that one - on startup only application running was fennec.
Comment 13 Scoobidiver (away) 2011-04-03 05:13:19 PDT
With combined signatures, it is a virtual #13 top crasher in Firefox for mobile over the last 3 days.
Comment 14 Naoki Hirata :nhirata (please use needinfo instead of cc) 2011-08-09 10:03:07 PDT
is bug 663494 a duplicate of this bug?
Comment 15 Scoobidiver (away) 2011-09-17 14:19:38 PDT
With the dlmalloc_stats crash signature, it is #2 top crasher in 6.0 and #1 in 7.0 Beta.
Comment 16 Scoobidiver (away) 2011-10-03 10:35:42 PDT
In 7.0.1, there are only dlmalloc_stats (Samsung devices), ThreadFunc crash signatures.
Crashes happen at startup.
It's probably a dupe of bug 663494 (same stack traces) about energy saving power down.
Comment 17 Gian-Carlo Pascutto [:gcp] 2011-10-05 11:23:22 PDT
http://code.google.com/p/chromium/issues/detail?id=36678

http://git.chromium.org/gitweb/?p=chromium.git;a=blob;f=base/process_util_posix.cc#l334

According to this, the problem is the use of dynamic memory allocation in that function. This is line Chris Jones' comment 10.
Comment 18 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2011-10-05 13:10:43 PDT
Right-o, good find!
Comment 19 Gian-Carlo Pascutto [:gcp] 2011-10-06 01:57:57 PDT
Hmm, apparently we already knew about that issue, fixed it, and you reviewed it, in bug 680190.
Comment 20 Gian-Carlo Pascutto [:gcp] 2011-10-06 02:36:39 PDT
The crash is still happening in Fennec 9.0, so either it's something else, or the fixes in 680190 aren't enough.

The crash happens in opendir, or thereabouts. That function is both locking and malloc-ing stuff. If I get bug 680190 correctly, the malloc aren't actually a problem. But the locking would be, right? If we are in opendir in a thread at the same time as the LaunchApp is called, we're dead?

http://nv-tegra.nvidia.com/gitweb/?p=android/platform/bionic.git;a=blob;f=libc/unistd/opendir.c
Comment 21 Gian-Carlo Pascutto [:gcp] 2011-10-06 06:55:37 PDT
Created attachment 565205 [details] [diff] [review]
Patch 1. Avoid malloc and locking after fork
Comment 22 Gian-Carlo Pascutto [:gcp] 2011-10-06 06:56:34 PDT
Created attachment 565206 [details] [diff] [review]
Patch 2. Remove the hacks from the previous fix
Comment 23 Gian-Carlo Pascutto [:gcp] 2011-10-06 07:05:52 PDT
This updates our IPC code to be more in sync with the current Chromium code, which avoids dynamic memory allocation (and the opendir) between the fork() and execvp() calls. This should also allow us to remove the prefork/postfork jemalloc hacks.

I'm not positive we're really hitting the problem case (and hence, if this will resolve the bug), but unless anyone has a better idea...

Try run:
https://tbpl.mozilla.org/?tree=Try&rev=cb9a9ce90ddc
Comment 24 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2011-10-07 11:48:31 PDT
Hi Gian-Carlo,

(In reply to Gian-Carlo Pascutto (:gcp) from comment #21)
> Created attachment 565205 [details] [diff] [review] [diff] [details] [review]
> Patch 1. Avoid malloc and locking after fork

Is this just merging in upstream changes?  I can rubber-stamp those, but if there's new code included in this patch, I need to review that.
Comment 25 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2011-10-07 11:53:11 PDT
Comment on attachment 565206 [details] [diff] [review]
Patch 2. Remove the hacks from the previous fix

>diff --git a/ipc/chromium/src/base/process_util_linux.cc b/ipc/chromium/src/base/process_util_linux.cc
>--- a/ipc/chromium/src/base/process_util_linux.cc
>+++ b/ipc/chromium/src/base/process_util_linux.cc

It's OK to remove the hacks from here since the helper function always
fork()+exec()s, but we need to keep the general workaround code in
jemalloc for situations in which we just fork() in a jemalloc-using
process.
Comment 26 Gian-Carlo Pascutto [:gcp] 2011-10-09 04:13:20 PDT
>Is this just merging in upstream changes?  I can rubber-stamp those, but if 
>there's new code included in this patch, I need to review that.

It's merging in upstream changes, but it's not a direct update. I made the patch that is in the chrome bug mentioned earlier on our codebase, and edited it wherever appropriate to avoid bringing in more updates (notably, it was using a LOG define that we don't seem to have yet). If I messed up, it's more likely to be in process_util_* since the other files were mostly direct copypasting.
Comment 27 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2011-10-10 15:30:41 PDT
Comment on attachment 565205 [details] [diff] [review]
Patch 1. Avoid malloc and locking after fork

Alright, rs=me.
Comment 28 Gian-Carlo Pascutto [:gcp] 2011-10-11 07:37:14 PDT
Created attachment 566217 [details] [diff] [review]
Patch 2. Remove the hacks from the previous fix

Only remove pre/postfork usage in fixed code path.
Comment 31 Gian-Carlo Pascutto [:gcp] 2011-10-28 07:26:05 PDT
*** Bug 663494 has been marked as a duplicate of this bug. ***
Comment 32 Naoki Hirata :nhirata (please use needinfo instead of cc) 2011-10-28 10:15:44 PDT
Can we get this nominated for being pushed to Aurora and Beta please?
Comment 33 Gian-Carlo Pascutto [:gcp] 2011-10-28 12:41:18 PDT
- There are no tests or easy STR for this. It's a race condition that can happen when launching the app. The analysis for the fix here is based on reading the crashing code.
- It's been on m-c for 2 weeks.
- No performance impact. We want it because it seems to fix a topcrasher. If I read Socorro correctly we haven't seen any after the fix landed. Because of that, I think it's low risk.
- No impact to API.
Comment 34 Alex Keybl [:akeybl] 2011-10-31 13:41:43 PDT
Comment on attachment 565205 [details] [diff] [review]
Patch 1. Avoid malloc and locking after fork

[Triage Comment]
We're denying these patches for Beta because
* This is not a recent regression
* The patch looks non-trivial
* It's too late in the 8 cycle for m-c-only tested code to land
Comment 35 christian 2011-11-01 14:28:40 PDT
Comment on attachment 565205 [details] [diff] [review]
Patch 1. Avoid malloc and locking after fork

We'll actually take this on mozilla-aurora (top crasher, importing 3rd party released code) but would like a rollup patch. Please attach one and ask for mozilla-aurora approval. Thanks!
Comment 36 Gian-Carlo Pascutto [:gcp] 2011-11-02 04:48:03 PDT
Created attachment 571310 [details] [diff] [review]
Patch 3. Folded patch of previous 2 patches
Comment 37 christian 2011-11-07 13:21:57 PST
Comment on attachment 571310 [details] [diff] [review]
Patch 3. Folded patch of previous 2 patches

[triage comment]

Approved for aurora. Please land today if at all possible.
Comment 38 Gian-Carlo Pascutto [:gcp] 2011-11-07 23:12:38 PST
http://hg.mozilla.org/releases/mozilla-aurora/rev/4c79fc728cc3
Comment 39 Anthony Hughes (:ashughes) [GFX][QA][Mentor] 2011-12-07 13:01:52 PST
Does this crash only affect Mobile?
Comment 40 Robert Kaiser 2011-12-07 13:11:38 PST
(In reply to Anthony Hughes, Mozilla QA (irc: ashughes) from comment #39)
> Does this crash only affect Mobile?

AFAIK, yes, this is Mobile-only, I guess Android-only.
Comment 41 Jim Chen [:jchen] [:darchons] 2013-01-23 09:26:43 PST
*** Bug 683799 has been marked as a duplicate of this bug. ***

Note You need to log in before you can comment on or make changes to this bug.