Closed Bug 622992 Opened 14 years ago Closed 13 years ago

Fennec crash in base::CloseSuperfluousFds at startup

Categories

(Core :: IPC, defect)

ARM
Android
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla10
Tracking Status
firefox9 --- fixed
blocking2.0 --- -
fennec + ---

People

(Reporter: scoobidiver, Assigned: gcp)

References

Details

(Keywords: crash, topcrash, Whiteboard: [mobile-crash])

Crash Data

Attachments

(3 files, 1 obsolete file)

It is #2 top crasher in Fennec 4.0b3 for the last week.

Signature	libc.so@0xb538
UUID	f03fb650-fa24-4202-a089-f0e512110104
Time 	2011-01-04 11:59:57.565009
Uptime	8
Install Age	438247 seconds (5.1 days) since version was first installed.
Product	Fennec
Version	4.0b3
Build ID	20101221205132
Branch	1.9
OS	Linux
OS Version	0.0.0 Linux 2.6.32.9 #1 Tue Nov 9 00:50:41 KST 2010 armv7l
CPU	arm
CPU Info	
Crash Reason	SIGSEGV
Crash Address	0x2b7fdc
App Notes 	samsung GT-P1000
samsung/GT-P1000/GT-P1000/GT-P1000:2.2/FROYO/JPJK2:user/release-keys

Frame 	Module 	Signature [Expand] 	Source
0 	libc.so 	libc.so@0xb538 	
1 	libc.so 	libc.so@0xc94c 	
2 	libxul.so 	std::_Rb_tree<int, int, std::_Identity<int>, std::less<int>, std::allocator<int> >::_M_insert_ 	stl_tree.h:914
3 	libc.so 	libc.so@0xcfe8 	
4 	libc.so 	libc.so@0x13484 	
5 	libxul.so 	base::CloseSuperfluousFds 	ipc/chromium/src/base/process_util_posix.cc:138
6 	libxul.so 	base::LaunchApp 	stl_tree.h:243
7 	libxul.so 	mozilla::ipc::GeckoChildProcessHost::PerformAsyncLaunch 	ipc/glue/GeckoChildProcessHost.cpp:564
8 	libxul.so 	RunnableMethod<mozilla::ipc::GeckoChildProcessHost, bool , Tuple2<std::vector<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, base::ProcessArchitecture> >::Run 	ipc/chromium/src/base/tuple.h:400
9 	libxul.so 	MessageLoop::RunTask 	ipc/chromium/src/base/message_loop.cc:344
10 	libxul.so 	MessageLoop::DeferOrRunPendingTask 	ipc/chromium/src/base/message_loop.cc:354
11 	libxul.so 	MessageLoop::DoWork 	ipc/chromium/src/base/message_loop.cc:451
12 	libxul.so 	base::MessagePumpLibevent::Run 	ipc/chromium/src/base/message_pump_libevent.cc:310
13 	libxul.so 	MessageLoop::RunInternal 	ipc/chromium/src/base/message_loop.cc:220
14 	libxul.so 	MessageLoop::Run 	ipc/chromium/src/base/message_loop.cc:512
15 	libxul.so 	base::Thread::ThreadMain 	ipc/chromium/src/base/thread.cc:159
16 	libxul.so 	ThreadFunc 	ipc/chromium/src/base/platform_thread_posix.cc:28
17 	libc.so 	libc.so@0x111b3 	
18 	libc.so 	libc.so@0x10ca3 

More reports at:
http://crash-stats.mozilla.com/report/list?range_value=4&range_unit=weeks&signature=libc.so%400xb538
tracking-fennec: --- → ?
With combined signatures, it is #1 top crasher in Fennec 4.0b3 for the last week.
Summary: Fennec crash [@ libc.so@0xb538 ] → Fennec crash [@ libc.so@0xb538 ] [@ libc.so@0xb45c ] [@ libc.so@0xb40c ]
Keywords: topcrash
(In reply to comment #1)
> With combined signatures, it is #1 top crasher in Fennec 4.0b3 for the last
> week.

Can you find it in the nightlies? I couldn't when I looked.
> Can you find it in the nightlies? I couldn't when I looked.
You can sort out crash reports by version (click on the arrow on the right of version column). See: bp-9fbea702-7245-4de3-9972-ad3c52101221

According to the low crash daily rate in nightlies, it could be fixed in recent nightlies. But I don't think so because there are only 220 ADU (Active Daily User) in nightlies and 22,000 ones in Beta 3 (ratio of 100).
I added other crash signatures.

More reports at:
http://crash-stats.mozilla.com/query/query?product=Fennec&range_value=4&range_unit=weeks&query_search=signature&query_type=startswith&query=libc.so@0xb&build_id=&process_type=any&hang_type=any&do_query=1
Summary: Fennec crash [@ libc.so@0xb538 ] [@ libc.so@0xb45c ] [@ libc.so@0xb40c ] → Fennec crash [@ libc.so@0xb538 ] [@ libc.so@0xb45c ] [@ libc.so@0xb40c ] [@ libc.so@0xb47c ] [@ libc.so@0xb548 ] [@ libc.so@0xb64c ] [@ libc.so@0xb52c ] [@ libc.so@0xb4ce ] [@ libc.so@0xb42c ] [@ libc.so@0xb66c ][@ libc.so@0xb57c ] [@ libc.so@0xb43c ]
Ahh... my previous search in the nightly crash stats didn't go back far enough. The last crash report from a nightly build with CloseSuperfluousFds in the stack was from Dec 19th. I'm optimistically thinking that this has solved itself.
blocking2.0: --- → ?
tracking-fennec: ? → 2.0+
marking as fixed since we haven't seen a crash since the 12/19 build. Please reopen if you see it again.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
blocking2.0: ? → final+
Resolution: FIXED → WORKSFORME
I saw one in Fennec 4.0b4pre/20110109: bp-c6c92ae0-ce08-40c5-9f7b-60ca82110109
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
Is this currently a topcrasher? cjones, can you think of why this might be crashing? It appears to be calling opendir("/self/proc/fd").
> Is this currently a topcrasher?
No. There have been only one crash in Fennec 4.0b4pre for the last week.
blocking2.0: final+ → -
tracking-fennec: 2.0+ → ---
I don't know which part of the backtrace to believe.  Crashing at std::_Rb_tree::_M_insert_ suggests OOM to me, and nothing else really.  Otherwise, no ideas.
4.0b6pre crash:
bp-84f4fa86-3634-4d1c-b8e6-02cb92110302
bp-e4181d9b-031b-4339-a606-152142110305
Summary: Fennec crash [@ libc.so@0xb538 ] [@ libc.so@0xb45c ] [@ libc.so@0xb40c ] [@ libc.so@0xb47c ] [@ libc.so@0xb548 ] [@ libc.so@0xb64c ] [@ libc.so@0xb52c ] [@ libc.so@0xb4ce ] [@ libc.so@0xb42c ] [@ libc.so@0xb66c ][@ libc.so@0xb57c ] [@ libc.so@0xb43c ] → Fennec crash [@ libc.so@0xb538 ] [@ libc.so@0xb45c ] [@ libc.so@0xb40c ] [@ libc.so@0xb47c ] [@ libc.so@0xb548 ] [@ libc.so@0xb64c ] [@ libc.so@0xb52c ] [@ libc.so@0xb4ce ] [@ libc.so@0xb42c ] [@ libc.so@0xb66c ][@ libc.so@0xb57c ] [@ libc.so@0x12c0e ]
(In reply to comment #10)
> I don't know which part of the backtrace to believe.  Crashing at
> std::_Rb_tree::_M_insert_ suggests OOM to me, and nothing else really. 
> Otherwise, no ideas.

Just hit that one - on startup only application running was fennec.
With combined signatures, it is a virtual #13 top crasher in Firefox for mobile over the last 3 days.
Crash Signature: [@ libc.so@0xb538 ] [@ libc.so@0xb45c ] [@ libc.so@0xb40c ] [@ libc.so@0xb47c ] [@ libc.so@0xb548 ] [@ libc.so@0xb64c ] [@ libc.so@0xb52c ] [@ libc.so@0xb4ce ] [@ libc.so@0xb42c ] [@ libc.so@0xb66c ] [@ libc.so@0xb57c ] [@ libc.so@0x12c0e ]
is bug 663494 a duplicate of this bug?
Crash Signature: [@ libc.so@0xb538 ] [@ libc.so@0xb45c ] [@ libc.so@0xb40c ] [@ libc.so@0xb47c ] [@ libc.so@0xb548 ] [@ libc.so@0xb64c ] [@ libc.so@0xb52c ] [@ libc.so@0xb4ce ] [@ libc.so@0xb42c ] [@ libc.so@0xb66c ] [@ libc.so@0xb57c ] [@ libc.so@0x12c0e ] → [@ libc.so@0xb538 ] [@ libc.so@0xb45c ] [@ libc.so@0xb40c ] [@ libc.so@0xb47c ] [@ libc.so@0xb548 ] [@ libc.so@0xb64c ] [@ libc.so@0xb52c ] [@ libc.so@0xb4ce ] [@ libc.so@0xb42c ] [@ libc.so@0xb66c ] [@ libc.so@0xb57c ] [@ libc.so@0x12c0e ]
Summary: Fennec crash [@ libc.so@0xb538 ] [@ libc.so@0xb45c ] [@ libc.so@0xb40c ] [@ libc.so@0xb47c ] [@ libc.so@0xb548 ] [@ libc.so@0xb64c ] [@ libc.so@0xb52c ] [@ libc.so@0xb4ce ] [@ libc.so@0xb42c ] [@ libc.so@0xb66c ][@ libc.so@0xb57c ] [@ libc.so@0x12c0e ] → Fennec crash in base::CloseSuperfluousFds
Crash Signature: [@ libc.so@0xb538 ] [@ libc.so@0xb45c ] [@ libc.so@0xb40c ] [@ libc.so@0xb47c ] [@ libc.so@0xb548 ] [@ libc.so@0xb64c ] [@ libc.so@0xb52c ] [@ libc.so@0xb4ce ] [@ libc.so@0xb42c ] [@ libc.so@0xb66c ] [@ libc.so@0xb57c ] [@ libc.so@0x12c0e ] → ] [@ libc.so@0x12e2e ] [@ libc.so@0xb538 ] [@ libc.so@0xb45c ] [@ libc.so@0xb40c ] [@ libc.so@0xb47c ] [@ libc.so@0xb548 ] [@ libc.so@0xb64c ] [@ libc.so@0xb52c ] [@ libc.so@0xb4ce ] [@ libc.so@0xb42c ] [@ libc.so@0xb66c ] [@ libc.so@0xb57c ] …
With the dlmalloc_stats crash signature, it is #2 top crasher in 6.0 and #1 in 7.0 Beta.
tracking-fennec: --- → ?
Crash Signature: ] [@ libc.so@0x12e2e ] → ] [@ libc.so@0x12e2e ] [@ dlmalloc_stats ]
tracking-fennec: ? → +
Whiteboard: [mobile-crash]
Crash Signature: ] [@ libc.so@0x12e2e ] [@ dlmalloc_stats ] → ] [@ libc.so@0x12e2e ] [@ dlmalloc_stats ] [@ ThreadFunc ]
In 7.0.1, there are only dlmalloc_stats (Samsung devices), ThreadFunc crash signatures.
Crashes happen at startup.
It's probably a dupe of bug 663494 (same stack traces) about energy saving power down.
Summary: Fennec crash in base::CloseSuperfluousFds → Fennec crash in base::CloseSuperfluousFds at startup
http://code.google.com/p/chromium/issues/detail?id=36678

http://git.chromium.org/gitweb/?p=chromium.git;a=blob;f=base/process_util_posix.cc#l334

According to this, the problem is the use of dynamic memory allocation in that function. This is line Chris Jones' comment 10.
Assignee: nobody → gpascutto
Hmm, apparently we already knew about that issue, fixed it, and you reviewed it, in bug 680190.
The crash is still happening in Fennec 9.0, so either it's something else, or the fixes in 680190 aren't enough.

The crash happens in opendir, or thereabouts. That function is both locking and malloc-ing stuff. If I get bug 680190 correctly, the malloc aren't actually a problem. But the locking would be, right? If we are in opendir in a thread at the same time as the LaunchApp is called, we're dead?

http://nv-tegra.nvidia.com/gitweb/?p=android/platform/bionic.git;a=blob;f=libc/unistd/opendir.c
Attachment #565205 - Flags: review?(jones.chris.g)
Attachment #565206 - Flags: review?(jones.chris.g)
This updates our IPC code to be more in sync with the current Chromium code, which avoids dynamic memory allocation (and the opendir) between the fork() and execvp() calls. This should also allow us to remove the prefork/postfork jemalloc hacks.

I'm not positive we're really hitting the problem case (and hence, if this will resolve the bug), but unless anyone has a better idea...

Try run:
https://tbpl.mozilla.org/?tree=Try&rev=cb9a9ce90ddc
Hi Gian-Carlo,

(In reply to Gian-Carlo Pascutto (:gcp) from comment #21)
> Created attachment 565205 [details] [diff] [review] [diff] [details] [review]
> Patch 1. Avoid malloc and locking after fork

Is this just merging in upstream changes?  I can rubber-stamp those, but if there's new code included in this patch, I need to review that.
Comment on attachment 565206 [details] [diff] [review]
Patch 2. Remove the hacks from the previous fix

>diff --git a/ipc/chromium/src/base/process_util_linux.cc b/ipc/chromium/src/base/process_util_linux.cc
>--- a/ipc/chromium/src/base/process_util_linux.cc
>+++ b/ipc/chromium/src/base/process_util_linux.cc

It's OK to remove the hacks from here since the helper function always
fork()+exec()s, but we need to keep the general workaround code in
jemalloc for situations in which we just fork() in a jemalloc-using
process.
Attachment #565206 - Flags: review?(jones.chris.g)
>Is this just merging in upstream changes?  I can rubber-stamp those, but if 
>there's new code included in this patch, I need to review that.

It's merging in upstream changes, but it's not a direct update. I made the patch that is in the chrome bug mentioned earlier on our codebase, and edited it wherever appropriate to avoid bringing in more updates (notably, it was using a LOG define that we don't seem to have yet). If I messed up, it's more likely to be in process_util_* since the other files were mostly direct copypasting.
Comment on attachment 565205 [details] [diff] [review]
Patch 1. Avoid malloc and locking after fork

Alright, rs=me.
Attachment #565205 - Flags: review?(jones.chris.g) → review+
Only remove pre/postfork usage in fixed code path.
Attachment #565206 - Attachment is obsolete: true
Attachment #566217 - Flags: review?(jones.chris.g)
Attachment #566217 - Flags: review?(jones.chris.g) → review+
https://hg.mozilla.org/mozilla-central/rev/fed6a1e02b7d
https://hg.mozilla.org/mozilla-central/rev/de0be83bf880
Status: REOPENED → RESOLVED
Closed: 14 years ago13 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla10
Can we get this nominated for being pushed to Aurora and Beta please?
Attachment #565205 - Flags: approval-mozilla-beta?
Attachment #565205 - Flags: approval-mozilla-aurora?
Attachment #566217 - Flags: approval-mozilla-beta?
Attachment #566217 - Flags: approval-mozilla-aurora?
- There are no tests or easy STR for this. It's a race condition that can happen when launching the app. The analysis for the fix here is based on reading the crashing code.
- It's been on m-c for 2 weeks.
- No performance impact. We want it because it seems to fix a topcrasher. If I read Socorro correctly we haven't seen any after the fix landed. Because of that, I think it's low risk.
- No impact to API.
Comment on attachment 565205 [details] [diff] [review]
Patch 1. Avoid malloc and locking after fork

[Triage Comment]
We're denying these patches for Beta because
* This is not a recent regression
* The patch looks non-trivial
* It's too late in the 8 cycle for m-c-only tested code to land
Attachment #565205 - Flags: approval-mozilla-beta? → approval-mozilla-beta-
Attachment #566217 - Flags: approval-mozilla-beta? → approval-mozilla-beta-
Attachment #566217 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora-
Comment on attachment 565205 [details] [diff] [review]
Patch 1. Avoid malloc and locking after fork

We'll actually take this on mozilla-aurora (top crasher, importing 3rd party released code) but would like a rollup patch. Please attach one and ask for mozilla-aurora approval. Thanks!
Attachment #565205 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora-
Attachment #571310 - Flags: approval-mozilla-aurora?
Comment on attachment 571310 [details] [diff] [review]
Patch 3. Folded patch of previous 2 patches

[triage comment]

Approved for aurora. Please land today if at all possible.
Attachment #571310 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Crash Signature: ] [@ libc.so@0x12e2e ] [@ dlmalloc_stats ] [@ ThreadFunc ] → ] [@ libc.so@0x12e2e ] [@ dlmalloc_stats ] [@ ThreadFunc ] [@ std::_Rb_tree<int, int, std::_Identity<int>, std::less<int>, std::allocator<int> >::_M_insert_ ]
Does this crash only affect Mobile?
(In reply to Anthony Hughes, Mozilla QA (irc: ashughes) from comment #39)
> Does this crash only affect Mobile?

AFAIK, yes, this is Mobile-only, I guess Android-only.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: