Closed Bug 683063 Opened 13 years ago Closed 13 years ago

Crash opening about:memory in Fennec

Categories

(Core :: IPC, defect)

x86_64
Linux
defect
Not set
critical

Tracking

()

RESOLVED FIXED

People

(Reporter: n.nethercote, Assigned: cjones)

Details

(Keywords: crash, Whiteboard: [inbound])

Attachments

(2 files, 1 obsolete file)

#0  0x00007fef676a45ad in nanosleep () at ../sysdeps/unix/syscall-template.S:82
#1  0x00007fef676a443c in __sleep (seconds=0) at ../sysdeps/unix/sysv/linux/sleep.c:138
#2  0x00007fef64b5281e in ah_crap_handler (signum=11) at /home/njn/moz/mi3/toolkit/xre/nsSigHandlers.cpp:121
#3  0x00007fef64b56202 in nsProfileLock::FatalSignalHandler (signo=11, info=0x7fff7da366b0, context=0x7fff7da36580)
    at /home/njn/moz/mi3/mobile/d64/toolkit/profile/nsProfileLock.cpp:226
#4  <signal handler called>
#5  mozilla::dom::PContentParent::DeallocSubtree (this=0x7fef4dbc2400)
    at /home/njn/moz/mi3/mobile/d64/ipc/ipdl/PContentParent.cpp:2261
#6  0x00007fef65f0485c in mozilla::dom::PContentParent::OnChannelError (this=0x7fef4dbc2400)
    at /home/njn/moz/mi3/mobile/d64/ipc/ipdl/PContentParent.cpp:1858
#7  0x00007fef65eb864d in mozilla::ipc::AsyncChannel::NotifyMaybeChannelError (this=0x7fef4dbc2410)
    at /home/njn/moz/mi3/ipc/glue/AsyncChannel.cpp:378
#8  0x00007fef65eb8742 in mozilla::ipc::AsyncChannel::OnNotifyMaybeChannelError (this=0x7fef4dbc2410)
    at /home/njn/moz/mi3/ipc/glue/AsyncChannel.cpp:343
#9  0x00007fef65e93904 in DispatchToMethod<mozilla::plugins::PluginInstanceChild, void (mozilla::plugins::PluginInstanceChild::*)()> (this=<value optimised out>) at /home/njn/moz/mi3/ipc/chromium/src/base/tuple.h:383
#10 RunnableMethod<mozilla::plugins::PluginInstanceChild, void (mozilla::plugins::PluginInstanceChild::*)(), Tuple0>::Run (
    this=<value optimised out>) at /home/njn/moz/mi3/ipc/chromium/src/base/task.h:307
#11 0x00007fef66049e2f in MessageLoop::RunTask (this=0x7fef674ee8f0, task=0x7fef4b435c80)
    at /home/njn/moz/mi3/ipc/chromium/src/base/message_loop.cc:318
#12 0x00007fef6604baf6 in MessageLoop::DeferOrRunPendingTask (this=<value optimised out>, pending_task=<value optimised out>)
    at /home/njn/moz/mi3/ipc/chromium/src/base/message_loop.cc:326
#13 0x00007fef6604bd26 in MessageLoop::DoWork (this=0x7fef674ee8f0) at /home/njn/moz/mi3/ipc/chromium/src/base/message_loop.cc:426
#14 0x00007fef65ebd393 in mozilla::ipc::MessagePump::Run (this=0x7fef5ba54f40, aDelegate=0x7fef674ee8f0)
    at /home/njn/moz/mi3/ipc/glue/MessagePump.cpp:114
#15 0x00007fef66049cf6 in MessageLoop::RunInternal (this=0x7fef674ee8f0)
    at /home/njn/moz/mi3/ipc/chromium/src/base/message_loop.cc:208
#16 0x00007fef66049d07 in MessageLoop::RunHandler (this=<value optimised out>)
    at /home/njn/moz/mi3/ipc/chromium/src/base/message_loop.cc:201
#17 0x00007fef6604a01a in MessageLoop::Run (this=0x7fef674ee8f0) at /home/njn/moz/mi3/ipc/chromium/src/base/message_loop.cc:175
#18 0x00007fef65d6ae0c in nsBaseAppShell::Run (this=0x7fef559d58d0)
    at /home/njn/moz/mi3/widget/src/xpwidgets/nsBaseAppShell.cpp:189
#19 0x00007fef65aed3bf in nsAppStartup::Run (this=0x7fef52a091a0)
    at /home/njn/moz/mi3/toolkit/components/startup/nsAppStartup.cpp:224


Doing some more inspection:

#5  mozilla::dom::PContentParent::DeallocSubtree (this=0x7fef4dbc2400)
    at /home/njn/moz/mi3/mobile/d64/ipc/ipdl/PContentParent.cpp:2261
2261	        for (uint32 i = 0; (i) < ((kids).Length()); (++(i))) {
(gdb) p kids
$1 = (
    InfallibleTArray<mozilla::dom::PAudioParent*> &) @0x7fef4dbc26b8: {<nsTArray<mozilla::dom::PAudioParent*, nsTArrayInfallibleAllocator>> = {<nsTArray_base<nsTArrayInfallibleAllocator>> = {
      mHdr = 0x5a5a5a5a5a5a5a5a}, <nsTArray_SafeElementAtHelper<mozilla::dom::PAudioParent*, nsTArray<mozilla::dom::PAudioParent*, nsTArrayInfallibleAllocator> >> = {<No data fields>}, <No data fields>}, <No data fields>}

That 0x5a5a5a5a pattern makes it look like this is a use-after-free error.

This isn't a 100% reliable crash;  I tried a few times and one time the content process crashed but about:memory showed data for the chrome process.
This is on a 64-bit desktop Linux build, BTW.  (I tried to do a 32-bit desktop build but failed due to some problems with freetype headers, unfortunately.)
I have seen this happen on my 64bit Ubuntu builds as well as my Android builds. As Nicholas points out, it seems intermittent.
Attached file Valgrind output
Valgrind finds *lots* of use-after-free errors.  There were some errors before this but they appeared unrelated so I removed them from the attached file.
Component: General → IPC
Product: Fennec → Core
QA Contact: general → ipc
I would just like to point out that this output from valgrind is completely nonsensical:

==31828==  Address 0x1bfba168 is 632 bytes inside a block of size 816 free'd
==31828==    at 0x402A381: free (vg_replace_malloc.c:366)
==31828==    by 0x403FFE1: moz_free (mozalloc.cpp:97)
==31828==    by 0x7761C4B: mozilla::dom::ContentParent::~ContentParent() (mozalloc.h:253)
==31828==    by 0x77627A1: mozilla::dom::ContentParent::Release() (in /home/njn/moz/mi3/mobile/v64/toolkit/library/libxul.so)
==31828==    by 0x66889EC: nsCOMPtr_base::~nsCOMPtr_base() (nsAutoPtr.h:969)
==31828==    by 0x667C96F: nsBoxLayoutState::~nsBoxLayoutState() (nsCSSFrameConstructor.cpp:1463)
==31828==    by 0x776179F: mozilla::dom::ContentParent::ActorDestroy(mozilla::ipc::IProtocolManager<mozilla::ipc::RPCChannel::RPCListener>::ActorDestroyReason) (ContentParent.cpp:285)

ActorDestroy -> ~nsBoxLayoutState? Those are two function names I would not expect to see back to back. Nicholas, any idea what Valgrind is trying to say here?
Valgrind just reports what the stack + debug info tells it.  There could be some frames missing due to inlining.
STR here are just to open about:memory?
Assignee: nobody → jones.chris.g
(In reply to Chris Jones [:cjones] [:warhammer] from comment #6)
> STR here are just to open about:memory?

Yes.
This is a pretty serious bug.  It looks like ContentParent was always dying when ActorDestroy() exited, which always caused a bunch of use-after-free errors.  These were probably only happening to lead to crashes when the content process died abnormally, and more stuff happened to be alive under ContentParent.

Anywho, we should take this ASAP; it's very bad.  I'm quite surprised the existing test_process_error hasn't caught crashes from this.
Attachment #556763 - Flags: review?(josh)
What was tickling the problem in part 1 (at least in my testing), is that the memory reporters were querying the windows ID table in the content process, before an nsGlobalWindow has been created.  So the static table returned is null and the memory reporters weren't testing for that.  I'm not a big fan of that initialization scheme, so I made the table getter do implicit init.

This bug isn't all that big of a deal IMHO, but we should definitely fix it.
Attachment #556764 - Flags: review?(bzbarsky)
Also: hm, I'm trying to think of a good way to write a test for that bug.  There's not really a good way in our current setup; we'd have to restart all of firefox or crash all the content processes first.  That may not be worth the effort.
As far as part 2 goes, see bug 681211.  But yes, making that function never return null may be better.
Shame that hasn't landed yet :/.  Which approach do you prefer?  I can land either patch ASAP, your call.
> This bug isn't all that big of a deal IMHO, but we should definitely fix it.

Not having about:memory in Fennec is a bloody big deal, IMO :)
(In reply to Nicholas Nethercote [:njn] from comment #13)
> > This bug isn't all that big of a deal IMHO, but we should definitely fix it.
> 
> Not having about:memory in Fennec is a bloody big deal, IMO :)

The *particular bug* of crashing if an nsGlobalWindow hasn't been created before about:memory is loaded is what I was referring to.  But it's not really worth discussing, let's just fix it.  There are two patches to choose from now.
Severity: normal → critical
Keywords: crash
Comment on attachment 556763 [details] [diff] [review]
part 1: Don't delete ContentParent out from under the cleanup process

My goodness.
Attachment #556763 - Flags: review?(josh) → review+
Comment on attachment 556764 [details] [diff] [review]
part 2: Make sure GetWindowsTable() always returns the table, even if an nsGlobalWindow hasn't been created yet

We're going to do bug 667183 instead.
Attachment #556764 - Attachment is obsolete: true
Attachment #556764 - Flags: review?(bzbarsky)
http://hg.mozilla.org/mozilla-central/rev/86f1809d81d7

resolving based on comment 16 saying part 2 is not going to land.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: