Closed
Bug 683063
Opened 13 years ago
Closed 13 years ago
Crash opening about:memory in Fennec
Categories
(Core :: IPC, defect)
Tracking
()
RESOLVED
FIXED
People
(Reporter: n.nethercote, Assigned: cjones)
Details
(Keywords: crash, Whiteboard: [inbound])
Attachments
(2 files, 1 obsolete file)
128.43 KB,
text/plain
|
Details | |
2.35 KB,
patch
|
jdm
:
review+
|
Details | Diff | Splinter Review |
#0 0x00007fef676a45ad in nanosleep () at ../sysdeps/unix/syscall-template.S:82 #1 0x00007fef676a443c in __sleep (seconds=0) at ../sysdeps/unix/sysv/linux/sleep.c:138 #2 0x00007fef64b5281e in ah_crap_handler (signum=11) at /home/njn/moz/mi3/toolkit/xre/nsSigHandlers.cpp:121 #3 0x00007fef64b56202 in nsProfileLock::FatalSignalHandler (signo=11, info=0x7fff7da366b0, context=0x7fff7da36580) at /home/njn/moz/mi3/mobile/d64/toolkit/profile/nsProfileLock.cpp:226 #4 <signal handler called> #5 mozilla::dom::PContentParent::DeallocSubtree (this=0x7fef4dbc2400) at /home/njn/moz/mi3/mobile/d64/ipc/ipdl/PContentParent.cpp:2261 #6 0x00007fef65f0485c in mozilla::dom::PContentParent::OnChannelError (this=0x7fef4dbc2400) at /home/njn/moz/mi3/mobile/d64/ipc/ipdl/PContentParent.cpp:1858 #7 0x00007fef65eb864d in mozilla::ipc::AsyncChannel::NotifyMaybeChannelError (this=0x7fef4dbc2410) at /home/njn/moz/mi3/ipc/glue/AsyncChannel.cpp:378 #8 0x00007fef65eb8742 in mozilla::ipc::AsyncChannel::OnNotifyMaybeChannelError (this=0x7fef4dbc2410) at /home/njn/moz/mi3/ipc/glue/AsyncChannel.cpp:343 #9 0x00007fef65e93904 in DispatchToMethod<mozilla::plugins::PluginInstanceChild, void (mozilla::plugins::PluginInstanceChild::*)()> (this=<value optimised out>) at /home/njn/moz/mi3/ipc/chromium/src/base/tuple.h:383 #10 RunnableMethod<mozilla::plugins::PluginInstanceChild, void (mozilla::plugins::PluginInstanceChild::*)(), Tuple0>::Run ( this=<value optimised out>) at /home/njn/moz/mi3/ipc/chromium/src/base/task.h:307 #11 0x00007fef66049e2f in MessageLoop::RunTask (this=0x7fef674ee8f0, task=0x7fef4b435c80) at /home/njn/moz/mi3/ipc/chromium/src/base/message_loop.cc:318 #12 0x00007fef6604baf6 in MessageLoop::DeferOrRunPendingTask (this=<value optimised out>, pending_task=<value optimised out>) at /home/njn/moz/mi3/ipc/chromium/src/base/message_loop.cc:326 #13 0x00007fef6604bd26 in MessageLoop::DoWork (this=0x7fef674ee8f0) at /home/njn/moz/mi3/ipc/chromium/src/base/message_loop.cc:426 #14 0x00007fef65ebd393 in mozilla::ipc::MessagePump::Run (this=0x7fef5ba54f40, aDelegate=0x7fef674ee8f0) at /home/njn/moz/mi3/ipc/glue/MessagePump.cpp:114 #15 0x00007fef66049cf6 in MessageLoop::RunInternal (this=0x7fef674ee8f0) at /home/njn/moz/mi3/ipc/chromium/src/base/message_loop.cc:208 #16 0x00007fef66049d07 in MessageLoop::RunHandler (this=<value optimised out>) at /home/njn/moz/mi3/ipc/chromium/src/base/message_loop.cc:201 #17 0x00007fef6604a01a in MessageLoop::Run (this=0x7fef674ee8f0) at /home/njn/moz/mi3/ipc/chromium/src/base/message_loop.cc:175 #18 0x00007fef65d6ae0c in nsBaseAppShell::Run (this=0x7fef559d58d0) at /home/njn/moz/mi3/widget/src/xpwidgets/nsBaseAppShell.cpp:189 #19 0x00007fef65aed3bf in nsAppStartup::Run (this=0x7fef52a091a0) at /home/njn/moz/mi3/toolkit/components/startup/nsAppStartup.cpp:224 Doing some more inspection: #5 mozilla::dom::PContentParent::DeallocSubtree (this=0x7fef4dbc2400) at /home/njn/moz/mi3/mobile/d64/ipc/ipdl/PContentParent.cpp:2261 2261 for (uint32 i = 0; (i) < ((kids).Length()); (++(i))) { (gdb) p kids $1 = ( InfallibleTArray<mozilla::dom::PAudioParent*> &) @0x7fef4dbc26b8: {<nsTArray<mozilla::dom::PAudioParent*, nsTArrayInfallibleAllocator>> = {<nsTArray_base<nsTArrayInfallibleAllocator>> = { mHdr = 0x5a5a5a5a5a5a5a5a}, <nsTArray_SafeElementAtHelper<mozilla::dom::PAudioParent*, nsTArray<mozilla::dom::PAudioParent*, nsTArrayInfallibleAllocator> >> = {<No data fields>}, <No data fields>}, <No data fields>} That 0x5a5a5a5a pattern makes it look like this is a use-after-free error. This isn't a 100% reliable crash; I tried a few times and one time the content process crashed but about:memory showed data for the chrome process.
Reporter | ||
Comment 1•13 years ago
|
||
This is on a 64-bit desktop Linux build, BTW. (I tried to do a 32-bit desktop build but failed due to some problems with freetype headers, unfortunately.)
Comment 2•13 years ago
|
||
I have seen this happen on my 64bit Ubuntu builds as well as my Android builds. As Nicholas points out, it seems intermittent.
Reporter | ||
Comment 3•13 years ago
|
||
Valgrind finds *lots* of use-after-free errors. There were some errors before this but they appeared unrelated so I removed them from the attached file.
Updated•13 years ago
|
Component: General → IPC
Product: Fennec → Core
QA Contact: general → ipc
Comment 4•13 years ago
|
||
I would just like to point out that this output from valgrind is completely nonsensical: ==31828== Address 0x1bfba168 is 632 bytes inside a block of size 816 free'd ==31828== at 0x402A381: free (vg_replace_malloc.c:366) ==31828== by 0x403FFE1: moz_free (mozalloc.cpp:97) ==31828== by 0x7761C4B: mozilla::dom::ContentParent::~ContentParent() (mozalloc.h:253) ==31828== by 0x77627A1: mozilla::dom::ContentParent::Release() (in /home/njn/moz/mi3/mobile/v64/toolkit/library/libxul.so) ==31828== by 0x66889EC: nsCOMPtr_base::~nsCOMPtr_base() (nsAutoPtr.h:969) ==31828== by 0x667C96F: nsBoxLayoutState::~nsBoxLayoutState() (nsCSSFrameConstructor.cpp:1463) ==31828== by 0x776179F: mozilla::dom::ContentParent::ActorDestroy(mozilla::ipc::IProtocolManager<mozilla::ipc::RPCChannel::RPCListener>::ActorDestroyReason) (ContentParent.cpp:285) ActorDestroy -> ~nsBoxLayoutState? Those are two function names I would not expect to see back to back. Nicholas, any idea what Valgrind is trying to say here?
Reporter | ||
Comment 5•13 years ago
|
||
Valgrind just reports what the stack + debug info tells it. There could be some frames missing due to inlining.
Assignee | ||
Comment 6•13 years ago
|
||
STR here are just to open about:memory?
Assignee | ||
Updated•13 years ago
|
Assignee: nobody → jones.chris.g
Reporter | ||
Comment 7•13 years ago
|
||
(In reply to Chris Jones [:cjones] [:warhammer] from comment #6) > STR here are just to open about:memory? Yes.
Assignee | ||
Comment 8•13 years ago
|
||
This is a pretty serious bug. It looks like ContentParent was always dying when ActorDestroy() exited, which always caused a bunch of use-after-free errors. These were probably only happening to lead to crashes when the content process died abnormally, and more stuff happened to be alive under ContentParent. Anywho, we should take this ASAP; it's very bad. I'm quite surprised the existing test_process_error hasn't caught crashes from this.
Attachment #556763 -
Flags: review?(josh)
Assignee | ||
Comment 9•13 years ago
|
||
What was tickling the problem in part 1 (at least in my testing), is that the memory reporters were querying the windows ID table in the content process, before an nsGlobalWindow has been created. So the static table returned is null and the memory reporters weren't testing for that. I'm not a big fan of that initialization scheme, so I made the table getter do implicit init. This bug isn't all that big of a deal IMHO, but we should definitely fix it.
Attachment #556764 -
Flags: review?(bzbarsky)
Assignee | ||
Comment 10•13 years ago
|
||
Also: hm, I'm trying to think of a good way to write a test for that bug. There's not really a good way in our current setup; we'd have to restart all of firefox or crash all the content processes first. That may not be worth the effort.
Comment 11•13 years ago
|
||
As far as part 2 goes, see bug 681211. But yes, making that function never return null may be better.
Assignee | ||
Comment 12•13 years ago
|
||
Shame that hasn't landed yet :/. Which approach do you prefer? I can land either patch ASAP, your call.
Reporter | ||
Comment 13•13 years ago
|
||
> This bug isn't all that big of a deal IMHO, but we should definitely fix it.
Not having about:memory in Fennec is a bloody big deal, IMO :)
Assignee | ||
Comment 14•13 years ago
|
||
(In reply to Nicholas Nethercote [:njn] from comment #13) > > This bug isn't all that big of a deal IMHO, but we should definitely fix it. > > Not having about:memory in Fennec is a bloody big deal, IMO :) The *particular bug* of crashing if an nsGlobalWindow hasn't been created before about:memory is loaded is what I was referring to. But it's not really worth discussing, let's just fix it. There are two patches to choose from now.
Comment 15•13 years ago
|
||
Comment on attachment 556763 [details] [diff] [review] part 1: Don't delete ContentParent out from under the cleanup process My goodness.
Attachment #556763 -
Flags: review?(josh) → review+
Comment 16•13 years ago
|
||
Comment on attachment 556764 [details] [diff] [review] part 2: Make sure GetWindowsTable() always returns the table, even if an nsGlobalWindow hasn't been created yet We're going to do bug 667183 instead.
Attachment #556764 -
Attachment is obsolete: true
Attachment #556764 -
Flags: review?(bzbarsky)
Assignee | ||
Comment 17•13 years ago
|
||
http://hg.mozilla.org/integration/mozilla-inbound/rev/86f1809d81d7
Whiteboard: [inbound]
Comment 18•13 years ago
|
||
http://hg.mozilla.org/mozilla-central/rev/86f1809d81d7 resolving based on comment 16 saying part 2 is not going to land.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Assignee | ||
Comment 19•13 years ago
|
||
Correct, thanks.
You need to log in
before you can comment on or make changes to this bug.
Description
•