Bug 837187 Opened 11 years ago Closed 11 years ago

b2g process using ~80MB RSS

Categories

(Core :: General, defect)

Platform

x86_64 Linux

Severity

normal

Tracking

RESOLVED FIXED
Tracking Status: b2g18 + fixed

People

(Reporter: cjones, Unassigned)

References

(Depends on 1 open bug)

Details

(Whiteboard: [MemShrink:P1])

Attachments

(4 files)

Attached file Memory dump
The usage was up to 82MB but then dropped when I tried dumping about:memory several times.

Unfortunately, the STR are "two days of heavy usage of phone stuff".  Email, SMS, calls, browser, maps.

The memory dump is attached, but it's the standard "large amounts of heap unclassified".

I noticed this state last night when the phone experience degraded: I could only run 3-4 apps simultaneously, or 1-2 if one was a fat app like email.  This is a livable condition, but a severely degraded UX.

How about we try applying some more advanced technology here.  gwagner, is your monkey setup in a state where we could use it to monitor memory in a test run?
FTR, it's

> ├──29.61 MB (48.02%) ── heap-unclassified

If I had to guess, I'd say this is likely one of our workers using memory.  We were denied approval to land the worker memory reporters on b2g18 (bug 813867 comment 56).

Perhaps that patch hasn't rotted and you can build with it.  Otherwise you'll need to build with m-c to get that patch.

I'd also build with DMD enabled.  |export MOZ_DMD=1| in your .userconfig.  When you run ./get_about_memory.py, it should pull the DMD reports and so on.

We need both DMD and the workers patch -- if it's a worker that's allocated this memory, DMD alone won't be of much help.
Whiteboard: [MemShrink]
 
> We were denied approval to land the worker memory reporters on b2g18 (bug
> 813867 comment 56).

Oh, I missed that.  We really need that patch to understand memory consumption properly.
This would be a perfect monkey/orangutan test when combined with memory monitoring.  Generally speaking, in v1, the USS of the b2g process corresponds to
 - below 40MB: doing great
 - 40-50MB: within normal operating range
 - 50-60MB: using too much memory
 - 60-70MB: starting to have noticeable effect on UX
 - 70-80MB: device borderline usable
 - 80MB+: fuggedaboutit

If our monkey can scream and poop out memory reports when we get into bad ranges, that'd be awesome.
(The report here falls at the upper end of the "starting to have noticeable effect on UX" category.)
> If our monkey can scream and poop out memory reports when we get into bad ranges, that'd 
> be awesome.

Indeed, but at the risk of repeating myself, is it possible to run the monkey with bug 813867 applied and DMD enabled?  Otherwise we are likely to have a hard time interpreting the results, and the first step in taking any action will likely be reproducing the results with bug 813867 and DMD.
Of course.  The world is your banana.
Do we have memory reporters for objects related to tcpsocket and/or network cache?  Anecdotally, this seems to happen with lots of Email usage; maybe tcpsocket?
> Do we have memory reporters for objects related to tcpsocket and/or network cache? 

Yes to network cache.  I'd guess no for TCP sockets.  Those would probably show up clearly with DMD.
We added memory reporters for disk/memory cache in bug 669117 (you have to click "more verbose" on the about:memory page to see them--search for 'memory-cache' or 'disk-cache' if the page is crowded).   

I'm not aware of any memory reporting for TCP sockets (though I wouldn't expect them to take up much memory -- just the network buffer; maybe honza or jdm know how big that can get).
Since the excessive memory usage is going to heap-unclassified, it must not be the network cache.

I'm wondering if there might be cases where we leak resources related to tcpsocket -- maybe if a client is killed while data is still being delivered.  Or something even simpler, like a small permaleak that accumulates.
OK, I found a problem with e10s TCPSocket.  We append streams to the multiplex stream, but _ensureCopying never runs in the child, so the streams are never removed; hence, memory usage will continue to grow as we send more data.
Actually scratch that, we never append to it in the child.
There's no need for speculation;  DMD is designed for exactly this purpose.  https://wiki.mozilla.org/Performance/MemShrink/DMD has usage instructions.
FWIW, DMD with the default flags is not perceivably slower for me, and the memory increase (a few MB per process) should not interfere with day-to-day use.
(In reply to Chris Jones [:cjones] [:warhammer] from comment #3)
> This would be a perfect monkey/orangutan test when combined with memory
> monitoring.  Generally speaking, in v1, the USS of the b2g process
> corresponds to
>  - below 40MB: doing great
>  - 40-50MB: within normal operating range
>  - 50-60MB: using too much memory
>  - 60-70MB: starting to have noticeable effect on UX
>  - 70-80MB: device borderline usable
>  - 80MB+: fuggedaboutit
> 
> If our monkey can scream and poop out memory reports when we get into bad
> ranges, that'd be awesome.

My monkey usually crashes with bug 832385 after 2 hours :(
Whiteboard: [MemShrink] → [MemShrink:P1]
(we are seeing this in some stability test runs too)
blocking-b2g: --- → shira?
Are there any flavors of test runs that are reproducing this more frequently?  We're still kind of reaching for steps to tickle this ...
Michael and Pooja are looking at this here; adding them to cc:.   We also see the communications app getting bloated (and ultimately killed).   Seems mostly "heap unclassified" right now.  The test scenario is just making repeated MT (mobile-terminated) calls to the DUT (device under test).
> We also
> see the communications app getting bloated (and ultimately killed).   Seems
> mostly "heap unclassified" right now.  The test scenario is just making
> repeated MT (mobile-terminated) calls to the DUT (device under test).

We have the tools to understand memory consumption in these cases.  We just need to use them.

Please apply the patch in bug 813867, which adds measurement for workers.  It recently landed on mozilla-b2g18, if that's any help.

If heap-unclassified is still high after that, please run with DMD enabled to get further insight -- see https://wiki.mozilla.org/Performance/MemShrink/DMD for instructions on how to use it.
to TEF?
shira is to leverage tef as much as possible with limited additional features only (therefore shira+ only blocks new feature related work/bugs and the rest is to leverage tef).
blocking-b2g: shira? → tef?
(In reply to Joe Cheng [:jcheng] from comment #20)
> to TEF?
> shira is to leverage tef as much as possible with limited additional
> features only (therefore shira+ only blocks new feature related work/bugs
> and the rest is to leverage tef).

This is a new policy that I'm not sure everyone is aware of.  How has this been communicated, and to whom?

That being said, tef is looking more and more like a dead branch every day, so I'm not worried about this change landing in tef.  That's the only reason I nominated shira? instead of tef?.
(In reply to Michael Vines [:m1] [:evilmachines] from comment #21)
> (In reply to Joe Cheng [:jcheng] from comment #20)
> > to TEF?
> > shira is to leverage tef as much as possible with limited additional
> > features only (therefore shira+ only blocks new feature related work/bugs
> > and the rest is to leverage tef).
> 
> This is a new policy that I'm not sure everyone is aware of.  How has this
> been communicated, and to whom?
> 
> That being said, tef is looking more and more like a dead branch every day,
> so I'm not worried about this change landing in tef.  That's the only reason
> I nominated shira? instead of tef?.

tef+ is a group of work promised to TEF (a v1.0 product), shira+ to another partner (v1.0.1 features/localizations).  Both sets of bugs are now landing on v1.0.1.  This bug does not have concrete next steps.  If you still feel strongly that it needs to block the release, please re-nominate as tef? (note that this may have schedule impact).
blocking-b2g: tef? → -
tracking-b2g18: --- → +
Although we don't have concrete STR, we do have concrete next steps here: Use a DMD build which includes bug 813867 (which has now landed on b2g18) for a few days.
Flags: needinfo?(vasanth)
Flags: needinfo?(poojas)
Hi,
We ran the MT call scenario on the device (for 2 days) with DMD enabled, and we observed that memory usage of the b2g process reached up to ~92MB RSS; after this, the communications app got killed and was never able to come back.
heap-unclassified for the main process reached up to 25.57 MB.
Attachment: memory and DMD reports from when the communications app was last alive; it was killed in the next report.

Please let us know if any further information is needed.
Flags: needinfo?(poojas)
Hi PoojaS, wow, great news! :)  Unfortunately the attachment didn't seem to make it here.  Can you please attach it, or email it to one of us?

Thanks!
Hi Jones,
I tried to upload it, but because of its large size it's not possible to attach.
Michael Vines or Vasanth are trying to share it with you.
If you compress it (e.g. with gzip) that should make it small enough.
Hi Jones,

Here is the link of the Memory Report: https://www.dropbox.com/s/whzvgyfyls9y7lc/about-memory-100.tar

Thanks,
Pooja
There's a lot of good stuff in here (thanks!), but have a gander at this first DMD stack.

> Unreported: ~1,178 blocks in stack trace record 1 of 812
>  ~4,821,554 bytes (~4,821,554 requested / ~0 slop)
>  9.58% of the heap (9.58% cumulative);  16.09% of unreported (16.09% cumulative)
>  Allocated at
>    replace_realloc /local/mnt/workspace/B2G_build/b2g_1.008/gecko/memory/replace/dmd/DMD.cpp:1134 (0x400b4038 libdmd.so+0x3038)
>    realloc /local/mnt/workspace/B2G_build/b2g_1.008/gecko/memory/build/replace_malloc.c:192 (0x401971d0 libmozglue.so+0x41d0)
>    moz_realloc /local/mnt/workspace/B2G_build/b2g_1.008/gecko/memory/mozalloc/mozalloc.cpp:97 (0x40df8d62 libxul.so+0xc49d62)
>    Pickle::Resize(unsigned int) /local/mnt/workspace/B2G_build/b2g_1.008/gecko/ipc/chromium/src/base/pickle.cc:531 (0x40c49082 libxul.so+0xaa3082)
>    Pickle /local/mnt/workspace/B2G_build/b2g_1.008/gecko/ipc/chromium/src/base/pickle.cc:60 (0x40c49270 libxul.so+0xaa3270)
>    Message /local/mnt/workspace/B2G_build/b2g_1.008/gecko/ipc/chromium/src/chrome/common/ipc_message.cc:51 (0x40c54d9e libxul.so+0xaaed9e)
>    std::deque<IPC::Message, std::allocator<IPC::Message> >::push_back(IPC::Message const&) /local/mnt/workspace/B2G_build/b2g_1.008/ndk/sources/cxx-stl/stlport/stlport/stl/_deque.h:674 (0x40b1f5c0 libxul.so+0x9795c0)
>    std::stack<IPC::Message, std::deque<IPC::Message, std::allocator<IPC::Message> > >::size() const /local/mnt/workspace/B2G_build/b2g_1.008/ndk/sources/cxx-stl/stlport/stlport/stl/_stack.h:83 (0x40a8f73c libxul.so+0x97973c)
>    mozilla::Mutex::Unlock() /local/mnt/workspace/B2G_build/b2g_1.008/out/target/product/msm7627a/obj/objdir-gecko/dist/include/mozilla/Mutex.h:83 (0x40a8a8da libxul.so+0x9748da)
>    IPC::Channel::ChannelImpl::ProcessIncomingMessages() /local/mnt/workspace/B2G_build/b2g_1.008/gecko/ipc/chromium/src/chrome/common/ipc_channel_posix.cc:578 (0x40bcea9a libxul.so+0xab8a9a)
>    IPC::Channel::ChannelImpl::OnFileCanReadWithoutBlocking(int) /local/mnt/workspace/B2G_build/b2g_1.008/gecko/ipc/chromium/src/chrome/common/ipc_channel_posix.cc:762 (0x40c67b90 libxul.so+0xab8b90)
>    base::MessagePumpLibevent::OnLibeventNotification(int, short, void*) /local/mnt/workspace/B2G_build/b2g_1.008/gecko/ipc/chromium/src/base/message_pump_libevent.cc:215 (0x40c61668 libxul.so+0xab2668)
>    event_process_active /local/mnt/workspace/B2G_build/b2g_1.008/gecko/ipc/chromium/src/third_party/libevent/event.c:386 (0x40c45d74 libxul.so+0xa96d74)
>    base::MessagePumpLibevent::Run(base::MessagePump::Delegate*) /local/mnt/workspace/B2G_build/b2g_1.008/gecko/ipc/chromium/src/base/message_pump_libevent.cc:339 (0x40c61766 libxul.so+0xab2766)
>    MessageLoop::RunInternal() /local/mnt/workspace/B2G_build/b2g_1.008/gecko/ipc/chromium/src/base/message_loop.cc:216 (0x40c4f7f0 libxul.so+0xaa07f0)
>    ~AutoRunState /local/mnt/workspace/B2G_build/b2g_1.008/gecko/ipc/chromium/src/base/message_loop.cc:502 (0x40c4f8a6 libxul.so+0xaa08a6)
>    base::Thread::ThreadMain() /local/mnt/workspace/B2G_build/b2g_1.008/gecko/ipc/chromium/src/base/thread.cc:159 (0x40c57bb4 libxul.so+0xaa8bb4)
>    ThreadFunc /local/mnt/workspace/B2G_build/b2g_1.008/gecko/ipc/chromium/src/base/platform_thread_posix.cc:41 (0x40c61b88 libxul.so+0xab2b88)
>    __thread_entry /local/mnt/workspace/B2G_build/b2g_1.008/bionic/libc/bionic/pthread.c:218 (0x4002f114 libc.so+0x13114)
>    pthread_create /local/mnt/workspace/B2G_build/b2g_1.008/bionic/libc/bionic/pthread.c:362 (0x4002ec68 libc.so+0x12c68)
> 
> Unreported: ~551 blocks in stack trace record 2 of 812
>  ~2,255,243 bytes (~2,255,243 requested / ~0 slop)
>  4.48% of the heap (14.06% cumulative);  7.52% of unreported (23.61% cumulative)
>  Allocated at
>    malloc /local/mnt/workspace/B2G_build/b2g_1.008/gecko/memory/build/replace_malloc.c:152 (0x401972ae libmozglue.so+0x42ae)
>    moz_xmalloc /local/mnt/workspace/B2G_build/b2g_1.008/gecko/memory/mozalloc/mozalloc.cpp:55 (0x410a603e libxul.so+0xef703e)
>    mozilla::layers::Layer::SetAnimations(InfallibleTArray<mozilla::layers::Animation> const&) /local/mnt/workspace/B2G_build/b2g_1.008/gecko/gfx/layers/Layers.cpp:483 (0x41209766 libxul.so+0xae8766)
>    mozilla::layers::ShadowLayersParent::RecvUpdate(InfallibleTArray<mozilla::layers::Edit> const&, mozilla::layers::TargetConfig const&, bool const&, InfallibleTArray<mozilla::layers::EditReply>*) /local/mnt/workspace/B2G_build/b2g_1.008/gecko/gfx/layers/ipc/ShadowLayersParent.cpp:260 (0x41221d94 libxul.so+0xb00d94)
>    mozilla::layers::PLayersParent::OnMessageReceived(IPC::Message const&, IPC::Message*&) /local/mnt/workspace/B2G_build/b2g_1.008/out/target/product/msm7627a/obj/objdir-gecko/ipc/ipdl/PLayersParent.cpp:509 (0x410d7116 libxul.so+0x9b6116)
>    mozilla::layers::PCompositorParent::OnMessageReceived(IPC::Message const&, IPC::Message*&) /local/mnt/workspace/B2G_build/b2g_1.008/out/target/product/msm7627a/obj/objdir-gecko/ipc/ipdl/PCompositorParent.cpp:393 (0x410cf496 libxul.so+0x9ae496)
>    mozilla::ipc::SyncChannel::OnDispatchMessage(IPC::Message const&) /local/mnt/workspace/B2G_build/b2g_1.008/gecko/ipc/glue/SyncChannel.cpp:147 (0x4109c28c libxul.so+0x97b28c)
>    mozilla::ipc::RPCChannel::OnMaybeDequeueOne() /local/mnt/workspace/B2G_build/b2g_1.008/gecko/ipc/glue/RPCChannel.cpp:400 (0x4109ad8c libxul.so+0x979d8c)
>    RunnableMethod<mozilla::layers::ImageContainerChild, void (mozilla::layers::ImageContainerChild::*)(), Tuple0>::Run() /local/mnt/workspace/B2G_build/b2g_1.008/gecko/ipc/chromium/src/base/task.h:308 (0x40b0af12 libxul.so+0x95bf12)
>    mozilla::ipc::RPCChannel::DequeueTask::Run() /local/mnt/workspace/B2G_build/b2g_1.008/out/target/product/msm7627a/obj/objdir-gecko/dist/include/mozilla/ipc/RPCChannel.h:448 (0x40b2774c libxul.so+0x97874c)
>    MessageLoop::RunTask(Task*) /local/mnt/workspace/B2G_build/b2g_1.008/gecko/ipc/chromium/src/base/message_loop.cc:334 (0x40c4f840 libxul.so+0xaa0840)
>    MessageLoop::DeferOrRunPendingTask(MessageLoop::PendingTask const&) /local/mnt/workspace/B2G_build/b2g_1.008/gecko/ipc/chromium/src/base/message_loop.cc:344 (0x40c50672 libxul.so+0xaa1672)
>    MessageLoop::DoWork() /local/mnt/workspace/B2G_build/b2g_1.008/gecko/ipc/chromium/src/base/message_loop.cc:441 (0x40c51250 libxul.so+0xaa2250)
>    base::MessagePumpDefault::Run(base::MessagePump::Delegate*) /local/mnt/workspace/B2G_build/b2g_1.008/gecko/ipc/chromium/src/base/message_pump_default.cc:24 (0x411c34d0 libxul.so+0xaa24d0)
>    MessageLoop::RunInternal() /local/mnt/workspace/B2G_build/b2g_1.008/gecko/ipc/chromium/src/base/message_loop.cc:216 (0x40c4f7f0 libxul.so+0xaa07f0)
>    ~AutoRunState /local/mnt/workspace/B2G_build/b2g_1.008/gecko/ipc/chromium/src/base/message_loop.cc:502 (0x40c4f8a6 libxul.so+0xaa08a6)
>    base::Thread::ThreadMain() /local/mnt/workspace/B2G_build/b2g_1.008/gecko/ipc/chromium/src/base/thread.cc:159 (0x40c57bb4 libxul.so+0xaa8bb4)
>    ThreadFunc /local/mnt/workspace/B2G_build/b2g_1.008/gecko/ipc/chromium/src/base/platform_thread_posix.cc:41 (0x40c61b88 libxul.so+0xab2b88)
>    __thread_entry /local/mnt/workspace/B2G_build/b2g_1.008/bionic/libc/bionic/pthread.c:218 (0x4002f114 libc.so+0x13114)
>    pthread_create /local/mnt/workspace/B2G_build/b2g_1.008/bionic/libc/bionic/pthread.c:362 (0x4002ec68 libc.so+0x12c68)
> 
> Unreported: 20 blocks in stack trace record 3 of 812
>  1,310,720 bytes (1,280,160 requested / 30,560 slop)
>  2.60% of the heap (16.67% cumulative);  4.37% of unreported (27.98% cumulative)
>  Allocated at
>    malloc /local/mnt/workspace/B2G_build/b2g_1.008/gecko/memory/build/replace_malloc.c:152 (0x401972ae libmozglue.so+0x42ae)
>    sqlite3MemMalloc /local/mnt/workspace/B2G_build/b2g_1.008/gecko/db/sqlite3/src/sqlite3.c:15436 (0x406f07ec libmozsqlite3.so+0x257ec)
>    mallocWithAlarm /local/mnt/workspace/B2G_build/b2g_1.008/gecko/db/sqlite3/src/sqlite3.c:18734 (0x406d0420 libmozsqlite3.so+0x5420)
>    sqlite3Malloc /local/mnt/workspace/B2G_build/b2g_1.008/gecko/db/sqlite3/src/sqlite3.c:18768 (0x406d04a4 libmozsqlite3.so+0x54a4)
>    setupLookaside /local/mnt/workspace/B2G_build/b2g_1.008/gecko/db/sqlite3/src/sqlite3.c:112131 (0x406d5282 libmozsqlite3.so+0xa282)
>    openDatabase /local/mnt/workspace/B2G_build/b2g_1.008/gecko/db/sqlite3/src/sqlite3.c:114032 (0x406eb61c libmozsqlite3.so+0x2061c)
>    sqlite3_open_v2 /local/mnt/workspace/B2G_build/b2g_1.008/gecko/db/sqlite3/src/sqlite3.c:114069 (0x406eb78a libmozsqlite3.so+0x2078a)
>    mozilla::storage::Connection::initialize(nsIFile*, char const*) /local/mnt/workspace/B2G_build/b2g_1.008/gecko/storage/src/mozStorageConnection.cpp:491 (0x40fc4252 libxul.so+0x8a3252)
>    mozilla::storage::Service::OpenDatabaseWithVFS(nsIFile*, nsACString_internal const&, mozIStorageConnection**) /local/mnt/workspace/B2G_build/b2g_1.008/gecko/storage/src/mozStorageService.cpp:906 (0x40fc115c libxul.so+0x8a015c)
>    mozilla::dom::indexedDB::IDBFactory::GetConnection(nsAString_internal const&) /local/mnt/workspace/B2G_build/b2g_1.008/gecko/dom/indexedDB/IDBFactory.cpp:279 (0x40d794b0 libxul.so+0x6584b0)
>    nsCOMPtr_base /local/mnt/workspace/B2G_build/b2g_1.008/out/target/product/msm7627a/obj/objdir-gecko/dist/include/nsCOMPtr.h:399 (0x40d85e38 libxul.so+0x664e38)
>    nsCOMPtr<mozIStorageConnection>::get() const /local/mnt/workspace/B2G_build/b2g_1.008/out/target/product/msm7627a/obj/objdir-gecko/dist/include/nsCOMPtr.h:762 (0x40d73542 libxul.so+0x652542)
>    nsCOMPtr<nsIRunnable>::operator=(nsIRunnable*) /local/mnt/workspace/B2G_build/b2g_1.008/out/target/product/msm7627a/obj/objdir-gecko/dist/include/nsCOMPtr.h:622 (0x40d8f85c libxul.so+0x66e85c)
>    ~nsCOMPtr /local/mnt/workspace/B2G_build/b2g_1.008/out/target/product/msm7627a/obj/objdir-gecko/dist/include/nsCOMPtr.h:451 (0x411a03da libxul.so+0xa7f3da)
>    nsThread::ProcessNextEvent(bool, bool*) /local/mnt/workspace/B2G_build/b2g_1.008/gecko/xpcom/threads/nsThread.cpp:620 (0x40c2d1f2 libxul.so+0xa7e1f2)
>    NS_ProcessNextEvent_P(nsIThread*, bool) /local/mnt/workspace/B2G_build/b2g_1.008/out/target/product/msm7627a/obj/objdir-gecko/dist/include/nsError.h:1069 (0x40c0d5fe libxul.so+0xa5e5fe)
>    nsThread::ThreadFunc(void*) /local/mnt/workspace/B2G_build/b2g_1.008/gecko/xpcom/threads/nsThread.h:63 (0x40b9463c libxul.so+0xa7e63c)
>    _pt_root /local/mnt/workspace/B2G_build/b2g_1.008/gecko/nsprpub/pr/src/pthreads/ptthread.c:159 (0x40137a28 libnspr4.so+0x19a28)
>    __thread_entry /local/mnt/workspace/B2G_build/b2g_1.008/bionic/libc/bionic/pthread.c:218 (0x4002f114 libc.so+0x13114)
>    pthread_create /local/mnt/workspace/B2G_build/b2g_1.008/bionic/libc/bionic/pthread.c:362 (0x4002ec68 libc.so+0x12c68)
> 
> Unreported: 2 blocks in stack trace record 4 of 812
>  1,228,800 bytes (1,228,800 requested / 0 slop)
>  2.44% of the heap (19.11% cumulative);  4.10% of unreported (32.08% cumulative)
>  Allocated at
>    malloc /local/mnt/workspace/B2G_build/b2g_1.008/gecko/memory/build/replace_malloc.c:152 (0x401972ae libmozglue.so+0x42ae)
>    moz_malloc /local/mnt/workspace/B2G_build/b2g_1.008/gecko/memory/mozalloc/mozalloc.cpp:65 (0x40df8d6a libxul.so+0xc49d6a)
>    gfxImageSurface /local/mnt/workspace/B2G_build/b2g_1.008/gecko/gfx/thebes/gfxImageSurface.cpp:111 (0x40bdb062 libxul.so+0xac5062)
>    nsRefPtr<gfxASurface>::assign_with_AddRef(gfxASurface*) /local/mnt/workspace/B2G_build/b2g_1.008/out/target/product/msm7627a/obj/objdir-gecko/dist/include/nsAutoPtr.h:844 (0x40c894cc libxul.so+0xada4cc)
>    gfxPlatform::CreateDrawTargetForBackend(mozilla::gfx::BackendType, mozilla::gfx::IntSize const&, mozilla::gfx::SurfaceFormat) /local/mnt/workspace/B2G_build/b2g_1.008/gecko/gfx/thebes/gfxPlatform.cpp:754 (0x411f2eaa libxul.so+0xad1eaa)
>    mozilla::RefPtr<mozilla::gfx::DrawTarget>::operator mozilla::gfx::DrawTarget*() const /local/mnt/workspace/B2G_build/b2g_1.008/out/target/product/msm7627a/obj/objdir-gecko/dist/include/mozilla/RefPtr.h:137 (0x411f2f0e libxul.so+0xad1f0e)
>    operator mozilla::TemporaryRef<mozilla::gfx::DrawTarget><mozilla::gfx::DrawTarget> /local/mnt/workspace/B2G_build/b2g_1.008/out/target/product/msm7627a/obj/objdir-gecko/dist/include/mozilla/RefPtr.h:141 (0x412092ea libxul.so+0xae82ea)
>    mozilla::RefPtr<mozilla::gfx::DrawTarget>::assign(mozilla::gfx::DrawTarget*) /local/mnt/workspace/B2G_build/b2g_1.008/out/target/product/msm7627a/obj/objdir-gecko/dist/include/mozilla/RefPtr.h:145 (0x40c2e236 libxul.so+0x50d236)
>    mozilla::dom::HTMLImageElementOrHTMLCanvasElementOrHTMLVideoElement::IsHTMLCanvasElement() const /local/mnt/workspace/B2G_build/b2g_1.008/out/target/product/msm7627a/obj/objdir-gecko/dist/include/mozilla/dom/UnionTypes.h:219 (0x40c2fd54 libxul.so+0x50ed54)
>    nsCanvasRenderingContext2DAzure::DrawImage(mozilla::dom::HTMLImageElementOrHTMLCanvasElementOrHTMLVideoElement const&, double, double, mozilla::ErrorResult&) /local/mnt/workspace/B2G_build/b2g_1.008/gecko/content/canvas/src/nsCanvasRenderingContext2DAzure.h:265 (0x4114d914 libxul.so+0xa2c914)
>    mozilla::dom::CanvasRenderingContext2DBinding::genericMethod(JSContext*, unsigned int, JS::Value*) /local/mnt/workspace/B2G_build/b2g_1.008/out/target/product/msm7627a/obj/objdir-gecko/dom/bindings/CanvasRenderingContext2DBinding.cpp:2660 (0x4114e410 libxul.so+0xa2d410)
>    CallJSNative /local/mnt/workspace/B2G_build/b2g_1.008/gecko/js/src/jscntxtinlines.h:364 (0x40e981d8 libxul.so+0xce91d8)
>    js::Interpret(JSContext*, js::StackFrame*, js::InterpMode) /local/mnt/workspace/B2G_build/b2g_1.008/gecko/js/src/jsinterp.cpp:2475 (0x40e959ca libxul.so+0xce69ca)
>    js::RunScript(JSContext*, JS::Handle<JSScript*>, js::StackFrame*) /local/mnt/workspace/B2G_build/b2g_1.008/gecko/js/src/jsinterp.cpp:324 (0x40e97a96 libxul.so+0xce8a96)
>    js::FrameRegs::fp() const /local/mnt/workspace/B2G_build/b2g_1.008/gecko/js/src/vm/Stack.h:1137 (0x40e99454 libxul.so+0xcea454)
>    ~AutoLastFrameCheck /local/mnt/workspace/B2G_build/b2g_1.008/gecko/js/src/jscntxt.h:1557 (0x40e36202 libxul.so+0xc87202)
>    nsJSContext::CallEventHandler(nsISupports*, JSObject*, JSObject*, nsIArray*, nsIVariant**) /local/mnt/workspace/B2G_build/b2g_1.008/gecko/dom/base/nsJSEnvironment.cpp:1940 (0x406ebc44 libxul.so+0x5d5c44)
>    nsJSEventListener::HandleEvent(nsIDOMEvent*) /local/mnt/workspace/B2G_build/b2g_1.008/out/target/product/msm7627a/obj/objdir-gecko/dist/include/nsError.h:1069 (0x40749732 libxul.so+0x633732)
>    nsEventListenerManager::HandleEventSubType(nsListenerStruct*, nsIDOMEventListener*, nsIDOMEvent*, nsIDOMEventTarget*, unsigned int, nsCxPusher*) /local/mnt/workspace/B2G_build/b2g_1.008/gecko/content/events/src/nsEventListenerManager.cpp:889 (0x4063d5da libxul.so+0x5275da)
>    nsEventListenerManager::HandleEventInternal(nsPresContext*, nsEvent*, nsIDOMEvent**, nsIDOMEventTarget*, unsigned int, nsEventStatus*, nsCxPusher*) /local/mnt/workspace/B2G_build/b2g_1.008/out/target/product/msm7627a/obj/objdir-gecko/dist/include/nsError.h:1065 (0x4063d714 libxul.so+0x527714)
>    nsEventListenerManager::HandleEvent(nsPresContext*, nsEvent*, nsIDOMEvent**, nsIDOMEventTarget*, unsigned int, nsEventStatus*, nsCxPusher*) /local/mnt/workspace/B2G_build/b2g_1.008/gecko/content/events/src/nsEventListenerManager.h:144 (0x4064d8d4 libxul.so+0x5378d4)
>    nsEventTargetChainItem::HandleEventTargetChain(nsEventChainPostVisitor&, unsigned int, nsDispatchingCallback*, bool, nsCxPusher*) /local/mnt/workspace/B2G_build/b2g_1.008/gecko/content/events/src/nsEventDispatcher.cpp:318 (0x4064d9c0 libxul.so+0x5379c0)
>    nsEventDispatcher::Dispatch(nsISupports*, nsPresContext*, nsEvent*, nsIDOMEvent*, nsEventStatus*, nsDispatchingCallback*, nsCOMArray<nsIDOMEventTarget>*) /local/mnt/workspace/B2G_build/b2g_1.008/gecko/content/events/src/nsEventDispatcher.cpp:636 (0x4064df42 libxul.so+0x537f42)
>    nsEventDispatcher::DispatchDOMEvent(nsISupports*, nsEvent*, nsIDOMEvent*, nsPresContext*, nsEventStatus*) /local/mnt/workspace/B2G_build/b2g_1.008/gecko/content/events/src/nsEventDispatcher.cpp:694 (0x406e7138 libxul.so+0x538138)
> 
> Unreported: 117 blocks in stack trace record 5 of 812
>  958,464 bytes (479,349 requested / 479,115 slop)
>  1.90% of the heap (21.01% cumulative);  3.20% of unreported (35.28% cumulative)
>  Allocated at
>    malloc /local/mnt/workspace/B2G_build/b2g_1.008/gecko/memory/build/replace_malloc.c:152 (0x401972ae libmozglue.so+0x42ae)
>    operator new(unsigned int) /local/mnt/workspace/B2G_build/b2g_1.008/gecko/memory/build/mozmemory_wrap.c:51 (0x40196fb6 libmozglue.so+0x3fb6)
>    __stl_new /local/mnt/workspace/B2G_build/b2g_1.008/ndk/sources/cxx-stl/stlport/stlport/stl/_new.h:134 (0x40451f42 libxul.so+0x2a2f42)
>    std::allocator<char>::_M_allocate(unsigned int, unsigned int&) /local/mnt/workspace/B2G_build/b2g_1.008/ndk/sources/cxx-stl/stlport/stlport/stl/_alloc.h:352 (0x40c67334 libxul.so+0xab8334)
>    IPC::Channel::ChannelImpl::ProcessIncomingMessages() /local/mnt/workspace/B2G_build/b2g_1.008/gecko/ipc/chromium/src/chrome/common/ipc_channel_posix.cc:587 (0x40c67910 libxul.so+0xab8910)
>    IPC::Channel::ChannelImpl::OnFileCanReadWithoutBlocking(int) /local/mnt/workspace/B2G_build/b2g_1.008/gecko/ipc/chromium/src/chrome/common/ipc_channel_posix.cc:762 (0x40c67b90 libxul.so+0xab8b90)
>    base::MessagePumpLibevent::OnLibeventNotification(int, short, void*) /local/mnt/workspace/B2G_build/b2g_1.008/gecko/ipc/chromium/src/base/message_pump_libevent.cc:215 (0x40c61668 libxul.so+0xab2668)
>    event_process_active /local/mnt/workspace/B2G_build/b2g_1.008/gecko/ipc/chromium/src/third_party/libevent/event.c:386 (0x40c45d74 libxul.so+0xa96d74)
>    base::MessagePumpLibevent::Run(base::MessagePump::Delegate*) /local/mnt/workspace/B2G_build/b2g_1.008/gecko/ipc/chromium/src/base/message_pump_libevent.cc:339 (0x40c61766 libxul.so+0xab2766)
>    MessageLoop::RunInternal() /local/mnt/workspace/B2G_build/b2g_1.008/gecko/ipc/chromium/src/base/message_loop.cc:216 (0x40c4f7f0 libxul.so+0xaa07f0)
>    ~AutoRunState /local/mnt/workspace/B2G_build/b2g_1.008/gecko/ipc/chromium/src/base/message_loop.cc:502 (0x40c4f8a6 libxul.so+0xaa08a6)
>    base::Thread::ThreadMain() /local/mnt/workspace/B2G_build/b2g_1.008/gecko/ipc/chromium/src/base/thread.cc:159 (0x40c57bb4 libxul.so+0xaa8bb4)
>    ThreadFunc /local/mnt/workspace/B2G_build/b2g_1.008/gecko/ipc/chromium/src/base/platform_thread_posix.cc:41 (0x40c61b88 libxul.so+0xab2b88)
>    __thread_entry /local/mnt/workspace/B2G_build/b2g_1.008/bionic/libc/bionic/pthread.c:218 (0x4002f114 libc.so+0x13114)
>    pthread_create /local/mnt/workspace/B2G_build/b2g_1.008/bionic/libc/bionic/pthread.c:362 (0x4002ec68 libc.so+0x12c68)
Um, yuck.

What do the numbers in

> Unreported: ~1,178 blocks in stack trace record 1 of 812

mean, exactly?
> Unreported: ~1,178 blocks in stack trace record 1 of 812
>  ~4,821,554 bytes (~4,821,554 requested / ~0 slop)

The "~" means the numbers are approximate; as an optimization to speed up DMD, we don't record precisely allocations smaller than 4903 bytes.

"1,178 blocks" - there were 1,178 allocations with the given stack.
"stack trace record 1 of 812" - This is counting how many different stacks allocated unreported memory; this is the first one out of 812 different stacks.

"X bytes" These blocks together account for X bytes of memory allocated by jemalloc.  "Y requested / Z slop": This measures how much internal fragmentation there is in the allocations.  E.g. if you do malloc(15) and jemalloc rounds that up to 16 bytes, we'll report 16 bytes (15 requested / 1 slop).
Looking at about:memory...

├───7.15 MB (07.83%) -- workers/workers()
│   ├──2.40 MB (02.63%) -- worker(resource://gre/modules/wifi_worker.js, 0x443b6c00)
│   │  ├──1.79 MB (01.95%) -- gc-heap
│   │  │  ├──1.75 MB (01.92%) ── decommitted-arenas
│   │  │  └──0.03 MB (00.03%) ++ (3 tiny)
│   │  └──0.62 MB (00.68%) ++ (3 tiny)
│   ├──2.39 MB (02.61%) -- worker(resource://gre/modules/net_worker.js, 0x44de0000)
│   │  ├──1.74 MB (01.91%) -- gc-heap
│   │  │  ├──1.71 MB (01.87%) ── decommitted-arenas
│   │  │  └──0.03 MB (00.03%) ++ (3 tiny)
│   │  └──0.64 MB (00.71%) ++ (3 tiny)
│   └──2.36 MB (02.59%) -- worker(resource://gre/modules/wifi_worker.js, 0x43387c00)
│      ├──1.79 MB (01.95%) -- gc-heap
│      │  ├──1.75 MB (01.92%) ── decommitted-arenas
│      │  └──0.03 MB (00.03%) ++ (3 tiny)
│      └──0.58 MB (00.63%) ++ (3 tiny)

I've been told that there are supposed to be two wifi_worker.js workers?  Seems unfortunate... 


├───2.77 MB (03.03%) -- storage/sqlite
│   ├──2.17 MB (02.37%) -- 3406066227csotncta.sqlite
│   │  ├──1.89 MB (02.07%) ── cache-used [20]
│   │  └──0.28 MB (00.30%) -- (2 tiny)
│   │     ├──0.16 MB (00.17%) ── schema-used [20]
│   │     └──0.12 MB (00.13%) ── stmt-used [20]

What is this, and why are there 20 instances of it?  I think that indicates that there are 20 connections to the 3406066227csotncta.sqlite database.

----

Looking at DMD's results... which repo should I check out to get matching line
numbers?

IPC, IPC everywhere!


> > Unreported: 20 blocks in stack trace record 3 of 812
> >  1,310,720 bytes (1,280,160 requested / 30,560 slop)
> >  2.60% of the heap (16.67% cumulative);  4.37% of unreported (27.98% cumulative)
> >  Allocated at
> >    malloc /local/mnt/workspace/B2G_build/b2g_1.008/gecko/memory/build/replace_malloc.c:152 (0x401972ae libmozglue.so+0x42ae)
> >    sqlite3MemMalloc /local/mnt/workspace/B2G_build/b2g_1.008/gecko/db/sqlite3/src/sqlite3.c:15436 (0x406f07ec libmozsqlite3.so+0x257ec)
> >    mallocWithAlarm /local/mnt/workspace/B2G_build/b2g_1.008/gecko/db/sqlite3/src/sqlite3.c:18734 (0x406d0420 libmozsqlite3.so+0x5420)
> >    sqlite3Malloc /local/mnt/workspace/B2G_build/b2g_1.008/gecko/db/sqlite3/src/sqlite3.c:18768 (0x406d04a4 libmozsqlite3.so+0x54a4)
> >    setupLookaside /local/mnt/workspace/B2G_build/b2g_1.008/gecko/db/sqlite3/src/sqlite3.c:112131 (0x406d5282 libmozsqlite3.so+0xa282)
> >    openDatabase /local/mnt/workspace/B2G_build/b2g_1.008/gecko/db/sqlite3/src/sqlite3.c:114032 (0x406eb61c libmozsqlite3.so+0x2061c)
> >    sqlite3_open_v2 /local/mnt/workspace/B2G_build/b2g_1.008/gecko/db/sqlite3/src/sqlite3.c:114069 (0x406eb78a libmozsqlite3.so+0x2078a)
> >    mozilla::storage::Connection::initialize(nsIFile*, char const*) /local/mnt/workspace/B2G_build/b2g_1.008/gecko/storage/src/mozStorageConnection.cpp:491 (0x40fc4252 libxul.so+0x8a3252)

This is surprising.  In theory we should be reporting every SQLite allocation.
Maybe I should look at bug 819801.
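
(For context, the setupLookaside frames above come from SQLite's per-connection lookaside allocator: each sqlite3_open allocates one lookaside buffer, which would explain 20 blocks for 20 connections.  A hedged sketch of how that buffer is sized -- the numbers here are illustrative, not what Gecko actually configures:)

    #include <sqlite3.h>
    #include <cstdio>

    int main() {
      // Must be called before SQLite is initialized.  512-byte slots x 128
      // slots = 64 KiB of lookaside per connection (illustrative values).
      sqlite3_config(SQLITE_CONFIG_LOOKASIDE, 512, 128);

      sqlite3* db = nullptr;
      if (sqlite3_open_v2("test.sqlite", &db,
                          SQLITE_OPEN_READWRITE | SQLITE_OPEN_CREATE,
                          nullptr) == SQLITE_OK) {
        printf("opened; one lookaside buffer now belongs to this connection\n");
        sqlite3_close(db);
      }
      return 0;
    }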


> > Unreported: 2 blocks in stack trace record 4 of 812
> >  1,228,800 bytes (1,228,800 requested / 0 slop)
> >  2.44% of the heap (19.11% cumulative);  4.10% of unreported (32.08% cumulative)
> >  Allocated at
> >    malloc /local/mnt/workspace/B2G_build/b2g_1.008/gecko/memory/build/replace_malloc.c:152 (0x401972ae libmozglue.so+0x42ae)
> >    moz_malloc /local/mnt/workspace/B2G_build/b2g_1.008/gecko/memory/mozalloc/mozalloc.cpp:65 (0x40df8d6a libxul.so+0xc49d6a)
> >    gfxImageSurface /local/mnt/workspace/B2G_build/b2g_1.008/gecko/gfx/thebes/gfxImageSurface.cpp:111 (0x40bdb062 libxul.so+0xac5062)

Hmm, we have a reporter for this case -- gfxImageSurface::SizeOfExcludingThis.
I wonder what's going wrong.


> > Unreported: 117 blocks in stack trace record 5 of 812
> >  958,464 bytes (479,349 requested / 479,115 slop)
> >  1.90% of the heap (21.01% cumulative);  3.20% of unreported (35.28% cumulative)
> >  Allocated at
> >    malloc /local/mnt/workspace/B2G_build/b2g_1.008/gecko/memory/build/replace_malloc.c:152 (0x401972ae libmozglue.so+0x42ae)
> >    operator new(unsigned int) /local/mnt/workspace/B2G_build/b2g_1.008/gecko/memory/build/mozmemory_wrap.c:51 (0x40196fb6 libmozglue.so+0x3fb6)
> >    __stl_new /local/mnt/workspace/B2G_build/b2g_1.008/ndk/sources/cxx-stl/stlport/stlport/stl/_new.h:134 (0x40451f42 libxul.so+0x2a2f42)
> >    std::allocator<char>::_M_allocate(unsigned int, unsigned int&) /local/mnt/workspace/B2G_build/b2g_1.008/ndk/sources/cxx-stl/stlport/stlport/stl/_alloc.h:352 (0x40c67334 libxul.so+0xab8334)
> >    IPC::Channel::ChannelImpl::ProcessIncomingMessages() /local/mnt/workspace/B2G_build/b2g_1.008/gecko/ipc/chromium/src/chrome/common/ipc_channel_posix.cc:587 (0x40c67910 libxul.so+0xab8910)

I made this buffer shrink itself after each use in bug 811596.  It used to be
128 or 256 KiB; after the shrinking it's 8 KiB, but there are 117 of them.
(I never saw more than 1 instance of this buffer in my profiles.)

I have an immediate way to halve this -- from bug 811596 comment 0:

> On my b2g-desktop build this reduces the amount of memory
> allocated for the string to 8 KiB.  (It's not 4 KiB because
> this machine's std::string seems to need 57 bytes for string
> header stuff.  I could reduce the capacity from 4096 to something
> like 4000 but it doesn't seem worth chasing this.)

But we should also investigate why there are 117 of these buffers!
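
For reference, the shrink-after-use idea from bug 811596 looks roughly like this (a minimal sketch, not the actual ipc_channel_posix.cc change):

    #include <string>

    // After draining a large incoming message, release the input buffer's
    // capacity and re-reserve a small working size.  clear() alone would keep
    // the old (possibly 128-256 KiB) capacity alive; the swap trick frees it.
    void ShrinkInputBuffer(std::string& aInputBuf) {
      std::string().swap(aInputBuf);  // free all capacity
      aInputBuf.reserve(4096);        // small buffer for the next read
    }

(Per the quoted comment, string header overhead is why the real allocation lands at 8 KiB rather than 4 KiB.)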
The biggest offender by far is the additional 20 MB of heap-unclassified.  After that it's ~13 MB of additional committed JS memory (the memory used by workers actually /decreased/, but the main runtime is using more memory).  Then there's a potpourri of other stuff (SQLite, DOM, xpconnect).

Unfortunately we can't focus only on the heap-unclassified and JS issues, because the SQLite and DOM increases are roughly as large as the largest single offender in heap-unclassified.
Allocations involving Pickle::Resize account for 5.75 MiB all up.  You can see this in the following stack frame record:

Unreported: ~1,405 blocks from ~4 stack trace records in stack frame record 30 of 3,057
 ~5,750,665 bytes (~5,750,665 requested / ~0 slop)
 11.43% of the heap;  0.13% of unreported
 PC is
   Pickle::Resize(unsigned int) /local/mnt/workspace/B2G_build/b2g_1.008/gecko/ipc/chromium/src/base/pickle.cc:531 (0x40c49082 libxul.so+0xaa3082)


Allocations involving gfxImageSurface (the ones that I said should be getting reported) account for 4.19 MiB all up:

Unreported: ~44 blocks from ~9 stack trace records in stack frame record 40 of 3,057
 ~4,194,277 bytes (~4,162,541 requested / ~31,736 slop)
 8.33% of the heap;  0.10% of unreported
 PC is
   gfxImageSurface /local/mnt/workspace/B2G_build/b2g_1.008/gecko/gfx/thebes/gfxImageSurface.cpp:111 (0x40bdb062 libxul.so+0xac5062)
(In reply to Justin Lebar [:jlebar] from comment #33)
> "1,178 blocks" - there were 1,178 allocations with the given stack.

Just to be clear, it means that many blocks were *live* when the report was generated?
I don't know the IPC code very well, but from what I can tell it looks like there are just tons of IPC messages that are pending and/or being processed.
> Just to be clear, it means that many blocks *live* when the report was
> generated?

Yes.  DMD's report is based entirely on a single snapshot of the live heap.
> > Unreported: ~1,178 blocks in stack trace record 1 of 812
> >  ~4,821,554 bytes (~4,821,554 requested / ~0 slop)
> >  9.58% of the heap (9.58% cumulative);  16.09% of unreported (16.09% cumulative)
> >  Allocated at
> >    replace_realloc /local/mnt/workspace/B2G_build/b2g_1.008/gecko/memory/replace/dmd/DMD.cpp:1134 (0x400b4038 libdmd.so+0x3038)
> >    realloc /local/mnt/workspace/B2G_build/b2g_1.008/gecko/memory/build/replace_malloc.c:192 (0x401971d0 libmozglue.so+0x41d0)
> >    moz_realloc /local/mnt/workspace/B2G_build/b2g_1.008/gecko/memory/mozalloc/mozalloc.cpp:97 (0x40df8d62 libxul.so+0xc49d62)
> >    Pickle::Resize(unsigned int) /local/mnt/workspace/B2G_build/b2g_1.008/gecko/ipc/chromium/src/base/pickle.cc:531 (0x40c49082 libxul.so+0xaa3082)
> >    Pickle /local/mnt/workspace/B2G_build/b2g_1.008/gecko/ipc/chromium/src/base/pickle.cc:60 (0x40c49270 libxul.so+0xaa3270)
> >    Message /local/mnt/workspace/B2G_build/b2g_1.008/gecko/ipc/chromium/src/chrome/common/ipc_message.cc:51 (0x40c54d9e libxul.so+0xaaed9e)

I understand this up to here.  Pickle's constructor allocates memory, and
Message's constructor creates the Pickle.


> >    std::deque<IPC::Message, std::allocator<IPC::Message> >::push_back(IPC::Message const&) /local/mnt/workspace/B2G_build/b2g_1.008/ndk/sources/cxx-stl/stlport/stlport/stl/_deque.h:674 (0x40b1f5c0 libxul.so+0x9795c0)
> >    std::stack<IPC::Message, std::deque<IPC::Message, std::allocator<IPC::Message> > >::size() const /local/mnt/workspace/B2G_build/b2g_1.008/ndk/sources/cxx-stl/stlport/stlport/stl/_stack.h:83 (0x40a8f73c libxul.so+0x97973c)

I'm not sure about this bit.  http://www.cplusplus.com/reference/stack/stack/
tells me that std::stack is by default implemented on top of std::deque.  So
we're probably looking for a std::stack<IPC::Message>.  The two I can find are
both in class RPCChannel (in ipc/glue/RPCChannel.h):

    // Stack of all the RPC out-calls on which this RPCChannel is
    // awaiting a response.
    //
    std::stack<Message> mStack;

and:

    // Stack of RPC in-calls that were deferred because of race
    // conditions.
    //
    std::stack<Message> mDeferred;

But I don't understand why std::stack<...>::size() would call
std::deque<...>::push_back().  Maybe one of the stack frames is bogus.

Anyway, mStack gets pushed-to in RPCChannel::Call(), and mDeferred gets
pushed-to in RPCChannel::Incall().


> >    mozilla::Mutex::Unlock() /local/mnt/workspace/B2G_build/b2g_1.008/out/target/product/msm7627a/obj/objdir-gecko/dist/include/mozilla/Mutex.h:83 (0x40a8a8da libxul.so+0x9748da)

I don't know where this is coming from, and how we can get from here to the
std::stack<...>::size() call above.  Maybe it's a bogus stack frame.


> >    IPC::Channel::ChannelImpl::ProcessIncomingMessages() /local/mnt/workspace/B2G_build/b2g_1.008/gecko/ipc/chromium/src/chrome/common/ipc_channel_posix.cc:578 (0x40bcea9a libxul.so+0xab8a9a)

This line is:

    listener_->OnMessageReceived(m);

OnMessageReceived() is a virtual call.


> >    IPC::Channel::ChannelImpl::OnFileCanReadWithoutBlocking(int) /local/mnt/workspace/B2G_build/b2g_1.008/gecko/ipc/chromium/src/chrome/common/ipc_channel_posix.cc:762 (0x40c67b90 libxul.so+0xab8b90)
> >    base::MessagePumpLibevent::OnLibeventNotification(int, short, void*) /local/mnt/workspace/B2G_build/b2g_1.008/gecko/ipc/chromium/src/base/message_pump_libevent.cc:215 (0x40c61668 libxul.so+0xab2668)

This is straightforward.


Anyway, my guess is that IPC messages are piling up in some unexpected and undesirable way, due to race conditions or lack of responses or something.
> Allocations involving gfxImageSurface (the ones that I said should be
> getting reported) account for 4.19 MiB all up

Roughly 3 MiB of that 4.19 MiB is being reported, but in a way that DMD doesn't see (i.e. it's a computed size rather than a measurement with a malloc_size_of function);  that roughly matches the 3.28 MiB of images data for the main process in about:memory.  About 1.2 MiB isn't covered by any existing reporter, AFAICT.
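
To illustrate the distinction (names here are illustrative, not the real Gecko reporter API): a computed size is arithmetic DMD can't observe, whereas a MallocSizeOf-style measurement is applied to the actual heap pointer, which lets a DMD build intercept the call and mark the block as reported.

    #include <cstddef>

    typedef size_t (*MallocSizeOf)(const void* aPtr);

    // Computed size: about:memory gets a number, but DMD never sees the
    // block, so it still counts as "unreported".
    size_t ComputedImageSize(int aWidth, int aHeight, int aBytesPerPixel) {
      return size_t(aWidth) * aHeight * aBytesPerPixel;
    }

    // Measured size: the function is applied to the block itself, so a DMD
    // build can mark that block as covered by a reporter.
    size_t MeasuredImageSize(const unsigned char* aData,
                             MallocSizeOf aMallocSizeOf) {
      return aMallocSizeOf(aData);
    }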
Depends on: 806377
The IPC badness looks to me like a hung compositor thread.  There's not enough information in the package to determine that for sure though.

If we can reproduce this in our automation, we'll be able to tell for sure.
We have been having considerable stability problems in our automated Gaia tests since around Jan 30th (or nearly same time this bug was filed), with apps experiencing memory problems and freezing.  We have been trying to isolate some STR with little success.  I'll try running our tests with a DMD build and pulling an about:memory report when the problem occurs.
(In reply to Jonathan Griffin (:jgriffin) from comment #44)
> We have been having considerable stability problems in our automated Gaia
> tests since around Jan 30th (or nearly same time this bug was filed), with
> apps experiencing memory problems and freezing.  We have been trying to
> isolate some STR with little success.  I'll try running our tests with a DMD
> build and pulling an about:memory report when the problem occurs.

Good.  Please put the results in a new bug;  if you do that and your problems have the same cause it'll be easy to dup the new bug against this one, but if you don't do that and your problems have a different cause this bug will become a tangled mess.
That sounds great.  If you catch a freeze and can attach gdb to the b2g process, a |thread apply all bt| will speak volumes.
I've filed our problem as bug 841976, and attached gdb and about:memory outputs.
cjones, it's great that you diagnosed this in bug 841993.  I'd like to add a memory reporter for these Message objects, so that any future snafus don't require DMD runs to show up.

But I can't work out from the allocation stack (see comment 31 and comment 41) where they end up being stored, which is what I need to know to write a memory reporter.  Do you know?
That's going to be a bit nontrivial, because the code there is multithreaded and sticks Messages in different places at times.  I'd recommend we do that in a separate bug.
And since we create a ton of these messages (nearly 80k just for my one small IPC-IndexedDB test suite), and their lifetime is (normally) very short, the overhead of adding to/removing from a map might hurt our IPC performance, right?
(In reply to ben turner [:bent] from comment #50)
> And since we create a ton of these messages (nearly 80k just for my one
> small IPC-IndexedDB test suite), and their lifetime is (normally) very
> short, the overhead of adding to/removing from a map might hurt our IPC
> performance, right?

That sounds to me like it might be a premature optimization.  :)

I'd expect e.g. a call to malloc(), which is surely part of this operation, to be at least as expensive as taking a lock and adding an entry to a hashtable (which is all NS_RegisterMemoryReporter does on trunk these days, and we could backport this behavior to b2g18).  Not to mention the syscalls involved in IPC...
(In reply to Justin Lebar [:jlebar] from comment #51)
> to be at least as expensive as taking a lock and adding an entry to a
> hashtable (which is all NS_RegisterMemoryReporter does on trunk these days,

Sure, I doubt that would be expensive.  The part I'm talking about is tracking all the live messages.  You'd have to do that in another big table, right?  Each message is a different size, so you can't just keep a count and multiply, for instance.
I filed bug 842177 for a memory reporter for the Message objects.  I'll worry about the efficiency of this reporter once I know how to actually write it, i.e. where the Message objects are stored.
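
One low-overhead possibility, once that's known (purely a sketch, not the patch in bug 842177: Message here is a stand-in, and the real code would also have to adjust the counter on Pickle::Resize):

    #include <atomic>
    #include <cstddef>

    // Track total live message bytes with a process-wide counter, sidestepping
    // the concern about a big table of live messages: each Message adjusts the
    // counter in its ctor/dtor, so the reporter just reads one number.
    static std::atomic<size_t> gLiveMessageBytes{0};

    class Message {
    public:
      explicit Message(size_t aPayloadBytes) : mBytes(aPayloadBytes) {
        gLiveMessageBytes.fetch_add(mBytes, std::memory_order_relaxed);
      }
      Message(const Message& aOther) : mBytes(aOther.mBytes) {
        gLiveMessageBytes.fetch_add(mBytes, std::memory_order_relaxed);
      }
      Message& operator=(const Message&) = delete;  // keep the accounting simple
      ~Message() {
        gLiveMessageBytes.fetch_sub(mBytes, std::memory_order_relaxed);
      }
    private:
      size_t mBytes;
    };

    // A memory reporter callback would return this value.
    size_t IPCMessageBytes() {
      return gLiveMessageBytes.load(std::memory_order_relaxed);
    }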
Flags: needinfo?(vasanth)
Partners are internally testing bug 841993 right now to determine if it addresses this bug, which will take a couple more days.
Word on the street is that this isn't popping up after bug 841993, so we're going to tentatively declare victory.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → WORKSFORME
Attached file about:memory dump
I am still seeing this issue on our latest v1 build.  I am using a modified version of the http://hg.mozilla.org/users/tmielczarek_mozilla.com/b2gperf script to start the Email application, kill all apps, and repeat over and over.  After ~400 iterations, the b2g process is now ~110MB and Email boots to a black screen.
(bummer, reopening)
Status: RESOLVED → REOPENED
blocking-b2g: - → tef+
Resolution: WORKSFORME → ---
Attaching the smaps file for the b2g process in case this helps.  Retrieved using:

 adb shell cat /proc/124/smaps > smaps.txt

The top 10 mappings by Referenced size are:

Referenced:         3044 kB
Referenced:         3068 kB
Referenced:         3072 kB
Referenced:         3936 kB
Referenced:         4092 kB
Referenced:         4096 kB
Referenced:         4096 kB
Referenced:         8680 kB
Referenced:        21672 kB
Referenced:        24100 kB

The last two are anonymous mappings, and one of them might correspond to the large strings seen in the about:memory file attached above.
Let's not reuse this bug for a new problem.  I've spun off bug 850893 for the new problem.
Status: REOPENED → RESOLVED
Closed: 11 years ago11 years ago
Resolution: --- → FIXED
blocking-b2g: tef+ → ---
(In reply to Michael Schwartz [:m4] from comment #56)
> Created attachment 724553 [details]
> about:memory dump
> 
> I am still seeing this issue on our latest v1 build.  I am using a modified
> version of the http://hg.mozilla.org/users/tmielczarek_mozilla.com/b2gperf
> script to start the Email application, kill all apps, and repeat over and
> over.  After ~400 iterations, the b2g process is now ~110MB and Email boots
> to a black screen.

Just FYI, the latest version of that script is now hosted on github: https://github.com/mozilla/b2gperf
