Closed Bug 948327 Opened 11 years ago Closed 11 years ago

Profiler I/O interpose observer causes unbounded heap-unclassified growth

Categories

(Core :: Gecko Profiler, defect)

x86_64
Linux
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla29

People

(Reporter: jdm, Assigned: ehsan.akhgari)

References

(Blocks 1 open bug)

Details

(Whiteboard: [MemShrink:P1])

Attachments

(1 file, 1 obsolete file)

Running a Nightly for several days, I ended up with 9gb of memory usage, 8gb of which was unclassified. DMD showed me this after two hours of browsing with a recent mozilla-inbound build:

Unreported: ~41,834 blocks in stack trace record 1 of 6,297
 ~171,226,562 bytes (~171,226,562 requested / ~0 slop)
 19.77% of the heap (19.77% cumulative);  35.71% of unreported (35.71% cumulative)
 Allocated at
   replace_malloc (/run/media/jdm/ssd/inbound/memory/replace/dmd/DMD.cpp:1247) 0x8ab206dd
   moz_xmalloc (/run/media/jdm/ssd/inbound/memory/mozalloc/mozalloc.cpp:53) 0x8ab1a1e6
   mozilla::ProfilerIOInterposeObserver::Observe(mozilla::IOInterposeObserver::Observation&) (/run/media/jdm/ssd/inbound/tools/profiler/ProfilerIOInterposeObserver.cpp:42) 0x86eecc32
   mozilla::IOInterposer::Report(mozilla::IOInterposeObserver::Observation&) (/run/media/jdm/ssd/inbound/tools/profiler/IOInterposer.cpp:187 (discriminator 1)) 0x86ee9a32
   ~IOThreadAutoTimer (/run/media/jdm/ssd/inbound/storage/src/TelemetryVFS.cpp:123) 0x862b7f6a
   xRead (/run/media/jdm/ssd/inbound/storage/src/TelemetryVFS.cpp:179) 0x862b80a0
   readDbPage (/run/media/jdm/ssd/inbound/db/sqlite3/src/sqlite3.c:41637) 0x8a9708ba
   sqlite3PagerAcquire (/run/media/jdm/ssd/inbound/db/sqlite3/src/sqlite3.c:44110) 0x8a99eb50
   btreeGetPage (/run/media/jdm/ssd/inbound/db/sqlite3/src/sqlite3.c:51590) 0x8a99ebda
   getAndInitPage (/run/media/jdm/ssd/inbound/db/sqlite3/src/sqlite3.c:51645) 0x8a99ec31
   moveToChild (/run/media/jdm/ssd/inbound/db/sqlite3/src/sqlite3.c:54297) 0x8a99ec88
   moveToLeftmost (/run/media/jdm/ssd/inbound/db/sqlite3/src/sqlite3.c:54472) 0x8a99ed7c
   sqlite3VdbeExec (/run/media/jdm/ssd/inbound/db/sqlite3/src/sqlite3.c:70748) 0x8a9ac0f5
   sqlite3Step (/run/media/jdm/ssd/inbound/db/sqlite3/src/sqlite3.c:64419) 0x8a9afb7e
   mozilla::storage::Connection::stepStatement(sqlite3_stmt*) (/run/media/jdm/ssd/inbound/storage/src/mozStorageConnection.cpp:922 (discriminator 1)) 0x862b4ef8
   mozilla::storage::Statement::ExecuteStep(bool*) (/run/media/jdm/ssd/inbound/storage/src/mozStorageStatement.cpp:647) 0x862bdaaf
   mozilla::net::Seer::LookupTopLevel(mozilla::net::Seer::QueryType, nsACString_internal const&, mozilla::net::Seer::TopLevelInfo&) (/run/media/jdm/ssd/inbound/netwerk/base/src/Seer.cpp:1078) 0x860221fb
   mozilla::net::Seer::LearnForSubresource(mozilla::net::Seer::UriInfo const&, mozilla::net::Seer::UriInfo const&) (/run/media/jdm/ssd/inbound/netwerk/base/src/Seer.cpp:1782) 0x860222ae
   mozilla::net::SeerLearnEvent::Run() (/run/media/jdm/ssd/inbound/netwerk/base/src/Seer.cpp:1594) 0x8602719f
   nsThread::ProcessNextEvent(bool, bool*) (/run/media/jdm/ssd/inbound/xpcom/threads/nsThread.cpp:612) 0x85fe5fdc
   NS_ProcessNextEvent(nsIThread*, bool) (/run/media/jdm/ssd/inbound/xpcom/glue/nsThreadUtils.cpp:263) 0x85fab072
   nsThread::ThreadFunc(void*) (/run/media/jdm/ssd/inbound/xpcom/threads/nsThread.cpp:245 (discriminator 1)) 0x85fe9202
   _pt_root (/run/media/jdm/ssd/inbound/nsprpub/pr/src/pthreads/ptthread.c:208) 0x8a3eaca2
   start_thread (/usr/src/debug/glibc-2.16-75f0d304/nptl/pthread_create.c:308) 0x5e607d15

Unreported: ~7,713 blocks in stack trace record 2 of 6,297
 ~31,569,309 bytes (~31,569,309 requested / ~0 slop)
 3.64% of the heap (23.41% cumulative);  6.58% of unreported (42.29% cumulative)
 Allocated at
   replace_malloc (/run/media/jdm/ssd/inbound/memory/replace/dmd/DMD.cpp:1247) 0x8ab206dd
   moz_xmalloc (/run/media/jdm/ssd/inbound/memory/mozalloc/mozalloc.cpp:53) 0x8ab1a1e6
   mozilla::ProfilerIOInterposeObserver::Observe(mozilla::IOInterposeObserver::Observation&) (/run/media/jdm/ssd/inbound/tools/profiler/ProfilerIOInterposeObserver.cpp:42) 0x86eecc32
   mozilla::IOInterposer::Report(mozilla::IOInterposeObserver::Observation&) (/run/media/jdm/ssd/inbound/tools/profiler/IOInterposer.cpp:187 (discriminator 1)) 0x86ee9a32
   ~IOThreadAutoTimer (/run/media/jdm/ssd/inbound/storage/src/TelemetryVFS.cpp:123) 0x862b7f6a
   xRead (/run/media/jdm/ssd/inbound/storage/src/TelemetryVFS.cpp:179) 0x862b80a0
   readDbPage (/run/media/jdm/ssd/inbound/db/sqlite3/src/sqlite3.c:41637) 0x8a9708ba
   sqlite3PagerAcquire (/run/media/jdm/ssd/inbound/db/sqlite3/src/sqlite3.c:44110) 0x8a99eb50
   btreeGetPage (/run/media/jdm/ssd/inbound/db/sqlite3/src/sqlite3.c:51590) 0x8a99ebda
   getAndInitPage (/run/media/jdm/ssd/inbound/db/sqlite3/src/sqlite3.c:51645) 0x8a99ec31
   moveToChild (/run/media/jdm/ssd/inbound/db/sqlite3/src/sqlite3.c:54297) 0x8a99ec88
   moveToLeftmost (/run/media/jdm/ssd/inbound/db/sqlite3/src/sqlite3.c:54472) 0x8a99ed7c
   sqlite3VdbeExec (/run/media/jdm/ssd/inbound/db/sqlite3/src/sqlite3.c:70748) 0x8a9ac0f5
   sqlite3Step (/run/media/jdm/ssd/inbound/db/sqlite3/src/sqlite3.c:64419) 0x8a9afb7e
   mozilla::storage::Connection::stepStatement(sqlite3_stmt*) (/run/media/jdm/ssd/inbound/storage/src/mozStorageConnection.cpp:922 (discriminator 1)) 0x862b4ef8
   mozilla::storage::Statement::ExecuteStep(bool*) (/run/media/jdm/ssd/inbound/storage/src/mozStorageStatement.cpp:647) 0x862bdaaf
   mozilla::net::Seer::LookupTopLevel(mozilla::net::Seer::QueryType, nsACString_internal const&, mozilla::net::Seer::TopLevelInfo&) (/run/media/jdm/ssd/inbound/netwerk/base/src/Seer.cpp:1078) 0x860221fb
   mozilla::net::Seer::PredictForPageload(mozilla::net::Seer::UriInfo const&, nsMainThreadPtrHandle<nsINetworkSeerVerifier>&, int, mozilla::TimeStamp&) (/run/media/jdm/ssd/inbound/netwerk/base/src/Seer.cpp:1334) 0x86023d79
   mozilla::net::SeerPredictionEvent::Run() (/run/media/jdm/ssd/inbound/netwerk/base/src/Seer.cpp:812) 0x86027f6a
   nsThread::ProcessNextEvent(bool, bool*) (/run/media/jdm/ssd/inbound/xpcom/threads/nsThread.cpp:612) 0x85fe5fdc
   NS_ProcessNextEvent(nsIThread*, bool) (/run/media/jdm/ssd/inbound/xpcom/glue/nsThreadUtils.cpp:263) 0x85fab072
   nsThread::ThreadFunc(void*) (/run/media/jdm/ssd/inbound/xpcom/threads/nsThread.cpp:245 (discriminator 1)) 0x85fe9202
   _pt_root (/run/media/jdm/ssd/inbound/nsprpub/pr/src/pthreads/ptthread.c:208) 0x8a3eaca2
   start_thread (/usr/src/debug/glibc-2.16-75f0d304/nptl/pthread_create.c:308) 0x5e607d15

Unreported: 1,495 blocks in stack trace record 3 of 6,297
 24,494,080 bytes (23,920,000 requested / 574,080 slop)
 2.83% of the heap (26.24% cumulative);  5.11% of unreported (47.40% cumulative)
 Allocated at
   replace_malloc (/run/media/jdm/ssd/inbound/memory/replace/dmd/DMD.cpp:1247) 0x8ab206dd
   moz_xmalloc (/run/media/jdm/ssd/inbound/memory/mozalloc/mozalloc.cpp:53) 0x8ab1a1e6
   ThreadProfile::ThreadProfile(char const*, int, PseudoStack*, int, PlatformData*, bool, void*) (/run/media/jdm/ssd/inbound/tools/profiler/ProfileEntry.cpp:163) 0x86eea5bf
   SyncProfile::SyncProfile(char const*, int, PseudoStack*, int, bool) (/run/media/jdm/ssd/inbound/tools/profiler/SyncProfile.cpp:16) 0x86eeaad1
   NewSyncProfile (/run/media/jdm/ssd/inbound/tools/profiler/TableTicker.cpp:627) 0x86eeab61
   TickSample (/run/media/jdm/ssd/inbound/tools/profiler/platform.h:264) 0x86eeab7e
   operator new (/run/media/jdm/ssd/inbound/obj-x86_64-unknown-linux-gnu/tools/profiler/../../dist/include/mozilla/mozalloc.h:201) 0x86eec515
   profiler_get_backtrace (/run/media/jdm/ssd/inbound/tools/profiler/platform.cpp:873) 0x86eecc19
   mozilla::IOInterposer::Report(mozilla::IOInterposeObserver::Observation&) (/run/media/jdm/ssd/inbound/tools/profiler/IOInterposer.cpp:187 (discriminator 1)) 0x86ee9a32
   ~Observation (/run/media/jdm/ssd/inbound/tools/profiler/IOInterposer.h:101) 0x86ee9a78
   interposedRead (/run/media/jdm/ssd/inbound/tools/profiler/NSPRInterposer.cpp:71) 0x86ee9bd3
   nsFileStreamBase::Read(char*, unsigned int, unsigned int*) (/run/media/jdm/ssd/inbound/netwerk/base/src/nsFileStreams.cpp:218) 0x8602c3b1
   nsFileInputStream::Read(char*, unsigned int, unsigned int*) (/run/media/jdm/ssd/inbound/netwerk/base/src/nsFileStreams.cpp:482) 0x8602c3db
   tag_nsresult NS_ReadLine<char, nsFileInputStream, nsACString_internal>(nsFileInputStream*, nsLineBuffer<char>*, nsACString_internal&, bool*) (/run/media/jdm/ssd/inbound/obj-x86_64-unknown-linux-gnu/netwerk/base/src/../../../dist/include/nsReadLine.h:85) 0x86035c4e
   nsOSHelperAppService::GetExtensionsAndDescriptionFromMimetypesFile(nsAString_internal const&, nsAString_internal const&, nsAString_internal const&, nsAString_internal&, nsAString_internal&) (/run/media/jdm/ssd/inbound/uriloader/exthandler/unix/nsOSHelperAppService.cpp:514) 0x862fda37
   nsOSHelperAppService::LookUpExtensionsAndDescription(nsAString_internal const&, nsAString_internal const&, nsAString_internal&, nsAString_internal&) (/run/media/jdm/ssd/inbound/uriloader/exthandler/unix/nsOSHelperAppService.cpp:469) 0x862fdc00
   nsOSHelperAppService::GetFromType(nsCString const&) (/run/media/jdm/ssd/inbound/uriloader/exthandler/unix/nsOSHelperAppService.cpp:1410) 0x862ffa0c
   ~nsPromiseFlatCString (/run/media/jdm/ssd/inbound/obj-x86_64-unknown-linux-gnu/uriloader/exthandler/../../dist/include/nsTPromiseFlatString.h:61) 0x862fff49
   nsExternalHelperAppService::GetFromTypeAndExtension(nsACString_internal const&, nsACString_internal const&, nsIMIMEInfo**) (/run/media/jdm/ssd/inbound/uriloader/exthandler/nsExternalHelperAppService.cpp:2465) 0x862fa0b9
   nsMimeTypeArray::NamedGetter(nsAString_internal const&, bool&) (/run/media/jdm/ssd/inbound/dom/base/nsMimeTypeArray.cpp:132) 0x8680028f
   nsMimeTypeArray::NamedItem(nsAString_internal const&) (/run/media/jdm/ssd/inbound/dom/base/nsMimeTypeArray.cpp:75) 0x86800402
   mozilla::dom::Navigator::JavaEnabled(mozilla::ErrorResult&) (/run/media/jdm/ssd/inbound/dom/base/Navigator.cpp:589) 0x867e63ac
   javaEnabled (/run/media/jdm/ssd/inbound/obj-x86_64-unknown-linux-gnu/dom/bindings/NavigatorBinding.cpp:553) 0x86548bfd
   genericMethod (/run/media/jdm/ssd/inbound/obj-x86_64-unknown-linux-gnu/dom/bindings/NavigatorBinding.cpp:1769) 0x86552284

Unreported: 1 block in stack trace record 4 of 6,297
 16,003,072 bytes (16,000,000 requested / 3,072 slop)
 1.85% of the heap (28.09% cumulative);  3.34% of unreported (50.74% cumulative)
 Allocated at
   replace_malloc (/run/media/jdm/ssd/inbound/memory/replace/dmd/DMD.cpp:1247) 0x8ab206dd
   moz_xmalloc (/run/media/jdm/ssd/inbound/memory/mozalloc/mozalloc.cpp:53) 0x8ab1a1e6
   ThreadProfile::ThreadProfile(char const*, int, PseudoStack*, int, PlatformData*, bool, void*) (/run/media/jdm/ssd/inbound/tools/profiler/ProfileEntry.cpp:163) 0x86eea5bf
   ThreadInfo::SetProfile(ThreadProfile*) (/run/media/jdm/ssd/inbound/tools/profiler/platform.h:397) 0x86eefa7d
   mozilla_sampler_start(int, double, char const**, unsigned int, char const**, unsigned int) (/run/media/jdm/ssd/inbound/tools/profiler/platform.cpp:646 (discriminator 6)) 0x86eed8ad
   nsProfiler::StartProfiler(unsigned int, double, char const**, unsigned int, char const**, unsigned int) (/run/media/jdm/ssd/inbound/tools/profiler/nsProfiler.cpp:86) 0x86eedad2
   NS_InvokeByIndex (/run/media/jdm/ssd/inbound/xpcom/reflect/xptcall/src/md/unix/xptcinvoke_x86_64_unix.cpp:164) 0x85febc87
   CallMethodHelper::Invoke() (/run/media/jdm/ssd/inbound/js/xpconnect/src/XPCWrappedNative.cpp:2567) 0x867608db
   XPC_WN_CallMethod(JSContext*, unsigned int, JS::Value*) (/run/media/jdm/ssd/inbound/js/xpconnect/src/XPCWrappedNativeJSOps.cpp:1301) 0x86760e91
   CallJSNative (/run/media/jdm/ssd/inbound/js/src/jscntxtinlines.h:220) 0x87624fbb
   Interpret (/run/media/jdm/ssd/inbound/js/src/vm/Interpreter.cpp:2505) 0x8761bcc5
   RunScript (/run/media/jdm/ssd/inbound/js/src/vm/Interpreter.cpp:420) 0x876249dc
   js::Invoke(JSContext*, JS::CallArgs, js::MaybeConstruct) (/run/media/jdm/ssd/inbound/js/src/vm/Interpreter.cpp:486) 0x87624f24
   js_fun_call(JSContext*, unsigned int, JS::Value*) (/run/media/jdm/ssd/inbound/js/src/jsfun.cpp:893) 0x8755778f
   CallJSNative (/run/media/jdm/ssd/inbound/js/src/jscntxtinlines.h:220) 0x87624fbb
   Interpret (/run/media/jdm/ssd/inbound/js/src/vm/Interpreter.cpp:2505) 0x8761bcc5
   RunScript (/run/media/jdm/ssd/inbound/js/src/vm/Interpreter.cpp:420) 0x876249dc
   js::Invoke(JSContext*, JS::CallArgs, js::MaybeConstruct) (/run/media/jdm/ssd/inbound/js/src/vm/Interpreter.cpp:486) 0x87624f24
   js_fun_call(JSContext*, unsigned int, JS::Value*) (/run/media/jdm/ssd/inbound/js/src/jsfun.cpp:893) 0x8755778f
   CallJSNative (/run/media/jdm/ssd/inbound/js/src/jscntxtinlines.h:220) 0x87624fbb
   Interpret (/run/media/jdm/ssd/inbound/js/src/vm/Interpreter.cpp:2505) 0x8761bcc5
   RunScript (/run/media/jdm/ssd/inbound/js/src/vm/Interpreter.cpp:420) 0x876249dc
   js::Invoke(JSContext*, JS::CallArgs, js::MaybeConstruct) (/run/media/jdm/ssd/inbound/js/src/vm/Interpreter.cpp:486) 0x87624f24
   js_fun_call(JSContext*, unsigned int, JS::Value*) (/run/media/jdm/ssd/inbound/js/src/jsfun.cpp:893) 0x8755778f
Having spent another hour browsing in this session today, the vast majority of DMD entries are now profiler-related, with heap-unclassified sitting at 55% of 1.4gb total.
Version: 24 Branch → Trunk
Aaron, do you have any idea what's going on here?
Flags: needinfo?(aklotz)
Actually, Benoit might have a better idea about lifetime issues here.

AFAICS, the proper destructors ought to be getting called, but there's clearly some sort of leakage here.
Flags: needinfo?(bgirard)
Looks like the markers are leaking. There's code that should delete the markers when the generation expires. That's the likely culprit.
Flags: needinfo?(bgirard)
This code leaks aPayload _at least_ in all of the early returns there: <http://mxr.mozilla.org/mozilla-central/source/tools/profiler/GeckoProfilerImpl.h#369>
Assignee: nobody → ehsan
Attachment #8345484 - Attachment is obsolete: true
Attachment #8345490 - Flags: review?(bgirard)
Attachment #8345490 - Flags: review?(bgirard) → review+
Whiteboard: [MemShrink] → [MemShrink:P1]
https://hg.mozilla.org/mozilla-central/rev/a173b518a028
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla29
While Ehsan's fix is legit, I'm not convinced that it is what we are seeing here. Vladan's repro in bug 948285 is still valid even with this fix.

Vlad's repro is easy to replicate, so I'm commenting based on what I am seeing there. The "Forget about this site" command in history is very I/O intensive. Since this operation results in the I/O interposer inserting a large number of stacks into the profile, the circular buffer must wrap frequently for the generation count to be incremented and for those stacks to eventually be freed. If the buffer is set large enough, or the sample interval is low enough (or both), it could take a relatively long time for the circular buffer in the ThreadProfile to wrap -- several minutes by a back-of-the-envelope estimate. With the heavy I/O going on, it is not too farfetched that we could hit OOM simply by stuffing too much data into the thread's profile without its circular buffer wrapping soon enough.

Using Vlad's repro, if I used the default profiler settings I couldn't OOM. In fact, I could observe the Firefox process memory usage oscillating in a sawtooth pattern. OTOH, if I increased the buffer size to ten million but left the sample period at 10ms, Firefox would hit OOM.

Josh, what are your profiler sample interval and buffer size settings?
Flags: needinfo?(aklotz) → needinfo?(josh)
profiler.interval = 10, profiler.entries = 1000000.
Flags: needinfo?(josh)
FWIW I just read the code and found the bug that I fixed.  So it can totally be insufficient here.
Does this bug reproduce with profiling disabled? Does it reproduce with 'Main thread IO' disabled?
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: