Closed Bug 932215 Opened 11 years ago Closed 11 years ago

Memory usage of gUM

Categories

(Core :: WebRTC: Audio/Video, defect)

ARM
Gonk (Firefox OS)
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla28

People

(Reporter: slee, Assigned: jesup)

References

Details

(Keywords: perf, Whiteboard: [WebRTC][MemShrink:P2])

Attachments

(3 files)

This bug is to analyze the memory usage of gUM. I am using DMD to estimate the memory usage of gUM. By using DMD on B2G, you should 1. export MOZ_DMD=1 and export B2G_NOOPT=1 2. go to B2G/tools and run the command "./get_about_memory.py --product unagi --gecko-objdir ../../mozilla-central-2/obj-unagi-memory --no-auto-open --no-gc-cc-log --remove-cache --toolchain-prefix arm-linux-androideabi- --toolchain-dir ./prebuilt/linux-x86/toolchain/arm-linux-androideabi-4.4.x/bin" 3. The result will be store in the same folder with name "about-memory-#" Here is how I verify the memory usage. 1. run browser app, connect to [1] and estimate the memory by the above command 2. gUM for audio or video then profile again. 3. compare the memory usage of Browser app by using firefox desktop with url "about:memory" Here is the result * video ** the memory diff is not much, about 0.1MB * audio ** the memory diff is about 5.97MB, much higher than video I unzipped the corresponding file, ex, dmd-warning-7192.txt.gz, and found that [2] allocates much memory, 2 * 8000 * 256, on B2G. [1] http://mozilla.github.io/webrtc-landing/gum_test.html [2] http://dxr.mozilla.org/mozilla-central/source/media/webrtc/trunk/webrtc/system_wrappers/source/trace_impl.cc#l92
Attached file audio.tar
memory usage of gUM with audio only
Attached file video.tar
memory usage of gUM with video only
MemShrink for analyzing the high memory usage of gUM audio per comment 0.
Keywords: perf
Whiteboard: webrtc → [WebRTC][MemShrink]
The relevant section is dmd-warning-7192.txt Unreported: ~1,001 blocks in stack trace record 1 of 379 ~4,097,093 bytes (~4,097,093 requested / ~0 slop) 27.06% of the heap (27.06% cumulative); 51.01% of unreported (51.01% cumulative) Allocated at 51% (4MB) of the unreported is webrtc_trace bufferspace (they buffer up log messages in a circular buffer, more or less). Was webrtc_trace on in NSPR_LOG_MODULES? They may well allocate the trace buffers always, however (in fact it looks that way). However... The size of this should be <224KB (it's 2240xx on a 64-bit-pointer machine). 4MB appears to be in use. I tried on desktop, and only a single TracePosix is ever created, even going audio gum tests multiple times. Perhaps there's something broken in B2G's compiling of the "single static instance" stuff? slee, can you verify how many times TracePosix::TracePosix() is called? (trace_posix.cc) Also, can you verify the size of the object created? It should be <200KB - break in TraceImpl::CreateInstance() and then step down into new and look at the sz param. or "p sizeof(*this) in the TracePosix() func
Flags: needinfo?(slee)
400K is 50 AudioFrames in a MemoryPool, which have a kMaxDataSizeSamples == 32KHz * 60ms * 2 channels = 3840 samples (this will likely grow to 48KHz). 60ms may be overkill.... we pretty much do 20ms all the time, but we may need to allow receiving it. 50 frames is quite a bit too.
NetEq is another 200K alloc for a packetbuffer block And I see a CreateInstance() line 64 ("new TracePosix();") for 164K, which makes more sense, but with a different stack. The 4MB is 1001 objects (of 4093 bytes!) from TracePosix::TracePosix() (line 40), called from that line 64(!). This makes absolutely no sense (should be 1 obj), since we should call CreateInstance()/TracePosix() only once, and on desktop I have proof of that. slee, can you check that (both line 64, and TracePosix():40)
Bunch of 64K objects which are artifacts of DMD (gzwrite buffers). 64K object for BufferedAudioStream (mBuffer.SetCapacity(1 second)). Bunch of gfx/font allocations, 128K hash table, etc 24K for OpenSLES Down in the noise from here. Looks like the only unusual one is the 4MB instead of 164K for TracePosix()/CreateCriticalSec(?)
Whiteboard: [WebRTC][MemShrink] → [WebRTC][MemShrink:P2]
(In reply to Randell Jesup [:jesup] from comment #4) > The relevant section is dmd-warning-7192.txt > 51% (4MB) of the unreported is webrtc_trace bufferspace (they buffer up log > messages in a circular buffer, more or less). Was webrtc_trace on in > NSPR_LOG_MODULES? They may well allocate the trace buffers always, however > (in fact it looks that way). However... The 4BM is mainly from the buffer in TraceImpl which allocates 8000*2 char array and each size is 256, [1]. From [2], the number of array is 2000*2 on iOS. If we apply the same settings as iOS, we can reduce the memory to about 1MB. > > The size of this should be <224KB (it's 2240xx on a 64-bit-pointer machine). > 4MB appears to be in use. I tried on desktop, and only a single TracePosix > is ever created, even going audio gum tests multiple times. Perhaps there's > something broken in B2G's compiling of the "single static instance" stuff? > > slee, can you verify how many times TracePosix::TracePosix() is called? It calls only once. I have no ideas about 1001 blocks in "~1,001 blocks in stack trace record 1 of 379" means. :( > (trace_posix.cc) Also, can you verify the size of the object created? It > should be <200KB - break in TraceImpl::CreateInstance() and then step down > into new and look at the sz param. or "p sizeof(*this) in the TracePosix() > func The size is 160056. [1] http://dxr.mozilla.org/mozilla-central/source/media/webrtc/trunk/webrtc/system_wrappers/source/trace_impl.cc#l92 [2] http://dxr.mozilla.org/mozilla-central/source/media/webrtc/trunk/webrtc/system_wrappers/source/trace_impl.h#l27
Flags: needinfo?(slee)
> > slee, can you verify how many times TracePosix::TracePosix() is called? > It calls only once. I have no ideas about 1001 blocks in "~1,001 blocks in > stack trace record 1 of 379" means. :( For the allocation point described by the given stack trace (which comment 4 omits), approximately 1,001 blocks were allocated with malloc/new. Their cumulative size was approximatly 4MB.
(In reply to Nicholas Nethercote [:njn] from comment #9) > For the allocation point described by the given stack trace (which comment 4 > omits), approximately 1,001 blocks were allocated with malloc/new. Their > cumulative size was approximatly 4MB. Hi njn, Thanks for your reply. I have some other questions. For general cases, should 1 block mean 4KB? jesup mentioned in comment 6 that the 4MB should be 1 block. Here is the corresponding code of allocating memory. I replace the macros with real values. for (int m = 0; m < 2; ++m) { for (int n = 0; n < 8000; ++n) { message_queue_[m][n] = new char[256]; } } Could it explain that the 4MB memory are in 1001 blocks since the above buffers use 1000 blocks and for the remaining memory occupy 1 block?
Ok, thanks (missed one line in the traceback, which explains the allocations). not sure why it's reported as 1001 4093ish byte allocations instead of 16000 256-byte allocs, but that's not important. So, two issues here: the webrtc.org code always starts the TraceImpl. We could make it late-allocate the bufferspace when the trace file is set; right now it accumulates messages even when it's not logging them anywhere. That would kill the 4MB allocation unless you turn on logging. You still have the thread and the 160KB allocation, and some overhead formatting the log messages distributed around. (We can profile the impact of that.) We *do* grab a log when adding a log message to the list; if we could rework it as a lockless circular buffer at some point it would be better. Turning it fully off is something we can consider for release builds, though if the overhead is low it's not worth the risk (and makes it hard to debug problems later or in the field).
Attachment #824690 - Flags: review?(jib)
> jesup mentioned in comment 6 that the 4MB should be 1 block. Here is the > corresponding code of allocating memory. I replace the macros with real > values. > for (int m = 0; m < 2; ++m) { > for (int n = 0; n < 8000; ++n) { > message_queue_[m][n] = new char[256]; > } > } > > Could it explain that the 4MB memory are in 1001 blocks since the above > buffers use 1000 blocks and for the remaining memory occupy 1 block? By default, DMD approximates things for speed. Normally it doesn't matter much, but lots of small allocations in a row like this is a pathological case for the approximation technique. If you want a run without any approximations, add "--sample-below=1" to the DMD environment variable. https://wiki.mozilla.org/Performance/MemShrink/DMD#--sample-below.3D.3C1..n.3E has the full instructions. The output of such a run won't have any '~' chars (which indicate approximations). BTW, why does the code do 16,000 x 256 byte allocations in a row? That is... surprising. Allocations are relatively slow, so a single 4MB allocation should be *much* faster. It'd also be great if this memory had a memory reporter so it showed up in about:memory properly, not under heap-unclassified.
Thanks, that explains the DMD issue. I think it was something that happens once and wasn't on anyone's radar. I agree, given there's no further allocation/reallocation, one big alloc would be better. I may add that to the patch I propose to upstream. As for memory reporters: this is imported code, so adding reporters will be a pain. Not impossible, but we'd want to do so in a generic, templated/macroed way that would be upstreamable - I don't want to carry tons of mods all over the code causing merge failures.
Comment on attachment 824690 [details] [diff] [review] Lazily allocate log buffers for webrtc (4MB saving) Review of attachment 824690 [details] [diff] [review]: ----------------------------------------------------------------- Seems logically sound, if a little confusing. See comments. r+ but recommend reverting logic or clarifying. ::: media/webrtc/trunk/webrtc/system_wrappers/source/system_wrappers.gyp @@ +131,5 @@ > 'trace_win.cc', > 'trace_win.h', > ], > }], > + ['enable_lazy_trace_alloc==0', { Don't you mean 'enable_lazy_trace_alloc==1' ? ::: media/webrtc/trunk/webrtc/system_wrappers/source/trace_impl.cc @@ +92,2 @@ > message_queue_[m][n] = new > char[WEBRTC_TRACE_MAX_MESSAGE_SIZE]; See below. @@ +345,5 @@ > int32_t TraceImpl::SetTraceFileImpl(const char* file_name_utf8, > const bool add_file_counter) { > +#if !defined(WEBRTC_LAZY_TRACE_ALLOC) > + if (file_name_utf8) { > + // Lazy-allocate trace buffers to save memory. Logically this seems to work, except WEBRTC_LAZY_TRACE_ALLOC seems to do the opposite of what it's name implies? - I would rename it or invert the boolean logic.
Attachment #824690 - Flags: review?(jib) → review+
s/reverting/inverting/
> This is imported code, so adding reporters will be > a pain. Not impossible, but we'd want to do so in a generic, > templated/macroed way that would be upstreamable - I don't want to carry > tons of mods all over the code causing merge failures. Ugh. A lot of libraries let you provide custom definitions of malloc and free, which is enough to get DMD to work. (We do that with ICU and freetype.) Would that be possible with this code?
(In reply to Jan-Ivar Bruaroey [:jib] from comment #15) > Logically this seems to work, except WEBRTC_LAZY_TRACE_ALLOC seems to do the > opposite of what it's name implies? - I would rename it or invert the > boolean logic. Yeah - I changed the symbol from WEBRTC_TRACE_KEEP_WHEN_OFF to WEBRTC_LAZY_TRACE_ALLOC because the old symbol was a confusing name - but didn't invert the logic to match. Thanks
(In reply to Nicholas Nethercote [:njn] from comment #17) > > This is imported code, so adding reporters will be > > a pain. Not impossible, but we'd want to do so in a generic, > > templated/macroed way that would be upstreamable - I don't want to carry > > tons of mods all over the code causing merge failures. > > Ugh. A lot of libraries let you provide custom definitions of malloc and > free, which is enough to get DMD to work. (We do that with ICU and > freetype.) Would that be possible with this code? Perhaps - but many of these are new/delete (like this one), not malloc/free, and webrtc wasn't designed for allocator replacement. This makes replacing a little more tricky. Pointers to how it was done for those (ICU/freetype/others) would help; build/freetype.mk didn't give any clues. Perhaps file a bug on adding reporters with some pointers/examples.
> Pointers to how it was done for those (ICU/freetype/others) would help The library has to provide an API that lets you register malloc/free (and possibly realloc) functions. The library then has to ensure it only calls those provided ones, not the system ones. The library can define new/delete on top of the provided ones. The Freetype example is in gfx/thebes/gfxAndroidPlatform.cpp. The |FreetypeReporter| class implements our custom malloc/realloc/free functions (which hook into DMD), and they are passed to Freetype via the |FT_MemoryRec_| argument of the FT_New_Library() call. The ICU example is in js/src/jsapi.cpp. We pass custom malloc/realloc/free functions to u_setMemoryFunctions(). SQLite also allows a custom allocator, though it's a bit more complicated, requiring additional methods for other malloc-related operations. See the ::sqlite3_config() call in storage/src/mozStorageService.cpp.
https://hg.mozilla.org/integration/mozilla-inbound/rev/f2757e7d1a08 Uplift to 1.2 should be considered, though note it's only a memory saving
Target Milestone: --- → mozilla28
Backed out in https://hg.mozilla.org/integration/mozilla-inbound/rev/f477a4fe4c32 - one or the other, this or bug 933384, leaked in mochitest-3 and crashtest.
(In reply to Randell Jesup [:jesup] from comment #21) > https://hg.mozilla.org/integration/mozilla-inbound/rev/f2757e7d1a08 > > Uplift to 1.2 should be considered, though note it's only a memory saving Probably too late for 1.2 uplift at this point - we're only taking blockers right now. While I think this is important to fix, I don't think this bug would block the release.
Checkin-needed because I have to get on a plane. The leak was bug 933384
Keywords: checkin-needed
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: