Closed
Bug 932215
Opened 11 years ago
Closed 11 years ago
Memory usage of gUM
Categories
(Core :: WebRTC: Audio/Video, defect)
Tracking
()
RESOLVED
FIXED
mozilla28
People
(Reporter: slee, Assigned: jesup)
References
Details
(Keywords: perf, Whiteboard: [WebRTC][MemShrink:P2])
Attachments
(3 files)
This bug is to analyze the memory usage of gUM. I am using DMD to estimate the memory usage of gUM. By using DMD on B2G, you should
1. export MOZ_DMD=1 and export B2G_NOOPT=1
2. go to B2G/tools and run the command "./get_about_memory.py --product unagi --gecko-objdir ../../mozilla-central-2/obj-unagi-memory --no-auto-open --no-gc-cc-log --remove-cache --toolchain-prefix arm-linux-androideabi- --toolchain-dir ./prebuilt/linux-x86/toolchain/arm-linux-androideabi-4.4.x/bin"
3. The result will be store in the same folder with name "about-memory-#"
Here is how I verify the memory usage.
1. run browser app, connect to [1] and estimate the memory by the above command
2. gUM for audio or video then profile again.
3. compare the memory usage of Browser app by using firefox desktop with url "about:memory"
Here is the result
* video
** the memory diff is not much, about 0.1MB
* audio
** the memory diff is about 5.97MB, much higher than video
I unzipped the corresponding file, ex, dmd-warning-7192.txt.gz, and found that [2] allocates much memory, 2 * 8000 * 256, on B2G.
[1] http://mozilla.github.io/webrtc-landing/gum_test.html
[2] http://dxr.mozilla.org/mozilla-central/source/media/webrtc/trunk/webrtc/system_wrappers/source/trace_impl.cc#l92
Reporter | ||
Comment 1•11 years ago
|
||
memory usage of gUM with audio only
Reporter | ||
Comment 2•11 years ago
|
||
memory usage of gUM with video only
Assignee | ||
Comment 4•11 years ago
|
||
The relevant section is dmd-warning-7192.txt
Unreported: ~1,001 blocks in stack trace record 1 of 379
~4,097,093 bytes (~4,097,093 requested / ~0 slop)
27.06% of the heap (27.06% cumulative); 51.01% of unreported (51.01% cumulative)
Allocated at
51% (4MB) of the unreported is webrtc_trace bufferspace (they buffer up log messages in a circular buffer, more or less). Was webrtc_trace on in NSPR_LOG_MODULES? They may well allocate the trace buffers always, however (in fact it looks that way). However...
The size of this should be <224KB (it's 2240xx on a 64-bit-pointer machine). 4MB appears to be in use. I tried on desktop, and only a single TracePosix is ever created, even going audio gum tests multiple times. Perhaps there's something broken in B2G's compiling of the "single static instance" stuff?
slee, can you verify how many times TracePosix::TracePosix() is called? (trace_posix.cc) Also, can you verify the size of the object created? It should be <200KB - break in TraceImpl::CreateInstance() and then step down into new and look at the sz param. or "p sizeof(*this) in the TracePosix() func
Flags: needinfo?(slee)
Assignee | ||
Comment 5•11 years ago
|
||
400K is 50 AudioFrames in a MemoryPool, which have a kMaxDataSizeSamples == 32KHz * 60ms * 2 channels = 3840 samples (this will likely grow to 48KHz). 60ms may be overkill.... we pretty much do 20ms all the time, but we may need to allow receiving it. 50 frames is quite a bit too.
Assignee | ||
Comment 6•11 years ago
|
||
NetEq is another 200K alloc for a packetbuffer block
And I see a CreateInstance() line 64 ("new TracePosix();") for 164K, which makes more sense, but with a different stack. The 4MB is 1001 objects (of 4093 bytes!) from TracePosix::TracePosix() (line 40), called from that line 64(!). This makes absolutely no sense (should be 1 obj), since we should call CreateInstance()/TracePosix() only once, and on desktop I have proof of that. slee, can you check that (both line 64, and TracePosix():40)
Assignee | ||
Comment 7•11 years ago
|
||
Bunch of 64K objects which are artifacts of DMD (gzwrite buffers).
64K object for BufferedAudioStream (mBuffer.SetCapacity(1 second)).
Bunch of gfx/font allocations, 128K hash table, etc
24K for OpenSLES
Down in the noise from here. Looks like the only unusual one is the 4MB instead of 164K for TracePosix()/CreateCriticalSec(?)
Updated•11 years ago
|
Whiteboard: [WebRTC][MemShrink] → [WebRTC][MemShrink:P2]
Reporter | ||
Comment 8•11 years ago
|
||
(In reply to Randell Jesup [:jesup] from comment #4)
> The relevant section is dmd-warning-7192.txt
> 51% (4MB) of the unreported is webrtc_trace bufferspace (they buffer up log
> messages in a circular buffer, more or less). Was webrtc_trace on in
> NSPR_LOG_MODULES? They may well allocate the trace buffers always, however
> (in fact it looks that way). However...
The 4BM is mainly from the buffer in TraceImpl which allocates 8000*2 char array and each size is 256, [1]. From [2], the number of array is 2000*2 on iOS. If we apply the same settings as iOS, we can reduce the memory to about 1MB.
>
> The size of this should be <224KB (it's 2240xx on a 64-bit-pointer machine).
> 4MB appears to be in use. I tried on desktop, and only a single TracePosix
> is ever created, even going audio gum tests multiple times. Perhaps there's
> something broken in B2G's compiling of the "single static instance" stuff?
>
> slee, can you verify how many times TracePosix::TracePosix() is called?
It calls only once. I have no ideas about 1001 blocks in "~1,001 blocks in stack trace record 1 of 379" means. :(
> (trace_posix.cc) Also, can you verify the size of the object created? It
> should be <200KB - break in TraceImpl::CreateInstance() and then step down
> into new and look at the sz param. or "p sizeof(*this) in the TracePosix()
> func
The size is 160056.
[1] http://dxr.mozilla.org/mozilla-central/source/media/webrtc/trunk/webrtc/system_wrappers/source/trace_impl.cc#l92
[2] http://dxr.mozilla.org/mozilla-central/source/media/webrtc/trunk/webrtc/system_wrappers/source/trace_impl.h#l27
Flags: needinfo?(slee)
Comment 9•11 years ago
|
||
> > slee, can you verify how many times TracePosix::TracePosix() is called?
> It calls only once. I have no ideas about 1001 blocks in "~1,001 blocks in
> stack trace record 1 of 379" means. :(
For the allocation point described by the given stack trace (which comment 4 omits), approximately 1,001 blocks were allocated with malloc/new. Their cumulative size was approximatly 4MB.
Reporter | ||
Comment 10•11 years ago
|
||
(In reply to Nicholas Nethercote [:njn] from comment #9)
> For the allocation point described by the given stack trace (which comment 4
> omits), approximately 1,001 blocks were allocated with malloc/new. Their
> cumulative size was approximatly 4MB.
Hi njn,
Thanks for your reply. I have some other questions.
For general cases, should 1 block mean 4KB?
jesup mentioned in comment 6 that the 4MB should be 1 block. Here is the corresponding code of allocating memory. I replace the macros with real values.
for (int m = 0; m < 2; ++m) {
for (int n = 0; n < 8000; ++n) {
message_queue_[m][n] = new char[256];
}
}
Could it explain that the 4MB memory are in 1001 blocks since the above buffers use 1000 blocks and for the remaining memory occupy 1 block?
Assignee | ||
Comment 11•11 years ago
|
||
Ok, thanks (missed one line in the traceback, which explains the allocations). not sure why it's reported as 1001 4093ish byte allocations instead of 16000 256-byte allocs, but that's not important.
So, two issues here: the webrtc.org code always starts the TraceImpl. We could make it late-allocate the bufferspace when the trace file is set; right now it accumulates messages even when it's not logging them anywhere. That would kill the 4MB allocation unless you turn on logging. You still have the thread and the 160KB allocation, and some overhead formatting the log messages distributed around. (We can profile the impact of that.) We *do* grab a log when adding a log message to the list; if we could rework it as a lockless circular buffer at some point it would be better.
Turning it fully off is something we can consider for release builds, though if the overhead is low it's not worth the risk (and makes it hard to debug problems later or in the field).
Assignee | ||
Comment 12•11 years ago
|
||
Assignee | ||
Updated•11 years ago
|
Attachment #824690 -
Flags: review?(jib)
Comment 13•11 years ago
|
||
> jesup mentioned in comment 6 that the 4MB should be 1 block. Here is the
> corresponding code of allocating memory. I replace the macros with real
> values.
> for (int m = 0; m < 2; ++m) {
> for (int n = 0; n < 8000; ++n) {
> message_queue_[m][n] = new char[256];
> }
> }
>
> Could it explain that the 4MB memory are in 1001 blocks since the above
> buffers use 1000 blocks and for the remaining memory occupy 1 block?
By default, DMD approximates things for speed. Normally it doesn't matter much, but lots of small allocations in a row like this is a pathological case for the approximation technique.
If you want a run without any approximations, add "--sample-below=1" to the DMD environment variable. https://wiki.mozilla.org/Performance/MemShrink/DMD#--sample-below.3D.3C1..n.3E has the full instructions.
The output of such a run won't have any '~' chars (which indicate approximations).
BTW, why does the code do 16,000 x 256 byte allocations in a row? That is... surprising. Allocations are relatively slow, so a single 4MB allocation should be *much* faster.
It'd also be great if this memory had a memory reporter so it showed up in about:memory properly, not under heap-unclassified.
Assignee | ||
Comment 14•11 years ago
|
||
Thanks, that explains the DMD issue.
I think it was something that happens once and wasn't on anyone's radar. I agree, given there's no further allocation/reallocation, one big alloc would be better. I may add that to the patch I propose to upstream.
As for memory reporters: this is imported code, so adding reporters will be a pain. Not impossible, but we'd want to do so in a generic, templated/macroed way that would be upstreamable - I don't want to carry tons of mods all over the code causing merge failures.
Comment 15•11 years ago
|
||
Comment on attachment 824690 [details] [diff] [review]
Lazily allocate log buffers for webrtc (4MB saving)
Review of attachment 824690 [details] [diff] [review]:
-----------------------------------------------------------------
Seems logically sound, if a little confusing. See comments. r+ but recommend reverting logic or clarifying.
::: media/webrtc/trunk/webrtc/system_wrappers/source/system_wrappers.gyp
@@ +131,5 @@
> 'trace_win.cc',
> 'trace_win.h',
> ],
> }],
> + ['enable_lazy_trace_alloc==0', {
Don't you mean 'enable_lazy_trace_alloc==1' ?
::: media/webrtc/trunk/webrtc/system_wrappers/source/trace_impl.cc
@@ +92,2 @@
> message_queue_[m][n] = new
> char[WEBRTC_TRACE_MAX_MESSAGE_SIZE];
See below.
@@ +345,5 @@
> int32_t TraceImpl::SetTraceFileImpl(const char* file_name_utf8,
> const bool add_file_counter) {
> +#if !defined(WEBRTC_LAZY_TRACE_ALLOC)
> + if (file_name_utf8) {
> + // Lazy-allocate trace buffers to save memory.
Logically this seems to work, except WEBRTC_LAZY_TRACE_ALLOC seems to do the opposite of what it's name implies? - I would rename it or invert the boolean logic.
Attachment #824690 -
Flags: review?(jib) → review+
Comment 16•11 years ago
|
||
s/reverting/inverting/
Comment 17•11 years ago
|
||
> This is imported code, so adding reporters will be
> a pain. Not impossible, but we'd want to do so in a generic,
> templated/macroed way that would be upstreamable - I don't want to carry
> tons of mods all over the code causing merge failures.
Ugh. A lot of libraries let you provide custom definitions of malloc and free, which is enough to get DMD to work. (We do that with ICU and freetype.) Would that be possible with this code?
Assignee | ||
Comment 18•11 years ago
|
||
(In reply to Jan-Ivar Bruaroey [:jib] from comment #15)
> Logically this seems to work, except WEBRTC_LAZY_TRACE_ALLOC seems to do the
> opposite of what it's name implies? - I would rename it or invert the
> boolean logic.
Yeah - I changed the symbol from WEBRTC_TRACE_KEEP_WHEN_OFF to WEBRTC_LAZY_TRACE_ALLOC because the old symbol was a confusing name - but didn't invert the logic to match. Thanks
Assignee | ||
Comment 19•11 years ago
|
||
(In reply to Nicholas Nethercote [:njn] from comment #17)
> > This is imported code, so adding reporters will be
> > a pain. Not impossible, but we'd want to do so in a generic,
> > templated/macroed way that would be upstreamable - I don't want to carry
> > tons of mods all over the code causing merge failures.
>
> Ugh. A lot of libraries let you provide custom definitions of malloc and
> free, which is enough to get DMD to work. (We do that with ICU and
> freetype.) Would that be possible with this code?
Perhaps - but many of these are new/delete (like this one), not malloc/free, and webrtc wasn't designed for allocator replacement. This makes replacing a little more tricky. Pointers to how it was done for those (ICU/freetype/others) would help; build/freetype.mk didn't give any clues. Perhaps file a bug on adding reporters with some pointers/examples.
Comment 20•11 years ago
|
||
> Pointers to how it was done for those (ICU/freetype/others) would help
The library has to provide an API that lets you register malloc/free (and possibly realloc) functions. The library then has to ensure it only calls those provided ones, not the system ones. The library can define new/delete on top of the provided ones.
The Freetype example is in gfx/thebes/gfxAndroidPlatform.cpp. The |FreetypeReporter| class implements our custom malloc/realloc/free functions (which hook into DMD), and they are passed to Freetype via the |FT_MemoryRec_| argument of the FT_New_Library() call.
The ICU example is in js/src/jsapi.cpp. We pass custom malloc/realloc/free functions to u_setMemoryFunctions().
SQLite also allows a custom allocator, though it's a bit more complicated, requiring additional methods for other malloc-related operations. See the ::sqlite3_config() call in storage/src/mozStorageService.cpp.
Assignee | ||
Comment 21•11 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/f2757e7d1a08
Uplift to 1.2 should be considered, though note it's only a memory saving
Target Milestone: --- → mozilla28
Comment 22•11 years ago
|
||
Backed out in https://hg.mozilla.org/integration/mozilla-inbound/rev/f477a4fe4c32 - one or the other, this or bug 933384, leaked in mochitest-3 and crashtest.
Comment 23•11 years ago
|
||
(In reply to Randell Jesup [:jesup] from comment #21)
> https://hg.mozilla.org/integration/mozilla-inbound/rev/f2757e7d1a08
>
> Uplift to 1.2 should be considered, though note it's only a memory saving
Probably too late for 1.2 uplift at this point - we're only taking blockers right now. While I think this is important to fix, I don't think this bug would block the release.
Assignee | ||
Comment 24•11 years ago
|
||
Checkin-needed because I have to get on a plane. The leak was bug 933384
Keywords: checkin-needed
Comment 25•11 years ago
|
||
Assignee: nobody → rjesup
Keywords: checkin-needed
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•