Lots of heap-unclassified on Score Rush game, caused by OpenGL context creation and initial glClear() in WebGL context creation

RESOLVED DUPLICATE of bug 893865

Status


defect
RESOLVED DUPLICATE of bug 893865
7 years ago
6 years ago

People

(Reporter: azakai, Unassigned)

Tracking

(Blocks 2 bugs)

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [MemShrink:P2][games:p3])

Attachments

(2 attachments)

Reporter

Description

7 years ago
https://turbulenz.com/#!games/scorerush

is a 2D shoot-'em-up game. The good news is that it takes 800MB on Chrome but only 450MB on Firefox. The bad news is that even 450MB sounds like way, way too much for a 2D shooter. about:memory puts the vast majority of that memory, 360MB (80% of the total), in heap-unclassified, so the memory reporters are probably missing some important class of memory usage.
Reporter

Updated

7 years ago
Blocks: gecko-games
Whiteboard: [MemShrink]
Nathan has been doing DMD work recently.
Posted file edited DMD output
I did a run with the integrated DMD (bug 717853) on my Linux64 machine. The heap-unclassified is mostly WebGL stuff with a smattering of Ogg/Vorbis stuff. I'm attaching a file showing the highlights of the output. It's a collection of stack trace trees, in which stack traces that share a prefix are merged under that prefix.
Posted file full DMD output
Here's the full DMD output, just in case that's helpful.
Depends on: NewDMD, 677653
Reporter

Comment 4

7 years ago
Thanks njn. So it looks like much of that is allocations from inside the GL driver, which I assume we would have no way of tracking? Or maybe there is some way to ask GL drivers how much memory they have allocated?

Side note, would be nice to be able to say something like "heap-libraries" instead of "heap-unclassified" for stuff like this, to differentiate stuff the browser directly malloced itself vs stuff that the libraries it called did. I wonder if that's possible somehow.
(In reply to Alon Zakai (:azakai) from comment #4)
> Thanks njn. So it looks like much of that is allocations from inside the GL
> driver, which I assume we would have no way of tracking? Or maybe there is
> some way to ask GL drivers how much memory they have allocated?

I don't know about the latter approach.  With Hunspell we manage to make it use a wrapped version of our own allocators that count allocations and deallocations;  see extensions/spellcheck/hunspell/src/hunspell_alloc_hooks.h.  But we have a copy of hunspell in our tree... doing the same for GL drivers will probably be harder.
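
As a rough illustration of the wrapped-allocator idea (this is not the contents of hunspell_alloc_hooks.h), a library built in-tree can be pointed at counting versions of malloc and free. In the C++ sketch below, the `counted` namespace and the names `Header`, `Malloc`, `Free`, `LiveBytes` and `Demo` are all hypothetical, and it assumes a small size header in front of each block instead of asking the allocator for the block's usable size.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdlib>

namespace counted {  // hypothetical namespace, for illustration only

// Header placed in front of every block. The union pads it to the maximum
// fundamental alignment so the pointer handed to the caller stays aligned.
union Header {
  std::size_t size;
  std::max_align_t align;
};

// Bytes currently handed out through Malloc and not yet returned via Free.
static std::atomic<std::size_t> gLiveBytes{0};

void* Malloc(std::size_t size) {
  Header* h = static_cast<Header*>(std::malloc(sizeof(Header) + size));
  if (!h) return nullptr;
  h->size = size;
  gLiveBytes.fetch_add(size, std::memory_order_relaxed);
  return h + 1;  // caller sees the memory just past the header
}

void Free(void* p) {
  if (!p) return;
  Header* h = static_cast<Header*>(p) - 1;
  gLiveBytes.fetch_sub(h->size, std::memory_order_relaxed);
  std::free(h);
}

// What a memory reporter for the library would read.
std::size_t LiveBytes() {
  return gLiveBytes.load(std::memory_order_relaxed);
}

// Usage: the counter tracks exactly the bytes currently handed out.
void Demo() {
  void* a = Malloc(100);
  void* b = Malloc(28);
  assert(LiveBytes() == 128);
  Free(a);
  assert(LiveBytes() == 28);
  Free(b);
  assert(LiveBytes() == 0);
}

}  // namespace counted
```

This only works when the library can be rebuilt against the hooks, which is the obstacle with a binary GL driver.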

(There's a similar situation with Cairo -- it's often responsible for a non-trivial fraction of heap-unclassified.)
 
> Side note, would be nice to be able to say something like "heap-libraries"
> instead of "heap-unclassified" for stuff like this, to differentiate stuff
> the browser directly malloced itself vs stuff that the libraries it called
> did. I wonder if that's possible somehow.

It would be nice, but I don't know how to do it :(
> It would be nice, but I don't know how to do it :(

I think we'd need to mark our allocations as "special" and treat whatever isn't special as heap-unclassified-libraries.

If all of "our" calls to |malloc| went through one wrapper, we could record the allocated size and subtract that off from heap-allocated to get heap-unclassified-library.  We wouldn't even need atomic ops -- each thread could maintain its own running malloc/free total (which might be negative if a thread free'd memory it didn't malloc), and then only when we opened about:memory would we sum all the threads' counts.

The main cost would be an extra malloc_usable_size call in front of every free(), unless we integrated this into jemalloc (which usually [always?] figures out how big an alloc is before freeing it).  Note that on mac, we *already* have an isalloc_validate (malloc_usable_size) call in front of a lot of (maybe all) free calls, due to weirdness in the zone wrapper.

If we ever malloc() memory only to pass it to a library, which might free it at a later date, we'd have to handle those places separately.

Anyway, this sounds possible.  I don't know whether it's worth the effort.
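
A minimal C++ sketch of the per-thread tally described above, under stated assumptions: the `tally` namespace and every name in it are hypothetical, the block size comes from a header rather than a malloc_usable_size call, and each counter is a relaxed atomic that only its owning thread writes. That last choice keeps the cross-thread read well-defined in C++ while still avoiding atomic read-modify-write operations.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <mutex>
#include <vector>

namespace tally {  // hypothetical namespace, for illustration only

union Header {  // size header, padded to maximum alignment
  std::size_t size;
  std::max_align_t align;
};

struct Registry {
  std::mutex lock;
  std::vector<std::atomic<std::int64_t>*> live;  // counters of running threads
  std::int64_t retired = 0;                      // totals of exited threads
};

Registry& GetRegistry() {
  static Registry r;
  return r;
}

// One per thread. Signed: a thread that frees memory another thread
// allocated ends up with a negative running total.
struct ThreadCounter {
  std::atomic<std::int64_t> net{0};

  ThreadCounter() {
    Registry& r = GetRegistry();
    std::lock_guard<std::mutex> g(r.lock);
    r.live.push_back(&net);
  }
  ~ThreadCounter() {
    Registry& r = GetRegistry();
    std::lock_guard<std::mutex> g(r.lock);
    r.retired += net.load(std::memory_order_relaxed);
    r.live.erase(std::remove(r.live.begin(), r.live.end(), &net),
                 r.live.end());
  }
  // Only the owning thread calls this, so a plain load followed by a
  // store is enough; no locked read-modify-write instruction is needed.
  void Add(std::int64_t delta) {
    net.store(net.load(std::memory_order_relaxed) + delta,
              std::memory_order_relaxed);
  }
};

ThreadCounter& Mine() {
  thread_local ThreadCounter c;
  return c;
}

void* Malloc(std::size_t size) {
  Header* h = static_cast<Header*>(std::malloc(sizeof(Header) + size));
  if (!h) return nullptr;
  h->size = size;
  Mine().Add(static_cast<std::int64_t>(size));
  return h + 1;
}

void Free(void* p) {
  if (!p) return;
  Header* h = static_cast<Header*>(p) - 1;
  Mine().Add(-static_cast<std::int64_t>(h->size));
  std::free(h);
}

// Summed only on demand, e.g. when about:memory is opened.
std::int64_t TotalBytes() {
  Registry& r = GetRegistry();
  std::lock_guard<std::mutex> g(r.lock);
  std::int64_t sum = r.retired;
  for (const std::atomic<std::int64_t>* c : r.live) {
    sum += c->load(std::memory_order_relaxed);
  }
  return sum;
}

// Usage: the sum over all threads is the number of live wrapped bytes.
void Demo() {
  void* p = Malloc(64);
  assert(TotalBytes() == 64);
  Free(p);
  assert(TotalBytes() == 0);
}

}  // namespace tally
```

Subtracting `TotalBytes()` from heap-allocated would then give the library share. Memory that is allocated through the wrapper and later freed by a library is never subtracted, which is the case that needs separate handling.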
(In reply to Justin Lebar [:jlebar] from comment #6)
> I think we'd need to mark our allocations as "special" and treat whatever
> isn't special as heap-unclassified-libraries.
> 
> If all of "our" calls to |malloc| went through one wrapper, we could record
> the allocated size and subtract that off from heap-allocated to get
> heap-unclassified-library.  We wouldn't even need atomic ops -- each thread
> could maintain its own running malloc/free total (which might be negative if
> a thread free'd memory it didn't malloc), and then only when we opened
> about:memory would we sum all the threads' counts.

Speaking with great ignorance of valgrind, can't we (mostly) figure this out at runtime?  valgrind is already catching calls to malloc; we ought to be able to unwind the stack (just a couple of frames) to check whether malloc's caller lives in places we care about.  Then that state becomes just another bit valgrind tracks for each memory allocation.  (I realize that unwinding is...uneven across various platforms, but if we just need to get out of valgrind's code, then maybe valgrind can be tweaked with frame pointers or whatnot.  And/or --enable-dmd can force frame pointers as well.)
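
To make the caller check concrete, here is a hedged C++ sketch of the classification step only. It assumes the unwinder has already produced a few return addresses (innermost first) and that the address ranges of loaded modules are known; `ModuleRange`, `Owner`, `OwnerOf` and `Classify` are hypothetical names, and how the frames and ranges are obtained is platform-specific and not shown.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

namespace callsite {  // hypothetical namespace, for illustration only

enum class Owner { Allocator, Browser, Library, Unknown };

// Half-open code address range [begin, end) of one loaded module.
struct ModuleRange {
  std::uintptr_t begin;
  std::uintptr_t end;
  Owner owner;
};

// Which kind of module does this program counter fall into?
Owner OwnerOf(const std::vector<ModuleRange>& modules, std::uintptr_t pc) {
  for (const ModuleRange& m : modules) {
    if (pc >= m.begin && pc < m.end) return m.owner;
  }
  return Owner::Unknown;
}

// Walk at most maxDepth frames, skipping frames inside the allocator
// itself, and report who owns the first real caller.
Owner Classify(const std::vector<ModuleRange>& modules,
               const std::vector<std::uintptr_t>& frames,
               std::size_t maxDepth = 4) {
  std::size_t depth = std::min(frames.size(), maxDepth);
  for (std::size_t i = 0; i < depth; ++i) {
    Owner o = OwnerOf(modules, frames[i]);
    if (o != Owner::Allocator) return o;
  }
  return Owner::Unknown;  // ran out of frames without leaving the allocator
}

}  // namespace callsite
```

The result would be stored as one extra bit per allocation, as suggested above; the depth limit reflects the "just a couple of frames" budget.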
(In reply to Nathan Froyd (:froydnj) from comment #7)
> (In reply to Justin Lebar [:jlebar] from comment #6)
> > I think we'd need to mark our allocations as "special" and treat whatever
> > isn't special as heap-unclassified-libraries.
> > 
> > If all of "our" calls to |malloc| went through one wrapper, we could record
> > the allocated size and subtract that off from heap-allocated to get
> > heap-unclassified-library.  We wouldn't even need atomic ops -- each thread
> > could maintain its own running malloc/free total (which might be negative if
> > a thread free'd memory it didn't malloc), and then only when we opened
> > about:memory would we sum all the threads' counts.
> 
> Speaking with great ignorance of valgrind, can't we (mostly) figure this out
> at runtime?

Speaking also with great ignorance of the problem we want to solve: we want this to work *without* valgrind.  Doh!
I can't seem to play the game, all I get is this exception:

[09:45:26.879] NS_ERROR_FAILURE: Component returned failure code: 0x80004005 (NS_ERROR_FAILURE) [nsIDOMHTMLAudioElement.mozSetup] @ https://d36kx37nk0bz9z.cloudfront.net/T1405EYqBXKfAADB/7/rdBhndR5P2XO4nB5VxFYoR4xMidoINy3vwyCrmNbgQe0388.js:231

Also, the game gives me an annoying warning claiming that Firefox sucks and I should switch to Chrome.

Talking about heap-unclassified, I have been worried that since bug 676071 landed, we indeed can have lots of heap-unclassified in WebGL apps that use large uniform arrays. I didn't have time to write a memory reporter for this, and can't test myself, but you could test this theory by editing WebGLContext::CompileShader and printf'ing num_uniforms and num_attributes, in WebGLContextGL.cpp.

Part of why I've not prioritized this is that we're considering a completely different approach in ANGLE that will get rid of that memory usage; see
  http://code.google.com/p/angleproject/issues/detail?id=315#c2
Blocks: DarkMatter
Summary: Lots of heap-unclassified on Score Rush game → Lots of heap-unclassified on Score Rush game, mostly due to missing WebGL memory reporters
Hey, comment 9 was just a theory --- I don't know if the heap-unclassified here is actually that. Comment 9 proposes a way to check that.
bjacob, comment 2 and the attached DMD output show that it's mostly WebGL.
Ah, I hadn't seen that. So, forget about my theory in comment 9: the DMD output in comment 2 shows allocations caused by WebGL context creation and clearing, while comment 9 was theorizing about allocations that a WebGL context would cause in unrelated ways.

So I really don't think that additional WebGL memory reporters would help here. Mesa is allocating memory while creating a GL context, that's expected. Mesa is allocating memory while executing a glClear() call, that's a bit more unexpected but I guess it could be the result of lazy allocation, as this glClear() is done during WebGL context creation.

Can you reproduce this outside of Mesa?
Summary: Lots of heap-unclassified on Score Rush game, mostly due to missing WebGL memory reporters → Lots of heap-unclassified on Score Rush game, caused by WebGL context creation
Summary: Lots of heap-unclassified on Score Rush game, caused by WebGL context creation → Lots of heap-unclassified on Score Rush game, caused by OpenGL context creation and initial glClear() in WebGL context creation
azakai, did you see this on Linux?
Whiteboard: [MemShrink] → [MemShrink:P2]
Reporter

Comment 14

7 years ago
Yeah, Linux with NVidia binary drivers (not Mesa).
Whiteboard: [MemShrink:P2] → [MemShrink:P2][games:p3]

Comment 15

6 years ago
This is Linux only?
Testing on Win7, heap-unclassified was only 12% while playing.
I'm going to dup this to bug 893865, which is about the OpenGL heap-unclassified issue.
Status: NEW → RESOLVED
Last Resolved: 6 years ago
Resolution: --- → DUPLICATE
Duplicate of bug: 893865