Last Comment Bug 813783 - Don't create a global GL context at all #ifdef ANDROID, including B2G
: Don't create a global GL context at all #ifdef ANDROID, including B2G
Status: RESOLVED FIXED
[MemShrink][slim:750KiB]
:
Product: Firefox OS
Classification: Client Software
Component: General (show other bugs)
: unspecified
: ARM Gonk (Firefox OS)
: -- normal (vote)
: ---
Assigned To: Benoit Jacob [:bjacob] (mostly away)
:
Mentors:
Depends on:
Blocks: slim-fast B2GDarkMatter 814159
  Show dependency treegraph
 
Reported: 2012-11-20 15:55 PST by Nicholas Nethercote [:njn]
Modified: 2012-11-21 21:23 PST (History)
16 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
+
fixed
fixed
fixed


Attachments
don't create a global context at all #ifdef ANDROID, including B2G (559 bytes, patch)
2012-11-20 16:46 PST, Benoit Jacob [:bjacob] (mostly away)
cjones.bugs: review+
Details | Diff | Splinter Review

Description Nicholas Nethercote [:njn] 2012-11-20 15:55:19 PST
Attachment 683738 [details] (from bug 801580) shows multiple stacks like this:

  libc_malloc_debug_leak.so+0xdba8 (0x401e5ba8)  .
  libc.so+0x16122 (0x400f7122)  .
  libgsl.so+0x4134 (0x42666134)  .
  libGLESv2_adreno200.so+0x3a98c (0x4488b98c)  .
  libGLESv2_adreno200.so+0x38b12
  libGLESv2_adreno200.so+0x225fa
  libGLESv2_adreno200.so+0x24944
  libEGL_adreno200.so+0x17a50 (0x426bda50)  .
  libEGL_adreno200.so+0x117ac
  libEGL_adreno200.so+0x5560
  libEGL.so+0xc68c (0x418e968c)  .
  libxul.so+0xa7149a (0x40d7c49a) mozilla::gl::GLLibraryEGL::fCreateContext(void*, void*, void*, int const*) /home/jlebar/code/moz/ff-git2/src/gfx/gl/GLLibraryEGL.h:194

There's some variation, but they all go through fCreateContext().  The stacks have the following allocation sizes:

  667,648*2 + 57,344*2 + 28,672*2 + 14,336*2

For a total of 1,536,000 bytes.

It would be nice to have a reporter for this memory and/or shrink it.  I don't have much idea on how to do either of these, though.
Comment 1 Benoit Jacob [:bjacob] (mostly away) 2012-11-20 16:06:18 PST
this is eglCreateContext. it's code in the system EGL library, creating the EGL context. We don't control how much memory that takes. What we can do is have a counter for how many EGL contexts we have, so that we could notice if we were creating more that necessary.

Do you know how many EGL contexts we currently have? Should probably be 1.
Comment 2 Benoit Jacob [:bjacob] (mostly away) 2012-11-20 16:21:15 PST
Bingo, we do create 2 EGL contexts: the one we actually use, and the global one --- the dummy GL context used for sharing textures.

Is there even any situation in which we actually rely on this share group? For example we already disable that on Android, so for sure we don't rely on that on Android at all. We should be able to disable that on B2G as well.

Digging a little bit, we had tried already to disable the share group on B2G in bug 760675 and backed out from that in bug 761894 as it broke WebGL compositing. Let's check if WebGL compositing still relies on share groups on B2G... CC'ing a few people who would know.
Comment 3 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2012-11-20 16:34:53 PST
Nice find.  As bjacob says, we can't really do anything about the allocations made for GL contexts.  System libraries assume that there are very few of them.

However, also as bjacob says, we should only have one of these in the b2g process and 0 in each content process.  (Plus one per WebGL context that's created dynamically.)

So we ought to be able to 0.75MB * (1 + #apps) with bug 761894 reverted.  Nice!

It would make me much happier if this code were more robust than #ifdef ANDROID, but that's a bug for another day.
Comment 4 Benoit Jacob [:bjacob] (mostly away) 2012-11-20 16:46:05 PST
Created attachment 683805 [details] [diff] [review]
don't create a global context at all #ifdef ANDROID, including B2G

This basically undoes the patch in bug 761894. I checked, WebGL compositing still works fine. Tested on a pair of online demos. The local CrystalSkull seemed frozen at the first frame, but that must be a different issue: if WebGL compositing was still relying on a global context, then taking the global context out would not even allow to see the first frame.

Also checked in GDB that with this patch we are only creating 1 GL context anymore.
Comment 5 Benoit Jacob [:bjacob] (mostly away) 2012-11-20 16:48:05 PST
(In reply to Chris Jones [:cjones] [:warhammer] from comment #3)
> It would make me much happier if this code were more robust than #ifdef
> ANDROID, but that's a bug for another day.

We could switch to listing the platforms where we do want a global context (Mac and Linux) rather than listing the platforms where we don't want one (mobile and Windows). That should be safe because if you need a global context and fail to create one, at least the effects are easily visible.
Comment 6 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2012-11-20 16:52:42 PST
Comment on attachment 683805 [details] [diff] [review]
don't create a global context at all #ifdef ANDROID, including B2G

Actually, on second look this would only save context allocations in content processes only if they ever create a WebGL context.  But that's still a win.
Comment 7 Benoit Jacob [:bjacob] (mostly away) 2012-11-20 17:28:48 PST
I definitely checked in GDB attached on main process that this results in us creating only one EGL context on startup instead of two. What did you mean in comment 6?
Comment 8 Nicholas Nethercote [:njn] 2012-11-20 17:33:52 PST
As for the remaining context -- is it possible to write a memory reporter for it?
Comment 9 Benoit Jacob [:bjacob] (mostly away) 2012-11-20 17:43:26 PST
(In reply to Benoit Jacob [:bjacob] from comment #7)
> I definitely checked in GDB attached on main process that this results in us
> creating only one EGL context on startup instead of two. What did you mean
> in comment 6?

Also checked: same story on content processes. In short: 
 - without patch, we create 2 GL contexts on the main process at startup and 2 GL contexts on browser process at first WebGL context creation
 - with patch, it's 1 and 1 instead.
Comment 10 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2012-11-20 17:44:37 PST
(In reply to Benoit Jacob [:bjacob] from comment #7)
> I definitely checked in GDB attached on main process that this results in us
> creating only one EGL context on startup instead of two. What did you mean
> in comment 6?

I mistakenly thought we would be creating a share context around startup, in each content process.  But AFAICT we would only do that when a WebGL context was first created.

That's perfectly fine, it just means we don't win back 0.75MB * (1 + #apps) memory, but instead 0.75MB * (1 + #apps-that-use-WebGL).
Comment 11 Benoit Jacob [:bjacob] (mostly away) 2012-11-20 17:54:00 PST
(In reply to Nicholas Nethercote [:njn] from comment #8)
> As for the remaining context -- is it possible to write a memory reporter
> for it?

We can easily report the number of live GL contexts, but that would not have allowed us to discover that the superfluous GL context was eating 750k.

We have plans (Jeff Gilbert has code, don't remember the bug #) to add fine about:memory reporting of GL objects such as textures, like we have for WebGL already. That will be very nice... but it still wouldn't have discovered the present bug.

The 750k allocated by eglCreateContext here are entirely internal things done inside the GL library. It's not anything like allocating a graphics surface --- that is done in separate calls. It's probably more along the lines of initializing the GL state machine. So we can't discover issues like "OMG eglCreateContext allocates 750k" without lower-level tools, going at least as low-level as the level of the memory allocator, like AFAIU the tool you were using here.
Comment 12 Benoit Jacob [:bjacob] (mostly away) 2012-11-20 17:55:17 PST
(In reply to Chris Jones [:cjones] [:warhammer] from comment #10)
> (In reply to Benoit Jacob [:bjacob] from comment #7)
> > I definitely checked in GDB attached on main process that this results in us
> > creating only one EGL context on startup instead of two. What did you mean
> > in comment 6?
> 
> I mistakenly thought we would be creating a share context around startup, in
> each content process.  But AFAICT we would only do that when a WebGL context
> was first created.
> 
> That's perfectly fine, it just means we don't win back 0.75MB * (1 + #apps)
> memory, but instead 0.75MB * (1 + #apps-that-use-WebGL).

Ah, OK. Yeah, that would have been rather fantastic. Still, even without any WebGL at all, saving 750k is already nice :)
Comment 13 Nicholas Nethercote [:njn] 2012-11-20 17:56:34 PST
> The 750k allocated by eglCreateContext here are entirely internal things
> done inside the GL library.

Is wrapping the allocator used by the library a possibility?  We've done that for other libraries (e.g. hunspell).
Comment 14 Benoit Jacob [:bjacob] (mostly away) 2012-11-20 18:04:10 PST
(In reply to Benoit Jacob [:bjacob] from comment #11)
> (In reply to Nicholas Nethercote [:njn] from comment #8)
> > As for the remaining context -- is it possible to write a memory reporter
> > for it?
>
> [...snip...]
> The 750k allocated by eglCreateContext here are entirely internal things
> done inside the GL library. It's not anything like allocating a graphics
> surface --- that is done in separate calls. It's probably more along the
> lines of initializing the GL state machine. So we can't discover issues like
> "OMG eglCreateContext allocates 750k" without lower-level tools, going at
> least as low-level as the level of the memory allocator, like AFAIU the tool
> you were using here.

So here's one thing that we could do, probably the only thing that we can do, to get these 750k reported on about:memory so that they're not Dark Matter anymore.

Ship --enable-replace-malloc enabled (bug 804303; that's very little overhead). As a first approximation (not tihnking about optimization for now), imagine that we ship with a default replace_malloc tool that increments/decrements a thread-local counter of bytes on malloc/free. Then we could instrument the GL context creation code by putting code like this around the eglCreateContext call:

   int64_t bytes_allocated_before = get_total_bytes_allocated();
   eglCreateContext(...);
   int64_t bytes_allocated_after = get_total_bytes_allocated();

   report_metric_on_about_memory(bytes_allocated_after - bytes_allocated_before);

Now, to avoid the TLS overhead on every malloc/free call, we could have a global flag allowing to enable/disable this accounting, and only enable it when we're doing a measurement (creating a GL context is slow anyways).
Comment 15 Benoit Jacob [:bjacob] (mostly away) 2012-11-20 18:05:03 PST
(In reply to Nicholas Nethercote [:njn] from comment #13)
> > The 750k allocated by eglCreateContext here are entirely internal things
> > done inside the GL library.
> 
> Is wrapping the allocator used by the library a possibility?  We've done
> that for other libraries (e.g. hunspell).

Hehe, our comments crossed each other, we're talking about the same approach in comments 13 and 14.
Comment 16 Benoit Jacob [:bjacob] (mostly away) 2012-11-20 18:11:39 PST
(In reply to Benoit Jacob [:bjacob] from comment #4)
> The local CrystalSkull
> seemed frozen at the first frame, but that must be a different issue:

I've retried now and it works just fine with and without the patch. Sorry for the noise.
Comment 17 Benoit Jacob [:bjacob] (mostly away) 2012-11-20 18:48:36 PST
https://hg.mozilla.org/integration/mozilla-inbound/rev/a2c7428d4abf

Nick do you want to keep this bug open or file a follow-up bug?

Also, if in comment 13 you meant "does the EGL library allow us to set up a custom malloc/free hooks just for it" the answer is no. The only thing we can do is completely replace our memory allocator as in bug 804303.

Chris I don't know what to put in target milestone. Do I need to backport this to some repo now that C1 milestone is behind us?
Comment 18 Nicholas Nethercote [:njn] 2012-11-20 19:52:20 PST
> Nick do you want to keep this bug open or file a follow-up bug?

I filed bug 813843 for the memory reporting.
Comment 19 Justin Lebar (not reading bugmail) 2012-11-20 21:56:27 PST
> Chris I don't know what to put in target milestone. Do I need to backport this to some repo now that 
> C1 milestone is behind us?

I have no idea what to put in the target milestone, since we seem to be overloading that to mean both when the patch was landed (e.g. "mozilla 20") and when we /want/ the patch to land "C2".

But you need to land this on Aurora and Beta please.  a=me to do so.
Comment 20 Justin Lebar (not reading bugmail) 2012-11-20 21:58:02 PST
I hope this summary accurately reflects what we did here; please change it if I got it wrong!
Comment 21 Mike Hommey [:glandium] 2012-11-20 23:02:13 PST
(In reply to Nicholas Nethercote [:njn] from comment #13)
> > The 750k allocated by eglCreateContext here are entirely internal things
> > done inside the GL library.
> 
> Is wrapping the allocator used by the library a possibility?  We've done
> that for other libraries (e.g. hunspell).

If we're building the library that does the malloc (that is, if the malloc is not done by one of the binary blobs), we could, easily.
It's also possibly to hack the binary blobs to change the undefined symbols to resolve to something else than malloc/free, etc. although it would only work if making the names shorter, not longer (there's no room in the dynamic symbol string table for longer symbols). That's actually pretty easy to do, you just need to do some string replacement in the .dynstr section.
Comment 22 Mike Hommey [:glandium] 2012-11-20 23:03:37 PST
(In reply to Benoit Jacob [:bjacob] from comment #14)
> Ship --enable-replace-malloc enabled (bug 804303; that's very little
> overhead). As a first approximation (not tihnking about optimization for
> now), imagine that we ship with a default replace_malloc tool that
> increments/decrements a thread-local counter of bytes on malloc/free. Then
> we could instrument the GL context creation code by putting code like this
> around the eglCreateContext call:

I don't know about bionic, but in glibc, TLS uses malloc to allocate its storage.
Comment 23 Justin Lebar (not reading bugmail) 2012-11-20 23:19:25 PST
> I don't know about bionic, but in glibc, TLS uses malloc to allocate its storage.

FWIW, njn worked around this in his newdmd patches, using pthreads.  Perhaps pthread_key_create mallocs, but then we don't get mallocs on each thread creation?

Attachment 683470 [details] [diff].
Comment 24 Ed Morley [:emorley] 2012-11-21 07:35:13 PST
https://hg.mozilla.org/mozilla-central/rev/a2c7428d4abf

Note You need to log in before you can comment on or make changes to this bug.