Optimize custom dynamic loader to use less memory

RESOLVED FIXED

Status

()

defect
RESOLVED FIXED
9 years ago
5 years ago

People

(Reporter: mwu, Assigned: mwu)

Tracking

Trunk
All
Android
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(2 attachments, 4 obsolete attachments)

(Assignee)

Description

9 years ago
The transfer of the ashmem fds is pretty nasty. Oh well.
Attachment #486249 - Flags: review?(tglek)
Attachment #486249 - Flags: review?(jones.chris.g)
(Assignee)

Updated

9 years ago
tracking-fennec: --- → ?
Comment on attachment 486249 [details] [diff] [review]
Use ashmemory to reduce copies of library code

r- for
 -- duplicating ashmem goop.  Turns out we should have been using
    <linux/ashmem.h> all along; my fault.  I'm OK with having separate
    creation functions (your create_ashmem()).
 -- passing cache info through the env.  Let's pass it on the command
    line instead, as the last argument, then pluck out that arg in
    main().  I think we'll want an APKOpen.h interface for serializing
    the cache info to a string and then unpacking a string into the
    cache structure.

>diff --git a/other-licenses/android/APKOpen.cpp b/other-licenses/android/APKOpen.cpp
>+static int cache_count = 0;
>+struct lib_cache_info {
>+  char name[32];
>+  int fd;
>+};
>+static struct lib_cache_info *cache_mapping = NULL;
>+
>+static void
>+readLibCache()
>+
>+static void
>+writeLibCache()
>+
>+static int
>+getLibCache(const char *libName)
>+
>+static void
>+addLibCache(const char *libName, int fd)

I would sleep a lot easier if we used real data structures and library
code to implement these.  Is it possible for us to use <vector> or
<map> and <string> from this code?  I think it should be.

>@@ -358,32 +461,45 @@ static void * mozload(const char * path,
> 
>   int fd = zip_fd;
>   void * buf = NULL;
>+  uint32_t lib_size = letoh32(entry->uncompressed_size);
>   if (letoh16(file->compression) == DEFLATE) {
>-    fd = -1;
>-    buf = mmap(NULL, letoh32(entry->uncompressed_size),
>-               PROT_READ | PROT_WRITE | PROT_EXEC,
>-               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
>+    int cache_fd = getLibCache(path + 4);
>+    fd = cache_fd;
>+    if (fd < 0) {
>+      fd = create_ashmem(lib_size);
>+    } else

I think you want to nuke this |else|, otherwise you'll miss failed
create_ashmem()s.

>diff --git a/other-licenses/android/linker.c b/other-licenses/android/linker.c
>--- a/other-licenses/android/linker.c
>+++ b/other-licenses/android/linker.c
>@@ -1394,6 +1395,24 @@ static int reloc_library(soinfo *si, Elf
>     Elf32_Rel *start = rel;
>     unsigned idx;
> 
>+    /* crappy hack to ensure we don't write into the read-only region */
>+
>+    /* crappy hack part 1: find the read only region */
>+        /* crappy hack part 2: make this section writable */

I didn't follow too closely what this code is doing, but can you use
mprotect() instead of remapping regions?
Attachment #486249 - Flags: review?(jones.chris.g) → review-
(Assignee)

Comment 2

9 years ago
(In reply to comment #1)
> >diff --git a/other-licenses/android/linker.c b/other-licenses/android/linker.c
> >--- a/other-licenses/android/linker.c
> >+++ b/other-licenses/android/linker.c
> >@@ -1394,6 +1395,24 @@ static int reloc_library(soinfo *si, Elf
> >     Elf32_Rel *start = rel;
> >     unsigned idx;
> > 
> >+    /* crappy hack to ensure we don't write into the read-only region */
> >+
> >+    /* crappy hack part 1: find the read only region */
> >+        /* crappy hack part 2: make this section writable */
> 
> I didn't follow too closely what this code is doing, but can you use
> mprotect() instead of remapping regions?

Nope because mapping private regions out of ashmem fds (to take advantage of COW) doesn't appear to work. If we just mprotect the region to be writable, then the ashmem for the library ends up getting changed, which then prevents us from easily sharing the inflated library with the child process.
(Assignee)

Comment 3

9 years ago
Still addressing review comments but I changed the unsharing code to use less memory. We should save at least 2mb per process, 4mb overall.
Attachment #486249 - Attachment is obsolete: true
Attachment #486249 - Flags: review?(tglek)
(Assignee)

Comment 4

9 years ago
Attachment #486732 - Attachment is obsolete: true
Attachment #486771 - Flags: review?(jones.chris.g)
(Assignee)

Comment 5

9 years ago
Small cleanup.
Attachment #486771 - Attachment is obsolete: true
Attachment #486784 - Flags: review?(jones.chris.g)
Attachment #486771 - Flags: review?(jones.chris.g)
Comment on attachment 486784 [details] [diff] [review]
Use ashmemory to reduce copies of library code, v4

There's a fairly big problem with this approach that I didn't think of
on first review.  The spec for ashmem says that the kernel can toss
out the backing memory whenever it feels like it.  This isn't too bad
for gfx surfaces and such (we'll just crash), but it's a lot scarier
for code sections.  I thought for a while, and I don't think there's
any additional security issue even for ashmem segments mapped +wx,
because writing to the segment or trying to execute code will just
crash.  (Definitely worth thinking over a few times though!)

The problem is that if our mapped libraries get tossed out, we'll get
weird crashes that breakpad might not be able to catch, because
breakpad itself might be one of the libs tossed out.  We also probably
won't be able to reliably listen for ashmem-pressure notifications
since the listening might get tossed.

A solution might be pinning our mapped libraries, but that kinda
sucks.  I guess alternately, OOM-killer crashes (SIGKILL) can't be
caught by breakpad either, so we might lump ashmem-toss crashes with
those and define the problem away.  At any rate, we can file this away
until post-beta2 I guess.

>diff --git a/ipc/glue/GeckoChildProcessHost.cpp b/ipc/glue/GeckoChildProcessHost.cpp
>+  while (cache &&
>+         cacheCount++ < MAX_LIB_CACHE_ENTRIES &&
>+         strlen(cache->name)) {
>+    char entrybuf[32 + 6];
>+    mFileMap.push_back(std::pair<int,int>(cache->fd, cache->fd));
>+    snprintf(entrybuf, sizeof(entrybuf), "%s:%d;", cache->name, cache->fd);
>+    cacheStr.Append(entrybuf);

  cacheStr.Append(cache->name);
  cacheStr.AppendPrintf(":%d", cache->fd);

please.

>+  if (cacheStr.IsEmpty())
>+    cacheStr.AppendLiteral("-");

Why is this here?  Display purposes?  If so, please add a comment.

>diff --git a/ipc/glue/SharedMemoryBasic_android.cpp b/ipc/glue/SharedMemoryBasic_android.cpp

Thanks for cleaning this up.

>diff --git a/other-licenses/android/APKOpen.cpp b/other-licenses/android/APKOpen.cpp
>--- a/other-licenses/android/APKOpen.cpp
>+++ b/other-licenses/android/APKOpen.cpp
>+static int
>+create_ashmem(size_t bytes)

You're mixing this_style() with thisOne(), not sure which is preferred
in general here.

>+static void
>+readLibCache(const char *buf)

Minor nit, but a name like "loadLibCacheFrom()" or "fillLibCache()"
would make it a bit clearer what this function is doing.

>+  do {
>+    struct lib_cache_info *info = &cache_mapping[cache_count];
>+
>+    char * name_end = strchr(buf, ':');
>+    if (!name_end)
>+      break;
>+    strncpy(info->name, buf, name_end - buf);
>+    buf = name_end + 1;

No thanks, we can pay the cost of a strdup() then use strtok_r() here
:).

Need to assert() that the deserialized name is < MAX_LIB_NAME_LEN (or
whatever), or alternately use that limit for the strncpy() since
garbage names don't really hurt us, they just won't match in lookups.

>+
>+    char * fd_end = strchr(buf, ';');
>+    if (!fd_end)
>+      break;
>+    info->fd = atoi(buf);

Let's use strod() here, error-check the fd first, then bail on this
entry if something goes wrong.

>+static int
>+getLibCache(const char *libName)

Minor nit again, would prefer "lookupCachedLibFd()" or something.

>+static void
>+addLibCache(const char *libName, int fd)
>+{
>+  if (!cache_mapping)
>+    cache_mapping = (struct lib_cache_info *)calloc(MAX_LIB_CACHE_ENTRIES,
>+                                                    sizeof(*cache_mapping));

It would be nice to share this code with readLibCache().

>+
>+  struct lib_cache_info *info = &cache_mapping[cache_count++];
>+  strncpy(info->name, libName, 32);

See upcoming comments re: 32.  Also remember that the limit arg to
strncpy() is the number of non-'\0' chars it will copy; it doesn't
include the '\0'.  So you need MAX_LEN-1, or to define MAX_LEN to be
in non-\0 chars.

>@@ -358,32 +425,44 @@ static void * mozload(const char * path,
> 
>   int fd = zip_fd;
>   void * buf = NULL;
>+  uint32_t lib_size = letoh32(entry->uncompressed_size);
>   if (letoh16(file->compression) == DEFLATE) {
>-    fd = -1;
>-    buf = mmap(NULL, letoh32(entry->uncompressed_size),
>-               PROT_READ | PROT_WRITE | PROT_EXEC,
>-               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
>+    int cache_fd = getLibCache(path + 4);

I strongly dislike this |+ 4| style for several reasons, but it's
already used all over the place already so meh.  Sucks for the person
who has to change this hard-coding.

>diff --git a/other-licenses/android/APKOpen.h b/other-licenses/android/APKOpen.h
>--- a/other-licenses/android/APKOpen.h
>+++ b/other-licenses/android/APKOpen.h
>@@ -47,4 +47,13 @@ struct mapping_info {
> 
> extern struct mapping_info * lib_mapping;
> 
>+struct lib_cache_info {
>+  char name[32];

Please use a #define constant for the length of the name.  Also see
above re: strncpy fenceposting.

How did you compute this constant?  Looking at |find -name '*.so'|, it
appears that our longest lib name is libxpcomsample.so, at 17 chars
(18 bytes).  I wonder if we should enforce limits on lib names
somewhere, but that's work for another bug.

>+  int fd;

(Not using lci_name and lci_fd?  Thought you were a C hacker ;).)

>+};
>+
>+extern struct lib_cache_info * cache_mapping;
>+

Let's use a real documented API for accessing this.  Say,

  mozilla {
  const struct lib_cache_info* GetLibraryCache();
  }

or something non-C++-y if there's any chance of this code being
converted to .c.

Magic non-const externs across huge codebases that spawn many threads
make me very twitchy.  Dunno how ted let you get away with that ;).

The threading model of this API in concert with dlopen() is a little
scary to me.  It seems that we're relying on only dlopen()ing libs
from our lib/ subdir from the android main thread during startup, and
then never again.  And thereafter, only reading the cache from the
Gecko main thread.  I'm not sure what guarantees this.  Could an
extension cause a library from lib/ to be loaded from a non-Gecko-main
thread?  (We can ignore binary extensions, but CTypes is still in the
picture.)  I suspect we're probably OK here, but we need to document
this model.  I'm not sure if there's anything we can do to enforce it.

>+#define MAX_LIB_CACHE_ENTRIES 16

|find| tells we me have 15 .so's in dist/fennec/lib.  This constant is
cutting it too close for my taste; let's go to 32.

>diff --git a/other-licenses/android/linker.c b/other-licenses/android/linker.c
>+        /* crappy hack part 2: make this page writable */
>+        void * reloc_page = reloc & ~PAGE_MASK;
>+        if (reloc < ro_region_end && reloc_page != remapped_page) {
>+            if (remapped_page != NULL)
>+                mprotect(remapped_page, PAGE_SIZE, PROT_READ | PROT_EXEC);
>+            memcpy(copy_page, reloc_page, PAGE_SIZE);
>+            munmap(reloc_page, PAGE_SIZE);
>+            if (mmap(reloc_page, PAGE_SIZE, PROT_READ | PROT_WRITE | PROT_EXEC,
>+                     MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED,
>+                     -1, 0) == (void *)-1)

s/(void *)-1/MAP_FAILED/

The meat of this patch is shippable IME.  Would like to see one more
iteration.
Attachment #486784 - Flags: review?(jones.chris.g)
(In reply to comment #6)
> >+static void
> >+addLibCache(const char *libName, int fd)
> >+{
> >+  if (!cache_mapping)
> >+    cache_mapping = (struct lib_cache_info *)calloc(MAX_LIB_CACHE_ENTRIES,
> >+                                                    sizeof(*cache_mapping));
> 
> It would be nice to share this code with readLibCache().
> 
> >+
> >+  struct lib_cache_info *info = &cache_mapping[cache_count++];
> >+  strncpy(info->name, libName, 32);
> 
> See upcoming comments re: 32.  Also remember that the limit arg to
> strncpy() is the number of non-'\0' chars it will copy; it doesn't
> include the '\0'.  So you need MAX_LEN-1, or to define MAX_LEN to be
> in non-\0 chars.

Also please add an assertion that the libname is less than the max length.
(Assignee)

Comment 8

9 years ago
(In reply to comment #6)
> Comment on attachment 486784 [details] [diff] [review]
> Use ashmemory to reduce copies of library code, v4
> 
> There's a fairly big problem with this approach that I didn't think of
> on first review.  The spec for ashmem says that the kernel can toss
> out the backing memory whenever it feels like it.  This isn't too bad
> for gfx surfaces and such (we'll just crash), but it's a lot scarier
> for code sections.  I thought for a while, and I don't think there's
> any additional security issue even for ashmem segments mapped +wx,
> because writing to the segment or trying to execute code will just
> crash.  (Definitely worth thinking over a few times though!)
> 
Yeah I'm not quite sure what to do here yet.

> >+  if (cacheStr.IsEmpty())
> >+    cacheStr.AppendLiteral("-");
> 
> Why is this here?  Display purposes?  If so, please add a comment.
> 
It's to ensure there's always a last argument for the child process to consume. (assuming that an empty arg gets consumed along the way, which I'm not sure about)

> >diff --git a/other-licenses/android/APKOpen.cpp b/other-licenses/android/APKOpen.cpp
> >--- a/other-licenses/android/APKOpen.cpp
> >+++ b/other-licenses/android/APKOpen.cpp
> >+static int
> >+create_ashmem(size_t bytes)
> 
> You're mixing this_style() with thisOne(), not sure which is preferred
> in general here.
> 
Yeah it's unfortunate. I think this file tends towards camelCase. There's a few zip functions which aren't but that's because it was copied from a C program I wrote previously.

> I strongly dislike this |+ 4| style for several reasons, but it's
> already used all over the place already so meh.  Sucks for the person
> who has to change this hard-coding.
> 
I have a patch in my queue to kill the + 4 (while adding lzma support).

> Magic non-const externs across huge codebases that spawn many threads
> make me very twitchy.  Dunno how ted let you get away with that ;).
> 
> The threading model of this API in concert with dlopen() is a little
> scary to me.  It seems that we're relying on only dlopen()ing libs
> from our lib/ subdir from the android main thread during startup, and
> then never again.  And thereafter, only reading the cache from the
> Gecko main thread.  I'm not sure what guarantees this.  Could an
> extension cause a library from lib/ to be loaded from a non-Gecko-main
> thread?  (We can ignore binary extensions, but CTypes is still in the
> picture.)  I suspect we're probably OK here, but we need to document
> this model.  I'm not sure if there's anything we can do to enforce it.
> 
So basically, there is no threading when this code is used. These special dlopen functions for opening libraries from the apk are only called during initialization. The moz_mapped_dlopen shouldn't be visible (even though it is right now). The mozload function and everything else should not be accessible via ctypes. The only exposed functions that can call the special library loading code use a fixed list of libraries to load. You could mess with the library fd cache though, but your suggested change fixes that (as long as we return a const pointer).
(In reply to comment #8)
> > >+  if (cacheStr.IsEmpty())
> > >+    cacheStr.AppendLiteral("-");
> > 
> > Why is this here?  Display purposes?  If so, please add a comment.
> > 
> It's to ensure there's always a last argument for the child process to consume.
> (assuming that an empty arg gets consumed along the way, which I'm not sure
> about)
> 

exec() is happy with an empty string, but "-" suits me OK.  Just wondered where "-" came from.

> > Magic non-const externs across huge codebases that spawn many threads
> > make me very twitchy.  Dunno how ted let you get away with that ;).
> > 
> > The threading model of this API in concert with dlopen() is a little
> > scary to me.  It seems that we're relying on only dlopen()ing libs
> > from our lib/ subdir from the android main thread during startup, and
> > then never again.  And thereafter, only reading the cache from the
> > Gecko main thread.  I'm not sure what guarantees this.  Could an
> > extension cause a library from lib/ to be loaded from a non-Gecko-main
> > thread?  (We can ignore binary extensions, but CTypes is still in the
> > picture.)  I suspect we're probably OK here, but we need to document
> > this model.  I'm not sure if there's anything we can do to enforce it.
> > 
> So basically, there is no threading when this code is used. These special
> dlopen functions for opening libraries from the apk are only called during
> initialization. The moz_mapped_dlopen shouldn't be visible (even though it is
> right now). The mozload function and everything else should not be accessible
> via ctypes. The only exposed functions that can call the special library
> loading code use a fixed list of libraries to load. You could mess with the
> library fd cache though, but your suggested change fixes that (as long as we
> return a const pointer).

OK.  I thought we might be doing something magical in dlopen().  Sounds good, but please document that.
(Assignee)

Comment 10

9 years ago
Attachment #486784 - Attachment is obsolete: true
Attachment #486859 - Flags: review?(jones.chris.g)
Comment on attachment 486859 [details] [diff] [review]
Use ashmemory to reduce copies of library code, v5

Looks good.

>diff --git a/ipc/glue/SharedMemoryBasic_android.cpp b/ipc/glue/SharedMemoryBasic_android.cpp
>--- a/ipc/glue/SharedMemoryBasic_android.cpp
>+++ b/ipc/glue/SharedMemoryBasic_android.cpp
>+static void
>+fillLibCache(const char *buf)
>+{
>+  ensureLibCache();
>+
>+  char * str = strdup(buf);
>+  if (!str)
>+    return;
>+
>+  char * saveptr;
>+  char * nextstr = str;
>+  do {
>+    struct lib_cache_info *info = &cache_mapping[cache_count];
>+
>+    char * name = strtok_r(nextstr, ":", &saveptr);
>+    if (!name)
>+      break;
>+    nextstr = NULL;
>+    strncpy(info->name, name, MAX_LIB_CACHE_NAME_LEN - 1);
>+
>+    char * fd_str = strtok_r(NULL, ";", &saveptr);
>+    if (!fd_str)
>+      break;
>+
>+    long int fd = strtol(fd_str, NULL, 10);
>+    if (fd == LONG_MIN || fd == LONG_MAX)
>+      break;
>+    info->fd = fd;
>+  } while (cache_count++ < MAX_LIB_CACHE_ENTRIES);
>+  free(str);
>+}

Please move the fd-string tokenization and strtol() before you write
into |info|.  Fd 0 is valid and might be mappable in some
circumstances, bad things could happen as a result.

>diff --git a/other-licenses/android/APKOpen.h b/other-licenses/android/APKOpen.h
>--- a/other-licenses/android/APKOpen.h
>+++ b/other-licenses/android/APKOpen.h
>@@ -45,6 +45,16 @@ struct mapping_info {
>   size_t offset;
> };
> 
>-extern struct mapping_info * lib_mapping;
>+extern const struct mapping_info * getLibraryMapping();
>+extern const struct lib_cache_info * getLibraryCache();
> 

No need for |extern| on these.

r=me with that.
Attachment #486859 - Flags: review?(jones.chris.g) → review+
(Assignee)

Comment 12

9 years ago
http://hg.mozilla.org/mozilla-central/rev/99233ad2ff70
Status: NEW → RESOLVED
Last Resolved: 9 years ago
Resolution: --- → FIXED
Backed out due to bug 608239

http://hg.mozilla.org/mozilla-central/rev/1cf4edfb9525
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
(Assignee)

Comment 15

9 years ago
Comment on attachment 486986 [details] [diff] [review]
Avoid fd remap collisions

\o/
Attachment #486986 - Flags: review?(mwu) → review+
(Assignee)

Comment 16

9 years ago
http://hg.mozilla.org/mozilla-central/rev/204b231f8d92
http://hg.mozilla.org/mozilla-central/rev/a5573c64b898
Status: REOPENED → RESOLVED
Last Resolved: 9 years ago9 years ago
Resolution: --- → FIXED
Filed bug 608376 on making the impl a bit more robust.
tracking-fennec: ? → ---
You need to log in before you can comment on or make changes to this bug.