788021 - Dump about:memory to a file upon receiving a signal

Assignee

Description

•

12 years ago

Patches in a moment.

Justin Lebar (not reading bugmail)

Assignee

Comment 1

•

12 years ago

Attached patch Bug 788021 - Part A: Implement signal handler for triggering about:memory dump. (obsolete) — Details — Splinter Review

These three patches are interdependent, and I'll land them folded together.
I'm posting them as three separate patches here because I'm touching three
different peoples' areas of focus.

The functions I'm calling in the signal handler are implemented in part C, and
they're trivially safe to call from a signal handler -- they simply atomically
set a global variable to 1.

I'm using SIGRTMIN and SIGRTMIN + 1 instead of SIGUSR1 and SIGUSR2 because
SIGUSR{1,2} are already used throughout our code for a variety of things.  It
looks like B2G uses SIGUSR1 for the alarm, for example.  Nobody seems to be
using the real-time signals.

One downside of using the realtime signals is that you can't do

  $ kill SIGUSR1 $(pidof firefox)

instead, you have to do

  $ kill -34 $(pid firefox)

It's -35 for SIGRTMIN + 1.  These numbers may be different on non-Linux
machines.  I hope that's the only problem with the approach I've taken here,
but this is unfamiliar territory to me.

(In fact, I don't understand why they're 34 and 35 and not 32 and 33 on my
machine, since /usr/include/linux/signal.h seems to include
/usr/include/asm/signal.h, which clearly defines SIGRTMIN as 32.)

Attachment #657926 - Flags: review?(mh+mozilla)

Justin Lebar (not reading bugmail)

Assignee

Comment 2

•

12 years ago

Attached patch Part B: Add and implement nsIMemoryReporterManager::DumpAboutMemory{,WithGC}(). (obsolete) — Details — Splinter Review

The meat of this patch is simply implementing the minimize-memory-usage button
in C++.  Let me know if you're not comfortable signing off on the Gecko gunk
here and we can definitely find someone else to look over the code.

Attachment #657928 - Flags: review?(n.nethercote)

Justin Lebar (not reading bugmail)

Assignee

Comment 3

•

12 years ago

Attached patch Part C: Add ScheduleAboutMemoryDump{,WithGC} functions to nsThread. (obsolete) — Details — Splinter Review

I'm not entirely comfortable with this patch, because it adds two more atomic
operations to the event loop, for what are very uncommon, low-priority events.

If you'd like, I think I can take this back down to one atomic op by using a
bitmask to shove [pending-low-memory, pending-about-memory-dump,
pending-about-memory-dump-with-gc] into one variable and using atomic
check-and-set operations to set each flag when appropriate.

But I don't have any idea how hot this code is, so I didn't want to prematurely
optimize.

Another option would be to check the pending-about-memory-dump variables at
some other point in time.  But I wasn't sure when else I could do the check.

Attachment #657931 - Flags: review?(benjamin)

Justin Lebar (not reading bugmail)

Assignee

Updated

•

12 years ago

Assignee: nobody → justin.lebar+bug

Nicholas Nethercote [inactive]

Comment 4

•

12 years ago

Comment on attachment 657926 [details] [diff] [review]
Bug 788021 - Part A: Implement signal handler for triggering about:memory dump.

Review of attachment 657926 [details] [diff] [review]:
-----------------------------------------------------------------

::: xpcom/base/nsMemoryReporterManager.cpp
@@ +548,5 @@
> +
> +// These are effectively constants, but we don't initialize them until
> +// RegisterSignalHandlers() is called to avoid creating static initializers.
> +static int sDumpAboutMemorySignum;
> +static int sDumpAboutMemoryAfterGCSignum;

static integers cause static initializers?  Huh.  In your 3rd patch you have some similar static integers that you initialize...

Nicholas Nethercote [inactive]

Comment 5

•

12 years ago

Comment on attachment 657928 [details] [diff] [review]
Part B: Add and implement nsIMemoryReporterManager::DumpAboutMemory{,WithGC}().

Review of attachment 657928 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good, thanks for doing this!

I'm still unclear how this functionality will be used in practice.  For Fennec, we have the front-end Java process and the back-end Gecko process, right?  The Gecko process obviously has the memory reporting infrastructure, but does the Java process have anything?  Does about:memory currently report anything for the Java process?

And for B2G, my understanding is that the process situation is fluid, but we'll likely end up with multiple processes.  What will control the sending of the signals to each process?  I guess that can be worked out later?  Also, about:memory can currently only load a single JSON file at a time, but it shouldn't be hard to change that to handle multiple files.

::: xpcom/base/nsIMemoryReporter.idl
@@ +302,5 @@
> +  /*
> +   * This is just like dumpReports(), except it first runs a series of
> +   * GC/CCs in an attempt to minimize the application's memory usage.
> +   */
> +  void dumpReportsAfterGC ();

Good idea.  Maybe dumpReportAfterGCAndCC?

::: xpcom/base/nsMemoryReporterManager.cpp
@@ +1061,5 @@
> +namespace {
> +
> +/**
> + * This runnable lets us do the equivalent of about:memory's minimize
> + * memory usage button: Fire a heap-minimize notification, spin the

Nit: "minimize memory usage" in quotes.

@@ +1067,5 @@
> + *
> + * When this process finishes, we invoke the callback function passed
> + * to the runnable's constructor.
> + */
> +class MinimizeMemoryUsageRunnable : public nsRunnable

Is it worth exposing this via XPCOM and removing minimizeMemoryUsage3x() from aboutMemory.js, to avoid the code duplication?  Just a thought -- feel free to ignore.

Attachment #657928 - Flags: review?(n.nethercote) → review+

Justin Lebar (not reading bugmail)

Assignee

Comment 6

•

12 years ago

> static integers cause static initializers?  Huh.  In your 3rd patch you have some similar static 
> integers that you initialize...

Well, static data set to 0 is free, regardless.

But actually, I think static integers set to SIGRTMIN {,+1} should be totally
fine.  It's just the C++ initializers which are a problem, AIUI.  But glandium
can tell us for sure.

> I'm still unclear how this functionality will be used in practice.  For
> Fennec, we have the front-end Java process and the back-end Gecko process,
> right?

AIUI it's all one process.

> The Gecko process obviously has the memory reporting infrastructure, but does
> the Java process have anything?  Does about:memory currently report anything
> for the Java process?

I don't believe we report any Java-specific metrics (e.g. Java heap size), but
since it's all one process, the RSS should still be correct.

> And for B2G, my understanding is that the process situation is fluid, but
> we'll likely end up with multiple processes.

We currently have multiple processes and have no plans to remove them, yes.

> What will control the sending of the signals to each process?

My intent is to write a script which will log in to the phone, send the proper
signal to each process, and pull all the files off the phone.  This script
could even merge the files into one big file.

(We scripts in the master B2G repository for "start b2g on the phone within a
remote debugging session", so this script would be in good company.)

We may have to do a bit of work to figure out how to identify each process
meaningfully in about:memory.  I'm not sure yet.

@@ +1067,5 @@
> + *
> + * When this process finishes, we invoke the callback function passed
> + * to the runnable's constructor.
> + */
> +class MinimizeMemoryUsageRunnable : public nsRunnable

> Is it worth exposing this via XPCOM and removing minimizeMemoryUsage3x() from
> aboutMemory.js, to avoid the code duplication?  Just a thought -- feel free
> to ignore.

I didn't do this because exposing callback functions in XPCOM is kind of a
pain, but now that I think about it, I agree we should do it if only for the
sake of not duplicating code.  I'll fix it up.

Nicholas Nethercote [inactive]

Comment 7

•

12 years ago

Thanks for the details!

> We may have to do a bit of work to figure out how to identify each process
> meaningfully in about:memory.  I'm not sure yet.

An identifying string argument to dumpReports() might be useful.

Justin Lebar (not reading bugmail)

Assignee

Updated

•

12 years ago

Blocks: 788072

Mike Hommey [:glandium]

Comment 8

•

12 years ago

Comment on attachment 657926 [details] [diff] [review]
Bug 788021 - Part A: Implement signal handler for triggering about:memory dump.

Review of attachment 657926 [details] [diff] [review]:
-----------------------------------------------------------------

::: xpcom/base/nsMemoryReporterManager.cpp
@@ +548,5 @@
> +
> +// These are effectively constants, but we don't initialize them until
> +// RegisterSignalHandlers() is called to avoid creating static initializers.
> +static int sDumpAboutMemorySignum;
> +static int sDumpAboutMemoryAfterGCSignum;

Simple types like int have no static initializer problems. However, since these values are constants, please make them const, at least they'd have a chance of being completely inlined and not waste memory.

@@ +565,5 @@
> +void
> +RegisterSignalHandlers()
> +{
> +  sDumpAboutMemorySignum = SIGRTMIN;
> +  sDumpAboutMemoryAfterGCSignum = SIGRTMIN + 1;

The more we're going to have code like that, the more we're going to have problems finding signal codes to use. Maybe it's time to give up on "simple" signals and use a single dispatch signal handler which would react on data given by the app sending the signal. (man sigqueue)
It would require an external program to send the signal, but at least, we wouldn't starve on signal codes. But that'd be for another bug.

Attachment #657926 - Flags: review?(mh+mozilla) → review+

Justin Lebar (not reading bugmail)

Assignee

Comment 9

•

12 years ago

Comment on attachment 657926 [details] [diff] [review]
Bug 788021 - Part A: Implement signal handler for triggering about:memory dump.

This may not work on the device.  The first problem is that the device's kill doesn't take values over 32, so I'm going to have to write that myself.  But then it's not clear that Gecko has a tmp directory to write to.

Anyway, I will continue to investigate.

(not currently active) Ted Mielczarek

Comment 10

•

12 years ago

On B2G you should be able to write to /data/local/b2g, which I *think* should be set as the TMPDIR environment variable.

Mike Hommey [:glandium]

Comment 11

•

12 years ago

The directory service should give proper TmpD locations.

Justin Lebar (not reading bugmail)

Assignee

Comment 12

•

12 years ago

(In reply to Ted Mielczarek [:ted] from comment #10)
> On B2G you should be able to write to /data/local/b2g, which I *think*
> should be set as the TMPDIR environment variable.

Turns out it's /data/local/tmp, but yes, it works!

Justin Lebar (not reading bugmail)

Assignee

Comment 13

•

12 years ago

It takes about 30s or so for a single dump to appear on my Otoro phone.  That's kind of a long time.

If we're I/O bound, gzip'ing the output might make things faster.

Justin Lebar (not reading bugmail)

Assignee

Comment 14

•

12 years ago

Oh, it also looks like these nsIFileOutputStream::Write() calls are unbuffered.  :(

Nicholas Nethercote [inactive]

Comment 15

•

12 years ago

Sigh.  The rss/pss/vsize/swap trees will be accounting for a large chunk of that 30s (I'm assuming they're present on Android?)

Justin Lebar (not reading bugmail)

Assignee

Comment 16

•

12 years ago

(In reply to Nicholas Nethercote [:njn] from comment #15)
> Sigh.  The rss/pss/vsize/swap trees will be accounting for a large chunk of
> that 30s (I'm assuming they're present on Android?)

Yes and yes.

It goes down to ~5s if I buffer the output.  I'm trying to figure out how to compress it now...

Benjamin Smedberg

Updated

•

12 years ago

Attachment #657931 - Flags: review?(benjamin) → review+

Justin Lebar (not reading bugmail)

Assignee

Comment 17

•

12 years ago

With gzip compression, it's roughly 1.5s, so that's a win.

Justin Lebar (not reading bugmail)

Assignee

Comment 18

•

12 years ago

Sorry, guys: I should have tested this more extensively on the device before asking for reviews.

Now that I'm testing in an automated fashion (without touching the phone for long periods), I've noticed that we often send a signal to a content process on the device whose event loop is entirely idle.  As a result, it never notices that the "dump about:memory" bit is set, and we never dump anything.

It's not simple to poke the event loop into running again when we hear the signal, since the signal could happen at any time and on any thread.  glandium suggested we could use the gecko remote debugging API for this, although I'm hesitant to try it because it's so new.  (It wasn't even landed as of a few days ago, iirc.)

Sorry again for not testing well in the first place.

Justin Lebar (not reading bugmail)

Assignee

Comment 19

•

12 years ago

> glandium suggested we could use the gecko remote debugging API for this, although I'm hesitant to 
> try it because it's so new.  (It wasn't even landed as of a few days ago, iirc.)

My plan of the moment is to send a signal only to the main process, whose event loop seems to run fairly often, and then have the main process send IPDL messages to its children instructing them to dump their about:memory contents.  We'll see if that works, but any other ideas are of course appreciated.

(not currently active) Ted Mielczarek

Comment 20

•

12 years ago

It's not that hard to fire an event from a background thread, but I don't know how safe it is from a signal handler.

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Comment 21

•

12 years ago

<3

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Comment 22

•

12 years ago

(In reply to Ted Mielczarek [:ted] from comment #20)
> It's not that hard to fire an event from a background thread, but I don't
> know how safe it is from a signal handler.

Can't do that from a signal handler, ja.

Justin Lebar (not reading bugmail)

Assignee

Updated

•

12 years ago

Depends on: 790417

Justin Lebar (not reading bugmail)

Assignee

Comment 23

•

12 years ago

As an update as to where I'm at here: With gzip compression, we can dump and pull four processes' memory reports in a second or two.  It's working rather smoothly.

cjones suggested that we could improve our current signal mechanism by poll()'ing a signalfd (of /course/ Linux lets you treat a signal like a file) on B2G's I/O thread, as an alternative to the current mechanism.  Or we could even poll() a named pipe.

This change would be an improvement in two ways:

 - With the current approach, I had to write my own |kill|, because the one on the phone refused to send the realtime signals we're listening to, and

 - With the current approach, when the main process's event loop is quiescent (e.g. the screen is off and you haven't touched the phone in a few minutes), you have to touch the phone again before we notice the signal.

This would theoretically let us go back to my previous approach of signaling each process, as opposed to signaling only the main process.  But I actually like the current approach of signaling only the main process better because it lets us associate a unique identifier with the whole batch of about:memory dumps.

Another thing I still would like to address is the naming of processes.  Right now we name the process in the memory report only by its PID, but we can probably do better than that.  But I may punt this to a follow-up.

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Comment 24

•

12 years ago

dhylands already landed the code to prctl(PR_SET_NAME) with the app name (or "(App)" for the prelaunch or "Browser").  You could draft off of that by extending his interface to prctl(PR_GET_NAME) what he's already done the work to set :).

Dave Hylands [:dhylands]

Comment 25

•

12 years ago

The code to set the name is in ipc/glue/ProcessUtils_linux.cpp and a dummy version in ipc/glue/ProcessUtils_none.cpp (for non-linux platforms).

For non-linux platforms we should probably just stash whatever name was set and return it in the get.

Justin Lebar (not reading bugmail)

Assignee

Comment 27

•

12 years ago

I've had a heck of a time trying to get a named pipe to work.

The trick, as glandium explained to me, is that you have to open the fifo as non-blocking, because otherwise open() will block until the other end of the fifo is opened.  But then we have to switch it to a blocking stream after you've opened it, so the I/O thread can select() on it and notify you when the fd is available for read without blocking.

In theory, you can fcntl the fifo to blocking after you open it.  But in practice,

  fcntl(mFd.get(), F_SETFL, 0)

seems to have no effect with respect to whether the fifo is blocking.

I guess I'll see if I have any more luck with a signalfd...

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Comment 28

•

12 years ago

(In reply to Justin Lebar [:jlebar] from comment #27)
> I've had a heck of a time trying to get a named pipe to work.
> 
> But then we have to switch it to a blocking stream after
> you've opened it, so the I/O thread can select() on it and notify you when
> the fd is available for read without blocking.
> 

epoll should be perfectly happy with either.  Nonblocking just means that if you try to read/write more than current capacity, you immediately get an EWOULDBLOCK error.

Were you getting an error from epoll when the named pipe was in nonblocking mode?

Justin Lebar (not reading bugmail)

Assignee

Updated

•

12 years ago

Attachment #657926 - Attachment is obsolete: true

Justin Lebar (not reading bugmail)

Assignee

Updated

•

12 years ago

Attachment #657928 - Attachment is obsolete: true

Justin Lebar (not reading bugmail)

Assignee

Updated

•

12 years ago

Attachment #657931 - Attachment is obsolete: true

Justin Lebar (not reading bugmail)

Assignee

Comment 29

•

12 years ago

Attached patch Bug 788021 - Part 1: Add nsGzipWriter. (obsolete) — Details — Splinter Review

I'm happy to take suggestions for where to put these files; xpcom/base is
pretty random.

Note that this is slightly different from all the Necko code we have to write
gzip data, because raw gzip data is not a valid .gz file.

Now that I think  of it, perhaps we should change the name of this class to
reflect that difference.  nsGZFileWriter, maybe?

Attachment #661013 - Flags: review?(mh+mozilla)

Justin Lebar (not reading bugmail)

Assignee

Comment 30

•

12 years ago

> Were you getting an error from epoll when the named pipe was in nonblocking mode?

The chromium event loop (which I admit I didn't investigate too deeply) was calling "CanReadWithoutBlocking" in an infinite loop.  I presumed that the problem was related to blocking, although perhaps that's wrong...

Justin Lebar (not reading bugmail)

Assignee

Comment 31

•

12 years ago

Attached patch Bug 788021 - Part 2a: Add PContent::DumpMemoryReportsToFile. — Details — Splinter Review

As mentioned earlier, if we use signals (as we do in this set of patches), we
could signal each process separately, and then we wouldn't need IPC support.

However, I like this because it lets the root process attach an identifier to a
set of memory reports.  We couldn't do that if we signaled each process
separately.

Attachment #661017 - Flags: review?(jones.chris.g)

Justin Lebar (not reading bugmail)

Assignee

Comment 32

•

12 years ago

Attached patch Bug 788021 - Part 2b: Signal handling goop. — Details — Splinter Review

glandium did the last review of the Unix code here, but I'm touching the
chromium IO thread, so perhaps cjones is a better person to look at the new
version.

Attachment #661018 - Flags: review?(jones.chris.g)

Justin Lebar (not reading bugmail)

Assignee

Comment 33

•

12 years ago

Attached patch Bug 788021 - Part 2c: Remaining changes to nsMemoryReporterManager. — Details — Splinter Review

The other patches here are entirely different from the ones before; this one is
only mostly different.  The parts that are the same are mainly the
implementation of minimize-memory-usage.

Attachment #661020 - Flags: review?(n.nethercote)

Justin Lebar (not reading bugmail)

Assignee

Comment 34

•

12 years ago

Attached patch Bug 788021 - Part 3: Use nsMemoryReporterManager::MinimizeMemoryUsage in aboutMemory.js. — Details — Splinter Review

Attachment #661021 - Flags: review?(n.nethercote)

Justin Lebar (not reading bugmail)

Assignee

Comment 35

•

12 years ago

PR for "killer" program in gonk-misc (like kill(1), except it can send signals over 32)

  https://github.com/mozilla-b2g/gonk-misc/pull/26

PR for "get-about-memory" script in B2G (which uses killer)

  https://github.com/mozilla-b2g/B2G/pull/112

Nicholas Nethercote [inactive]

Comment 36

•

12 years ago

Comment on attachment 661013 [details] [diff] [review]
Bug 788021 - Part 1: Add nsGzipWriter.

Review of attachment 661013 [details] [diff] [review]:
-----------------------------------------------------------------

Beautiful comments in this patch, nice work.

::: xpcom/base/nsGzipWriter.cpp
@@ +71,5 @@
> +
> +  // gzwrite uses a return value of 0 to indicate failure.  Otherwise, it
> +  // returns the number of uncompressed bytes written.  To ensure we can
> +  // distinguish between success and failure, don't call gzwrite when we have 0
> +  // bytes to write.

Nice comment.

@@ +79,5 @@
> +
> +  // gzwrite never does a short write -- that is, the return value should
> +  // always be either 0 or aLen, and we shouldn't have to call it multiple
> +  // times in order to get it to read the whole buffer.
> +  int rv = gzwrite(mGZFile, aStr, aLen);

Another nice comment.

@@ +95,5 @@
> +  mFinished = true;
> +  gzclose(mGZFile);
> +
> +  // Ignore errors from gzclose; it's not like there's anything we can do about
> +  // it, at this point!

Another nice comment!

::: xpcom/base/nsIGzipWriter.idl
@@ +34,5 @@
> +   */
> +  void write(in ACString str);
> +
> +  /**
> +   * Write the given char* to the file (not including the null-terminator).

|char*| or |string|?  (Is this an xpcom synonym or something?)

Justin Lebar (not reading bugmail)

Assignee

Comment 37

•

12 years ago

Attached patch (Not working) attempt at using a fifo — Details — Splinter Review

Chris/glandium, I don't know if you're curious to look at this.  I'm reasonably satisfied with the current approach, but if we could make a fifo work, it would be easier to signal the process (no need for this extra "killer" program) and we would be able to pass data in with the signal.

With this patch (the relevant fifo stuff is at the top), OnFileCanReadWithoutBlocking gets called over and over again after we do a write to the fifo (echo -n 1 > /tmp/firefox-fifo).  All the reads are zero-length.  Neither the fcntl nor the read sets errno.

Maybe the zero-length reads are informing me that the writer closed its end of the fifo?  I'm not sure what the right thing would be to do at this point -- re-create the fifo, maybe?

Nicholas Nethercote [inactive]

Comment 38

•

12 years ago

Comment on attachment 661017 [details] [diff] [review]
Bug 788021 - Part 2a: Add PContent::DumpMemoryReportsToFile.

Review of attachment 661017 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/ipc/ContentChild.cpp
@@ +427,5 @@
>  }
>  
> +bool
> +ContentChild::RecvDumpMemoryReportsToFile(const nsString& aIdentifier,
> +                                          const bool& aMinimizeMemoryUsage,

Would |aMinimizeMemoryUsageFirst| be a better name (here and in all the other places)?

@@ +435,5 @@
> +        do_GetService("@mozilla.org/memory-reporter-manager;1");
> +    NS_ENSURE_TRUE(mgr, true);
> +    mgr->DumpMemoryReportsToFile(aIdentifier,
> +                                 aMinimizeMemoryUsage,
> +                                 aDumpChildProcesses);

How does this work?  nsIMemoryReporterManager doesn't have a DumpMemoryReportsToFile() method, AFAICT...

Nicholas Nethercote [inactive]

Comment 39

•

12 years ago

Comment on attachment 661018 [details] [diff] [review]
Bug 788021 - Part 2b: Signal handling goop.

Review of attachment 661018 [details] [diff] [review]:
-----------------------------------------------------------------

::: xpcom/base/nsMemoryReporterManager.cpp
@@ +553,5 @@
> +#ifdef XP_UNIX // {
> +namespace {
> +
> +/*
> + * The following code supports dumping about:memory upon receiving a signal.

Nit: it dumps all the memory reports.  about:memory shows most but not all of them (e.g. it ignores the compartment and ghost window reports).

Nicholas Nethercote [inactive]

Updated

•

12 years ago

Attachment #661021 - Flags: review?(n.nethercote) → review+

Nicholas Nethercote [inactive]

Comment 40

•

12 years ago

Comment on attachment 661020 [details] [diff] [review]
Bug 788021 - Part 2c: Remaining changes to nsMemoryReporterManager.

Review of attachment 661020 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good, except I'm puzzled by the use of the temporary "incomplete" file.

BTW, will the signal stuff only work on Fennec/B2G, or will it work on desktop as well?

::: xpcom/base/nsIMemoryReporter.idl
@@ +306,5 @@
> +   * similar, such as memory-reports-<identifier>-<pid>-1.json.gz;  no existing
> +   * file will be overwritten).
> +   *
> +   * @param identifier this identifier will appear in the filename of our
> +   *   about:memory dump and those of our children.  If the identifier is

"those of our children" -- unless dumpChildProcesses is false.

@@ +309,5 @@
> +   * @param identifier this identifier will appear in the filename of our
> +   *   about:memory dump and those of our children.  If the identifier is
> +   *   empty, the dumpMemoryReportsToFile implementation may set it arbitrarily
> +   *   and use that new value for its own dump and the dumps of its child
> +   *   processes.  For example, we may set |identifier| to the number of

Who is "we" in this sentence?  Does it refer to the dumpMemoryReportsToFile implementation?

@@ +391,5 @@
>     * }
>     */
> +  void dumpMemoryReportsToFile (in AString identifier,
> +                                in bool minimizeMemoryUsage,
> +                                in bool dumpChildProcesses);

I thought you didn't like bool parameters :P

::: xpcom/base/nsMemoryReporterManager.cpp
@@ +1086,5 @@
> +
> +    return DumpMemoryReportsToFileImpl(identifier);
> +}
> +
> +#define DUMP_ASCII(o, s) \

A name with four letters would be nice, so that calls to this match calls to DUMP().

Actually, I don't think having two different versions improves things much -- you get rid of five get() calls below at the cost of a duplicated macro here and a duplicated function in nsGzipWriter.  I'd just have one of each, but I'll let you decide what to do.

@@ +1109,5 @@
>      // We only want to dump reports for this process.  If |aProcess| is
>      // non-NULL that means we've received it from another process in response
>      // to a "child-memory-reporter-request" event;  ignore such reports.
>      if (!aProcess.IsEmpty()) {
>          return NS_OK;    

(Pre-existing) trailing whitespace.

@@ +1178,5 @@
> +    //
> +    //   incomplete-memory-report-<-identifier>-<pid>-42.json.gz
> +    //
> +    // in NS_OS_TEMP_DIR for writing.  When we're finished writing the report,
> +    // we'll rename this file and get rid of the "incomplete-" prefix.

I don't understand why the temporary file is necessary.  If it really is, please explain why in the comment.

@@ +1202,4 @@
>      NS_ENSURE_SUCCESS(rv, rv);
>     
>      rv = tmpFile->CreateUnique(nsIFile::NORMAL_FILE_TYPE, 0600); 
>      NS_ENSURE_SUCCESS(rv, rv);

Remove the trailing whitespace while you're here.

@@ +1298,5 @@
>  
>      return NS_OK;
>  }
>  
>  #undef DUMP

Add |#undef DUMP_ASCII|, if you decide to keep it.

@@ +1305,5 @@
> +
> +/**
> + * This runnable lets us implement nsIMemoryReporterManager::MinimizeMemoryUsage().
> + * We fire a heap-minimize notification, spin the event loop, and repeat this
> + * process a few times times.

"times times"

@@ +1337,5 @@
> +
> +    os->NotifyObservers(nullptr, "memory-pressure",
> +                        NS_LITERAL_STRING("heap-minimize").get());
> +    mRemainingIters--;
> +    NS_DispatchToMainThread(this);

I like the way you sort of inverted the logic from the aboutMemory.js version, it's easier to understand the counting now.

Attachment #661020 - Flags: review?(n.nethercote) → review+

Justin Lebar (not reading bugmail)

Assignee

Comment 41

•

12 years ago

> BTW, will the signal stuff only work on Fennec/B2G, or will it work on desktop as well?

It definitely works on desktop Linux.  I'm not sure if it works on Mac; I haven't tried.

I was thinking of pref'ing it off on desktop, though.  Do you think it's worth leaving on?

> +  void dumpMemoryReportsToFile (in AString identifier,
> +                                in bool minimizeMemoryUsage,
> +                                in bool dumpChildProcesses);
>
> I thought you didn't like bool parameters :P

I don't.  :(  I actually tried to make this nicer, but it was hard because have to pass the bools through IPDL and XPIDL, which meant I needed both an XPIDL and an IPDL args struct, and...the only thing worse than this API was the alternative I came up with.

Oh well.

> I don't understand why the temporary file is necessary.  If it really is, please explain why in 
> the comment.

The temporary file helps avoid a race condition when we're pulling files off the device.  We signal the processes and then poll to grab the files.  But we don't want to pull an incomplete file.  I'll definitely add a comment.

> How does this work?  nsIMemoryReporterManager doesn't have a DumpMemoryReportsToFile() method, 
> AFAICT...

You saw that I changed it in part C.  I changed it because I wanted it to match the IPDL name, and I thought the IPDL name needed more context than "dumpReports".

> Actually, I don't think having two different versions improves things much -- you get rid of five 
> get() calls below at the cost of a duplicated macro here and a duplicated function in 
> nsGzipWriter.  I'd just have one of each, but I'll let you decide what to do.

It's mostly a premature optimization to avoid unnecessary calls to strlen.  It's (probably) not  important here, but it's important for the interface, I think, if only so people can push non null-terminated bytes to a gz file.

The ideal interface for nsGzipWriter would have an overloaded method Write() which could take a char* or an nsACString.  I thought you couldn't do that in XPIDL, but actually, I bet I could, if I just made one of the methods noscript (or in a C++ block).  You wouldn't lose anything because JS could still call the other method...

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Updated

•

12 years ago

Attachment #661017 - Flags: review?(jones.chris.g) → review+

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Comment 42

•

12 years ago

Comment on attachment 661018 [details] [diff] [review]
Bug 788021 - Part 2b: Signal handling goop.

I hate to rain on this parade, but libevent already provides an
interface to do what you're doing here :(.

See http://mxr.mozilla.org/mozilla-central/source/ipc/chromium/src/chrome/common/process_watcher_posix_sigchld.cc#201 .

Let's use that instead.  It'll be a small change and much less code.

Attachment #661018 - Flags: review?(jones.chris.g)

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Updated

•

12 years ago

Blocks: 783230

Mike Hommey [:glandium]

Comment 43

•

12 years ago

Comment on attachment 661013 [details] [diff] [review]
Bug 788021 - Part 1: Add nsGzipWriter.

Review of attachment 661013 [details] [diff] [review]:
-----------------------------------------------------------------

LGTM, but I'm not an xpcom peer :)

::: xpcom/base/nsGzipWriter.h
@@ +18,5 @@
> +
> +  NS_DECL_ISUPPORTS
> +  NS_DECL_NSIGZIPWRITER
> +
> +  nsresult Write(const char* aStr, uint32_t aLen);

Any particular reason to make this one public, when it seems to be an implementation detail?

Attachment #661013 - Flags: review?(mh+mozilla)

Attachment #661013 - Flags: review?(benjamin)

Attachment #661013 - Flags: review+

Justin Lebar (not reading bugmail)

Assignee

Comment 44

•

12 years ago

> LGTM, but I'm not an xpcom peer :)

Thing is, it's not really XPCOM code; it just happens to live in xpcom/.  I could move it to toolkit/ and it would suddenly come under an entirely different set of peoples' purview...

> Any particular reason to make this one public, when it seems to be an implementation 
> detail?

I guess I should either put it in the interface or make it private.  I figured it might be useful to someone (say you're reading a buffer from disk; you shouldn't have to wrap that in a nsDependentCString just to appease the interface Gods).

Justin Lebar (not reading bugmail)

Assignee

Comment 45

•

12 years ago

Comment on attachment 661013 [details] [diff] [review]
Bug 788021 - Part 1: Add nsGzipWriter.

I'll attach an updated version of this patch in a few minutes.

Attachment #661013 - Flags: review?(benjamin)

Justin Lebar (not reading bugmail)

Assignee

Comment 46

•

12 years ago

I have a new set of patches, but unfortunately my device is refusing to even start Gecko.  If the past is any guide, I should wait until I can test on the device before assuming everything will work properly, so I'll ask for reviews once I can test.

Justin Lebar (not reading bugmail)

Assignee

Comment 47

•

12 years ago

Looks like it's dying on the device with a null-pointer exception in libevent.  (It appeared from logcat that gecko was never starting, but in fact Gecko started but very quickly died.)

#0  0x40cf289c in evsignal_add (ev=0x14950b0)
    at ../../../gecko/ipc/chromium/src/third_party/libevent/signal.c:221
#1  0x40cf2b92 in epoll_add (arg=0x142c418, ev=0x14950b0)
    at ../../../gecko/ipc/chromium/src/third_party/libevent/epoll.c:265
#2  0x40ceecec in event_add (ev=0x14950b0, tv=0x0)
    at ../../../gecko/ipc/chromium/src/third_party/libevent/event.c:723
#3  0x40d0aa0a in base::MessagePumpLibevent::CatchSignal (this=0x142c3e8, sig=32, 
    sigevent=0x149506c, delegate=0x1495068)
    at ../../../gecko/ipc/chromium/src/base/message_pump_libevent.cc:281
#4  0x40cf900c in MessageLoopForIO::CatchSignal (this=<value optimized out>, sig=32, 
    sigevent=0x0, delegate=0x0) at ../../../gecko/ipc/chromium/src/base/message_loop.cc:581
#5  0x40ce1dde in Start (this=0x1495068)
    at ../../../gecko/xpcom/base/nsMemoryReporterManager.cpp:709

The line we're dying on in frame #0 is

221		TAILQ_INSERT_TAIL(&sig->evsigevents[evsignal], ev, ev_signal_next);

It's hard to tell exactly what's null here; |sig| is optimized out, and anyway who knows what this macro is doing.  It's definitely a write-to-null error, though, judging by the assembly.

I didn't see this on desktop.  Maybe the difference is that I have an opt build on the device and had a debug build on desktop.  Or maybe libevent doesn't like signal 32 on ARM for some reason.  I'll keep digging...

Justin Lebar (not reading bugmail)

Assignee

Comment 48

•

12 years ago

Looks like libevent is not safe to use with signals over 31 on either x86 or ARM.

In libevent's evsignal.h, we do

struct evsignal_info {
  [...]
  struct event_list evsigevents[NSIG];
  sig_atomic_t evsigcaught[NSIG];
  [...]
}

NSIG is defined by the Linux headers to be 32, on both x86 and ARM.

http://lxr.free-electrons.com/source/arch/x86/include/asm/signal.h#L37
http://lxr.free-electrons.com/source/arch/arm/include/asm/signal.h#L26

libevent even asserts that the signal number is less than NSIG, so it's not clear why my code didn't hit this assert when I ran x86-debug, unless libevent asserts are disabled even in debug builds.

I'm re-requesting review on the r-'ed patch, because this doesn't look simple to fix in libevent.  In particular, the Linux headers say not to assume SIGRTMAX is a constant, so I can't just substitute SIGRTMAX + 1 for NSIG everywhere.  (I did observe that it's not actually a constant under some build configuration, although I now forget exactly what I did.)

Other suggestions are welcome, of course.  :)

Justin Lebar (not reading bugmail)

Assignee

Comment 49

•

12 years ago

Comment on attachment 661018 [details] [diff] [review]
Bug 788021 - Part 2b: Signal handling goop.

I guess an alternative would be to go back to using a FIFO instead of a signal.  I think that might work if I explicitly close and then re-open the pipe after read() starts returning 0.

But at this point, I'm not convinced we should do that rather than use the thing that I already have working.

Attachment #661018 - Flags: review?(jones.chris.g)

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Comment 50

•

12 years ago

(In reply to Justin Lebar [:jlebar] from comment #49)
> Comment on attachment 661018 [details] [diff] [review]
> Bug 788021 - Part 2b: Signal handling goop.
> 
> But at this point, I'm not convinced we should do that rather than use the
> thing that I already have working.

Please file a followup bug for this problem.

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Comment 51

•

12 years ago

Comment on attachment 661018 [details] [diff] [review]
Bug 788021 - Part 2b: Signal handling goop.

>diff --git a/xpcom/base/nsMemoryReporterManager.cpp b/xpcom/base/nsMemoryReporterManager.cpp

>+void
>+DumpAboutMemorySignalHandler(int aSignum)
>+{

Add a big scary comment noting that every bit of code called from here
has to be async-signal-safe.

>+  if (sDumpAboutMemoryPipeWriteFd != 0) {
>+    uint8_t signum = (uint8_t)aSignum;

Nit: reinterpret_cast.

>+void
>+InitializeDumpAboutMemoryWatcher()
>+{
>+  MOZ_ASSERT(NS_IsMainThread());
>+

Also assert !sSignalFdWatcher.

Looks good.  r=me with these.  Too bad libevent is broken, sorry about
that :/.

Attachment #661018 - Flags: review?(jones.chris.g) → review+

Justin Lebar (not reading bugmail)

Assignee

Comment 52

•

12 years ago

> Please file a followup bug for this problem.

I filed bug 794074.

Justin Lebar (not reading bugmail)

Assignee

Comment 53

•

12 years ago

Attached patch Part 1, v2: Add nsGZFileWriter. (obsolete) — Details — Splinter Review

I made some not-insignificant changes to this since glandium looked at this,
although this patch is still pretty trivial.

Attachment #664505 - Flags: review?(benjamin)

Justin Lebar (not reading bugmail)

Assignee

Updated

•

12 years ago

Attachment #661013 - Attachment is obsolete: true

Justin Lebar (not reading bugmail)

Assignee

Comment 54

•

12 years ago

Comment on attachment 664505 [details] [diff] [review]
Part 1, v2: Add nsGZFileWriter.

bsmedberg, a few of us are itching to get this into B2G, so if you're comfortable delegating the review to cjones, that would help us.  There's really nothing interesting here beyond the zlib usage that glandium already signed off on; imo it's silly this has to go through an XPCOM peer.  :)

Attachment #664505 - Attachment description: Part 1, v2: Add nsGzipWriter. → Part 1, v2: Add nsGZFileWriter.

Benjamin Smedberg

Comment 55

•

12 years ago

Comment on attachment 664505 [details] [diff] [review]
Part 1, v2: Add nsGZFileWriter.

It's not clear to me why we need the nsIGZFileWriter interface at all; the current use is calling entirely from c++, correct? If so, we could (should?) make this just a C++ class.

However if the IDL interface is important because this is is scripted somewhere, you need to remove the %{C++ block: the XPT file will have the wrong vtable entry slot for all following functions in the file, which means that calling finish() from script will actually call Write(const char* str). The simple way of doing this is to avoid the overload with Write2/Write3. Or just move those methods into the C++ class out of the interface.

Also void write(in ACString) is going to produce unexpected results for most callers if they pass in any non-ASCII text because the data will be byte-truncated instead of converted to UTF8. Pretty much all scriptable interfaces should use AUTF8String instead.

Attachment #664505 - Flags: review?(benjamin) → review-

Justin Lebar (not reading bugmail)

Assignee

Comment 56

•

12 years ago

Thanks a lot for the review; this obviously wasn't as correct as I thought.  :)

> It's not clear to me why we need the nsIGZFileWriter interface at all; the current use is calling 
> entirely from c++, correct?

It's used entirely from C++, but it unfortunately passes through XPCOM an nsISupports.

I could have just the one write method on the interface and then static_cast to the concrete type to gain access to the other methods, but that's not great from an ergonomic pov.  Or I could make the overloads non-virtual, by doing something like:

  write(in AUTF8String);
  writeLen(in string, in length);

  %{C++
  nsresult write(const char* s) {
    return writeLen(s, strlen(s));
  }
  %}

I'd still prefer that writeLen be called "write", but I think I could live with the interface above.  Do you have any preference here?

Justin Lebar (not reading bugmail)

Assignee

Comment 57

•

12 years ago

> I'd still prefer that writeLen be called "write", but I think I could live with the interface above.

Actually...I guess I could wrap the char* in a string class, and that would obviate the need to have writeLen on the JS interface.  Then I /could/ get the interface I want...

Benjamin Smedberg

Comment 58

•

12 years ago

You could add another overload also

%{C++
nsresult Write(const char* s, uint32_t len) {
  return WriteLen(s, len);
}
%}

In any case having virtuals in the C++ block isn't good.

Justin Lebar (not reading bugmail)

Assignee

Comment 59

•

12 years ago

Attached patch Part 1, v3: nsGZFileWriter — Details — Splinter Review

Let's see if git bz does the right thing with this attachment...

Attachment #664505 - Attachment is obsolete: true

Justin Lebar (not reading bugmail)

Assignee

Updated

•

12 years ago

Attachment #664975 - Flags: review?(benjamin)

Benjamin Smedberg

Updated

•

12 years ago

Attachment #664975 - Flags: review?(benjamin) → review+

Justin Lebar (not reading bugmail)

Assignee

Comment 60

•

12 years ago

Pushing to try mostly to check the latest bits against a Mac build: https://tbpl.mozilla.org/?tree=Try&rev=1f1b82d8a1a9

Justin Lebar (not reading bugmail)

Assignee

Comment 61

•

12 years ago

Ah, looks like MacOS doesn't have realtime signals.  I'll just make this Linux-only.  :shrug:

Justin Lebar (not reading bugmail)

Assignee

Comment 62

•

12 years ago

https://tbpl.mozilla.org/?tree=Try&rev=2af72d4e38be

Justin Lebar (not reading bugmail)

Assignee

Comment 63

•

12 years ago

We'll get there...

https://tbpl.mozilla.org/?tree=Try&rev=69a5ada834c4

Justin Lebar (not reading bugmail)

Assignee

Comment 64

•

12 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/15cc0aabab67
https://hg.mozilla.org/integration/mozilla-inbound/rev/04af81445c05
https://hg.mozilla.org/integration/mozilla-inbound/rev/17b5b5dcd308

Nicholas Nethercote [inactive]

Comment 65

•

12 years ago

\o/

Phil Ringnalda (:philor)

Comment 66

•

12 years ago

Backed out in https://hg.mozilla.org/integration/mozilla-inbound/rev/80f9522602c5 because Linux xpcshell was crashing after a few tens of thousands of "WARNING: Error reading from buffer in SignalPipeWatcher::OnFileCanReadWithoutBlocking.: file ../../../xpcom/base/nsMemoryReporterManager.cpp, line 786" (and I think I'm going to blame the talos crash in https://tbpl.mozilla.org/php/getParsedLog.php?id=15728133&tree=Mozilla-Inbound on this, too).

Justin Lebar (not reading bugmail)

Assignee

Comment 67

•

12 years ago

This "don't run all tests on try" thing is not working out as well as I might have thought...

Justin Lebar (not reading bugmail)

Assignee

Comment 68

•

12 years ago

This may fix it.  https://tbpl.mozilla.org/?tree=Try&rev=6bfce6721dbd

Phil Ringnalda (:philor)

Comment 69

•

12 years ago

Odd, "don't run all tests on try" is working exactly how I would have thought.

Justin Lebar (not reading bugmail)

Assignee

Comment 70

•

12 years ago

(In reply to Justin Lebar [:jlebar] from comment #68)
> This may fix it.  https://tbpl.mozilla.org/?tree=Try&rev=6bfce6721dbd

It would help if I didn't comment out the fix.

https://tbpl.mozilla.org/?tree=Try&rev=8ce27fa2ae4e

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Updated

•

12 years ago

Blocks: slim-fast

Justin Lebar (not reading bugmail)

Assignee

Comment 71

•

12 years ago

remote:   https://hg.mozilla.org/integration/mozilla-inbound/rev/bab50a5494bb
remote:   https://hg.mozilla.org/integration/mozilla-inbound/rev/deb340fb0b07
remote:   https://hg.mozilla.org/integration/mozilla-inbound/rev/a67a2a9abebb

Ed Morley [:emorley]

Comment 72

•

12 years ago

https://hg.mozilla.org/mozilla-central/rev/bab50a5494bb
https://hg.mozilla.org/mozilla-central/rev/deb340fb0b07
https://hg.mozilla.org/mozilla-central/rev/a67a2a9abebb

Status: NEW → RESOLVED

Closed: 12 years ago

Resolution: --- → FIXED

Target Milestone: --- → mozilla18

Justin Lebar (not reading bugmail)

Assignee

Updated

•

12 years ago

Blocks: 797904

Landry Breuil (:gaston)

Comment 73

•

12 years ago

Attached patch small followup fix for OpenBSD — Details — Splinter Review

Quick followup, this change is needed to fix the build on OpenBSD after the last commits. See http://buildbot.rhaalovely.net/builders/mozilla-central-amd64/builds/535/steps/build/logs/stdio for the failure and bugs #788012 / #785738 for precedents of the same fix.

Attachment #668373 - Flags: review?(justin.lebar+bug)

Justin Lebar (not reading bugmail)

Assignee

Comment 74

•

12 years ago

Comment on attachment 668373 [details] [diff] [review]
small followup fix for OpenBSD

Okay, but this is really silly, and we should push to get it fixed in NSPR (bug 634793).

Do you need me to push this for you?

Attachment #668373 - Flags: review?(justin.lebar+bug) → review+

Landry Breuil (:gaston)

Comment 75

•

12 years ago

(In reply to Justin Lebar [:jlebar] from comment #74)
> Comment on attachment 668373 [details] [diff] [review]
> small followup fix for OpenBSD
> 
> Okay, but this is really silly, and we should push to get it fixed in NSPR
> (bug 634793).

I cant agree more..

> Do you need me to push this for you?

Pushed :
https://hg.mozilla.org/integration/mozilla-inbound/rev/4ff2eb1a97cd

Ed Morley [:emorley]

Comment 76

•

12 years ago

https://hg.mozilla.org/mozilla-central/rev/4ff2eb1a97cd

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Updated

•

12 years ago

Blocks: 799686

Justin Lebar (not reading bugmail)

Assignee

Updated

•

12 years ago

Blocks: 800486

Justin Lebar (not reading bugmail)

Assignee

Comment 77

•

12 years ago

I'm using this in the latest incarnation of the get_about_memory.py script (now located in $B2G_ROOT/tools).  Try it out and let me know if it doesn't work right!

https://github.com/jlebar/B2G/commit/fe9bbff3f5f50eb9f2cc27d70af5abb985556b26

Bug 788021 - Part A: Implement signal handler for triggering about:memory dump. 12 years ago Justin Lebar (not reading bugmail) 2.19 KB, patch	glandium : review+	Details \| Diff \| Splinter Review
Part B: Add and implement nsIMemoryReporterManager::DumpAboutMemory{,WithGC}(). 12 years ago Justin Lebar (not reading bugmail) 6.02 KB, patch	n.nethercote : review+	Details \| Diff \| Splinter Review
Part C: Add ScheduleAboutMemoryDump{,WithGC} functions to nsThread. 12 years ago Justin Lebar (not reading bugmail) 4.63 KB, patch	benjamin : review+	Details \| Diff \| Splinter Review
Bug 788021 - Part 1: Add nsGzipWriter. 12 years ago Justin Lebar (not reading bugmail) 6.92 KB, patch	glandium : review+	Details \| Diff \| Splinter Review
Bug 788021 - Part 2a: Add PContent::DumpMemoryReportsToFile. 12 years ago Justin Lebar (not reading bugmail) 3.24 KB, patch	cjones : review+	Details \| Diff \| Splinter Review
Bug 788021 - Part 2b: Signal handling goop. 12 years ago Justin Lebar (not reading bugmail) 5.42 KB, patch	cjones : review+	Details \| Diff \| Splinter Review
Bug 788021 - Part 2c: Remaining changes to nsMemoryReporterManager. 12 years ago Justin Lebar (not reading bugmail) 21.49 KB, patch	n.nethercote : review+	Details \| Diff \| Splinter Review
Bug 788021 - Part 3: Use nsMemoryReporterManager::MinimizeMemoryUsage in aboutMemory.js. 12 years ago Justin Lebar (not reading bugmail) 3.51 KB, patch	n.nethercote : review+	Details \| Diff \| Splinter Review
(Not working) attempt at using a fifo 12 years ago Justin Lebar (not reading bugmail) 20.00 KB, patch		Details \| Diff \| Splinter Review
Part 1, v2: Add nsGZFileWriter. 12 years ago Justin Lebar (not reading bugmail) 7.35 KB, patch	benjamin : review-	Details \| Diff \| Splinter Review
Part 1, v3: nsGZFileWriter 12 years ago Justin Lebar (not reading bugmail) 7.54 KB, patch	benjamin : review+	Details \| Diff \| Splinter Review
small followup fix for OpenBSD 12 years ago Landry Breuil (:gaston) 1.27 KB, patch	justin.lebar+bug : review+	Details \| Diff \| Splinter Review