Last Comment Bug 788021 - Dump about:memory to a file upon receiving a signal
: Dump about:memory to a file upon receiving a signal
Status: RESOLVED FIXED
:
Product: Toolkit
Classification: Components
Component: about:memory (show other bugs)
: Trunk
: All All
: -- normal (vote)
: mozilla18
Assigned To: Justin Lebar (not reading bugmail)
:
Mentors:
: 788072 (view as bug list)
Depends on: 768470 790417
Blocks: slim-fast 783230 788072 797904 799686 800486
  Show dependency treegraph
 
Reported: 2012-09-03 15:18 PDT by Justin Lebar (not reading bugmail)
Modified: 2012-10-26 22:20 PDT (History)
15 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
Bug 788021 - Part A: Implement signal handler for triggering about:memory dump. (2.19 KB, patch)
2012-09-03 15:28 PDT, Justin Lebar (not reading bugmail)
mh+mozilla: review+
Details | Diff | Splinter Review
Part B: Add and implement nsIMemoryReporterManager::DumpAboutMemory{,WithGC}(). (6.02 KB, patch)
2012-09-03 15:29 PDT, Justin Lebar (not reading bugmail)
n.nethercote: review+
Details | Diff | Splinter Review
Part C: Add ScheduleAboutMemoryDump{,WithGC} functions to nsThread. (4.63 KB, patch)
2012-09-03 15:34 PDT, Justin Lebar (not reading bugmail)
benjamin: review+
Details | Diff | Splinter Review
Bug 788021 - Part 1: Add nsGzipWriter. (6.92 KB, patch)
2012-09-13 15:28 PDT, Justin Lebar (not reading bugmail)
mh+mozilla: review+
Details | Diff | Splinter Review
Bug 788021 - Part 2a: Add PContent::DumpMemoryReportsToFile. (3.24 KB, patch)
2012-09-13 15:34 PDT, Justin Lebar (not reading bugmail)
cjones.bugs: review+
Details | Diff | Splinter Review
Bug 788021 - Part 2b: Signal handling goop. (5.42 KB, patch)
2012-09-13 15:35 PDT, Justin Lebar (not reading bugmail)
cjones.bugs: review+
Details | Diff | Splinter Review
Bug 788021 - Part 2c: Remaining changes to nsMemoryReporterManager. (21.49 KB, patch)
2012-09-13 15:36 PDT, Justin Lebar (not reading bugmail)
n.nethercote: review+
Details | Diff | Splinter Review
Bug 788021 - Part 3: Use nsMemoryReporterManager::MinimizeMemoryUsage in aboutMemory.js. (3.51 KB, patch)
2012-09-13 15:37 PDT, Justin Lebar (not reading bugmail)
n.nethercote: review+
Details | Diff | Splinter Review
(Not working) attempt at using a fifo (20.00 KB, patch)
2012-09-13 18:00 PDT, Justin Lebar (not reading bugmail)
no flags Details | Diff | Splinter Review
Part 1, v2: Add nsGZFileWriter. (7.35 KB, patch)
2012-09-25 07:45 PDT, Justin Lebar (not reading bugmail)
benjamin: review-
Details | Diff | Splinter Review
Part 1, v3: nsGZFileWriter (7.54 KB, patch)
2012-09-26 08:09 PDT, Justin Lebar (not reading bugmail)
benjamin: review+
Details | Diff | Splinter Review
small followup fix for OpenBSD (1.27 KB, patch)
2012-10-05 02:26 PDT, Landry Breuil (:gaston)
justin.lebar+bug: review+
Details | Diff | Splinter Review

Description Justin Lebar (not reading bugmail) 2012-09-03 15:18:14 PDT
Patches in a moment.
Comment 1 Justin Lebar (not reading bugmail) 2012-09-03 15:28:03 PDT
Created attachment 657926 [details] [diff] [review]
Bug 788021 - Part A: Implement signal handler for triggering about:memory dump.

These three patches are interdependent, and I'll land them folded together.
I'm posting them as three separate patches here because I'm touching three
different peoples' areas of focus.

The functions I'm calling in the signal handler are implemented in part C, and
they're trivially safe to call from a signal handler -- they simply atomically
set a global variable to 1.

I'm using SIGRTMIN and SIGRTMIN + 1 instead of SIGUSR1 and SIGUSR2 because
SIGUSR{1,2} are already used throughout our code for a variety of things.  It
looks like B2G uses SIGUSR1 for the alarm, for example.  Nobody seems to be
using the real-time signals.

One downside of using the realtime signals is that you can't do

  $ kill SIGUSR1 $(pidof firefox)

instead, you have to do

  $ kill -34 $(pid firefox)

It's -35 for SIGRTMIN + 1.  These numbers may be different on non-Linux
machines.  I hope that's the only problem with the approach I've taken here,
but this is unfamiliar territory to me.

(In fact, I don't understand why they're 34 and 35 and not 32 and 33 on my
machine, since /usr/include/linux/signal.h seems to include
/usr/include/asm/signal.h, which clearly defines SIGRTMIN as 32.)
Comment 2 Justin Lebar (not reading bugmail) 2012-09-03 15:29:15 PDT
Created attachment 657928 [details] [diff] [review]
Part B: Add and implement nsIMemoryReporterManager::DumpAboutMemory{,WithGC}().

The meat of this patch is simply implementing the minimize-memory-usage button
in C++.  Let me know if you're not comfortable signing off on the Gecko gunk
here and we can definitely find someone else to look over the code.
Comment 3 Justin Lebar (not reading bugmail) 2012-09-03 15:34:34 PDT
Created attachment 657931 [details] [diff] [review]
Part C: Add ScheduleAboutMemoryDump{,WithGC} functions to nsThread.

I'm not entirely comfortable with this patch, because it adds two more atomic
operations to the event loop, for what are very uncommon, low-priority events.

If you'd like, I think I can take this back down to one atomic op by using a
bitmask to shove [pending-low-memory, pending-about-memory-dump,
pending-about-memory-dump-with-gc] into one variable and using atomic
check-and-set operations to set each flag when appropriate.

But I don't have any idea how hot this code is, so I didn't want to prematurely
optimize.

Another option would be to check the pending-about-memory-dump variables at
some other point in time.  But I wasn't sure when else I could do the check.
Comment 4 Nicholas Nethercote [:njn] 2012-09-03 15:59:22 PDT
Comment on attachment 657926 [details] [diff] [review]
Bug 788021 - Part A: Implement signal handler for triggering about:memory dump.

Review of attachment 657926 [details] [diff] [review]:
-----------------------------------------------------------------

::: xpcom/base/nsMemoryReporterManager.cpp
@@ +548,5 @@
> +
> +// These are effectively constants, but we don't initialize them until
> +// RegisterSignalHandlers() is called to avoid creating static initializers.
> +static int sDumpAboutMemorySignum;
> +static int sDumpAboutMemoryAfterGCSignum;

static integers cause static initializers?  Huh.  In your 3rd patch you have some similar static integers that you initialize...
Comment 5 Nicholas Nethercote [:njn] 2012-09-03 16:09:17 PDT
Comment on attachment 657928 [details] [diff] [review]
Part B: Add and implement nsIMemoryReporterManager::DumpAboutMemory{,WithGC}().

Review of attachment 657928 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good, thanks for doing this!

I'm still unclear how this functionality will be used in practice.  For Fennec, we have the front-end Java process and the back-end Gecko process, right?  The Gecko process obviously has the memory reporting infrastructure, but does the Java process have anything?  Does about:memory currently report anything for the Java process?

And for B2G, my understanding is that the process situation is fluid, but we'll likely end up with multiple processes.  What will control the sending of the signals to each process?  I guess that can be worked out later?  Also, about:memory can currently only load a single JSON file at a time, but it shouldn't be hard to change that to handle multiple files.

::: xpcom/base/nsIMemoryReporter.idl
@@ +302,5 @@
> +  /*
> +   * This is just like dumpReports(), except it first runs a series of
> +   * GC/CCs in an attempt to minimize the application's memory usage.
> +   */
> +  void dumpReportsAfterGC ();

Good idea.  Maybe dumpReportAfterGCAndCC?

::: xpcom/base/nsMemoryReporterManager.cpp
@@ +1061,5 @@
> +namespace {
> +
> +/**
> + * This runnable lets us do the equivalent of about:memory's minimize
> + * memory usage button: Fire a heap-minimize notification, spin the

Nit: "minimize memory usage" in quotes.

@@ +1067,5 @@
> + *
> + * When this process finishes, we invoke the callback function passed
> + * to the runnable's constructor.
> + */
> +class MinimizeMemoryUsageRunnable : public nsRunnable

Is it worth exposing this via XPCOM and removing minimizeMemoryUsage3x() from aboutMemory.js, to avoid the code duplication?  Just a thought -- feel free to ignore.
Comment 6 Justin Lebar (not reading bugmail) 2012-09-03 16:36:10 PDT
> static integers cause static initializers?  Huh.  In your 3rd patch you have some similar static 
> integers that you initialize...

Well, static data set to 0 is free, regardless.

But actually, I think static integers set to SIGRTMIN {,+1} should be totally
fine.  It's just the C++ initializers which are a problem, AIUI.  But glandium
can tell us for sure.

> I'm still unclear how this functionality will be used in practice.  For
> Fennec, we have the front-end Java process and the back-end Gecko process,
> right?

AIUI it's all one process.

> The Gecko process obviously has the memory reporting infrastructure, but does
> the Java process have anything?  Does about:memory currently report anything
> for the Java process?

I don't believe we report any Java-specific metrics (e.g. Java heap size), but
since it's all one process, the RSS should still be correct.

> And for B2G, my understanding is that the process situation is fluid, but
> we'll likely end up with multiple processes.

We currently have multiple processes and have no plans to remove them, yes.

> What will control the sending of the signals to each process?

My intent is to write a script which will log in to the phone, send the proper
signal to each process, and pull all the files off the phone.  This script
could even merge the files into one big file.

(We scripts in the master B2G repository for "start b2g on the phone within a
remote debugging session", so this script would be in good company.)

We may have to do a bit of work to figure out how to identify each process
meaningfully in about:memory.  I'm not sure yet.

@@ +1067,5 @@
> + *
> + * When this process finishes, we invoke the callback function passed
> + * to the runnable's constructor.
> + */
> +class MinimizeMemoryUsageRunnable : public nsRunnable

> Is it worth exposing this via XPCOM and removing minimizeMemoryUsage3x() from
> aboutMemory.js, to avoid the code duplication?  Just a thought -- feel free
> to ignore.

I didn't do this because exposing callback functions in XPCOM is kind of a
pain, but now that I think about it, I agree we should do it if only for the
sake of not duplicating code.  I'll fix it up.
Comment 7 Nicholas Nethercote [:njn] 2012-09-03 17:18:35 PDT
Thanks for the details!

> We may have to do a bit of work to figure out how to identify each process
> meaningfully in about:memory.  I'm not sure yet.

An identifying string argument to dumpReports() might be useful.
Comment 8 Mike Hommey [:glandium] 2012-09-03 23:44:37 PDT
Comment on attachment 657926 [details] [diff] [review]
Bug 788021 - Part A: Implement signal handler for triggering about:memory dump.

Review of attachment 657926 [details] [diff] [review]:
-----------------------------------------------------------------

::: xpcom/base/nsMemoryReporterManager.cpp
@@ +548,5 @@
> +
> +// These are effectively constants, but we don't initialize them until
> +// RegisterSignalHandlers() is called to avoid creating static initializers.
> +static int sDumpAboutMemorySignum;
> +static int sDumpAboutMemoryAfterGCSignum;

Simple types like int have no static initializer problems. However, since these values are constants, please make them const, at least they'd have a chance of being completely inlined and not waste memory.

@@ +565,5 @@
> +void
> +RegisterSignalHandlers()
> +{
> +  sDumpAboutMemorySignum = SIGRTMIN;
> +  sDumpAboutMemoryAfterGCSignum = SIGRTMIN + 1;

The more we're going to have code like that, the more we're going to have problems finding signal codes to use. Maybe it's time to give up on "simple" signals and use a single dispatch signal handler which would react on data given by the app sending the signal. (man sigqueue)
It would require an external program to send the signal, but at least, we wouldn't starve on signal codes. But that'd be for another bug.
Comment 9 Justin Lebar (not reading bugmail) 2012-09-06 06:18:02 PDT
Comment on attachment 657926 [details] [diff] [review]
Bug 788021 - Part A: Implement signal handler for triggering about:memory dump.

This may not work on the device.  The first problem is that the device's kill doesn't take values over 32, so I'm going to have to write that myself.  But then it's not clear that Gecko has a tmp directory to write to.

Anyway, I will continue to investigate.
Comment 10 Ted Mielczarek [:ted.mielczarek] 2012-09-06 06:39:58 PDT
On B2G you should be able to write to /data/local/b2g, which I *think* should be set as the TMPDIR environment variable.
Comment 11 Mike Hommey [:glandium] 2012-09-06 06:46:36 PDT
The directory service should give proper TmpD locations.
Comment 12 Justin Lebar (not reading bugmail) 2012-09-06 08:00:39 PDT
(In reply to Ted Mielczarek [:ted] from comment #10)
> On B2G you should be able to write to /data/local/b2g, which I *think*
> should be set as the TMPDIR environment variable.

Turns out it's /data/local/tmp, but yes, it works!
Comment 13 Justin Lebar (not reading bugmail) 2012-09-07 12:52:53 PDT
It takes about 30s or so for a single dump to appear on my Otoro phone.  That's kind of a long time.

If we're I/O bound, gzip'ing the output might make things faster.
Comment 14 Justin Lebar (not reading bugmail) 2012-09-07 13:06:45 PDT
Oh, it also looks like these nsIFileOutputStream::Write() calls are unbuffered.  :(
Comment 15 Nicholas Nethercote [:njn] 2012-09-07 14:17:49 PDT
Sigh.  The rss/pss/vsize/swap trees will be accounting for a large chunk of that 30s (I'm assuming they're present on Android?)
Comment 16 Justin Lebar (not reading bugmail) 2012-09-07 14:25:22 PDT
(In reply to Nicholas Nethercote [:njn] from comment #15)
> Sigh.  The rss/pss/vsize/swap trees will be accounting for a large chunk of
> that 30s (I'm assuming they're present on Android?)

Yes and yes.

It goes down to ~5s if I buffer the output.  I'm trying to figure out how to compress it now...
Comment 17 Justin Lebar (not reading bugmail) 2012-09-09 11:06:27 PDT
With gzip compression, it's roughly 1.5s, so that's a win.
Comment 18 Justin Lebar (not reading bugmail) 2012-09-09 12:31:47 PDT
Sorry, guys: I should have tested this more extensively on the device before asking for reviews.

Now that I'm testing in an automated fashion (without touching the phone for long periods), I've noticed that we often send a signal to a content process on the device whose event loop is entirely idle.  As a result, it never notices that the "dump about:memory" bit is set, and we never dump anything.

It's not simple to poke the event loop into running again when we hear the signal, since the signal could happen at any time and on any thread.  glandium suggested we could use the gecko remote debugging API for this, although I'm hesitant to try it because it's so new.  (It wasn't even landed as of a few days ago, iirc.)

Sorry again for not testing well in the first place.
Comment 19 Justin Lebar (not reading bugmail) 2012-09-09 12:33:34 PDT
> glandium suggested we could use the gecko remote debugging API for this, although I'm hesitant to 
> try it because it's so new.  (It wasn't even landed as of a few days ago, iirc.)

My plan of the moment is to send a signal only to the main process, whose event loop seems to run fairly often, and then have the main process send IPDL messages to its children instructing them to dump their about:memory contents.  We'll see if that works, but any other ideas are of course appreciated.
Comment 20 Ted Mielczarek [:ted.mielczarek] 2012-09-10 05:16:43 PDT
It's not that hard to fire an event from a background thread, but I don't know how safe it is from a signal handler.
Comment 21 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2012-09-10 19:30:24 PDT
<3
Comment 22 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2012-09-10 19:31:46 PDT
(In reply to Ted Mielczarek [:ted] from comment #20)
> It's not that hard to fire an event from a background thread, but I don't
> know how safe it is from a signal handler.

Can't do that from a signal handler, ja.
Comment 23 Justin Lebar (not reading bugmail) 2012-09-11 21:08:28 PDT
As an update as to where I'm at here: With gzip compression, we can dump and pull four processes' memory reports in a second or two.  It's working rather smoothly.

cjones suggested that we could improve our current signal mechanism by poll()'ing a signalfd (of /course/ Linux lets you treat a signal like a file) on B2G's I/O thread, as an alternative to the current mechanism.  Or we could even poll() a named pipe.

This change would be an improvement in two ways:

 - With the current approach, I had to write my own |kill|, because the one on the phone refused to send the realtime signals we're listening to, and

 - With the current approach, when the main process's event loop is quiescent (e.g. the screen is off and you haven't touched the phone in a few minutes), you have to touch the phone again before we notice the signal.

This would theoretically let us go back to my previous approach of signaling each process, as opposed to signaling only the main process.  But I actually like the current approach of signaling only the main process better because it lets us associate a unique identifier with the whole batch of about:memory dumps.

Another thing I still would like to address is the naming of processes.  Right now we name the process in the memory report only by its PID, but we can probably do better than that.  But I may punt this to a follow-up.
Comment 24 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2012-09-11 21:32:12 PDT
dhylands already landed the code to prctl(PR_SET_NAME) with the app name (or "(App)" for the prelaunch or "Browser").  You could draft off of that by extending his interface to prctl(PR_GET_NAME) what he's already done the work to set :).
Comment 25 Dave Hylands [:dhylands] 2012-09-11 22:30:02 PDT
The code to set the name is in ipc/glue/ProcessUtils_linux.cpp and a dummy version in ipc/glue/ProcessUtils_none.cpp (for non-linux platforms).

For non-linux platforms we should probably just stash whatever name was set and return it in the get.
Comment 26 Justin Lebar (not reading bugmail) 2012-09-13 08:33:22 PDT
*** Bug 788072 has been marked as a duplicate of this bug. ***
Comment 27 Justin Lebar (not reading bugmail) 2012-09-13 13:05:47 PDT
I've had a heck of a time trying to get a named pipe to work.

The trick, as glandium explained to me, is that you have to open the fifo as non-blocking, because otherwise open() will block until the other end of the fifo is opened.  But then we have to switch it to a blocking stream after you've opened it, so the I/O thread can select() on it and notify you when the fd is available for read without blocking.

In theory, you can fcntl the fifo to blocking after you open it.  But in practice,

  fcntl(mFd.get(), F_SETFL, 0)

seems to have no effect with respect to whether the fifo is blocking.

I guess I'll see if I have any more luck with a signalfd...
Comment 28 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2012-09-13 14:54:33 PDT
(In reply to Justin Lebar [:jlebar] from comment #27)
> I've had a heck of a time trying to get a named pipe to work.
> 
> But then we have to switch it to a blocking stream after
> you've opened it, so the I/O thread can select() on it and notify you when
> the fd is available for read without blocking.
> 

epoll should be perfectly happy with either.  Nonblocking just means that if you try to read/write more than current capacity, you immediately get an EWOULDBLOCK error.

Were you getting an error from epoll when the named pipe was in nonblocking mode?
Comment 29 Justin Lebar (not reading bugmail) 2012-09-13 15:28:17 PDT
Created attachment 661013 [details] [diff] [review]
Bug 788021 - Part 1: Add nsGzipWriter.

I'm happy to take suggestions for where to put these files; xpcom/base is
pretty random.

Note that this is slightly different from all the Necko code we have to write
gzip data, because raw gzip data is not a valid .gz file.

Now that I think  of it, perhaps we should change the name of this class to
reflect that difference.  nsGZFileWriter, maybe?
Comment 30 Justin Lebar (not reading bugmail) 2012-09-13 15:33:13 PDT
> Were you getting an error from epoll when the named pipe was in nonblocking mode?

The chromium event loop (which I admit I didn't investigate too deeply) was calling "CanReadWithoutBlocking" in an infinite loop.  I presumed that the problem was related to blocking, although perhaps that's wrong...
Comment 31 Justin Lebar (not reading bugmail) 2012-09-13 15:34:30 PDT
Created attachment 661017 [details] [diff] [review]
Bug 788021 - Part 2a: Add PContent::DumpMemoryReportsToFile.

As mentioned earlier, if we use signals (as we do in this set of patches), we
could signal each process separately, and then we wouldn't need IPC support.

However, I like this because it lets the root process attach an identifier to a
set of memory reports.  We couldn't do that if we signaled each process
separately.
Comment 32 Justin Lebar (not reading bugmail) 2012-09-13 15:35:35 PDT
Created attachment 661018 [details] [diff] [review]
Bug 788021 - Part 2b: Signal handling goop.

glandium did the last review of the Unix code here, but I'm touching the
chromium IO thread, so perhaps cjones is a better person to look at the new
version.
Comment 33 Justin Lebar (not reading bugmail) 2012-09-13 15:36:42 PDT
Created attachment 661020 [details] [diff] [review]
Bug 788021 - Part 2c: Remaining changes to nsMemoryReporterManager.

The other patches here are entirely different from the ones before; this one is
only mostly different.  The parts that are the same are mainly the
implementation of minimize-memory-usage.
Comment 34 Justin Lebar (not reading bugmail) 2012-09-13 15:37:35 PDT
Created attachment 661021 [details] [diff] [review]
Bug 788021 - Part 3: Use nsMemoryReporterManager::MinimizeMemoryUsage in aboutMemory.js.
Comment 35 Justin Lebar (not reading bugmail) 2012-09-13 16:16:55 PDT
PR for "killer" program in gonk-misc (like kill(1), except it can send signals over 32)

  https://github.com/mozilla-b2g/gonk-misc/pull/26

PR for "get-about-memory" script in B2G (which uses killer)

  https://github.com/mozilla-b2g/B2G/pull/112
Comment 36 Nicholas Nethercote [:njn] 2012-09-13 16:37:20 PDT
Comment on attachment 661013 [details] [diff] [review]
Bug 788021 - Part 1: Add nsGzipWriter.

Review of attachment 661013 [details] [diff] [review]:
-----------------------------------------------------------------

Beautiful comments in this patch, nice work.

::: xpcom/base/nsGzipWriter.cpp
@@ +71,5 @@
> +
> +  // gzwrite uses a return value of 0 to indicate failure.  Otherwise, it
> +  // returns the number of uncompressed bytes written.  To ensure we can
> +  // distinguish between success and failure, don't call gzwrite when we have 0
> +  // bytes to write.

Nice comment.

@@ +79,5 @@
> +
> +  // gzwrite never does a short write -- that is, the return value should
> +  // always be either 0 or aLen, and we shouldn't have to call it multiple
> +  // times in order to get it to read the whole buffer.
> +  int rv = gzwrite(mGZFile, aStr, aLen);

Another nice comment.

@@ +95,5 @@
> +  mFinished = true;
> +  gzclose(mGZFile);
> +
> +  // Ignore errors from gzclose; it's not like there's anything we can do about
> +  // it, at this point!

Another nice comment!

::: xpcom/base/nsIGzipWriter.idl
@@ +34,5 @@
> +   */
> +  void write(in ACString str);
> +
> +  /**
> +   * Write the given char* to the file (not including the null-terminator).

|char*| or |string|?  (Is this an xpcom synonym or something?)
Comment 37 Justin Lebar (not reading bugmail) 2012-09-13 18:00:59 PDT
Created attachment 661080 [details] [diff] [review]
(Not working) attempt at using a fifo

Chris/glandium, I don't know if you're curious to look at this.  I'm reasonably satisfied with the current approach, but if we could make a fifo work, it would be easier to signal the process (no need for this extra "killer" program) and we would be able to pass data in with the signal.

With this patch (the relevant fifo stuff is at the top), OnFileCanReadWithoutBlocking gets called over and over again after we do a write to the fifo (echo -n 1 > /tmp/firefox-fifo).  All the reads are zero-length.  Neither the fcntl nor the read sets errno.

Maybe the zero-length reads are informing me that the writer closed its end of the fifo?  I'm not sure what the right thing would be to do at this point -- re-create the fifo, maybe?
Comment 38 Nicholas Nethercote [:njn] 2012-09-13 20:36:03 PDT
Comment on attachment 661017 [details] [diff] [review]
Bug 788021 - Part 2a: Add PContent::DumpMemoryReportsToFile.

Review of attachment 661017 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/ipc/ContentChild.cpp
@@ +427,5 @@
>  }
>  
> +bool
> +ContentChild::RecvDumpMemoryReportsToFile(const nsString& aIdentifier,
> +                                          const bool& aMinimizeMemoryUsage,

Would |aMinimizeMemoryUsageFirst| be a better name (here and in all the other places)?

@@ +435,5 @@
> +        do_GetService("@mozilla.org/memory-reporter-manager;1");
> +    NS_ENSURE_TRUE(mgr, true);
> +    mgr->DumpMemoryReportsToFile(aIdentifier,
> +                                 aMinimizeMemoryUsage,
> +                                 aDumpChildProcesses);

How does this work?  nsIMemoryReporterManager doesn't have a DumpMemoryReportsToFile() method, AFAICT...
Comment 39 Nicholas Nethercote [:njn] 2012-09-13 20:38:03 PDT
Comment on attachment 661018 [details] [diff] [review]
Bug 788021 - Part 2b: Signal handling goop.

Review of attachment 661018 [details] [diff] [review]:
-----------------------------------------------------------------

::: xpcom/base/nsMemoryReporterManager.cpp
@@ +553,5 @@
> +#ifdef XP_UNIX // {
> +namespace {
> +
> +/*
> + * The following code supports dumping about:memory upon receiving a signal.

Nit: it dumps all the memory reports.  about:memory shows most but not all of them (e.g. it ignores the compartment and ghost window reports).
Comment 40 Nicholas Nethercote [:njn] 2012-09-13 21:12:08 PDT
Comment on attachment 661020 [details] [diff] [review]
Bug 788021 - Part 2c: Remaining changes to nsMemoryReporterManager.

Review of attachment 661020 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good, except I'm puzzled by the use of the temporary "incomplete" file.

BTW, will the signal stuff only work on Fennec/B2G, or will it work on desktop as well?

::: xpcom/base/nsIMemoryReporter.idl
@@ +306,5 @@
> +   * similar, such as memory-reports-<identifier>-<pid>-1.json.gz;  no existing
> +   * file will be overwritten).
> +   *
> +   * @param identifier this identifier will appear in the filename of our
> +   *   about:memory dump and those of our children.  If the identifier is

"those of our children" -- unless dumpChildProcesses is false.

@@ +309,5 @@
> +   * @param identifier this identifier will appear in the filename of our
> +   *   about:memory dump and those of our children.  If the identifier is
> +   *   empty, the dumpMemoryReportsToFile implementation may set it arbitrarily
> +   *   and use that new value for its own dump and the dumps of its child
> +   *   processes.  For example, we may set |identifier| to the number of

Who is "we" in this sentence?  Does it refer to the dumpMemoryReportsToFile implementation?

@@ +391,5 @@
>     * }
>     */
> +  void dumpMemoryReportsToFile (in AString identifier,
> +                                in bool minimizeMemoryUsage,
> +                                in bool dumpChildProcesses);

I thought you didn't like bool parameters :P

::: xpcom/base/nsMemoryReporterManager.cpp
@@ +1086,5 @@
> +
> +    return DumpMemoryReportsToFileImpl(identifier);
> +}
> +
> +#define DUMP_ASCII(o, s) \

A name with four letters would be nice, so that calls to this match calls to DUMP().

Actually, I don't think having two different versions improves things much -- you get rid of five get() calls below at the cost of a duplicated macro here and a duplicated function in nsGzipWriter.  I'd just have one of each, but I'll let you decide what to do.

@@ +1109,5 @@
>      // We only want to dump reports for this process.  If |aProcess| is
>      // non-NULL that means we've received it from another process in response
>      // to a "child-memory-reporter-request" event;  ignore such reports.
>      if (!aProcess.IsEmpty()) {
>          return NS_OK;    

(Pre-existing) trailing whitespace.

@@ +1178,5 @@
> +    //
> +    //   incomplete-memory-report-<-identifier>-<pid>-42.json.gz
> +    //
> +    // in NS_OS_TEMP_DIR for writing.  When we're finished writing the report,
> +    // we'll rename this file and get rid of the "incomplete-" prefix.

I don't understand why the temporary file is necessary.  If it really is, please explain why in the comment.

@@ +1202,4 @@
>      NS_ENSURE_SUCCESS(rv, rv);
>     
>      rv = tmpFile->CreateUnique(nsIFile::NORMAL_FILE_TYPE, 0600); 
>      NS_ENSURE_SUCCESS(rv, rv);

Remove the trailing whitespace while you're here.

@@ +1298,5 @@
>  
>      return NS_OK;
>  }
>  
>  #undef DUMP

Add |#undef DUMP_ASCII|, if you decide to keep it.

@@ +1305,5 @@
> +
> +/**
> + * This runnable lets us implement nsIMemoryReporterManager::MinimizeMemoryUsage().
> + * We fire a heap-minimize notification, spin the event loop, and repeat this
> + * process a few times times.

"times times"

@@ +1337,5 @@
> +
> +    os->NotifyObservers(nullptr, "memory-pressure",
> +                        NS_LITERAL_STRING("heap-minimize").get());
> +    mRemainingIters--;
> +    NS_DispatchToMainThread(this);

I like the way you sort of inverted the logic from the aboutMemory.js version, it's easier to understand the counting now.
Comment 41 Justin Lebar (not reading bugmail) 2012-09-14 06:48:45 PDT
> BTW, will the signal stuff only work on Fennec/B2G, or will it work on desktop as well?

It definitely works on desktop Linux.  I'm not sure if it works on Mac; I haven't tried.

I was thinking of pref'ing it off on desktop, though.  Do you think it's worth leaving on?

> +  void dumpMemoryReportsToFile (in AString identifier,
> +                                in bool minimizeMemoryUsage,
> +                                in bool dumpChildProcesses);
>
> I thought you didn't like bool parameters :P

I don't.  :(  I actually tried to make this nicer, but it was hard because have to pass the bools through IPDL and XPIDL, which meant I needed both an XPIDL and an IPDL args struct, and...the only thing worse than this API was the alternative I came up with.

Oh well.

> I don't understand why the temporary file is necessary.  If it really is, please explain why in 
> the comment.

The temporary file helps avoid a race condition when we're pulling files off the device.  We signal the processes and then poll to grab the files.  But we don't want to pull an incomplete file.  I'll definitely add a comment.

> How does this work?  nsIMemoryReporterManager doesn't have a DumpMemoryReportsToFile() method, 
> AFAICT...

You saw that I changed it in part C.  I changed it because I wanted it to match the IPDL name, and I thought the IPDL name needed more context than "dumpReports".

> Actually, I don't think having two different versions improves things much -- you get rid of five 
> get() calls below at the cost of a duplicated macro here and a duplicated function in 
> nsGzipWriter.  I'd just have one of each, but I'll let you decide what to do.

It's mostly a premature optimization to avoid unnecessary calls to strlen.  It's (probably) not  important here, but it's important for the interface, I think, if only so people can push non null-terminated bytes to a gz file.

The ideal interface for nsGzipWriter would have an overloaded method Write() which could take a char* or an nsACString.  I thought you couldn't do that in XPIDL, but actually, I bet I could, if I just made one of the methods noscript (or in a C++ block).  You wouldn't lose anything because JS could still call the other method...
Comment 42 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2012-09-14 15:38:21 PDT
Comment on attachment 661018 [details] [diff] [review]
Bug 788021 - Part 2b: Signal handling goop.

I hate to rain on this parade, but libevent already provides an
interface to do what you're doing here :(.

See http://mxr.mozilla.org/mozilla-central/source/ipc/chromium/src/chrome/common/process_watcher_posix_sigchld.cc#201 .

Let's use that instead.  It'll be a small change and much less code.
Comment 43 Mike Hommey [:glandium] 2012-09-19 23:43:28 PDT
Comment on attachment 661013 [details] [diff] [review]
Bug 788021 - Part 1: Add nsGzipWriter.

Review of attachment 661013 [details] [diff] [review]:
-----------------------------------------------------------------

LGTM, but I'm not an xpcom peer :)

::: xpcom/base/nsGzipWriter.h
@@ +18,5 @@
> +
> +  NS_DECL_ISUPPORTS
> +  NS_DECL_NSIGZIPWRITER
> +
> +  nsresult Write(const char* aStr, uint32_t aLen);

Any particular reason to make this one public, when it seems to be an implementation detail?
Comment 44 Justin Lebar (not reading bugmail) 2012-09-20 07:46:54 PDT
> LGTM, but I'm not an xpcom peer :)

Thing is, it's not really XPCOM code; it just happens to live in xpcom/.  I could move it to toolkit/ and it would suddenly come under an entirely different set of peoples' purview...

> Any particular reason to make this one public, when it seems to be an implementation 
> detail?

I guess I should either put it in the interface or make it private.  I figured it might be useful to someone (say you're reading a buffer from disk; you shouldn't have to wrap that in a nsDependentCString just to appease the interface Gods).
Comment 45 Justin Lebar (not reading bugmail) 2012-09-20 11:54:35 PDT
Comment on attachment 661013 [details] [diff] [review]
Bug 788021 - Part 1: Add nsGzipWriter.

I'll attach an updated version of this patch in a few minutes.
Comment 46 Justin Lebar (not reading bugmail) 2012-09-22 20:56:10 PDT
I have a new set of patches, but unfortunately my device is refusing to even start Gecko.  If the past is any guide, I should wait until I can test on the device before assuming everything will work properly, so I'll ask for reviews once I can test.
Comment 47 Justin Lebar (not reading bugmail) 2012-09-23 16:04:36 PDT
Looks like it's dying on the device with a null-pointer exception in libevent.  (It appeared from logcat that gecko was never starting, but in fact Gecko started but very quickly died.)

#0  0x40cf289c in evsignal_add (ev=0x14950b0)
    at ../../../gecko/ipc/chromium/src/third_party/libevent/signal.c:221
#1  0x40cf2b92 in epoll_add (arg=0x142c418, ev=0x14950b0)
    at ../../../gecko/ipc/chromium/src/third_party/libevent/epoll.c:265
#2  0x40ceecec in event_add (ev=0x14950b0, tv=0x0)
    at ../../../gecko/ipc/chromium/src/third_party/libevent/event.c:723
#3  0x40d0aa0a in base::MessagePumpLibevent::CatchSignal (this=0x142c3e8, sig=32, 
    sigevent=0x149506c, delegate=0x1495068)
    at ../../../gecko/ipc/chromium/src/base/message_pump_libevent.cc:281
#4  0x40cf900c in MessageLoopForIO::CatchSignal (this=<value optimized out>, sig=32, 
    sigevent=0x0, delegate=0x0) at ../../../gecko/ipc/chromium/src/base/message_loop.cc:581
#5  0x40ce1dde in Start (this=0x1495068)
    at ../../../gecko/xpcom/base/nsMemoryReporterManager.cpp:709

The line we're dying on in frame #0 is

221		TAILQ_INSERT_TAIL(&sig->evsigevents[evsignal], ev, ev_signal_next);

It's hard to tell exactly what's null here; |sig| is optimized out, and anyway who knows what this macro is doing.  It's definitely a write-to-null error, though, judging by the assembly.

I didn't see this on desktop.  Maybe the difference is that I have an opt build on the device and had a debug build on desktop.  Or maybe libevent doesn't like signal 32 on ARM for some reason.  I'll keep digging...
Comment 48 Justin Lebar (not reading bugmail) 2012-09-23 16:18:14 PDT
Looks like libevent is not safe to use with signals over 31 on either x86 or ARM.

In libevent's evsignal.h, we do

struct evsignal_info {
  [...]
  struct event_list evsigevents[NSIG];
  sig_atomic_t evsigcaught[NSIG];
  [...]
}

NSIG is defined by the Linux headers to be 32, on both x86 and ARM.

http://lxr.free-electrons.com/source/arch/x86/include/asm/signal.h#L37
http://lxr.free-electrons.com/source/arch/arm/include/asm/signal.h#L26

libevent even asserts that the signal number is less than NSIG, so it's not clear why my code didn't hit this assert when I ran x86-debug, unless libevent asserts are disabled even in debug builds.

I'm re-requesting review on the r-'ed patch, because this doesn't look simple to fix in libevent.  In particular, the Linux headers say not to assume SIGRTMAX is a constant, so I can't just substitute SIGRTMAX + 1 for NSIG everywhere.  (I did observe that it's not actually a constant under some build configuration, although I now forget exactly what I did.)

Other suggestions are welcome, of course.  :)
Comment 49 Justin Lebar (not reading bugmail) 2012-09-23 16:20:20 PDT
Comment on attachment 661018 [details] [diff] [review]
Bug 788021 - Part 2b: Signal handling goop.

I guess an alternative would be to go back to using a FIFO instead of a signal.  I think that might work if I explicitly close and then re-open the pipe after read() starts returning 0.

But at this point, I'm not convinced we should do that rather than use the thing that I already have working.
Comment 50 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2012-09-24 15:36:55 PDT
(In reply to Justin Lebar [:jlebar] from comment #49)
> Comment on attachment 661018 [details] [diff] [review]
> Bug 788021 - Part 2b: Signal handling goop.
> 
> But at this point, I'm not convinced we should do that rather than use the
> thing that I already have working.

Please file a followup bug for this problem.
Comment 51 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2012-09-24 15:44:23 PDT
Comment on attachment 661018 [details] [diff] [review]
Bug 788021 - Part 2b: Signal handling goop.

>diff --git a/xpcom/base/nsMemoryReporterManager.cpp b/xpcom/base/nsMemoryReporterManager.cpp

>+void
>+DumpAboutMemorySignalHandler(int aSignum)
>+{

Add a big scary comment noting that every bit of code called from here
has to be async-signal-safe.

>+  if (sDumpAboutMemoryPipeWriteFd != 0) {
>+    uint8_t signum = (uint8_t)aSignum;

Nit: reinterpret_cast.

>+void
>+InitializeDumpAboutMemoryWatcher()
>+{
>+  MOZ_ASSERT(NS_IsMainThread());
>+

Also assert !sSignalFdWatcher.

Looks good.  r=me with these.  Too bad libevent is broken, sorry about
that :/.
Comment 52 Justin Lebar (not reading bugmail) 2012-09-25 07:39:50 PDT
> Please file a followup bug for this problem.

I filed bug 794074.
Comment 53 Justin Lebar (not reading bugmail) 2012-09-25 07:45:09 PDT
Created attachment 664505 [details] [diff] [review]
Part 1, v2: Add nsGZFileWriter.

I made some not-insignificant changes to this since glandium looked at this,
although this patch is still pretty trivial.
Comment 54 Justin Lebar (not reading bugmail) 2012-09-25 13:00:56 PDT
Comment on attachment 664505 [details] [diff] [review]
Part 1, v2: Add nsGZFileWriter.

bsmedberg, a few of us are itching to get this into B2G, so if you're comfortable delegating the review to cjones, that would help us.  There's really nothing interesting here beyond the zlib usage that glandium already signed off on; imo it's silly this has to go through an XPCOM peer.  :)
Comment 55 Benjamin Smedberg [:bsmedberg] 2012-09-26 07:22:43 PDT
Comment on attachment 664505 [details] [diff] [review]
Part 1, v2: Add nsGZFileWriter.

It's not clear to me why we need the nsIGZFileWriter interface at all; the current use is calling entirely from c++, correct? If so, we could (should?) make this just a C++ class.

However if the IDL interface is important because this is is scripted somewhere, you need to remove the %{C++ block: the XPT file will have the wrong vtable entry slot for all following functions in the file, which means that calling finish() from script will actually call Write(const char* str). The simple way of doing this is to avoid the overload with Write2/Write3. Or just move those methods into the C++ class out of the interface.

Also void write(in ACString) is going to produce unexpected results for most callers if they pass in any non-ASCII text because the data will be byte-truncated instead of converted to UTF8. Pretty much all scriptable interfaces should use AUTF8String instead.
Comment 56 Justin Lebar (not reading bugmail) 2012-09-26 07:30:05 PDT
Thanks a lot for the review; this obviously wasn't as correct as I thought.  :)

> It's not clear to me why we need the nsIGZFileWriter interface at all; the current use is calling 
> entirely from c++, correct?

It's used entirely from C++, but it unfortunately passes through XPCOM an nsISupports.

I could have just the one write method on the interface and then static_cast to the concrete type to gain access to the other methods, but that's not great from an ergonomic pov.  Or I could make the overloads non-virtual, by doing something like:

  write(in AUTF8String);
  writeLen(in string, in length);

  %{C++
  nsresult write(const char* s) {
    return writeLen(s, strlen(s));
  }
  %}

I'd still prefer that writeLen be called "write", but I think I could live with the interface above.  Do you have any preference here?
Comment 57 Justin Lebar (not reading bugmail) 2012-09-26 07:32:24 PDT
> I'd still prefer that writeLen be called "write", but I think I could live with the interface above.

Actually...I guess I could wrap the char* in a string class, and that would obviate the need to have writeLen on the JS interface.  Then I /could/ get the interface I want...
Comment 58 Benjamin Smedberg [:bsmedberg] 2012-09-26 07:36:26 PDT
You could add another overload also

%{C++
nsresult Write(const char* s, uint32_t len) {
  return WriteLen(s, len);
}
%}

In any case having virtuals in the C++ block isn't good.
Comment 59 Justin Lebar (not reading bugmail) 2012-09-26 08:09:05 PDT
Created attachment 664975 [details] [diff] [review]
Part 1, v3: nsGZFileWriter

Let's see if git bz does the right thing with this attachment...
Comment 60 Justin Lebar (not reading bugmail) 2012-09-26 08:54:21 PDT
Pushing to try mostly to check the latest bits against a Mac build: https://tbpl.mozilla.org/?tree=Try&rev=1f1b82d8a1a9
Comment 61 Justin Lebar (not reading bugmail) 2012-09-26 11:12:20 PDT
Ah, looks like MacOS doesn't have realtime signals.  I'll just make this Linux-only.  :shrug:
Comment 62 Justin Lebar (not reading bugmail) 2012-09-26 11:17:22 PDT
https://tbpl.mozilla.org/?tree=Try&rev=2af72d4e38be
Comment 63 Justin Lebar (not reading bugmail) 2012-09-27 09:38:50 PDT
We'll get there...

https://tbpl.mozilla.org/?tree=Try&rev=69a5ada834c4
Comment 65 Nicholas Nethercote [:njn] 2012-10-01 21:17:51 PDT
\o/
Comment 66 Phil Ringnalda (:philor) 2012-10-01 23:05:03 PDT
Backed out in https://hg.mozilla.org/integration/mozilla-inbound/rev/80f9522602c5 because Linux xpcshell was crashing after a few tens of thousands of "WARNING: Error reading from buffer in SignalPipeWatcher::OnFileCanReadWithoutBlocking.: file ../../../xpcom/base/nsMemoryReporterManager.cpp, line 786" (and I think I'm going to blame the talos crash in https://tbpl.mozilla.org/php/getParsedLog.php?id=15728133&tree=Mozilla-Inbound on this, too).
Comment 67 Justin Lebar (not reading bugmail) 2012-10-02 08:15:21 PDT
This "don't run all tests on try" thing is not working out as well as I might have thought...
Comment 68 Justin Lebar (not reading bugmail) 2012-10-02 08:43:17 PDT
This may fix it.  https://tbpl.mozilla.org/?tree=Try&rev=6bfce6721dbd
Comment 69 Phil Ringnalda (:philor) 2012-10-02 08:48:34 PDT
Odd, "don't run all tests on try" is working exactly how I would have thought.
Comment 70 Justin Lebar (not reading bugmail) 2012-10-02 11:01:28 PDT
(In reply to Justin Lebar [:jlebar] from comment #68)
> This may fix it.  https://tbpl.mozilla.org/?tree=Try&rev=6bfce6721dbd

It would help if I didn't comment out the fix.

https://tbpl.mozilla.org/?tree=Try&rev=8ce27fa2ae4e
Comment 73 Landry Breuil (:gaston) 2012-10-05 02:26:23 PDT
Created attachment 668373 [details] [diff] [review]
small followup fix for OpenBSD

Quick followup, this change is needed to fix the build on OpenBSD after the last commits. See http://buildbot.rhaalovely.net/builders/mozilla-central-amd64/builds/535/steps/build/logs/stdio for the failure and bugs #788012 / #785738 for precedents of the same fix.
Comment 74 Justin Lebar (not reading bugmail) 2012-10-05 07:35:55 PDT
Comment on attachment 668373 [details] [diff] [review]
small followup fix for OpenBSD

Okay, but this is really silly, and we should push to get it fixed in NSPR (bug 634793).

Do you need me to push this for you?
Comment 75 Landry Breuil (:gaston) 2012-10-06 02:05:25 PDT
(In reply to Justin Lebar [:jlebar] from comment #74)
> Comment on attachment 668373 [details] [diff] [review]
> small followup fix for OpenBSD
> 
> Okay, but this is really silly, and we should push to get it fixed in NSPR
> (bug 634793).

I cant agree more..

> Do you need me to push this for you?

Pushed :
https://hg.mozilla.org/integration/mozilla-inbound/rev/4ff2eb1a97cd
Comment 76 Ed Morley [:emorley] 2012-10-06 12:40:39 PDT
https://hg.mozilla.org/mozilla-central/rev/4ff2eb1a97cd
Comment 77 Justin Lebar (not reading bugmail) 2012-10-26 22:20:57 PDT
I'm using this in the latest incarnation of the get_about_memory.py script (now located in $B2G_ROOT/tools).  Try it out and let me know if it doesn't work right!

https://github.com/jlebar/B2G/commit/fe9bbff3f5f50eb9f2cc27d70af5abb985556b26

Note You need to log in before you can comment on or make changes to this bug.