Last Comment Bug 800486 - Dump GC/CC logs upon receiving a signal
: Dump GC/CC logs upon receiving a signal
Status: RESOLVED FIXED
[MemShrink]
:
Product: Core
Classification: Components
Component: XPCOM (show other bugs)
: Trunk
: All All
: -- normal (vote)
: mozilla19
Assigned To: Nobody; OK to take it and work on it
:
Mentors:
Depends on: 788021 805856 806239 806473
Blocks: 723783
  Show dependency treegraph
 
Reported: 2012-10-11 12:07 PDT by Justin Lebar (not reading bugmail)
Modified: 2012-10-29 11:27 PDT (History)
10 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
fixed


Attachments
Part 1, v1: Dump a GC and CC log upon receiving SIGRTMIN + 2. (23.36 KB, patch)
2012-10-11 21:06 PDT, Justin Lebar (not reading bugmail)
continuation: review+
cjones.bugs: review+
n.nethercote: review+
Details | Diff | Splinter Review
Part 2, v1: Move signal-watching and memory-report-dumping code out of nsMemoryReporterManager and into a new file, MemoryInfoDumper. (54.61 KB, patch)
2012-10-11 21:06 PDT, Justin Lebar (not reading bugmail)
n.nethercote: review+
Details | Diff | Splinter Review
Part 1, as checked in (comment 24): Dump a GC and CC log upon receiving SIGRTMIN + 2. r=mccr8,njn,cjones (23.98 KB, patch)
2012-10-15 19:17 PDT, Justin Lebar (not reading bugmail)
justin.lebar+bug: review+
akeybl: approval‑mozilla‑aurora+
Details | Diff | Splinter Review
Part 2 as checked in (comment 24): Move signal-watching and memory-report-dumping code out of nsMemoryReporterManager and into a new file, MemoryInfoDumper. r=njn (56.42 KB, patch)
2012-10-15 19:17 PDT, Justin Lebar (not reading bugmail)
justin.lebar+bug: review+
akeybl: approval‑mozilla‑aurora+
Details | Diff | Splinter Review
Part 3 as checked in (comment 24): Minor fixups to MemoryInfoDumper.cpp. r=njn (5.77 KB, patch)
2012-10-15 19:17 PDT, Justin Lebar (not reading bugmail)
justin.lebar+bug: review+
akeybl: approval‑mozilla‑aurora+
Details | Diff | Splinter Review

Description Justin Lebar (not reading bugmail) 2012-10-11 12:07:14 PDT
We currently dump memory reporters upon receiving a signal, mostly for testing on B2G.  We can do the same thing with GC/CC logs, which it seems would be useful.

I have a mostly-complete patch to make this work (and also to overhaul a few aspects of logging).
Comment 1 Andrew McCreight [:mccr8] 2012-10-11 12:11:53 PDT
How much is space at a premium?  It might be handy to have ways to dump GC and CC logs separately, as GC logs can be huge.  They are independent: we only dump them at the same time out for laziness/simplicity.
Comment 2 Justin Lebar (not reading bugmail) 2012-10-11 12:17:34 PDT
(In reply to Andrew McCreight [:mccr8] from comment #1)
> How much is space at a premium?  It might be handy to have ways to dump GC
> and CC logs separately, as GC logs can be huge.  They are independent: we
> only dump them at the same time out for laziness/simplicity.

I was thinking we'd compress them as part of the dumping process.  We do this already with the memory dumps (which can also be huge), and doing so provided a substantial speedup.

I feel like compression is a good way to start, since splitting up GC/CC doesn't speed up the GC dumps when you need/want them.

(FYI, I'm also probably going to add a method to dump all strings, since that seems particularly useful.  Maybe as a generalization, we could just dump all live JS objects instead of dumping only the reachable ones?)
Comment 3 Andrew McCreight [:mccr8] 2012-10-11 12:22:26 PDT
Bill has a patch for that in bug 739681, that perhaps died on the hills of my overly long review comments.
Comment 4 Justin Lebar (not reading bugmail) 2012-10-11 21:06:15 PDT
Created attachment 670679 [details] [diff] [review]
Part 1, v1: Dump a GC and CC log upon receiving SIGRTMIN + 2.
Comment 5 Justin Lebar (not reading bugmail) 2012-10-11 21:06:25 PDT
Created attachment 670680 [details] [diff] [review]
Part 2, v1: Move signal-watching and memory-report-dumping code out of nsMemoryReporterManager and into a new file, MemoryInfoDumper.
Comment 6 Justin Lebar (not reading bugmail) 2012-10-11 21:12:27 PDT
Comment on attachment 670679 [details] [diff] [review]
Part 1, v1: Dump a GC and CC log upon receiving SIGRTMIN + 2.

 dom/base/Makefile.in                     |    1 +
 dom/ipc/ContentChild.cpp                 |   11 +++
 dom/ipc/ContentChild.h                   |    5 +-
 dom/ipc/PContent.ipdl                    |    9 ++

r? cjones.  These are pretty mechanical changes.

 xpcom/base/nsCycleCollector.cpp          |  152 +++++++++++++++++++++--
 xpcom/base/nsICycleCollectorListener.idl |    5 +-

r? mccr8.  These are changes that I need mostly so I can integrate gc/cc log dumping with my program which pulls the files off the device.

One thing I didn't do here is compress the logs.  We can do that later, or not, but it was going to be kind of involved, so I wanted to get this out here.

 xpcom/base/nsIMemoryReporter.idl         |    7 ++
 xpcom/base/nsMemoryReporterManager.cpp   |  102 +++++++++++++++++---

r? njn.  This is pretty mechanical, just adding the infrastructure needed for cc/gc log dumping here.
Comment 7 Justin Lebar (not reading bugmail) 2012-10-11 21:19:26 PDT
Comment on attachment 670680 [details] [diff] [review]
Part 2, v1: Move signal-watching and memory-report-dumping code out of nsMemoryReporterManager and into a new file, MemoryInfoDumper.

I think Nick gets to review this one by himself.

I made this change for two reasons.

  1) nsMemoryReporterManager was getting unwieldy with all the code I'd been adding.
  2) CC logging doesn't have anything to do with memory reporting.

I changed basically nothing here -- it's almost entirely a simple cut-and-paste job on top of patch v1.  I did change the indentation of the file, though, so now it's consistently 2 spaces.  :)

It may make sense to move some of the CC logging code into this file as well, but that's for later.
Comment 8 Nicholas Nethercote [:njn] 2012-10-11 21:55:46 PDT
Comment on attachment 670679 [details] [diff] [review]
Part 1, v1: Dump a GC and CC log upon receiving SIGRTMIN + 2.

Review of attachment 670679 [details] [diff] [review]:
-----------------------------------------------------------------

::: xpcom/base/nsMemoryReporterManager.cpp
@@ +831,5 @@
> +      // Dump our memory reports (but run this on the main thread!).
> +      nsRefPtr<DumpMemoryReportsRunnable> runnable =
> +        new DumpMemoryReportsRunnable(
> +            /* identifier = */ EmptyString(),
> +            signum == sDumpAboutMemoryAfterMMUSignum,

I'd pull this comparison out into a local variable called "doMMUFirst" or something like that.

@@ +1230,5 @@
> +    // children's, allowing us to identify which files are from the same
> +    // request.
> +    nsString identifier(aIdentifier);
> +    if (identifier.IsEmpty()) {
> +        identifier.AppendInt(static_cast<int64_t>(PR_Now()) / 1000000);

This code is identical to that at the top of DumpMemoryReportsToFile(), and the comment is almost identical.  Factor out into a separate function, please.
Comment 9 Nicholas Nethercote [:njn] 2012-10-11 22:09:26 PDT
Comment on attachment 670680 [details] [diff] [review]
Part 2, v1: Move signal-watching and memory-report-dumping code out of nsMemoryReporterManager and into a new file, MemoryInfoDumper.

Review of attachment 670680 [details] [diff] [review]:
-----------------------------------------------------------------

I skimmed this a bit.  But I did notice that you can't call the dumping functions from JS code any more.  I guess that doesn't matter.

::: xpcom/base/MemoryInfoDumper.h
@@ +16,5 @@
> +/**
> + * This class facilitates dumping information about our memory usage to disk.
> + *
> + * Its cpp file also has Linux-only code which watches various OS signals and
> + * dumps memory info upon receiving a signal.  You can active these listeners

s/active/activate/

@@ +17,5 @@
> + * This class facilitates dumping information about our memory usage to disk.
> + *
> + * Its cpp file also has Linux-only code which watches various OS signals and
> + * dumps memory info upon receiving a signal.  You can active these listeners
> + * by calling Initialize();

s/;/./ ?

@@ +22,5 @@
> + */
> +class MemoryInfoDumper
> +{
> +public:
> +  static void Initialize();

Initialize() is a very generic name... why not just remove Initialize(), and move InitializeSignalWatcher() into this class, guarded with the #ifdef XP_LINUX.

::: xpcom/base/nsIMemoryReporter.idl
@@ +225,5 @@
>     */
>    readonly attribute int64_t explicitNonHeap;
>  };
>  
> +[scriptable, builtinclass, uuid(8b670411-ea2a-44c2-a36b-529db0670821)]

What does |builtinclass| mean?
Comment 10 Justin Lebar (not reading bugmail) 2012-10-12 06:16:12 PDT
> But I did notice that you can't call the dumping functions from JS code any more.  I guess that 
> doesn't matter.

Call it my own private war against XPCOM.  :)

If we need to call this from JS, I'd see if we could use WebIDL (which can make chrome-only interfaces), although I'm not sure if WebIDL has the equivalent of do_GetService().  But for now, it doesn't seem to matter...

> Initialize() is a very generic name... why not just remove Initialize(), and move 
> InitializeSignalWatcher() into this class, guarded with the #ifdef XP_LINUX.

Ooh, I don't want the public interface of a class to change depending on what platform you're on.  Then if for example we changed the platforms where we had signal handlers, we'd have to change all the callers.

Initialize() gives us information hiding, which is good?

> What does |builtinclass| mean?

That the interface must be implemented in C++.  You can only put [infallible] on attributes in a builtinclass, for reasons that were not entirely clear to me -- I think there was a concern that JS might accidentally throw exceptions, or that the engine might generate exceptions without our code's intervention.
Comment 11 Andrew McCreight [:mccr8] 2012-10-12 07:50:28 PDT
Comment on attachment 670679 [details] [diff] [review]
Part 1, v1: Dump a GC and CC log upon receiving SIGRTMIN + 2.

Review of attachment 670679 [details] [diff] [review]:
-----------------------------------------------------------------

Please check that this works on Windows and one of {OSX, Linux}.  If you don't have a Windows machine you can try this on, then let me know and I can check it out with a try build. This stuff tends to be finicky on different platforms, and unfortunately we don't have any tests for it.

I like how the state for GC dumping is teased apart from the CC dumping a bit more. It lays the groundwork for some future refactoring to separate out GC dumping into a separate function or something.

::: xpcom/base/nsCycleCollector.cpp
@@ -1322,2 @@
>  
> -        ++gLogCounter;

Please don't drop the log counter. For manually triggered logging it doesn't matter much, but a very common work flow is to log every CC, and I'd like to have something clearer than timestamps to try to order 30 or more CC logs.

It looks to me like you could fix that just by leaving the gLogCounter code alone, then adding some logic into CreateTempFile that will add the current log counter into the filename somewhere before the process ID.

You probably only need to actually include it if mFilenameIdentifier.IsEmpty().

@@ +1323,5 @@
> +        // completes.
> +        //
> +        // (We do this because we don't want scripts which poll the filesystem
> +        // looking for gc/cc dumps to grab a file before we're finished writing
> +        // to it.)

No need to split this into a bunch of paragraphs. The parens can probably also be dropped.

@@ +1359,5 @@
>          }
>  
> +        // Open a file for dumping the CC graph.  We again prefix with
> +        // "incomplete-".
> +        mOutFile = CreateTempFile("incomplete-cc-edges");

You should do an ENSURE_STATE thing for mOutFile.

All this ENSURE_STATE stuff is a little tedious. If only we had monads. ;)

@@ +1473,5 @@
> +            NS_ENSURE_STATE(!logFileFinalDestinationName.IsEmpty());
> +
> +            mOutFile->MoveTo(/* directory = */ nullptr,
> +                             logFileFinalDestinationName);
> +            mOutFile = nullptr;

Please add back console logging of the cycle collector log name.

@@ +1532,5 @@
> +    {
> +        nsPrintfCString filename("%s.%d%s%s.log",
> +            aPrefix,
> +            base::GetCurrentProcId(),
> +            !mFilenameIdentifier.IsEmpty() ? "." : "",

Maybe reverse the cases here to eliminate the ! from the trinary operator.
Comment 12 Justin Lebar (not reading bugmail) 2012-10-12 07:57:59 PDT
> For manually triggered logging it doesn't matter much, but a very common work flow is to log every 
> CC, and I'd like to have something clearer than timestamps to try to order 30 or more CC logs.

The code as written before would overwrite any old cc logs which happened to be sitting around in the temp directory.  This new code will never overwrite a file; instead, it will create a file named something like cc-log.$WHATEVER-2.log.

So if you don't specify an identifier, you'll get files named

  cc-log.$PID.log
  cc-log.$PID-1.log
  cc-log.$PID-2.log

and so on.

Is that OK?
Comment 13 Andrew McCreight [:mccr8] 2012-10-12 08:06:28 PDT
Ah, nice!  Sorry, I'm not familiar with our file functions.  I guess you'll get weird behavior in cases with PID collisions, if somebody deletes some but not all log files, etc., but no big deal.

r=me with the other comments addressed.
Comment 14 Nicholas Nethercote [:njn] 2012-10-12 13:47:13 PDT
> Initialize() gives us information hiding, which is good?

Fair enough.
Comment 15 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2012-10-12 16:15:52 PDT
Comment on attachment 670679 [details] [diff] [review]
Part 1, v1: Dump a GC and CC log upon receiving SIGRTMIN + 2.

>diff --git a/dom/base/Makefile.in b/dom/base/Makefile.in

I don't know if exporting this header is a good idea.  If you're not
sure, ask a DOM peer.  I'll leave it up to your judgment.

>diff --git a/dom/ipc/PContent.ipdl b/dom/ipc/PContent.ipdl

>+    /**
>+     * Dump this process's GC and CC logs.
>+     *
>+     * For documentation on the args, see
>+     * nsIMemoryReporterManager::dumpGCAndCCLogsToFile.
>+     */
>+    async DumpGCAndCCLogsToFile(nsString identifier,
>+                                bool dumpChildProcesses);

Document what the identifier is.
Comment 16 Justin Lebar (not reading bugmail) 2012-10-15 07:28:34 PDT
> Document what the identifier is.

I feel pretty dumb copy-pasting the full documentation for DumpGCAndCCLogsToFile into PContent, because there's no way we're going to keep the PContent version up to date.  We'd also have to copy the docs for DumpMemoryReportsToFile, to be consistent.

Copying the docs has the additional problem that it doesn't make it clear that the PContent version is in fact identical to the MemoryInfoDumper version -- you have to manually diff the comments in order to see that.

I could add a brief comment just about the |identifier| argument, but that feels arbitrary.

Neither of these changes seems like an improvement to me over the existing comments, but I'll do whatever you want here; just let me know what your preference is.
Comment 17 Justin Lebar (not reading bugmail) 2012-10-15 08:04:44 PDT
Windows builds are a'coming to https://tbpl.mozilla.org/?tree=Try&rev=2cd33246dfe9.
Comment 18 Justin Lebar (not reading bugmail) 2012-10-15 12:29:27 PDT
(In reply to Justin Lebar [:jlebar] from comment #17)
> Windows builds are a'coming to
> https://tbpl.mozilla.org/?tree=Try&rev=2cd33246dfe9.

I apparently pulled a busted version of m-c.  Let's try again:
https://tbpl.mozilla.org/?tree=Try&rev=b1d82b934b42
Comment 19 Justin Lebar (not reading bugmail) 2012-10-15 13:48:04 PDT
Either I'm misreading the logs, my commit is producing very strange errors, or I'm having seriously bad luck here.

e:\builds\moz2_slave\try-w32-dbg\build\obj-firefox\dist\include\nsError.h(135) : warning C4005: 'ERROR' : macro redefinition

I /know/ this is based off good commit.  https://tbpl.mozilla.org/?tree=Try&rev=85b58d6a4678
Comment 20 Justin Lebar (not reading bugmail) 2012-10-15 14:43:43 PDT
Ah, apparently I'm hitting bug 794490, which means it's a real compile error, and a busted compile log.  :(
Comment 21 Mozilla RelEng Bot 2012-10-15 15:30:32 PDT
Try run for 5741ca704be9 is complete.
Detailed breakdown of the results available here:
    https://tbpl.mozilla.org/?tree=Try&rev=5741ca704be9
Results (out of 2 total builds):
    failure: 2
Builds (or logs if builds failed) available at:
http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/jlebar@mozilla.com-5741ca704be9
Comment 22 Mozilla RelEng Bot 2012-10-15 16:30:35 PDT
Try run for 23510880ff6e is complete.
Detailed breakdown of the results available here:
    https://tbpl.mozilla.org/?tree=Try&rev=23510880ff6e
Results (out of 1 total builds):
    failure: 1
Builds (or logs if builds failed) available at:
http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/jlebar@mozilla.com-23510880ff6e
Comment 23 Mozilla RelEng Bot 2012-10-15 19:00:32 PDT
Try run for deda9ddd2e25 is complete.
Detailed breakdown of the results available here:
    https://tbpl.mozilla.org/?tree=Try&rev=deda9ddd2e25
Results (out of 49 total builds):
    exception: 2
    success: 1
    failure: 46
Builds (or logs if builds failed) available at:
http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/jlebar@mozilla.com-deda9ddd2e25
Comment 24 Justin Lebar (not reading bugmail) 2012-10-15 19:13:30 PDT
> Document what the identifier is.

We resolved this on IRC; cjones is fine with the existing "For documentation, see nsIMemoryReporterManager::dumpGCAndCCLogsToFile".

Word from the recently enwisened: NS_LITERAL_STRING("foo" "bar") doesn't work.  What seems to be happening is that on MSVC NS_LITERAL_STRING("foo" "bar") expands to something like L"foo" "bar" (where L"foo" means "foo" as a wide string), which then is a compile error because we're trying to concatenate a wide string and a narrow string.

Anywho, inbound for FF19.

https://hg.mozilla.org/integration/mozilla-inbound/rev/e676a99a7a8d
https://hg.mozilla.org/integration/mozilla-inbound/rev/295110f8c892
https://hg.mozilla.org/integration/mozilla-inbound/rev/bdb3e55bddda
Comment 25 Justin Lebar (not reading bugmail) 2012-10-15 19:13:48 PDT
(Oh, and I did test that CC logging still works on Windows.)
Comment 26 Justin Lebar (not reading bugmail) 2012-10-15 19:16:09 PDT
Comment on attachment 670679 [details] [diff] [review]
Part 1, v1: Dump a GC and CC log upon receiving SIGRTMIN + 2.

[Approval Request Comment]
Affects only logging code, necessary to retrieve these logs from B2G devices.
Comment 27 Justin Lebar (not reading bugmail) 2012-10-15 19:17:03 PDT
Created attachment 671700 [details] [diff] [review]
Part 1, as checked in (comment 24): Dump a GC and CC log upon receiving SIGRTMIN + 2. r=mccr8,njn,cjones
Comment 28 Justin Lebar (not reading bugmail) 2012-10-15 19:17:17 PDT
Created attachment 671701 [details] [diff] [review]
Part 2 as checked in (comment 24): Move signal-watching and memory-report-dumping code out of nsMemoryReporterManager and into a new file, MemoryInfoDumper. r=njn
Comment 29 Justin Lebar (not reading bugmail) 2012-10-15 19:17:33 PDT
Created attachment 671702 [details] [diff] [review]
Part 3 as checked in (comment 24): Minor fixups to MemoryInfoDumper.cpp. r=njn
Comment 30 Justin Lebar (not reading bugmail) 2012-10-15 19:20:14 PDT
Comment on attachment 671700 [details] [diff] [review]
Part 1, as checked in (comment 24): Dump a GC and CC log upon receiving SIGRTMIN + 2. r=mccr8,njn,cjones

Triage people: Can you provide guidance on whether you want to a+ patches as checked in to the tree, or whether I can request approval on attachments which differ from what I've checked in?

It's a non-negligible amount of work to re-attach patches etc., and I don't expect anyone is a+'ing these patches based on a careful examination of their contents, so I'd like not to go through the work of attaching final patches it if it's not helping you somehow.
Comment 32 Kartikaya Gupta (email:kats@mozilla.com) 2012-10-16 23:32:15 PDT
Aaah. Patch #2, which moved functions from nsMemoryReporterManager to MemoryInfoDumper, regressed my patch from bug 799476, since it removed the functions from the IDL as well. Are there any plans to expose the new MemoryInfoDumper code to script? Or alternatively, is there some supported programmatic way to invoke this so I don't have to duplicate the code?
Comment 33 Nicholas Nethercote [:njn] 2012-10-17 01:47:48 PDT
(In reply to Kartikaya Gupta (:kats) from comment #32)
> Aaah. Patch #2, which moved functions from nsMemoryReporterManager to
> MemoryInfoDumper, regressed my patch from bug 799476, since it removed the
> functions from the IDL as well.

See the top of comment 10! :/
Comment 34 Justin Lebar (not reading bugmail) 2012-10-17 08:47:45 PDT
bz &co, is there a way to expose a service to chrome JS through WebIDL (like XPCOM's GetService)?  Or should I just go with XPCOM here?
Comment 35 Boris Zbarsky [:bz] 2012-10-17 09:49:30 PDT
I'm not sure what the WebIDL question is.  Are you trying to create a method on window that's only available in chrome and hands out an object?  Or are you trying to hand out an object always but only expose API on that object in chrome?
Comment 36 Justin Lebar (not reading bugmail) 2012-10-17 10:39:15 PDT
> Are you trying to create a method on window that's only available in chrome and hands out an 
> object?

Like that, except we don't necessarily have a window, so I was hoping to create the object out of thin air (like GetService).
Comment 37 Boris Zbarsky [:bz] 2012-10-17 11:00:43 PDT
Ah.  I don't think we have available infrastructure for that in WebIDL....  We _could_ try to set up something that would only register a constructor on chrome scopes, but it would still work only on JS globals that have WebIDL things going on to start with.
Comment 38 Alex Keybl [:akeybl] 2012-10-17 16:51:45 PDT
Comment on attachment 671700 [details] [diff] [review]
Part 1, as checked in (comment 24): Dump a GC and CC log upon receiving SIGRTMIN + 2. r=mccr8,njn,cjones

This risk to desktop/mobile is manageable. Approving for Aurora 18 uplift.
Comment 39 Alex Keybl [:akeybl] 2012-10-17 16:53:54 PDT
(In reply to Justin Lebar [:jlebar] from comment #30)
> It's a non-negligible amount of work to re-attach patches etc., and I don't
> expect anyone is a+'ing these patches based on a careful examination of
> their contents, so I'd like not to go through the work of attaching final
> patches it if it's not helping you somehow.

It's mostly a matter of book-keeping, so that we know what we can tell in the future if what you landed was what you intended to land. If only very minor changes are made, we'll remain flexible.
Comment 40 Justin Lebar (not reading bugmail) 2012-10-17 17:11:14 PDT
> It's mostly a matter of book-keeping, so that we know what we can tell in the future if what you 
> landed was what you intended to land.

Can you compare the diff on m-c and aurora?  That seems like a surer bet than comparing patches anyway.

Note You need to log in before you can comment on or make changes to this bug.