Closed Bug 800486 Opened 12 years ago Closed 12 years ago

Dump GC/CC logs upon receiving a signal

Categories

(Core :: XPCOM, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla19
Tracking Status
firefox18 --- fixed

People

(Reporter: justin.lebar+bug, Unassigned)

References

(Blocks 1 open bug)

Details

(Whiteboard: [MemShrink])

Attachments

(3 files, 2 obsolete files)

We currently dump memory reporters upon receiving a signal, mostly for testing on B2G.  We can do the same thing with GC/CC logs, which it seems would be useful.

I have a mostly-complete patch to make this work (and also to overhaul a few aspects of logging).
Depends on: 788021
Whiteboard: [MemShrink]
How much is space at a premium?  It might be handy to have ways to dump GC and CC logs separately, as GC logs can be huge.  They are independent: we only dump them at the same time out for laziness/simplicity.
(In reply to Andrew McCreight [:mccr8] from comment #1)
> How much is space at a premium?  It might be handy to have ways to dump GC
> and CC logs separately, as GC logs can be huge.  They are independent: we
> only dump them at the same time out for laziness/simplicity.

I was thinking we'd compress them as part of the dumping process.  We do this already with the memory dumps (which can also be huge), and doing so provided a substantial speedup.

I feel like compression is a good way to start, since splitting up GC/CC doesn't speed up the GC dumps when you need/want them.

(FYI, I'm also probably going to add a method to dump all strings, since that seems particularly useful.  Maybe as a generalization, we could just dump all live JS objects instead of dumping only the reachable ones?)
Bill has a patch for that in bug 739681, that perhaps died on the hills of my overly long review comments.
Blocks: 723783
Comment on attachment 670679 [details] [diff] [review]
Part 1, v1: Dump a GC and CC log upon receiving SIGRTMIN + 2.

 dom/base/Makefile.in                     |    1 +
 dom/ipc/ContentChild.cpp                 |   11 +++
 dom/ipc/ContentChild.h                   |    5 +-
 dom/ipc/PContent.ipdl                    |    9 ++

r? cjones.  These are pretty mechanical changes.

 xpcom/base/nsCycleCollector.cpp          |  152 +++++++++++++++++++++--
 xpcom/base/nsICycleCollectorListener.idl |    5 +-

r? mccr8.  These are changes that I need mostly so I can integrate gc/cc log dumping with my program which pulls the files off the device.

One thing I didn't do here is compress the logs.  We can do that later, or not, but it was going to be kind of involved, so I wanted to get this out here.

 xpcom/base/nsIMemoryReporter.idl         |    7 ++
 xpcom/base/nsMemoryReporterManager.cpp   |  102 +++++++++++++++++---

r? njn.  This is pretty mechanical, just adding the infrastructure needed for cc/gc log dumping here.
Attachment #670679 - Flags: review?(n.nethercote)
Attachment #670679 - Flags: review?(jones.chris.g)
Attachment #670679 - Flags: review?(continuation)
Comment on attachment 670680 [details] [diff] [review]
Part 2, v1: Move signal-watching and memory-report-dumping code out of nsMemoryReporterManager and into a new file, MemoryInfoDumper.

I think Nick gets to review this one by himself.

I made this change for two reasons.

  1) nsMemoryReporterManager was getting unwieldy with all the code I'd been adding.
  2) CC logging doesn't have anything to do with memory reporting.

I changed basically nothing here -- it's almost entirely a simple cut-and-paste job on top of patch v1.  I did change the indentation of the file, though, so now it's consistently 2 spaces.  :)

It may make sense to move some of the CC logging code into this file as well, but that's for later.
Attachment #670680 - Flags: review?(n.nethercote)
Comment on attachment 670679 [details] [diff] [review]
Part 1, v1: Dump a GC and CC log upon receiving SIGRTMIN + 2.

Review of attachment 670679 [details] [diff] [review]:
-----------------------------------------------------------------

::: xpcom/base/nsMemoryReporterManager.cpp
@@ +831,5 @@
> +      // Dump our memory reports (but run this on the main thread!).
> +      nsRefPtr<DumpMemoryReportsRunnable> runnable =
> +        new DumpMemoryReportsRunnable(
> +            /* identifier = */ EmptyString(),
> +            signum == sDumpAboutMemoryAfterMMUSignum,

I'd pull this comparison out into a local variable called "doMMUFirst" or something like that.

@@ +1230,5 @@
> +    // children's, allowing us to identify which files are from the same
> +    // request.
> +    nsString identifier(aIdentifier);
> +    if (identifier.IsEmpty()) {
> +        identifier.AppendInt(static_cast<int64_t>(PR_Now()) / 1000000);

This code is identical to that at the top of DumpMemoryReportsToFile(), and the comment is almost identical.  Factor out into a separate function, please.
Attachment #670679 - Flags: review?(n.nethercote) → review+
Comment on attachment 670680 [details] [diff] [review]
Part 2, v1: Move signal-watching and memory-report-dumping code out of nsMemoryReporterManager and into a new file, MemoryInfoDumper.

Review of attachment 670680 [details] [diff] [review]:
-----------------------------------------------------------------

I skimmed this a bit.  But I did notice that you can't call the dumping functions from JS code any more.  I guess that doesn't matter.

::: xpcom/base/MemoryInfoDumper.h
@@ +16,5 @@
> +/**
> + * This class facilitates dumping information about our memory usage to disk.
> + *
> + * Its cpp file also has Linux-only code which watches various OS signals and
> + * dumps memory info upon receiving a signal.  You can active these listeners

s/active/activate/

@@ +17,5 @@
> + * This class facilitates dumping information about our memory usage to disk.
> + *
> + * Its cpp file also has Linux-only code which watches various OS signals and
> + * dumps memory info upon receiving a signal.  You can active these listeners
> + * by calling Initialize();

s/;/./ ?

@@ +22,5 @@
> + */
> +class MemoryInfoDumper
> +{
> +public:
> +  static void Initialize();

Initialize() is a very generic name... why not just remove Initialize(), and move InitializeSignalWatcher() into this class, guarded with the #ifdef XP_LINUX.

::: xpcom/base/nsIMemoryReporter.idl
@@ +225,5 @@
>     */
>    readonly attribute int64_t explicitNonHeap;
>  };
>  
> +[scriptable, builtinclass, uuid(8b670411-ea2a-44c2-a36b-529db0670821)]

What does |builtinclass| mean?
Attachment #670680 - Flags: review?(n.nethercote) → review+
> But I did notice that you can't call the dumping functions from JS code any more.  I guess that 
> doesn't matter.

Call it my own private war against XPCOM.  :)

If we need to call this from JS, I'd see if we could use WebIDL (which can make chrome-only interfaces), although I'm not sure if WebIDL has the equivalent of do_GetService().  But for now, it doesn't seem to matter...

> Initialize() is a very generic name... why not just remove Initialize(), and move 
> InitializeSignalWatcher() into this class, guarded with the #ifdef XP_LINUX.

Ooh, I don't want the public interface of a class to change depending on what platform you're on.  Then if for example we changed the platforms where we had signal handlers, we'd have to change all the callers.

Initialize() gives us information hiding, which is good?

> What does |builtinclass| mean?

That the interface must be implemented in C++.  You can only put [infallible] on attributes in a builtinclass, for reasons that were not entirely clear to me -- I think there was a concern that JS might accidentally throw exceptions, or that the engine might generate exceptions without our code's intervention.
Comment on attachment 670679 [details] [diff] [review]
Part 1, v1: Dump a GC and CC log upon receiving SIGRTMIN + 2.

Review of attachment 670679 [details] [diff] [review]:
-----------------------------------------------------------------

Please check that this works on Windows and one of {OSX, Linux}.  If you don't have a Windows machine you can try this on, then let me know and I can check it out with a try build. This stuff tends to be finicky on different platforms, and unfortunately we don't have any tests for it.

I like how the state for GC dumping is teased apart from the CC dumping a bit more. It lays the groundwork for some future refactoring to separate out GC dumping into a separate function or something.

::: xpcom/base/nsCycleCollector.cpp
@@ -1322,2 @@
>  
> -        ++gLogCounter;

Please don't drop the log counter. For manually triggered logging it doesn't matter much, but a very common work flow is to log every CC, and I'd like to have something clearer than timestamps to try to order 30 or more CC logs.

It looks to me like you could fix that just by leaving the gLogCounter code alone, then adding some logic into CreateTempFile that will add the current log counter into the filename somewhere before the process ID.

You probably only need to actually include it if mFilenameIdentifier.IsEmpty().

@@ +1323,5 @@
> +        // completes.
> +        //
> +        // (We do this because we don't want scripts which poll the filesystem
> +        // looking for gc/cc dumps to grab a file before we're finished writing
> +        // to it.)

No need to split this into a bunch of paragraphs. The parens can probably also be dropped.

@@ +1359,5 @@
>          }
>  
> +        // Open a file for dumping the CC graph.  We again prefix with
> +        // "incomplete-".
> +        mOutFile = CreateTempFile("incomplete-cc-edges");

You should do an ENSURE_STATE thing for mOutFile.

All this ENSURE_STATE stuff is a little tedious. If only we had monads. ;)

@@ +1473,5 @@
> +            NS_ENSURE_STATE(!logFileFinalDestinationName.IsEmpty());
> +
> +            mOutFile->MoveTo(/* directory = */ nullptr,
> +                             logFileFinalDestinationName);
> +            mOutFile = nullptr;

Please add back console logging of the cycle collector log name.

@@ +1532,5 @@
> +    {
> +        nsPrintfCString filename("%s.%d%s%s.log",
> +            aPrefix,
> +            base::GetCurrentProcId(),
> +            !mFilenameIdentifier.IsEmpty() ? "." : "",

Maybe reverse the cases here to eliminate the ! from the trinary operator.
Attachment #670679 - Flags: review?(continuation) → review-
> For manually triggered logging it doesn't matter much, but a very common work flow is to log every 
> CC, and I'd like to have something clearer than timestamps to try to order 30 or more CC logs.

The code as written before would overwrite any old cc logs which happened to be sitting around in the temp directory.  This new code will never overwrite a file; instead, it will create a file named something like cc-log.$WHATEVER-2.log.

So if you don't specify an identifier, you'll get files named

  cc-log.$PID.log
  cc-log.$PID-1.log
  cc-log.$PID-2.log

and so on.

Is that OK?
Ah, nice!  Sorry, I'm not familiar with our file functions.  I guess you'll get weird behavior in cases with PID collisions, if somebody deletes some but not all log files, etc., but no big deal.

r=me with the other comments addressed.
Attachment #670679 - Flags: review- → review+
> Initialize() gives us information hiding, which is good?

Fair enough.
Comment on attachment 670679 [details] [diff] [review]
Part 1, v1: Dump a GC and CC log upon receiving SIGRTMIN + 2.

>diff --git a/dom/base/Makefile.in b/dom/base/Makefile.in

I don't know if exporting this header is a good idea.  If you're not
sure, ask a DOM peer.  I'll leave it up to your judgment.

>diff --git a/dom/ipc/PContent.ipdl b/dom/ipc/PContent.ipdl

>+    /**
>+     * Dump this process's GC and CC logs.
>+     *
>+     * For documentation on the args, see
>+     * nsIMemoryReporterManager::dumpGCAndCCLogsToFile.
>+     */
>+    async DumpGCAndCCLogsToFile(nsString identifier,
>+                                bool dumpChildProcesses);

Document what the identifier is.
Attachment #670679 - Flags: review?(jones.chris.g) → review+
> Document what the identifier is.

I feel pretty dumb copy-pasting the full documentation for DumpGCAndCCLogsToFile into PContent, because there's no way we're going to keep the PContent version up to date.  We'd also have to copy the docs for DumpMemoryReportsToFile, to be consistent.

Copying the docs has the additional problem that it doesn't make it clear that the PContent version is in fact identical to the MemoryInfoDumper version -- you have to manually diff the comments in order to see that.

I could add a brief comment just about the |identifier| argument, but that feels arbitrary.

Neither of these changes seems like an improvement to me over the existing comments, but I'll do whatever you want here; just let me know what your preference is.
(In reply to Justin Lebar [:jlebar] from comment #17)
> Windows builds are a'coming to
> https://tbpl.mozilla.org/?tree=Try&rev=2cd33246dfe9.

I apparently pulled a busted version of m-c.  Let's try again:
https://tbpl.mozilla.org/?tree=Try&rev=b1d82b934b42
Either I'm misreading the logs, my commit is producing very strange errors, or I'm having seriously bad luck here.

e:\builds\moz2_slave\try-w32-dbg\build\obj-firefox\dist\include\nsError.h(135) : warning C4005: 'ERROR' : macro redefinition

I /know/ this is based off good commit.  https://tbpl.mozilla.org/?tree=Try&rev=85b58d6a4678
Ah, apparently I'm hitting bug 794490, which means it's a real compile error, and a busted compile log.  :(
Try run for 5741ca704be9 is complete.
Detailed breakdown of the results available here:
    https://tbpl.mozilla.org/?tree=Try&rev=5741ca704be9
Results (out of 2 total builds):
    failure: 2
Builds (or logs if builds failed) available at:
http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/jlebar@mozilla.com-5741ca704be9
Try run for 23510880ff6e is complete.
Detailed breakdown of the results available here:
    https://tbpl.mozilla.org/?tree=Try&rev=23510880ff6e
Results (out of 1 total builds):
    failure: 1
Builds (or logs if builds failed) available at:
http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/jlebar@mozilla.com-23510880ff6e
Try run for deda9ddd2e25 is complete.
Detailed breakdown of the results available here:
    https://tbpl.mozilla.org/?tree=Try&rev=deda9ddd2e25
Results (out of 49 total builds):
    exception: 2
    success: 1
    failure: 46
Builds (or logs if builds failed) available at:
http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/jlebar@mozilla.com-deda9ddd2e25
> Document what the identifier is.

We resolved this on IRC; cjones is fine with the existing "For documentation, see nsIMemoryReporterManager::dumpGCAndCCLogsToFile".

Word from the recently enwisened: NS_LITERAL_STRING("foo" "bar") doesn't work.  What seems to be happening is that on MSVC NS_LITERAL_STRING("foo" "bar") expands to something like L"foo" "bar" (where L"foo" means "foo" as a wide string), which then is a compile error because we're trying to concatenate a wide string and a narrow string.

Anywho, inbound for FF19.

https://hg.mozilla.org/integration/mozilla-inbound/rev/e676a99a7a8d
https://hg.mozilla.org/integration/mozilla-inbound/rev/295110f8c892
https://hg.mozilla.org/integration/mozilla-inbound/rev/bdb3e55bddda
(Oh, and I did test that CC logging still works on Windows.)
Comment on attachment 670679 [details] [diff] [review]
Part 1, v1: Dump a GC and CC log upon receiving SIGRTMIN + 2.

[Approval Request Comment]
Affects only logging code, necessary to retrieve these logs from B2G devices.
Attachment #670679 - Flags: approval-mozilla-aurora?
Attachment #670679 - Attachment is obsolete: true
Attachment #670679 - Flags: approval-mozilla-aurora?
Attachment #670680 - Attachment is obsolete: true
Comment on attachment 671700 [details] [diff] [review]
Part 1, as checked in (comment 24): Dump a GC and CC log upon receiving SIGRTMIN + 2. r=mccr8,njn,cjones

Triage people: Can you provide guidance on whether you want to a+ patches as checked in to the tree, or whether I can request approval on attachments which differ from what I've checked in?

It's a non-negligible amount of work to re-attach patches etc., and I don't expect anyone is a+'ing these patches based on a careful examination of their contents, so I'd like not to go through the work of attaching final patches it if it's not helping you somehow.
Attachment #671700 - Flags: review+
Attachment #671700 - Flags: approval-mozilla-aurora?
Attachment #671701 - Flags: approval-mozilla-aurora?
Attachment #671702 - Flags: approval-mozilla-aurora?
Attachment #671701 - Flags: review+
Attachment #671702 - Flags: review+
Aaah. Patch #2, which moved functions from nsMemoryReporterManager to MemoryInfoDumper, regressed my patch from bug 799476, since it removed the functions from the IDL as well. Are there any plans to expose the new MemoryInfoDumper code to script? Or alternatively, is there some supported programmatic way to invoke this so I don't have to duplicate the code?
(In reply to Kartikaya Gupta (:kats) from comment #32)
> Aaah. Patch #2, which moved functions from nsMemoryReporterManager to
> MemoryInfoDumper, regressed my patch from bug 799476, since it removed the
> functions from the IDL as well.

See the top of comment 10! :/
bz &co, is there a way to expose a service to chrome JS through WebIDL (like XPCOM's GetService)?  Or should I just go with XPCOM here?
I'm not sure what the WebIDL question is.  Are you trying to create a method on window that's only available in chrome and hands out an object?  Or are you trying to hand out an object always but only expose API on that object in chrome?
> Are you trying to create a method on window that's only available in chrome and hands out an 
> object?

Like that, except we don't necessarily have a window, so I was hoping to create the object out of thin air (like GetService).
Ah.  I don't think we have available infrastructure for that in WebIDL....  We _could_ try to set up something that would only register a constructor on chrome scopes, but it would still work only on JS globals that have WebIDL things going on to start with.
Comment on attachment 671700 [details] [diff] [review]
Part 1, as checked in (comment 24): Dump a GC and CC log upon receiving SIGRTMIN + 2. r=mccr8,njn,cjones

This risk to desktop/mobile is manageable. Approving for Aurora 18 uplift.
Attachment #671700 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Attachment #671701 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Attachment #671702 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
(In reply to Justin Lebar [:jlebar] from comment #30)
> It's a non-negligible amount of work to re-attach patches etc., and I don't
> expect anyone is a+'ing these patches based on a careful examination of
> their contents, so I'd like not to go through the work of attaching final
> patches it if it's not helping you somehow.

It's mostly a matter of book-keeping, so that we know what we can tell in the future if what you landed was what you intended to land. If only very minor changes are made, we'll remain flexible.
> It's mostly a matter of book-keeping, so that we know what we can tell in the future if what you 
> landed was what you intended to land.

Can you compare the diff on m-c and aurora?  That seems like a surer bet than comparing patches anyway.
Depends on: 805856
Depends on: 806239
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: