Last Comment Bug 708159 - Avoid unnecessary work done by multi-reporters nsMemoryReporterManager::GetExplicit
: Avoid unnecessary work done by multi-reporters nsMemoryReporterManager::GetEx...
Status: RESOLVED FIXED
[Snappy]
: dev-doc-complete
Product: Core
Classification: Components
Component: General (show other bugs)
: unspecified
: All All
: -- normal (vote)
: mozilla12
Assigned To: Nicholas Nethercote [:njn]
:
Mentors:
Depends on:
Blocks: 703143
  Show dependency treegraph
 
Reported: 2011-12-06 18:33 PST by Nicholas Nethercote [:njn]
Modified: 2012-04-24 05:22 PDT (History)
8 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
patch (36.69 KB, patch)
2011-12-12 21:37 PST, Nicholas Nethercote [:njn]
justin.lebar+bug: review+
Details | Diff | Review
patch, v2 (28.72 KB, patch)
2011-12-13 18:38 PST, Nicholas Nethercote [:njn]
no flags Details | Diff | Review
patch v3 (13.55 KB, patch)
2011-12-14 16:43 PST, Nicholas Nethercote [:njn]
justin.lebar+bug: review+
Details | Diff | Review
patch v4 (42.80 KB, patch)
2011-12-15 13:43 PST, Nicholas Nethercote [:njn]
no flags Details | Diff | Review
patch v5 (42.17 KB, patch)
2011-12-28 23:38 PST, Nicholas Nethercote [:njn]
bent.mozilla: review+
Details | Diff | Review

Description Nicholas Nethercote [:njn] 2011-12-06 18:33:20 PST
nsMemoryReporterManager::GetExplicit() calculates the "explicit" memory measurement.  This consists of "heap-allocated" (which is cheap to get) plus the sum of all the explicit NONHEAP memory reports.  Most of the work done by GetExplicit is doing the NONHEAP measurements.

For nsIMemoryReporters, no unnecessary work is done -- GetExplicit checks if the kind is NONHEAP and the path starts with "explicit", and if not, the "amount" field isn't accessed.  However, for nsIMemoryMultiReporters, lots of unnecessary work is done.  We call CollectReports, which does all the measurements and passes them back via a callback.  GetExplicit's callback ignores any non-NONHEAP, non-explicit measurements.

To put this more concretely, the JS multi-reporter touches every live thing on the JS heap every time it runs.  With telemetry enabled, this is once per minute.  This is bad.  On Linux, it's even worse because we also read /proc/self/smaps on every telemetry ping.

So, how to fix?  I thought about passing an extra arg to CollectReports that is a mask of the kinds we need to know about.  But that's still going to do more work than we need.

A more efficient approach is one that targets GetExplicit's specific use case.  We'd add nsIMemoryMultiReporter::SizeOfExplicitNonHeap(), which would just return the sum of the explicit, NONHEAP measurements.  For most multi-reporters (e.g. StorageSQLiteMultiReporter and MapsReporter) this would just return 0.  For XPConnectJSCompartmentsMultiReporter it would just need to do some much quicker, higher-level traversals of the GC heap and a few other things.
Comment 1 Justin Lebar (not reading bugmail) 2011-12-06 19:01:43 PST
:sigh.

Adding nsIMemoryMultiReporter::SizeOfExplicitNonHeap is an unfortunate hack, but I'm not sure how to do this better.

I wish we could at least check that SizeOfExplicitNonHeap returns close to the right thing, but even that is hard, because the world can change under you.
Comment 2 Nicholas Nethercote [:njn] 2011-12-08 15:00:53 PST
Another possibility is to not do this and remove "explicit" from the telemetry stats.  We'd still get "heap-allocated" and "resident" in telemetry, so it wouldn't be the end of the world, and we'd avoid the whole possibility of the two different measurement paths giving different results.

Having said that, there are already two paths for computing "explicit" -- about:memory basically just sums all the expclit HEAP and NONHEAP values, whereas GetExplicit sums "heap-allocated" and all the explicit NONHEAP values.
Comment 3 Justin Lebar (not reading bugmail) 2011-12-08 15:27:53 PST
So a multi-reporter could, when registering, indicate the shared path prefix of all its reporters.  We could say that if you're giving any explicit/ reporters, you must register with a prefix starting with "explicit/".  I'd like that API much better than "give me all your explicit nonheap numbers."

We'd have to iterate over all of explicit in order to use this, though.  :-/
Comment 4 Nicholas Nethercote [:njn] 2011-12-12 21:37:56 PST
Created attachment 581175 [details] [diff] [review]
patch

This patch implements nsIMemoryReporter::explicitNonHeap.

- bent, can you review the WorkerPrivate.cpp changes?  I had to genericize
  the existing code in order to do two different GC traversals within the
  framework you created.  I did this by rearranging things, passing in a
  function that does the traversal, and changing |IterateData* data| to
  |void* data|.

- billm, can you review the JS and xpconnect changes?  I had to create
  IterateCompartments.  And the xpconnect changes are required for the
  WorkerPrivate.cpp changes.

- jlebar, can you review everything else?  I added GetExplicitNonHeap
  implementations where necessary.  I also added a new "explicit" reporter
  that gets the result from nsMemoryReporterManager::GetExplicit(), i.e.
  uses the new code.  The good thing about this is that it shows up in the
  "Other Measurements" list in about:memory, so you can compared it to the
  one at the top of about:memory.  I've found that they're usually within
  100KB of each other, and often within a few KB.  I've seen them get as
  much as 1MB or so apart when the browser is really busy, e.g. restoring an
  old session.  So if the two methods of measuring "explicit" ever get out
  of sync we'll be able to notice.  (In fact, this is how I found bug 709653
  -- the new method was giving the right answer, the old method was not.)

  This creation of the stand-alone "explicit" reporter also simplified
  TelemetryPing.js.

  Oh, I had to make a minor change for the swap tree in about:memory.  If
  it's zero it now looks like this:

    0 B (100.0%) -- swap
    └──0 B (100.0%) -- total

  Because we now have a reporter named "explicit" as well as a tree rooted 
  at "explicit", I had to change the |treeNameMatches| function to check for
  "explicit/" at the start of the path instead of "explicit", so that the 
  "explicit" reporter isn't mistakenly put in the "explicit" tree.  And for 
  an emmpty swap tree the name of the tree was "map/swap" and the only entry
  in it was also called "map/swap", so the version of |treeNameMatches|
  failed.  With "map/swap/total" it doesn't fail.  (I hope that made sense.)

  I also did some other testing that isn't visible -- while writing the
  patch I first implemented the new version of GetExplicit() in parallel
  with the old one to make sure I got it right.  It took a few goes (the DOM
  worker's GetExplicitNonHeap was a bit tricky), but eventually I got them
  to always give the same result, which gives me confidence.  The old method
  is gone in the patch, of course.
Comment 5 Justin Lebar (not reading bugmail) 2011-12-13 08:23:03 PST
Comment on attachment 581175 [details] [diff] [review]
patch

>@@ -218,17 +225,17 @@ MapsReporter::CollectReports(nsIMemoryMu
>   // 'map/vsize'; otherwise we're probably not reading smaps correctly.  If we
>   // didn't create a node under 'map/swap', create one here so about:memory
>   // knows to create an empty 'map/swap' tree.  See also bug 682735.
> 
>   NS_ASSERTION(categoriesSeen.mSeenVsize, "Didn't create a vsize node?");
>   NS_ASSERTION(categoriesSeen.mSeenVsize, "Didn't create a resident node?");
>   if (!categoriesSeen.mSeenSwap) {
>     aCallback->Callback(NS_LITERAL_CSTRING(""),
>-                        NS_LITERAL_CSTRING("map/swap"),
>+                        NS_LITERAL_CSTRING("map/swap/total"),

OOC, why did you have to do this?

> NS_IMETHODIMP
> nsMemoryReporterManager::GetExplicit(PRInt64 *aExplicit)
> {
>-    // Get KIND_NONHEAP measurements from multi-reporters, too.
>+    PRInt64 explicitNonHeapNormal = 0;
>+    for (PRUint32 i = 0; i < explicitNonheapNormal.Length(); i++) {
>+        explicitNonHeapNormal += explicitNonheapNormal[i].amount;
>+    }
>+
>+    // For each multi-reporter we could call CollectReports and filter out the
>+    // non-explicit, non-NONHEAP measurements.  But that's lots of wasted work,
>+    // so we instead use SizeOfExplicitNonHeap() which exists purely for this
>+    // purpose.

Can you indicate (until bug 700508 lands) that we're assuming the
multi-reporters don't overlap?

Also, how would you feel about doing a sanity check here in debug builds to
ensure that SizeOfExplicitNonHeap is close to what you'd get if you iterated
over the multi-reporter and filtered its results?  We can't make it a fatal
assertion, but this would catch bugs where someone, say, had an explicit,
non-heap reporter but always returned 0 in SizeOfExplicitNonHeap.
Comment 6 Bill McCloskey (:billm) 2011-12-13 10:17:57 PST
Comment on attachment 581175 [details] [diff] [review]
patch

Review of attachment 581175 [details] [diff] [review]:
-----------------------------------------------------------------

The JS stuff looks okay. Is there someone who would be more qualified to review the xpconnect changes? I haven't looked at that code in a while and it's changed a lot.
Comment 7 Nicholas Nethercote [:njn] 2011-12-13 13:57:41 PST
> >     aCallback->Callback(NS_LITERAL_CSTRING(""),
> >-                        NS_LITERAL_CSTRING("map/swap"),
> >+                        NS_LITERAL_CSTRING("map/swap/total"),
> 
> OOC, why did you have to do this?

See the paragraph in comment 4 that begins "Oh, I had to make a minor change..."
> Also, how would you feel about doing a sanity check here in debug builds to
> ensure that SizeOfExplicitNonHeap is close to what you'd get if you iterated
> over the multi-reporter and filtered its results?  We can't make it a fatal
> assertion, but this would catch bugs where someone, say, had an explicit,
> non-heap reporter but always returned 0 in SizeOfExplicitNonHeap.

I don't want to keep the iterate-over-multi-reporters code, because it's moderately complicated.  But it did occur to me last night to add a test that loads about:memory and checks the two explicit values are within 1% (or something like that).
Comment 8 Justin Lebar (not reading bugmail) 2011-12-13 15:14:38 PST
> See the paragraph in comment 4 that begins "Oh, I had to make a minor change..."

Oh, sorry; I completely skipped over the explanatory paragraph.

> But it did occur to me last night to add a test that loads about:memory and checks the 
> two explicit values are within 1% (or something like that).

Okay, cool.
Comment 9 Nicholas Nethercote [:njn] 2011-12-13 15:17:26 PST
> > But it did occur to me last night to add a test that loads about:memory and checks the 
> > two explicit values are within 1% (or something like that).
> 
> Okay, cool.

Thinking some more, measuring twice is a much stronger guarantee, so I'll think about adding it back in for debug builds.
Comment 10 Justin Lebar (not reading bugmail) 2011-12-13 15:21:56 PST
about:memory does basically the same computation as was removed in this patch, right?  So you're still measuring twice if you add the test?
Comment 11 Nicholas Nethercote [:njn] 2011-12-13 16:41:27 PST
(In reply to Justin Lebar [:jlebar] from comment #10)
> about:memory does basically the same computation as was removed in this
> patch, right?  So you're still measuring twice if you add the test?

That's right.  But the comparison would have to be a "within 1% or so", and would only be done for whatever set of pages we load in the test.

Whereas if I keep the old code the comparison would be exact and would run all the time in debug builds, on whatever set of pages people are viewing.  So as a checking mechanism it's both more precise and much more wide-ranging.  (And the old code will become a bit less obnoxious if bug 700508 ever re-lands.)
Comment 12 Justin Lebar (not reading bugmail) 2011-12-13 16:48:23 PST
> Whereas if I keep the old code the comparison would be exact

Is that true?  My assumption was that it wouldn't be exact, since we have at least a tiny bit of concurrency in Gecko.  It's easy enough to check, I guess.
Comment 13 Nicholas Nethercote [:njn] 2011-12-13 16:51:56 PST
The results were exactly the same for the testing I did, but that testing wasn't extensive.
Comment 14 Justin Lebar (not reading bugmail) 2011-12-13 16:57:34 PST
If you made it just a warning, that'd probably be fine.  NS_ASSERTION and NS_WARNING are the same level, anyway.  But I suspect a worker, for example, could cause the numbers to be different.
Comment 15 Nicholas Nethercote [:njn] 2011-12-13 16:58:13 PST
I did some measurements.  With 10 tabs open (gmail + 9 news sites), with the old code, computing "explicit" was taking 60--70ms.  I thought a lot of it might be due to reading /proc/<pid>/smaps, so I disabled that and it still took 50--60ms.

With the new code, it takes about 0.3ms, so it's ~200x faster.  

Furthermore, I represent the least worst case because I have a very fast machine with 16GB of RAM so I'm not seeing any slow-down from paging.  The improvement could be much bigger for people on machines with less RAM.
Comment 16 Justin Lebar (not reading bugmail) 2011-12-13 16:59:22 PST
One 50ms jank down, some unknown number to go!
Comment 17 Nicholas Nethercote [:njn] 2011-12-13 18:38:39 PST
Created attachment 581509 [details] [diff] [review]
patch, v2

bent, can you review the xpconnect changes?  They're very closely related to the DOM worker changes.

jlebar, I put the double-checking we talked about in GetExplicit().  After browsing for a while, I'm yet to see a case where the two measurements don't match exactly.  I also fixed test_aboutmemory.xul.
Comment 18 Justin Lebar (not reading bugmail) 2011-12-14 06:51:21 PST
Can you make the #ifdef DEBUG code more self-contained?  Right now, I have to read through much of it while I'm trying to understand the release code, which is a bummer...
Comment 19 Justin Lebar (not reading bugmail) 2011-12-14 07:43:37 PST
Also, a comment explaining what class of bugs you're trying to catch by computing |explicit| twice would be appreciated.
Comment 20 Nicholas Nethercote [:njn] 2011-12-14 16:43:11 PST
Created attachment 581828 [details] [diff] [review]
patch v3

(In reply to Justin Lebar [:jlebar] from comment #18)
> Can you make the #ifdef DEBUG code more self-contained?  Right now, I have
> to read through much of it while I'm trying to understand the release code,
> which is a bummer...

Done.

(In reply to Justin Lebar [:jlebar] from comment #19)
> Also, a comment explaining what class of bugs you're trying to catch by
> computing |explicit| twice would be appreciated.

Is this existing comment unclear?

    // (Actually, in debug builds we also do it the slow way and compare the
    // result to the result obtained from GetExplicitNonHeap().  This
    // guarantees the two measurement paths are equivalent.  This is wise
    // because it's easy for memory reporters to have bugs.)
Comment 21 Justin Lebar (not reading bugmail) 2011-12-14 17:16:13 PST
Comment on attachment 581828 [details] [diff] [review]
patch v3

Sorry I missed the comment earlier (I should use interdiff, rather than scanning) -- it's fine.  Thanks!

> +    fprintf(stderr, "my-explicit: %d, %d\n",
> +            (int)explicitNonHeapMultiSize, (int)explicitNonHeapMultiSize2);

Maybe we should only print something when the assertion fires?
Comment 22 Nicholas Nethercote [:njn] 2011-12-14 19:21:43 PST
> > +    fprintf(stderr, "my-explicit: %d, %d\n",
> > +            (int)explicitNonHeapMultiSize, (int)explicitNonHeapMultiSize2);
> 
> Maybe we should only print something when the assertion fires?

Oh, right, that's my debugging code :)
Comment 23 Nicholas Nethercote [:njn] 2011-12-15 13:43:09 PST
Created attachment 582097 [details] [diff] [review]
patch v4

patch v3 was only half the story;  I had the patch in two parts in my queue and only attached the 2nd one.  jlebar, you've now reviewed both parts so no need to look at it again.  bent, you're up!
Comment 24 Ben Turner (not reading bugmail, use the needinfo flag!) 2011-12-27 15:29:11 PST
Comment on attachment 582097 [details] [diff] [review]
patch v4

Review of attachment 582097 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/workers/WorkerPrivate.cpp
@@ +183,5 @@
>                    NS_LITERAL_CSTRING(")/");
>    }
>  
> +  nsresult
> +  CollectForRuntime(CollectFun aFun, void* aData)

Please add an 'AssertIsOnMainThread' here too.

@@ +195,5 @@
>        // Don't ever try to talk to the worker again.
>        if (disabled) {
>  #ifdef DEBUG
>          {
> +          nsCAutoString message("Unable to get report memory for ");

Nit: please revert this.

@@ +221,5 @@
> +    IterateData data;
> +    nsresult rv = CollectForRuntime(mozilla::xpconnect::memory::CollectCompartmentStatsForRuntime,
> +                                    &data);
> +    if (NS_FAILED(rv))
> +      return rv;

Nit: Please brace single line if blocks.

::: dom/workers/WorkerPrivate.h
@@ +473,5 @@
>    { }
>  #endif
>  };
>  
> +typedef JSBool (*CollectFun)(JSRuntime* aRt, void* aData);

Hm, instead of doing this here I think you can just pass a boolean flag to the BlockAndCollectRuntimeStats function indicating if this is a full report or just the quick one. Then you can pass that all the way through and call the appropriate function from the BlockAndCollectRuntimeStats function.
Comment 25 Nicholas Nethercote [:njn] 2011-12-27 16:48:51 PST
> > +typedef JSBool (*CollectFun)(JSRuntime* aRt, void* aData);
> 
> Hm, instead of doing this here I think you can just pass a boolean flag to
> the BlockAndCollectRuntimeStats function indicating if this is a full report
> or just the quick one. Then you can pass that all the way through and call
> the appropriate function from the BlockAndCollectRuntimeStats function.

That was my initial plan, but the more general approach wasn't any harder and seemed much less hacky.
Comment 26 Nicholas Nethercote [:njn] 2011-12-27 17:30:11 PST
BTW, I don't know what the outcome of your review is.  You haven't given me r+ or r-, but I guess it's really an r- because you want to see another version of the patch with your suggested changes?  Is that right?  (Don't be afraid to r- me if you don't like the patch, that's much clearer than leaving the r? in place.)
Comment 27 Nicholas Nethercote [:njn] 2011-12-28 23:38:05 PST
Created attachment 584712 [details] [diff] [review]
patch v5

I really think the function pointer version is superior to the boolean flag version.  Oh well, I've made the change.
Comment 28 Ben Turner (not reading bugmail, use the needinfo flag!) 2011-12-29 08:26:37 PST
Comment on attachment 584712 [details] [diff] [review]
patch v5

Review of attachment 584712 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good, thanks!

::: dom/workers/WorkerPrivate.cpp
@@ +1451,5 @@
>      JSAutoSuspendRequest asr(aCx);
>  
> +    *mSucceeded = mIsQuick
> +      ? mozilla::xpconnect::memory::GetExplicitNonHeapForRuntime(JS_GetRuntime(aCx), mData)
> +      : mozilla::xpconnect::memory::CollectCompartmentStatsForRuntime(JS_GetRuntime(aCx), mData);

Nit: ? and : on preceding lines.
Comment 29 Nicholas Nethercote [:njn] 2011-12-29 22:10:29 PST
https://hg.mozilla.org/integration/mozilla-inbound/rev/26acd30cab90
Comment 30 Kyle Huey [:khuey] (khuey@mozilla.com) 2011-12-30 04:50:42 PST
https://hg.mozilla.org/mozilla-central/rev/26acd30cab90
Comment 31 Eric Shepherd [:sheppy] 2012-04-24 05:22:49 PDT
Documentation updated:

https://developer.mozilla.org/en/XPCOM_Interface_Reference/nsIMemoryMultiReporter

Listed on Firefox 12 for developers.

Note You need to log in before you can comment on or make changes to this bug.