Closed Bug 697041 Opened 8 years ago Closed 8 years ago

about:memory perturbs what it is measuring

Categories

(Toolkit :: about:memory, defect)

defect
Not set

Tracking

()

RESOLVED DUPLICATE of bug 698846

People

(Reporter: njn, Assigned: njn)

References

(Blocks 1 open bug)

Details

(Whiteboard: [MemShrink:P2][see comment 6 and on])

Spun off from bug 697038, where RSS feeds were found to use several MB of xpconnect native thingies.  See attachment 569293 [details] for relevent DMD output.
Blocks: DarkMatter
Depends on: DMD
bholley, can you take this?  I can help you with how memory reporters work.
Assignee: nobody → bobbyholley+bmo
Whiteboard: [MemShrink] → [MemShrink:P2]
A lot of these look string related, not XPCWrappedNative related.
Summary: Add memory reporters for xpconnect native thingies → Add memory reporters for xpconnect thingies
(In reply to Nicholas Nethercote [:njn] from comment #1)
> bholley, can you take this?  I can help you with how memory reporters work.

So, I think this stuff might be a red herring. Most of the big allocations here are coming through XPCConvert, which we use to transform parameters from JS->C++ or C++->JS during method call.

The important thing to note is that these allocations are very ephemeral. Assuming we're not straight-up leaking, all of these allocations get freed by the time CallMethod returns. They're heap-allocated, since we don't know what we'll need at compile time, but they have stack scope.

So if the point of the memory reporters is to find memory we're holding onto, this won't be very much help. It would also be quite a lot of code, because XPConnect does these sorts of mallocs all over the place. :-(
How epheneral?  DMD analyzes the heap at a particular point in time.  Maybe I just got lucky and hit a point in time where lots of ephemeral stuff happened to be alive?  I'll do some more profiling, see how common these are.  I did see the heap-unclassified number bouncing around a lot.
(In reply to Nicholas Nethercote [:njn] from comment #4)
> How epheneral?

They should disappear by the time the call returns.

I'm seeing both a JS->C++ and C++->JS method call here. Assuming there's only one thread involved, that would imply that you're calling into a JS-implemented component (where, interestingly enough, we use two wrappers rather than none). Does that make sense in this context?
So I just looked at three old DMD profiles I had lying around;  one was with 5 gmail tabs open, one was with various sites open, and one was with a single huge hg.mozilla.org log open.

Let's consider the records in attachment 569293 [details]:

- 1. (nsStringBuffer::Alloc)  bz asked for more context.  Here's a run with a 20-deep stack trace:

Unreported: 1,169,360 (cumulative: 522,935,088) bytes in 2,595 heap block(s) in record 54 of 27713:
 Requested bytes unreported: 963,290 / 963,290
 Slop      bytes unreported: 206,070 / 206,070
   at 0x402A063: malloc (vg_replace_malloc.c:263)
   by 0x403C044: moz_malloc (mozalloc.cpp:113)
   by 0x7A71BB9: nsStringBuffer::Alloc(...) (nsSubstring.cpp:209)
   by 0x7A71F45: nsAString_internal::MutatePrep(...) (nsTSubstring.cpp:162)
   by 0x7A72018: nsAString_internal::ReplacePrepInternal(...) (nsTSubstring.cpp:198)
   by 0x7A718F3: nsAString_internal::ReplacePrep(...) (nsTSubstring.h:685)
   by 0x7A72445: nsAString_internal::Assign(...) (nsTSubstring.cpp:335)
   by 0x7A725F1: nsAString_internal::Assign(...) (nsTSubstring.cpp:396)
   by 0x6774881: nsString::nsString(...) (nsTString.h:99)
   by 0x7A1D970: nsVariant::SetFromAString(...) (nsVariant.cpp:1480)
   by 0x7A1E647: nsVariant::SetAsAString(...) (nsVariant.cpp:2043)
   by 0x7A0CD4D: nsHashPropertyBag::SetPropertyAsAString(...) (nsHashPropertyBag.cpp:288)
   by 0x7A682DC: NS_InvokeByIndex_P (xptcinvoke_x86_64_unix.cpp:195)
   by 0x73AD83D: XPCWrappedNative::CallMethod(...) (XPCWrappedNative.cpp:2907)
   by 0x73B915B: XPC_WN_CallMethod(...) (XPCWrappedNativeJSOps.cpp:1553)
   by 0x7EEEFD8: js::InvokeKernel(...) (jscntxtinlines.h:297)
   by 0x7F06AAB: js::Interpret(...) (jsinterp.cpp:4002)
   by 0x7EEECE3: js::RunScript(...) (jsinterp.cpp:584)
   by 0x7EEF103: js::InvokeKernel(...) (jsinterp.cpp:647)
   by 0x7E6A8AA: js::Invoke(...) (jsinterp.h:148)

  I had to run DMD a couple of times to get an entry like this, sometimes it was 
  much smaller.

- 2. (XPCStringConvert::ReadableToJSVal) Has very small numbers, 4KB--50KB, so seems ok to ignore.

- 3. (UTF8ToNewUnicode)  All three had significant sizes numbers here, ranging from 700KB to just over 1MB.  Definitely worth reporting.

- 4. (Native2WrappedNativeMap::Remove)  Two of them had ~100KB, one didn't have big enough stack traces to tell.

- 5, 7. (XPCWrappedNative::GetNewOrUsed)  This function showed up in lots of different reports... 60KB here, 50KB there, enough to add up a bit.  Might be worth tracking.

- 6. (nsHashPropertyBag::SetProperty*)  Insignificant.


Oh crap... looking at bigger stack traces I see that (3) above is due to stuff done by xpconnect while about:memory is being generated:

Unreported: 608,432 (cumulative: 541,943,364) bytes in 3,624 heap block(s) in record 59 of 20237:
 Requested bytes unreported: 582,682 / 582,682
 Slop      bytes unreported: 25,750 / 25,750
  at 0x402A063: malloc (vg_replace_malloc.c:263)
  by 0x403C044: moz_malloc (mozalloc.cpp:113)
  by 0x7A5B0D1: NS_Alloc_P (nsMemoryImpl.cpp:198)
  by 0x66FEE83: nsMemory::Alloc(unsigned long) (nsMemory.h:68)
  by 0x7A6E584: UTF8ToNewUnicode(...) (nsReadableUtils.cpp:373)
  by 0x73792BF: XPCConvert::NativeData2JS(...) (XPCConvert.cpp:412)
  by 0x73781BE: XPCConvert::NativeData2JS(...) (xpcprivate.h:3234)
  by 0x73A2F40: nsXPCWrappedJSClass::CallMethod(...) (XPCWrappedJSClass.cpp:1444)
  by 0x739AC62: nsXPCWrappedJS::CallMethod(...) (XPCWrappedJS.cpp:553)
  by 0x7A69212: PrepareAndDispatch (xptcstubs_x86_64_linux.cpp:153)
  by 0x7A68342: SharedStub (in /home/njn/moz/mi8/v64/toolkit/library/libxul.so)
  by 0x7A6781E: mozilla::MapsMemoryReporter::MapsReporter::ParseMapBody(...) (MapsMemoryReporter.cpp:516)

If Bobby is right, maybe all of this stuff is caused by about:memory itself.  DMD does its analysis just after the last memory reporter has been queried by aboutMemory.js.
(In reply to Bobby Holley (:bholley) from comment #5)
> 
> I'm seeing both a JS->C++ and C++->JS method call here. Assuming there's
> only one thread involved, that would imply that you're calling into a
> JS-implemented component (where, interestingly enough, we use two wrappers
> rather than none). Does that make sense in this context?

Yes.  aboutMemory.js calls collectReports on the JS multi-reporter object.  That function is implemented in C++, and one of its arguments is a JS callback, which gets called lots of times.
(In reply to Nicholas Nethercote [:njn] from comment #6)

> Oh crap... looking at bigger stack traces I see that (3) above is due to
> stuff done by xpconnect while about:memory is being generated:
 
> If Bobby is right, maybe all of this stuff is caused by about:memory itself.
> DMD does its analysis just after the last memory reporter has been queried
> by aboutMemory.js.

That sounds like a reasonable theory. In many cases we make temporary copies of incoming string data. If this string data is the generated HTML for about:memory, I could imagine it being on the large side.
Resetting assignee to default given the ambiguity here.
Assignee: bobbyholley+bmo → nobody
(In reply to Bobby Holley (:bholley) from comment #8)
> In many cases we make temporary copies
> of incoming string data. If this string data is the generated HTML for
> about:memory, I could imagine it being on the large side.

It's not generated HTML, but there can be a lot of moderately long strings.  Details below.

----

Some more background about how all this works...

about:memory is generated by aboutMemory.js.  It calls on things called "multi-reporters".  The relevant IDL:

 [scriptable, function, uuid(5b15f3fa-ba15-443c-8337-7770f5f0ce5d)]
 interface nsIMemoryMultiReporterCallback : nsISupports
 {
   void callback(in ACString process, in AUTF8String path, in PRInt32 kind,
                 in PRInt32 units, in PRInt64 amount,
                 in AUTF8String description, in nsISupports closure);
 };

 /*
  * An nsIMemoryMultiReporter reports multiple memory measurements via a
  * callback function which is called once for each measurement.  Use this
  * when you want to gather multiple measurements in a single operation (eg.
  * a single traversal of a large data structure).
  */ 
 [scriptable, uuid(eae277ad-b67d-4389-95f4-03fa11c09d06)]
 interface nsIMemoryMultiReporter : nsISupports
 {     
   void collectReports(in nsIMemoryMultiReporterCallback callback,
                       in nsISupports closure);
 };

If 'mmr' is a multi-reporter, aboutMemory.js calls mmr.collectReports.  The multi-reporters we have so far are all implemented in C++, and the callback is always a JS function from aboutMemory.js.

More abstractly, the data gathering is all done in C++, and the data presentation is all done in JS.  We need to pass lots (100s or even 1000s) of tuples of this form:

  (process, path, kind, units, amount, description)

from C++ to JS.  I did this via the callback mechanism, I think I copied it from some code that Shaver wrote and I couldn't see a better way to do it.  But my xpcom knowledge is pretty basic, maybe there's a better (more memory-efficient) way to do it.

It looks like the strings are a big part of the problem, e.g. all those calls to UTF8ToNewUnicode() calls.  'process' is always empty in desktop builds (though it can be non-empty in Fennec), but 'path' can be moderately long, e.g. "explicit/js/compartment([System Principal], 0x7f6e10025000)/gc-heap/objects", and 'description' is often even longer, being a short paragraph of prose describing a particular measurement.  (Even worse, for the JS multi-reporter there's only a smallish number of description strings, and each one is passed to the callback many times.)

The more tabs you have open, the more compartments you'll have, and the more memory that will be used by xpconnect to generate about:memory.

And the problem is worse on Linux, because we have the RSS, PSS, Virtual and Swap multi-reporters.  (They aren't used on Mac or Windows.)
Summary: Add memory reporters for xpconnect thingies → about:memory perturbs what it is measuring
Whiteboard: [MemShrink:P2] → [MemShrink:P2][see comment 6 and on]
(1) from comment 6 is a setPropertyAsAString call from JS into C++.  The only such non-test callers in our tree are crash reporter submission and the feed processor.  I would assume that you're seeing the latter...

Can we file a separate bug on that?
(In reply to Boris Zbarsky (:bz) from comment #11)
> (1) from comment 6 is a setPropertyAsAString call from JS into C++.  The
> only such non-test callers in our tree are crash reporter submission and the
> feed processor.  I would assume that you're seeing the latter...
> 
> Can we file a separate bug on that?

We can, but what is the bug?  Should we not be allocating memory there?  What should we be doing instead?
I think we may be able to avoid the allocation, yes, depending on what the JSString looks like.  A bug with steps to reproduce so I can catch this in a debugger would be lovely.
I did some measurements on about:home with DMD.  In the first I did a normal run.  In the 2nd I disabled aboutMemory.js so it just generates a blank page without calling any multi-reporters.

Differences:

- The UTF8ToNewUnicode case dropped from 626KB to 1.5KB.

- The XPCWrappedNative::GetNewOrUsed entries dropped from maybe ~350KB to roughly half that.

- Native2WrappedNativeMap::Remove dropped from 99KB to 48KB.

- The nsHashPropertyBag::SetProperty* dropped from ~50KB to less than 2KB.

- XPCStringConvert::ReadableToJSVal dropped from 29KB to 3.5KB.

That's close to 1MB.  It's hard to gauge what else might be affected, but you get the idea.

So, ideas for improving things:

- Can we structure multi-reporters differently to avoid all those C++-to-JS calls (assuming that each one causes memory allocations)?

- We need to drastically cut down our string passing.  Simply removing descriptions would go a long way;  it would be a shame, but arguably the lesser of two evils.

- We can avoid building the array of Reporters in aboutMemory.js, which isn't related to xpconnect but will still reduce the amount of memory used when generating about:memory.  I filed bug 697327 for that.
Assignee: nobody → nnethercote
No longer depends on: 697327
(In reply to Boris Zbarsky (:bz) from comment #13)
>  A bug with steps to reproduce so I can catch this in a debugger would be lovely.

Bug 697332.
For a browser run with only about:memory open, the total size of all the 'path' and 'description' strings on my Linux box is 260,000 characters.  The vast majority of these are from the RSS/PSS/virtual trees.

jlebar, I'm less happy with the whole "always generate those trees and hide them by default" approach now :/
Dropp(In reply to Nicholas Nethercote [:njn] from comment #16)
> For a browser run with only about:memory open, the total size of all the
> 'path' and 'description' strings on my Linux box is 260,000 characters.

Cutting out all descriptions cuts that to 117,000.  Then dropping all the PSS/RSS/virtual tree paths cuts that further to 9,000.
Making about:memory use less memory is a good goal, but shouldn't it also be possible to either special case memory used by it, run it in a separate process (and special case it that way), or statically allocate the description strings? It just seems to me that minimizing the footprint of about:memory may not be tenable when considering future enhancements (but maybe that's not an issue).
A big part of this is that about:memory knows very little about the memory reporters in Firefox.  It just has a generic strategy for presenting info passed to it.  Now, if it knew a lot more about the exact reporters in Firefox (e.g. their descriptions) then much less info would need to be passed from C++ to JS.  But that would lose a lot of flexibility.
It's still unclear to me why the presentation phase is inextricably tied to the measurement phase. Can't we just do all the measurements first (which would presumably just involve collecting a few dozen ints), and only generate the markup once the data is finalized?
The measurement phase and presentation phase are separate.  The problem is that the measurement phase involves a whole lot of calls from C++ to JS, and there are lots of large-ish strings passed as arguments to those calls.

Now that I think about it, although this xpconnect memory usage is showing up under DMD, it may not be showing up in about:memory.  Let's assume "heap-allocated" is the first measurement taken (and it was for me in my testing yesterday, though that's by luck rather than design).  Now, "heap-unclassified" is derived from "heap-allocated", by subtracting all the heap memory that is reported.  If all these xpconnect allocations are occurring after the first measurement, then they'll basically be invisible to about:memory.  But DMD's seeing them because its doing its analysis after all the reporters have run.  Huh.

So maybe I should do something to guarantee "heap-allocated" is the first measurement taken.  And it would also be good if the JS multi-reporter always ran next.  It already collects all its stats before calling the callback, so with those two running first the fact that JS code runs subsequently would have extremely little impact on the results.  Good!  I think the problem is not as bad as I thought.  But I'll need to know the names of the multi-reporters, which is bug 689583.
Depends on: 689583
Assignee: nnethercote → nobody
Component: XPConnect → about:memory
Product: Core → Toolkit
QA Contact: xpconnect → about.memory
Assignee: nobody → nnethercote
Running about:memory perturbs RSS by about 5mb on my machine.  Changing the order in which you read the reporters is only going to solve this general problem if you run about:memory just once, right?
Here's what I think is happening.  There's various sources of allocations when abou:memory is generated.

(1) XPConnect stuff, which lives only while about:memory is being generated.

(2) JS stuff, which lives until the next GC.

(3) DOM/layout stuff, which lives while about:memory is open.

Changing the order means that (1) is not a problem, even on subsequent runs.  (2) and (3) are always a problem if you re-run soon enough (e.g. before the next GC, or without closing about:memory first) -- that's unavoidable, but if we didn't always generate the big RSS/PSS/virtual trees it would be a much smaller problem.
Marking as a dup of bug 698846 -- the real solution to minimize perturbation is to reimplement about:memory in C++, as painful as that is.  That would get rid of all the JS perturbation, and the only memory used by about:memory would be rendering the page.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → DUPLICATE
Duplicate of bug: 698846
No longer depends on: 698846
You need to log in before you can comment on or make changes to this bug.