Last Comment Bug 722969 - Don't use innerHTML when generating about:memory
: Don't use innerHTML when generating about:memory
Status: RESOLVED FIXED
[MemShrink]
:
Product: Toolkit
Classification: Components
Component: about:memory (show other bugs)
: unspecified
: All All
: -- normal (vote)
: mozilla13
Assigned To: Nicholas Nethercote [:njn]
:
Mentors:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2012-01-31 20:23 PST by Nicholas Nethercote [:njn]
Modified: 2012-02-07 12:13 PST (History)
5 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
patch (31.89 KB, patch)
2012-01-31 20:23 PST, Nicholas Nethercote [:njn]
justin.lebar+bug: review+
Details | Diff | Splinter Review

Description Nicholas Nethercote [:njn] 2012-01-31 20:23:31 PST
Created attachment 593303 [details] [diff] [review]
patch

There are four basic stages involved in generating the "explicit" tree for a
process, which is most of the work that aboutMemory.js does.

1. We enumerate all the memory reporters, resulting in a list of reports.
Each reporter has a path that encodes its position in the tree.

2. We build a tree from that list.

3. We serialize that tree as HTML.

4. We set .innerHTML on the top-level <div> element, causing Firefox to parse
it and convert it to a DOM tree.

In other words, we build a tree, serialize it, then deserialize it.  This
allocates a lot more memory than necessary, something which is obviously
undesirable for about:memory.

A more obvious thing is to generate the DOM tree immediately in step 2.  The
form of the tree in step 2 is sufficiently different to the DOM tree that this
isn't easy.  However, we can certainly skip step 3.  That's what this patch 
does:  it generates the DOM tree directly (with document.createElement, etc).

The main advantage of this is that it causes about:memory to perturb its own 
measurements less.  I found a methodology for evaluating this that wasn't too
bad.  If you hit the "update" button on about:memory?verbose over and over 
again, the counters bounce around, but some of them tend to increase by the
same amount frequently.  I instrumented aboutMemory.js to show the delta
for "js-string-total", "js-objects-total" and "heap-unclassified", all of
which have this behaviour, and together which they account for much of the
memory allocated by about:memory, AFAICT.  (I played with this a *lot*, trying
various metrics before settling on this one.)

Results:

                  | string-total    obj-total   heap-unc        combined
  -----------------------------------------------------------------------------
  before          | 5.83MB          6.23MB      4.30MB          16.25MB
  after           | 2.98MB          4.37MB      4.18MB          11.83MB

So, before the patch, when hitting "update" the js-string-total number would 
usually go up by ~5.83MB.

So, roughly a 30% reduction.  Note that this was on Linux which shows the 
vsize/rss/pss/swap trees.  If those trees are omitted (as happens on other
platforms, and on Linux in non-verbose mode) the numbers are much smaller.

The other advantage of avoiding innerHTML is that we don't have to worry about
sanitizing URLs that contain chars like '<', '&', etc.  So it's a minor 
security risk reduction.
Comment 1 Justin Lebar (not reading bugmail) 2012-02-01 08:43:27 PST
Doesn't treeNameMatches need to operate on safe paths?  Similarly, buildTree does r._unsafePath.split('/').  But isn't the whole point of "unsafe" now that the name may contain bad forward slashes?

I'm really tempted to push the "make safe" routine into the memory reporters (only two include URLs).  Then we could have confidence that we handle "/" characters correctly, and we could get rid of all this safe/unsafe gunk in about:memory.

Note that about:memory is not the only consumer of memory reporters, and any correct consumer is going to have to figure out which "/" chars are reporter separators and which are part of URIs.  So there's a good reason to push this up into the relevant memory reporters.

(Using the analogy to filesystems we have going on here, you'd never expect a filesystem to figure out when to escape your forward slashes.  Forward slash means new directory, and if you want something else, you need to use an escape sequence.)
Comment 2 Justin Lebar (not reading bugmail) 2012-02-01 08:45:11 PST
Comment on attachment 593303 [details] [diff] [review]
patch

r- per comment 1.
Comment 3 Justin Lebar (not reading bugmail) 2012-02-02 17:17:13 PST
njn
Comment 4 Justin Lebar (not reading bugmail) 2012-02-02 17:21:13 PST
er...njn has a plan to get rid of the safe/unsafe distinction, either by ignoring path separators inside parentheses, or by using a different path separator, or by not performing the "\" to "/" dance we currently do.

In any case, that work will be a follow-up, so I'm back on the hook to review this.
Comment 5 Justin Lebar (not reading bugmail) 2012-02-06 06:29:57 PST
Comment on attachment 593303 [details] [diff] [review]
patch

>+// Forward slashes in URLs in paths are represented with backslashes to avoid
>+// being mistaken for path separators.  Paths/names/descriptions where this
>+// hasn't been undone are prefixed with "unsafe", the rest are "safe".

Nit: Comma splice.  (You seem to do this frequently, your style is
unimpeachable otherwise, I wonder if it's a question of dialect?)

> function makeSafe(aUnsafeStr)
> {
>-  return escapeAll(flipBackslashes(aUnsafeStr));
>+  return aUnsafeStr.replace(/\\/g, '/');
> }

It occurs to me that "safe" is not the best word here; this may be why I was confused earlier about what was going on.

When you replace \ with /, the string is now *unsafe*, in that you can't tell where the real path separators lie!

But if you're going to get rid of the safe/unsafe distinction as a follow-up, it can stay for now.

>+function appendTextNode(aD, aText)
>+{
>+  var e = document.createTextNode(aText);
>+  aD.appendChild(e);
>+  return e;
>+}

What's "D" stand for here and elsewhere?  If it's "description", that certainly
doesn't match how it's used...

>+function appendElement(aD, aTagName, aClassName)
>+{
>+  var e = document.createElement(aTagName);
>+  if (aClassName) {
>+    e.className = aClassName;
>+  }
>+  aD.appendChild(e);
>+  return e;
>+}
>+
>+function appendElementWithText(aD, aTagName, aClassName, aText)
>+{
>+  var e = appendElement(aD, aTagName, aClassName);
>+  appendTextNode(e, aText);
>+  return e;
>+}
>+
>+function appendElementWithLink(aD, aTagName, aClassName, aLink, aText)
>+{
>+  var e = appendElementWithText(aD, "a", aClassName, aText);
>+  e.href = aLink;
>+  return e;
>+}
>+
>+function appendButton(aD, aTitle, aOnClick, aText, aId)
>+{
>+  var b = appendElementWithText(aD, "button", "", aText);
>+  b.title = aTitle;
>+  b.onclick = aOnClick
>+  if (aId) {
>+    b.id = aId;
>+  }
>+  return b;
>+}

In general, I'm not a fan of short functions with many arguments, because they
obfuscate code.

appendTextNode and appendElementWithText are fine, although I'm not sure why we
need the |if aClassName| branch; is setting e.className = "" or null wrong?

I'd get rid of appenElementWithLink; it's not adding much over
appendElementWithText and occurs infrequently.

I'd be fine with appendButton if it were declared right above where it's used.
It doesn't look like you use the |if aId| branch.

See also http://doc.qt.nokia.com/qq/qq13-apis.html#theconveniencetrap

> function update()
> {
>   // First, clear the page contents.  Necessary because update() might be
>   // called more than once due to ChildMemoryListener.
>-  var content = $("content");
>-  content.parentNode.replaceChild(content.cloneNode(false), content);
>-  content = $("content");
>-
>-  if (gVerbose)
>-    content.parentNode.classList.add('verbose');
>-  else
>-    content.parentNode.classList.add('non-verbose');
>+  var oldContent = document.getElementById("content");
>+  var content = oldContent.cloneNode(false);
>+  oldContent.parentNode.replaceChild(content, oldContent);
>+  content.parentNode.classList.add(gVerbose ? 'verbose' : 'non-verbose');

This kind of looks like we're trying to support using update() to switch between verbose and non-verbose, but of course this code won't work for that.

Could we set content.parentNode.classList from onload()?

Also, it looks like content.parentNode is the <html> element.  I have no idea what the consequences are of setting a className on that element, but if it works, I guess it's OK.

>-function genProcessText(aProcess, aReporters, aHasMozMallocUsableSize)
>+function appendProcessElements(aD, aProcess, aReporters,
>+                               aHasMozMallocUsableSize)
> {
>+  appendElementWithText(aD, "h1", "", aProcess + " Process");
>+  appendTextNode(aD, "\n\n");   // gives nice spacing when we cut and paste
>+
>+  // We'll fill this in later.
>+  var warningsDiv = appendElement(aD, "div", "accuracyWarning");
>+
>   var explicitTree = buildTree(aReporters, 'explicit');
>   var hasKnownHeapAllocated = fixUpExplicitTree(explicitTree, aReporters);
>   sortTreeAndInsertAggregateNodes(explicitTree._amount, explicitTree);
>-  var explicitText = genTreeText(explicitTree, aProcess);
>+  appendTreeElements(aD, explicitTree, aProcess);
> 
>+  var nameSpan = appendElementWithText(aD, "span", "mrName",
>+                                       makeSafe(aUnsafeName));
>+  nameSpan.title = kindToString(aKind) + makeSafe(aUnsafeDesc);

Ugh.  I'd much rather make the description safe as soon as we create the node.
But whatever, it's all going away.

>- * Generates the text for the tree, including its heading.
>+ * Appends the elements for the tree, including its heading.
>  *
>+ * @param aD
>+ *        The parent DOM node.

So D stands for "DOM"?  I guess you want to distinguish between DOM parents and
memory reporter parents?  I'd still go with "aP" or "aParent"; it's maybe
slightly misleading, but still more descriptive than "aD".  aNode or aN would
be fine with me too.

Also, the parameter here is aDOuter, not aD.
Comment 6 Nicholas Nethercote [:njn] 2012-02-06 13:30:32 PST
> Nit: Comma splice.  (You seem to do this frequently, your style is
> unimpeachable otherwise, I wonder if it's a question of dialect?)

It's probably just a Nick-ism.  I'll fix.

> > function makeSafe(aUnsafeStr)
> > {
> >-  return escapeAll(flipBackslashes(aUnsafeStr));
> >+  return aUnsafeStr.replace(/\\/g, '/');
> > }
> 
> It occurs to me that "safe" is not the best word here; this may be why I was
> confused earlier about what was going on.
> 
> When you replace \ with /, the string is now *unsafe*, in that you can't
> tell where the real path separators lie!

"Safe" was clearer when we had to escape chars like '<' -- it meant that the string was "safe to assign to innerHTML".  Now it means "safe to be put in a DOM node".  As you say, it'll be good to get rid of the distinction.


> >+  var e = document.createTextNode(aText);
> >+  aD.appendChild(e);
> >+  return e;
> >+}
> 
> What's "D" stand for here and elsewhere?  If it's "description", that
> certainly
> doesn't match how it's used...

It was meant to be "DOM node", and I admit it is a poor name.  I'll use |aP| as you suggest below.

> I'd be fine with appendButton if it were declared right above where it's
> used.
> It doesn't look like you use the |if aId| branch.

It is used for the "Update" button.


> This kind of looks like we're trying to support using update() to switch
> between verbose and non-verbose, but of course this code won't work for that.
> 
> Could we set content.parentNode.classList from onload()?

This patch didn't materially change how this code works, but I'll see if I can set it from onload().
Comment 7 Nicholas Nethercote [:njn] 2012-02-06 15:51:26 PST
https://hg.mozilla.org/integration/mozilla-inbound/rev/5bbfd1e5ebf7
Comment 8 Ed Morley [:emorley] 2012-02-07 12:13:26 PST
https://hg.mozilla.org/mozilla-central/rev/5bbfd1e5ebf7

Note You need to log in before you can comment on or make changes to this bug.