Closed Bug 472209 Opened 15 years ago Closed 10 years ago

Graph memory usage over time

Categories

(Core :: General, defect)

defect
Not set
normal

Tracking

()

RESOLVED DUPLICATE of bug 923275

People

(Reporter: vlad, Unassigned)

References

Details

(Whiteboard: [MemShrink:P3][mentor=nnethercote@mozilla.com])

Attachments

(4 files, 4 obsolete files)

Attached patch initial ideas (obsolete) — Splinter Review
Now that bug 392351 is in, we should implement various reporters for the different components.  I'm going to attach a patch here with a rough first pass, but eventually this can probably be turned into a tracking bug with separate component bugs.

Preemptively marking this wanted1.9.2, and it's potentially blocking 1.9.2 -- this would be a great feature to have.
Flags: wanted1.9.2+
Flags: blocking1.9.2?
Here's a simple html tool that'll query the memory report data and graph/display the results.  Needs privs.
Flags: blocking1.9.2? → blocking1.9.2-
Blocks: DarkMatter
Vlad, attachment 355473 [details] [diff] [review] is totally out of date now, right?  Looks like much of what it adds has been added elsewhere.

The HTML in attachment 355474 [details] is still interesting, though.
Attachment #355473 - Attachment is obsolete: true
I've obsoleted attachment 355473 [details] [diff] [review] and renamed the bug to reflect the graphing code in attachment 355474 [details].
Summary: Implement memory reporters for various components → Graph memory usage over time
Blocks: MemShrinkTools
No longer blocks: DarkMatter
Whiteboard: [MemShrink:P2]
Jez, since you're looking at this stuff, we decided in the MemShrink meeting to assign this to you :)  I don't know if the existing code is any use, feel free to ignore it if not.
Assignee: vladimir → jezreel
Depends on: 670084
Attached patch Patch v1 (WIP) (obsolete) — Splinter Review
Very much a WIP. This patch plots a stacked line graph of the memory reporters. One needs to apply the patch in bug 670084 as well for it to work. What I'm thinking of doing next:

- right now the colors are assigned randomly. I'm planning to assign 'canonical' colors to the reporters, so they can be recognized at a glance.
- improve appearance: labels aren't very clear at the moment, and aliasing makes things kind of ugly (not sure if I can fix the latter)
- tidy up the code: move the graphing code into a separate .js file, remove hacks

I would love to get your feedback on what I should improve -- e.g. what kinds of information you would like to see when mousing over the graph.
It's Friday night here, I'll take a look on Monday morning.  Good to see you're making progress.
Hey, that's a pretty cool start!  Thoughts:

- This should only turn on when the user asks for it.  Eg. add a button.  And a "stop" button would also be good.

- It's a shame you lose data off the left side once the graph fills up.  Is it worth compressing things horizontally so you don't?  This would particularly make sense if you had "start" and "stop" buttons.  If you hit "start" again maybe it would clear any old data.

- I also wonder if a manual mode would be good -- ie. a mode where the graph is only updated when the user hits a "snapshot" button instead.  Or maybe you could combine both modes by always allowing manual snapshots, and also allowing the auto-tick time to be "infinite".

- The auto-scaling of the y-axis as memory usage increases is good.  When memory decreases it sometimes scales too much -- part of the graph disappears up the top.

- After a while I saw something weird going on as the graph moves leftwards;  every second tick some of the cliffs didn't shift leftwards -- the shape of the graph was changing as it was sliding left, which surely isn't right.  I saw this after loading a page and then closing it, ie. once past the peak usage point.

- The user should be able to choose their update interval.  Eg. if you're running this overnight to diagnose a slow leak, you'd only want it to update every minute or even less frequently.  One second seems reasonable as a minimum update interval, and maybe the default should be 5 or 10.

- The x-axis needs labels.  Timestamps would be ideal, though I'm not sure the best way to fit them in.

- If someone takes a screenshot, it won't be very informative because you can't tell what any of the graph segments are.  Labels would be good but I can see that would be difficult.

- Have you broken the threshold code?  I'm seeing lots of sub-trees that account for less than 0.5% of 'explicit' and should be hidden.  For example:

│      └──0.10 MB (00.14%) -- search.sqlite
│         ├──0.09 MB (00.13%) -- cache-used
│         ├──0.01 MB (00.01%) -- stmt-used
│         └──0.00 MB (00.00%) -- schema-used

  The only sub-trees being omitted now seem to be the ones that have zero bytes.  I wonder if the patch in bug 670084 is responsible.

- The JS code that runs each tick has a very noticeable effect.  If I have only about:memory open I see a clear sawtooth pattern.  It would be good to know what accounts for most of this -- buildTree() or the graph construction.  You could determine this by changing buildTree() to return a fake, constant tree, and see if the sawtooth diminishes a little or a lot.
Attached image Screenshot for Patch v1 (obsolete) —
I'm uploading a screenshot so others can have a look without applying the patch.
Attached patch Patch v2 (obsolete) — Splinter Review
> - This should only turn on when the user asks for it.  Eg. add a button. 
> And a "stop" button would also be good.
Done! The same button does start and stop. I could probably place it in a better location, though -- right now it's just placed at the bottom.
 
> - It's a shame you lose data off the left side once the graph fills up.  Is
> it worth compressing things horizontally so you don't?  This would
> particularly make sense if you had "start" and "stop" buttons.  If you hit
> "start" again maybe it would clear any old data.
Done!

> - I also wonder if a manual mode would be good -- ie. a mode where the graph
> is only updated when the user hits a "snapshot" button instead.  Or maybe
> you could combine both modes by always allowing manual snapshots, and also
> allowing the auto-tick time to be "infinite".
Not done yet. Will be working on it soon if the other things about the graph are okay.

> - The auto-scaling of the y-axis as memory usage increases is good.  When
> memory decreases it sometimes scales too much -- part of the graph
> disappears up the top.
Should be fixed now. Actually, since we're not losing any data, the y-axis never needs to scale down, so I've taken out most of the scaling code.

> - After a while I saw something weird going on as the graph moves leftwards;
> every second tick some of the cliffs didn't shift leftwards -- the shape of
> the graph was changing as it was sliding left, which surely isn't right.  I
> saw this after loading a page and then closing it, ie. once past the peak
> usage point.
I'm not too sure how this came about -- I saw it earlier, too, but I thought I'd fixed it already -- but once again this shouldn't show up if we're not deleting old values.

> - The user should be able to choose their update interval.  Eg. if you're
> running this overnight to diagnose a slow leak, you'd only want it to update
> every minute or even less frequently.  One second seems reasonable as a
> minimum update interval, and maybe the default should be 5 or 10.
Done! It's currently set at 1s for ease of testing, but I'll change the default later.

> - The x-axis needs labels.  Timestamps would be ideal, though I'm not sure
> the best way to fit them in.
Done! The labels might not be 100% precise though -- I'm calculating them based on the given update interval, not the exact timestamp. So if the browser blocks for a few seconds on some operation and the graph doesn't update during that time, the labels on the x-axis will not reflect that.

> - If someone takes a screenshot, it won't be very informative because you
> can't tell what any of the graph segments are.  Labels would be good but I
> can see that would be difficult.
I've given the most common categories a fixed hue -- so /js/ reporters are all blueish, for instance. The saturation and value numbers are still generated randomly. There's also the option (for us, not the user) to given a specific reporter a precise color -- e.g. explicit/heap-unclassified has HSV values of [0, 0.7, 0.7], which is a dull red.

> - Have you broken the threshold code?  I'm seeing lots of sub-trees that
> account for less than 0.5% of 'explicit' and should be hidden.  For example:
> 
> │      └──0.10 MB (00.14%) -- search.sqlite
> │         ├──0.09 MB (00.13%) -- cache-used
> │         ├──0.01 MB (00.01%) -- stmt-used
> │         └──0.00 MB (00.00%) -- schema-used
> 
>   The only sub-trees being omitted now seem to be the ones that have zero
> bytes.  I wonder if the patch in bug 670084 is responsible.
Yeah, will be updating the patch to 670084 to fix that.

> - The JS code that runs each tick has a very noticeable effect.  If I have
> only about:memory open I see a clear sawtooth pattern.  It would be good to
> know what accounts for most of this -- buildTree() or the graph
> construction.  You could determine this by changing buildTree() to return a
> fake, constant tree, and see if the sawtooth diminishes a little or a lot.
This issue is still present. If we really want to avoid it, I'm thinking that we could log the memory data directly to a file and do our postprocessing / graphing later, thereby avoiding most of the memory usage.
Attachment #544741 - Attachment is obsolete: true
Attached image Screenshot for Patch v2
Updated screenshot.
Attachment #546665 - Attachment is obsolete: true
Attached patch Patch v2Splinter Review
Sorry for the spam, I obsoleted the wrong file when uploading the screenshot.. here's patch v2 again.
Attachment #545967 - Attachment is obsolete: true
Comment on attachment 546666 [details]
Screenshot for Patch v2

The colour changes look really good.

You can un-obsolete a patch under the "edit details" link.
Comments on the graph (not looking at the code):

- It'd be nice to have <hr></hr> before the "start plotting" button, to separate it out a bit.

- You can enter '0' in as the sampling frequency and it'll plot really fast.  That's fine, but the x-axis labels are all 0.  Likewise, you can put a negative number and get negative x-axis labels.

- I get a light coloured vertical strip on the LHS of the graph which covers the first 7 or so plot points.  (See the attached screenshot.)  That doesn't look right.

- The y-axis labels are in bytes.  Since they're all multiples of 1024*1024, MB would be better.

- Once the graph reaches the RHS, it rescales itself to take up half the available horizontal space.  I wonder if it would be better if it always kept itself the full width of the canvas?  The advantage would be that if you want to take a screenshot, you're always getting the best resolution possible.  Or if that looks weird, a compromise might be to shrink to 80% of the width instead of 50%?

- As mentioned before, a manual snapshot mode would be great.

- The whole show-the-name-when-you-hover-the-mouse-over-a-segment feature seems a little flaky -- often when I move the mouse around the graph, the name doesn't update as fast and reliably as I'd expect.  In particular, when going over lots of small segments it's often blank.  It also doesn't seem to work at all in the light-coloured strip on the LHS.

- Canvas has this weird thing where lines come out crisper if you draw them along coordinates that end in 0.5, as opposed to integral (or anything else) coordinates.  Eg. change this line:

          var pos = (1 - val / this.yMax) * this.graphHeight;

  to this:

          var pos = Math.floor((1 - val / this.yMax) * this.graphHeight) + 0.5;

  and you get much crisper horizontal scale lines.  I think the text is a bit crisper too.  (Without this change, sometimes as the graph goes up and the scale lines are shifted, you see them go from blurry to crisp or vice versa because of this.)  I did a similar thing in sg_PlotLine and got the graph itself looking less blurry.  And there may be other places you can do the same thing.

- Arial's a pretty gross font for the labels :P

- Another problem is that a lot of the segments are very small.  For these ones ideally you wouldn't show the leaf nodes, interior nodes that are bigger -- e.g. you might like to combine all the sqlite reporters into a single segment.  But that seems hard.

- What determines the vertical ordering of the segments in the graph, BTW?  Is it consistent?  Seems like it should be.

- Using similar colours for reporters belonging to the same sub-tree (eg. blueish colours for js/*) is really good.  But it's not clear which colours apply where, where one group of colours starts and the next begins.  A legend might be nice?

That's a lot of comments, but I think it's looking really good!  I'm still worried about the sawtoothing, but I tried a slower profile where it was snapshotting every 3 seconds and it seems much less of a problem then, so that's good.
(In reply to comment #13)
> Created attachment 547612 [details]
> screenshot with light-coloured bar on LHS
> 
> Comments on the graph (not looking at the code):
> 
> - It'd be nice to have <hr></hr> before the "start plotting" button, to
> separate it out a bit.

Yeah, the current layout was meant to be ad-hoc because I wasn't sure how many other controls we were going to add.

> 
> - I get a light coloured vertical strip on the LHS of the graph which covers
> the first 7 or so plot points.  (See the attached screenshot.)  That doesn't
> look right.
> 

It's intentional -- without the white strip, the y-axis labels become impossible to read.. I'm trying to decide whether I should just put the labels outside the graph instead. The downside is that quite a lot of space would be wasted on the labels alone.

> - The y-axis labels are in bytes.  Since they're all multiples of 1024*1024,
> MB would be better.

I co-opted formatBytes(), and I guess you clicked on 'more verbose' which made it print out bytes. Yeah, I'll fix it.

> 
> - Once the graph reaches the RHS, it rescales itself to take up half the
> available horizontal space.  I wonder if it would be better if it always
> kept itself the full width of the canvas?  The advantage would be that if
> you want to take a screenshot, you're always getting the best resolution
> possible.  Or if that looks weird, a compromise might be to shrink to 80% of
> the width instead of 50%?
> 

What about adding a 'zoom to fit' button?

> - The whole show-the-name-when-you-hover-the-mouse-over-a-segment feature
> seems a little flaky -- often when I move the mouse around the graph, the
> name doesn't update as fast and reliably as I'd expect.  In particular, when
> going over lots of small segments it's often blank.  It also doesn't seem to
> work at all in the light-coloured strip on the LHS.

I can't really help this -- the name is obtained by getting the color under the cursor. Unfortunately small segments get antialiaised, so their rendered color doesn't match their assigned one. I don't think there's a good way around this in canvas.

> 
> - What determines the vertical ordering of the segments in the graph, BTW? 
> Is it consistent?  Seems like it should be.
>

The code copies the order of the initial ascii tree. If new reporters come in halfway, they are pushed onto the end of the array for their category -- so storage/sqlite/[newreporter] will appear as the storage/sqlite/* reporter that is closest to the x-axis. So the order is not necessarily consistent across runs, but it's consistent within each run.
 
> - Using similar colours for reporters belonging to the same sub-tree (eg.
> blueish colours for js/*) is really good.  But it's not clear which colours
> apply where, where one group of colours starts and the next begins.  A
> legend might be nice?

It would be a pretty long list though. Do we want to keep the graph constantly updated as well? Then we could append the colors to their respective reporters in the graph.
(In reply to comment #14)
> 
> What about adding a 'zoom to fit' button?

Hmm, I like my suggestions better :)

> > - Using similar colours for reporters belonging to the same sub-tree (eg.
> > blueish colours for js/*) is really good.  But it's not clear which colours
> > apply where, where one group of colours starts and the next begins.  A
> > legend might be nice?
> 
> It would be a pretty long list though. Do we want to keep the graph
> constantly updated as well? Then we could append the colors to their
> respective reporters in the graph.

I was imagining that you'd just have one entry for each primary sub-tree:  js, storage, images, etc.  Next to the "js" label you'd have a little box that was shaded to include all the colours that js segments get in the graph, i.e. dark blue to light blue.
With the latest patch (patch v3) in bug 670084, the current patch needs to have 'getReporters' renamed to 'getReportersByProcess' on line 1195 for it to work.
Jez has finished his internship and won't have time to work on this any more.  But he's made great progress, and hopefully someone else will be able to complete it.  Thanks, Jez!
Assignee: jezreel → nobody
Whiteboard: [MemShrink:P2] → [MemShrink:P2][mentor=nnethercote@mozilla.com]
Assigning to myself after I got the old patch working. Next step: Shape the code so I don't have to be embarrassed when submitting it.
Assignee: nobody → archaeopteryx
ToDos for this bug:

Code
- move the graphing code into a separate .js file
- validate user input update interval

UI
- update interval: decide on values: 1s minimum, 5 or 10 default?
- 'canonical' colors, not random, so they can be recognized at a glance: sqlite ("Storage") or IndexedDB as storage.
- axis labels outside of the graph, so no overlap with graph data
- implement graph labels / legend
- always use full width

Ideas
- keep data until graph cleared, maybe use horizontal scrolling?
- run in manual mode, only getting memory data on user request
- export as image
- export as data
- zooming

Questions
1. Should old data be cleared when a measurement starts and a former automatic one had been stopped before?
2. As far as I see, memory reporters rising above a threshold (what value? 0.5%?) should be plotted separately from a catch-everything-below-threshold plot. This can cause some repainting and combined with anti-aliasing problems at the borders between plots, I favor SVG for creating the graph. Any opinions on how to manage it? What about diving the memory reporter value by a constant (25,000?) and use that value as the internal "height" of the reporter. The SVG's viewport would be managed to handle the scaling.
3. How should the x scale (time) be managed? If the user manually takes memory snapshots, moving on with a fixed width of pixels looks odd. On the other side, long pauses between intervals of frequent measurements looks strange if it is really time synchronous. 
4. The mass of memory reporters raises the question of what should be regarded as one dataset, so it will be still recognized in the whole chart. I propose going up to the second level reporters (layout shells, js compartments, ...), but the question remains if the result will still be something useful ("Ok, we see website example.com using a huge amount of memory, but don't see why?")
> 1. Should old data be cleared when a measurement starts and a former
> automatic one had been stopped before?

Hmm, no point throwing data away if we don't have to.  But having a "clear" button might be useful in case you want to throw the data away.

> 2. As far as I see, memory reporters rising above a threshold (what value?
> 0.5%?) should be plotted separately from a catch-everything-below-threshold
> plot. This can cause some repainting and combined with anti-aliasing
> problems at the borders between plots, I favor SVG for creating the graph.
> Any opinions on how to manage it? What about diving the memory reporter
> value by a constant (25,000?) and use that value as the internal "height" of
> the reporter. The SVG's viewport would be managed to handle the scaling.

The threshold in about:memory's "explicit" tree is 0.5%, but that's just something that about:memory chooses to do.  (about:memory?verbose shows everything.)  So the threshold is orthogonal to a graph.

As for SVG, I don't have a strong preference.  If it works, use it!

> 3. How should the x scale (time) be managed? If the user manually takes
> memory snapshots, moving on with a fixed width of pixels looks odd. On the
> other side, long pauses between intervals of frequent measurements looks
> strange if it is really time synchronous. 

Good question.  I don't know.  Experimentation is probably needed.

> 4. The mass of memory reporters raises the question of what should be
> regarded as one dataset, so it will be still recognized in the whole chart.
> I propose going up to the second level reporters (layout shells, js
> compartments, ...), but the question remains if the result will still be
> something useful ("Ok, we see website example.com using a huge amount of
> memory, but don't see why?")

Another good question, and again I don't know.  FWIW, there's a medium-term plan to report memory on a per-tab basis instead of the js/gfx/layout/storage/... split, see bug 687724.

Thanks for taking this on, BTW!
Whiteboard: [MemShrink:P2][mentor=nnethercote@mozilla.com] → [MemShrink:P3][mentor=nnethercote@mozilla.com]
We downgraded this to P3 because we can't think of any cases where this graphing would have actually made a difference in hunting down a problem such as a leak.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → DUPLICATE
You need to log in before you can comment on or make changes to this bug.