Open Bug 1656553 Opened 5 years ago Updated 5 years ago

Long-running mode would be useful for tracking memory leaks

Categories

(Core :: Gecko Profiler, enhancement, P3)

People

(Reporter: sfink, Unassigned)

Details

Over in bug 1653088, I realized that the JS or native memory allocation tracking is exactly what I need to track down where memory is being leaked. Well, "leaked" is a strong word: I suspect the memory is still reachable; it's just that a lot of it is piling up.

Except! It doesn't work, because the memory growth happens at an unknown time. With a limited, time-based rolling window, the profiler loses the allocation information for large piles of accumulated memory that never gets freed, unless those allocations happen to be recent when you click the upload button.

(It's possible that I'm completely wrong about this, and what I'm requesting in this bug is already being done.)

I guess what I'm imagining is a mode where the rolling window for memory allocations is based on memory usage rather than time: instead of limiting the buffer size by discarding the oldest entries, expire randomly selected allocations (perhaps weighted by size?). That way, after some time the buffer would settle into an unbiased sample of memory usage. If a lot of Call objects are being retained, for example, the sample would contain a relatively high proportion of surviving Call object allocation stacks.
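
A minimal sketch of the proposed random-expiry idea, to make it concrete. This is a hypothetical model, not how the Gecko Profiler works today: `Allocation`, `RandomEvictionSample`, the stack names, and the byte sizes are all made up for illustration. It assumes the tracker sees both allocations and frees, keeps only live allocations, and, when full, expires a uniformly random entry instead of the oldest one.

```python
from __future__ import annotations

import random
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Allocation:
    addr: int               # hypothetical identifier for the allocated cell
    size: int               # bytes
    stack: tuple[str, ...]  # captured JS stack (illustrative frame names)


class RandomEvictionSample:
    """Bounded sample of live allocations that expires random entries, not the oldest."""

    def __init__(self, capacity: int, seed: int | None = None) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._rng = random.Random(seed)
        self._items: list[Allocation] = []  # dense list so a random pick is O(1)
        self._pos: dict[int, int] = {}      # addr -> index into _items

    def __len__(self) -> int:
        return len(self._items)

    def on_alloc(self, alloc: Allocation) -> None:
        if alloc.addr in self._pos:  # address reused without an observed free
            self._remove_at(self._pos[alloc.addr])
        if len(self._items) >= self.capacity:
            # Expire a uniformly random entry instead of the oldest one.
            self._remove_at(self._rng.randrange(len(self._items)))
        self._pos[alloc.addr] = len(self._items)
        self._items.append(alloc)

    def on_free(self, addr: int) -> None:
        # Freed memory is no longer retained, so drop it if it was sampled.
        pos = self._pos.get(addr)
        if pos is not None:
            self._remove_at(pos)

    def _remove_at(self, pos: int) -> None:
        # Swap-remove: move the last entry into the hole so removal stays O(1).
        victim = self._items[pos]
        last = self._items.pop()
        del self._pos[victim.addr]
        if pos < len(self._items):
            self._items[pos] = last
            self._pos[last.addr] = pos

    def bytes_by_stack(self) -> Counter[tuple[str, ...]]:
        totals: Counter[tuple[str, ...]] = Counter()
        for alloc in self._items:
            totals[alloc.stack] += alloc.size
        return totals


# Simulated slow leak: every 100th allocation is a retained closure,
# all others are short-lived and freed immediately.
sample = RandomEvictionSample(capacity=50, seed=1)
for addr in range(10_000):
    if addr % 100 == 0:
        sample.on_alloc(Allocation(addr, 64, ("leakyHandler", "Call")))
    else:
        sample.on_alloc(Allocation(addr, 32, ("render", "Object")))
        sample.on_free(addr)
```

In this simulated run the sample ends up holding only the retained `leakyHandler` allocations, even though the early ones are thousands of allocations old. It also exposes a weakness of the idea: once the sample is full, every short-lived allocation still evicts a random retained entry before it is freed. Weighting the choice by size, as suggested above, could be done with `random.Random.choices(..., weights=...)` at O(n) cost per eviction.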

And please let me know if this is already the way it works! I can tell the bug reporter to use it.

You can get the rolling window to be quite large, with the following settings:

  • Set the buffer size to the maximum (It will only allocate as much as it needs. It doesn't reserve the 2GB from the start.)
  • Check "No Periodic Sampling"
  • Uncheck "Screenshots"
  • Only profile the GeckoMain thread, no other threads
  • Set the interval to the maximum (to sample other counters less frequently)

After that, the only things that will take up noticeable amounts of buffer space will be the memory data, and regular markers from the main thread. It would be nice if we could turn off the regular markers as well, but I think we can't do that at the moment. Removing "GeckoMain" from the thread filter makes the front-end complain about having no threads, and it probably means that the memory data is lost, too.

See Markus' comment 1 for suggestions to capture longer time windows.
I would add:

  • Disable Fission if you have it enabled.
  • Decrease the number of processes, because each process would add data and therefore reduce the profiling window. Modify dom.ipc.processCount in about:config.

When we capture markers (and anything else) they get serialized into the profiler buffer, after which we can't (or couldn't easily) separate them and choose to discard some and not others.
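
A toy model of why a buffer of already-serialized entries can only expire oldest-first. `RollingByteBuffer` and the marker byte strings are hypothetical and much simpler than the real profiler buffer; the point is only that opaque bytes carry no "keep me" information, so making room means dropping from the front.

```python
from __future__ import annotations

from collections import deque


class RollingByteBuffer:
    """Bounded buffer of serialized entries that can only drop its oldest bytes."""

    def __init__(self, max_bytes: int) -> None:
        self.max_bytes = max_bytes
        self._entries: deque[bytes] = deque()
        self._used = 0

    def push(self, entry: bytes) -> None:
        self._entries.append(entry)
        self._used += len(entry)
        # Entries are opaque once serialized, so make room oldest-first.
        while self._used > self.max_bytes:
            self._used -= len(self._entries.popleft())

    def entries(self) -> list[bytes]:
        return list(self._entries)


buf = RollingByteBuffer(max_bytes=1_000)
buf.push(b"alloc:Call:leakyHandler")   # long-lived allocation, recorded early
for i in range(200):
    buf.push(b"marker:Refresh:%d" % i)  # regular main-thread markers
leaked_still_recorded = any(e.startswith(b"alloc:") for e in buf.entries())
```

Here the allocation record is gone by the end even though, in the scenario from this bug, the memory it describes would still be live. Only the volume of later data decided what was kept.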

There are some vague ideas about selecting subsets of markers (similar to how we already select threads), so maybe in the future it would be possible to only capture memory allocations markers.
We are also thinking of implementing "infinite"-ish profiling by storing the profiling data to disk, but then analyzing mountains of data may be difficult!

Have you tried DMD? It only deals with memory tracking, so it might be better suited to your case, but it's much less user-friendly. https://developer.mozilla.org/en-US/docs/Mozilla/Performance/DMD

Severity: -- → N/A
Priority: -- → P3

Thanks, turning off sampling and things sounds like it might work.

I am personally familiar with using DMD, but my question here originated in a scenario where I am trying to get information from an end user (bug 1653088). I am unable to reproduce the large memory usage myself. They have a pinned tab that occasionally grows to an enormous JS GC heap size. about:memory tells me it's a combination of Call + Object + Array + Function objects (my guess is roughly one of each per retained item), which suggests it might be retaining lots of closures. I wanted to see the JS stacks for them, since I suspect some misbehaving JS code is hanging onto a lot of stuff and sending us into a near-endless GC spiral.

(It'd be kind of cool to have a rough estimate of how long the rolling window is, based on the options you have selected, but I guess it would be wildly approximate. Seconds vs minutes vs hours?)
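
A back-of-envelope version of that estimate: the window is roughly the buffer size divided by the rate at which data is written into it. Every byte rate below is a made-up placeholder, not a measured Gecko Profiler figure, and `estimate_window_seconds` / `rough_scale` are hypothetical helpers. Following the earlier comment that each extra process reduces the window, the sketch assumes all processes draw on one overall budget.

```python
# Window length ~= buffer size / rate at which data is written.
# All byte rates below are made-up placeholders, not measured numbers.

def estimate_window_seconds(
    buffer_bytes: int,
    bytes_per_second_per_process: float,
    process_count: int = 1,
) -> float:
    # Assumption: every profiled process writes into the same overall budget.
    total_rate = bytes_per_second_per_process * process_count
    return buffer_bytes / total_rate


def rough_scale(seconds: float) -> str:
    # Coarse bucket, matching the "seconds vs minutes vs hours" question.
    if seconds < 60:
        return "seconds"
    if seconds < 3600:
        return "minutes"
    return "hours"


BUFFER = 2 * 1024**3  # the 2GB maximum mentioned above, taken as 2 GiB

with_sampling = estimate_window_seconds(BUFFER, 1_000_000)  # hypothetical 1 MB/s
memory_only = estimate_window_seconds(BUFFER, 50_000)       # hypothetical 50 kB/s
memory_only_4_procs = estimate_window_seconds(BUFFER, 50_000, process_count=4)
```

With these invented rates, turning off periodic sampling moves the window from minutes to hours, and quadrupling the process count cuts it to a quarter. Real numbers would have to come from measuring how fast the buffer actually fills.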

Oh, sorry. I should mention that although I could perhaps get the reporter to run a DMD build, my understanding is that that would only give me native stacks. And here, I think it's the JS stacks that would be most relevant.