Closed Bug 1551625 Opened 5 years ago Closed 5 years ago

PROFILER_DEFAULT_STARTUP_ENTRIES value leads to excessive memory consumption

Categories

(Core :: Gecko Profiler, defect, P3)

RESOLVED FIXED
mozilla68
Tracking Status
firefox68 --- fixed

People

(Reporter: alexical, Assigned: mozbugz)

References

(Regression)

Details

(Keywords: regression)

Attachments

(1 file)

On systems with 4GB of RAM, the default number of entries for startup profiling ends up starving the system of memory. This can significantly distort the profile, because time is spent paging.

In bug 1540114 the default was bumped from 1 million to 10 million entries when MOZ_PROFILER_STARTUP is set.
This was because testing on the "reference" machine was so slow that the bigger buffer was required to hold the full startup profile, forcing us to write MOZ_PROFILER_STARTUP=1 MOZ_PROFILER_STARTUP_ENTRIES=10000000 every time.

Of course we cannot please everybody. 😅

Doug, what do you think would be the highest value you can work with?
Florian, what would be your smallest value?

Additionally, we could theoretically make this value dependent on the system memory size. Would this help? Suggestions for a formula?
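
As one possible starting point for such a formula, here is a rough Python sketch. Everything in it is an assumption for illustration: the function name, the 5% RAM budget, and the process count of 4 are made up, and the 9 bytes per entry is inferred from the "4,194,304 entries, or 36MB per process" figures given later in this bug.

```python
BYTES_PER_ENTRY = 9  # Assumption: inferred from "4,194,304 entries, or 36MB per process".


def startup_entries_for(total_ram_bytes: int,
                        ram_percent: int = 5,
                        process_count: int = 4) -> int:
    """Hypothetical formula: the largest power-of-two entry count whose buffers,
    summed over process_count processes, fit in ram_percent of physical RAM."""
    budget_per_process = total_ram_bytes * ram_percent // 100 // process_count
    max_entries = budget_per_process // BYTES_PER_ENTRY
    if max_entries < 1:
        raise ValueError("memory budget is too small for a single entry")
    # Round down to a power of two, so a later round-up cannot exceed the budget.
    return 1 << (max_entries.bit_length() - 1)


GIB = 1 << 30
for ram_gib in (4, 16):
    print(f"{ram_gib} GiB -> {startup_entries_for(ram_gib * GIB)} entries per process")
```

With these placeholder defaults, a 4 GiB machine gets 4,194,304 entries and a 16 GiB machine gets 16,777,216. The budget and process count would need tuning against real measurements.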

(Also, while we're at it we should use an exact power of 2, as any value is rounded up to that anyway, see bug 1543407.)
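
To illustrate the rounding, here is a minimal Python sketch. It assumes the requested entry count is simply rounded up to the next power of two, as described above; the helper name is hypothetical and this is not the profiler's actual code.

```python
def round_up_to_power_of_two(entries: int) -> int:
    """Return the smallest power of two that is >= entries (hypothetical helper)."""
    if entries < 1:
        raise ValueError("entries must be positive")
    # (entries - 1).bit_length() is the exponent of the next power of two.
    return 1 << (entries - 1).bit_length()


for requested in (1_000_000, 10_000_000):
    print(requested, "->", round_up_to_power_of_two(requested))
```

Under that assumption, a request for 10,000,000 entries actually allocates 16,777,216, and 1,000,000 becomes 1,048,576.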

Doug, in the meantime, you can use MOZ_PROFILER_STARTUP=1 MOZ_PROFILER_STARTUP_ENTRIES=1000000 for your tests.

Priority: -- → P3
Regressed by: 1540114
Keywords: regression

I tried a startup profile with 50M entries, and this is what the memory use in the task manager looked like: https://i.imgur.com/ZPH6ofQ.png This is on the 2018 ref hardware (so also 4GB of RAM). https://perfht.ml/2WK8WMb is the profile I captured. It doesn't seem particularly slower than usual for a cold startup profile.

I'm likely fine with 50M (67108864?) if it works for you Florian*. If it turns out to not work for me I can always adjust it as I have been.

However, I'm realizing that a sensible default may not actually be the way to go about this. The core problem is just that it's not particularly obvious from a profile how much memory the profiler is using, and whether that is causing the system to be bottlenecked or not. I could easily see 50M being too much again if conditions change, either from more processes or larger entries(?) or <thing I don't know about>, and at that point we could start seeing spurious profiles with no clear indication that they are spurious. I only noticed the original problem because I was scratching my head at a profile, and happened to notice the lack of memory availability when I reprofiled it, but someone could easily just come to a bad conclusion, post the profile in a bug somewhere, and there would be no indication from that profile that the underlying cause was just paging.

So I'm happy with this as a band-aid, but maybe the conversation should start here about what the long-term solution should be? I see a few options:

  • Track total free physical memory on the system
  • Track page faults
  • Just include total physical memory as well as memory used specifically by the profiler in the Platform section of the top right dropdown
    • This would allow us to compute the remaining memory available, and if it's below some heuristic threshold, just display a warning to the user

It seems like the last option would be the cheapest to implement, and would generally protect us from bad conclusions. The other two would be nice, but if necessary someone can manually collect that data on their system and cross-reference.
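
A rough Python sketch of what that warning check could look like follows. The class, field names, and the 75% threshold are all hypothetical; the profile format does not necessarily record these values today. Note that "remaining" here only means total physical memory minus profiler buffers, not memory that is actually free.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProfileMemoryInfo:
    """Hypothetical memory fields recorded alongside a profile."""
    total_physical_bytes: int
    profiler_bytes: int  # Profiler buffer memory, summed over all processes.


def memory_warning(info: ProfileMemoryInfo,
                   min_remaining_percent: int = 75) -> Optional[str]:
    """Return a warning string if the profiler left too little memory, else None."""
    remaining = info.total_physical_bytes - info.profiler_bytes
    # Integer comparison avoids floating-point rounding at the threshold.
    if remaining * 100 < info.total_physical_bytes * min_remaining_percent:
        return (f"Profiler buffers use {info.profiler_bytes} of "
                f"{info.total_physical_bytes} bytes of physical memory; "
                "timings may be distorted by paging.")
    return None
```

The threshold is a placeholder; picking a real value would need measurements on the reference hardware.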

Thoughts?

* EDIT: Florian pointed out that 50M is significantly higher than what we have now, so this comment is based on a misunderstanding. I still think it stands on its own somewhat, but I need to investigate further why I'm seeing such different results on my own system.

4194304 might be a sane default worth trying until we figure out a better solution.

I often profile startup with 5M on the reference hardware and that usually seems fine.

Would it make sense to have a different buffer size for the parent process and for child processes?

Now starting with a maximum of 1u << 22, i.e. 4,194,304 entries, or 36MB per
process. (Using powers of two, because that's what we round up to anyway.)
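
The arithmetic behind these figures can be checked with a short Python sketch. It assumes "MB" here means MiB (2^20 bytes), which makes the numbers divide exactly; the 9 bytes per entry is derived from the figures above rather than taken from the profiler source, and the function name is hypothetical.

```python
MIB = 1 << 20
# Derived assumption: 36 MiB / 4,194,304 entries = 9 bytes per entry.
BYTES_PER_ENTRY = 36 * MIB // (1 << 22)


def buffer_size_mib(entries: int, process_count: int = 1) -> float:
    """Total profiler buffer size in MiB for the given entry count."""
    return entries * BYTES_PER_ENTRY * process_count / MIB


print(buffer_size_mib(1 << 22))  # New maximum: 36.0
print(buffer_size_mib(1 << 24))  # Old 10,000,000 default, rounded up: 144.0
```

Since the size is per process, the old default multiplied by every running process adds up quickly on a 4GB machine.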

Also giving more information in MOZ_PROFILER_HELP:

  • Reminding this is a number of entries per process.
  • Bytes per entry, and resulting total buffer sizes per process.

Assignee: nobody → gsquelart
Pushed by gsquelart@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/38b3d475e3c1
Lower profiler max startup entries - r=florian
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla68
Has Regression Range: --- → yes