Closed Bug 1585974 Opened 5 years ago Closed 5 years ago

DAMP profile recorded via `./mach try fuzzy --gecko-profile` misses most of the test data

Categories

(Core :: Gecko Profiler, defect, P1)

71 Branch
defect

Tracking

()

RESOLVED FIXED
mozilla72
Tracking Status
firefox72 --- fixed

People

(Reporter: jdescottes, Assigned: jdescottes)

References

Details

Attachments

(1 file)

I pushed to try to run Talos DAMP tests with profiling using:

./mach try fuzzy --gecko-profile

and selecting test-linux64/opt-talos-damp-e10s

You can see the push at https://treeherder.mozilla.org/#/jobs?repo=try&revision=c4048b6d959d3819662f9766d98763a811419124

If you select the damp job and open the profile (https://perfht.ml/32hJ9Oc), all the profiles only have data for the end of each cycle (last 30 seconds).

This might be a buffer size issue, but it seems that ./mach try doesn't support the argument --gecko-profile-entries.

May be due to bug 1562604, which stores markers directly into the profiler buffer (they were allocated outside before).
I need to investigate how much room the markers take, and either pre-allocate a bigger buffer to handle the extra load, and/or add --gecko-profile-entries for these tests.

Assignee: nobody → gsquelart
Priority: -- → P1
Version: unspecified → 71 Branch

Similar (or exact same?) issue reported on Slack:

we (@bdahl and I [@bgrinstead]) have been debugging tart issue for bug 1492582 and found when running ./mach talos-test we were not getting very useful profiles. We figured out this was due to gecko_profile_interval = 10 being set as the default value (which is also done on a number of other tests https://searchfox.org/mozilla-central/search?q=gecko_profile_interval&path=testing%2Ftalos%2Ftalos%2Ftest.py).
We worked around this (now running a command like ./mach talos-test -a tart --cycles 1 --gecko-profile --gecko-profile-interval=1 for testing locally), but I wonder if there's a reason for that default. If not, could it be changed to 1 like the other tests?

Julian, could you please try this --gecko-profile-interval=1, if you have the time?

Flags: needinfo?(jdescottes)

I did a small experiment, and recorded how the profiler buffer was used when recording just the main thread, and in real-life tests the new storage uses around 60% more than before, due to the extra marker data!
With default settings (main thread, compositor, dom workers) the extra use is around 23%.
And with lots of threads, it only takes a few percents more -- threads are heavy!

Now if we removed the markers, we would "only" add about 40% more data to the profiles, meaning in the example from comment 0, the ~30-second window would grow to ~45 seconds, still well short of the full 185 seconds.

Julian, would you have a link to an old "better" profile?
Was the data covering the full hundreds-of-seconds range?

(In reply to Gerald Squelart [:gerald] (he/him) from comment #2)

Julian, could you please try this --gecko-profile-interval=1, if you have the time?

Just thought about this one a bit more, and in fact that's a different issue, where the sampling interval was too small to be useful -- sorry Brian!
So Julian, there's no need to check for that.

Not sure, here is one which is 135s long from Firefox 67 and recorded in early March, https://perfht.ml/2J6q75Z .
But I don't know if it comes from a local DAMP run or from a CI push.

I am pretty sure I had valid profiles when investigating perf issues in the following bug: https://bugzilla.mozilla.org/show_bug.cgi?id=1556729#c12 . But all the profiles linked here are now 404 (I guess they were relying on the taskcluster artifacts).

I also found the following bug: https://bugzilla.mozilla.org/show_bug.cgi?id=1486374 where Alex found that gecko_profile_entries had to be bumped from 1,000,000 to 2,000,000 in order to record a full run (https://searchfox.org/mozilla-central/source/testing/talos/talos/test.py#458)

I will try bumping it to 10,000,000 (some tests use this value) to see if it changes anything.

Flags: needinfo?(jdescottes)

Using 10,000,000 entries allows to see the full profile: https://perfht.ml/2ppmgde (73 MB !)
I guess you are trying to understand why so many entries are needed before we decide if gecko_profile_entries should be updated

Edit: interestingly the older profile from Firefox 67 was not that small either, 66MB.

Do you think we should bump gecko_profile_entries temporarily while you investigate this on the profiler side?

Flags: needinfo?(gsquelart)

Sorry I didn't get back to you yesterday.

(In reply to Julian Descottes [:jdescottes] from comment #5)

Using 10,000,000 entries allows to see the full profile: https://perfht.ml/2ppmgde (73 MB !)
I guess you are trying to understand why so many entries are needed before we decide if gecko_profile_entries should be updated

Edit: interestingly the older profile from Firefox 67 was not that small either, 66MB.

(In reply to Julian Descottes [:jdescottes] from comment #6)

Do you think we should bump gecko_profile_entries temporarily while you investigate this on the profiler side?

Yes, I wanted to make sure that the new storage structure (including marker data) was the cause of the now-limited profiles.
I'm still not 100% certain, but anyway you've demonstrated that making the buffer bigger fixes your problem, so that should be good enough to go ahead, and I can research further later on.

You're more familiar with updating these numbers in these tests, and you obviously already have a patch, so would you mind just going ahead with it? 😉
I'm happy to review, but you may also want to involve the main authors and/or maintainers of these tests.

Assignee: gsquelart → jdescottes
Flags: needinfo?(gsquelart)

The current limit of 2 000 000 entries only allows
to record part of a DAMP run. The profiler team will investigate
if the number of entries can/should be reduced, but for the time
being we bump the limit to 10 000 000 in order to record full
profiles when running DAMP on try.

Works for me! Patch added.

Attachment #9101783 - Attachment description: Bug 1585974 - Increase gecko_profile_entries to 10 000 000 in talos DAMP configuration → Bug 1585974 - Increase gecko_profile_entries to 10 million in talos DAMP configuration
Pushed by jdescottes@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/3b1f06fb2cb2
Increase gecko_profile_entries to 10 million in talos DAMP configuration r=rwood
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla72
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: