Telemetry probes using TOPIC_CYCLE_COLLECTOR_BEGIN may take more time than the CC slice itself

RESOLVED FIXED in Firefox 56

Status: RESOLVED FIXED
Component: Core :: XPCOM
Priority: P3
Severity: normal
Opened: 10 months ago
Last modified: 6 months ago

People: (Reporter: smaug, Assigned: erahm)

Tracking: (Blocks: 2 bugs)
Version: unspecified
Target Milestone: mozilla56
Firefox Tracking Flags: (firefox56 fixed)
Whiteboard: [qf:p3][measurement:client]

Attachments: (1 attachment)

https://perfht.ml/2lbd7hU is a profile from bug 1316864, and I was surprised to see slow
telemetry work during a CC slice.
Could we change or remove the relevant probes?
(Reporter)

Updated

10 months ago
Flags: needinfo?(n.nethercote)
Is this about the GCTelemetry?
billm added that in bug 1308040.
As far as I see, this is not about bug 1308040, but bug 678085 which this bug blocks.
For context:
https://dxr.mozilla.org/mozilla-central/rev/0eef1d5a39366059677c6d7944cfe8a97265a011/toolkit/components/telemetry/TelemetrySession.jsm#1881
https://dxr.mozilla.org/mozilla-central/rev/0eef1d5a39366059677c6d7944cfe8a97265a011/toolkit/components/telemetry/TelemetrySession.jsm#1095

Is anyone consuming & monitoring all those probes?
If yes, is there value in recording those at regular intervals (instead of e.g. once)?

Comment 4

10 months ago
MEMORY_TOTAL was added so we could monitor memory use under real-world Firefox use across processes. It was (still is?) an e10s release criterion. As recently as last week I've recommended its use as a release criterion for e10s-multi.

I have little knowledge of the other memory probes.

As for when to query these things... well, that's an interesting one, isn't it. The historical data is based on periodic and triggered reporting which almost certainly over-weights contributions from users who leave their browsers open. This alone should be reason to change our recording methodology... if only we had a good replacement.
Could we collect MEMORY_TOTAL on a non-main thread, so that it doesn't block the main thread?

Comment 6

10 months ago
I suspect so, but I suspect one of the reasons for the current topic subscriptions may be to capture something close to the maximum memory level... in other words, it might be a feature, not a bug.
(In reply to Chris H-C :chutten from comment #6)
> I suspect so, but I suspect one of the reasons for the current topic
> subscriptions may be to capture something close to the maximum memory
> level... in other words, it might be a feature, not a bug.

chutten, you added this probe in bug 1198209, so you should know as well as anyone whether this is the case :)

I wasn't aware of this probe until now. Measuring it off-main-thread seems reasonable to me. resident-fast typically doesn't change that fast and so slight variations in the timing of the measurement shouldn't matter much.
Flags: needinfo?(n.nethercote)

Comment 8

10 months ago
Well, to be fair I only added MEMORY_TOTAL, I didn't add the collector topic subscription :)

So, uh, do we have a recipe (addTask?) for off-main-thread privileged JS? Or will this be a C++ rewrite?
It would need to be rewritten in C++.
See Also: → bug 1341473
Priority: -- → P3
Whiteboard: [measurement:client]
Does anyone have good criteria for what we need from these measurements?

Do we need or track anything besides MEMORY_TOTAL?
How often do we need to collect this, every N minutes?

If we rewrite MEMORY_TOTAL in C++, can we get the same results by accumulating through histograms?
(That is the only memory measurement we use special IPC messaging for, with MESSAGE_TELEMETRY_USS & MESSAGE_TELEMETRY_GET_CHILD_USS [1])

For the record, this is what we collect in [2] right now:
(Histogram name, nsIMemoryReporterManager property)
"MEMORY_VSIZE", "vsize"
"MEMORY_VSIZE_MAX_CONTIGUOUS", "vsizeMaxContiguous"
"MEMORY_RESIDENT_FAST", "residentFast"
"MEMORY_UNIQUE", "residentUnique"
"MEMORY_HEAP_ALLOCATED", "heapAllocated"
"MEMORY_HEAP_OVERHEAD_FRACTION", "heapOverheadFraction"
"MEMORY_JS_GC_HEAP", "JSMainRuntimeGCHeap"
"MEMORY_JS_COMPARTMENTS_SYSTEM", "JSMainRuntimeCompartmentsSystem"
"MEMORY_JS_COMPARTMENTS_USER", "JSMainRuntimeCompartmentsUser"
"MEMORY_IMAGES_CONTENT_USED_UNCOMPRESSED", "imagesContentUsedUncompressed"
"MEMORY_STORAGE_SQLITE", "storageSQLite"
"LOW_MEMORY_EVENTS_VIRTUAL", "lowMemoryEventsVirtual"
"LOW_MEMORY_EVENTS_PHYSICAL", "lowMemoryEventsPhysical"
"GHOST_WINDOWS", "ghostWindows"
"PAGE_FAULTS_HARD", "pageFaultsHard"

1: https://dxr.mozilla.org/mozilla-central/rev/32dcdde1fc64fc39a9065dc4218265dbc727673f/toolkit/components/telemetry/TelemetrySession.jsm#1674,1692
2: https://dxr.mozilla.org/mozilla-central/rev/32dcdde1fc64fc39a9065dc4218265dbc727673f/toolkit/components/telemetry/TelemetrySession.jsm#1190
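For illustration, the collection loop behind that table can be sketched in plain JavaScript. Everything here is a stand-in: `recordHistogram` and the object passed to `gatherMemory` are invented mocks for Telemetry histogram accumulation and nsIMemoryReporterManager, not the real Gecko APIs.

```javascript
// (Histogram name, reporter-manager property) pairs, as in the list above.
const MEMORY_PROBES = [
  ["MEMORY_VSIZE", "vsize"],
  ["MEMORY_RESIDENT_FAST", "residentFast"],
  ["MEMORY_JS_GC_HEAP", "JSMainRuntimeGCHeap"],
  // ...remaining pairs from the table above
];

const recorded = new Map();

function recordHistogram(name, value) {
  // In Gecko this would be Telemetry.getHistogramById(name).add(value);
  // here we just remember the last value per histogram.
  recorded.set(name, value);
}

function gatherMemory(reporterManager) {
  for (const [histogram, property] of MEMORY_PROBES) {
    const value = reporterManager[property];
    // Some properties aren't available on every platform (they throw in
    // Gecko; in this mock they're simply undefined), so skip those.
    if (value !== undefined) {
      recordHistogram(histogram, value);
    }
  }
}

// Mock reporter manager standing in for nsIMemoryReporterManager.
gatherMemory({ vsize: 1024, residentFast: 512, JSMainRuntimeGCHeap: 64 });
```

The point of the sketch is that every call walks the whole probe list synchronously, which is why doing it inside a CC observer shows up in profiles.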
Benjamin, do you know who else might monitor or need any of these probes in comment 10?
Flags: needinfo?(benjamin)
ni?chutten for MEMORY_TOTAL input.
Flags: needinfo?(chutten)

Comment 13

10 months ago
The MEMORY_* probes have alert emails mentioning memshrink. Is this a current effort?

From what I remember when MEMORY_TOTAL was proposed, the MEMORY_* probes were put in place to get an idea of "typical" memory use in the wild. That's why they're histograms that are accumulated at various points in time (events, timers) instead of just single numbers (what would now be scalars).

As for rewriting MEMORY_TOTAL in C++, I'm not sure if one process is allowed to query the memory characteristics of child processes, or if we'd need to rewrite the messaging stuff as well. It is necessary to get the RSS and USS measures at or near the same time as each other so they can be summed properly.
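Conceptually, the summation described here can be sketched as follows; the snapshot shape (`parent`/`children` objects) is invented for this example, and the units are arbitrary. The key requirement from the comment is that parent RSS and child USS come from samples taken close together in time, so the sum is coherent.

```javascript
// Illustrative sketch: MEMORY_TOTAL as the parent's resident set plus
// each child process's unique set, all from one near-simultaneous
// snapshot. The snapshot object is a made-up shape for this example.
function computeMemoryTotal(snapshot) {
  const childUss = snapshot.children.reduce(
    (sum, child) => sum + child.residentUnique, 0);
  return snapshot.parent.residentFast + childUss;
}

const total = computeMemoryTotal({
  parent: { residentFast: 300 },   // parent RSS, e.g. in MiB
  children: [
    { residentUnique: 120 },       // content process USS
    { residentUnique: 95 },
  ],
});
// total === 515
```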

Wasn't there to be an about:performance rewrite that would require knowing (amongst other things) memory allocations amongst the processes? Maybe there's some work we can reuse there.
Flags: needinfo?(chutten)
(In reply to Chris H-C :chutten from comment #13)
> The MEMORY_* probes have alert emails mentioning memshrink. Is this a
> current effort?

Yes, Eric Rahm and I are still working on it. I'm not sure I recall any regressions in this measurement, though.
(In reply to Chris H-C :chutten from comment #13)
> As for rewriting MEMORY_TOTAL in C++, I'm not sure if one process is allowed
> to query the memory characteristics of child processes, or if we'd need to
> rewrite the messaging stuff as well.

Linux and Windows can probably get the child's values from the parent (unless we've locked this down via sandboxing); on OS X you'd need to do it in-process (for USS at least).

> Wasn't there to be an about:performance rewrite that would require knowing
> (amongst other things) memory allocations amongst the processes? Maybe
> there's some work we can reuse there.

aboutPerformance.js uses Memory.jsm [1], which is implemented in JS.

[1] http://searchfox.org/mozilla-central/rev/60ae6514e4c559c0c234f0e7aefccb101b8beb2e/toolkit/modules/Memory.jsm

Comment 16

10 months ago
njn, I believe your team "owns" all this memory reporting. Can you own prioritizing this?
Component: Telemetry → XPCOM
Flags: needinfo?(benjamin) → needinfo?(n.nethercote)
Product: Toolkit → Core
I think it's fine to move this off-main-thread; I agree with Olli, we shouldn't synchronously measure during the CC slice. Triggering the measurement at the current point is probably fine though (or if there's a |CYCLE_COLLECTION_END| topic we could key off that).

For the purpose of comparing e10s(-multi) with non-e10s, it doesn't really matter when we measure as long as the timing is reasonably consistent (so measure after CC should be fine). As far as measuring peak memory I can see how this would be problematic, but I think we already have a measurement for that.

Chris does that sound reasonable?
Flags: needinfo?(n.nethercote) → needinfo?(chutten)
But bug 1198209 is clearly the cause of this. It added the extra cross-process measurements. The measurements originally added by bug 678085 were there for a long time without causing problems.

Nobody from my team was involved with bug 1198209, either writing the code or reviewing it. I suggest disabling it.
Blocks: 1198209
No longer blocks: 678085
(In reply to Nicholas Nethercote [:njn] from comment #18)
> But 1198209 is clearly the cause of this. It added the extra cross-process
> measurements. The measurements originally added by bug 678085 were there for
> a long time without causing problems.
> 
> Nobody from my team was involved with bug 1198209, either writing the code
> or reviewing it. I suggest disabling it.

This is a gating measurement for e10s-multi, we can't disable it (and I'm reasonably sure I was involved in defining what to measure at least).

Comment 20

10 months ago
I wrote that code, though it seems a lifetime ago. Had I but known, I would've asked more questions about how to do it better.

:erahm, I'm fine with measuring at different but consistent times. All studies I'm aware of that use MEMORY_TOTAL are doing so in an A/B fashion within the same timeframe... so as long as we aren't missing out on some crucial time in a Firefox instance's memory history, it should be fine.

Oh, and echoing your sentiment, yes we can't disable it without replacing it with something that measures exactly the same thing. The need for cross-process memory measurement is only increasing: see bug 1344172, and also bug 1344174. So having an efficient way to do this would be _excellent_.
Flags: needinfo?(chutten)
As far as priority goes, from a memory measurement perspective things are working fine so changing this is low priority.

Anyone with perf concerns would probably disagree, though: we're running this on the first CC and then most likely every minute thereafter (I assume CC runs that often when a session is active, but I could be wrong). Note: on OS X, measuring USS can be rather slow.

Updated

9 months ago
Whiteboard: [measurement:client] → [qf:p3][measurement:client]
This came up again in bug 1375281; let's get this fixed, as perf is now a high priority. My proposal is to just use idleDispatch for the measurements so as not to block CC.
Assignee: nobody → erahm
Created attachment 8880971 [details] [diff] [review]
Use idleDispatch for gathering memory statistics

This defers the |gatherMemory| telemetry measurement until we're idle. This
measurement should no longer affect CC times.

To test this I set the following prefs locally:

> 'toolkit.telemetry.log.level': 0,
> 'toolkit.telemetry.log.dump': True,
> 'toolkit.telemetry.enabled': True

And verified the function was still being called and that memory measurement
messages were still being generated. I added additional (temporary) logging in
xpconnect to make sure the observer call finished before any measurements were
made.

MozReview-Commit-ID: DMtAo3K59FV
Attachment #8880971 - Flags: review?(chutten)
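The deferral pattern in the patch, scheduling the heavy `gatherMemory` work for idle time instead of running it synchronously in the observer, can be sketched in plain JavaScript. Here `idleDispatch`/`drainIdleQueue` are a simulated stand-in for `idleDispatchToMainThread`, which Gecko drains when the main thread is idle:

```javascript
// Simulated idle-task queue standing in for idle dispatch; in the real
// browser the platform decides when to drain it.
const idleQueue = [];
function idleDispatch(task) { idleQueue.push(task); }
function drainIdleQueue() { while (idleQueue.length) idleQueue.shift()(); }

const log = [];
function gatherMemory() { log.push("gather"); }  // the expensive part

// Observer callback: instead of measuring synchronously (and blocking
// the CC slice), just schedule the measurement for idle time.
function onCycleCollectorBegin() {
  log.push("observe");
  idleDispatch(gatherMemory);
}

onCycleCollectorBegin();   // fires during the CC slice: cheap
drainIdleQueue();          // later, when the main thread is idle
```

The observer still fires on every notification; only the expensive gathering moves out of the CC slice, which is exactly the residual concern raised in comment 28 below.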
Blocks: 1375281

Comment 24

6 months ago
Comment on attachment 8880971 [details] [diff] [review]
Use idleDispatch for gathering memory statistics

Review of attachment 8880971 [details] [diff] [review]:
-----------------------------------------------------------------

LGTM

File a follow-up to look in on if/how the memory stats change on Nightly after this lands?
Attachment #8880971 - Flags: review?(chutten) → review+

Comment 25

6 months ago
In bug 1373892 I have observed memory stat gathering for telemetry taking approximately 6 seconds, plus additional GCs, presumably for serializing those stats. Will this solve those hangs?
(In reply to The 8472 from comment #25)
> In bug 1373892 I have observed memory stat gathering for telemetry taking up
> approximately 6 seconds and additional GCs presumably for serializing those
> stats. Will this solve those hangs?

It might help, but 6s is pretty extreme; just punting to an idle time might still cause jank. I'm working on improving the performance of our ghost window reporter in bug 1376038, and :cyu is looking into making our jemalloc_stats reporter asynchronous in bug 1375281, but from your report it looks like the VsizeMaxContiguous reporter can be quite slow as well.

Can you file a separate bug for just that issue and CC me?

re: extra GCs during serialization, that sounds worth filing a separate bug as well; it's possible we could at least offload the serialization to an idle task rather than block the main thread.
Blocks: 1377495
https://hg.mozilla.org/integration/mozilla-inbound/rev/499b44c627e9d5e18584f0b58212264500b5bc51
Bug 1340134 - Use idleDispatch for gathering memory statistics. r=chutten
I still think there is a problem here that has not been addressed yet:
observing "cycle-collector-begin" from JS is slow.
These notifications can be frequent, and since you only need a callback
once per minute, observing them is a waste of resources. For pretty much all
callbacks during that minute you just do "new Date", compare it, throw
it away, and then do nothing.

As I suggested in bug 1375281, can we just use a setTimeout or something
instead, so that minimal CPU resources are wasted until one minute has
passed and you want to actually do something of value?
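The waste described here can be illustrated with a plain-JavaScript sketch of the timestamp-throttled observer pattern (all names invented; a fake clock stands in for Date). Most notifications just compare a timestamp and return, which is what the timer-based alternative avoids entirely:

```javascript
// Illustrative throttled observer: fire `gather` at most once per
// interval, counting the notifications that accomplish nothing.
function makeThrottledObserver(intervalMs, gather, now) {
  let last = -Infinity;
  let wasted = 0;
  return {
    notify() {
      if (now() - last >= intervalMs) { last = now(); gather(); }
      else wasted++;                  // callback fired for nothing
    },
    get wastedCalls() { return wasted; },
  };
}

let clock = 0;                        // fake clock, in milliseconds
const gathered = [];
const observer = makeThrottledObserver(60000, () => gathered.push(clock),
                                       () => clock);

// Simulate the cycle collector notifying every 5 seconds for 2 minutes.
for (; clock <= 120000; clock += 5000) observer.notify();
// 25 notifications, but only 3 gathers (t = 0, 60000, 120000);
// the other 22 callbacks were pure overhead.
```

A one-minute timer would invoke the gather function exactly three times over the same window with zero wasted callbacks, which is the substance of the suggestion above.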
(In reply to Mats Palmgren (:mats) from comment #28)
> I still think there is a problem here that has not been addressed yet:
> observing "cycle-collector-begin" from JS is slow.
> These notifications can be frequent, and since you only need a callback
> once per minute, observing them is a waste of resources. For pretty much all
> callbacks during that minute you just do "new Date", compare it, throw
> it away, and then do nothing.
> 
> As I suggested in bug 1375281, can we just use a setTimeout or something
> instead, so that minimal CPU resources are wasted until one minute has
> passed and you want to actually do something of value?

Oh right, sorry Mats. I'll file a follow up.
Blocks: 1377585

Comment 30

6 months ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/499b44c627e9
Status: NEW → RESOLVED
Last Resolved: 6 months ago
status-firefox56: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla56
Blocks: 1377767