Closed Bug 1340134 Opened 7 years ago Closed 7 years ago

Telemetry probes using TOPIC_CYCLE_COLLECTOR_BEGIN may take more time than the CC slice itself

Categories

(Core :: XPCOM, defect, P3)

Tracking

RESOLVED FIXED
mozilla56
Performance Impact low
Tracking Status
firefox56 --- fixed

People

(Reporter: smaug, Assigned: erahm)

References

(Blocks 2 open bugs)

Details

(Whiteboard: [measurement:client])

Attachments

(1 file)

https://perfht.ml/2lbd7hU is a profile from bug 1316864, and I was surprised to see slow
telemetry work during the CC slice.
Could we change or remove the relevant probes?
Flags: needinfo?(n.nethercote)
Is this about the GCTelemetry?
billm added that in bug 1308040.
As far as I can see, this is not about bug 1308040, but about bug 678085, which this bug blocks.
MEMORY_TOTAL was added so we could monitor memory use under real-world Firefox use across processes. It was (still is?) an e10s release criterion. As recently as last week I recommended its use as a release criterion for e10s-multi.

I have little knowledge of the other memory probes.

As for when to query these things... well, that's an interesting one, isn't it? The historical data is based on periodic and triggered reporting, which almost certainly over-weights contributions from users who leave their browsers open. This alone should be reason to change our recording methodology... if only we had a good replacement.
Could we collect MEMORY_TOTAL on a non-main thread, so that it doesn't block the main thread?
I suspect so, but I suspect one of the reasons for the current topic subscriptions may be to capture something close to the maximum memory level... in other words, it might be a feature, not a bug.
(In reply to Chris H-C :chutten from comment #6)
> I suspect so, but I suspect one of the reasons for the current topic
> subscriptions may be to capture something close to the maximum memory
> level... in other words, it might be a feature, not a bug.

chutten, you added this probe in bug 1198209, so you should know as well as anyone whether this is the case :)

I wasn't aware of this probe until now. Measuring it off-main-thread seems reasonable to me. resident-fast typically doesn't change that fast and so slight variations in the timing of the measurement shouldn't matter much.
Flags: needinfo?(n.nethercote)
Well, to be fair I only added MEMORY_TOTAL, I didn't add the collector topic subscription :)

So, uh, do we have a recipe (addTask?) for off-main-thread privileged JS? Or will this be a C++ rewrite?
It would need to be rewritten in C++.
See Also: → 1341473
Priority: -- → P3
Whiteboard: [measurement:client]
Does anyone have good criteria for what we need from these measurements?

Do we need or track anything besides MEMORY_TOTAL?
How often do we need to collect this, every N minutes?

If we rewrite MEMORY_TOTAL in C++, can we get the same results by accumulating through histograms?
(That is the only memory thing we use special IPC messaging for, with MESSAGE_TELEMETRY_USS & MESSAGE_TELEMETRY_GET_CHILD_USS [1].)

For the record, this is what we collect in [2] right now:
(Histogram name, nsIMemoryReporterManager property)
"MEMORY_VSIZE", "vsize"
"MEMORY_VSIZE_MAX_CONTIGUOUS", "vsizeMaxContiguous"
"MEMORY_RESIDENT_FAST", "residentFast"
"MEMORY_UNIQUE", "residentUnique"
"MEMORY_HEAP_ALLOCATED", "heapAllocated"
"MEMORY_HEAP_OVERHEAD_FRACTION", "heapOverheadFraction"
"MEMORY_JS_GC_HEAP", "JSMainRuntimeGCHeap"
"MEMORY_JS_COMPARTMENTS_SYSTEM", "JSMainRuntimeCompartmentsSystem"
"MEMORY_JS_COMPARTMENTS_USER", "JSMainRuntimeCompartmentsUser"
"MEMORY_IMAGES_CONTENT_USED_UNCOMPRESSED", "imagesContentUsedUncompressed"
"MEMORY_STORAGE_SQLITE", "storageSQLite"
"LOW_MEMORY_EVENTS_VIRTUAL", "lowMemoryEventsVirtual"
"LOW_MEMORY_EVENTS_PHYSICAL", "lowMemoryEventsPhysical"
"GHOST_WINDOWS", "ghostWindows"
"PAGE_FAULTS_HARD", "pageFaultsHard"

1: https://dxr.mozilla.org/mozilla-central/rev/32dcdde1fc64fc39a9065dc4218265dbc727673f/toolkit/components/telemetry/TelemetrySession.jsm#1674,1692
2: https://dxr.mozilla.org/mozilla-central/rev/32dcdde1fc64fc39a9065dc4218265dbc727673f/toolkit/components/telemetry/TelemetrySession.jsm#1190
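
(For illustration only: a minimal sketch of how the per-probe accumulation in [2] could look, assuming a privileged context where Components and Services are available. The probe table, divisors, and error handling below are simplifications and assumptions, not the actual TelemetrySession.jsm code.)

// Hedged sketch, not the real gatherMemory(): accumulate each
// nsIMemoryReporterManager property into its histogram.
const mgr = Components.classes["@mozilla.org/memory-reporter-manager;1"]
              .getService(Components.interfaces.nsIMemoryReporterManager);

// [histogram id, nsIMemoryReporterManager property, divisor to get the
//  histogram's unit] -- the divisors are assumptions; check Histograms.json
//  for each probe's actual unit.
const probes = [
  ["MEMORY_RESIDENT_FAST", "residentFast", 1024],                      // bytes -> KB (assumed)
  ["MEMORY_JS_GC_HEAP", "JSMainRuntimeGCHeap", 1024],                  // bytes -> KB (assumed)
  ["MEMORY_JS_COMPARTMENTS_USER", "JSMainRuntimeCompartmentsUser", 1], // plain count
  // ...and so on for the rest of the list above.
];

for (let [id, prop, divisor] of probes) {
  try {
    Services.telemetry.getHistogramById(id).add(Math.floor(mgr[prop] / divisor));
  } catch (e) {
    // Some reporters are unavailable on some platforms; just skip them.
  }
}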
Benjamin, do you know who else might monitor or need any of these probes in comment 10?
Flags: needinfo?(benjamin)
ni?chutten for MEMORY_TOTAL input.
Flags: needinfo?(chutten)
The MEMORY_* probes have alert emails mentioning memshrink. Is this a current effort?

From what I remember when MEMORY_TOTAL was proposed, the MEMORY_* probes were put in place to get an idea of "typical" memory use in the wild. That's why they're histograms that are accumulated at various points in time (events, timers) instead of just single numbers (what would now be scalars).

As for rewriting MEMORY_TOTAL in C++, I'm not sure if one process is allowed to query the memory characteristics of child processes, or if we'd need to rewrite the messaging stuff as well. It is necessary to get the RSS and USS measures at or near the same time as each other so they can be summed properly.

Wasn't there supposed to be an about:performance rewrite that would require knowing (amongst other things) memory allocation across the processes? Maybe there's some work we can reuse there.
Flags: needinfo?(chutten)
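
(To make the summing constraint concrete, a hedged sketch of what MEMORY_TOTAL amounts to; childUssValues is a hypothetical stand-in for whatever the MESSAGE_TELEMETRY_USS / MESSAGE_TELEMETRY_GET_CHILD_USS round-trip delivers, and the KB unit is an assumption.)

// Sketch only: MEMORY_TOTAL as parent RSS plus the sum of child-process USS,
// all sampled close together in time so that adding them up is meaningful.
const mgr = Components.classes["@mozilla.org/memory-reporter-manager;1"]
              .getService(Components.interfaces.nsIMemoryReporterManager);

// Hypothetical: USS in bytes reported back by each content process over IPC.
const childUssValues = [/* ... */];

let totalKB = Math.floor(mgr.residentFast / 1024);   // parent RSS
for (let ussBytes of childUssValues) {               // one entry per child process
  totalKB += Math.floor(ussBytes / 1024);
}
Services.telemetry.getHistogramById("MEMORY_TOTAL").add(totalKB);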
(In reply to Chris H-C :chutten from comment #13)
> The MEMORY_* probes have alert emails mentioning memshrink. Is this a
> current effort?

Yes, Eric Rahm and I are still working on it. I'm not sure I recall any regressions in this measurement, though.
(In reply to Chris H-C :chutten from comment #13)
> As for rewriting MEMORY_TOTAL in C++, I'm not sure if one process is allowed
> to query the memory characteristics of child processes, or if we'd need to
> rewrite the messaging stuff as well.

Linux and Windows can probably get the child's values from the parent (unless we've locked this down via sandboxing); on OSX you'd need to do it in-process (for USS at least).

> Wasn't there to be an about:performance rewrite that would require knowing
> (amongst other things) memory allocations amongst the processes? Maybe
> there's some work we can reuse there.

aboutPerformance.js uses Memory.jsm [1], which is implemented in JS.

[1] http://searchfox.org/mozilla-central/rev/60ae6514e4c559c0c234f0e7aefccb101b8beb2e/toolkit/modules/Memory.jsm
njn, I believe your team "owns" all this memory reporting. Can you own prioritizing this?
Component: Telemetry → XPCOM
Flags: needinfo?(benjamin) → needinfo?(n.nethercote)
Product: Toolkit → Core
I think it's fine to move this off-main-thread; I agree with Olli, we shouldn't synchronously measure during the CC slice. Triggering the measurement at the current point is probably fine though (or if there's a |CYCLE_COLLECTION_END| topic we could key off that).

For the purpose of comparing e10s(-multi) with non-e10s, it doesn't really matter when we measure as long as the timing is reasonably consistent (so measuring after CC should be fine). As far as measuring peak memory goes, I can see how this would be problematic, but I think we already have a measurement for that.

Chris does that sound reasonable?
Flags: needinfo?(n.nethercote) → needinfo?(chutten)
But bug 1198209 is clearly the cause of this. It added the extra cross-process measurements. The measurements originally added by bug 678085 were there for a long time without causing problems.

Nobody from my team was involved with bug 1198209, either writing the code or reviewing it. I suggest disabling it.
Blocks: 1198209
No longer blocks: 678085
(In reply to Nicholas Nethercote [:njn] from comment #18)
> But 1198209 is clearly the cause of this. It added the extra cross-process
> measurements. The measurements originally added by bug 678085 were there for
> a long time without causing problems.
> 
> Nobody from my team was involved with bug 1198209, either writing the code
> or reviewing it. I suggest disabling it.

This is a gating measurement for e10s-multi; we can't disable it (and I'm reasonably sure I was involved in defining what to measure, at least).
I wrote that code, though it seems a lifetime ago. Had I but known, I would've asked more questions about how to do it better.

:erahm, I'm fine with measuring at different but consistent times. All studies I'm aware of that use MEMORY_TOTAL are doing so in an A/B fashion within the same timeframe... so as long as we aren't missing out on some crucial time in a Firefox instance's memory history, it should be fine.

Oh, and echoing your sentiment: yes, we can't disable it without replacing it with something that measures exactly the same thing. The need for cross-process memory measurement is only increasing: see bug 1344172, and also bug 1344174. So having an efficient way to do this would be _excellent_.
Flags: needinfo?(chutten)
As far as priority goes, from a memory measurement perspective things are working fine, so changing this is low priority.

Anyone with perf concerns would probably disagree, though: we're running this on the first CC and then most likely every minute thereafter (I assume CC runs that often when a session is active, but I could be wrong). Note: on OSX, measuring USS can be rather slow.
Whiteboard: [measurement:client] → [qf:p3][measurement:client]
This came up again in bug 1375281; let's get this fixed, as perf is now a high priority. My proposal is to just use idleDispatch for the measurements so as not to block CC.
Assignee: nobody → erahm
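
(A minimal sketch of the idleDispatch idea, assuming an observer method and a _gatherMemory helper along the lines of the existing code; the names and the throttling check are illustrative, not the actual patch.)

// Sketch: instead of gathering synchronously inside the cycle-collector-begin
// observer, defer the expensive work until the main thread is idle.
observe(subject, topic, data) {
  if (topic == "cycle-collector-begin" && this._shouldGatherMemory()) {
    // idleDispatchToMainThread runs the callback when the main thread is idle,
    // so the CC slice itself is no longer blocked by the measurements.
    Services.tm.idleDispatchToMainThread(() => this._gatherMemory());
  }
}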
This defers the |gatherMemory| telemetry measurement until we're idle. This
measurement should no longer affect CC times.

To test this I set the following prefs locally:

> 'toolkit.telemetry.log.level': 0,
> 'toolkit.telemetry.log.dump': True,
> 'toolkit.telemetry.enabled': True

And verified the function was still being called and that memory measurement
messages were still being generated. I added additional (temporary) logging in
xpconnect to make sure the observer call finished before any measurements were
made.

MozReview-Commit-ID: DMtAo3K59FV
Attachment #8880971 - Flags: review?(chutten)
Blocks: 1375281
Comment on attachment 8880971 [details] [diff] [review]
Use idleDispatch for gathering memory statistics

Review of attachment 8880971 [details] [diff] [review]:
-----------------------------------------------------------------

LGTM

File a follow-up to check whether/how the memory stats change on Nightly after this lands?
Attachment #8880971 - Flags: review?(chutten) → review+
In bug 1373892 I observed memory stat gathering for telemetry taking approximately 6 seconds, plus additional GCs, presumably for serializing those stats. Will this solve those hangs?
(In reply to The 8472 from comment #25)
> In bug 1373892 I observed memory stat gathering for telemetry taking
> approximately 6 seconds, plus additional GCs, presumably for serializing
> those stats. Will this solve those hangs?

It might help, but 6s is pretty extreme; just punting to idle time might still cause jank. I'm working on improving the performance of our ghost window reporter in bug 1376038, and :cyu is looking into making our jemalloc_stats reporter asynchronous in bug 1375281, but from your report it looks like the VsizeMaxContiguous reporter can be quite slow as well.

Can you file a separate bug for just that issue and CC me?

re: the extra GCs during serialization, that sounds worth filing a separate bug as well; it's possible we could at least offload the serialization to an idle task rather than block the main thread.
Blocks: 1377495
https://hg.mozilla.org/integration/mozilla-inbound/rev/499b44c627e9d5e18584f0b58212264500b5bc51
Bug 1340134 - Use idleDispatch for gathering memory statistics. r=chutten
I still think there is a problem here that has not been addressed yet:
observing "cycle-collector-begin" from JS is slow.
These notifications can be frequent, and since you only need a callback
once per minute, observing them is a waste of resources.  In pretty much all
callbacks during that minute you just do "new Date", compare it, throw
it away, and then do nothing.

As I suggested in bug 1375281, can we just use a setTimeout or something
instead, so that minimal CPU resources are wasted until one minute has
passed and you want to actually do something of value?
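
(For reference, a hedged sketch of the timer-based alternative; nsITimer plus idle dispatch is just one way to do it, and the interval and the gatherMemory callback name are illustrative.)

// Sketch of the timer idea: wake up once per interval instead of observing
// every cycle-collector-begin notification and discarding most of them.
const MEMORY_GATHER_INTERVAL_MS = 60 * 1000;  // roughly once a minute, as above

// Keep a reference to the timer somewhere long-lived so it isn't GCed.
let timer = Components.classes["@mozilla.org/timer;1"]
              .createInstance(Components.interfaces.nsITimer);
timer.initWithCallback(() => {
  // Still defer the actual gathering to idle time so it doesn't cause jank.
  Services.tm.idleDispatchToMainThread(() => gatherMemory());
}, MEMORY_GATHER_INTERVAL_MS, Components.interfaces.nsITimer.TYPE_REPEATING_SLACK);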
(In reply to Mats Palmgren (:mats) from comment #28)
> I still think there is a problem here that has not been addressed yet:
> observing "cycle-collector-begin" from JS is slow.
> These notifications can be frequent, and since you only need a callback
> once per minute, observing them is a waste of resources.  In pretty much all
> callbacks during that minute you just do "new Date", compare it, throw
> it away, and then do nothing.
> 
> As I suggested in bug 1375281, can we just use a setTimeout or something
> instead, so that minimal CPU resources are wasted until one minute has
> passed and you want to actually do something of value?

Oh right, sorry Mats. I'll file a follow-up.
Blocks: 1377585
https://hg.mozilla.org/mozilla-central/rev/499b44c627e9
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla56
Performance Impact: --- → P3
Whiteboard: [qf:p3][measurement:client] → [measurement:client]