https://perfht.ml/2lbd7hU is a profile from bug 1316864 and I was surprised to see slow telemetry stuff during CC slice. Could we change/remove the relevant probes?
Is this about the GCTelemetry? billm added that in bug 1308040.
For context: https://dxr.mozilla.org/mozilla-central/rev/0eef1d5a39366059677c6d7944cfe8a97265a011/toolkit/components/telemetry/TelemetrySession.jsm#1881 https://dxr.mozilla.org/mozilla-central/rev/0eef1d5a39366059677c6d7944cfe8a97265a011/toolkit/components/telemetry/TelemetrySession.jsm#1095 Is anyone consuming & monitoring all those probes? If yes, is there value in recording those at regular intervals (instead of e.g. once)?
MEMORY_TOTAL was added so we could monitor memory use under real-world Firefox use across processes. It was (still is?) an e10s release criteria. As recently as last week I've recommended its use as a release criteria for e10s-multi. I have little knowledge of the other memory probes. As for when to query these things... well, that's an interesting one, isn't it. The historical data is based on periodic and triggered reporting which almost certainly over-weights contributions from users who leave their browsers open. This alone should be reason to change our recording methodology... if only we had a good replacement.
Could we collect MEMORY_TOTAL in non-main-thread without blocking the main thread?
I suspect so, but I suspect one of the reasons for the current topic subscriptions may be to capture something close to the maximum memory level... in other words, it might be a feature, not a bug.
(In reply to Chris H-C :chutten from comment #6) > I suspect so, but I suspect one of the reasons for the current topic > subscriptions may be to capture something close to the maximum memory > level... in other words, it might be a feature, not a bug. chutten, you added this probe in bug 1198209, so you should know as well as anyone whether this is the case :) I wasn't aware of this probe until now. Measuring it off-main-thread seems reasonable to me. resident-fast typically doesn't change that fast and so slight variations in the timing of the measurement shouldn't matter much.
Well, to be fair I only added MEMORY_TOTAL, I didn't add the collector topic subscription :) So, uh, do we have a recipe (addTask?) for off-main-thread privileged JS? Or will this be a C++ rewrite?
It would need to be rewritten in C++.
Does anyone have good criteria for what we need from these measurements? Do we need or track anything besides MEMORY_TOTAL? How often do we need to collect this, every N minutes? If we rewrite MEMORY_TOTAL in C++, can we get the same results by accumulating through histograms? (That is the only memory thing we special IPC messaging for with MESSAGE_TELEMETRY_USS & MESSAGE_TELEMETRY_GET_CHILD_USS ) For the record, this is what we collect in  right now: (Histogram name, nsIMemoryReporterManager property) "MEMORY_VSIZE", "vsize" "MEMORY_VSIZE_MAX_CONTIGUOUS", "vsizeMaxContiguous" "MEMORY_RESIDENT_FAST", "residentFast" "MEMORY_UNIQUE", "residentUnique" "MEMORY_HEAP_ALLOCATED", "heapAllocated" "MEMORY_HEAP_OVERHEAD_FRACTION", "heapOverheadFraction" "MEMORY_JS_GC_HEAP", "JSMainRuntimeGCHeap" "MEMORY_JS_COMPARTMENTS_SYSTEM", "JSMainRuntimeCompartmentsSystem" "MEMORY_JS_COMPARTMENTS_USER", "JSMainRuntimeCompartmentsUser" "MEMORY_IMAGES_CONTENT_USED_UNCOMPRESSED", "imagesContentUsedUncompressed" "MEMORY_STORAGE_SQLITE", "storageSQLite" "LOW_MEMORY_EVENTS_VIRTUAL", "lowMemoryEventsVirtual" "LOW_MEMORY_EVENTS_PHYSICAL", "lowMemoryEventsPhysical" "GHOST_WINDOWS", "ghostWindows" "PAGE_FAULTS_HARD", "pageFaultsHard" 1: https://dxr.mozilla.org/mozilla-central/rev/32dcdde1fc64fc39a9065dc4218265dbc727673f/toolkit/components/telemetry/TelemetrySession.jsm#1674,1692 2: https://dxr.mozilla.org/mozilla-central/rev/32dcdde1fc64fc39a9065dc4218265dbc727673f/toolkit/components/telemetry/TelemetrySession.jsm#1190
Benjamin, do you know who else might monitor or need any of these probes in comment 10?
ni?chutten for MEMORY_TOTAL input.
The MEMORY_* probes have alert emails mentioning memshrink. Is this a current effort? From what I remember when MEMORY_TOTAL was proposed, the MEMORY_* probes were put in place to get an idea of "typical" memory use in the wild. That's why they're histograms that are accumulated to at various points in time (events, timers) instead of just single numbers (what would now be scalars). As for rewriting MEMORY_TOTAL in C++, I'm not sure if one process is allowed to query the memory characteristics of child processes, or if we'd need to rewrite the messaging stuff as well. It is necessary to get the RSS and USS measures at or near the same time as each other so they can be summed properly. Wasn't there to be an about:performance rewrite that would require knowing (amongst other things) memory allocations amongst the processes? Maybe there's some work we can reuse there.
(In reply to Chris H-C :chutten from comment #13) > The MEMORY_* probes have alert emails mentioning memshrink. Is this a > current effort? Yes, Eric Rahm and I are still working on it. I'm not sure I recall any regressions in this measurement, though.
(In reply to Chris H-C :chutten from comment #13) > As for rewriting MEMORY_TOTAL in C++, I'm not sure if one process is allowed > to query the memory characteristics of child processes, or if we'd need to > rewrite the messaging stuff as well. Linux and Windows can probably get the child's values from the parent (unless we've locked this down via sandboxing), OSX you'd need to do it in-process (for USS at least). > Wasn't there to be an about:performance rewrite that would require knowing > (amongst other things) memory allocations amongst the processes? Maybe > there's some work we can reuse there. aboutPerformance.js uses Memory.jsm , which is implemented in JS.  http://searchfox.org/mozilla-central/rev/60ae6514e4c559c0c234f0e7aefccb101b8beb2e/toolkit/modules/Memory.jsm
njn, I believe your team "owns" all this memory reporting. Can you own prioritizing this?
I think it's fine to move this off-main-thread; I agree with Olli, we shouldn't synchronously measure during the CC slice. Triggering the measurement at the current point is probably fine though (or if there's a |CYCLE_COLLECTION_END| topic we could key off that). For the purpose of comparing e10s(-multi) with non-e10s, it doesn't really matter when we measure as long as the timing is reasonably consistent (so measure after CC should be fine). As far as measuring peak memory I can see how this would be problematic, but I think we already have a measurement for that. Chris does that sound reasonable?
But 1198209 is clearly the cause of this. It added the extra cross-process measurements. The measurements originally added by bug 678085 were there for a long time without causing problems. Nobody from my team was involved with bug 1198209, either writing the code or reviewing it. I suggest disabling it.
(In reply to Nicholas Nethercote [:njn] from comment #18) > But 1198209 is clearly the cause of this. It added the extra cross-process > measurements. The measurements originally added by bug 678085 were there for > a long time without causing problems. > > Nobody from my team was involved with bug 1198209, either writing the code > or reviewing it. I suggest disabling it. This is a gating measurement for e10s-multi, we can't disable it (and I'm reasonably sure I was involved in defining what to measure at least).
I wrote that code, though it seems a lifetime ago. Had I but known, I would've asked more questions about how to do it better. :erahm, I'm fine with measuring at different but consistent times. All studies I'm aware of that use MEMORY_TOTAL are doing so in an A/B fashion within the same timeframe... so as long as we aren't missing out on some crucial time in a Firefox instance's memory history, it should be fine. Oh, and echoing your sentiment, yes we can't disable it without replacing it with something that measures exactly the same thing. The need for cross-process memory measurement is only increasing: see bug 1344172, and also bug 1344174. So having an efficient way to do this would be _excellent_.
As far as priority goes, from a memory measurement perspective things are working fine so changing this is low priority. Anyone with perf concerns would probably disagree though: we're running this on the first CC and then most likely every minute thereafter (I assume CC runs that often when a session is active, but could be wrong). Note: on OSX measuring USS can be rather slow.
This came up again in bug 1375281, lets get this fixed as perf is now high priority. My proposal is to just use idleDispatch for the measurements so as not to block CC.
Created attachment 8880971 [details] [diff] [review] Use idleDispatch for gathering memory statistics This defers the |gatherMemory| telemetry measurement until we're idle. This measurement should no longer affect CC times. To test this I set the following prefs locally: > 'toolkit.telemetry.log.level': 0, > 'toolkit.telemetry.log.dump': True, > 'toolkit.telemetry.enabled': True And verified the function was still being called and that memory measurement messages were still being generated. I added additional (temporary) logging in xpconnect to make sure the observer call finished before any measurements were made. MozReview-Commit-ID: DMtAo3K59FV
Comment on attachment 8880971 [details] [diff] [review] Use idleDispatch for gathering memory statistics Review of attachment 8880971 [details] [diff] [review]: ----------------------------------------------------------------- LGTM File a follow-up to look in on if/how the memory stats change on Nightly after this lands?
In bug 1373892 I have observed memory stat gathering for telemetry taking up approximately 6 seconds and additional GCs presumably for serializing those stats. Will this solve those hangs?
(In reply to The 8472 from comment #25) > In bug 1373892 I have observed memory stat gathering for telemetry taking up > approximately 6 seconds and additional GCs presumably for serializing those > stats. Will this solve those hangs? It might help, but 6s is pretty extreme, just punting to an idle time might still cause jank. I'm working on improving the performance of our ghost window reporter in bug 1376038 and :cyu is looking into making our jemalloc_stats reporter asynchronous in bug 1375281, but from your report it looks like the VsizeMaxContiguous can be quite slow as well. Can you file a separate bug for just that issue and CC me? re: extra GCs during serialization, that sounds worth filing a separate bug as well, its possible we could at least offload the serialization to an idle task rather than block the main thread.
https://hg.mozilla.org/integration/mozilla-inbound/rev/499b44c627e9d5e18584f0b58212264500b5bc51 Bug 1340134 - Use idleDispatch for gathering memory statistics. r=chutten
I still think there is a problem here that has not been addressed yet: observing "cycle-collector-begin" from JS is slow. These notification can be frequent and since you only need a callback once per minute observing it is a waste of resources. Pretty much all callbacks during that minute you just do "new Date", compare it, throw it away, and then do nothing. As I suggested in bug 1375281, can we just use a setTimeout or something instead, so that minimal CPU resources are wasted until one minute has passed and you want to actually do something of value?
(In reply to Mats Palmgren (:mats) from comment #28) > I still think there is a problem here that has not been addressed yet: > observing "cycle-collector-begin" from JS is slow. > These notification can be frequent and since you only need a callback > once per minute observing it is a waste of resources. Pretty much all > callbacks during that minute you just do "new Date", compare it, throw > it away, and then do nothing. > > As I suggested in bug 1375281, can we just use a setTimeout or something > instead, so that minimal CPU resources are wasted until one minute has > passed and you want to actually do something of value? Oh right, sorry Mats. I'll file a follow up.