Bug 810406: Browser hang while loading Telemetry evolution
Status: NEW (open)
Opened: 12 years ago; Updated: 2 years ago
Component: Core :: JavaScript Engine (defect)
Reporter: lmandel; Assignee: Unassigned
Whiteboard: [Snappy]
Description • 12 years ago

Nightly hung for me while loading the Telemetry evolution dashboard:

https://metrics.mozilla.com/data/content/pentaho-cdf-dd/Render?solution=metrics2&path=%2Ftelemetry&file=telemetryEvolution.wcdf

I captured a profile that seems to point to JS:

http://people.mozilla.com/~bgirard/cleopatra/#report=f753f61b0a7edfff5600261dc14233506cec98fc
Comment 1 • 12 years ago
I see 85% of the time in mmap under MemoryReporter_Resident::GetAmount, so this doesn't seem to be a JS engine problem. Perhaps njn or jlebar know more about this code?
Comment 2 • 12 years ago
Whoa, in TelemetryPing.js:

    case "cycle-collector-begin":
      let now = new Date();
      if (!gLastMemoryPoll
          || (TELEMETRY_INTERVAL <= now - gLastMemoryPoll)) {
        gLastMemoryPoll = now;
        this.gatherMemory();
      }
      break;

We get memory info on every CC, and apparently getting RSS in particular is very slow on Mac. That looks quite bad in general: a CC is already very slow, and we're just making it slower here.

Note also that bug 678085 completely invalidates all comparisons of memory telemetry before/after that change.

See also bug 792888, which I believe I was seeing on my Linux box, but I'm not sure. Probably not the same thing.

But the real question is: are we actually calling GetResident() no more than once every 60s? If so, this is /just one/ call to GetResident() which is hanging for many seconds. Which is seriously bad!
Comment 3 • 12 years ago
GetResident() for Mac is here:

http://mxr.mozilla.org/mozilla-central/source/xpcom/base/nsMemoryReporterManager.cpp#271

The profile says that all the samples are in the __mmap call(s) within GetResident(). GetTaskBasicInfo() calls task_info() but doesn't obviously call __mmap. jemalloc_purge_freed_pages is only blamed for 4 samples, but I suspect the stacks are slightly off and it's actually the problem. Note especially this comment:

    // Purging these pages shouldn't take more than 10ms or so, but we want to
    // keep an eye on it since GetResident() is called on each Telemetry ping.

Prescience! jlebar, it seems the gods really don't want us to get accurate RSS data on Mac :(
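For context, here is a minimal sketch of the Mac GetResident() path described above, assuming it purges jemalloc's freed pages before asking Mach for the resident size; this is an illustration of the structure, not the actual tree code:

    // Hypothetical sketch of the Mac GetResident() path; the real
    // implementation lives in nsMemoryReporterManager.cpp (linked above).
    #include <mach/mach.h>
    #include <stdint.h>

    extern "C" void jemalloc_purge_freed_pages();  // Mozilla jemalloc extension

    static bool GetTaskBasicInfo(struct task_basic_info_64* aInfo)
    {
      mach_msg_type_number_t count = TASK_BASIC_INFO_64_COUNT;
      kern_return_t kr = task_info(mach_task_self(), TASK_BASIC_INFO_64,
                                   (task_info_t)aInfo, &count);
      return kr == KERN_SUCCESS;
    }

    static int64_t GetResident()
    {
      // On Mac, pages freed with madvise(MADV_FREE) still count toward RSS
      // until the OS reclaims them, so the freed pages are purged first to
      // get an accurate number. This purge is the step the profile appears
      // to blame (via __mmap).
      jemalloc_purge_freed_pages();

      struct task_basic_info_64 info;
      if (!GetTaskBasicInfo(&info)) {
        return -1;
      }
      return (int64_t)info.resident_size;
    }

task_info() itself is a cheap kernel call; it's the purge, which touches every madvise()d region, that can blow past its ~10ms budget.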
Comment 4 • 12 years ago
The real question is why we are even bothering to listen for cycle collection at all; we call gatherMemory() every time we send a ping:

http://mxr.mozilla.org/mozilla-central/source/toolkit/components/telemetry/TelemetryPing.js#556

so the gatherMemory() at (some) cycle collections doesn't look that useful.
Comment 5 • 12 years ago
(In reply to Nathan Froyd (:froydnj) from comment #4)
> The real question is why we are even bothering to listen for cycle
> collection at all; we call gatherMemory() every time we send a ping:
>
> http://mxr.mozilla.org/mozilla-central/source/toolkit/components/telemetry/TelemetryPing.js#556
>
> so the gatherMemory() at (some) cycle collections doesn't look that useful.

We only send a ping a few times a day; we want to collect memory stats more often.

> jemalloc_purge_freed_pages is only blamed for 4 samples but I suspect the
> stacks are slightly off and it's actually the problem.

That's an easy hypothesis to test. Lawrence, is this reproducible for you if you restart your browser? Can we send you a tryserver build? Otherwise, let's add telemetry for how long the purge_freed_pages call takes.
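A rough sketch of that instrumentation, assuming a mozilla::Telemetry histogram; the PURGE_FREED_PAGES_MS probe ID and the PurgeFreedPagesTimed() wrapper are invented for illustration:

    // Hypothetical timing instrumentation for jemalloc_purge_freed_pages();
    // PURGE_FREED_PAGES_MS is an invented histogram ID, not an existing probe.
    #include "mozilla/Telemetry.h"
    #include "mozilla/TimeStamp.h"
    #include <stdint.h>

    extern "C" void jemalloc_purge_freed_pages();

    static void PurgeFreedPagesTimed()
    {
      mozilla::TimeStamp start = mozilla::TimeStamp::Now();
      jemalloc_purge_freed_pages();
      mozilla::TimeDuration elapsed = mozilla::TimeStamp::Now() - start;
      // Record the purge duration so we can see how often it exceeds the
      // ~10ms budget mentioned in the code comment quoted in comment 3.
      mozilla::Telemetry::Accumulate(mozilla::Telemetry::PURGE_FREED_PAGES_MS,
                                     (uint32_t)elapsed.ToMilliseconds());
    }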
Comment 6 (Reporter) • 12 years ago
To be clear, since the conversation here is about the Telemetry client (in the product): the slowness I experienced was while accessing the Telemetry evolution dashboard. If this slowness is coincidentally due to the Telemetry client, that's fine; I just wanted to be clear.

I just tried to reproduce the issue (having restarted my browser) and am not able to do so. I'll try again later, after my browser has been running for a while.
Comment 7 • 12 years ago
> If this slowness is coincidentally due to the Telemetry client that's fine.
That's what it looks like from the profile: the slowness is due to GC, which is apparently slow because it collects telemetry info, which in turn does something slow; we're not quite sure what yet.
Comment 8 • 12 years ago
(In reply to Justin Lebar [:jlebar] from comment #2)
> Whoa, in TelemetryPing.js:
>
>   case "cycle-collector-begin":
>     let now = new Date();
>     if (!gLastMemoryPoll
>         || (TELEMETRY_INTERVAL <= now - gLastMemoryPoll)) {
>       gLastMemoryPoll = now;
>       this.gatherMemory();
>     }
>     break;
>
> We get memory info on every CC, and apparently getting RSS in particular is
> very slow on Mac. That looks quite bad, in general. A CC is already very
> slow, and we're just making it slower here.

To be clear, we gather memory stats only on CCs that are at least TELEMETRY_INTERVAL apart.

> Note also that bug 678085 completely invalidates all comparisons of memory
> telemetry before/after that change.

Yes, but that change was made early in telemetry's history.

> But the real question is: Are we actually calling GetResident() no more than
> once every 60s? If so, this is /just one/ call to GetResident which is
> hanging for many seconds. Which is seriously bad!

That does sound bad. Someone should fix that.

I decided to do memory polling based on CC notifications to avoid setting up a new timer. I checked with njn when I did that; he said it seemed fine.
Comment 9 (Reporter) • 12 years ago
If this really is a Telemetry problem, should this bug be moved to Toolkit::Telemetry? If the probe should be changed/fixed by someone on the JS team, to whom can this bug be assigned?
Updated • 10 years ago
Assignee: general → nobody
Updated • 2 years ago
Severity: normal → S3