810406 - Browser hang while loading Telemetry evolution

Lawrence Mandel [:lmandel] (use needinfo)

Reporter

Description

•

12 years ago

Nightly hung for me while loading Telemetry evolution.

https://metrics.mozilla.com/data/content/pentaho-cdf-dd/Render?solution=metrics2&path=%2Ftelemetry&file=telemetryEvolution.wcdf

I captured a profile that seems to point to JS.

http://people.mozilla.com/~bgirard/cleopatra/#report=f753f61b0a7edfff5600261dc14233506cec98fc

Luke Wagner [:luke]

Comment 1

•

12 years ago

I see 85% of the time in mmap under MemoryReporter_Resident::GetAmount, so it doesn't seem like JS engine.  Perhaps njn or jlebar know more about this code?

Justin Lebar (not reading bugmail)

Comment 2

•

12 years ago

Whoa, in TelemetryPing.js:

    case "cycle-collector-begin":
      let now = new Date();
      if (!gLastMemoryPoll
          || (TELEMETRY_INTERVAL <= now - gLastMemoryPoll)) {
        gLastMemoryPoll = now;
        this.gatherMemory();
      }
      break;

We get memory info on every CC, and apparently getting RSS in particular is very slow on Mac.  That looks quite bad, in general.  A CC is already very slow, and we're just making it slower here.

Note also that bug 678085 completely invalidates all comparisons of memory telemetry before/after that change.

See also bug 792888, which I believe I was seeing on my Linux box, but I'm not sure.  Probably not the same thing.

But the real question is: Are we actually calling GetResident() no more than once every 60s?  If so, this is /just one/ call to GetResident which is hanging for many seconds.  Which is seriously bad!

Nicholas Nethercote [inactive]

Comment 3

•

12 years ago

GetResident() for Mac is here:

http://mxr.mozilla.org/mozilla-central/source/xpcom/base/nsMemoryReporterManager.cpp#271

The profile says that all the samples are in the __mmap call(s) within GetResident().  Get TaskBasicInfo() calls task_info() but doesn't obviously call __mmap.  

jemalloc_purge_freed_pages is only blamed for 4 samples but I suspect the stacks are slightly off and it's actually the problem.  Note especially this comment:

// Purging these pages shouldn't take more than 10ms or so, but we want to
// keep an eye on it since GetResident() is called on each Telemetry ping.

Prescience!

jlebar, it seems the gods really don't want us to get accurate RSS data on Mac :(

Nathan Froyd [:froydnj]

Comment 4

•

12 years ago

The real question is why we are even bothering to listen for cycle collection at all; we call gatherMemory() every time we send a ping:

http://mxr.mozilla.org/mozilla-central/source/toolkit/components/telemetry/TelemetryPing.js#556

so the gatherMemory() at (some) cycle collections doesn't look that useful.

Justin Lebar (not reading bugmail)

Comment 5

•

12 years ago

(In reply to Nathan Froyd (:froydnj) from comment #4)
> The real question is why we are even bothering to listen for cycle
> collection at all; we call gatherMemory() every time we send a ping:
> 
> http://mxr.mozilla.org/mozilla-central/source/toolkit/components/telemetry/
> TelemetryPing.js#556
> 
> so the gatherMemory() at (some) cycle collections doesn't look that useful.

We only send a ping a few times a day; we want to collect memory stats more often.

> jemalloc_purge_freed_pages is only blamed for 4 samples but I suspect the stacks are 
> slightly off and it's actually the problem.

That's an easy hypothesis to test.  Lawrence, is this reproducible for you if you restart your browser?  Can we send you a tryserver build?

Otherwise, let's add telemetry for how long the purge_freed_pages call takes?

Lawrence Mandel [:lmandel] (use needinfo)

Reporter

Comment 6

•

12 years ago

To be clear, as it seems that the conversation here is about the Telemetry client (in product), the slowness that I experienced was when accessing the Telemetry evolution dashboard. If this slowness is coincidentally due to the Telemetry client that's fine. I just wanted to be clear.

I just tried to reproduce the issue (having restarted my browser) and am not able to do so. I'll try again later after my browser has been running for a little while.

Justin Lebar (not reading bugmail)

Comment 7

•

12 years ago

> If this slowness is coincidentally due to the Telemetry client that's fine.

That's what it looks like from the profile; it looks like the slowness is due to GC, which is apparently slow because it collects telemetry info, which does something slow -- we're not quite sure what yet.

(dormant account)

Comment 8

•

12 years ago

(In reply to Justin Lebar [:jlebar] from comment #2)
> Whoa, in TelemetryPing.js:
> 
>     case "cycle-collector-begin":
>       let now = new Date();
>       if (!gLastMemoryPoll
>           || (TELEMETRY_INTERVAL <= now - gLastMemoryPoll)) {
>         gLastMemoryPoll = now;
>         this.gatherMemory();
>       }
>       break;
> 
> We get memory info on every CC, and apparently getting RSS in particular is
> very slow on Mac.  That looks quite bad, in general.  A CC is already very
> slow, and we're just making it slower here.

To be clear, we get memory stuff every cc that's TELEMETRY_INTERVAL apart.

> 
> Note also that bug 678085 completely invalidates all comparisons of memory
> telemetry before/after that change.

Yes, but change was done early on in telemetry history.

> But the real question is: Are we actually calling GetResident() no more than
> once every 60s?  If so, this is /just one/ call to GetResident which is
> hanging for many seconds.  Which is seriously bad!

That does sound bad. Someone should fix that. I decided to do memory polling based on cc notifications to avoid setting up a new timer. I checked with njn when I did that, he said it seemed fine.

Lawrence Mandel [:lmandel] (use needinfo)

Reporter

Comment 9

•

12 years ago

If this really is a Telemetry problem, should this bug be moved to Toolkit::Telemetry? If the probe should be changed/fixed by someone on the JS team, to whom can this bug be assigned?

Nobody; OK to take it and work on it

Assignee

Updated

•

10 years ago

Assignee: general → nobody

BMO Automation

Updated

•

2 years ago

Severity: normal → S3

Bugzilla

Quick Search

Browser hang while loading Telemetry evolution

Categories

(Core :: JavaScript Engine, defect)

Tracking

()

People

(Reporter: lmandel, Unassigned)

References

Details

(Whiteboard: [Snappy])

Crash Data

Security

(public)

User Story

Description

Comment 1

Comment 2

Comment 3

Comment 4

Comment 5

Comment 6

Comment 7

Comment 8

Comment 9

Updated

Updated