Telemetry ping blocks main process for 120 seconds

RESOLVED DUPLICATE of bug 1369594

Status

()

RESOLVED DUPLICATE of bug 1369594
a year ago
a year ago

People

(Reporter: bugzilla.mozilla.org, Unassigned)

Tracking

Trunk
x86_64
Windows 10
Points:
---

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(1 attachment, 1 obsolete attachment)

187.07 KB, image/png
Details
(Reporter)

Description

a year ago
Created attachment 8878743 [details]
ping-profile.png

I'm occasionally experiencing hangs lasting more than a minute. The profiler points at telemetry being responsible.

Attached: A screenshot of the profiler output.

And a stack trace of one of the functions taking up significant time: https://pastebin.mozilla.org/9025046
According to inverted call stacks 35% of the time is spent in NtQueryAttributesFile, 18% in NtReadVirtualMemory and 5% in NtQueryVirtualMemory.
I guess it's examining the virtual memory layout of all processes and there's some inefficiency in that.
This is jank resulting from loading module information from `Telemetry::GetStackAndModules()`, which we track in bug 1363453.

Note that this is limited to users who are on pre-release channels or turned on extended Telemetry, so release users are usually not affected.
Status: NEW → RESOLVED
Last Resolved: a year ago
Resolution: --- → DUPLICATE
Duplicate of bug: 1363453
(Reporter)

Comment 2

a year ago
Created attachment 8881106 [details]
telemetry2.png

(In reply to The 8472 from comment #0)
> According to inverted call stacks 35% of the time is spent in
> NtQueryAttributesFile, 18% in NtReadVirtualMemory and 5% in
> NtQueryVirtualMemory.

Bug 1369594 fixed the first two, but since they no longer occur NtQueryVirtualMemory is now the dominant leaf method and makes up 20% of the hang time all by itself.

Additionally garbage collection - apparently triggered among other things by the structured clone decoding - takes up another ~20%.
Attachment #8878743 - Attachment is obsolete: true
(Reporter)

Updated

a year ago
Status: RESOLVED → REOPENED
Resolution: DUPLICATE → ---
The main problem visible in the previous screenshot was fixed in bug 1369594.
It would be easier to open a new bug for remaining problems.

Uploading & linking a profile would be helpful for diagnosing.
Thanks.
Status: REOPENED → RESOLVED
Last Resolved: a year agoa year ago
Resolution: --- → DUPLICATE
Duplicate of bug: 1369594
(Reporter)

Comment 4

a year ago
My bug report does not solely consist of a screenshot, it also contains text. 

> and 5% in NtQueryVirtualMemory.

This part was NOT fixed and made up a multi-second fraction of the hang, therefore the issue as a whole is not fixed either. Why do you insist on closing it?
Status: RESOLVED → REOPENED
Resolution: DUPLICATE → ---
(Reporter)

Comment 5

a year ago
> Uploading & linking a profile would be helpful for diagnosing.

The profiler lacks an option to anonymize the profile, until it has that a screenshot will have to do due to privacy reasons. If you need to know additional stats I can provide them of course.
(In reply to The 8472 from comment #4)
> This part was NOT fixed and made up a multi-second fraction of the hang,
> therefore the issue as a whole is not fixed either. Why do you insist on
> closing it?

Because:
- as far as i can tell, the other issue was the main one
- breaking separate issues out into separate bugs makes it easier for us to talk about and organize them

Opening a separate bug doesn't mean we are not looking at this issue.

FWIW, bug 1340134 moves gatherMemory() to idle times; i assume that addresses the remaining issue.
Status: REOPENED → RESOLVED
Last Resolved: a year agoa year ago
Resolution: --- → DUPLICATE
Duplicate of bug: 1369594
You need to log in before you can comment on or make changes to this bug.