Closed
Bug 1369594
Opened 8 years ago
Closed 8 years ago
ThreadHangStats getter for Telemetry can be very slow
Categories
(Core :: XPCOM, defect)
Tracking
()
RESOLVED
FIXED
mozilla56
| Tracking | Status | |
|---|---|---|
| firefox56 | --- | fixed |
People
(Reporter: guijoselito, Assigned: nika)
References
Details
Attachments
(4 files)
|
51.85 KB,
text/plain
|
Details | |
|
21.98 KB,
patch
|
froydnj
:
review+
|
Details | Diff | Splinter Review |
|
22.56 KB,
patch
|
Details | Diff | Splinter Review | |
|
206.94 KB,
image/png
|
Details |
When my computer is running World Community Grid in half of the cores, Nightly constantly hangs for a couple of seconds. I think that means that something is slow, but only noticeable on a very slow computer.
Browser Hangs from about:telemetry is attached.
Updated•8 years ago
|
Component: Untriaged → JavaScript Engine
Comment 1•8 years ago
|
||
Most of these hangs have stacks that look like:
...
mozilla::Telemetry::GetStackAndModules
`anonymous namespace'::CreateJSThreadHangStats
`anonymous namespace'::TelemetryImpl::GetThreadHangStats
Not sure what to make of this, but is it possible CreateJSThreadHangStats is making the problem much worse?
| Reporter | ||
Comment 2•8 years ago
|
||
Profile from a hang:
https://perfht.ml/2sQpUJp
I noticed the hangs become more frequent as the session gets longer and happen even without other programs running and using CPU.
Comment 3•8 years ago
|
||
(In reply to Guilherme Lima from comment #2)
> Profile from a hang:
> https://perfht.ml/2sQpUJp
>
> I noticed the hangs become more frequent as the session gets longer and
> happen even without other programs running and using CPU.
Thanks Guilherme. Out of curiosity, if you go to about:telemetry and turn off both FHR and Extended Telemetry recording, are you still able to reproduce the issue?
Flags: needinfo?(guijoselito)
| Reporter | ||
Comment 4•8 years ago
|
||
After 3 hours of browsing with World Community Grid running together, no hangs.
Flags: needinfo?(guijoselito)
Comment 6•8 years ago
|
||
Looking at the stacks attached in comment 0, a good number of these hangs seem to be due to us waiting for the OS to load a DLL when (ironically) attempting to put together the thread hang stacks entry in the Telemetry payload.
Thankfully these sorts of hangs are only possible with BHR enabled, so it's not on the release channel.
(Truncated) Example of a hang stack from commen t0:
Hang Report #18 (9 seconds)
NtQueryAttributesFile (in wntdll.pdb)
RtlDosSearchPath_Ustr (in wntdll.pdb)
BasepLoadLibraryAsDataFileInternal (in wkernelbase.pdb)
BasepLoadLibraryAsDataFile (in wkernelbase.pdb)
LoadLibraryExW (in wkernelbase.pdb)
SharedLibraryInfo::GetInfoForSelf() (in xul.pdb)
mozilla::Telemetry::GetStackAndModules(std::vector<unsigned int,std::allocator<unsigned int> > const &) (in xul.pdb)
`anonymous namespace'::CreateJSThreadHangStats (in xul.pdb)
`anonymous namespace'::TelemetryImpl::GetThreadHangStats (in xul.pdb)
Anything we can do about this, mystor? Is this one known about?
Flags: needinfo?(mconley) → needinfo?(michael)
| Assignee | ||
Comment 7•8 years ago
|
||
This sounds like bug 1363453 and https://github.com/bwinton/TabCenter/issues/1058 are similar issues.
I'll try to look into doing the work to get modules for ThreadHangStats off main thread somehow. Not sure what exactly that will look like as it's currently a very synchronous API.
And yes, these hangs will not occur outside of prerelease channels, which is why I haven't dropped everything to work on it yet. It sounds like it's much worse (at least recently) than I was thinking, so I should see if I can fix it soon.
| Assignee | ||
Comment 8•8 years ago
|
||
This patch switches up the way we submit hang information to telemetry. It changes the logic such that we submit our hang reports first to the StreamTransportService to have their module information processed, and then to telemetry. This means that we don't do the expensive module information work on the main thread when telemetry information is requested.
This means that we delay submitting hangs to telemetry slightly, but the process was already async, so it shouldn't be an issue.
MozReview-Commit-ID: L0ANHNKKAkY
Attachment #8877181 -
Flags: review?(nfroyd)
| Assignee | ||
Updated•8 years ago
|
Assignee: nobody → michael
Component: JavaScript Engine → XPCOM
Flags: needinfo?(michael)
| Assignee | ||
Updated•8 years ago
|
Summary: Nightly constantly hangs → ThreadHangStats getter for Telemetry can be very slow
Comment 9•8 years ago
|
||
Comment on attachment 8877181 [details] [diff] [review]
Get module information for BHR off main thread
Review of attachment 8877181 [details] [diff] [review]:
-----------------------------------------------------------------
This is a surprisingly high ratio of added lines to deleted lines, but maybe that's just the way things go.
::: xpcom/threads/BackgroundHangMonitor.cpp
@@ +468,5 @@
> +// for Telemetry.
> +//
> +// If this object is canceled, it will submit its payload to the
> +// BackgroundHangThread without performing the processing.
> +class ProcessHangRunnable : public CancelableRunnable
Nit: might as well make this final.
@@ +479,5 @@
> + : mManager(aManager)
> + , mNativeStack(mozilla::Move(aNativeStack))
> + , mThread(aThread)
> + , mHistogram(mozilla::Move(aHistogram))
> + {
Worth MOZ_ASSERT(mThread) here?
@@ +605,5 @@
> + // Hold a strong reference to the runnable so it doesn't go away.
> + nsCOMPtr<nsICancelableRunnable> cancelable =
> + mProcessHangRunnables[mProcessHangRunnables.Count() - 1];
> + // NOTE: This removes the item from the array.
> + cancelable->Cancel();
Maybe we should make the array a linked list, instead, so runnables can remove themselves in constant time? It's not only this case that suffers from O(n) removal, but all runnables need to go through that process, which seems suboptimal.
Attachment #8877181 -
Flags: review?(nfroyd) → review+
| Assignee | ||
Comment 10•8 years ago
|
||
MozReview-Commit-ID: L0ANHNKKAkY
Comment 11•8 years ago
|
||
Pushed by michael@thelayzells.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/9c8803d97c7d
Get module information for BHR off main thread, r=froydnj
Comment 12•8 years ago
|
||
| bugherder | ||
Status: NEW → RESOLVED
Closed: 8 years ago
status-firefox56:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla56
Comment 14•8 years ago
|
||
Since bug 1373892 got duped to this one: As of build 20170622030208 I am still seeing hangs on the main thread. So this issue is not fixed.
See attached screenshot for the main culprits.
| Reporter | ||
Comment 15•8 years ago
|
||
FYI, at least my problem really got fixed. I don't have any hangs since this patch landed.
| Assignee | ||
Comment 16•8 years ago
|
||
(In reply to The 8472 from comment #14)
> Created attachment 8881057 [details]
> telemetry.png
>
> Since bug 1373892 got duped to this one: As of build 20170622030208 I am
> still seeing hangs on the main thread. So this issue is not fixed.
>
> See attached screenshot for the main culprits.
It would be much more useful if you would be willing to share the complete profile (even in private). I cannot get as much useful information from the screenshot as would be useful for diagnosing the issue. For example, I can see with this profile that things are slow, but I don't really have a good idea of why or how.
Your new issue is very different from the old one which you originally filed and from this bug. I would recommend opening a new bug for the hang you're experiencing right now, as the problem of threadHangStats' getter (namely GetInfoForSelf) taking a long time is gone now (it cannot be run on the main thread in the telemetry getter anymore).
Status: REOPENED → RESOLVED
Closed: 8 years ago → 8 years ago
Resolution: --- → FIXED
Comment 17•8 years ago
|
||
I'll just reopen my bug then since only 2 of the 3 hot methods I reported do not occur anymore.
You need to log in
before you can comment on or make changes to this bug.
Description
•