Bug 1369594 (Closed) — Opened 8 years ago, Closed 8 years ago

ThreadHangStats getter for Telemetry can be very slow

Categories: Core :: XPCOM, defect
Platform: x86_64 Windows 7
Severity: normal
Priority: Not set

Tracking

Status: RESOLVED FIXED
Target Milestone: mozilla56
Tracking Status: firefox56 --- fixed

People

(Reporter: guijoselito, Assigned: nika)

References

Details

Attachments

(4 files)

Attached file Hangs.txt
When my computer is running World Community Grid on half of its cores, Nightly constantly hangs for a couple of seconds. I think that means something is slow, but only noticeable on a very slow computer. Browser Hangs data from about:telemetry is attached.
Component: Untriaged → JavaScript Engine
Most of these hangs have stacks that look like:

...
mozilla::Telemetry::GetStackAndModules
`anonymous namespace'::CreateJSThreadHangStats
`anonymous namespace'::TelemetryImpl::GetThreadHangStats

Not sure what to make of this, but is it possible CreateJSThreadHangStats is making the problem much worse?
Profile from a hang: https://perfht.ml/2sQpUJp

I noticed the hangs become more frequent as the session gets longer and happen even without other programs running and using CPU.
See Also: → 1344920
(In reply to Guilherme Lima from comment #2)
> Profile from a hang: https://perfht.ml/2sQpUJp
>
> I noticed the hangs become more frequent as the session gets longer and
> happen even without other programs running and using CPU.

Thanks Guilherme. Out of curiosity, if you go to about:telemetry and turn off both FHR and Extended Telemetry recording, are you still able to reproduce the issue?
Flags: needinfo?(guijoselito)
After 3 hours of browsing with World Community Grid running together, no hangs.
Flags: needinfo?(guijoselito)
NI so we don't forget about this.
Flags: needinfo?(mconley)
Looking at the stacks attached in comment 0, a good number of these hangs seem to be due to us waiting for the OS to load a DLL when (ironically) attempting to put together the thread hang stacks entry in the Telemetry payload. Thankfully these sorts of hangs are only possible with BHR enabled, so it's not on the release channel.

Example of a hang stack from comment 0 (truncated):

Hang Report #18 (9 seconds)
NtQueryAttributesFile (in wntdll.pdb)
RtlDosSearchPath_Ustr (in wntdll.pdb)
BasepLoadLibraryAsDataFileInternal (in wkernelbase.pdb)
BasepLoadLibraryAsDataFile (in wkernelbase.pdb)
LoadLibraryExW (in wkernelbase.pdb)
SharedLibraryInfo::GetInfoForSelf() (in xul.pdb)
mozilla::Telemetry::GetStackAndModules(std::vector<unsigned int,std::allocator<unsigned int> > const &) (in xul.pdb)
`anonymous namespace'::CreateJSThreadHangStats (in xul.pdb)
`anonymous namespace'::TelemetryImpl::GetThreadHangStats (in xul.pdb)

Anything we can do about this, mystor? Is this one known about?
Flags: needinfo?(mconley) → needinfo?(michael)
Bug 1363453 and https://github.com/bwinton/TabCenter/issues/1058 sound like similar issues.

I'll try to look into doing the work to get modules for ThreadHangStats off the main thread somehow. I'm not sure exactly what that will look like, as it's currently a very synchronous API.

And yes, these hangs will not occur outside of prerelease channels, which is why I haven't dropped everything to work on it yet. It sounds like it's much worse (at least recently) than I was thinking, so I should see if I can fix it soon.
This patch switches up the way we submit hang information to telemetry. It changes the logic such that we submit our hang reports first to the StreamTransportService to have their module information processed, and only then to telemetry. This means we no longer do the expensive module-information work on the main thread when telemetry information is requested. It does mean we delay submitting hangs to telemetry slightly, but the process was already async, so it shouldn't be an issue.

MozReview-Commit-ID: L0ANHNKKAkY
Attachment #8877181 - Flags: review?(nfroyd)
Assignee: nobody → michael
Component: JavaScript Engine → XPCOM
Flags: needinfo?(michael)
Summary: Nightly constantly hangs → ThreadHangStats getter for Telemetry can be very slow
Comment on attachment 8877181 [details] [diff] [review]
Get module information for BHR off main thread

Review of attachment 8877181 [details] [diff] [review]:
-----------------------------------------------------------------

This is a surprisingly high ratio of added lines to deleted lines, but maybe that's just the way things go.

::: xpcom/threads/BackgroundHangMonitor.cpp
@@ +468,5 @@
> +// for Telemetry.
> +//
> +// If this object is canceled, it will submit its payload to the
> +// BackgroundHangThread without performing the processing.
> +class ProcessHangRunnable : public CancelableRunnable

Nit: might as well make this final.

@@ +479,5 @@
> +  : mManager(aManager)
> +  , mNativeStack(mozilla::Move(aNativeStack))
> +  , mThread(aThread)
> +  , mHistogram(mozilla::Move(aHistogram))
> +{

Worth MOZ_ASSERT(mThread) here?

@@ +605,5 @@
> +  // Hold a strong reference to the runnable so it doesn't go away.
> +  nsCOMPtr<nsICancelableRunnable> cancelable =
> +    mProcessHangRunnables[mProcessHangRunnables.Count() - 1];
> +  // NOTE: This removes the item from the array.
> +  cancelable->Cancel();

Maybe we should make the array a linked list, instead, so runnables can remove themselves in constant time? It's not only this case that suffers from O(n) removal, but all runnables need to go through that process, which seems suboptimal.
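The linked-list suggestion above is the classic intrusive-list trade: an element that carries its own prev/next pointers can unlink itself in O(1), while removal from an array costs an O(n) search plus shifting. A minimal sketch, where `Node` and `List` are illustrative types rather than Mozilla's actual LinkedList:

```cpp
#include <cstddef>

// Intrusive node: the element itself carries the links, so it can remove
// itself from its list without searching for its position.
struct Node {
  Node* prev = nullptr;
  Node* next = nullptr;

  // O(1) self-removal; valid only while the node is linked into a list.
  void remove() {
    prev->next = next;
    next->prev = prev;
    prev = next = nullptr;
  }
};

// Circular list with a sentinel node, so insertion and removal need no
// null checks or special cases at the ends.
struct List {
  Node sentinel;
  List() { sentinel.prev = sentinel.next = &sentinel; }

  void pushBack(Node* n) {
    n->prev = sentinel.prev;
    n->next = &sentinel;
    sentinel.prev->next = n;
    sentinel.prev = n;
  }

  // O(n) walk, used here only for inspection; removal never needs it.
  std::size_t size() const {
    std::size_t count = 0;
    for (const Node* n = sentinel.next; n != &sentinel; n = n->next) {
      ++count;
    }
    return count;
  }
};
```

In the BHR case, each ProcessHangRunnable would be (or own) such a node, letting Cancel() unlink it directly instead of scanning mProcessHangRunnables.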
Attachment #8877181 - Flags: review?(nfroyd) → review+
MozReview-Commit-ID: L0ANHNKKAkY
Pushed by michael@thelayzells.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/9c8803d97c7d Get module information for BHR off main thread, r=froydnj
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla56
Attached image telemetry.png
Since bug 1373892 got duped to this one: As of build 20170622030208 I am still seeing hangs on the main thread. So this issue is not fixed. See attached screenshot for the main culprits.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
FYI, at least my problem really got fixed. I don't have any hangs since this patch landed.
(In reply to The 8472 from comment #14)
> Created attachment 8881057 [details]
> telemetry.png
>
> Since bug 1373892 got duped to this one: As of build 20170622030208 I am
> still seeing hangs on the main thread. So this issue is not fixed.
>
> See attached screenshot for the main culprits.

It would be much more useful if you were willing to share the complete profile (even in private). I can't get nearly as much information from the screenshot as I would need to diagnose the issue: I can see that things are slow, but not really why or how.

Your new issue is very different from the one you originally filed and from this bug. I would recommend opening a new bug for the hang you're experiencing right now, as the problem of the ThreadHangStats getter (namely GetInfoForSelf) taking a long time is gone now (it cannot run on the main thread in the telemetry getter anymore).
Status: REOPENED → RESOLVED
Closed: 8 years ago8 years ago
Resolution: --- → FIXED
I'll just reopen my bug then, since only 2 of the 3 hot methods I reported have stopped occurring.
Depends on: 1374888