Closed Bug 1529276 Opened 5 years ago Closed 3 months ago

resource:///modules/WindowsJumpLists.jsm still does main thread I/O

Categories

(Firefox :: Shell Integration, defect, P3)

defect

Tracking

()

RESOLVED FIXED
122 Branch
Performance Impact medium
Tracking Status
firefox122 --- fixed

People

(Reporter: florian, Assigned: mconley)

References

(Blocks 3 open bugs)

Details

(Keywords: main-thread-io, perf, perf:responsiveness, Whiteboard: [bhr:JumpList])

Attachments

(6 files, 1 obsolete file)

See this profile captured on the 2018 ref hardware: https://perfht.ml/2DXOTBP

Many main thread I/O syscalls caused by our calls to CShellLink::SetPath, and some more caused by mozilla::widget::FaviconHelper::ObtainCachedIconFile calling nsLocalFile::Exists.

Keywords: main-thread-io, perf
Priority: -- → P3
Whiteboard: [fxperf][qf] → [fxperf][qf:p2:responsiveness]
Whiteboard: [fxperf][qf:p2:responsiveness] → [fxperf:p2][qf:p2:responsiveness]
Component: General → Shell Integration

I thought the JumpList slowness was only due to main thread I/O, but I captured yesterday a profile on average user hardware with an SSD where mozilla::widget::JumpListBuilder::AddListToBuild blocked for almost 3s: https://perfht.ml/2Lxx9SC

(In reply to Florian Quèze [:florian] from comment #1)

I thought the JumpList slowness was only due to main thread I/O, but I captured yesterday a profile on average user hardware with an SSD where mozilla::widget::JumpListBuilder::AddListToBuild blocked for almost 3s: https://perfht.ml/2Lxx9SC

Looks like Windows does some IO behind the scenes, as well as blocking on communicating with some other thread...

Depends on: 1667807

Sylvestre, I was talking to Jim Mathies who originally worked on a lot of this code. He suggested gcp might have an opinion about who could look at this, he suggested Chris Martin. It certainly seems like this is an OS integration issue so I'm referring this to you for now, so we can hopefully move this forward. This is frequently seen in BHR data and this type of main thread I/O is something we really would prefer not to see.

Flags: needinfo?(sledru)

How important it is ?
As it has been occurring for a long time, I am wondering what should be the timeline here.
thanks

Flags: needinfo?(sledru)
Whiteboard: [fxperf:p2][qf:p2:responsiveness] → [fxperf:p2][qf:p2:responsiveness][bhr:jumplist]
Depends on: 1503809
Whiteboard: [fxperf:p2][qf:p2:responsiveness][bhr:jumplist] → [fxperf:p2][qf:p2:responsiveness][bhr:JumpList]
Performance Impact: --- → P2
Whiteboard: [fxperf:p2][qf:p2:responsiveness][bhr:JumpList] → [fxperf:p2][bhr:JumpList]

He suggested gcp might have an opinion about who could look at this, he suggested Chris Martin. It certainly seems like this is an OS integration issue

FWIW this kind of shell integration stuff is a bit closer to the Desktop Integration team from Amir. So it's actually with the right triage owner in Bugzilla. Amir told me they can take it but they're not clear how to prioritize. Looking at bug 1667807, it looks like we considered getting that data but haven't yet.

Arguably bug 1667807 could be a good first/second bug.

(In reply to Gian-Carlo Pascutto [:gcp] from comment #5)

Amir told me they can take it but they're not clear how to prioritize.

From BHR data (covering only the Nightly user population) we can say that:

  • this is consistently among the 10 most frequent hangs (depending on the day it seems to be somewhere between #6 and #9).
  • this represents about 1% of the total hang time
  • the average hang time is between 500ms and 1s (but the average isn't really meaningful; the median would be more interesting but this data isn't easily accessible). We only collect hang reports when the main thread has been blocked for more than 128ms.
  • for about 30-40% of the hang reports, the user tried to interact with the browser while it was unresponsive (Ie. this is causing actual user pain.)
Severity: normal → S2
Whiteboard: [fxperf:p2][bhr:JumpList] → [bhr:JumpList]

Mike, you were poking at this recently - are you planning to fix this?

Flags: needinfo?(mconley)

Yes, I was wanting to look at this during my streams, but only if it's not in the critical path. If somebody else wants to look at this and they're on a tight timeline, they're welcome to take this.

Flags: needinfo?(mconley)

(In reply to Mike Conley (:mconley) (:⚙️) from comment #9)

Yes, I was wanting to look at this during my streams, but only if it's not in the critical path. If somebody else wants to look at this and they're on a tight timeline, they're welcome to take this.

Go right ahead :). I was just asking because this came up in triage.

Assignee: nobody → mconley

Hey mhowell! If you have a moment, I'm hoping for a quick consult on something I've been working on off the side of my desk (the patch for this bug).

What I have here is a WIP patch that adds a simpler interface for updating the Windows jump list off of the main thread. The old mechanism basically exposed the details of the win32 jump list API over XPCOM. My new mechanism abstracts over a lot of that complexity, and this makes it so that (in theory) it can run entirely off of the main thread.

So it needs a bit of a cleanup still, but one of the things I've been mulling over here is testing. I want to write a GTest for this stuff to make sure that when I call populateJumpList on nsIJumpListBuilder, the right things get called on the win32 API to actually write to the jump list.

Unfortunately (as you may know), it doesn't seem like there's a great way of mocking / stubbing out the jump list API, nor instrumenting it (at least not that I can tell).

What I'm doing instead is creating a thin wrapper around the API calls with a class called JumpListManagerBase, and then passing the arguments through to the underlying API outside of the test. During my GTest, what I'm hoping to do is to have an implementation that simply records what is being passed to those methods so that the GTest can then get a dump of those things to do assertions on.

So my questions are:

  1. Is there a better way of designing a test for this? Am I going about this all wrong?
  2. I still want to keep the old API around until I'm confident we can switch over. This means that there's mechanisms that might attempt to write to the jump list on the main thread, and ones that don't. I'm (re)using a lock for this, and trying to pass that lock down into my JumpListManagerBase as a way of reducing lock acquisition overhead. Is there a better way of doing this?
  3. Originally, I was constructing the "real" JumpListManagerBase implementation with the underlying IID_ICustomDestinationList thing in a constructor. The problem is that the IID_ICustomDestinationList thing is constructed off of the main thread, and so I don't think I can destroy it on the main thread (unless I transfer ownership of it to the main thread?)... so I have this SetNativeDestinationList thing now so that I can construct the JumpListManagerBase on the main thread, but then poke in the reference to the IID_ICustomDestinationList after the fact.

Anyhow, this all feels pretty brittle and scary, and I was wondering if you had any ideas about how I could make this whole thing better?

Flags: needinfo?(mhowell)
  1. Ideally, we would have a good way to mock the OS API itself at the OS level (that is, in the registry) and substitute our own implementations of those interfaces that just do logging. I don't think we're going to find an easy and non-extremely hacky way to do that though. So I think what you've got is probably a good move, to just use our own wrapper thing that we dependency inject an implementation of.

2 and 3. So, we should only need locking here to protect our own objects; I looked up the registrations for these MSCOM objects and they're registered with ThreadingModel=Apartment, which means the COM runtime is already synchronizing/queuing those calls for us. It should not matter whether we're making calls from the main thread or not (provided whatever thread we're on has had a good CoInitialize call done for it already). Consequently, I also don't think it matters which thread issues the final Release call, it should always get marshaled to the object's apartment thread.

Flags: needinfo?(mhowell)

We use a mock JumpListBackend to act as a stand-in for the win32 API
for populating the JumpList, and test that it gets called with the
expected arguments for each major function of JumpListBuilder.

Depends on D189406

Attachment #9317934 - Attachment is obsolete: true
Attachment #9340358 - Attachment description: Bug 1529276 - Rename the old Windows jump list mechanism to LegacyJumpListBuilder. r?mhowell → Bug 1529276 - Rename the old Windows jump list mechanism to LegacyJumpListBuilder. r?mhowell,bhearsum
Attachment #9340359 - Attachment description: Bug 1529276 - Create a new JumpListBuilder implementation that is simpler and doesn't block the main thread. r?mhowell → Bug 1529276 - Create a new JumpListBuilder implementation that is simpler and doesn't block the main thread. r?mhowell,bhearsum
Attachment #9355388 - Attachment description: Bug 1529276 - Update WindowsJumpList module to select the jump list backend via a pref. r?mhowell → Bug 1529276 - Update WindowsJumpList module to select the jump list backend via a pref. r?mhowell,bhearsum
Attachment #9355389 - Attachment description: Bug 1529276 - Write Gtests for the new JumpListBuilder. r?mhowell → Bug 1529276 - Write Gtests for the new JumpListBuilder. r?mhowell,bhearsum

According to the Microsoft documentation, this is the maximum length
of the string allowed for the description property of an IShellLinkW.
This also adds a test for long descriptions to ensure that they are
properly truncated.

This patch also:

  • Adjusts the tests a little bit to be less race-y, particularly with the
    SetAppID routine in the JumpListBuilder constructor.

  • Fixes the issue where the tests weren't properly copying in the intended
    strings for test titles and descriptions.

Depends on D189407

Pushed by mconley@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/666e52d79396
Rename the old Windows jump list mechanism to LegacyJumpListBuilder. r=mhowell,bhearsum,win-reviewers
https://hg.mozilla.org/integration/autoland/rev/e0776e6b4cff
Create a new JumpListBuilder implementation that is simpler and doesn't block the main thread. r=mhowell,win-reviewers
https://hg.mozilla.org/integration/autoland/rev/18392ebcd16d
Update WindowsJumpList module to select the jump list backend via a pref. r=mhowell,bhearsum
https://hg.mozilla.org/integration/autoland/rev/45bb7eff2174
Write Gtests for the new JumpListBuilder. r=mhowell,bhearsum,win-reviewers
https://hg.mozilla.org/integration/autoland/rev/cecadb59647e
Limit the IShellLinkW description to MAX_PATH. r=mhowell,win-reviewers
https://hg.mozilla.org/integration/autoland/rev/bbc273ed7b53
Add the ability to enable the new Windows Jump List mechanism via Experimenter. r=mhowell
Pushed by mconley@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/f3724116a44d
Rename the old Windows jump list mechanism to LegacyJumpListBuilder. r=mhowell,bhearsum,win-reviewers
https://hg.mozilla.org/integration/autoland/rev/2aebd14532e7
Create a new JumpListBuilder implementation that is simpler and doesn't block the main thread. r=mhowell,win-reviewers
https://hg.mozilla.org/integration/autoland/rev/76e6475c68ba
Update WindowsJumpList module to select the jump list backend via a pref. r=mhowell,bhearsum
https://hg.mozilla.org/integration/autoland/rev/4b9ca2a0747d
Write Gtests for the new JumpListBuilder. r=mhowell,bhearsum,win-reviewers
https://hg.mozilla.org/integration/autoland/rev/d1f5c71b03c2
Limit the IShellLinkW description to MAX_PATH. r=mhowell,win-reviewers
https://hg.mozilla.org/integration/autoland/rev/ce28bf4ceb89
Add the ability to enable the new Windows Jump List mechanism via Experimenter. r=mhowell
Flags: needinfo?(mconley)
Flags: needinfo?(mconley)
Pushed by mconley@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/129269188fef
Rename the old Windows jump list mechanism to LegacyJumpListBuilder. r=mhowell,bhearsum,win-reviewers
https://hg.mozilla.org/integration/autoland/rev/d9a6f2ae662c
Create a new JumpListBuilder implementation that is simpler and doesn't block the main thread. r=mhowell,win-reviewers
https://hg.mozilla.org/integration/autoland/rev/aca991ef2a24
Update WindowsJumpList module to select the jump list backend via a pref. r=mhowell,bhearsum
https://hg.mozilla.org/integration/autoland/rev/03e7a619a2e5
Write Gtests for the new JumpListBuilder. r=mhowell,bhearsum,win-reviewers
https://hg.mozilla.org/integration/autoland/rev/0ed4c3d6d3e5
Limit the IShellLinkW description to MAX_PATH. r=mhowell,win-reviewers
https://hg.mozilla.org/integration/autoland/rev/fb474adbcaab
Add the ability to enable the new Windows Jump List mechanism via Experimenter. r=mhowell
Regressions: 1866403
Blocks: 1503809, 1667807
No longer depends on: 1503809, 1667807
Regressions: 1866601
Blocks: 1866881
Blocks: 1867340
See Also: → 1867344
Blocks: 1880082
You need to log in before you can comment on or make changes to this bug.