Closed Bug 1360164 Opened 7 years ago Closed 7 years ago

The first async XHR done during startup blocks the main thread while initializing nsProtocolProxyService

Categories

(Core :: Networking, defect)

x86_64
Windows 10
defect
Not set
normal

Tracking

()

RESOLVED DUPLICATE of bug 1366133
Performance Impact high

People

(Reporter: florian, Assigned: bagder)

References

(Blocks 1 open bug)

Details

(Whiteboard: [necko-active][necko-quantum])

Attachments

(1 file)

Here is a startup profile on Windows 10 on a Quantum reference device with a mechanical hard drive: http://perfht.ml/2poNwq7

The first async XHR (which happened to be triggered by captive portal initialization, but could be triggered by something else) blocks the main thread for a while (114ms) in mozilla::net::nsProtocolProxyService::Init().

It's really ReadInternetOption (http://searchfox.org/mozilla-central/rev/ce5ccb6a8ca803271c946ccb8b43b7e7d8b64e7a/toolkit/system/windowsproxy/nsWindowsSystemProxySettings.cpp#69) that's expensive, as it calls the InternetGetConnectedStateExW and InternetQueryOptionW windows API that seem to do sync I/O.
This is a good find - thanks! Unfortunately I don't think its going to be a big win

* This boils down to pps::ReloadPac() doing disk i/o under our default windows config via some synchronous windows system libs

* Init is one place this is done, but its also done periodically when you go on/offline wakeup from sleep, etc

* That info can't be cached - those are points where we really do want to know the current OS value. (indeed the info is already cached - these are the points where we are refreshing it.)

* the jank can be alleviated with a helper thread but..

* networking would probably have to be blocked (in a non janky way) while we wait for the reply. So that 114ms delay would still happen to the xhr.. but I guess some of the time that's ok afterall your example is a total background item anyhow.

Daniel, as AsyncResolve() is already async this could probably just be done with state flag that says the helper thread is busy doing its thing... and ReloadPac() could just set that state, dispatch the work, and return quickly. wdyt?
Flags: needinfo?(daniel)
(In reply to Patrick McManus [:mcmanus] from comment #1)

> * networking would probably have to be blocked (in a non janky way) while we
> wait for the reply. So that 114ms delay would still happen to the xhr.. but
> I guess some of the time that's ok afterall your example is a total
> background item anyhow.

Yes, that would be ok. I'm trying to reduce the time it takes to display the first window during startup. If this check can be done on a network thread, startup can continue.
Whiteboard: [qf] → [qf:p1]
Bug 1361495 shows another case where we need to do stuff early on for necko on a background thread.  If we decide to do that, then this may be a good candidate to piggy back on top of the work done in that bug.
Assignee: nobody → daniel
Whiteboard: [qf:p1] → [qf:p1][necko-active]
What makes things a little tricky is this method from the android implementation of the proxy settings code:

nsAndroidSystemProxySettings::GetMainThreadOnly(bool *aMainThreadOnly)
{
  *aMainThreadOnly = true;
  return NS_OK;
}

... implying we need to keep the GetPACURI() call on the main thread conditionally if the backend needs it.
Flags: needinfo?(daniel)
(In reply to Daniel Stenberg [:bagder] from comment #4)
> What makes things a little tricky is this method from the android
> implementation of the proxy settings code:
> 
> nsAndroidSystemProxySettings::GetMainThreadOnly(bool *aMainThreadOnly)
> {
>   *aMainThreadOnly = true;
>   return NS_OK;
> }
> 
> ... implying we need to keep the GetPACURI() call on the main thread
> conditionally if the backend needs it.

I think that's ok for at least the first step.
Here's my initial take on fixing this issue. I've run into some stupid build problems on my Windows box that I work on, so I haven't yet been able to personally verify the functionality so I'm not marking this ready for feedback or review yet but I figured I could at least show where I'm at.
Whiteboard: [qf:p1][necko-active] → [qf:p1][necko-active][necko-quantum]
See Also: → 1366133
As bug 1366133 landed, it worths to profile again to see if anything more we need to do for this bug.
Hi Shian-Yow, could you find someone to profile this again and decide whether we want to fix this?
Flags: needinfo?(swu)
Florian, could you help to profile this again?  We think this bug could be fixed after bug 1366133 landed.  If you are busy, we can have Gary (who fixed bug 1366133) to profile it, and he may need to understand the STR in comment 0.  Thanks!
Flags: needinfo?(swu) → needinfo?(florian)
I can't see this anymore in cold startup profiles on the current (June 20) nightly on the quantum reference hardware.
Flags: needinfo?(florian)
(In reply to Florian Quèze [:florian] [:flo] from comment #10)
> I can't see this anymore in cold startup profiles on the current (June 20)
> nightly on the quantum reference hardware.

Daniel, do you think we can safely close this issue as a dup?
Flags: needinfo?(daniel)
I feel confident that this issue was fixed with your work on that bug yes, and I view Florian's comment #10 as a verification of that. As such, let's close this and celebrate! =)

Thanks, both of you!
Status: NEW → RESOLVED
Closed: 7 years ago
Flags: needinfo?(daniel)
Resolution: --- → DUPLICATE
Performance Impact: --- → P1
Whiteboard: [qf:p1][necko-active][necko-quantum] → [necko-active][necko-quantum]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: