Closed Bug 1366133 Opened 7 years ago Closed 7 years ago

Calling InternetGetConnectedStateExW() in ReadInternetOption() may cause hangs

Categories

(Core :: Networking, enhancement)

enhancement
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla55
Performance Impact high
Tracking Status
firefox55 --- fixed

People

(Reporter: ehsan.akhgari, Assigned: xeonchen)

References

Details

(Whiteboard: [necko-active][bhr])

Attachments

(2 files)

In the BHR hang data I see hangs like this:

RtlpWaitOnAddress (in wntdll.pdb)<br>
RtlpWaitOnCriticalSection (in wntdll.pdb)<br>
RtlpEnterCriticalSectionContended (in wntdll.pdb)<br>
RtlEnterCriticalSection (in wntdll.pdb)<br>
IsDialUpConnection(CWxString *) (in wininet.pdb)<br>
InternetGetConnectedStateExW (in wininet.pdb)<br>
ReadInternetOption (in xul.pdb)<br>
nsWindowsSystemProxySettings::GetPACURI(nsACString &amp;) (in xul.pdb)<br>
mozilla::net::nsProtocolProxyService::ReloadPAC() (in xul.pdb)<br>
mozilla::net::nsProtocolProxyService::ReloadNetworkPAC() (in xul.pdb)<br>
mozilla::net::nsProtocolProxyService::Observe(nsISupports *,char const *,char16_t const *) (in xul.pdb)<br>
nsObserverList::NotifyObservers(nsISupports *,char const *,char16_t const *) (in xul.pdb)<br>
nsObserverService::NotifyObservers(nsISupports *,char const *,char16_t const *) (in xul.pdb)<br>
nsNotifyAddrListener::ChangeEvent::Run() (in xul.pdb)<br>
nsThread::ProcessNextEvent(bool,bool *) (in xul.pdb)<br>
NS_ProcessNextEvent(nsIThread *,bool) (in xul.pdb)<br>
mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate *) (in xul.pdb)<br>
MessageLoop::RunHandler() (in xul.pdb)<br>
MessageLoop::Run() (in xul.pdb)<br>
nsBaseAppShell::Run() (in xul.pdb)<br>
nsAppShell::Run() (in xul.pdb)<br>
nsAppStartup::Run() (in xul.pdb)<br>
XREMain::XRE_mainRun() (in xul.pdb)<br>

As far as I can tell, we only call InternetGetConnectedStateExW() in <https://searchfox.org/mozilla-central/rev/24c443a440104dabd9447608bd41b8766e8fc2f5/toolkit/system/windowsproxy/nsWindowsSystemProxySettings.cpp#75> so that we fill in the pszConnection member for the third argument to InternetQueryOptionW(), however per MSDN <https://msdn.microsoft.com/en-us/library/windows/desktop/aa385101(v=vs.85).aspx> that is an out argument...

Masatoshi, do you know if we actually need to initialize this field to the name of the modem connection?  I don't have a modem connection so I wouldn't be able to test what would happen if I changed this code myself...
Flags: needinfo?(VYV03354)
(In reply to :Ehsan Akhgari (super long backlog, slow to respond) from comment #0)
> As far as I can tell, we only call InternetGetConnectedStateExW() in
> <https://searchfox.org/mozilla-central/rev/
> 24c443a440104dabd9447608bd41b8766e8fc2f5/toolkit/system/windowsproxy/
> nsWindowsSystemProxySettings.cpp#75> so that we fill in the pszConnection
> member for the third argument to InternetQueryOptionW(), however per MSDN
> <https://msdn.microsoft.com/en-us/library/windows/desktop/aa385101(v=vs.85).
> aspx> that is an out argument...

Actually this is an in-out parameter. MSDN is inaccurate as always.
https://msdn.microsoft.com/en-us/library/windows/desktop/aa385146(v=vs.85).aspx
> pszConnection
>     Pointer to a string that contains the name of the RAS connection or NULL, which indicates the default or LAN connection, to set or *query* options on.
(emphasis mine)

> Masatoshi, do you know if we actually need to initialize this field to the
> name of the modem connection?

INTERNET_CONNECTION_MODEM will correspond to [Internet Options] > [Connections] > [Dial-up and Virtual Private Network settings] (so it requires a connection name to identify which connection setting we are operating on) and INTERNET_CONNECTION_LAN will correspond to [Local Area Network (LAN) settings].

> I don't have a modem connection so I wouldn't
> be able to test what would happen if I changed this code myself...

The "modem" doesn't have to be a physical modem. It might be a PPPoE connection, it might be a VPN connection. 

As an aside, we can remove the INTERNET_PER_CONN_FLAGS fallback because we no longer supports Windows Vista or earlier.
Flags: needinfo?(VYV03354)
Whiteboard: [qf][bhr] → [necko-backlog][qf][bhr]
I asked Honza about this bug and his response was:
> mayhemer: probably just make nsProtocolProxyService::Reload(Network)PAC happen on a background (best separate - new) thread?
> mayhemer: this is proxy related, maybe Gary should look at it

Gary, what do you think?
Flags: needinfo?(xeonchen)
(In reply to Valentin Gosu [:valentin] from comment #2)
> I asked Honza about this bug and his response was:
> > mayhemer: probably just make nsProtocolProxyService::Reload(Network)PAC happen on a background (best separate - new) thread?
> > mayhemer: this is proxy related, maybe Gary should look at it
> 
> Gary, what do you think?

I agree, I'll take this bug.

Two parts to do:

- Remove the INTERNET_PER_CONN_FLAGS fallback
- Move nsProtocolProxyService::Reload(Network)PAC to another thread
Assignee: nobody → xeonchen
Flags: needinfo?(xeonchen)
Sounds great, thanks Masatoshi and Gary!
Whiteboard: [necko-backlog][qf][bhr] → [necko-backlog][qf:p1][bhr]
Comment on attachment 8873794 [details]
Bug 1366133 - Part 1: make nsISystemProxySettings::GetPACURI happen in another thread;

After investigation, I found that |InternetGetConnectedStateExW()| is only called by two functions: |nsWindowsSystemProxySettings::GetPACURI| and |nsWindowsSystemProxySettings::GetProxyForURI|, and the latter is never called on main thread of Windows platform, therefore, only 2 occurrences of |GetPACURI| must be handled. I move the API calls and following steps into newly created function, |AsyncConfigureFromPAC|.
Comment on attachment 8873793 [details]
Bug 1366133 - Part 0: remove support for Windows Vista and earlier versions;

https://reviewboard.mozilla.org/r/145210/#review149590

::: toolkit/system/windowsproxy/nsWindowsSystemProxySettings.cpp:97
(Diff revision 2)
>      unsigned long size = sizeof(INTERNET_PER_CONN_OPTION_LISTW);
>      if (!InternetQueryOptionW(nullptr, INTERNET_OPTION_PER_CONNECTION_OPTION,
>                                &list, &size)) {
> -        if (GetLastError() != ERROR_INVALID_PARAMETER) {
> -            return NS_ERROR_FAILURE;
> +        return NS_ERROR_FAILURE;
> -        }
> +    }

Can you clarify exactly how this removes support for Vista and earlier (maybe in the commit message)? Are those versions more likely tyo return ERROR_INVALID_PARAMETER and require that second code block?
Attachment #8873793 - Flags: review?(daniel) → review+
Comment on attachment 8873794 [details]
Bug 1366133 - Part 1: make nsISystemProxySettings::GetPACURI happen in another thread;

https://reviewboard.mozilla.org/r/145212/#review149594
Attachment #8873794 - Flags: review?(daniel) → review+
Just a heads-up: this bug is very similar to bug 1360164 so with a little care/luck I would say that we might be able to fix both at once.
Whiteboard: [necko-backlog][qf:p1][bhr] → [necko-active][qf:p1][bhr]
See Also: → 1360164
Pushed by gachen@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/40c08de101f1
Part 0: remove support for Windows Vista and earlier versions; r=bagder
https://hg.mozilla.org/integration/autoland/rev/9a76c93fb1b5
Part 1: make nsISystemProxySettings::GetPACURI happen in another thread; r=bagder
https://hg.mozilla.org/mozilla-central/rev/40c08de101f1
https://hg.mozilla.org/mozilla-central/rev/9a76c93fb1b5
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
Depends on: 1371550
IIUC we're still going to delay getting system proxy info until the 1st necko channel needs it?  Would be it better to initialize the proxyProtocolSvc early in startup, so the 1st XHR, etc doesn't wind up waiting for init to finish?
Flags: needinfo?(daniel)
Yes, I agree.

Even though this landed fix will already make it faster since there were other main-thread operations waiting to get done, so now they'll get done faster. This async but still init-on-first-use still risks delaying the first network request with 100ms.

That said, the first network request done is often also doing DNS and many other slow requests so those 100ms might not be very notable in a lot of cases.

I think we should file a new bug for this.
Flags: needinfo?(daniel)
Filed follow up bug 1379872 per comment 26.
FWIW, I just got an assertion crash in ~AsyncGetPACURIRequest () on line 369:

    367: ~AsyncGetPACURIRequest()
    368: {
    369:     MOZ_ASSERT(NS_IsMainThread() == mIsMainThreadOnly);
    370:     NS_ReleaseOnMainThreadSystemGroup(
    371:       "AsyncGetPACURIRequest::mServiceHolder", mServiceHolder.forget());
    372: }

on resume from hibernation on FF 56.0.1 x64 under Win7 SP1 x64. VS indicates that the code is running in the main thread in the chrome process. I have saved a dump about which I will answer questions (but not upload).

A comment on line 318 led me here:

    318: // Bug 1366133: make GetPACURI off-main-thread since it may hang on Windows platform
(In reply to q1 from comment #28)
> FWIW, I just got an assertion crash in ~AsyncGetPACURIRequest () on line 369:
> 
>     367: ~AsyncGetPACURIRequest()
>     368: {
>     369:     MOZ_ASSERT(NS_IsMainThread() == mIsMainThreadOnly);
>     370:     NS_ReleaseOnMainThreadSystemGroup(
>     371:       "AsyncGetPACURIRequest::mServiceHolder",
> mServiceHolder.forget());
>     372: }
> 
> on resume from hibernation on FF 56.0.1 x64 under Win7 SP1 x64. VS indicates
> that the code is running in the main thread in the chrome process. I have
> saved a dump about which I will answer questions (but not upload).
> 
> A comment on line 318 led me here:
> 
>     318: // Bug 1366133: make GetPACURI off-main-thread since it may hang on
> Windows platform

thank you for reporting this, the issue has been fixed in bug 1405496.
See Also: → 1616082
Performance Impact: --- → P1
Whiteboard: [necko-active][qf:p1][bhr] → [necko-active][bhr]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: