Closed
Bug 1361495
Opened 4 years ago
Closed 4 years ago
Initialize winsock on a background thread really early on during startup
Categories
(Core :: Networking, enhancement)
Core
Networking
Tracking
()
RESOLVED
FIXED
mozilla55
| Tracking | Status | |
|---|---|---|
| firefox55 | --- | fixed |
People
(Reporter: ehsan.akhgari, Assigned: mcmanus)
References
(Blocks 1 open bug)
Details
Attachments
(1 file)
The profile in bug 1361087 (https://perfht.ml/2p1uyD1) shows a really interesting problem. We call gethostname() on the main thread. That goes to call WSALookupServiceBeginW() from WinSock which happens to be the first winsock function in this profile which starts loading and initializing winsock DLLs, which takes 252ms on the main thread during startup, which is very bad. It seems that we could do a lot better here. Can we move to a model where we spin up a networking thread (for example the STS thread if that happens to be spawn early enough during startup) which calls a winsock API with the purpose of initializing the winsock libraries off the main thread as soon as possible? Otherwise we're playing russian roulette of sorts with a loaded gun holding a 250ms jank bullet every time some code calls a winsock API on the main thread during startup. The first one blows out their brains on the table and the rest of the consumers will be fast. Bug 1361087's patch is merely just changing the rules of the roulette game, if it's not this consumer paying the 250ms cost on the main thread it'll be the next one before we fix this underlying issue.
| Assignee | ||
Comment 1•4 years ago
|
||
oy. thanks. I think the suggestion in comment 0 is good. we can do that here. Is this something that is true in the children or just the parent? It doesn't matter right now, STS is in both - but it does very few things in the children and we'd like to eliminate it sooner or later. further, its appalling that nspr is calling gethostname().. specifically it seems to be doing it for PR_GetSystemInfo(PR_SI_HOSTNAME).. its unclear what we would ever do with that information and dxr can't show me a user of it (but because it is a property bag impl and the key is 'host' I'm really not sure the search is correct). but I think we ought to try removing that from nsSystemInfo::Init() too...
| Comment hidden (mozreview-request) |
| Assignee | ||
Comment 3•4 years ago
|
||
This is a try run to see what life without nsSystemInfo.getProperty("host") would look like| Assignee | ||
Comment 4•4 years ago
|
||
(In reply to Patrick McManus [:mcmanus] from comment #3) > This is a try run to see what life without nsSystemInfo.getProperty("host") > would look like https://treeherder.mozilla.org/#/jobs?repo=try&revision=6d476ef70a76363a8cad849f341a5a7150d583ec
| Comment hidden (mozreview-request) |
| Assignee | ||
Comment 6•4 years ago
|
||
this is another run with getProperty("host") removed - a test needed updating from last run
https://treeherder.mozilla.org/#/jobs?repo=try&revision=394e8b31c3490fd118dd8f3ef12033975c0809a4| Reporter | ||
Comment 7•4 years ago
|
||
(In reply to Patrick McManus [:mcmanus] from comment #1) > oy. thanks. Thank *you* for the quick patch! > I think the suggestion in comment 0 is good. we can do that here. Is this > something that is true in the children or just the parent? It doesn't matter > right now, STS is in both - but it does very few things in the children and > we'd like to eliminate it sooner or later. We don't do networking in the child, right? I mean, if someone goes ahead and calls gethostname() in a content process then surely this will be a problem there as well. We do call WSAStartup() in the child as well, so as far as I can tell, we don't really have anything protecting us against this in the child, do we? > further, its appalling that nspr is calling gethostname().. specifically it > seems to be doing it for PR_GetSystemInfo(PR_SI_HOSTNAME).. its unclear what > we would ever do with that information and dxr can't show me a user of it > (but because it is a property bag impl and the key is 'host' I'm really not > sure the search is correct). but I think we ought to try removing that from > nsSystemInfo::Init() too... Thanks for trying to get rid of it, that is a noble effort! AFAICT the other usages of PR_SI_HOSTNAME are in the DNS service, <https://searchfox.org/mozilla-central/rev/ae8c2e2354db652950fe0ec16983360c21857f2a/toolkit/profile/nsProfileLock.cpp#345> and <https://searchfox.org/mozilla-central/rev/abe68d5dad139e376d5521ca1d4b7892e1e7f1ba/widget/xremoteclient/XRemoteClient.cpp#298>, the latter two being non-Windows code.
Comment on attachment 8863905 [details] Bug 1361495 - trigger winsock init early on STS to avoid main thread jank https://reviewboard.mozilla.org/r/135622/#review138672 ::: netwerk/base/nsSocketTransportService2.cpp:845 (Diff revision 2) > nsSocketTransportService::Run() > { > SOCKET_LOG(("STS thread init %d sockets\n", gMaxCount)); > > + if (!IsNeckoChild()) { > + // see bug 1361495, gethostname() inside nspr can trigger Reading the bug makes it clear, but this comment doesn't make it immediately obvious that PR_SI_HOSTNAME results in calling gethostname(), perhaps call that out?
Attachment #8863905 -
Flags: review?(hurley) → review+
| Reporter | ||
Comment 9•4 years ago
|
||
Why not call gethostname() directly? Also, I think we should get a profile with this patch applied to confirm that the jank from bug 1361087 goes away without the patch attached there, since this is timing sensitive... It would be nice if there was a good way to write an automated test for this. :-/
| Assignee | ||
Comment 10•4 years ago
|
||
if you read the nspr code you'll see it doesn't just call gethostname() - it has a bunch of platform specific nspr'y things in between. Seemed best to leave them in place to reduce risk to other platforms. I guess it could be #ifdef WIN32 gethostname(); #endif let me know if you think that's better.. they seem about the same to me.
| Assignee | ||
Comment 11•4 years ago
|
||
(In reply to :Ehsan Akhgari (super long backlog, slow to respond) from comment #7) > (In reply to Patrick McManus [:mcmanus] from comment #1) > > oy. thanks. > > Thank *you* for the quick patch! > > > I think the suggestion in comment 0 is good. we can do that here. Is this > > something that is true in the children or just the parent? It doesn't matter > > right now, STS is in both - but it does very few things in the children and > > we'd like to eliminate it sooner or later. > > We don't do networking in the child, right? I mean, if someone goes ahead > and calls gethostname() in a content process then surely this will be a > problem there as well. We do call WSAStartup() in the child as well, so as > far as I can tell, we don't really have anything protecting us against this > in the child, do we? well, that's why I ask. Technically right now we can still do networking in the child because some piece of RTC does that. (STUN iirc). Otherwise, we don't. But the STS does exist (though the http machinery does not). Eventually RTC will get fixed (and we can sandbox networking out of the child) and I'd been figuring to get rid of the STS when that happened. but if this needs to hang around on the child we just have it do the work and then have STS exit. We'll cross that when we get there, meanwhile, I'll make this code run on the child too. > > > further, its appalling that nspr is calling gethostname().. specifically it > > seems to be doing it for PR_GetSystemInfo(PR_SI_HOSTNAME).. its unclear what > > we would ever do with that information and dxr can't show me a user of it > > (but because it is a property bag impl and the key is 'host' I'm really not > > sure the search is correct). but I think we ought to try removing that from > > nsSystemInfo::Init() too... > > Thanks for trying to get rid of it, that is a noble effort! > > AFAICT the other usages of PR_SI_HOSTNAME are in the DNS service, > <https://searchfox.org/mozilla-central/rev/ > ae8c2e2354db652950fe0ec16983360c21857f2a/toolkit/profile/nsProfileLock. > cpp#345> and > <https://searchfox.org/mozilla-central/rev/ > abe68d5dad139e376d5521ca1d4b7892e1e7f1ba/widget/xremoteclient/XRemoteClient. > cpp#298>, the latter two being non-Windows code. so that use in DNS is only used during PAC evaluation.. that's done in its own dedicated thread pool off the main thread. and the try run in comment 6 looks good - so it looks like we can try and nix getProperty("host") - which will go a long way towards making this not as timing sensitive.
Updated•4 years ago
|
Assignee: nobody → mcmanus
| Comment hidden (mozreview-request) |
| Assignee | ||
Comment 13•4 years ago
|
||
Comment on attachment 8863905 [details] Bug 1361495 - trigger winsock init early on STS to avoid main thread jank this is the gethostname() version.. req re-r?
Attachment #8863905 -
Flags: review+ → review?(hurley)
| Assignee | ||
Comment 14•4 years ago
|
||
(In reply to Patrick McManus [:mcmanus] from comment #11) > (In reply to :Ehsan Akhgari (super long backlog, slow to respond) from > comment #7) > and the try run in comment 6 looks good - so it looks like we can try and > nix getProperty("host") - which will go a long way towards making this not > as timing sensitive. sort that part out in bug 1361601
| Reporter | ||
Comment 15•4 years ago
|
||
(In reply to Patrick McManus [:mcmanus] from comment #10) > if you read the nspr code you'll see it doesn't just call gethostname() - it > has a bunch of platform specific nspr'y things in between. Seemed best to > leave them in place to reduce risk to other platforms. I guess it could be > #ifdef WIN32 > gethostname(); > #endif > > let me know if you think that's better.. they seem about the same to me. They seem about the same to me as well, I just prefer to not add magic I don't understand where it's not needed, so I'd prefer to keep this as a Windows only hack. :-) I'm more afraid of it backfiring on other platforms than hopeful of it helping. :-) (In reply to Patrick McManus [:mcmanus] from comment #11) > > We don't do networking in the child, right? I mean, if someone goes ahead > > and calls gethostname() in a content process then surely this will be a > > problem there as well. We do call WSAStartup() in the child as well, so as > > far as I can tell, we don't really have anything protecting us against this > > in the child, do we? > > well, that's why I ask. Technically right now we can still do networking in > the child because some piece of RTC does that. (STUN iirc). Otherwise, we > don't. But the STS does exist (though the http machinery does not). > Eventually RTC will get fixed (and we can sandbox networking out of the > child) and I'd been figuring to get rid of the STS when that happened. but > if this needs to hang around on the child we just have it do the work and > then have STS exit. We'll cross that when we get there, meanwhile, I'll make > this code run on the child too. OK, yeah I'm glad you brought it up. We should do it in both processes I think. > > > further, its appalling that nspr is calling gethostname().. specifically it > > > seems to be doing it for PR_GetSystemInfo(PR_SI_HOSTNAME).. its unclear what > > > we would ever do with that information and dxr can't show me a user of it > > > (but because it is a property bag impl and the key is 'host' I'm really not > > > sure the search is correct). but I think we ought to try removing that from > > > nsSystemInfo::Init() too... > > > > Thanks for trying to get rid of it, that is a noble effort! > > > > AFAICT the other usages of PR_SI_HOSTNAME are in the DNS service, > > <https://searchfox.org/mozilla-central/rev/ > > ae8c2e2354db652950fe0ec16983360c21857f2a/toolkit/profile/nsProfileLock. > > cpp#345> and > > <https://searchfox.org/mozilla-central/rev/ > > abe68d5dad139e376d5521ca1d4b7892e1e7f1ba/widget/xremoteclient/XRemoteClient. > > cpp#298>, the latter two being non-Windows code. > > so that use in DNS is only used during PAC evaluation.. that's done in its > own dedicated thread pool off the main thread. Oh great! > and the try run in comment 6 looks good - so it looks like we can try and > nix getProperty("host") - which will go a long way towards making this not > as timing sensitive. Fantastic!
Comment 16•4 years ago
|
||
(In reply to :Ehsan Akhgari (super long backlog, slow to respond) from comment #0) > The profile in bug 1361087 (https://perfht.ml/2p1uyD1) shows a really > interesting problem. We call gethostname() on the main thread. Doesn't the call itself (even when WSA already initialized) cause main thread blocking?
Comment 17•4 years ago
|
||
| mozreview-review | ||
Comment on attachment 8863905 [details] Bug 1361495 - trigger winsock init early on STS to avoid main thread jank https://reviewboard.mozilla.org/r/135622/#review138842 ::: netwerk/base/nsSocketTransportService2.cpp:849 (Diff revision 3) > + // so do it here (on parent and child) to protect against it being done first > + // accidentally on the main thread.. especially via PR_GetSystemInfo(). This > + // will also improve latency of first real winsock operation > + // .. > + // If STS-thread is no longer needed this should still be run before exiting > + Unused << gethostname(ignoredStackBuffer, 255); Isn't calling WSAStartup better thing to do? I'm kinda surprised we do gethostname sooner than that...
| Comment hidden (mozreview-request) |
| Assignee | ||
Comment 19•4 years ago
|
||
(In reply to Honza Bambas (:mayhemer) from comment #16) > (In reply to :Ehsan Akhgari (super long backlog, slow to respond) from > comment #0) > > The profile in bug 1361087 (https://perfht.ml/2p1uyD1) shows a really > > interesting problem. We call gethostname() on the main thread. > > Doesn't the call itself (even when WSA already initialized) cause main > thread blocking? probably not - its getting the local hostname. But see bug 1361601 to totally remove that instance anyhow.. this bug is really just about getting winsock initiated earlier for 2 reasons 1] prevent a recurrence of this through some other unidentified path 2] improve responsiveness of first real winsock operation instead of doing lazy init implicitly in nspr(In reply to Honza Bambas (:mayhemer) from comment #17) > > Isn't calling WSAStartup better thing to do? that little bit of ugliness is buried inside nspr and I prefer to leave it there :)
| Assignee | ||
Updated•4 years ago
|
Attachment #8863905 -
Flags: review?(hurley)
Comment 20•4 years ago
|
||
(In reply to Patrick McManus [:mcmanus] from comment #19) > that little bit of ugliness is buried inside nspr and I prefer to leave it > there :) :) Understand, just FYI: https://dxr.mozilla.org/mozilla-central/search?q=wsastartup&redirect=false
Comment 21•4 years ago
|
||
(In reply to Honza Bambas (:mayhemer) from comment #20) > :) Understand, just FYI: > https://dxr.mozilla.org/mozilla-central/search?q=wsastartup&redirect=false OTOH, most of them are tests under chromium. Only serious users are webrtc and sctp. Still, I'm a bit curious why calling gethostname should be better than calling WSAStartup or some NSPR IO func to run it. I'll check where from we call WSAStartup the first time now.
| Assignee | ||
Comment 22•4 years ago
|
||
(In reply to Honza Bambas (:mayhemer) from comment #21) > (In reply to Honza Bambas (:mayhemer) from comment #20) > > :) Understand, just FYI: > > https://dxr.mozilla.org/mozilla-central/search?q=wsastartup&redirect=false > > OTOH, most of them are tests under chromium. Only serious users are webrtc > and sctp. > and I believe both those latter ones are actually imported code..
Comment 23•4 years ago
|
||
Just for curiosity (win10, m-c@82c2d17e74ef debug build, MOZ_LOG* unset):
nsSystemInfo::GetProperty already removed in this cset.
> nss3.dll!_PR_MD_INIT_IO(...) Line 48 C (w95io.c)
nss3.dll!_PR_InitIO() Line 63 C
nss3.dll!_PR_InitStuff() Line 203 C
nss3.dll!_PR_ImplicitInitialization() Line 226 C
nss3.dll!PR_NewThreadPrivateIndex(0x1acabcf8, 0x00000000) Line 105 C
xul.dll!nsTraceRefcnt::SetActivityIsLegal(true) Line 1315 C++
xul.dll!NS_LogInit() Line 929 C++
xul.dll!mozilla::BootstrapImpl::NS_LogInit() Line 30 C++
firefox.exe!InitXPCOMGlue(0x01a030c0) Line 257 C++
firefox.exe!NS_internal_main(4, 0x01a02040, 0x013de878) Line 296 C++
firefox.exe!wmain(4, 0x013da930) Line 115 C++
firefox.exe!__scrt_common_main_seh() Line 253 C++
kernel32.dll!@BaseThreadInitThunk@12() Unknown
ntdll.dll!__RtlUserThreadStart() Unknown
ntdll.dll!__RtlUserThreadStart@8() Unknown
With
ws2_32.dll C:\Windows\SysWOW64\ws2_32.dll N/A N/A Symbols loaded. C:\Symbols\ws2_32.pdb\ec49255a7c1745e580816aff78ea72381\ws2_32.pdb 22 10.0.14393.206 (rs1_release.160915-0644) 2016-09-15 18:55 76D10000-76D73000 [17288] firefox.exe
wsock32.dll C:\Windows\SysWOW64\wsock32.dll N/A N/A Symbols loaded. C:\Symbols\wsock32.pdb\88f5486102b2457c8410dc0b14d3d2fa1\wsock32.pdb 21 10.0.14393.0 (rs1_release.160715-1616) 2016-07-16 03:43 73AA0000-73AA8000 [17288] firefox.exe
already loaded. Hence, doing gethostname probably triggers some more initiation code. Hence, probably a good thing to do :)
Comment 24•4 years ago
|
||
| mozreview-review | ||
Comment on attachment 8863905 [details] Bug 1361495 - trigger winsock init early on STS to avoid main thread jank https://reviewboard.mozilla.org/r/135622/#review138928
Attachment #8863905 -
Flags: review?(hurley) → review+
Comment 25•4 years ago
|
||
Pushed by mcmanus@ducksong.com: https://hg.mozilla.org/integration/autoland/rev/328fbb1a7bb0 trigger winsock init early on STS to avoid main thread jank r=nwgh
Comment 26•4 years ago
|
||
| bugherder | ||
https://hg.mozilla.org/mozilla-central/rev/328fbb1a7bb0
Status: NEW → RESOLVED
Closed: 4 years ago
status-firefox55:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
You need to log in
before you can comment on or make changes to this bug.
Description
•