Closed
Bug 1470907
Opened 6 years ago
Closed 6 years ago
the number of dns resolver threads is out of control
Categories
(Core :: Networking: DNS, defect, P2)
Core
Networking: DNS
Tracking
()
VERIFIED
FIXED
mozilla63
Tracking | Status | |
---|---|---|
firefox-esr52 | --- | unaffected |
firefox-esr60 | --- | unaffected |
firefox61 | --- | unaffected |
firefox62 | + | verified |
firefox63 | + | verified |
People
(Reporter: froydnj, Assigned: valentin)
References
Details
(Keywords: regression, Whiteboard: [MemShrink][necko-triaged])
Attachments
(1 file)
59 bytes,
text/x-review-board-request
|
froydnj
:
review+
lizzard
:
approval-mozilla-beta+
|
Details |
If I understand: https://searchfox.org/mozilla-central/source/netwerk/dns/nsHostResolver.h#45-50 correctly, nsHostResolver should spin up, at most, 8 resolver threads at one time. Looking at htop on my currently running Nightly, I have 37 DNS Res~ver (sic) threads running in the parent process. It is not super-clear to me how this situation comes about. nsHostResolver::ConditionallyCreateThread is, AFAICT, only called under nsHostResolver::mLock, so there shouldn't be weird issues racing on nsHostResolver::mThreadCount.
Reporter | ||
Comment 1•6 years ago
|
||
Oh, perhaps the problem is that GetHostToLookup returns false in *two* situations: 1) We're shutting down; 2) We timed out in some way. And ThreadFunc interprets both situations as "welp, time to bail". And if we timeout on *one* lookup, the thread that timed out just sits there forevermore, because nothing joins the individual threads that we spin up.
Reporter | ||
Comment 2•6 years ago
|
||
[Tracking Requested - why for this release]: leaking threads is bad. Not sure how many releases this affects, or whether it is a side-effect of bug 1426019. Valentin, can you see if asynchronously shutting down the thread after ThreadFunc has completed (the shutdown will have to be done on the main thread, IIUC) fixes this issue?
Comment hidden (mozreview-request) |
Assignee | ||
Comment 4•6 years ago
|
||
I tried reproducing it for about 15 minutes... no luck. Couldn't get it to have more than 8 DNS threads, but you're right, not shutting them down would probably leave them hanging there. I pushed a patch to try with your suggested change. Can you use the build to see if it keeps happening? Maybe I don't have a good enough test case :)
Flags: needinfo?(valentin.gosu) → needinfo?(nfroyd)
Assignee | ||
Updated•6 years ago
|
Assignee: nobody → valentin.gosu
Priority: -- → P2
Whiteboard: [MemShrink] → [MemShrink][necko-triaged]
Comment 5•6 years ago
|
||
It reproduces fairly easily on my Linux machine. I just switched off TRR, revisited most tabs and within minutes I was over 8 DNS threads: $ ps -eL | grep DNS 23574 23674 pts/8 00:00:00 DNS Resolver #1 23574 23803 pts/8 00:00:00 DNS Resolver #2 23574 23804 pts/8 00:00:00 DNS Resolver #3 23574 23805 pts/8 00:00:00 DNS Resolver #4 23574 23806 pts/8 00:00:00 DNS Resolver #5 23574 23807 pts/8 00:00:00 DNS Resolver #6 23574 23924 pts/8 00:00:00 DNS Resolver #7 23574 32245 pts/8 00:00:00 DNS Resolver #8 23574 32253 pts/8 00:00:00 DNS Resolver #9 23574 32309 pts/8 00:00:00 DNS Res~ver #10
Reporter | ||
Comment 6•6 years ago
|
||
mozreview-review |
Comment on attachment 8987706 [details] Bug 1470907 - Call AsyncShutdown for threads at the end of nsHostResolver::ThreadFunc https://reviewboard.mozilla.org/r/252946/#review259596 Thank you. I haven't tried the try build, though. Maybe Daniel's STR help?
Attachment #8987706 -
Flags: review?(nfroyd) → review+
Reporter | ||
Comment 7•6 years ago
|
||
Also, I wonder if it's the best policy for the resolver threads to drop themselves if there hasn't been activity for them in the last N minutes. Seems like we may want them to stick around indefinitely to make DNS lookups more responsive (i.e. not potentially have to spin up threads). But that would be a separate bug.
Flags: needinfo?(nfroyd)
Pushed by valentin.gosu@gmail.com: https://hg.mozilla.org/integration/autoland/rev/a5a1fdc41ec4 Call AsyncShutdown for threads at the end of nsHostResolver::ThreadFunc r=froydnj
Comment 9•6 years ago
|
||
> Seems like we may want them to stick around indefinitely
Agreed. It seems like a waste to start/stop them like this. Once they're all started, I think it would be better to let them run.
Comment 10•6 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/a5a1fdc41ec4
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla63
Assignee | ||
Comment 11•6 years ago
|
||
[Tracking Requested - why for this release]: Leaking threads could be a big performance issue.
tracking-firefox62:
--- → ?
Assignee | ||
Comment 12•6 years ago
|
||
Comment on attachment 8987706 [details] Bug 1470907 - Call AsyncShutdown for threads at the end of nsHostResolver::ThreadFunc Approval Request Comment [Feature/Bug causing the regression]: bug 1426019 [User impact if declined]: unnecessary use of resources. Possible OOM. [Is this code covered by automated tests?]: no [Has the fix been verified in Nightly?]: just landed. verified with try build. [Needs manual test from QE? If yes, steps to reproduce]: yes. browse normally for about 15 minutes, opening a large number of tabs. Check the resource monitor on windows, or `ps -eL | grep DNS` on linux and make sure the number of threads is never more than 8. [List of other uplifts needed for the feature/fix]: none. [Is the change risky?]: no [Why is the change risky/not risky?]: it just makes the threads end after ThreadFunc completes, as it did prior to bug 1426019 [String changes made/needed]: none
Attachment #8987706 -
Flags: approval-mozilla-beta?
Comment 13•6 years ago
|
||
Andrei, can your team verify the fix in nightly? Thanks!
Updated•6 years ago
|
Keywords: regression
Comment 14•6 years ago
|
||
I couldn't manage to reproduce the initial issue on a 63.0a1 20180626220124 nor by switching TRR off (network.trr.wait-for-portal). Valentin, I wonder if this is the right preference to switch off (comment 5). In latest Nightly(2018-06-28) available, with the same steps, I was never over 8 DNS threads. (Ubuntu 16.04 x64)
Flags: needinfo?(valentin.gosu)
Comment 15•6 years ago
|
||
I didn't actually mean to imply that the TRR pref would have anything to do with this issue, just that switching off TRR will use the native threadpool fully. Since the problem needs some timeouts in the resolver queues, it might take a little time to trigger and is probably more likely the larger number of names Firefox tries to resolve...
Assignee | ||
Comment 16•6 years ago
|
||
(In reply to Adrian Florinescu [:adrian_sv] from comment #14) > I couldn't manage to reproduce the initial issue on a 63.0a1 20180626220124 > nor by switching TRR off (network.trr.wait-for-portal). Valentin, I wonder > if this is the right preference to switch off (comment 5). > > In latest Nightly(2018-06-28) available, with the same steps, I was never > over 8 DNS threads. > > (Ubuntu 16.04 x64) To disable TRR you should set network.trr.mode;0 (it might already be 0) Indeed, I had trouble myself reproducing the bug. I guess just verifying that it doesn't happen now is OK.
Flags: needinfo?(valentin.gosu)
Comment 17•6 years ago
|
||
I've tried a bit more, using excesive tabs and windows, with both 2018-06-26 build and 2018-6-29 build and the results do look similar to me (network.trr.mode 0). Nightly 53 2018-06-26 ps -eL | grep DNS 11508 13520 pts/4 00:00:00 DNS Resolver #7 11508 13525 pts/4 00:00:00 DNS Res~ver #10 11508 13526 pts/4 00:00:00 DNS Res~ver #11 11508 13789 pts/4 00:00:00 DNS Res~ver #12 11508 13790 pts/4 00:00:00 DNS Res~ver #13 11508 13791 pts/4 00:00:00 DNS Res~ver #14 11508 13792 pts/4 00:00:00 DNS Res~ver #15 11508 13794 pts/4 00:00:00 DNS Res~ver #16 Nightly 53 2018-06-29 ps -eL | grep DNS 8564 8680 pts/4 00:00:00 DNS Resolver #1 8564 8774 pts/4 00:00:00 DNS Resolver #2 8564 8775 pts/4 00:00:00 DNS Resolver #3 8564 8879 pts/4 00:00:00 DNS Resolver #5 8564 8881 pts/4 00:00:00 DNS Resolver #7 8564 9198 pts/4 00:00:00 DNS Resolver #9 8564 9199 pts/4 00:00:00 DNS Res~ver #10 8564 9256 pts/4 00:00:00 DNS Res~ver #11 I couldn't catch the over 8 DNS threads. Guessing that there must be something additional that makes the original issue easier to reproduce, but no idea what. Since Daniel's env. reproduced best the initial problem, Daniel would you mind giving the fixed version a spin, just to make sure we're not missing anything?
Flags: needinfo?(daniel)
Comment 18•6 years ago
|
||
I've not seen more than 8 DNS threads happen with the fixed version - and I've checked a few times and did it basically as I did before when I could see more than eight. It's far from a very tight verification, but it's at least a data point.
Flags: needinfo?(daniel)
Comment 19•6 years ago
|
||
Based on comment 16, comment 17 and comment 18, marking this bug as verified.
Status: RESOLVED → VERIFIED
Flags: qe-verify+
Flags: needinfo?(adrian.florinescu)
Comment 20•6 years ago
|
||
Comment on attachment 8987706 [details] Bug 1470907 - Call AsyncShutdown for threads at the end of nsHostResolver::ThreadFunc Fix for a recent regression, verified in nightly, let's uplift for beta 5.
Attachment #8987706 -
Flags: approval-mozilla-beta? → approval-mozilla-beta+
Comment 21•6 years ago
|
||
bugherder uplift |
https://hg.mozilla.org/releases/mozilla-beta/rev/4eb0833bb0b4
Comment 22•6 years ago
|
||
Adding the qe-verify+ flag for the verification on beta 62.
Flags: qe-verify+
Updated•6 years ago
|
status-firefox61:
--- → unaffected
status-firefox-esr52:
--- → unaffected
status-firefox-esr60:
--- → unaffected
Comment 23•6 years ago
|
||
I was also unlucky in reproducing this bug, using an affected Nightly build from 2018-06-25; after browsing for several minutes on different tabs, the number of DNS threads remained under 8, on Ubuntu 16.04 x64. However, I spent quite some time (~30 minutes) on Beta 62.0b6 in order to see the current behavior in a fixed build and the bug seemed fixed; the number of DNS threads stayed at no more that eight when tested on Linux. I've tried to verify this on Windows as well with the Process Explorer, but after a closer look it turned out that symbols were required to see the DNS threads and a bit complicated for manual verification; After chatting with Valentin on slack we concluded that it shouldn't be differences between platforms and the verification on Ubuntu only, will suffice. Based on the above verification on Ubuntu 16.04 x64, I think it's safe to mark this bug verified fixed on Beta.
Flags: qe-verify+
You need to log in
before you can comment on or make changes to this bug.
Description
•