Closed Bug 463724 Opened 16 years ago Closed 6 years ago

DNS nsHostResolver class shutdown does not join worker threads

Categories

(Core :: Networking: DNS, defect, P3)

x86
Linux
defect

Tracking

()

RESOLVED DUPLICATE of bug 1471280

People

(Reporter: mcmanus, Unassigned)

References

Details

(Whiteboard: [lame-network][MemShrink:P3][necko-backlog])

Attachments

(1 file)

User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.0.3) Gecko/2008101315 Ubuntu/8.10 (intrepid) Firefox/3.0.3 Build Identifier: Follow-on bug from 453403 resolver::shutdown() is called which wakes up anybody in the thread pool after setting the shutdown flag and then, after cleaning up a few other things, exits. Shutdown() is synchronous, but the other thread exits are not, and they hold references to the resolver. That leaves a race between the firefox shutdown/leak code and the pool threads getting scheduled and running to completion. The pool thread could take a fair chunk of time if it is blocked on getaddrinfo() which is why I can reproduce this easily if I add a network delay getting to my name server. [..] The normal/correct behavior is that all XPCOM threads must be joined before or during the xpcom-shutdown-threads global observer notification. At some point we specifically decided to absolve the DNS lookup thread(s) from this responsibility because PR_GetAddrInfoByName was uninterruptably blocking. We cannot delay shutdown for any significant amount of time. This meant, IIRC, that the DNS threads couldn't participate in XPCOM. [..] The normal/correct behavior is that all XPCOM threads must be joined before or during the xpcom-shutdown-threads global observer notification. At some point we specifically decided to absolve the DNS lookup thread(s) from this responsibility because PR_GetAddrInfoByName was uninterruptably blocking. We cannot delay shutdown for any significant amount of time. This meant, IIRC, that the DNS threads couldn't participate in XPCOM. [..] We could perhaps use the magic shutdown flag differently to abandon, so that we can simply abandon the worker thread by having the main thread do cleanup for the abandoned worker thread. I can try to do this, if you file a separate bug. The big problem is that you have to avoid referencing mLock after the thread has been abandoned, which sounds like it would introduce a race condition. Anyone know if it's ok to intentionally leak a PRLock? Reproducible: Always Steps to Reproduce: 1. 2. 3.
Status: UNCONFIRMED → NEW
Ever confirmed: true
This patch intentionally leaks a PRLock (nsHostResolver::gLock) but tracks and abandons the resolver threads. Did a quick-startup test; not sure what the best way is to get additional testing.
Assignee: nobody → benjamin
Status: NEW → ASSIGNED
Attachment #347350 - Flags: review?(cbiesinger)
Blocks: 467648
Comment on attachment 347350 [details] [diff] [review] Abandon DNS resolution threads at shutdown, leak gLock, rev. 1 cleaning up my review queue, please re-request if this is still desired
Attachment #347350 - Flags: review?(cbiesinger)
when the mozilla testing lan is screwed up this bug results in leaks being reported in mochitests - they timeout and shutdown beacuse of the screwed up lan with the requests still outstanding and it gets reported as a leak. so we'll add [lame-network] here
Whiteboard: [lame-network]
Whiteboard: [lame-network] → [lame-network][MemShrink]
Whiteboard: [lame-network][MemShrink] → [lame-network][MemShrink:P3]
Assignee: benjamin → nobody
Whiteboard: [lame-network][MemShrink:P3] → [lame-network][MemShrink:P3][necko-backlog]
Priority: -- → P1
Priority: P1 → P3
No assignee, updating the status.
Status: ASSIGNED → NEW
No assignee, updating the status.
No assignee, updating the status.
No assignee, updating the status.
Valentin, do you think we can close this bug since the entire thread-pool logic has since been replaced? Or can this bug still be relevant?
Component: Networking → Networking: DNS
Flags: needinfo?(valentin.gosu)
I think we can dupe it to bug 1471280 which changed the thread handling to use nsThreadPool. As noted in bug 1498782, resolver thread shutdown can still sometimes be an issue.
Status: NEW → RESOLVED
Closed: 6 years ago
Flags: needinfo?(valentin.gosu)
Resolution: --- → DUPLICATE
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: