Closed Bug 463724 Opened 11 years ago Closed Last year

DNS nsHostResolver class shutdown does not join worker threads

Categories

(Core :: Networking: DNS, defect, P3)

x86
Linux
defect

Tracking

()

RESOLVED DUPLICATE of bug 1471280

People

(Reporter: mcmanus, Unassigned)

References

Details

(Whiteboard: [lame-network][MemShrink:P3][necko-backlog])

Attachments

(1 file)

User-Agent:       Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.0.3) Gecko/2008101315 Ubuntu/8.10 (intrepid) Firefox/3.0.3
Build Identifier: 

Follow-on bug from 453403

resolver::shutdown() is called which wakes up anybody in the thread pool after
setting the shutdown flag and then, after cleaning up a few other things,
exits. Shutdown() is synchronous, but the other thread exits are not, and they
hold references to the resolver.

That leaves a race between the firefox shutdown/leak code and the pool threads
getting scheduled and running to completion. The pool thread could take a fair
chunk of time if it is blocked on getaddrinfo() which is why I can reproduce
this easily if I add a network delay getting to my name server.

[..]

The normal/correct behavior is that all XPCOM threads must be joined before or
during the xpcom-shutdown-threads global observer notification.

At some point we specifically decided to absolve the DNS lookup thread(s) from
this responsibility because PR_GetAddrInfoByName was uninterruptably blocking.
We cannot delay shutdown for any significant amount of time. This meant, IIRC,
that the DNS threads couldn't participate in XPCOM.

[..]

The normal/correct behavior is that all XPCOM threads must be joined before or
during the xpcom-shutdown-threads global observer notification.

At some point we specifically decided to absolve the DNS lookup thread(s) from
this responsibility because PR_GetAddrInfoByName was uninterruptably blocking.
We cannot delay shutdown for any significant amount of time. This meant, IIRC,
that the DNS threads couldn't participate in XPCOM.

[..]

We could perhaps use the magic shutdown flag differently to abandon, so that we
can simply abandon the worker thread by having the main thread do cleanup for
the abandoned worker thread. I can try to do this, if you file a separate bug.
The big problem is that you have to avoid referencing mLock after the thread
has been abandoned, which sounds like it would introduce a race condition.
Anyone know if it's ok to intentionally leak a PRLock?



Reproducible: Always

Steps to Reproduce:
1.
2.
3.
Status: UNCONFIRMED → NEW
Ever confirmed: true
This patch intentionally leaks a PRLock (nsHostResolver::gLock) but tracks and abandons the resolver threads. Did a quick-startup test; not sure what the best way is to get additional testing.
Assignee: nobody → benjamin
Status: NEW → ASSIGNED
Attachment #347350 - Flags: review?(cbiesinger)
Comment on attachment 347350 [details] [diff] [review]
Abandon DNS resolution threads at shutdown, leak gLock, rev. 1

cleaning up my review queue, please re-request if this is still desired
Attachment #347350 - Flags: review?(cbiesinger)
Duplicate of this bug: 706517
when the mozilla testing lan is screwed up this bug results in leaks being reported in mochitests - they timeout and shutdown beacuse of the screwed up lan with the requests still outstanding and it gets reported as a leak.

so we'll add [lame-network] here
Whiteboard: [lame-network]
Whiteboard: [lame-network] → [lame-network][MemShrink]
Whiteboard: [lame-network][MemShrink] → [lame-network][MemShrink:P3]
Assignee: benjamin → nobody
Whiteboard: [lame-network][MemShrink:P3] → [lame-network][MemShrink:P3][necko-backlog]
Bulk change to priority: https://bugzilla.mozilla.org/show_bug.cgi?id=1399258
Priority: -- → P1
Bulk change to priority: https://bugzilla.mozilla.org/show_bug.cgi?id=1399258
Priority: P1 → P3
No assignee, updating the status.
Status: ASSIGNED → NEW
No assignee, updating the status.
No assignee, updating the status.
No assignee, updating the status.
Valentin, do you think we can close this bug since the entire thread-pool logic has since been replaced? Or can this bug still be relevant?
Component: Networking → Networking: DNS
Flags: needinfo?(valentin.gosu)
I think we can dupe it to bug 1471280 which changed the thread handling to use nsThreadPool.
As noted in bug 1498782, resolver thread shutdown can still sometimes be an issue.
Status: NEW → RESOLVED
Closed: Last year
Flags: needinfo?(valentin.gosu)
Resolution: --- → DUPLICATE
Duplicate of bug: 1471280
You need to log in before you can comment on or make changes to this bug.