Closed Bug 192271 Opened 22 years ago Closed 21 years ago

DNS: hangs during Quit

Categories

(Core :: Networking, defect)

defect
Not set
critical

Tracking

()

VERIFIED FIXED
mozilla1.6alpha

People

(Reporter: bugmail, Assigned: darin.moz)

References

()

Details

(Keywords: hang)

Attachments

(2 files)

If Mozilla is waiting for a DNS response and the user Quits, Mozilla will hang.

Steps to reproduce:
1. Try to access <http://www.imagemagick.org/> using the DNS at <206.13.28.12>
2. Observe Mozilla trying to resolve that URL
3. Quit Mozilla.

Expected results:
Mozilla should abort the DNS session and quit.

Actual results:
Mozilla hangs.
(See also bug 192272 for a crash that occurred in this state.)
This bug is not Mac specific:

http://bugzilla.mozilla.org/show_bug.cgi?id=193827

It happens because DNS resolution in TCP mode does not seem
to work correctly.
I've been seeing both this and bug 193827 (inability to resolve new sites) on
Linux.  I'll attach the stacks from Linux shutdown.
Flags: blocking1.4b?
OS: MacOS X → All
Hardware: Macintosh → All
the situation is that we have called gethostbyname, which may block until the OS
either gets the DNS result or determines that it cannot get the DNS result. 
because of network problems or just slow DNS servers, gethostbyname can block
for a relatively long time.

the solution we've been planning is to spawn multiple threads (up to some limit)
for calling gethostbyname.  this will help keep the browser usable while an
existing gethostbyname is blocked.  as for the shutdown problem, we might want
to look at making the threads unjoinable... or find some way to cancel the
gethostbyname call.

there's an uber-bug for this problem somewhere...
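
A minimal sketch of the unjoinable-thread idea, in C with plain pthreads rather than NSPR. This is not the eventual Mozilla design; lookup_t, lookup_with_timeout, resolve_fn and the resolve_* helpers are hypothetical names made up for illustration, and the "up to some limit" cap on threads is left out. The blocking call runs on a detached thread, the caller waits only for a bounded time, and a small reference count lets whichever side finishes last free the shared record.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <netdb.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical hook so the blocking call can be swapped for a stand-in. */
typedef int (*resolve_fn)(const char *host);

/* Real resolver: blocks until the OS answers or gives up. */
int resolve_real(const char *host) {
    struct addrinfo *res = NULL;
    int rc = getaddrinfo(host, NULL, NULL, &res);
    if (rc == 0) freeaddrinfo(res);
    return rc;
}

/* Stand-ins for illustration: one that hangs, one that answers at once. */
int resolve_stuck(const char *host) { (void)host; sleep(30); return 0; }
int resolve_instant(const char *host) { (void)host; return 0; }

/* Shared by caller and worker; freed by whoever drops the last reference. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t cv;
    int done, status, refs;
    resolve_fn fn;
    char host[256];
} lookup_t;

static void lookup_release(lookup_t *l) {
    pthread_mutex_lock(&l->lock);
    int last = (--l->refs == 0);
    pthread_mutex_unlock(&l->lock);
    if (last) {
        pthread_cond_destroy(&l->cv);
        pthread_mutex_destroy(&l->lock);
        free(l);
    }
}

static void *lookup_worker(void *arg) {
    lookup_t *l = arg;
    int rc = l->fn(l->host);            /* may block for a long time */
    pthread_mutex_lock(&l->lock);
    l->status = rc;
    l->done = 1;
    pthread_cond_signal(&l->cv);
    pthread_mutex_unlock(&l->lock);
    lookup_release(l);
    return NULL;
}

/* Returns 0 and sets *status if the lookup finished, ETIMEDOUT if it is
   still pending, or another error number if the thread could not start. */
int lookup_with_timeout(resolve_fn fn, const char *host, int timeout_ms,
                        int *status) {
    lookup_t *l = calloc(1, sizeof *l);
    if (l == NULL) return ENOMEM;
    pthread_mutex_init(&l->lock, NULL);
    pthread_cond_init(&l->cv, NULL);
    l->fn = fn;
    l->refs = 2;                        /* one for caller, one for worker */
    strncpy(l->host, host, sizeof l->host - 1);

    pthread_attr_t attr;
    pthread_t tid;
    pthread_attr_init(&attr);
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED); /* unjoinable */
    int rc = pthread_create(&tid, &attr, lookup_worker, l);
    pthread_attr_destroy(&attr);
    if (rc != 0) {
        l->refs = 1;
        lookup_release(l);
        return rc;
    }

    struct timespec deadline;
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (long)(timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock(&l->lock);
    while (!l->done && rc == 0)
        rc = pthread_cond_timedwait(&l->cv, &l->lock, &deadline);
    if (l->done) {
        *status = l->status;
        rc = 0;
    }
    pthread_mutex_unlock(&l->lock);
    lookup_release(l);
    return rc;
}
```

Calling lookup_with_timeout(resolve_real, host, 5000, &status) would give up after five seconds without ever joining the worker, so a stuck resolver call cannot hold up shutdown. The stuck thread simply dies with the process.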
Hmm. I wonder if this (and bug 193827) was the issue I mailed darin about last week.

On unix, we can use pthread_cancel, but I think we need NPTL to use it. man
pthread_cancel says:

       POSIX specifies that a number of system calls  (basically,  all  system
       calls  that  may  block,  such as read(2), write(2), wait(2), etc.) and
       library functions that may call these system calls  (e.g.   fprintf(3))
       are  cancellation  points.   LinuxThreads  is not yet integrated enough
       with the C library to implement this, and thus none of  the  C  library
       functions is a cancellation point.

and SUS says:

If a thread has cancelability enabled and a cancellation request is made with
the thread as a target while the thread is suspended at a cancellation point,
the thread shall be awakened and the cancellation request shall be acted upon

(gethostbyname is a cancellation point)

Does RH9 still have that text in the manpage? I believe that the new threading
stuff fixed that. Can someone test?

How do we quit, anyway? It looks like the DNS thread calls nsThread::Join, which
calls PR_JoinThread, but that will block until the thread exits or is cancelled.
Don't we have to cancel the thread instead?

wtc, is there an NSPR'd pthread_cancel we can call (which would then presumably
work with NPTL)? LXR doesn't find a call to pthread_cancel, so I'm guessing not.
I don't think that this is solvable via linuxthreads, since once we're blocked,
we're stuck. We could set the cancellation type to PTHREAD_CANCEL_ASYNCHRONOUS,
but we'd have to test that for linuxthreads.

An explicit pthread_cancel may still work, I guess, although that man page text
doesn't seem encouraging.

Does this happen on windows, btw? (Or another non-unix-based os) How do
bsd-style os's handle cancellation points?
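
For anyone who wants to test this, here is a small sketch of cancelling a thread that is blocked at a cancellation point and then joining it. It assumes a threading library that implements deferred cancellation in blocking calls (NPTL should). read(2) on an empty pipe stands in for the blocking resolver call, since whether gethostbyname itself honours cancellation depends on the C library. blocked_reader and cancel_blocked_thread are made-up names, not NSPR functions.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <time.h>
#include <unistd.h>

/* Blocks in read(2), a POSIX cancellation point, on a pipe nobody writes to. */
static void *blocked_reader(void *arg) {
    int fd = *(int *)arg;
    char byte;
    ssize_t n = read(fd, &byte, 1);
    (void)n;
    return NULL;                        /* not reached if cancelled */
}

/* Returns 1 if the thread was cancelled, 0 if it exited on its own,
   -1 if the pipe or thread could not be created. */
int cancel_blocked_thread(void) {
    int fds[2];
    pthread_t tid;
    void *result = NULL;

    if (pipe(fds) != 0) return -1;
    if (pthread_create(&tid, NULL, blocked_reader, &fds[0]) != 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    /* Give the thread time to block inside read(). */
    struct timespec delay = { 0, 100 * 1000 * 1000 };
    nanosleep(&delay, NULL);

    pthread_cancel(tid);                /* request cancellation */
    pthread_join(tid, &result);         /* returns once the thread has unwound */
    close(fds[0]);
    close(fds[1]);
    return result == PTHREAD_CANCELED ? 1 : 0;
}
```

If cancel_blocked_thread() returns 1, the join completed because the cancel woke the blocked thread. On a library where blocking calls are not cancellation points, the join would be expected to hang instead, which matches the shutdown hang described here.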
I think we can avoid much of the thread complications by redesigning the DNS
service to use multiple unjoinable threads.  I don't have a complete design in
my head at the moment, but after playing with similar issues in the disk cache,
I can begin to see the light.

Darin, do you want me to take this?  I don't mind.
Can we mark this as a dupe?

Is there any additional information needed here? I think it sounds like we have
enough technical firepower to now agree this is a real problem, so I want to go
hunting for all the unconfirmed dupes that have piled up over time.

I am also likely to be increasingly behind on bugmail these days, so
consolidation of bugs is a high priority for me. 
Component: Networking: HTTP → Networking
QA Contact: httpqa → benc
Flags: blocking1.4b?
Flags: blocking1.4b-
Flags: blocking1.4?
Perhaps we could kill (9) the DNS thread if it still exists when we're finally
ready to exit (perhaps after NSPR is otherwise fully shut down)?
This is dependent on DNS servers, not a lot of users are complaining, and any fix
is likely to be a bit scary. At this point, we're not going to block on this.
Flags: blocking1.4? → blocking1.4-
Depends on: 205726
Blocks: 193827
>The situation is that we have called gethostbyname, which may block until the OS
>either gets the DNS result or determines that it cannot get the DNS result. 

No, this is not the case in this bug.
Mozilla does not exit after any timeout.
It is still running in the background after 24 hours.

The problem is that Mozilla's DNS gets into some corrupt state.
Vladislav: thanks for the info, but what version of mozilla are you testing?
Status: NEW → ASSIGNED
Target Milestone: --- → mozilla1.5alpha
This problem exists with all Mozilla versions I tried:

Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.3b) Gecko/20030210
Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.1) Gecko/20021003

The problem has existed for years, since Netscape.

Note that to get Mozilla into this state
you need to set up a DNS record such that the request is made in TCP mode.
Then do File->Quit
and then
ps axuww|grep mozilla

See bug
http://bugzilla.mozilla.org/show_bug.cgi?id=193827
which also includes packet traces that indicate DNS activity.

I've had this happen too, on RH9. strace shows it waiting in futex_wait. The
browser will just stop working, and then quitting doesn't actually quit, and
restarts hang on the x-remote ping() from the shell script.
well, if the DNS thread ever were to deadlock, then on shutdown or restart we
would indeed hang the entire browser when the UI thread joins with the DNS
thread.  so, sounds like we have a real race of some sort to unravel here.  the
DNS rewrite (bug 205726) should help since i think we can greatly simplify the
thread synchronization.
Summary: Mozilla hangs during Quit while waiting for cranky DNS → DNS: hangs during Quit
Target Milestone: mozilla1.5alpha → mozilla1.5beta
Target Milestone: mozilla1.5beta → mozilla1.6alpha
ok, now that the patch for bug 205726 has landed, DNS pinning is now a thing of
the past.  this bug is fixed (note: on the trunk only).
Status: ASSIGNED → RESOLVED
Closed: 21 years ago
Resolution: --- → FIXED
V.

No reports of this for some time.
Status: RESOLVED → VERIFIED