Closed Bug 7531 Opened 25 years ago Closed 25 years ago

NECKO: DNS lookup does not timeout and will hang app

Categories

(Core :: Networking, defect, P3)

VERIFIED FIXED

People

(Reporter: rubydoo123, Assigned: gagan)

Details

(Whiteboard: 9/24, asked reporter to verify)

This bug has been reported by CERN and is most noticeable in the 4.x tree, but
needs to be addressed in seamonkey. The CERN contact is: Arnaud.Taddei@cern.ch
He is the technical lead, would like to be informed of any issues, and can
provide any additional information. He is also willing to test seamonkey in
their environment if that is necessary.

ISSUE: MAIL 1
1. open mail
2. select Get Message
3. host lookup dialog is displayed, lookup continues to recycle without
interruption; select cancel and immediately retry, and it works

ISSUE: MAIL 2
1. open mail
2. select New Message
3. select Send
4. host lookup dialog is displayed, lookup continues to recycle without
interruption; select New Message, select Send, and the second message gets sent
immediately, and then the first message gets sent.

ISSUE: BROWSER
1. open browser
2. enter URI
3. host lookup recycles without interrupt

The problems are sporadic, but consistent. They stated that users will
experience this problem 4-5 times a day. It also seems to happen more often on
lower end machines (P90, etc.). It also happens with hardware that is on direct
ethernet lines versus folks going through a modem. It is also more consistent on
win32 machines, but folks using Macs and Unix have had it happen too.

It seems as though the DNS lookup does not have a break in it somewhere.
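
To illustrate what such a "break" could look like, here is a minimal Python
sketch (not the Communicator or Necko code). It assumes the blocking resolver
can be run in a helper thread and that the caller waits on it for a bounded
time only. The names resolve_with_timeout and LookupTimeout, and the 5 second
default, are hypothetical and chosen for illustration.

```python
import socket
import threading


class LookupTimeout(Exception):
    """Raised when a host lookup does not finish within the deadline."""


def resolve_with_timeout(host, timeout_s=5.0, resolver=socket.gethostbyname):
    """Resolve `host`, giving up after `timeout_s` seconds.

    The blocking resolver runs in a daemon thread, so the caller (for
    example a UI thread) never waits on it indefinitely.
    """
    result = {}

    def worker():
        try:
            result["address"] = resolver(host)
        except OSError as exc:  # socket.gaierror is a subclass of OSError
            result["error"] = exc

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout_s)  # bounded wait: this is the "break"

    if thread.is_alive():
        # The lookup is still running; stop waiting and report it.
        raise LookupTimeout(f"lookup of {host!r} exceeded {timeout_s}s")
    if "error" in result:
        raise result["error"]
    return result["address"]
```

A caller would catch LookupTimeout and let the user retry, rather than
leaving the host lookup dialog cycling. Note that the abandoned worker thread
is not cancelled here; it simply finishes (or not) in the background.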
Target to M8.
I received a note on Friday, 6.18.99 from Arnaud with additional information for
this bug:

Phil Peterson suggested to check the Winsock version we have on our
Win32 machines. Here is the current state:

These are the winsock versions supported on the various Windows
systems at CERN which is called the NICE system.

W95     1.1 (an add-on and other updates to implement 2.0 exists, but
             they're not implemented on NICE)
W98     2.0
WNT 4   2.0 (substantial updates and bug fixes in SP4)
W2000   2.0
WCE     1.1 (only a subset + with some 2.0 functions, no 16-bit
             support)

We will investigate with people who have WNT 4 and Winsock 2.0. Actually
not all of them have it so we need to check. My feeling is that it fails
even for them but I will confirm.

3b) David Bienvenu suggested the problem could come from the multiple
IMAP connections. We sincerely believe this is not the case.

3c) Ed Hannagan thinks this could be due to a race condition in the URL
parser. I have mixed feelings: on one hand this could well be due to a race
condition, especially when you look at 'ISSUE: MAIL 2', but on the other
hand it happens very often with a simple Get Msg and I don't see what the
program is 'racing' with, so to speak. However I don't know the code (maybe
this is fortunate for my mental health! sic!) so this is a purely naive
judgement.

3d) DP says this could be an exception which is not raised around the
select call

3e) John Myers says this could boil down to a difference in using the
select call between different platforms.

3f) I have asked a user to s-y-s-t-e-m-a-t-i-c-a-l-l-y record when he sees
problems and here is the output:

04/06/99         8:58   looking up mail server for ever
04/06/99        16:39   switched folders, found number of messages in
                        new folder, however never got the headers. When
                        clicked on stop, headers appeared!
07/06/99         9:43   looking up mail server for ever
07/06/99        14:23           "
08/06/99        11:25           "
08/06/99        12:05           "
08/06/99        17:12           "
09/06/99         8:52           "
10/06/99        11:56           "
11/06/99         8:21           "
11/06/99        17:33           "
14/06/99        10:30           "
14/06/99        16:14           "
15/06/99        11:53           "
15/06/99        14:47           "
16/06/99         9:16           "
16/06/99        15:06           "
16/06/99        15:53           "
17/06/99        13:30           "
18/06/99         8:17           "
18/06/99        13:56           "

I believe that it systematically fails when he arrives in the morning
(don't be misled by the entry of Monday 14/06; he surely killed his
netscape during the weekend and restarted it) or when he is back from
meetings, though not only then.

3g) The second entry of 04/06/99 leads us to think that it could be
another symptom of the same problem, so maybe we have an 'ISSUE: MAIL 3'
        1. open a folder
        2. find number of messages new in this folder but cannot see the
headers. Never completes.
        3. When clicking on stop, the headers appeared!
It could be that it fails to reach the IMAP server too, although in
this part of the interface no progress bar is showing anything so we
cannot know.
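
Items 3d and 3e above point at the select call. As an illustration only (this
is Python, not the Communicator source), the sketch below shows the general
pattern of passing select an explicit timeout so that a wait on a socket
returns to the caller even when nothing arrives. The helper name
wait_readable is hypothetical.

```python
import select
import socket


def wait_readable(sock, timeout_s):
    """Return True if `sock` has data to read within `timeout_s` seconds.

    With a timeout, select() returns empty lists when the deadline passes
    instead of blocking until the peer answers.
    """
    readable, _, _ = select.select([sock], [], [], timeout_s)
    return bool(readable)


if __name__ == "__main__":
    a, b = socket.socketpair()
    try:
        print(wait_readable(a, 0.1))  # nothing sent yet
        b.sendall(b"x")
        print(wait_readable(a, 1.0))  # one byte is now pending
    finally:
        a.close()
        b.close()
```

Whether the 4.x code omits such a timeout, or handles its result differently
per platform, is exactly what 3d and 3e speculate about; the sketch does not
settle that.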
I'm moving this to target M9, Necko will be enabled somewhere during late M8 or
early M9.  We will need to get on this and it cannot be postponed past the M9
milestone.
Changing all Networking Library/Browser bugs to Networking-Core component for
Browser.

Occasionally, Bugzilla will burp and cause Verified bugs to reopen when I do
this in a bulk change.  If this happens, I will fix. ;-)
Summary: DNS lookup does not timeout and will hang app → NECKO: DNS lookup does not timeout and will hang app
Please verify with Necko.
The mail client is not yet in a state where this can be tested. Arnaud, if you
would like to test #3 on the latest Seamonkey builds, please feel free. They are
located at http://www.mozilla.org/binaries.html  You will want the latest
nightly build.
Target Milestone: M9 → M12
Here is an update from Arnaud. Since they can only test on the 4.x tree, his
frame of reference is from there. In any event, they have narrowed down the
problem:

We believe the problem is due to a wrong check in or after the DNS CACHE
expiration. I received NO update so far from Mountain View.

We have been trying a workaround for a few weeks, after a clear analysis
showed that the problem is, to first order, a function of a timeout
variable. Our interpretation is that a periodic Check Mail at an interval
BELOW the time set by the dns cache parameter in the factory default (see
about:config in your communicator) keeps the system dns cache correct, so
that Communicator does not enter the code which handles DNS cache
expiration, which is buggy in one way or another.

Note that this doesn't entirely solve the problem, because it has other
implications for the Compress of the folder, which fails more often when the
check mail option is enabled, and because symptom number 2 (when the
sent mail cannot be filed) didn't disappear completely (I got one such
problem). So I believe that, to second order, this host lookup problem could
come as well from race conditions which occur in the legacy part of
Communicator 4.N
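
The interpretation above can be modelled with a small Python sketch. It is a
toy model, not the Communicator 4.x cache: it ASSUMES that every successful
use of a cached entry pushes its expiry forward (the source does not confirm
how the real cache behaves), and the class name DnsCache, the counter
expired_lookups, and the TTL values are hypothetical.

```python
import time


class DnsCache:
    """Toy host -> address cache whose entries expire after `ttl_s` seconds."""

    def __init__(self, resolver, ttl_s, clock=time.monotonic):
        self._resolver = resolver
        self._ttl_s = ttl_s
        self._clock = clock
        self._entries = {}  # host -> (address, expiry time)
        self.expired_lookups = 0  # how often the expiry path was taken

    def lookup(self, host):
        now = self._clock()
        entry = self._entries.get(host)
        if entry is not None and now < entry[1]:
            # Fresh hit. ASSUMPTION: each use slides the expiry forward.
            self._entries[host] = (entry[0], now + self._ttl_s)
            return entry[0]
        if entry is not None:
            # Expiry path: the branch suspected of being buggy in 4.x.
            self.expired_lookups += 1
        address = self._resolver(host)
        self._entries[host] = (address, now + self._ttl_s)
        return address
```

Under this model, a Check Mail interval shorter than ttl_s means lookup never
reaches the expiry branch, while a client left idle for longer than ttl_s
(overnight, or during a meeting) hits it on the next use. That matches the
morning and after-meeting pattern in the user's log, but only as far as the
assumption holds.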
Status: ASSIGNED → RESOLVED
Closed: 25 years ago
Resolution: --- → FIXED
Seems like a seep-through of a 4.* bug. Necko has (will have) async DNS, so
things should be better. Marking this as fixed for now. Please open a new one
if there are DNS lookup problems with Necko. Thanks!
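
For readers unfamiliar with the idea, here is a rough Python sketch of an
asynchronous lookup with a deadline. It does not show Necko's implementation;
it only illustrates the general approach using asyncio. The function names
lookup_async and _system_lookup and the 5 second default are hypothetical.

```python
import asyncio
import socket


async def _system_lookup(host):
    """Default resolver: the event loop runs getaddrinfo off the main thread."""
    loop = asyncio.get_running_loop()
    infos = await loop.getaddrinfo(host, None, type=socket.SOCK_STREAM)
    return [info[4][0] for info in infos]


async def lookup_async(host, timeout_s=5.0, resolver=_system_lookup):
    """Resolve `host` without blocking the caller; None means it timed out."""
    try:
        return await asyncio.wait_for(resolver(host), timeout_s)
    except asyncio.TimeoutError:
        return None
```

Because the lookup is awaited rather than called synchronously, the rest of
the application keeps running while it is pending, and the deadline bounds how
long any one request can stay in the "looking up host" state.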
Whiteboard: 9/24, asked reporter to verify
Arnaud, please verify that this is fixed with seamonkey.
Status: RESOLVED → VERIFIED
Didn't hear from reporter; will mark verified.
Bulk move of all Networking-Core (to be deleted component) bugs to new
Networking component.