Closed Bug 7428 Opened 25 years ago Closed 24 years ago

Conn: no network, apprunner hangs indefinitely (DNS)

Categories

(Core :: Networking, defect, P2)

x86
Linux
defect

Tracking

VERIFIED INVALID

People

(Reporter: mcafee, Assigned: gordon)

Details

(Whiteboard: [nsbeta2-][nsbeta3-])

Linux, apprunner.

Pull your network cable out, apprunner hangs and never
finishes rendering the window.  Put the network cable back in,
and things start working again.
Component: Apprunner → Networking Library
QA Contact: leger → paulmac
Updating QA Contact.  paulmac, if this does not get fixed for M7, please make
sure it gets Release Noted.  Thanks!
OS: Linux → All
Hardware: PC → All
Summary: Unix: no network, apprunner hangs indefinitely → no network, apprunner hangs indefinitely
this is an xp issue, occurs on mac/win also.
Assignee: don → dp
Target Milestone: M8
Marking M8 to match the necko landing, giving to dp.
Assignee: dp → warren
Assignee: warren → rpotts
Hmmm... how are we going to deal with this? I think I'll let Rick look at it.
Depends on: 7232
Target Milestone: M8 → M9
Priority: P3 → P2
This is going to be important for non-connected
scenarios, laptops being one of them.  bumping up to p2,
hope that's Ok.  Adding dp, chofmann.
No longer depends on: 7232
Blocks: 7232
Changing all Networking Library/Browser bugs to Networking-Core component for
Browser.

Occasionally, Bugzilla will burp and cause Verified bugs to reopen when I do
this in a bulk change.  If this happens, I will fix. ;-)
I'm sure that this is because the SocketTransport does not implement socket
timeouts yet!  Currently, the socket transport thread sits in PR_Poll(...) with
NO_TIMEOUT.

It needs to keep track of the amount of time a transport has been active
without receiving any data, and time out if necessary.

-- rick
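For illustration only, the idle timeout described above can be sketched in Python rather than NSPR: instead of polling with no timeout, each wait is bounded by whatever remains of an idle budget, and any received data resets the clock. The function name and the 4096-byte read size are assumptions for this sketch; this is not the SocketTransport code.

```python
import select
import socket
import time


def read_with_idle_timeout(sock: socket.socket, idle_timeout: float) -> bytes:
    """Read until the peer closes; raise TimeoutError if idle for idle_timeout seconds."""
    chunks = []
    last_activity = time.monotonic()
    while True:
        remaining = idle_timeout - (time.monotonic() - last_activity)
        if remaining <= 0:
            # The transport has been idle too long: give up instead of waiting forever.
            raise TimeoutError(f"no data received for {idle_timeout} s")
        # Bounded wait, unlike a poll with no timeout.
        readable, _, _ = select.select([sock], [], [], remaining)
        if not readable:
            continue  # loop back; the next pass sees remaining <= 0 and raises
        data = sock.recv(4096)
        if not data:
            return b"".join(chunks)  # peer closed the connection normally
        chunks.append(data)
        last_activity = time.monotonic()  # any received data resets the idle clock
```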
Status: NEW → ASSIGNED
Target Milestone: M9 → M10
I'm moving this out to M10 :-(
-- rick
Target Milestone: M10 → M11
if this is ready we can take it in the next few days.
if not -> m11. and release note for the current milestone.
Blocks: 15200
Rick was working on a timeout in the socket transport to handle this. Not sure
if it went in yet.
*** Bug 11030 has been marked as a duplicate of this bug. ***
I have the code sitting in my tree...  can I check this in for M11?
code for this would be a terrible thing to waste. ;-)

can you think of what kind of tests other than starting with no network might
be run to exercise the new code?
Right now the timeout only applies to establishing a connection....  This seems
the closest to Navigator behavior.

I've been testing it by starting to load a page and then pulling out the
network :-)
We're gonna need this for offline, laptops & stuff.
I vote for m11.
Status: ASSIGNED → RESOLVED
Closed: 25 years ago
Resolution: --- → FIXED
I've checked in code to timeout a socket transport if it is unable to establish
a connection with the server in 35 seconds...

Right now, there is no timeout once the connection has been established.  This
seems to be consistent with Communicator behavior.
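A rough Python analogue of this policy (a 35-second limit on establishing the connection, no limit afterwards) is sketched below; `open_transport` is a hypothetical name. Note that `socket.create_connection` resolves the host name with `getaddrinfo` before any socket exists, and that lookup is not covered by the socket timeout, which is the DNS gap this bug later narrows to.

```python
import socket

CONNECT_TIMEOUT = 35.0  # seconds, the value quoted in the comment above


def open_transport(host: str, port: int,
                   connect_timeout: float = CONNECT_TIMEOUT) -> socket.socket:
    # The timeout bounds each connect attempt. Name resolution (getaddrinfo)
    # happens before the socket exists and is NOT bounded by it.
    sock = socket.create_connection((host, port), timeout=connect_timeout)
    # Once connected, block indefinitely, matching the "no timeout after the
    # connection has been established" behavior described above.
    sock.settimeout(None)
    return sock
```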
mcafee, this looks ok on windows, what do you see on linux?
Status: RESOLVED → REOPENED
Component: Networking-Core → ActiveX Wrapper
unplug network, visit a local URL, browser hangs.
plug in network, local file gets rendered.
Reopening.
oh yeah, linux.
m12, says squid.
changing to /etc/passwd test case.
Resolution: FIXED → ---
Clearing resolution due to reopen.
Component: ActiveX Wrapper → Networking-Core
Changing back to Networking-Core from ActiveX (looks like mcafee inadvertently
changed that).
that'll teach him to use Seamonkey for his bug work, sounds like there is a
problem with combo boxes again probably.
Target Milestone: M11 → M12
actually changing to m12
Target Milestone: M12 → M11
WORKSFORME.
I tried to generate this bug, and I can't.
Some setup info:
Mozilla Build: 1999110608
Redhat 6.1
kernel-source-2.2.12-20
GNOME
DHCP (Road Runner Cable modem connection).

Tried:
Launch run-mozilla.sh apprunner
Pull Internet Cable.
type in: "file:/etc/passwd"

Results: page shows.

I even tried opening netcfg and deactivating eth0 after pulling cable.
Also tried /etc/rc.d/init.d/network stop
after pulling network cable. All of these worked for me.
On a side issue: GNOME won't open any X objects when my cable is pulled - until
I re-insert it. So it's key that I launch mozilla before I pull the cable.
Apparently GNOME doesn't like it when it thinks its domain is X and suddenly it
doesn't resolve.
maybe this is specific to some kernel or distribution, or even network type?
Target Milestone: M11 → M12
Thanks for the investigation. Looks like you accidentally changed TFV back to
M11; changing it back to M12 so no one gets scared.
I've reopened bug 13960 on the combobox issue.
This issue is fully hashed out in 17519, so I closed 13960.  Please see 17519.
Hey, wait a minute...  I think that this bug has morphed.

Now it sounds like accessing *local files* requires an active network
connection.
Is this a dup of bug #15200?
That's been true for a while (I thought it was one of the issues covered in this
bug); at least, older builds I've checked on my laptop had that problem, and
it's a definite problem for me.  I use 4.x on a laptop to access local files
quite often, sometimes when miles from the nearest phone line or network
connection.  I'd love to be able to use mozilla instead of 4.x.

15200 does sound like the same thing.
Could this be the sidebar loading content and hanging? Local file access
shouldn't hit the network at all.
I run with the sidebar closed, or at least I try to.  Not that that necessarily
means the sidebar isn't loading something anyway ...
Target Milestone: M12 → M13
Moving milestones...
Bulk move of all Networking-Core (to be deleted component) bugs to new
Networking component.
Assignee: rpotts → dp
Status: REOPENED → NEW
Depends on: 10733
Unix DNS problem => dp
Status: NEW → ASSIGNED
Target Milestone: M13 → M14
yours...
Assignee: dp → scc
Status: ASSIGNED → NEW
Status: NEW → ASSIGNED
nominating for b1
Keywords: beta1
SCC: Please review this bug, and provide a summary for the current disposition 
and recommendation for beta1 status. (For example: is this an issue for dialup 
users?)
Updating QA Contact
QA Contact: paulmac → tever
Tever, what's the latest status?
Just looking into this
here is what I see, for what it's worth

Linux:  When you disconnect during a load mozilla basically hangs as originally 
described ... when you reconnect, the page will finish loading.  I cannot do a 
local file listing without a connection.  Also, I cannot seem to trigger a site 
not found or any lost connection error boxes.

NT:  I do get time out messages on NT after about 35 seconds of disconnecting 
during a page load.  Local file listings show a blank directory whether there is 
a connection or not ... I believe this is (or was) a known bug not related to 
the missing connection.

Mac: I get timeout messages after disconnecting during a page load.  Also,
I see the same blank file listing as on NT regardless of connection.

I am not set up to do a dial-up test.  Chofmann, can you check this out?
win95 build from 2/9/00
dial-up connection under win95 seems to behave pretty reasonably.
-start browser with no connection and it will spin for about 144 seconds
 trying to do a dns lookup and it uses 100% of the cpu while doing this,
 but you can stop the page load and the browser window is still responsive.
 I was also able to load the res/sample/test*.html files with no problem.
-I also started up a dial-up connection and the browser seems to pick
 up and run with it ok.  able to go to browser buster and start loading 
 pages.
-If I kill the dial-up connection out from under browser buster the pages
 stop loading but there are no hangs or crashes.  From there I could go
 and load the local test cases off the disk with file | open.
 If I start the connection back up I did need to reload the browser 
 buster page; it looked like the 30sec. refresh cycle had been broken
 with the loss of the net connection.

All the behavior seems pretty reasonable under win95. 
but we would take a fix for linux
Whiteboard: [PDT-]
Mine
Assignee: scc → dp
Status: ASSIGNED → NEW
*** Bug 27517 has been marked as a duplicate of this bug. ***
From bug 21517: "OpenTransport version: 2.0.3 if i'm NOT connected to the
internet (by isdn-"modem") while i wanted to start the mozilla m-13
(2000012601), it crashes with error number 3."
Adding crash to keywords.
Keywords: crash
I meant to mention that the reporter received the crash on Mac System 8.6
tever, dial up connections for testing available in DialUp lab.  See paw if you 
need to use.
Status: NEW → ASSIGNED
unix dns back to warren
Assignee: dp → warren
Status: ASSIGNED → NEW
What a shaggy dog story of a bug report! Here's my take on this:

- We currently have no way of detecting whether the network is available or not 
(although it is reported that Travis might know how to do this for Windows). 
Attempts so far have resulted in bringing up the dialer (which is bad if you 
just want to detect). (cc'ing travis, filing separate bug 27975)

- I've added the support in necko for a SetOffline call (bug 21835). This would 
allow the user to specify that they want to work offline, allowing the browser 
to be usable even though the network is inaccessible. However, this is not 
hooked up to the UI, nor can I find a bug that says we should do that. (cc'ing 
selmer, filing bug 27976)

- Async dns is implemented for Windows and Mac, so this failure shouldn't hang 
the UI or local file access. But the network will be inactive. Async dns for 
unix is bug 10733. Until it is fixed, any attempt to use the network without 
calling SetOffline will indefinitely hang the network (unless PR_GetHostByName 
times out, but I don't think so).

- dp suggested moving PR_GetHostByName to its own thread (for unix) so that 
even if it does hang forever, it doesn't stop other network transfers. But 
although that keeps a slow host resolution from stopping the current transfers, 
any subsequent transfers will still get delayed because all dns resolution will 
be serialized by that thread. I'm not sure this is worth doing. We need async 
dns (bug 10733).

- This is not a crash. Removing crash keyword.

- Changing platform/OS to PC/Linux since this is the only platform affected now 
(as far as I can tell).

This bug can either be fixed by fixing 10733, 27975 or 27976.
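The objection to a single dedicated DNS thread can be shown with a toy model. The sketch below uses a Python thread pool and a hypothetical `fake_resolve` that stalls on one name; with one worker, a quick lookup queued behind the stalled one has to wait for it, while with several workers it completes at once. A thread pool is only a stand-in here, not the async DNS proposed in bug 10733, and the delays are arbitrary.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_resolve(name: str) -> str:
    # Hypothetical stand-in for a blocking resolver call: one name hangs
    # (as if the network were unplugged), every other name answers quickly.
    time.sleep(0.5 if name == "unreachable.example" else 0.01)
    return name


def time_fast_lookup(workers: int) -> float:
    """Seconds a fast lookup takes when queued right behind a stalled one."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        pool.submit(fake_resolve, "unreachable.example")  # stalls first
        start = time.monotonic()
        pool.submit(fake_resolve, "www.mozilla.org").result()
        return time.monotonic() - start
```

With `workers=1` (the one-thread design) the fast lookup takes about as long as the stalled one; with `workers=4` it returns in roughly its own 10 ms.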
Keywords: crash → (removed)
OS: All → Linux
Hardware: All → PC
Summary: no network, apprunner hangs indefinitely → no network, apprunner hangs indefinitely (DNS)
Adding dependencies.
Depends on: 27975, 27976
I determined that PR_GetHostByName does indeed time out on Unix after a couple 
of minutes.
I had this same problem with our spacecraft dynamic simulator at NASA/GSFC
which runs in Linux.  At the time I thought, "I wonder how Netscape 4.x does
this?" :)
The annoying thing is there's supposed to be this environment variable
RES_OPTIONS for the resolver where you can set the timeout, but I never
could get it to affect the timeout of the resolver.  That seems like
the "correct" solution if it worked. See Linux manpage resolver(5) (actually
a BSD man page).

I fixed my problem by making the DNS lookup a separate thread.  It works well.
It will time out after a couple of minutes, but it doesn't hold anything up
while waiting.  You still have to have your own timeout for when you're going
to give up waiting for the DNS thread to respond.
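The workaround described above (resolve on a separate thread, and keep a separate deadline for how long to wait for it) might look like this in Python. `resolve_with_deadline` and its `lookup` parameter are hypothetical; the parameter exists only so a slow resolver can be substituted for testing. The abandoned thread is a daemon, so it keeps running until the system resolver gives up but blocks nothing else.

```python
import socket
import threading
from typing import Callable, Optional


def resolve_with_deadline(host: str, deadline: float,
                          lookup: Callable = socket.getaddrinfo) -> Optional[list]:
    """Resolve host on a helper thread; return None if it takes longer than deadline seconds."""
    result: dict = {}

    def worker() -> None:
        try:
            result["addrs"] = lookup(host, None)
        except OSError as exc:  # socket.gaierror is a subclass of OSError
            result["error"] = exc

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(deadline)  # our own timeout, independent of the resolver's
    if thread.is_alive():
        return None  # gave up waiting; the lookup finishes (or fails) in the background
    if "error" in result:
        raise result["error"]
    return result["addrs"]
```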
Target Milestone: M14 → M15
Target Milestone: M15 → M16
Moving to M17 which is now considered part of beta2.
Target Milestone: M16 → M17
Assigning to Gordon. Dependent on unix async dns. 
Assignee: warren → gordon
I'm estimating 1 day to verify this bug, on top of getting async DNS on Unix 
working.
Status: NEW → ASSIGNED
Keywords: beta1 → beta2
Whiteboard: [PDT-] → 1d
Keywords: nsbeta2
Keywords: beta2
Whiteboard: 1d → [nsbeta2+][5/16]1d
Putting on [nsbeta2+] radar.  But MUST complete work by 05/16 
Incidentally, I saw this just yesterday by unplugging my linux box to test
another bug.  I can do that again if you want something tested. :-)
Target Milestone: M17 → M16
RE. needing a network even to browse local files, I've noticed this with both
Netscape and Mozilla. Local files should not wait for a network timeout, nor
should connections to /dev/lo* (127.0.0.1)
Putting on [nsbeta2-] radar.   
Whiteboard: [nsbeta2+][5/16]1d → [nsbeta2-]
Let me rephrase that: at least under unix, if the address is in /etc/hosts, we
should not need a DNS lookup.
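On most Unix systems the C library already consults /etc/hosts before DNS when nsswitch.conf lists `files` ahead of `dns`, so this ordering is usually a matter of configuration. For illustration, the lookup order can be sketched as below; `parse_hosts` and `lookup` are hypothetical helpers, and `dns_lookup` stands for whatever network resolver is used as the fallback.

```python
from typing import Callable, Dict


def parse_hosts(text: str) -> Dict[str, str]:
    """Map each host name in /etc/hosts-format text to its address (first entry wins)."""
    table: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments; skip blank lines
        if not line:
            continue
        address, *names = line.split()
        for name in names:
            table.setdefault(name.lower(), address)
    return table


def lookup(name: str, hosts_text: str, dns_lookup: Callable[[str], str]) -> str:
    address = parse_hosts(hosts_text).get(name.lower())
    if address is not None:
        return address  # answered from the hosts file: no network needed
    return dns_lookup(name)  # only names missing from the file go to DNS
```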
bug 27976 is fixed: ProfileManager UI has an offline widget, but seamonkey behavior 
while in offline mode differs from 4.x (where warnings are given when trying to use 
the network) 
Sidebar does not load for Linux/Win, and the opening page is blank for all, with no 
message as to why.
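The 4.x behavior being compared (warn when the network is needed while offline, but keep local files working) can be sketched as a loader that checks an offline flag before touching the network. This is a hypothetical Python model, not necko's SetOffline; the `Loader` class, the `OfflineError` type and the set of local schemes are assumptions.

```python
from urllib.parse import urlsplit

LOCAL_SCHEMES = {"file", "about", "resource"}  # assumption: schemes served without the network


class OfflineError(Exception):
    """Raised so the UI can tell the user why nothing was loaded."""


class Loader:
    def __init__(self) -> None:
        self.offline = False

    def set_offline(self, offline: bool) -> None:
        self.offline = offline

    def route(self, url: str) -> str:
        scheme = urlsplit(url).scheme.lower()
        if scheme in LOCAL_SCHEMES:
            return "local"  # never waits on the network, offline or not
        if self.offline:
            # Fail immediately with a message instead of a blank page or a hang.
            raise OfflineError(f"working offline; {url} was not fetched")
        return "network"
```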
But what I'm saying is, even in online mode, if I have a local webserver
running, I should be able to pull out the network cable and still connect to
127.0.0.1 without any timeout, because 127.0.0.1 is defined in /etc/hosts, and
it references only local files. Gordon, do you agree ?
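Two cheap checks support this point: a numeric address such as 127.0.0.1 needs no name lookup at all, and a loopback address never leaves the machine. A minimal sketch with Python's standard `ipaddress` module; `needs_name_lookup` and `is_loopback_host` are hypothetical helpers, and treating the bare name `localhost` as loopback is an assumption.

```python
import ipaddress


def needs_name_lookup(host: str) -> bool:
    """False for numeric IPv4/IPv6 literals, which can be used without any resolver."""
    try:
        ipaddress.ip_address(host)
    except ValueError:
        return True
    return False


def is_loopback_host(host: str) -> bool:
    """True if connecting to host stays on this machine (no network cable required)."""
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return host.lower() == "localhost"  # assumption: the conventional loopback name
```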

M16 has been out for a while now; these bugs' target milestones need to be 
updated.
is this still happening? tom can you verify? If it is then we have to consider 
for nsbeta3
Keywords: qawanted
Target Milestone: M16 → M18
Tom, can you verify if this still occurs?  Thanks.
Assignee: gordon → tever
Status: ASSIGNED → NEW
my network was down last night and mozilla still worked
locally on linux.
*** Bug 47999 has been marked as a duplicate of this bug. ***
Gordon / Gagan,  Yes, this is still happening on Linux.  Browser doesn't launch 
until network cable is connected.  So you cannot get to offline mode.  Checked 
8/24 build on redhat 6.0.
Assignee: tever → gagan
Keywords: nsbeta3
Same here -- doesn't launch on linux when I unplug from the network.  This is
blocking my testing of a fix for bug 36082.
Blocks: 36082
approving for beta3.
Assignee: gagan → gordon
Whiteboard: [nsbeta2-] → [nsbeta2-][nsbeta3+]
this seems to work just fine for me. mozilla comes up fine and sits there
forever waiting for connection but otherwise it's usable. I can browse local
files in another window. Based on this info I am going to have to give this a
minus for beta3. 
Whiteboard: [nsbeta2-][nsbeta3+] → [nsbeta2-][nsbeta3-]
Gagan, were you running a completely local copy of mozilla?  Would there be a 
problem if some support files were on a mounted drive and couldn't be loaded on 
launch? 
That's what was happening to me, I think.  My homedir is NFS mounted, and I
couldn't separate that from the mozilla run, so mozilla timed out trying to load
my profile.  But I tried it on my home machine (with no network connection
active) this morning, and indeed I was able to run both the browser and the
editor pointed at a local file.
so then, would this be the expected behavior of mozilla if you are trying to 
access an NFS mounted dir and the connection was dropped?  I am thinking this 
bug is fixed right now and that no software is going to work that is set up this 
way.  
Yes, that's probably true -- that there's no way we'll get around the NFS hangs.
ok thanks, marking this invalid based on the discussion.  

Status: NEW → RESOLVED
Closed: 25 years ago → 24 years ago
Resolution: --- → INVALID
verif.
Status: RESOLVED → VERIFIED
Summary: no network, apprunner hangs indefinitely (DNS) → Conn: no network, apprunner hangs indefinitely (DNS)
Keywords: qawanted → (removed)
No longer depends on: 27975