Closed
Bug 260832
Opened 20 years ago
Closed 20 years ago
DNS name resolution fails under WinXP-SP2 when max persistent connections bumped up to 100
Categories
(Core :: Networking, defect)
Tracking
()
RESOLVED
INVALID
Future
People
(Reporter: lisanke, Assigned: darin.moz)
Details
(Keywords: helpwanted)
Attachments
(2 files, 1 obsolete file)
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.3) Gecko/20040910 MultiZilla/1.6.4.0b Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.3) Gecko/20040910 MultiZilla/1.6.4.0b First I noted many 'DNS problem' bugs in bugzilla; it is not clear which are open, at my level, with related symptoms. Hopefully, an expert (developer) will dupe this to a known problem (with a workaround or fix). Mozilla on XP has DNS resolution trouble with particular specific 'intranet' URLs (most all other URL resolve). Mozilla on Win2K (same DNS) has problems resolving all (most) any URL. NSLOOKUP of site-domain works. Use of the returned dotted IP address in-place of the site-domain allows Mozilla to browse normally (except page referenced relative links fail to resolve). Reproducible: Sometimes Steps to Reproduce: 1.(XP) Select w3.ibm.com (specific intranet URL). 2. Wait for 'site' not found alert. 1. (W2K) Select www.yahoo.com (or any other). 2. Wait for 'site' not found alert. Actual Results: Alert: www.yahoo.com could not be found. Check the name and try again. Expected Results: resolve browser to yahoo.
Assignee | ||
Comment 1•20 years ago
|
||
Hi, can you please generate a Mozilla log file for us? Here's what you'd need to do: Open a DOS prompt and set the following environment variables: C:\> set NSPR_LOG_MODULES=nsHostResolver:5 C:\> set NSPR_LOG_FILE=c:\log.txt Now, in the same DOS prompt, 'cd' into the directory containing Mozilla, and then run it from the DOS prompt. Something like this: C:\> cd \path\containing\mozilla C:\path\containing\mozilla\> .\mozilla.exe Now, reproduce the problem, and close Mozilla. Then upload c:\log.txt to this bug report using the "Create a New Attachment" link. Thanks!!
Reporter | ||
Comment 2•20 years ago
|
||
Darin, On my W2K machine I found I had 1.6; I tried the logging you suggest; no log after running with your setup. I installed 1.7.3 and repeated; no log after setup. I will reproduce failure of C:\> set NSPR_LOG_MODULES=nsHostResolver:5 C:\> set NSPR_LOG_FILE=c:\log.txt to produce a logfile, when running mozilla from same DOS session, on the XP machine next. I don't know what, but I hope it means something, that no logging was produced. I'll comment again after restarting browser (after test on (primary) XP machine). Regards, Mike
Reporter | ||
Comment 3•20 years ago
|
||
Darin, OK, back from running experiment on XP Mozilla; again no log.txt was produced. Something I should have said in the last comment; on W2K a 0 bytes c:\log.txt was produced; on XP nothing. If this is not expected; maybe their is a trival error in the SET syntax (version issue)? Just a guess. Regards, Mike
Reporter | ||
Comment 4•20 years ago
|
||
Darin, I read your note on Logging in Mozilla's online debug documents, which indicates specifically "if you have a nightly trunk build of Mozilla, then you can set some environment variables before running Mozilla to capture a log of HTTP activity." This implies a 'nightly trunk build' of Mozilla. Is a nightly build required to get debug logging? Should I install one of these (on W2K backup), and which? Regards, Mike
Comment 5•20 years ago
|
||
All recent release builds should do logging but it`s possible that "nsHostResolver:5" is only availible in a debug build..
I'm having the same problem as described here. The recommended log file also produced nothing for me. If I set it to "all:5" I get a big log, of course, but I'm not sure that's useful. Should I post it? The easiest way I can repro this is to go to Google News and Ctrl-Click a pile of links (say 10-12) to generate new tabs. This is with the latest nightly (20040917), so provisos re: nightly builds shouldn't apply in this case. In case it's relevant, I have several extensions installed, some of which might be relevant, including: LiveHTTPHeaders, TabbrowserExtensions, and AdBlock. I used to think that this or a bug similar to this was related to pipelining, but I turned that back off a long time ago (and have left it disabled), and while it seemed to help at the time, it doesn't seem to matter for this issue.
Reporter | ||
Comment 7•20 years ago
|
||
Ray, Thanks for the comment, and idea to use 'all:5' to catch some log of the problem. I did a quick test (bring up mozilla to my home page; fails on W2K machine) and terminate immediately after fail. I produced a log (trimmed of some redundant lines which I attach to end of message). Interesting thing, with logging as all:5 the failure seemed to occur (present) faster, than without. Regards, Mike ---------------------------big logfile attached here---------------------------- 0[233e10]: Loaded library Executable (init) 0[233e10]: Loaded library ws2_32.dll (load lib) 0[233e10]: Unloaded library ws2_32.dll 0[233e10]: Loaded library C:\Program Files\Common Files\mozilla.org\GRE\1.7.3_2004091008\xpcom.dll (load lib) 0[233e10]: C:\Program Files\Common Files\mozilla.org\GRE\1.7.3_2004091008\xpcom.dll incr => 2 (find lib) 0[233e10]: C:\Program Files\Common Files\mozilla.org\GRE\1.7.3_2004091008\xpcom.dll decr => 1 0[233e10]: read -> 161084 0[233e10]: read -> 91163 0[233e10]: Loaded library C:\Program Files\Common Files\mozilla.org\GRE\1.7.3_2004091008\components\embedcomponents.dll (load lib) 0[233e10]: Loaded library C:\Program Files\mozilla.org\Mozilla\components\qfaservices.dll (load lib) 0[233e10]: Loaded library C:\Program Files\mozilla.org\Mozilla\components\typeaheadfind.dll (load lib) 0[233e10]: Loaded library C:\Program Files\Common Files\mozilla.org\GRE\1.7.3_2004091008\components\xppref32.dll (load lib) 0[233e10]: Loaded library C:\Program Files\Common Files\mozilla.org\GRE\1.7.3_2004091008\components\necko.dll (load lib) 0[233e10]: read -> 88 ***************************************60 read-xxxx lines removed Mike Lisanke ************************ 0[233e10]: read -> 3416 0[233e10]: read -> 0 0[233e10]: Loaded library C:\Program Files\Common Files\mozilla.org\GRE\1.7.3_2004091008\components\gklayout.dll (load lib) 0[233e10]: Loaded library C:\Program Files\Common Files\mozilla.org\GRE\1.7.3_2004091008\components\caps.dll (load lib) 0[233e10]: Loaded library C:\Program Files\Common Files\mozilla.org\GRE\1.7.3_2004091008\components\xpc3250.dll (load lib) 0[233e10]: read -> 7424 0[233e10]: read -> 2543 0[233e10]: connect -> 0 0[233e10]: Loaded library C:\Program Files\Common Files\mozilla.org\GRE\1.7.3_2004091008\components\i18n.dll (load lib) 1204[edc160]: nsSocketTransportService::Run 1204[edc160]: calling PR_Poll [active=0 idle=0] 0[233e10]: Loaded library C:\Program Files\Common Files\mozilla.org\GRE\1.7.3_2004091008\components\imglib2.dll (load lib) 0[233e10]: Loaded library C:\Program Files\Common Files\mozilla.org\GRE\1.7.3_2004091008\components\rdf.dll (load lib) 0[233e10]: Loaded library C:\Program Files\mozilla.org\Mozilla\components\msgMapi.dll (load lib) 0[233e10]: Loaded library C:\Program Files\mozilla.org\Mozilla\components\jsd3250.dll (load lib) 0[233e10]: Loaded library C:\Program Files\mozilla.org\Mozilla\components\palmsync.dll (load lib) 0[233e10]: Loaded library C:\Program Files\Common Files\mozilla.org\GRE\1.7.3_2004091008\components\appshell.dll (load lib) 0[233e10]: Loaded library C:\Program Files\Common Files\mozilla.org\GRE\1.7.3_2004091008\components\gkwidget.dll (load lib) 0[233e10]: Loaded library C:\Program Files\Common Files\mozilla.org\GRE\1.7.3_2004091008\components\profile.dll (load lib) 0[233e10]: Loaded library C:\Program Files\Common Files\mozilla.org\GRE\1.7.3_2004091008\components\xpcom_compat_c.dll (load lib) 0[233e10]: read -> 2836 0[233e10]: read -> 0 0[233e10]: Loaded library C:\Program Files\Common Files\mozilla.org\GRE\1.7.3_2004091008\components\chrome.dll (load lib) 0[233e10]: Loaded library C:\Program Files\Common Files\mozilla.org\GRE\1.7.3_2004091008\components\gkparser.dll (load lib) 0[233e10]: Loaded library C:\Program Files\Common Files\mozilla.org\GRE\1.7.3_2004091008\components\uconv.dll (load lib) 0[233e10]: Loaded library C:\Program Files\Common Files\mozilla.org\GRE\1.7.3_2004091008\components\docshell.dll (load lib) 0[233e10]: read -> 4096 0[233e10]: read -> 4096 ***************************************10 read-xxxx lines removed Mike Lisanke ************************ 0[233e10]: read -> 12145 0[233e10]: read -> 8634 0[233e10]: Loaded library C:\Program Files\Common Files\mozilla.org\GRE\1.7.3_2004091008\components\gkgfxwin.dll (load lib) 0[233e10]: Loaded library C:\Program Files\Common Files\mozilla.org\GRE\1.7.3_2004091008\components\pipboot.dll (load lib) 0[233e10]: Loaded library C:\Program Files\Common Files\mozilla.org\GRE\1.7.3_2004091008\components\oji.dll (load lib) 0[233e10]: Loaded library C:\Program Files\Common Files\mozilla.org\GRE\1.7.3_2004091008\components\gkplugin.dll (load lib) 0[233e10]: NPN Logging Active! 0[233e10]: General Plugin Logging Active! (nsPluginHostImpl::ctor) 0[233e10]: NPP Logging Active! 0[233e10]: nsPluginHostImpl::ctor 0[233e10]: read -> 225 0[233e10]: LoadCachedPluginsInfo : Loading Cached plugininfo for C:\Program Files\mozilla.org\Mozilla\plugins\npnul32.dll 0[233e10]: nsPluginHostImpl::ScanPluginsDirectory dir=C:\Program Files\mozilla.org\Mozilla\plugins 0[233e10]: nsPluginTag::RegisterWithCategoryManager plugin=C:\Program Files\mozilla.org\Mozilla\plugins\npnul32.dll, removing = no 0[233e10]: nsPluginHostImpl::GetPluginFactory End mime=application/x-java-vm, rv=-2147467259, plugin=0 name=(not found) 0[233e10]: read -> 1375 0[233e10]: read -> 55805 0[233e10]: read -> 7689 ***************************************10 read-xxxx lines removed Mike Lisanke ************************ 0[233e10]: read -> 1282 0[233e10]: read -> 226 0[233e10]: Loaded library C:\Program Files\Common Files\mozilla.org\GRE\1.7.3_2004091008\components\cookie.dll (load lib) 0[233e10]: read -> 409 0[233e10]: read -> 0 0[233e10]: Loaded library C:\Program Files\Common Files\mozilla.org\GRE\1.7.3_2004091008\components\webbrwsr.dll (load lib) 0[233e10]: Loaded library C:\Program Files\mozilla.org\Mozilla\components\appcomps.dll (load lib) 0[233e10]: Creating nsHttpHandler [this=e70938]. 0[233e10]: nsHttpHandler::Init 0[233e10]: nsHttpHandler::PrefsChanged [pref=(null)] 0[233e10]: read -> 4096 0[233e10]: read -> 1752 0[233e10]: read -> 0 0[233e10]: Loaded library C:\Program Files\Common Files\mozilla.org\GRE\1.7.3_2004091008\components\jar50.dll (load lib) 0[233e10]: read -> 1024 0[233e10]: read -> 4096 ***************************************15 read-xxxx lines removed Mike Lisanke ************************ 0[233e10]: read -> 1673 0[233e10]: read -> 0 0[233e10]: read -> 30 0[233e10]: read -> 178 0[233e10]: nsHttpAuthCache::Init 0[233e10]: Creating nsHttpConnectionMgr @140f848 0[233e10]: nsHttpConnectionMgr::Init 0[233e10]: nsHttpHandler::StartPruneDeadConnectionsTimer 0[233e10]: read -> 32 0[233e10]: read -> 0 0[233e10]: read -> 5312 0[233e10]: read -> 8192 0[233e10]: read -> 16 0[233e10]: read -> 8172 0[233e10]: read -> 8192 0[233e10]: read -> 8192 ***************************************220 read-xxxx lines removed Mike Lisanke ************************ 0[233e10]: read -> 0 0[233e10]: read -> 30 0[233e10]: read -> 3946 0[233e10]: read -> 498 1156[140a380]: read -> 30 1156[140a380]: read -> 44 1156[140a380]: read -> 30 1156[140a380]: read -> 57 1156[140a380]: read -> 30 1156[140a380]: read -> 57 1156[140a380]: read -> 30 1156[140a380]: read -> 56 0[233e10]: read -> 4096 0[233e10]: read -> 4096 0[233e10]: read -> 3066 0[233e10]: read -> 0 1156[140a380]: read -> 30 1156[140a380]: read -> 56 0[233e10]: read -> 30 0[233e10]: read -> 8192 ***************************************70 read-xxxx lines removed Mike Lisanke ************************ 0[233e10]: read -> 8192 0[233e10]: read -> 8192 0[233e10]: read -> 8192 0[233e10]: read -> 8192 0[233e10]: read -> 1883 0[233e10]: read -> 0 0[233e10]: read -> 4533 1156[140a380]: read -> 1024 1156[140a380]: read -> 2566 1156[140a380]: read -> 0 1156[140a380]: read -> 30 1156[140a380]: read -> 2118 0[233e10]: read -> 8192 0[233e10]: read -> 8192 ***************************************60 read-xxxx lines removed Mike Lisanke ************************ 0[233e10]: read -> 2132 0[233e10]: read -> 30 0[233e10]: read -> 4498 0[233e10]: read -> 8192 0[233e10]: read -> 8192 0[233e10]: read -> 2759 0[233e10]: read -> 0 0[233e10]: Loaded library C:\Program Files\mozilla.org\Mozilla\components\txmgr.dll (load lib) 0[233e10]: read -> 30 0[233e10]: read -> 3202 0[233e10]: read -> 30 0[233e10]: read -> 4096 ***************************************20 read-xxxx lines removed Mike Lisanke ************************ 0[233e10]: read -> 4096 0[233e10]: read -> 4096 0[233e10]: read -> 1010 1156[140a380]: read -> 30 1156[140a380]: read -> 68 1156[140a380]: read -> 30 1156[140a380]: read -> 53 0[233e10]: read -> 30 0[233e10]: read -> 4096 ***************************************10 read-xxxx lines removed Mike Lisanke ************************ 0[233e10]: read -> 4096 0[233e10]: read -> 4096 0[233e10]: read -> 4096 0[233e10]: read -> 62 1156[140a380]: read -> 30 1156[140a380]: read -> 2300 0[233e10]: read -> 30 0[233e10]: read -> 4096 0[233e10]: read -> 4096 0[233e10]: read -> 4096 0[233e10]: read -> 4096 0[233e10]: read -> 1328 ***************************************10 read-xxxx lines removed Mike Lisanke ************************ 0[233e10]: read -> 4096 0[233e10]: read -> 4096 0[233e10]: read -> 4096 0[233e10]: read -> 4096 0[233e10]: read -> 4096 0[233e10]: read -> 2024 1156[140a380]: read -> 30 1156[140a380]: read -> 65 1156[140a380]: read -> 30 1156[140a380]: read -> 2228 0[233e10]: read -> 30 0[233e10]: read -> 787 ***************************************20 read-xxxx lines removed Mike Lisanke ************************ 0[233e10]: read -> 2730 0[233e10]: read -> 30 0[233e10]: read -> 8192 0[233e10]: read -> 536 0[233e10]: read -> 4096 0[233e10]: read -> 4096 0[233e10]: read -> 1156 0[233e10]: read -> 4096 0[233e10]: read -> 1790 1156[140a380]: read -> 30 1156[140a380]: read -> 178 0[233e10]: Loaded library C:\Program Files\Common Files\mozilla.org\GRE\1.7.3_2004091008\components\editor.dll (load lib) 1156[140a380]: read -> 30 1156[140a380]: read -> 217 1156[140a380]: read -> 30 1156[140a380]: read -> 1480 0[233e10]: read -> 30 0[233e10]: read -> 488 1156[140a380]: read -> 30 0[233e10]: read -> 30 0[233e10]: read -> 3937 1156[140a380]: read -> 165 1156[140a380]: read -> 30 1156[140a380]: read -> 200 1156[140a380]: read -> 30 1156[140a380]: read -> 51 0[233e10]: read -> 30 0[233e10]: read -> 978 0[233e10]: read -> 30 0[233e10]: read -> 4096 ***************************************20 read-xxxx lines removed Mike Lisanke ************************ 0[233e10]: read -> 4096 0[233e10]: read -> 4096 0[233e10]: read -> 4096 0[233e10]: read -> 1257 0[233e10]: read -> 30 0[233e10]: read -> 3826 1156[140a380]: read -> 30 1156[140a380]: read -> 1030 1332[19b97b0]: read -> 30 1332[19b97b0]: read -> 112 1316[19b9180]: read -> 30 1316[19b9180]: read -> 67 0[233e10]: read -> 30 0[233e10]: read -> 4096 0[233e10]: read -> 4096 0[233e10]: read -> 4096 0[233e10]: read -> 2415 1332[19b97b0]: read -> 30 1332[19b97b0]: read -> 941 1316[19b9180]: read -> 30 1316[19b9180]: read -> 929 1156[140a380]: read -> 30 1156[140a380]: read -> 932 1332[19b97b0]: read -> 30 1332[19b97b0]: read -> 132 1316[19b9180]: read -> 30 1316[19b9180]: read -> 214 0[233e10]: read -> 30 0[233e10]: read -> 1827 0[233e10]: read -> 30 0[233e10]: read -> 3194 1156[140a380]: read -> 30 1332[19b97b0]: read -> 30 1332[19b97b0]: read -> 163 1156[140a380]: read -> 908 0[233e10]: read -> 2257 0[233e10]: SecureUI:1acf128: Init 0[233e10]: SecureUI:1acf128: OnStateChange: progress: for toplevel 0[233e10]: SecureUI:1acf128: OnStateChange 0[233e10]: SecureUI:1acf128: 1ab89f4 18ad4e0 OnStateChange 10001 about:layout-dummy-request 0[233e10]: SecureUI:1acf128: OnStateChange: not relevant for sub content 0[233e10]: SecureUI:1acf128: OnStateChange: progress: for toplevel 0[233e10]: SecureUI:1acf128: OnStateChange 0[233e10]: SecureUI:1acf128: 1ab89f4 1aaaa20 OnStateChange 30004 about:blank 0[233e10]: SecureUI:1acf128: OnStateChange: not relevant for sub content 0[233e10]: SecureUI:1acf128: OnStateChange: progress: for toplevel 0[233e10]: SecureUI:1acf128: OnStateChange 0[233e10]: SecureUI:1acf128: 1ab89f4 18ad4e0 OnStateChange 10004 about:layout-dummy-request 0[233e10]: SecureUI:1acf128: OnStateChange: not relevant for sub content 0[233e10]: SecureUI:1acf128: OnStateChange: progress: for toplevel 0[233e10]: SecureUI:1acf128: OnStateChange 0[233e10]: SecureUI:1acf128: 1ab89f4 18ad4e0 OnStateChange 10010 about:layout-dummy-request 0[233e10]: SecureUI:1acf128: OnStateChange: not relevant for sub content 0[233e10]: SecureUI:1acf128: OnStateChange: progress: for toplevel 0[233e10]: SecureUI:1acf128: OnStateChange 0[233e10]: SecureUI:1acf128: 1ab89f4 1aaaa20 OnStateChange 10010 about:blank 0[233e10]: SecureUI:1acf128: OnStateChange: SOMETHING STOPS FOR TOPMOST DOCUMENT 0[233e10]: SecureUI:1acf128: OnStateChange: not relevant for sub content 0[233e10]: SecureUI:1acf128: OnStateChange: progress: for toplevel 0[233e10]: SecureUI:1acf128: OnStateChange 0[233e10]: SecureUI:1acf128: 1ab89f4 1aaaa20 OnStateChange 20010 about:blank 0[233e10]: SecureUI:1acf128: OnStateChange: not relevant for sub content 0[233e10]: SecureUI:1acf128: OnStateChange: progress: for toplevel 0[233e10]: SecureUI:1acf128: OnStateChange 0[233e10]: SecureUI:1acf128: 1ab89f4 1aaaa20 OnStateChange c0010 about:blank 0[233e10]: SecureUI:1acf128: OnStateChange: not relevant for sub content 0[233e10]: read -> 3095 1316[19b9180]: read -> 30 1316[19b9180]: read -> 160 0[233e10]: read -> 30 0[233e10]: read -> 2725 0[233e10]: read -> 5287 0[233e10]: nsHttpHandler::NewURI 0[233e10]: nsHttpHandler::NewProxiedChannel [proxyInfo=0] 0[233e10]: Creating nsHttpChannel @1a176a8 0[233e10]: nsHttpChannel::Init [this=1a176a8] 0[233e10]: host=www.yahoo.com port=-1 0[233e10]: uri=http://www.yahoo.com/ 0[233e10]: Creating nsHttpConnectionInfo @1afbb98 0[233e10]: nsHttpHandler::AddStandardRequestHeaders 0[233e10]: nsHttpHandler::BuildUserAgent 0[233e10]: nsHttpChannel::AsyncOpen [this=1a176a8] 0[233e10]: read -> 518 0[233e10]: read -> 0 0[233e10]: nsHttpHandler::NotifyObservers [chan=1a176a8 event="http-on-modify-request"] 0[233e10]: SecureUI:1acf128: OnStateChange: progress: for toplevel 0[233e10]: SecureUI:1acf128: OnStateChange 0[233e10]: SecureUI:1acf128: 1ab89f4 1a176a8 OnStateChange 70001 http://www.yahoo.com/ 0[233e10]: SecureUI:1acf128: OnStateChange: SOMETHING STARTS FOR TOPMOST DOCUMENT 0[233e10]: SecureUI:1acf128: OnStateChange: start for toplevel document 0[233e10]: SecureUI:1acf128: OnStateChange: ++mDocumentRequestsInProgress 0[233e10]: nsHttpChannel::Connect [this=1a176a8] 0[233e10]: nsHttpChannel::OpenCacheEntry [this=1a176a8] 1332[19b97b0]: read -> 30 1332[19b97b0]: read -> 4096 1332[19b97b0]: read -> 4096 1332[19b97b0]: read -> 4096 1332[19b97b0]: read -> 4096 1332[19b97b0]: read -> 4096 1332[19b97b0]: read -> 3423 0[233e10]: read -> 135168 0[233e10]: read -> 4096 0[233e10]: read -> 4096 0[233e10]: read -> 4096 0[233e10]: write -> 4096 0[233e10]: got cache entry [access=2] 0[233e10]: nsHTTPChannel::CheckCache [this=1a176a8 entry=1b24840] 0[233e10]: nsHttpChannel::AddAuthorizationHeaders? [this=1a176a8] 0[233e10]: nsHttpAuthCache::GetAuthEntryForPath [key=http://www.yahoo.com:80 path=/] 0[233e10]: nsHttpChannel::SetupTransaction [this=1a176a8] 0[233e10]: Creating nsHttpTransaction @1b21538 0[233e10]: nsHttpTransaction::Init [this=1b21538 caps=1] 0[233e10]: http request [ 0[233e10]: GET / HTTP/1.1 0[233e10]: Host: www.yahoo.com 0[233e10]: User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7.3) Gecko/20040910 0[233e10]: Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,image/jpeg,image/gif;q=0.2,*/*;q=0.1 0[233e10]: Accept-Language: en-us,en;q=0.5 0[233e10]: Accept-Encoding: gzip,deflate 0[233e10]: Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7 0[233e10]: Keep-Alive: 300 0[233e10]: Connection: keep-alive 0[233e10]: ] 0[233e10]: nsHttpConnectionMgr::AddTransaction [trans=1b21538] 0[233e10]: nsSocketTransportService::PostEvent [event=1ad8ca0] 0[233e10]: send: fd=ee2918 osfd=612 buf=60f77564 amount=1 1204[edc160]: recv: fd=ee2888 osfd=640 buf=13dfb34 amount=1024 flags=0 1204[edc160]: recv -> 1, error = 0, os error = 0 0[233e10]: send -> 1 1204[edc160]: nsHttpConnectionMgr::OnMsgNewTransaction [trans=1b21538] 1204[edc160]: nsHttpConnectionMgr::GetConnection [ci=..www.yahoo.com:80 caps=1] 1204[edc160]: nsHttpConnectionMgr::AtActiveConnectionLimit [ci=..www.yahoo.com:80 caps=1] 1204[edc160]: total=0, persist=0 1204[edc160]: Creating nsHttpConnection @1b24d88 1204[edc160]: nsHttpConnection::Init [this=1b24d88] 1204[edc160]: nsHttpConnectionMgr::DispatchTransaction [ci=..www.yahoo.com:80 trans=1b21538 caps=1 conn=1b24d88] 1204[edc160]: nsHttpConnection::Activate [this=1b24d88 trans=1b21538 caps=1] 1204[edc160]: creating nsSocketTransport @1b24678 1204[edc160]: nsSocketTransport::Init [this=1b24678 host=www.yahoo.com:80 proxy=:0] 1204[edc160]: nsSocketTransport::OpenOutputStream [this=1b24678 flags=2] 1204[edc160]: nsSocketTransport::PostEvent [this=1b24678 type=0 status=0 param=0] 1204[edc160]: nsSocketTransportService::PostEvent [event=1b24e68] 1204[edc160]: send: fd=ee2918 osfd=612 buf=60f77564 amount=1 1204[edc160]: send -> 1 1204[edc160]: nsSocketTransport::OpenInputStream [this=1b24678 flags=2] 1204[edc160]: nsSocketTransport::PostEvent [this=1b24678 type=0 status=0 param=0] 1204[edc160]: nsSocketTransportService::PostEvent [event=1b24ea0] 1204[edc160]: send: fd=ee2918 osfd=612 buf=60f77564 amount=1 1204[edc160]: send -> 1 1204[edc160]: nsSocketOutputStream::AsyncWait [this=1b24728] 1204[edc160]: calling PR_Poll [active=0 idle=0] 1204[edc160]: recv: fd=ee2888 osfd=640 buf=13dfb34 amount=1024 flags=0 1204[edc160]: recv -> 2, error = 0, os error = 0 1204[edc160]: nsSocketTransport::OnSocketEvent [this=1b24678 type=0 status=0 param=0] 1204[edc160]: MSG_ENSURE_CONNECT 1204[edc160]: nsSocketTransport::ResolveHost [this=1b24678] 1204[edc160]: advancing to STATE_RESOLVING 1204[edc160]: nsSocketTransport::SendStatus [this=1b24678 status=804b0003] 1204[edc160]: nsHttpTransaction::OnSocketStatus [this=1b21538 status=804b0003 progress=0] 1204[edc160]: nsSocketTransport::OnSocketEvent [this=1b24678 type=0 status=0 param=0] 1204[edc160]: MSG_ENSURE_CONNECT 1204[edc160]: ignoring redundant event 1204[edc160]: calling PR_Poll [active=0 idle=0] 292[1b24868]: thread exiting 0[233e10]: read -> 1666 0[233e10]: read -> 645 0[233e10]: read -> 30 0[233e10]: read -> 296 0[233e10]: read -> 1512 0[233e10]: read -> 30 0[233e10]: read -> 643 0[233e10]: read -> 6515 0[233e10]: read -> 408 0[233e10]: read -> 30 0[233e10]: read -> 514 0[233e10]: sending status notification [this=1a176a8 status=804b0003 progress=0/0] 0[233e10]: read -> 30 0[233e10]: read -> 2695 0[233e10]: read -> 2139 0[233e10]: read -> 3539 0[233e10]: read -> 261112 1000[1815ff8]: nsSocketTransport::PostEvent [this=1b24678 type=1 status=804b001e param=0] 1000[1815ff8]: nsSocketTransportService::PostEvent [event=2004830] 1000[1815ff8]: send: fd=ee2918 osfd=612 buf=60f77564 amount=1 1204[edc160]: recv: fd=ee2888 osfd=640 buf=13dfb34 amount=1024 flags=0 1204[edc160]: recv -> 1, error = -5987, os error = 0 1000[1815ff8]: send -> 1 1204[edc160]: nsSocketTransport::OnSocketEvent [this=1b24678 type=1 status=804b001e param=0] 1204[edc160]: MSG_DNS_LOOKUP_COMPLETE 1204[edc160]: after event [this=1b24678 cond=804b001e] 1204[edc160]: nsSocketTransport::OnSocketDetached [this=1b24678 cond=804b001e] 1204[edc160]: nsSocketTransport::RecoverFromError [this=1b24678 state=2 cond=804b001e] 1204[edc160]: nsSocketInputStream::OnSocketReady [this=1b2470c cond=804b001e] 1204[edc160]: nsSocketOutputStream::OnSocketReady [this=1b24728 cond=804b001e] 1204[edc160]: nsHttpConnection::OnSocketWritable [this=1b24d88] 1204[edc160]: writing transaction request stream 1204[edc160]: nsSocketOutputStream::Write [this=1b24728 count=414] 1204[edc160]: ReadSegments returned [rv=0 read=0 sock-cond=804b001e] 1204[edc160]: nsHttpConnection::CloseTransaction[this=1b24d88 trans=1b21538 reason=804b001e] 1204[edc160]: nsHttpTransaction::Close [this=1b21538 reason=804b001e] 1204[edc160]: nsHttpConnectionMgr::ReclaimConnection [conn=1b24d88] 1204[edc160]: nsSocketTransportService::PostEvent [event=1ba0430] 1204[edc160]: send: fd=ee2918 osfd=612 buf=60f77564 amount=1 1204[edc160]: send -> 1 1204[edc160]: nsHttpConnection::Close [this=1b24d88 reason=804b001e] 1204[edc160]: nsSocketInputStream::CloseWithStatus [this=1b2470c reason=804b001e] 1204[edc160]: nsSocketOutputStream::CloseWithStatus [this=1b24728 reason=804b001e] 1204[edc160]: calling PR_Poll [active=0 idle=0] 1204[edc160]: recv: fd=ee2888 osfd=640 buf=13dfb34 amount=1024 flags=0 1204[edc160]: recv -> 1, error = -5987, os error = 0 1204[edc160]: nsHttpConnectionMgr::OnMsgReclaimConnection [conn=1b24d88] 1204[edc160]: connection cannot be reused; closing connection 1204[edc160]: nsHttpConnection::Close [this=1b24d88 reason=80004004] 1204[edc160]: nsSocketInputStream::CloseWithStatus [this=1b2470c reason=80004004] 1204[edc160]: nsSocketOutputStream::CloseWithStatus [this=1b24728 reason=80004004] 1204[edc160]: nsHttpConnectionMgr::OnMsgProcessPendingQ [ci=..www.yahoo.com:80] 1204[edc160]: nsHttpConnectionMgr::ProcessPendingQForEntry [ci=..www.yahoo.com:80] 1204[edc160]: nsHttpConnectionMgr::ProcessPendingQForEntry [ci=..www.yahoo.com:80] 1204[edc160]: Destroying nsHttpConnection @1b24d88 1204[edc160]: nsSocketOutputStream::CloseWithStatus [this=1b24728 reason=80470002] 1204[edc160]: nsSocketInputStream::CloseWithStatus [this=1b2470c reason=80470002] 1204[edc160]: calling PR_Poll [active=0 idle=0] 1000[1815ff8]: destroying nsSocketTransport @1b24678 0[233e10]: nsHttpChannel::OnStartRequest [this=1a176a8 request=1a3b410 status=804b001e] 0[233e10]: calling mListener->OnStartRequest 0[233e10]: nsHttpChannel::OnStopRequest [this=1a176a8 request=1a3b410 status=804b001e] 0[233e10]: nsHttpTransaction::DeleteSelfOnConsumerThread [this=1b21538] 0[233e10]: Destroying nsHttpTransaction @1b21538 0[233e10]: nsHttpChannel::FinalizeCacheEntry [this=1a176a8] 0[233e10]: calling OnStopRequest 0[233e10]: nsHttpChannel::CloseCacheEntry [this=1a176a8 status=804b001e] 0[233e10]: dooming cache entry!! 0[233e10]: SecureUI:1acf128: OnStateChange: progress: for toplevel 0[233e10]: SecureUI:1acf128: OnStateChange 0[233e10]: SecureUI:1acf128: 1ab89f4 1a176a8 OnStateChange 10010 http://www.yahoo.com/ 0[233e10]: SecureUI:1acf128: OnStateChange: SOMETHING STOPS FOR TOPMOST DOCUMENT 0[233e10]: SecureUI:1acf128: OnStateChange: --mDocumentRequestsInProgress 0[233e10]: SecureUI:1acf128: OnStateChange: progress: for toplevel 0[233e10]: SecureUI:1acf128: OnStateChange 0[233e10]: SecureUI:1acf128: 1ab89f4 1a176a8 OnStateChange 20010 http://www.yahoo.com/ 0[233e10]: nsHttpHandler::NewURI 0[233e10]: read -> 30 0[233e10]: read -> 2695 ***************************************10 read-xxxx lines removed Mike Lisanke ************************ 0[233e10]: read -> 3265 0[233e10]: read -> 30 0[233e10]: read -> 2688 0[233e10]: read -> 30 0[233e10]: read -> 2668 0[233e10]: read -> 30 0[233e10]: read -> 167 1156[140a380]: read -> 30 1156[140a380]: read -> 226 0[233e10]: SecureUI:1acf128: OnStateChange: progress: for toplevel 0[233e10]: SecureUI:1acf128: OnStateChange 0[233e10]: SecureUI:1acf128: 1ab89f4 1a176a8 OnStateChange c0010 http://www.yahoo.com/ 0[233e10]: write -> 4096 0[233e10]: write -> 4096 ***************************************60 write-xxxx lines removed Mike Lisanke ************************ 0[233e10]: write -> 4096 0[233e10]: write -> 3064 0[233e10]: nsPluginHostImpl::Destroy Called 0[233e10]: nsPluginTag::RegisterWithCategoryManager plugin=C:\Program Files\mozilla.org\Mozilla\plugins\npnul32.dll, removing = yes 0[233e10]: write -> 2836 0[233e10]: nsHttpHandler::Observe [topic="profile-change-net-teardown"] 0[233e10]: nsHttpHandler::StopPruneDeadConnectionsTimer 0[233e10]: nsHttpAuthCache::ClearAll 0[233e10]: nsHttpConnectionMgr::Shutdown 0[233e10]: nsSocketTransportService::PostEvent [event=1aaa800] 0[233e10]: send: fd=ee2918 osfd=612 buf=60f77564 amount=1 1204[edc160]: recv: fd=ee2888 osfd=640 buf=13dfb34 amount=1024 flags=0 1204[edc160]: recv -> 1, error = -5987, os error = 0 0[233e10]: send -> 1 1204[edc160]: nsHttpConnectionMgr::OnMsgShutdown 1204[edc160]: calling PR_Poll [active=0 idle=0] 1000[1815ff8]: thread exiting 0[233e10]: nsSocketTransportService::Shutdown 0[233e10]: send: fd=ee2918 osfd=612 buf=60f77564 amount=1 1204[edc160]: recv: fd=ee2888 osfd=640 buf=13dfb34 amount=1024 flags=0 1204[edc160]: recv -> 1, error = -5987, os error = 0 0[233e10]: send -> 1 1204[edc160]: shutting down socket transport thread... 1204[edc160]: thread exiting 0[233e10]: Destroying nsHttpChannel @1a176a8 0[233e10]: Destroying nsHttpConnectionInfo @1afbb98 0[233e10]: write -> 4096 0[233e10]: write -> 4096 ***************************************60 write-xxxx lines removed Mike Lisanke ************************ 0[233e10]: write -> 4096 0[233e10]: write -> 3064 0[233e10]: write -> 131072 0[233e10]: write -> 4096 0[233e10]: nsHttpHandler::Observe [topic="xpcom-shutdown"] 0[233e10]: nsHttpHandler::StopPruneDeadConnectionsTimer 0[233e10]: nsHttpAuthCache::ClearAll 0[233e10]: nsHttpConnectionMgr::Shutdown 1316[19b9180]: thread exiting 1332[19b97b0]: thread exiting 1156[140a380]: thread exiting 0[233e10]: nsSocketTransportService::Shutdown 0[233e10]: Deleting nsHttpHandler [this=e70938] 0[233e10]: nsHttpConnectionMgr::Shutdown 0[233e10]: Destroying nsHttpConnectionMgr @140f848 0[233e10]: nsPluginHostImpl::dtor 0[233e10]: nsPluginHostImpl::Destroy Called 1220[f163b0]: thread exiting 1064[235f68]: thread exiting 0[233e10]: Unloaded library C:\Program Files\Common Files\mozilla.org\GRE\1.7.3_2004091008\xpcom.dll ---------------------------end big logfile--------------------------------------
Assignee | ||
Comment 8•20 years ago
|
||
Mike: thanks for the log file. for future reference, it's better to use the "Create a New Attachment" feature of bugzilla. that way long attachments can be managed more easily :-)
Assignee | ||
Comment 9•20 years ago
|
||
(In reply to comment #5) > All recent release builds should do logging but it`s possible that > "nsHostResolver:5" is only availible in a debug build.. That logging module should be enabled by default in release builds unless the release build was configured with the --disable-logging flag (see "about:buildconfig" to check if your build has that flag). FWIW, I checked and nsHostResolver:5 is not enabled on the Aviary branch, which explains why you got an empty log file when using Firefox 1.0PR. If you download a nightly trunk build (of Firefox or Mozilla Seamonkey), then you should be able to generate a nsHostResolver:5 log. The rest of the logging output is less useful (I think) because this bug is about DNS issues, so it'd be good to see what our host resolver thinks it is doing ;-)
Reporter | ||
Comment 10•20 years ago
|
||
Darin, Yah, sorry about the logfile embed. I thought better of it, but; I hate to run scripts of any kind if I can help it. OK, now for the (good)bad news. I downloaded the Nightly build and before I could test that the logging now works. Mozilla's install launch to Mozilla.org worked. A quick test, and yes, I now had no problem to www.yahoo.com, et. al. It seems stange that a 1.6 and 1.7.3 install has a problem, and the nightly build doesn't. Does this mean anything to you, or anyone else reading along? Regards, Mike
Comment 11•20 years ago
|
||
This goes away in the Firefox trunk build for me too. Hmmm. I forgot to mention that was happening on Firefox PR 1.0 for me. While using the trunk is great for me, it sucks for the 1.0 release. I was getting ready to consider switching back to something like Opera (or even, shudder, IE) before I tried the trunk. Personally, I think this should block Firefox 1.0, but I don't remember whether I'm supposed to set the flag to "?" to request that (or which flag, for that matter :-)...
Severity: normal → major
Comment 12•20 years ago
|
||
I take it back. I'm still having a problem even on the trunk nightly (it was less pronounced, I thought, but I suspect some kind of caching was fooling me). Still repro's easily by going to Google News and Ctrl-Clicking on a bunch of different sites to get a bunch of tabs. However, I still can't get "set NSPR_LOG_MODULES=nsResolver:5" to put anything in the log file (it creates an empty one, which stays empty), even with the latest trunk nightly build (and about:buildconfig does not show the --disable- logging flag).
Comment 13•20 years ago
|
||
(In reply to comment #12) > However, I still can't get "set NSPR_LOG_MODULES=nsResolver:5" to put anything try using nsHostResolver:5
Comment 14•20 years ago
|
||
Doh!!!
Attachment #160419 -
Attachment mime type: application/octet-stream → text/plain
Comment 15•20 years ago
|
||
This is getting intolerable lately. The most recent 1.0PR build (20041025) seems (subjectively) to be even worse about this, but it could just be my frustration level building. In spite of having invested a lot of time in Mozilla, I'm getting close to switching to Opera or something because of this problem. Even IE is starting to look appealing... Requesting 1.0 block (hope I did that right :-).
Flags: blocking-aviary1.0?
Assignee | ||
Comment 16•20 years ago
|
||
Hi Ray, Unfortunately the log file doesn't show us anything useful :-( Thanks for capturign though. We'll have to think of another way to uncover what is troubling your system. I've heard of a few other reports of DNS problems, but I've never experienced any myself. It's very difficult to diagnose a problem if you can't see it! ;-) Can you please try changing the logging to: NSPR_LOG_MODULES=nsHostResolver:5,nsSocketTransport:5 That way I might be able to deduce further what is going on. Thanks!
Reporter | ||
Comment 17•20 years ago
|
||
The problem may be related to DNS response timing. For me, the problem disappeared (without change to Mozilla) on my XP system. Also, changing to a nightly build on the Win2K system also 'fixed' the problem. It is not clear what was wrong, except that certain DNS look-ups failed for Mozilla only.
Assignee | ||
Comment 18•20 years ago
|
||
> The problem may be related to DNS response timing. For me, the problem
> disappeared (without change to Mozilla) on my XP system. Also, changing to a
> nightly build on the Win2K system also 'fixed' the problem. It is not clear what
> was wrong, except that certain DNS look-ups failed for Mozilla only.
Under WinXP we use the Microsoft getaddrinfo API, whereas on Win2k and other
older versions of Windows we use the traditional gethostbyname. getaddrinfo is
used so that we can support IPv6 if enabled on the host system. I think we
revert to gethostbyname if we detect that IPv6 is not enabled (i.e., if creating
an IPv6 socket fails).
This leaves us having very little option since we are simply invoking a
Microsoft API for DNS. Mozilla does not even know that DNS will be used to
resolve hostnames, so there is usually very little we can do in these situations :-(
The exception may be if we find that we are not calling the API correctly or
something like that, which is unlikely since the API is so simple.
Comment 19•20 years ago
|
||
Here's a log as requested. At the end of the log, I'm still stuck trying to load some images on the web pages I opened, but it did finish loading all the main pages, so this log shows examples of it eventually working as well as indicating the stuck state. Hopefully this will turn out to be useful. BTW, I'm running XP, but I've run XP in the past without having this problem. Also, I've tried disabling IPv6 DNS resolution (by setting network.dns.disableIPv6 to true), and that had no effect, in case that's an anti-clue. The documentation for the getaddrinfo API doesn't look especially simple to me, but perhaps that's just because it's been a long time since I did any network programming.
Attachment #160419 -
Attachment is obsolete: true
Assignee | ||
Comment 20•20 years ago
|
||
Thanks for the log file Ray... you may also want to try adjusting these prefs: network.dnsCacheExpiration network.dnsCacheEntries Both are integer values, where dnsCacheExpiration is the number of seconds to hold a lookup result in mozilla's DNS cache, and dnsCacheEntries is the number of entries to store before evicting old entries. These are "hidden" prefs in that their default values are not listed in about:config, but you can easily just use the "right click" and select "New->Integer" method to add these. Please let me know if adjusting those prefs helps any.
Assignee | ||
Comment 21•20 years ago
|
||
NOTE: older versions of Mozilla used to cache DNS lookups indefinitely, but people complained that that caused problems. So, we reduced our default expiration time to 60 seconds, and keep only 20 entries. If modifying those prefs helps you, then we may want to consider bumping up the value of the defaults to something higher. As I said before since Mozilla uses the operating system's resolver API, there is no way for Mozilla to infer DNS TTL values that could be used to build an optimal DNS cache. As such, we are left with the crude method of applying a global expiration value.
Comment 22•20 years ago
|
||
Well, upping the dns cache limits seems to help mask the problem to the point where it might be bearable if I cranked up the expiration time to several days and the number of entries to 1000 or so. The experiment did give me some insights, though: it looks to me like at some point down the line, Firefox gets stuck on one of the DNS lookups, and then *all* the others get wedged behind it until some really long timeout. Once that one stuck one lets go, the rest seem to resolve pretty quickly. An experiment with IE shows that it doesn't have any kind of queue of these requests... it seems to do them all in parallel. For example, on the same set of test sites from news.google.com, Firefox got stuck on most of them, and only eventually freed itself up and loaded them all. With IE, it got stuck on one of them (salon.com for some reason), but all the rest loaded immediately. This raises another important clue: it's not *just* that my machine is configured in some weird way or is connected through some quirky ISP that prevents it from doing DNS lookups. Other apps do this fine. But there *is* some quirk that causes DNS to get somewhat stuck for me on some occasions. Even in the IE case, though, it seemed to recover much faster than Mozilla. It's almost like it had a shorter retry interval or something. The nice thing about cranking up the DNS cache on Firefox, though, is that once I've gotten to a site, it seems (subjectively) like I can at least open other windows on that site pretty reliably even if FF is stuck on some other DNS resolve. This will keep me using/testing it for considerably longer than if I didn't have any workaround... I guess I'd say that it still sucks, but at least I can live with it for now. I've done enough testing on this now to feel pretty comfortable that it's a real FF bug, so I'm confirming it. Please accept my apologies if that is the wrong thing to do for some arcane reason :-).
Status: UNCONFIRMED → NEW
Ever confirmed: true
Comment 23•20 years ago
|
||
putting on the blocking list until we know more. we would need a low risk patch and lots of testing before landing on the branch for firefox 1.0
Flags: blocking-aviary1.0? → blocking-aviary1.0+
Comment 24•20 years ago
|
||
I see this problem too (mozilla, not ff): sporadically mozilla will cease to resolve any hostnames, even though ping and nslookup and other programs can contact sites just fine. Restarting mozilla a couple times always cures the problem (often the first restart doesn't do it, but so far the second restart has always cured it). Clearing the cache and/or setting cache size to 0 doesn't cure it. Mozilla 1.7.3 (and earlier versions -- I've been seeing this for a while) on Debian unstable with kernel 2.6.8.1 (and various earlier versions). CONFIG_IPV6 is not set, if that matters (I've seen mentions of that in other DNS-related bugs while searching for this one).
Assignee | ||
Comment 25•20 years ago
|
||
Hmm, before akkana's comment I thought this might be related to the fact that we are using gethostbyname on Windows. I thought that maybe the OS was serializing all threads accessing that function. But, the fact that akkana's says she sees something like this on Linux, makes me suspect that it might be a problem in the DNS thread pool. Hmm...
Status: NEW → ASSIGNED
Target Milestone: --- → mozilla1.8beta
Comment 26•20 years ago
|
||
hm... seems to me like this is the same as most if not all duplicates of bug 240759. maybe one of those provides more info.
Comment 27•20 years ago
|
||
Windows NT XP and 2000 have a system service that cahches responses to DNS queries. Nslookup does not use it. When it gets hosed you see symmptoms similar to those reported in this bug. Unless this bug can be reporduced wiht the DNS Client service disabled inthe operating system. I would suspect this is not a valid bug.
Comment 28•20 years ago
|
||
Just repproduced it with the XP DNS client service stopped. Also, it's been reported on Linux (comment 24). Additionally, I would be extremely surprised to find that IE doesn't use the DNS caching service (in fact, why else would they even bother implement a DNS caching service?). And it doesn't have this problem at all.
Comment 29•20 years ago
|
||
Well, there's one way to find out if this is a problem with the Windows DNS: the next time that it happens, run IPCONFIG /FLUSHDNS and see if the problem goes away. If it does, it's a problem with Window's DNS cache. If it doesn't, it's a problem with Mozilla.
Comment 30•20 years ago
|
||
Re: comment 29. There's no effect from ipconfig/flushdns. I see 2 possible ways it could have had an impact: freeing up an already "stuck" FF, and allowing a non-stuck FF to work properly after the flush. I tested both, and ipconfig/flushdns has no effect in either case. Of course, now that I've bumped up my dns cache limits, it's getting increasingly difficult to test this :-). But luckily Google News supplies me with a nearly unlimited number of weird sites I've never been to...
Assignee | ||
Comment 31•20 years ago
|
||
It is odd that such a major problem would be reproducible by only a relatively few users. I'm not convinced that the Linux problem (reported by akkana) is at all a duplicate of this WinXP one. Ray: does your computer have multiple CPUs? hyper-threaded? etc.? What about you, akkana?
Comment 32•20 years ago
|
||
Yes, my machine is hyperthreaded. It's a pretty new one (3.2 GHz). Interestingly, I switched to that machine at a time not too drastically different from when I started noticing this problem. I would take that with a grain of salt, because I can't swear to an actual correlation, but it might be an interesting avenue to investigate. Deadlocks in the DNS thread pool that only show up on a fast (effectively) multi-processor system are a distinct possibility. I suppose I should install FF on my even newer Media Center PC at home and see if it happens there... not sure when I'll be able to get around to that, though... I don't use it for browsing much (TVs make sucky monitors).
Assignee | ||
Comment 33•20 years ago
|
||
I regularly use a SMP linux box and have yet to encounter a problem like this, but it would not surprise me if there was a race-condition of some sort here.
Comment 34•20 years ago
|
||
If I were the type to offer advise :-), I'd suggest doing some thought experiments along the lines of "what would happen if a particular DNS lookup got hung or failed at the same time that X happened" where X includes: 1) another DNS request came in (especially if it "came in" due to needing to look up an address for an image that was just parsed on the main page of some tab), 2) another thread's DNS lookup succeeded/failed, 3) you ran out of threads in some pool, 4) some mutex (perhaps indirectly) protecting some part of the DNS system is held, and 5) others that I can't think of right now... Also, generally examining the re-entrancy of the DNS subsystem wouldn't hurt. I'd offer to do it, but it would take me a very long time to come up to speed on the source architecture and I'm really busy right now. The idea of the code "spontaneously" needing to do another lookup is especially interesting to me, because one of the times I most often see this bug is when a main page has finished loading, but its external images aren't loaded yet. That's a sufficiently common occurance that I wouldn't get too excited about it (could just be statistics), but it might be a clue.
Assignee | ||
Comment 35•20 years ago
|
||
Here's the code for the thread pool and asynchronous host resolver: http://lxr.mozilla.org/mozilla/source/netwerk/dns/src/nsHostResolver.cpp#377 It's dependencies on other mozilla code are limited to NSPR and pldhash.
Comment 36•20 years ago
|
||
re comment #28. I guess I should have read through all the comments on this bug a little more closely. I thought the issue was that the same hostname that resoves in nslookup does not resove in Mozilla. I have had issues like this and the dns caching was the cause, but in those cases the name did not resolve in IE either. However I now remember having had issue like this in mozilla in the past and I concluded that the issue might be this one: http://support.microsoft.com/default.aspx?scid=kb;en-us;815768 So, I disabled IPV6 on my system and have not had a similar issue since. But none of this expains the commenter who said they are having similar issues under linux.
Comment 37•20 years ago
|
||
Sorry for bugsapm on this re comment #36. I realized after posting that this is not the same knowlegbase article I was talking about. I cant seem to find the one I had before. Perhaps it wasn't on the microsoft site. The one I had found before said a workaround was to disable IPV6 support and told you how to accomplish that.
Comment 38•20 years ago
|
||
Ok, time to expose my complete ignorance of the codebase: At line 549, it looks like the code is waiting for some condition to become true (or a timeout occurs) while holding the lock (from line 535). Is this correct? If so, I'm *extremely* suspicious of this, because ResolveHost grabs the lock before it calls IssueLookup to hand its request off to the idle thread (which has the lock held). This would be deadlocked until line 554 is hit. The only question would be why this doesn't happen more often... Some other ideas/questions: At line 511: it looks like IssueLookup can return NS_OK without actually issuing the lookup if mThreadCount >= MAX_THREADS. Perhaps that's not harmful, but I'm having a hard time following the consequences. At line 388: does result's constructor initialize it to NULL or something equivilent? Otherwise it's used without being initialized at line 451. (grasping at straws... this appears to be done in so many places that I have to believe it does the right thing).
Comment 39•20 years ago
|
||
Just one CPU here, a P3. I suspect I'm seeing the same issue reported by many people (on various platforms) in bug 146769, which may or may not be the same issue as the Windows people here are seeing. It's unfortunately very sporadic and I haven't detected any regularity to what's happening at the time; certainly I do use tabs heavily (lots of the "lost DNS" bugs seem to mention opening links in new tabs, though I'm not clear why that would matter) and when it happened yesterday, there was some network flakiness that afternoon (mtr was showing some lossage), so perhaps DNS was slower than usual and that causes a race condition somehow. Anecdotally, yesterday I mentioned the problem on a linux-heavy IRC channel (to see if anyone else had trouble with http from a host I was trying -- no one did) and got several "Oh, my mozilla/firefox does that too" comments (where host resolution stops working until a restart).
Assignee | ||
Comment 40•20 years ago
|
||
so, just to be clear the symptom is that the browser appears completely unable to resolve hosts until a restart? ray: is that what you see too? or in your case are there just unusually long delays?
Assignee | ||
Comment 41•20 years ago
|
||
> At line 549, it looks like the code is waiting for some condition to become true > (or a timeout occurs) while holding the lock (from line 535). Is this correct? yes > If so, I'm *extremely* suspicious of this, because ResolveHost grabs the lock > before it calls IssueLookup to hand its request off to the idle thread (which > has the lock held). This would be deadlocked until line 554 is hit. when you wait on a condition variable, you implicitly release the mutex. hence, ResolveHost is allowed to enter the mutex, and eventually notify the condition variable inside IssueLookup. > At line 511: it looks like IssueLookup can return NS_OK without actually ... we add the new lookup to mPendingQ before testing mThreadCount. if we find that there are already active threads (and if the max thread count is reached), then we assume that one of the busy threads will process our lookup once it finishes with the one it is processing. > At line 388: does result's constructor initialize it to NULL or something > equivilent? Otherwise it's used without being initialized at line 451. yeah, nsRefPtr initializes its "value" to null by default.
Comment 42•20 years ago
|
||
OK please ignore all my previous comments on this bug (you probably are anyway :-) ) What I was mis-remembering is a previously reported bug 219088. The issue was a bug in the DNS Client service in Windows. The issue had to do with resolving hostnames that resolved to a CNAME record that pointed to a host with both A and AAAA records (and I think you had to have IPV6 installed on the PC too) to see the issue. The problem was that only the AAAA records were being returned and not he A records in the CNAME case. As far as I know this issue was subsequently fixed by Microsoft, but I am not 100% sure of that. So some of the people seeing this issue could actually be seeing the bug 219088 issue which turned out to be a Windows OS issue. The workaround for that issue was to not run the DNC Client service so that it was not used at all for queries, either by stopping the service via the control panel or with the command "net stop dnscache". IPCONFIG /FLUSHDNS as used in comment #29 is not sufficient. It sounds as if there is a new issue here having to do with the way mozilla does the name resolution that appears to effect multi-threaded processors only. However, I still think it would be a good idea for people working this issue to disable the DNS CLient service just to be sure it is not the cause of the problem.
Comment 43•20 years ago
|
||
If you'll tell me how to turn on debug tracing for the condition variable locking scheme(s), I'll try to look for a deadlock that way... I've gone back to using the 1.0PR branch, but I might have time to revert to the trunk if needed to get the appropriate output. Anyway, I'm always suspicious about code that looks like it's holding a lock throughout that actually releases/regrabs it implicitly somewhere in the middle... For example, it looks to me like the variable mHaveIdleThread isn't effectively protected by mLock. If I'm reading it right, 2 threads can both set mHaveIdleThread while only 1 of them is actually waiting for the condition variable via the underlying OS mutex (the second thread can grab mLock for a very brief time after the first thread unlocks it inside PR_WaitCondVar). As a result, the one that is actually waiting can be freed up by line 509, and set mHaveIdleThread=FALSE before the second thread even waits. Now we have the situation where there's an idle thread, but the rest of the system thinks there isn't one. That might be harmless (hard to tell), because it will just create a new thread if it thinks there isn't an idle one... *unless* we get to the state where mThreadCount == MAX_THREADS. Then IssueLookup will neither create a new thread nor signal the waiting one. But then eventually the one that *was* freed will come back and become free again, I guess... bleah... what a mess. <----Brain melts down here. Also, I can't quite figure out whether there are actually 2 mutexes/critical sections here: mLock, plus some kind of internal lock protecting access to the cvar struct itself (there are calls to something called _PR_CVAR_LOCK(cvar) inside _PR_WaitCondVar in addition to what looks like a thread lock... as well as locking and unlocking mLock). If there really are 2+ locks here, then there are also potential inversions in the order locks are grabbed that would have to be identified and analyzed, because of course these could be leading to deadlocks. All of this makes it really hard to debug this problem by analysis... (side note: I don't know what happens here if 2 threads are waiting on a condition variable when it is released... this would depend on the exact kind of mutex that is used. Some kinds will free all threads, some will free only 1, some have other behaviors... I got lost digging that out of the source, so I'm a bit unclear on the exact behavior of the above scenario... Therefore, I worded it so it doesn't matter, because only 1 thread will actually be waiting... if the mutex is a "single release" kind, then this scenario can happen much more easily).
Comment 44•20 years ago
|
||
Seeing this bug on WinXP sp1 and sp2 FF1.0.1PR Seems random and non-replicatable, but still occurs infrequently.. anecdotally I seem to recall that it happened after an initial problem loading a request, thought it was a problem with firewall at first, but have confirmed that's not the case, then thought it was the fact that windows update was running at the same time, this also not the case..
Comment 46•20 years ago
|
||
Hmmm. Thinking about this some more, what about this scenario: 1) The max number of threads are allocated, and all the threads are idle and waiting on the condition variable. 2) A new request comes in, releasing one thread, which sets mHaveIdleThread to false and calls PR_GetAddrInfoByName. 3) For some reason, this hangs (perhaps DNS is unhappy for some reason for that particular host). 4) No further requests can be processed. This happens because mHaveIdleThread was cleared when GetHostToLookup was released from its wait on the condition variable. As a result, we have mThreadCount == MAX_THREADS and mHaveIdleThread == FALSE, so IssueLookup neither creates a new thread nor signals any of the waiting ones. Hung until either the pending DNS lookup finally finishes, or one of the idle threads times out. This would only happen if you managed to saturate the thread pool with requests, so that might explain why this isn't seen very much, because as far as I can tell, this will happen every time. Of course, perhaps DNS lookup failures also are rare.
Assignee | ||
Comment 47•20 years ago
|
||
Ray, I think you may have figured it out! I think mHaveIdleThread should be a counter instead of a boolean. Well done! I'll cut a patch, and then we can try it out on the trunk and see if it solves the problem.
Assignee | ||
Comment 48•20 years ago
|
||
Sigh... I take that back. In fact there can only ever be one idle thread. We let the rest of the threads terminate. So, a boolean is perfectly fine. I forgot that small but significant detail when I first read your comment. Do you have any reason to believe that all 8 resolver threads are blocked? Do you have a copy of MSVC so that you can break into the debugger and look at the call stacks for each thread?
Comment 49•20 years ago
|
||
The threads *do* terminate, as you say, but only after a 1 minute (?) timeout has elapsed, and then only if the queue actually stays empty for that whole time. If requests trickle in, they will get picked up and handled (*very* slowly, because we're relying on individual unrelated timeouts expiring), but then the threads will (potentially) wait a minute again. The only part I don't understand is that it (subjectively :-) seems like much longer than 1 minute. I'll go measure that (after I hit "Commit" on this comment :-). I think a counter is a better idea than a boolean in any case, because it's definitely a wedging that easily can, but shouldn't, be happening. Also, if the DNS hang is reasonably common and semi-permanent for some reason (perhaps one of the Windows bugs mentioned earlier), then the various threads will slowly become wedged over time and the pool will get smaller and smaller, which will force this wedging to happen more and more frequently. I suppose it might be nice to have a watchdog on the DNS threads to avoid such problems, but perhaps that's asking too much... Of course, this is still all hypothetical. This might not be where we're getting wedged at all... it could, for example, be some kind of deadlock-with-timeout inside Windows' version of getaddrinfo that only a few people's machines are experiencing. I do have MSVC... any suggestions for (hopefully a not-too-time-consuming) way to proceed with that? I suppose if I break in and "a bunch" of threads are all waiting somewhere, that would be a clue... but I probably don't have time to figure out how to set up a whole build environment in order to get myself some symbols to debug against...
Assignee | ||
Comment 50•20 years ago
|
||
we only wait on the condition variable if !mHaveIdleThread, which means that there should only ever be one thread waiting on the condition variable. yeah, building mozilla on win32 is somewhat involved since it requires cygwin. even without symbols, the stack traces may be valuable since MSVC will tell us the DLLs for each function on the stack. if we see > 8 threads coming from necko.dll stuck inside ws2_32.dll then that will be a good clue that those threads are stuck inside win32 resolver code.
Comment 51•20 years ago
|
||
Dang... that seemed so promising. I got it running in MSVC, but I'm getting the very weird result that all the threads are stuck at the same address in ntdll.dll (and I don't happen to have the symbols for that to see if that's a "reasonable" place). The weird part is that I mean *all* the threads, main and otherwise. That's just too weird. So I'm quite suspicious of that result. If it were true, that would point to either a Windows bug or some kind of API misuse. I'll go try to debug some more. I'm downloading symbols (and also WinDbg, because it can often resolve stuff that MSVC 6.0 can't). Normally, I'd already have those on my machine... but I switched machines a few months ago and haven't had the need until now...
Comment 52•20 years ago
|
||
Ok, I think I may have tracked this down. Here's a comment I found at a network optimization site (http://snakefoot.fateback.com/tweak/winnt/network.html): --- Start of quote --- Note if having activated SYN-Attack-Protection or installed WinXP SP2, a limit is introduced on how many connection attempts (half-open) one can make simultanously. This will limit worms like blaster and sasser from spreading too fast, but it will also limit other applications that creates many new connections simultanously (Like P2P). EventID 4226: TCP/IP has reached the security limit imposed on the number of concurrent TCP connect attempts --- End of quote --- Looking in my event log, I find that indeed I'm seeing this event being logged at around the time when this "freeze" is happening. So, it is likely the MS is gating on the number of simultaneously open connection attempts and thereafter letting Firefox only make connection attempts at some fixed (slow) rate. The question then becomes, what is IE doing differently that doesn't invoke this protection. I'm guessing that perhaps it persists its connections or avoids too many connections in SYN_WAIT state or something. I'll continue to investigate. Other reporters should check their event log (Control Panel->Administrative Tools->Event Log->System) for this error to double check my result. If this is indeed the problem, the question becomes what to do about it. It's possible to just say "it's a Windows bug", but that's quite unsatisfying. If I have more of a complaint with the intent of what MS is trying to do with this security "fix" I would tend to be less irritated, but what they are trying to do seems reasonable, actually.
Comment 53•20 years ago
|
||
I'm feeling pretty postive that my last comment is on track, but just in case, here's a couple of stack dumps from WinDbg showing where Firefox's threads are. Several of them are waiting on condition variables (though there's not enough stack information to really be sure of that), and there's one interesting commonality between the two dumps in this file, which is that one of the threads is actually in PR_MD_UNLOCK, which is (seemingly) calling select(). I've seen some other indications that perhaps Windows has some performance issues with select in some circumstances, but in any case it seems odd that both dumps would show the same situation, so I thought it would be worth mentioning.
Comment 54•20 years ago
|
||
Ok, I've found out why I'm one of the few people that sees this, and have a workaround. I had bumped up my max connections parameters (I occasionally want to download a large number of files from very slow servers in parallel). I was allowing a max of 100 persistent connections, and something like 30 per server. When I reset these all to the defaults, this problem went away entirely. I'm ok (not hugely happy, but ok) with that as a workaround, but a true fix would be to separately limit the number of half-open connections to the XPSP2 limit to avoid triggering this security "feature". The only fix to in XPSP2 appears to be patching the TCP/IP stack, which I'm not willing to do (in SP2 there's no registry key controlling how many half-open connections it will allow, unlike the older SYN attack fix). Clarification: Total open connections aren't limited in SP2, just incompleted "half open" ones. Once this limit is hit, your process is hosed for good.
Comment 55•20 years ago
|
||
BTW, the XP SP2 limit I mention in comment 54 is 10 simultaneous half-open connections. So with Firefox's default max connections of 24, there is still the possibility that this could happen if you really banged on it hard (very unlikely, but possible). I'm thinking that if you looked at a page with images loaded from 30 different sites, that that would prompt Mozilla to possibly trigger this even with the default max-connections, so I'm going to try that when I have time. As for why this looks like a DNS issue (superficially?), perhaps someone with more knowledge of Firefox's TCP intricacies than I could say, but I'm guessing that that just happens to be the most common message to get "stuck" in the status bar...
Assignee | ||
Comment 56•20 years ago
|
||
great work tracking this down Ray!!! the default max number of connections applies to all sites. we limit the number of persistent connections to just 2 per site. so, you'd really have to exercise the browser in some extraordinary way to trigger this bug given our default prefs. downgrading severity to minor since this does not happen or at least is virtually impossible to replicate given the default mozilla prefs. we don't really support modifying the default http connection limits. the defaults are set according to the recommendations of RFC 2616 (as well as some basic performance tuning i did several years back). but, yeah... it would be good to address this issue better. perhaps by putting a hard cap on the number of persistent connections, or something like that. futuring this bug. patches welcome :)
Severity: major → minor
Keywords: helpwanted
Summary: DNS name resolution fails in mozilla; suceeds with nslookup; dotted IP addr works for URL → DNS name resolution fails under WinXP-SP2 when max persistent connections bumped up to 100
Target Milestone: mozilla1.8beta → Future
Comment 57•20 years ago
|
||
I agree with the severity assessment. I tried to really beat up on it this morning. I tried creating a document with tons of external images to different sites, shift-refreshing frequently... trying to get it to make all 24 connections simultaneously. I also tried rapidly Ctrl-clicking on a ton of links on news.google.com. Only once did I manage to get it to trigger XP's connection limit security feature (verified in the event log), and I couldn't repro that afterwards even if I stopped XP's DNS cache service, set the DNS cache limit to 1/1 in Firefox, and cleared the cache between attempts. If this gets implemented in the future, the key (from what I can figure out looking at the security feature's documentation) will be limiting the number of connections that are in the SYN_WAIT state to 10 (I believe this state indicates that the TCP connection request has been sent out but no response has yet been received). The one thing that worries me about my failure to repro this as described above is that the intervening routers cache various bits of connection state, so it may very well be that I couldn't repro it because the connections were all being made very quickly the second and subsequent times (I have only so many unique sites I can find in large numbers in one place :-). Anyway, there's no need to hard limit the total number of active connections to 10, and indeed I would find that very inconvenient.
Reporter | ||
Comment 58•20 years ago
|
||
Well, I'm happy that you found and resolved a problem with Mozilla, but; since I found the original reported problem on W2K and XP SP1, and since it wasn't confirmed to be resolved by increasing the number of connections in WXP-SP2 allows. I don't know why you didn't choose to create you own bug report and add your fixes to it. I believe you resolved a much more narrow problem than was first suggested, and also, I know I read of reports of similar symptoms on Linux. Now, what are those users to use as a bug report??
Comment 59•20 years ago
|
||
He's right, this looks like a totally different bug (originally it looked similar). I hate to suggest this, because it's "practical" solution that could be construed as disrespectful to the original bug poster, but given that almost all the discussion here is about the bug I found, perhaps the most effective thing to do would be to re-post the original bug so it can be investigated separately? My apologies for (unwittingly) hijacking your bug.
Assignee | ||
Comment 60•20 years ago
|
||
Ack, yikes... sorry... I too didn't mean to morph the original bug report so badly!
Reporter | ||
Comment 61•20 years ago
|
||
Ok, I have enough experiencial evidence to conclude that the problem I originally reported with MOzilla was in fact some problem with the DNS cache service on windows. It turns out, not all tools (e.q. nslookup must resolve with this service) and so some tools can receive good look-ups while Mozilla could miss some host URL. It appears the windows DNS cache service freezes up at some point, and changing host IP do not get resolved. Stopping and restarting (or just disabling the service) resolved thia problem on all systems, and all Mozilla versions considered. Sorry for the overlap with the other actual problem. If somebody knows how to split the problem record (or leave this embedded) they are welcomed to. Best regards to all, Mike
Assignee | ||
Comment 62•20 years ago
|
||
marking this bug invalid. i think in both cases, the bug is not really with mozilla but with the underlying operating system. or, you could say in the multiple connections case that the bug is really a feature of the OS, and that the problem here is that the user configured mozilla in an unsupported way. that'd still make this bug report invalid.
Status: ASSIGNED → RESOLVED
Closed: 20 years ago
Resolution: --- → INVALID
You need to log in
before you can comment on or make changes to this bug.
Description
•