Closed Bug 508292 Opened 15 years ago Closed 9 years ago

[WinXP] crash [@ wcslen | HostentBlob_WriteNameOrAlias] under PR_GetHostByName

Categories

(Core :: Networking, defect, P2)

x86
Windows XP
defect

Tracking

RESOLVED INCOMPLETE
Tracking Status
blocking1.9.1 --- needed
status1.9.1 --- wanted

People

(Reporter: martijn.martijn, Unassigned)

Details

(Keywords: crash, Whiteboard: [startupcrash][crashkill][crashkill-thirdparty][in contact with Microsoft])

Currently no. 2 in the topcrasher list for Firefox3.5.2:
http://crash-stats.mozilla.com/query/query?product=Firefox&version=Firefox%3A3.5.2&date=&range_value=1&range_unit=weeks&query_search=signature&query_type=exact&query=&do_query=1
This is a list of the stacks:
http://crash-stats.mozilla.com/report/list?product=Firefox&version=Firefox%3A3.5.2&query_search=signature&query_type=exact&query=&date=&range_value=1&range_unit=weeks&do_query=1&signature=PR_GetHostByName
Example stack:
http://crash-stats.mozilla.com/report/index/0984e159-39b7-4515-ad9f-677f42090804
0 @0x5f90005
1 nspr4.dll PR_GetHostByName nsprpub/pr/src/misc/prnetdb.c:722
2 nspr4.dll pr_GetAddrInfoByNameFB nsprpub/pr/src/misc/prnetdb.c:1987
3 nspr4.dll PR_GetAddrInfoByName nsprpub/pr/src/misc/prnetdb.c:2016
4 xul.dll nsHostResolver::ThreadFunc netwerk/dns/src/nsHostResolver.cpp:884
5 nspr4.dll _PR_NativeRunThread nsprpub/pr/src/threads/combined/pruthr.c:426
6 xul.dll nsACString_internal::MutatePrep xpcom/string/src/nsTSubstring.cpp:179
7 mozcrt19.dll _callthreadstartex obj-firefox/memory/jemalloc/src/threadex.c:348
The crash seems to happen at startup. In most cases the uptime is less than 10s.
I'm not sure whether these crashes are the same, but I think they are; they seem to give a better stack:
http://crash-stats.mozilla.com/report/list?product=Firefox&version=Firefox%3A3.5.2&query_search=signature&query_type=exact&query=&date=&range_value=1&range_unit=weeks&do_query=1&signature=wcslen
Example stack of these crashes:
http://crash-stats.mozilla.com/report/index/51eb5f20-c37e-4fa8-b831-fa2e12090804
0 msvcrt.dll wcslen
1 mswsock.dll mswsock.dll@0x9629
2 mswsock.dll mswsock.dll@0x9809
3 mswsock.dll mswsock.dll@0x96d2
4 mswsock.dll mswsock.dll@0x266e
5 mswsock.dll mswsock.dll@0x1b09
6 ws2_32.dll NSPROVIDER::NSPLookupServiceNext
7 ws2_32.dll NSPROVIDERSTATE::LookupServiceNext
8 ws2_32.dll NSQUERY::LookupServiceNext
9 ws2_32.dll WSALookupServiceNextW
10 ws2_32.dll WSALookupServiceNextA
11 ws2_32.dll getxyDataEnt
12 ws2_32.dll gethostbyname
13 nspr4.dll PR_GetHostByName nsprpub/pr/src/misc/prnetdb.c:722
14 nspr4.dll pr_GetAddrInfoByNameFB nsprpub/pr/src/misc/prnetdb.c:1987
15 nspr4.dll PR_GetAddrInfoByName nsprpub/pr/src/misc/prnetdb.c:2016
16 xul.dll nsHostResolver::ThreadFunc netwerk/dns/src/nsHostResolver.cpp:884
17 nspr4.dll _PR_NativeRunThread nsprpub/pr/src/threads/combined/pruthr.c:426
18 xul.dll nsACString_internal::MutatePrep xpcom/string/src/nsTSubstring.cpp:179
19 mozcrt19.dll _callthreadstartex obj-firefox/memory/jemalloc/src/threadex.c:348
Thanks for the bug report. Usually this kind of crash is caused by a bug elsewhere. I suggest that you start the investigation in Necko (netwerk/dns/src/nsHostResolver.cpp). There are no NSPR changes between Firefox 3.5 and 3.5.2. There are only a small number of NSS changes between Firefox 3.5 and 3.5.1, and no NSS changes between Firefox 3.5.1 and 3.5.2.
Assignee: nobody → nobody
Component: Libraries → Networking
Product: NSS → Core
QA Contact: libraries → networking
Version: unspecified → 1.9.1 Branch
blocking1.9.1: --- → ?
blocking1.9.1: ? → needed
So, some of these crashes actually have complete stack traces.
http://code.google.com/p/chromium/issues/detail?id=17047#c0 indicates it's also a problem for chromium.
We need someone to contact microsoft, we have hundreds of these.
Signature wcslen
UUID 63d9af3c-442f-4cc2-8cd1-015972090809
Time 2009-08-09 22:34:26.35625
Uptime 25
Last Crash 120683 seconds before submission
Product Firefox
Version 3.5.2
Build ID 20090729225027
OS Windows NT
OS Version 5.1.2600 Service Pack 2
CPU x86
CPU Info GenuineIntel family 15 model 4 stepping 9
Crash Reason EXCEPTION_ACCESS_VIOLATION
Crash Address 0x0
User Comments
Processor Notes
Crashing Thread
Frame Module Signature Source
0 msvcrt.dll wcslen
1 mswsock.dll HostentBlob_WriteNameOrAlias
2 mswsock.dll HostentBlob_CreateFromRecords
3 mswsock.dll HostentBlob_WriteNameOrAlias
4 mswsock.dll Rnr_DoDnsLookup
5 mswsock.dll NSPLookupServiceNext
6 ws2_32.dll NSPROVIDER::NSPLookupServiceNext
7 ws2_32.dll NSPROVIDERSTATE::LookupServiceNext
8 ws2_32.dll NSQUERY::LookupServiceNext
9 ws2_32.dll WSALookupServiceNextW
10 ws2_32.dll WSALookupServiceNextA
11 ws2_32.dll getxyDataEnt
12 ws2_32.dll gethostbyname
13 nspr4.dll PR_GetHostByName nsprpub/pr/src/misc/prnetdb.c:722
14 nspr4.dll pr_GetAddrInfoByNameFB nsprpub/pr/src/misc/prnetdb.c:1987
15 nspr4.dll PR_GetAddrInfoByName nsprpub/pr/src/misc/prnetdb.c:2016
16 xul.dll nsHostResolver::ThreadFunc netwerk/dns/src/nsHostResolver.cpp:884
17 nspr4.dll _PR_NativeRunThread nsprpub/pr/src/threads/combined/pruthr.c:426
18 xul.dll nsACString_internal::MutatePrep xpcom/string/src/nsTSubstring.cpp:179
19 mozcrt19.dll _callthreadstartex obj-firefox/memory/jemalloc/src/threadex.c:348
so, i went through about 200 crashes from the url, and all but two are the signature i'm listing, i've filed bug 509365 for the other two.
Summary: Topcrash [@ PR_GetHostByName] for Firefox3.5.2 → Topcrash [@ wcslen - HostentBlob_WriteNameOrAlias] under [@ PR_GetHostByName] for Firefox3.5.2
I looked through a sample of wcslen crashes from 8/19.
843 total crashes for wcslen on 20090819-crashdata.csv
408 startup crashes inside 3 minutes
OS breakdown:
400 wcslen Windows NT 5.1.2600 Service Pack 3
385 wcslen Windows NT 5.1.2600 Service Pack 2
30 wcslen Windows NT 6.0.6001 Service Pack 1
7 wcslen Windows NT 6.0.6000
6 wcslen Windows NT 5.1.2600 Service Pack 1
6 wcslen Windows NT 5.1.2600
5 wcslen Windows NT 6.0.6002 Service Pack 2
1 wcslen Windows NT 6.1.7100
1 wcslen Windows NT 5.2.3790 Service Pack 2
1 wcslen Windows NT 5.2.3790 Service Pack 1
1 wcslen Windows NT 5.1.2600 Service Pack 3, v.3264
Distribution of versions where the crash was found on 20090819-crashdata.csv:
494 Firefox 3.5.2
251 Firefox 3.0.13
21 Firefox 3.5
19 Firefox 3.0.11
11 Firefox 3.5.1
11 Firefox 3.0.12
9 Firefox 3.0
7 Firefox 3.0.10
6 Firefox 3.0.5
4 Firefox 3.0.6
3 Firefox 3.1b3
3 Firefox 3.0.7
2 Firefox 3.0.1
There is also some interesting stuff in the URLs: an abnormally high pct. of the crash URLs are from the top-level domain in Turkey.
#crashes - domain where the user might be when the crash happens
210 http://www.google.com.tr
45 http://tr.www.mozilla.com
17 http://images.google.com.tr
1 http://turkoloji.cu.edu.tr
1 http://s12.travian.com.tr
1 https://mail.etu.edu.tr
and many more assorted...
and various other Turkish sites or content
2 http://turkcedublajsinema.bloggum.com/yazi
Maybe some wild speculation, but I wonder if that is somehow related to the hostname/DNS lookups that are going on in the stack in comment 2, and there is some incompatibility we hit related to Turkish sites...
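For anyone wanting to repeat the URL tally above, here is a minimal sketch. It assumes the crash URLs have already been pulled out of the crashdata CSV into (count, url) pairs; the helper name `tally_suffixes` is made up for illustration, and the pairs are just the handful of rows quoted in this comment, not the full data set.

```python
from collections import Counter
from urllib.parse import urlsplit


def tally_suffixes(counted_urls):
    """Sum crash counts by the last label of each URL's hostname (e.g. 'tr')."""
    totals = Counter()
    for count, url in counted_urls:
        host = urlsplit(url).hostname
        if host:
            # Last dot-separated label; this is a rough stand-in for the TLD.
            totals[host.rsplit(".", 1)[-1]] += count
    return totals


# (count, url) rows copied from the comment above; hypothetical input shape.
sample_rows = [
    (210, "http://www.google.com.tr"),
    (45, "http://tr.www.mozilla.com"),
    (17, "http://images.google.com.tr"),
    (1, "http://turkoloji.cu.edu.tr"),
    (1, "http://s12.travian.com.tr"),
    (1, "https://mail.etu.edu.tr"),
    (2, "http://turkcedublajsinema.bloggum.com/yazi"),
]

suffix_totals = tally_suffixes(sample_rows)
for suffix, total in suffix_totals.most_common():
    print(suffix, total)
```

Note that grouping by the last label undercounts Turkish content, since tr.www.mozilla.com and the bloggum.com page land under "com".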
Maybe Erkan can help keep a lookout for anecdotal reports of startup or other crashes in Turkey where we might be hitting bad DNS servers, sites with unusual host names that we might choke on, or some possible DNS or content problem that might be related to a crash.
It's interesting to see there is a definite spike in the PR_GetHostByName crashes for this on 3.5.3 and 3.0.14 in the immediate days following their release.
6 3.5.3 20090910-crashdata.csv
69 3.5.3 20090911-crashdata.csv
77 3.5.3 20090912-crashdata.csv
80 3.5.3 20090913-crashdata.csv
2 3.0.14 20090904-crashdata.csv
1 3.0.14 20090908-crashdata.csv
3 3.0.14 20090910-crashdata.csv
137 3.0.14 20090911-crashdata.csv
236 3.0.14 20090912-crashdata.csv
187 3.0.14 20090913-crashdata.csv
Volume for 3.0.x over the same period is much lower in comparison:
1 3.0.12 20090903-crashdata.csv
4 3.0.12 20090907-crashdata.csv
1 3.0.12 20090908-crashdata.csv
1 3.0.13 20090903-crashdata.csv
1 3.0.13 20090904-crashdata.csv
1 3.0.13 20090905-crashdata.csv
2 3.0.13 20090910-crashdata.csv
Volume for 3.5.2 is much more steady and around the same level as 3.5.3:
81 3.5.2 20090901-crashdata.csv
93 3.5.2 20090902-crashdata.csv
76 3.5.2 20090903-crashdata.csv
57 3.5.2 20090904-crashdata.csv
94 3.5.2 20090905-crashdata.csv
91 3.5.2 20090906-crashdata.csv
50 3.5.2 20090907-crashdata.csv
81 3.5.2 20090908-crashdata.csv
55 3.5.2 20090909-crashdata.csv
51 3.5.2 20090910-crashdata.csv
33 3.5.2 20090911-crashdata.csv
33 3.5.2 20090912-crashdata.csv
28 3.5.2 20090913-crashdata.csv
This might indicate that the PR_GetHostByName problem is associated with new installs and/or updates. Martijn's comment 0 and filing of this bug was also right around the release time of 3.5.2, indicating a possible spike around that release.
my guess is that people have a problem (caused by LSPs), each time there's a new release available, they try it to see if it fixes the problem. it doesn't fix the problem, so they stop using firefox. this repeats with each release...
Assignee: nobody → dolske
Summary: Topcrash [@ wcslen - HostentBlob_WriteNameOrAlias] under [@ PR_GetHostByName] for Firefox3.5.2 → Topcrash [@ wcslen] - HostentBlob_WriteNameOrAlias under [@ PR_GetHostByName]
Blocking 1.9.2+ as part of the CrashKill effort.
Flags: blocking1.9.2+
Marking all topcrash bugs as P2 (3.6 release blockers, but not 3.6b1 blockers)
Priority: -- → P2
Summary: Topcrash [@ wcslen] - HostentBlob_WriteNameOrAlias under [@ PR_GetHostByName] → Topcrash [@ wcslen] - HostentBlob_WriteNameOrAlias under [@ PR_GetHostByName] on Turkish sites or in Turkish locale
Indeed, looking through URLs associated with these crashes there are tons of Turkish sites. Including google.com.tr and tr.www.mozilla.com. I see these crashes with both short and long uptimes; but would have to have Lars run some DB queries to see if there's a strong correlation or not. I can't reproduce the crash surfing around the various .tr URLs, next step would be to try changing locale and use a localized Firefox. I also can't help but wonder if it's possible there's a buggy or malicious Turkish ISP / DNS server that's twiddling some MS bug. Seems less likely, though, or we'd see it everywhere by now. I happened to notice that one crash stack passed through cafeplusfilterhook.dll in an early frame, Googling this leads me to what appears to be Turkish filtering software, http://akinsoft.com.tr/. Probably just a coincidence, though, I don't see the module in other crash reports. Also, dbaron noted that Turkish has interesting rules for converting between upper and lowercase letters (eg, http://en.wikipedia.org/wiki/Dotted_and_dotless_I). It's possible this could be involved in the crash, especially if buggy code is expecting the upper/lowercase transformation of a UTF8 string to have the same number of bytes.
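To make dbaron's point about case conversion concrete, here is a small sketch of how a case change on the Turkish dotted and dotless I alters the UTF-8 byte length. It uses Python's built-in, locale-independent Unicode case mapping, which is not the Turkish-locale tailoring and says nothing about what the Windows resolver actually does; it only shows that "same number of bytes after case conversion" is not a safe assumption. The helper name `utf8_len` is made up for illustration.

```python
def utf8_len(text):
    """Number of bytes the string occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


dotted_capital = "\u0130"  # LATIN CAPITAL LETTER I WITH DOT ABOVE
dotless_small = "\u0131"   # LATIN SMALL LETTER DOTLESS I

# Default Unicode mapping: U+0130 lowercases to 'i' followed by U+0307
# (combining dot above), and U+0131 uppercases to plain ASCII 'I'.
lowered = dotted_capital.lower()
uppered = dotless_small.upper()

print(utf8_len(dotted_capital), "->", utf8_len(lowered))
print(utf8_len(dotless_small), "->", utf8_len(uppered))
```

Code that sizes a buffer from the original string and then writes the case-converted form into it would overrun in the first case and leave stale bytes in the second.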
AFAIK, TT (Türk Telekom) DNS cache servers use filtering to block some sites, so there may be bugs in their DNS replies.
Summary: Topcrash [@ wcslen] - HostentBlob_WriteNameOrAlias under [@ PR_GetHostByName] on Turkish sites or in Turkish locale → Topcrash [@ wcslen | HostentBlob_WriteNameOrAlias] under PR_GetHostByName on Turkish sites or in Turkish locale
Might be interesting to see if we have a minidump available for this from the 24 hours of saved minidumps. We could at least verify that we're not passing bad data to gethostbyname then. Unfortunately without the heap data we probably can't find out exactly what is being passed. Assuming we're not somehow passing bad data down, this has to be a Windows bug. It's quite deep in Windows system libraries, with no third-party code on the stack.
I acquired a dump, and I can say that we are passing a valid-looking char* to gethostbyname. Without heap memory there's not much else to be said.
Yeah, I pulled a couple of minidumps last week and looked, seemed legit but the pointer wasn't to anything on the stack. The 2nd frame's name (HostentBlob_CreateFromRecords) makes me suspect a bug with the way the OS's resolver is parsing the reply. Wonder if we can find someone who can reproduce this, record the network traffic with a packet sniffer, and forward on to Microsoft...
Getting a network trace or a full-memory minidump would definitely be ideal here. Could probably use some outreach to the Turkish community.
they are on the cc.
> they are on the cc.
But most of them live out of Turkey or don't use Windows. :(
Are there any Turkish Firefox user forums, where we might find people experiencing the problem and willing to help? We might also try emailing people who have provided their email address in the crash report, although I'm wary of the odds of reaching advanced users that way (who could help with network analysis).
dolske: unfortunately we stopped collecting email addresses a while ago because we weren't actually using them. (bug 472357 unfortunately has yet to be fixed)
Whiteboard: [crashkill][crashkill-thirdparty]
(Clearing regression / regression-window-wanted, since this doesn't seem to actually be a regression, has unknown steps to reproduce, and is in a MS library.)
(In reply to comment #19)
> But most of them live out of Turkey or don't use Windows. :(
Tomcat pointed out that you're on our Turkish L10N team and do QA, would you be able to try and reproduce this?
CCing Kadir, he might have contacts within Turkey that can help?
I am going to pile on a bit. Erkan - as the Turkish locale owner, can you help find reproducible test cases from the Turkish sites? Or are there others you know who might be able to help?
Well, most of my relatives in Turkey do use Firefox, but they aren't exactly computer experts. And with unknown steps to reproduce this bug I guess they won't be of much help.
(In reply to comment #23)
> (In reply to comment #19)
> > But most of them live out of Turkey or don't use Windows. :(
>
> Tomcat pointed out that you're on our Turkish L10N team and do QA, would you be
> able to try and reproduce this?
Well, I'm part of the Turkish L10N team, but I don't live in Turkey. IIRC, this issue is reproducible only in Turkey. I can try to reproduce this bug within a virtual machine outside of Turkey. If it is not reproducible I'll try to get help from other community members...
Belated update: last week we made contact with the Microsoft folks and have a case open for this. I suspect progress will be slow, though, until we're able to collect additional data from a user experiencing the problem.
Whiteboard: [crashkill][crashkill-thirdparty] → [crashkill][crashkill-thirdparty][in contact with Microsoft]
I had our Socorro guys run a query to dump all reports for this crash within the last 3 months where the user added a comment or provided their email address (bug 525962)... Unfortunately none of the 42 such submissions contained an email address in either field. [We might try running this query again later, because it was only recently that we started saving submitted email addresses.]
bp-f0f54cac-2bcd-4b35-8cde-7262a2091018 was one of the few that contained a comment that was (1) of significant length (2) in English (3) and wasn't cursing us. :-)
"this version of Firefox is really makes me messed up, i have never had such a redicilous problems ever, it's happenning all the time after i make google search i touch search button and Boom Mozilla crashes i have disabled ALL Extensions and Plug- in but no way what's wrong with this verison of Firefox :S, i haev never had problems with this browser..."
That does suggest that this is happening repeatedly to users, and that the crashes on Google are not just an artifact of our sometimes-broken reporting of the URL that the user crashed on.
One curious thing about the crash description is that it's a bit strange to be doing a DNS resolution for a search submission. Maybe a google.com --> google.tr transition? Or other content in the page? This also makes me wonder if DNS prefetching could be exacerbating the problem. If so, turning it off in Turkish localized builds might help mitigate these crashes (though I don't know how many users are actually using the localized version).
I've tried to reproduce this bug but without luck. For those who want to investigate the DNS replies from the servers used by most Internet users in Turkey, here are the DNS servers of TTNet (the only uplink owner, AFAIK):
195.175.39.39
195.175.39.40
They work even if you are not within their network. And don't be surprised if you cannot reach youtube.com. :D
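If someone wants to capture the raw replies from those servers without going through the OS resolver (which is where the crash is), a minimal sketch follows. It hand-builds a plain DNS A query and sends it over UDP port 53; the function names `build_a_query` and `query_raw` are made up for illustration, and whether the servers above still answer outside queries is an assumption based only on this comment.

```python
import socket
import struct


def build_a_query(hostname, query_id=0x1234):
    """Build a minimal DNS query packet asking for the A record of hostname."""
    # Header: id, flags (recursion desired), 1 question, 0 other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    labels = hostname.rstrip(".").encode("ascii").split(b".")
    # Each label is length-prefixed; a zero byte terminates the name.
    qname = b"".join(bytes([len(label)]) + label for label in labels) + b"\x00"
    # QTYPE 1 = A, QCLASS 1 = IN.
    return header + qname + struct.pack("!HH", 1, 1)


def query_raw(server, hostname, timeout=3.0):
    """Send the query to server:53 over UDP and return the raw reply bytes."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_a_query(hostname), (server, 53))
        reply, _ = sock.recvfrom(4096)
    return reply


packet = build_a_query("www.google.com.tr")
print(packet.hex())
```

Calling something like `query_raw("195.175.39.39", "www.google.com.tr")` and saving the returned bytes would give a reply that could be compared against one from another resolver, or attached for the Microsoft folks alongside a packet capture.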
I wonder if there is any kind of Tor debugging mode that might force traffic through Turkey, which could help others participate in debugging this?
In talks with Microsoft, still need to repro, so unblocking.
Flags: blocking1.9.2+ → blocking1.9.2-
We collected some info from Google Chrome crashes in getaddrinfo. We can't establish a relation to the Turkish locale. Our findings are summarized at http://code.google.com/p/chromium/issues/detail?id=22083#c16
(Unassigning, I'm not actively tracking this)
Assignee: dolske → nobody
Crash Signature: [@ wcslen | HostentBlob_WriteNameOrAlias]
Still a valid crash but appears at #170 on 8.0. Removing the top crash keyword.
Keywords: topcrash
Based on recent comments, I doubt it's still fully correlated to Turkish users. I see:
Vietnamese comments, confirmed by correlations:
41% (23/56) vs. 1% (964/136061) UKHook40.dll (Vietnamese keyboard)
Russian ones, also confirmed by correlations:
11% (6/56) vs. 3% (4511/136061) vb@yandex.ru
9% (5/56) vs. 3% (4437/136061) yasearch@yandex.ru (Yandex.Bar, https://addons.mozilla.org/addon/3495)
Turkish and English ones.
More reports at: https://crash-stats.mozilla.com/report/list?signature=wcslen+|+HostentBlob_WriteNameOrAlias
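For reference, the percentages in the correlation lines above can be recomputed as follows. This is only a sketch of the arithmetic, assuming each line reads as (crashes with this signature that have the item / crashes with this signature) vs. (all crashes that have the item / all crashes); the helper name `share` and the ratio column are additions for illustration.

```python
def share(hits, total):
    """Percentage of reports containing the item, rounded to a whole percent."""
    return round(100 * hits / total)


# (item, hits in signature, signature total, hits overall, overall total),
# copied from the correlation lines in the comment above.
rows = [
    ("UKHook40.dll", 23, 56, 964, 136061),
    ("vb@yandex.ru", 6, 56, 4511, 136061),
    ("yasearch@yandex.ru", 5, 56, 4437, 136061),
]

shares = {}
for item, sig_hits, sig_total, all_hits, all_total in rows:
    shares[item] = (share(sig_hits, sig_total), share(all_hits, all_total))
    # How much more common the item is among these crashes than overall.
    ratio = (sig_hits / sig_total) / (all_hits / all_total)
    print(f"{item}: {shares[item][0]}% vs. {shares[item][1]}% (x{ratio:.1f})")
```

With only 56 reports in the signature, the Yandex rows rest on 5 or 6 crashes each, so they are much weaker evidence than the UKHook40.dll row.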
Summary: Topcrash [@ wcslen | HostentBlob_WriteNameOrAlias] under PR_GetHostByName on Turkish sites or in Turkish locale → [WinXP] crash [@ wcslen | HostentBlob_WriteNameOrAlias] under PR_GetHostByName
Whiteboard: [crashkill][crashkill-thirdparty][in contact with Microsoft] → [startupcrash][crashkill][crashkill-thirdparty][in contact with Microsoft]
Version: 1.9.1 Branch → Trunk
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → INCOMPLETE