Closed Bug 508292 Opened 15 years ago Closed 8 years ago

[WinXP] crash [@ wcslen | HostentBlob_WriteNameOrAlias] under PR_GetHostByName

Categories

(Core :: Networking, defect, P2)

x86
Windows XP
defect

Tracking

RESOLVED INCOMPLETE
Tracking Status
blocking1.9.1 --- needed
status1.9.1 --- wanted

People

(Reporter: martijn.martijn, Unassigned)

Details

(Keywords: crash, Whiteboard: [startupcrash][crashkill][crashkill-thirdparty][in contact with Microsoft])

Crash Data

Currently nr. 2 in the topcrasher list for Firefox3.5.2:
http://crash-stats.mozilla.com/query/query?product=Firefox&version=Firefox%3A3.5.2&date=&range_value=1&range_unit=weeks&query_search=signature&query_type=exact&query=&do_query=1

This is a list of the stacks:
http://crash-stats.mozilla.com/report/list?product=Firefox&version=Firefox%3A3.5.2&query_search=signature&query_type=exact&query=&date=&range_value=1&range_unit=weeks&do_query=1&signature=PR_GetHostByName

Example stack:
http://crash-stats.mozilla.com/report/index/0984e159-39b7-4515-ad9f-677f42090804
0  	 	@0x5f90005  	
1 	nspr4.dll 	PR_GetHostByName 	nsprpub/pr/src/misc/prnetdb.c:722
2 	nspr4.dll 	pr_GetAddrInfoByNameFB 	nsprpub/pr/src/misc/prnetdb.c:1987
3 	nspr4.dll 	PR_GetAddrInfoByName 	nsprpub/pr/src/misc/prnetdb.c:2016
4 	xul.dll 	nsHostResolver::ThreadFunc 	netwerk/dns/src/nsHostResolver.cpp:884
5 	nspr4.dll 	_PR_NativeRunThread 	nsprpub/pr/src/threads/combined/pruthr.c:426
6 	xul.dll 	nsACString_internal::MutatePrep 	xpcom/string/src/nsTSubstring.cpp:179
7 	mozcrt19.dll 	_callthreadstartex 	obj-firefox/memory/jemalloc/src/threadex.c:348

The crash seems to happen at startup. In most cases the uptime is less than 10s.

I'm not sure whether these crashes are the same, but I think they are, and they seem to give a better stack:
http://crash-stats.mozilla.com/report/list?product=Firefox&version=Firefox%3A3.5.2&query_search=signature&query_type=exact&query=&date=&range_value=1&range_unit=weeks&do_query=1&signature=wcslen

Example stack of these crashes:
http://crash-stats.mozilla.com/report/index/51eb5f20-c37e-4fa8-b831-fa2e12090804
0  	msvcrt.dll  	wcslen  	
1 	mswsock.dll 	mswsock.dll@0x9629 	
2 	mswsock.dll 	mswsock.dll@0x9809 	
3 	mswsock.dll 	mswsock.dll@0x96d2 	
4 	mswsock.dll 	mswsock.dll@0x266e 	
5 	mswsock.dll 	mswsock.dll@0x1b09 	
6 	ws2_32.dll 	NSPROVIDER::NSPLookupServiceNext 	
7 	ws2_32.dll 	NSPROVIDERSTATE::LookupServiceNext 	
8 	ws2_32.dll 	NSQUERY::LookupServiceNext 	
9 	ws2_32.dll 	WSALookupServiceNextW 	
10 	ws2_32.dll 	WSALookupServiceNextA 	
11 	ws2_32.dll 	getxyDataEnt 	
12 	ws2_32.dll 	gethostbyname 	
13 	nspr4.dll 	PR_GetHostByName 	nsprpub/pr/src/misc/prnetdb.c:722
14 	nspr4.dll 	pr_GetAddrInfoByNameFB 	nsprpub/pr/src/misc/prnetdb.c:1987
15 	nspr4.dll 	PR_GetAddrInfoByName 	nsprpub/pr/src/misc/prnetdb.c:2016
16 	xul.dll 	nsHostResolver::ThreadFunc 	netwerk/dns/src/nsHostResolver.cpp:884
17 	nspr4.dll 	_PR_NativeRunThread 	nsprpub/pr/src/threads/combined/pruthr.c:426
18 	xul.dll 	nsACString_internal::MutatePrep 	xpcom/string/src/nsTSubstring.cpp:179
19 	mozcrt19.dll 	_callthreadstartex 	obj-firefox/memory/jemalloc/src/threadex.c:348
Thanks for the bug report.  Usually this kind of crash is caused
by a bug elsewhere.  I suggest that you start the investigation in
Necko (netwerk/dns/src/nsHostResolver.cpp).

There are no NSPR changes between Firefox 3.5 and 3.5.2.

There are only a small number of NSS changes between Firefox
3.5 and 3.5.1, and no NSS changes between Firefox 3.5.1 and 3.5.2.
Assignee: nobody → nobody
Component: Libraries → Networking
Product: NSS → Core
QA Contact: libraries → networking
Version: unspecified → 1.9.1 Branch
blocking1.9.1: --- → ?
blocking1.9.1: ? → needed
So, some of these crashes actually have complete stack traces.
http://code.google.com/p/chromium/issues/detail?id=17047#c0
indicates it's also a problem for Chromium. We need someone to contact Microsoft; we have hundreds of these.

Signature	wcslen
UUID	63d9af3c-442f-4cc2-8cd1-015972090809
Time 	2009-08-09 22:34:26.35625
Uptime	25
Last Crash	120683 seconds before submission
Product	Firefox
Version	3.5.2
Build ID	20090729225027
OS	Windows NT
OS Version	5.1.2600 Service Pack 2
CPU	x86
CPU Info	GenuineIntel family 15 model 4 stepping 9
Crash Reason	EXCEPTION_ACCESS_VIOLATION
Crash Address	0x0
User Comments	
Processor Notes 	
Crashing Thread
Frame 	Module 	Signature [Expand] 	Source
0 	msvcrt.dll 	wcslen 	
1 	mswsock.dll 	HostentBlob_WriteNameOrAlias 	
2 	mswsock.dll 	HostentBlob_CreateFromRecords 	
3 	mswsock.dll 	HostentBlob_WriteNameOrAlias 	
4 	mswsock.dll 	Rnr_DoDnsLookup 	
5 	mswsock.dll 	NSPLookupServiceNext 	
6 	ws2_32.dll 	NSPROVIDER::NSPLookupServiceNext 	
7 	ws2_32.dll 	NSPROVIDERSTATE::LookupServiceNext 	
8 	ws2_32.dll 	NSQUERY::LookupServiceNext 	
9 	ws2_32.dll 	WSALookupServiceNextW 	
10 	ws2_32.dll 	WSALookupServiceNextA 	
11 	ws2_32.dll 	getxyDataEnt 	
12 	ws2_32.dll 	gethostbyname 	
13 	nspr4.dll 	PR_GetHostByName 	nsprpub/pr/src/misc/prnetdb.c:722
14 	nspr4.dll 	pr_GetAddrInfoByNameFB 	nsprpub/pr/src/misc/prnetdb.c:1987
15 	nspr4.dll 	PR_GetAddrInfoByName 	nsprpub/pr/src/misc/prnetdb.c:2016
16 	xul.dll 	nsHostResolver::ThreadFunc 	netwerk/dns/src/nsHostResolver.cpp:884
17 	nspr4.dll 	_PR_NativeRunThread 	nsprpub/pr/src/threads/combined/pruthr.c:426
18 	xul.dll 	nsACString_internal::MutatePrep 	xpcom/string/src/nsTSubstring.cpp:179
19 	mozcrt19.dll 	_callthreadstartex 	obj-firefox/memory/jemalloc/src/threadex.c:348
so, i went through about 200 crashes from the url, and all but two have the signature i'm listing; i've filed bug 509365 for the other two.
Summary: Topcrash [@ PR_GetHostByName] for Firefox3.5.2 → Topcrash [@ wcslen - HostentBlob_WriteNameOrAlias] under [@ PR_GetHostByName] for Firefox3.5.2
I looked through a sample of wcslen crashes from 8/19

843 total crashes for wcslen on 20090819-crashdata.csv
408 start up crashes inside 3 minutes

os breakdown
 400 wcslen Windows NT 5.1.2600 Service Pack 3
 385 wcslen Windows NT 5.1.2600 Service Pack 2
  30 wcslen Windows NT 6.0.6001 Service Pack 1
   7 wcslen Windows NT 6.0.6000
   6 wcslen Windows NT 5.1.2600 Service Pack 1
   6 wcslen Windows NT 5.1.2600
   5 wcslen Windows NT 6.0.6002 Service Pack 2
   1 wcslen Windows NT 6.1.7100
   1 wcslen Windows NT 5.2.3790 Service Pack 2
   1 wcslen Windows NT 5.2.3790 Service Pack 1
   1 wcslen Windows NT 5.1.2600 Service Pack 3, v.3264
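
As a rough cross-check of the breakdown above, the counts can be grouped by NT version number. This is only a sketch: the pairs are copied from the list above, and the helper name `group_by_nt_version` is made up for illustration, not part of any crash-stats tooling.

```python
from collections import Counter

# (count, OS version string) pairs copied from the breakdown above.
OS_COUNTS = [
    (400, "5.1.2600 Service Pack 3"),
    (385, "5.1.2600 Service Pack 2"),
    (30, "6.0.6001 Service Pack 1"),
    (7, "6.0.6000"),
    (6, "5.1.2600 Service Pack 1"),
    (6, "5.1.2600"),
    (5, "6.0.6002 Service Pack 2"),
    (1, "6.1.7100"),
    (1, "5.2.3790 Service Pack 2"),
    (1, "5.2.3790 Service Pack 1"),
    (1, "5.1.2600 Service Pack 3, v.3264"),
]


def group_by_nt_version(rows):
    """Sum crash counts per major.minor NT version (hypothetical helper)."""
    totals = Counter()
    for count, version in rows:
        # "5.1.2600 Service Pack 3" -> "5.1"
        major_minor = ".".join(version.split(".")[:2])
        totals[major_minor] += count
    return totals


totals = group_by_nt_version(OS_COUNTS)
grand_total = sum(totals.values())
xp_share = totals["5.1"] / grand_total
print(dict(totals), grand_total, f"{xp_share:.1%}")
```

The rows add up to the 843 total quoted above, and NT 5.1 (Windows XP) accounts for 798 of them.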

distribution of versions where the crash was found on 20090819-crashdata.csv
 494 Firefox 3.5.2
 251 Firefox 3.0.13
  21 Firefox 3.5
  19 Firefox 3.0.11
  11 Firefox 3.5.1
  11 Firefox 3.0.12
   9 Firefox 3.0
   7 Firefox 3.0.10
   6 Firefox 3.0.5
   4 Firefox 3.0.6
   3 Firefox 3.1b3
   3 Firefox 3.0.7
   2 Firefox 3.0.1

There is also some interesting stuff in the urls

an abnormally high percentage of the crash urls are from the top-level domain of Turkey (.tr)..

#crashes -  domain where the user might be when the crash happens
 210 http://www.google.com.tr
  45 http://tr.www.mozilla.com
  17 http://images.google.com.tr
   1 http://turkoloji.cu.edu.tr
   1 http://s12.travian.com.tr
   1 https://mail.etu.edu.tr
 and many more assorted...

and various other turkish sites or content

 2 http://turkcedublajsinema.bloggum.com/yazi

maybe some wild speculation, but I wonder if that is somehow related to the hostname/dns lookups that are going on in the stack in comment 2, and there is some incompatibility we hit related to turkish sites...
maybe erkan can help keep a lookout for anecdotal reports of startup or other crashes in turkey where we might be hitting bad dns servers, sites with unusual host names that we might choke on, or some possible dns or content problem that might be related to a crash.
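
For anyone wanting to repeat the URL tally, here is a minimal sketch that groups the crash URLs by top-level domain. The pairs are just the handful quoted above (the full data set has many more), and `tally_by_tld` is a hypothetical helper name.

```python
from collections import Counter
from urllib.parse import urlsplit

# (count, URL) pairs copied from the list above; the real data set is larger.
URL_COUNTS = [
    (210, "http://www.google.com.tr"),
    (45, "http://tr.www.mozilla.com"),
    (17, "http://images.google.com.tr"),
    (1, "http://turkoloji.cu.edu.tr"),
    (1, "http://s12.travian.com.tr"),
    (1, "https://mail.etu.edu.tr"),
    (2, "http://turkcedublajsinema.bloggum.com/yazi"),
]


def tally_by_tld(rows):
    """Sum crash counts per top-level domain of the reported URL."""
    totals = Counter()
    for count, url in rows:
        host = urlsplit(url).hostname or ""
        # Last dot-separated label of the host name, e.g. "tr" or "com".
        totals[host.rsplit(".", 1)[-1]] += count
    return totals


tld_totals = tally_by_tld(URL_COUNTS)
print(dict(tld_totals))
```

Note that a plain TLD count undercounts Turkish content: tr.www.mozilla.com and the bloggum.com page both land under .com.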
It's interesting to see there is a definite spike in the PR_GetHostByName crashes for this on 3.5.3 and 3.0.14 in the days immediately following their release.

   6 3.5.3 20090910-crashdata.csv
  69 3.5.3 20090911-crashdata.csv
  77 3.5.3 20090912-crashdata.csv
  80 3.5.3 20090913-crashdata.csv

   2 3.0.14 20090904-crashdata.csv
   1 3.0.14 20090908-crashdata.csv
   3 3.0.14 20090910-crashdata.csv
 137 3.0.14 20090911-crashdata.csv
 236 3.0.14 20090912-crashdata.csv
 187 3.0.14 20090913-crashdata.csv

volume for 3.0.x over the same period is much lower in comparison

   1 3.0.12 20090903-crashdata.csv
   4 3.0.12 20090907-crashdata.csv
   1 3.0.12 20090908-crashdata.csv

   1 3.0.13 20090903-crashdata.csv
   1 3.0.13 20090904-crashdata.csv
   1 3.0.13 20090905-crashdata.csv
   2 3.0.13 20090910-crashdata.csv

volume for 3.5.2 is much more steady and around the same level as 3.5.3

  81 3.5.2 20090901-crashdata.csv
  93 3.5.2 20090902-crashdata.csv
  76 3.5.2 20090903-crashdata.csv
  57 3.5.2 20090904-crashdata.csv
  94 3.5.2 20090905-crashdata.csv
  91 3.5.2 20090906-crashdata.csv
  50 3.5.2 20090907-crashdata.csv
  81 3.5.2 20090908-crashdata.csv
  55 3.5.2 20090909-crashdata.csv
  51 3.5.2 20090910-crashdata.csv
  33 3.5.2 20090911-crashdata.csv
  33 3.5.2 20090912-crashdata.csv
  28 3.5.2 20090913-crashdata.csv


This might indicate that the PR_GetHostByName problem is associated with new installs and/or updates.  Martijn's comment 0 and the filing of this bug were also right around the release time of 3.5.2, indicating a possible spike around that release.
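
One way to check whether the 3.5.3 spike simply mirrors the 3.5.2 decline is to add the two series per day and compare against the earlier 3.5.2 baseline. A minimal sketch using only the numbers quoted above; the variable names are illustrative, and it says nothing about cause.

```python
# Daily crash counts copied from the tables above (keys are YYYYMMDD).
CRASHES_352_BASELINE = {
    "20090901": 81, "20090902": 93, "20090903": 76,
    "20090904": 57, "20090905": 94, "20090906": 91,
    "20090907": 50, "20090908": 81, "20090909": 55,
}
CRASHES_352 = {"20090910": 51, "20090911": 33, "20090912": 33, "20090913": 28}
CRASHES_353 = {"20090910": 6, "20090911": 69, "20090912": 77, "20090913": 80}

# Combined 3.5.2 + 3.5.3 volume for each day after the 3.5.3 release.
combined = {day: CRASHES_352[day] + CRASHES_353[day] for day in CRASHES_352}

# Average daily 3.5.2 volume before 3.5.3 shows up in the data.
baseline = sum(CRASHES_352_BASELINE.values()) / len(CRASHES_352_BASELINE)

for day, count in sorted(combined.items()):
    print(day, count, f"(baseline {baseline:.1f})")
```

With these figures the combined count for 11 to 13 September is 102, 110 and 108 per day, against a 3.5.2 average of about 75 per day for 1 to 9 September.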
my guess is that people have a problem (caused by LSPs), each time there's a new release available, they try it to see if it fixes the problem. it doesn't fix the problem, so they stop using firefox. this repeats with each release...
Assignee: nobody → dolske
Summary: Topcrash [@ wcslen - HostentBlob_WriteNameOrAlias] under [@ PR_GetHostByName] for Firefox3.5.2 → Topcrash [@ wcslen] - HostentBlob_WriteNameOrAlias under [@ PR_GetHostByName]
Blocking 1.9.2+ as part of the CrashKill effort.
Flags: blocking1.9.2+
Marking all topcrash bugs as P2 (3.6 release blockers, but not 3.6b1 blockers)
Priority: -- → P2
Summary: Topcrash [@ wcslen] - HostentBlob_WriteNameOrAlias under [@ PR_GetHostByName] → Topcrash [@ wcslen] - HostentBlob_WriteNameOrAlias under [@ PR_GetHostByName] on Turkish sites or in Turkish locale
Indeed, looking through URLs associated with these crashes there are tons of Turkish sites. Including google.com.tr and tr.www.mozilla.com. I see these crashes with both short and long uptimes; but would have to have Lars run some DB queries to see if there's a strong correlation or not.

I can't reproduce the crash surfing around the various .tr URLs, next step would be to try changing locale and use a localized Firefox. I also can't help but wonder if it's possible there's a buggy or malicious Turkish ISP / DNS server that's twiddling some MS bug. Seems less likely, though, or we'd see it everywhere by now.

I happened to notice that one crash stack passed through cafeplusfilterhook.dll in an early frame, Googling this leads me to what appears to be Turkish filtering software, http://akinsoft.com.tr/. Probably just a coincidence, though, I don't see the module in other crash reports.

Also, dbaron noted that Turkish has interesting rules for converting between upper and lowercase letters (eg, http://en.wikipedia.org/wiki/Dotted_and_dotless_I). It's possible this could be involved in the crash, especially if buggy code is expecting the upper/lowercase transformation of a UTF8 string to have the same number of bytes.
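
To make the byte-count point concrete, here is a small sketch of how the UTF-8 length of a string can change under case mapping. It uses Python's locale-independent Unicode case mapping purely as an illustration; nothing here is known about what the Windows resolver actually does with host names.

```python
def utf8_len(text):
    """Number of bytes the string occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


dotted_upper = "\u0130"   # LATIN CAPITAL LETTER I WITH DOT ABOVE
dotless_lower = "\u0131"  # LATIN SMALL LETTER DOTLESS I

# Full Unicode lowercasing yields 'i' followed by U+0307 COMBINING DOT ABOVE.
lowered = dotted_upper.lower()
# Uppercasing the dotless i yields plain ASCII 'I'.
uppered = dotless_lower.upper()

print(utf8_len(dotted_upper), "->", utf8_len(lowered))
print(utf8_len(dotless_lower), "->", utf8_len(uppered))
```

Code that sizes a buffer from the input and then writes the case-converted output into it would overrun in the first case and leave stale bytes in the second.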
AFAIK, TT (Türk Telekom) DNS cache servers use a filtering for blocking some sites. So it may be bugs in their DNS replies.
Summary: Topcrash [@ wcslen] - HostentBlob_WriteNameOrAlias under [@ PR_GetHostByName] on Turkish sites or in Turkish locale → Topcrash [@ wcslen | HostentBlob_WriteNameOrAlias] under PR_GetHostByName on Turkish sites or in Turkish locale
Might be interesting to see if we have a minidump available for this from the 24 hours of saved minidumps. We could at least verify that we're not passing bad data to gethostbyname then. Unfortunately without the heap data we probably can't find out exactly what is being passed.

Assuming we're not somehow passing bad data down, this has to be a Windows bug. It's quite deep in Windows system libraries, with no third-party code on the stack.
I acquired a dump, and I can say that we are passing a valid-looking char* to gethostbyname. Without heap memory there's not much else to be said.
Yeah, I pulled a couple of minidumps last week and looked, seemed legit but the pointer wasn't to anything on the stack.

The 2nd frame's name (HostentBlob_CreateFromRecords) makes me suspect a bug with the way the OS's resolver is parsing the reply. Wonder if we can find someone who can reproduce this, record the network traffic with a packet sniffer, and forward on to Microsoft...
Getting a network trace or a full-memory minidump would definitely be ideal here. Could probably use some outreach to the Turkish community.
they are on the cc.
> they are on the cc.

But most of them live out of Turkey or don't use Windows. :(
Are there any Turkish Firefox user forums, where we might find people experiencing the problem and willing to help?

We might also try emailing people who have provided their email address in the crash report, although I'm wary of the odds of reaching advanced users that way (who could help with network analysis).
dolske: unfortunately we stopped collecting email addresses a while ago because we weren't actually using them. (bug 472357 unfortunately has yet to be fixed)
Whiteboard: [crashkill][crashkill-thirdparty]
(Clearing regression / regression-window-wanted, since this doesn't seem to actually be a regression, has unknown steps to reproduce, and is in a MS library.)
(In reply to comment #19)

> But most of them live out of Turkey or don't use Windows. :(

Tomcat pointed out that you're on our Turkish L10N team and do QA, would you be able to try and reproduce this?
CCing Kadir, he might have contacts within Turkey that can help?
I am going to pile on a bit.  Erkan - As the Turkish locale owner, can you help find reproducible test cases from the Turkish sites.  Or are there others that you know that might be able to help?
Well, most of my relatives in Turkey do use Firefox, but they aren't exactly computer experts. And with unknown steps to reproduce this bug I guess they won't be of much help.
(In reply to comment #23)
> (In reply to comment #19)
> 
> > But most of them live out of Turkey or don't use Windows. :(
> 
> Tomcat pointed out that you're on our Turkish L10N team and do QA, would you be
> able to try and reproduce this?

Well, I'm part of the Turkish L10N team, but I don't live in Turkey. IIRC, this issue is reproducible only in Turkey.

I can try to reproduce this bug within a virtual machine outside of Turkey. If it is not reproducible, I'll try to get help from other community members...
Belated update: last week we made contact with the Microsoft folks and have a case open for this. I suspect progress will be slow, though, until we're able to collect additional data from a user experiencing the problem.
Whiteboard: [crashkill][crashkill-thirdparty] → [crashkill][crashkill-thirdparty][in contact with Microsoft]
I had our Socorro guys run a query to dump all reports for this crash within the last 3 months where the user added a comment or provided their email address (bug 525962)... Unfortunately none of the 42 such submissions contained an email address in either field. [We might try running this query again later, because it was only recently that we started saving submitted email addresses.]

bp-f0f54cac-2bcd-4b35-8cde-7262a2091018 was one of the few that contained a comment that was (1) of significant length (2) in English (3) and wasn't cursing us. :-) 

"this version of Firefox is really makes me messed up, i have never had such a redicilous problems ever, it's happenning all the time after i make google search i touch search button and Boom Mozilla crashes i have disabled  ALL Extensions and Plug- in but no way what's wrong with this verison of Firefox :S, i haev never had problems with this browser..."

That does suggest that this is happening repeatedly to users, and that the crashes on Google are not just an artifact of our sometimes-broken reporting of the URL that the user crashed on.

One curious thing about the crash description is that it's a bit strange to be doing a DNS resolution for a search submission. Maybe a google.com --> google.tr transition? Or other content in the page?

This also makes me wonder if DNS prefetching could be exacerbating the problem. If so, turning it off in Turkish localized builds might help mitigate these crashes (though I don't know how many users are actually using the localized version).
I've tried to reproduce this bug but without luck.

For those who want to investigate the DNS replies from the servers used by most Internet users in Turkey, here are the DNS servers of TTNet (the only uplink owner, AFAIK):

195.175.39.39
195.175.39.40

They work even if you are not within their network. And don't be surprised if you cannot reach youtube.com. :D
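
For capturing the raw reply from a specific server without going through the OS resolver, a minimal sketch using only the Python standard library might look like this. The function names are made up for illustration, the query ID is arbitrary, and whether the servers listed above still answer is not something this sketch can promise.

```python
import socket
import struct


def build_dns_query(hostname, query_id=0x1234):
    """Build a minimal DNS query for an A record with recursion desired."""
    # Header: ID, flags (RD bit set), QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE=1 (A), QCLASS=1 (IN).
    return header + qname + struct.pack("!HH", 1, 1)


def query_raw(server, hostname, timeout=3.0):
    """Send the query over UDP port 53 and return the raw reply bytes."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_dns_query(hostname), (server, 53))
        reply, _ = sock.recvfrom(4096)
        return reply
```

Something like `query_raw("195.175.39.39", "www.google.com.tr").hex()` would then give the reply bytes to attach to the bug, so that odd-looking records can be inspected by hand.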
I wonder if there is any kind of Tor debugging mode that might force traffic through Turkey, which could help others participate in debugging this?
In talks with Microsoft, still need to repro, so unblocking.
Flags: blocking1.9.2+ → blocking1.9.2-
We collected some info from Google Chrome crashes in getaddrinfo.
We can't establish a relation to the Turkish locale.  Our findings
are summarized at
http://code.google.com/p/chromium/issues/detail?id=22083#c16
(Unassigning, I'm not actively tracking this)
Assignee: dolske → nobody
Crash Signature: [@ wcslen | HostentBlob_WriteNameOrAlias]
Still a valid crash but appears at #170 on 8.0. Removing the top crash keyword.
Keywords: topcrash
Based on recent comments, I doubt it's still fully correlated to Turkish users.

I see Vietnamese comments confirmed by correlations:
     41% (23/56) vs.   1% (964/136061) UKHook40.dll (Vietnamese keyboard)
Russian ones confirmed also by correlations:
     11% (6/56) vs.   3% (4511/136061) vb@yandex.ru
      9% (5/56) vs.   3% (4437/136061) yasearch@yandex.ru (Yandex.Bar, https://addons.mozilla.org/addon/3495)
Turkish and English ones.
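
To put the correlation figures above on a common scale, here is a small sketch that turns each pair of ratios into a lift factor. It assumes the first ratio is the share of crashes with this signature that have the module or extension loaded, and the second is the share across all crashes; that reading of the report format is an assumption, and with only 56 reports the numbers are rough.

```python
# (name, hits with this signature, reports with this signature,
#  hits across all crashes, all crashes), copied from the figures above.
CORRELATIONS = [
    ("UKHook40.dll", 23, 56, 964, 136061),
    ("vb@yandex.ru", 6, 56, 4511, 136061),
    ("yasearch@yandex.ru", 5, 56, 4437, 136061),
]


def lift(sig_hits, sig_total, all_hits, all_total):
    """How many times more common the item is within this signature."""
    return (sig_hits / sig_total) / (all_hits / all_total)


for name, *counts in CORRELATIONS:
    print(f"{name}: {lift(*counts):.1f}x")
```

On that reading, UKHook40.dll is far more over-represented in this signature than either Yandex extension.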

More reports at:
https://crash-stats.mozilla.com/report/list?signature=wcslen+|+HostentBlob_WriteNameOrAlias
Summary: Topcrash [@ wcslen | HostentBlob_WriteNameOrAlias] under PR_GetHostByName on Turkish sites or in Turkish locale → [WinXP] crash [@ wcslen | HostentBlob_WriteNameOrAlias] under PR_GetHostByName
Whiteboard: [crashkill][crashkill-thirdparty][in contact with Microsoft] → [startupcrash][crashkill][crashkill-thirdparty][in contact with Microsoft]
Version: 1.9.1 Branch → Trunk
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → INCOMPLETE