Bug 290190 - crash when network connection drops and reconnects [@ msvcrt.dll - nsDNSRecord::GetNextAddr ]
Status: RESOLVED DUPLICATE of bug 337418
Keywords: crash
Product: Core
Classification: Components
Component: Networking
Version: Trunk
Platform: All All
Importance: P3 critical
Target Milestone: mozilla1.9alpha1
Assigned To: Nobody; OK to take it and work on it
Mentors:
URL: http://gmail.google.com/
Duplicates: 293111
Depends on: 426060
Blocks:
Reported: 2005-04-13 11:22 PDT by Darin Fisher
Modified: 2011-08-05 22:28 PDT
CC List: 12 users
asa: blocking1.8rc1+


Attachments
v1 patch - bandaid (1.18 KB, patch)
2005-10-12 17:33 PDT, Darin Fisher
cbiesinger: review+
bzbarsky: superreview+
asa: approval1.8rc1+

Description Darin Fisher 2005-04-13 11:22:18 PDT
crash when network connection drops and reconnects, while using gmail.

see:
http://talkback-public.mozilla.org/talkback/fastfind.jsp?search=2&type=iid&id=5046300
Comment 1 timeless 2005-04-13 13:00:09 PDT
this has happened to our product too. i couldn't figure out what to do with it.
Comment 2 Darin Fisher 2005-05-06 01:40:18 PDT
*** Bug 293111 has been marked as a duplicate of this bug. ***
Comment 3 Ginn Chen 2005-07-19 02:53:13 PDT
Also happens on Mac OS X...

Stack Signature	 0xffff89c0 0d35ece4
Product ID	Firefox10
Build ID	2005071117
Trigger Time	2005-07-17 19:59:30.0
Platform	MacOSX
Operating System	Darwin 8.2.0
Module	
URL visited	
User Comments	
Since Last Crash	298633 sec
Total Uptime	298633 sec
Trigger Reason	SIGBUS: Bus Error: (signal 10)
Source File, Line No.	N/A
Stack Trace	
0xffff89c0
nsDNSRecord::GetNextAddr()  [/builds/tinderbox/Fx-Aviary1.0.1/Darwin_7.9.0_Depend/mozilla/netwerk/dns/src/nsDNSService2.cpp, line 122]
nsSocketTransport::OnSocketDetached()  [/builds/tinderbox/Fx-Aviary1.0.1/Darwin_7.9.0_Depend/mozilla/netwerk/base/src/nsSocketTransport2.cpp, line 1442]
nsSocketTransportService::DetachSocket(nsSocketTransportService::SocketContext*)()  [/builds/tinderbox/Fx-Aviary1.0.1/Darwin_7.9.0_Depend/mozilla/netwerk/base/src/nsSocketTransportService2.cpp, line 187]
nsSocketTransportService::Run()  [/builds/tinderbox/Fx-Aviary1.0.1/Darwin_7.9.0_Depend/mozilla/netwerk/base/src/nsSocketTransportService2.cpp, line 548]
nsThread::Main()  [/builds/tinderbox/Fx-Aviary1.0.1/Darwin_7.9.0_Depend/mozilla/xpcom/threads/nsThread.cpp, line 607]
_pt_root()  [/builds/tinderbox/Fx-Aviary1.0.1/Darwin_7.9.0_Depend/mozilla/nsprpub/pr/src/pthreads/ptthread.c, line 217]
libSystem.B.dylib.88.0.0 + 0x2c3d4 (0x9002c3d4)
Comment 4 Asa Dotzler [:asa] 2005-10-10 16:59:38 PDT
This looks high-profile. If we can get a low-risk patch, we should take it.
Comment 5 Darin Fisher 2005-10-12 17:33:12 PDT
Created attachment 199372
v1 patch - bandaid

It seems like we are crashing on the memcpy in nsDNSRecord::GetNextAddr, which
implies that the nsHostRecord's addr_info field is null.  In that case, we
expect the addr field to be non-null.  The addr_info field is null and the addr
field is non-null when the hostname was an IP address literal that could simply
be parsed.  I doubt this patch will actually help.  I suspect it is more likely
that the nsHostRecord data structure is getting corrupted, but I figure that
this patch is worth a shot since it might help.  If it does, then at least we
know that addr is somehow ending up null, and we can go from there.  Otherwise,
we're no worse off than we are now.
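
For readers following along, the kind of guard described in comment 5 might look roughly like the sketch below. This is not the attached patch: `HostRecord`, `NetAddr`, `Result`, and `CopyLiteralAddr` are simplified, hypothetical stand-ins for the real Necko types, and only the IP-literal path (where `addr_info` is null and `addr` is expected to be non-null) is modeled.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical, simplified stand-ins for the real types; names are illustrative.
struct NetAddr { unsigned char raw[16]; };

struct HostRecord {
    const void*    addr_info = nullptr; // resolver result list; null for IP literals
    const NetAddr* addr      = nullptr; // parsed literal; expected non-null when addr_info is null
};

enum class Result { Ok, NotAvailable };

// Models only the branch taken when addr_info is null.
// Assumption: the band-aid amounts to refusing to memcpy from a null addr.
Result CopyLiteralAddr(const HostRecord& rec, NetAddr* out) {
    assert(out != nullptr);
    if (!rec.addr) {
        // Unexpected state: report failure instead of reading through a null pointer.
        return Result::NotAvailable;
    }
    std::memcpy(out, rec.addr, sizeof(NetAddr));
    return Result::Ok;
}
```

A guard like this only helps if `addr` is actually null at the time of the crash. If the record is corrupted or already freed, as comment 5 suspects, the pointer can be non-null but invalid and the `memcpy` would still fault.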
Comment 6 Christian :Biesinger (don't email me, ping me on IRC) 2005-10-13 15:43:16 PDT
is there some way to find out from the talkback data whether that variable was
null or not?
Comment 7 Darin Fisher 2005-10-13 16:18:26 PDT
That information is not exposed on http://talkback-reports.mozilla.org/ :-(
Comment 8 Boris Zbarsky [:bz] 2005-10-13 20:44:38 PDT
Comment on attachment 199372
v1 patch - bandaid

sr=bzbarsky if we come up with nothing better...
Comment 9 Boris Zbarsky [:bz] 2005-10-13 21:08:46 PDT
Jay, can you get darin local var data and such for the relevant talkbacks?
Comment 10 Christian :Biesinger (don't email me, ping me on IRC) 2005-10-15 10:20:29 PDT
Comment on attachment 199372
v1 patch - bandaid

r=biesi I guess. but understanding why this is not a valid pointer would be
good...
Comment 11 Jay Patel [:jay] 2005-10-17 13:15:13 PDT
Has anyone been able to reproduce this with a recent 1.5 branch build?  Is this
a problem on Win32 and Linux, or just MacOSX?

I tried looking for more detailed info on these crashes, couldn't find any.  But
if anyone is able to reproduce this with a recent build, post your incident id
so I can try to dig up any details that we might be collecting.  I'm pretty sure
our best bet for local var info is with Win32 crashes...if I get a chance to
look it up within 1-2 days of submission (Talkback db problems force us to
delete a lot of detailed info every couple of days).
Comment 12 Scott MacGregor 2005-10-18 10:04:37 PDT
How would you guys like to proceed with this bug? Wait another day to see if we
can get local var data from someone who crashes on this in the next day or so?
Or do you want approval on the patch before that?
Comment 13 Jay Patel [:jay] 2005-10-18 10:24:37 PDT
If this is a low risk patch, as it seems to be... we should take this and I can
keep an eye on Talkback data going forward.  If anyone is able to reproduce this
consistently with older builds, we can have them retest with the latest builds
with this fix to verify whether or not it actually fixed the problem.
Comment 14 Asa Dotzler [:asa] 2005-10-18 14:13:21 PDT
Comment on attachment 199372
v1 patch - bandaid

we'll take this and then watch talkback.
Comment 15 Darin Fisher 2005-10-18 14:26:08 PDT
fixed-on-trunk, fixed1.8
Comment 16 Darin Fisher 2006-01-23 07:57:09 PST
I think the band-aid was not good enough.  Reopening this bug.

L. David Baron wrote:
> This is a crash I saw while Thunderbird 1.5 was idle.  (I use it as a
> feedreader, so it was probably fetching feeds.)
>
> I've also noticed that sometimes it just stops fetching new feeds and I
> have to restart it -- that behavior *may* be associated with switching
> networks between home and office, but I'm not sure.  I've seen that
> quite a number of times, but I've only seen this crash once.  (I haven't
> seen any similar problems with Firefox trunk, though, so it seems more
> likely to be mail-specific.)
>
> I was wondering if you thought this was related to
> https://bugzilla.mozilla.org/show_bug.cgi?id=290190 or if there was
> anything else (e.g., PR_LOGging) I should investigate.
>
> The address in libc.so.6 is a non-instruction-aligned address in the
> middle of strstr.
>
> -David
>
> Incident ID: 14252378
> Stack Signature libc.so.6 + 0x6a58c (0x00b5058c) 2215a3c4
> Product ID      Thunderbird15
> Build ID        2005120113
> Trigger Time    2006-01-21 15:50:15.0
> Platform        LinuxIntel
> Operating System        Linux 2.6.14-1.1656_FC4
> Module  libc.so.6 + (0006a58c)
> URL visited     
> User Comments   unknown
> Since Last Crash        193 sec
> Total Uptime    193 sec
> Trigger Reason  SIGSEGV: Segmentation Fault: (signal 11)
> Source File, Line No.   N/A
> Stack Trace     
> libc.so.6 + 0x6a58c (0x00b5058c)
> PR_EnumerateAddrInfo()  [/builds/tinderbox/Tb-Mozilla1.8/Linux_2.4.18-14_Depend/mozilla/nsprpub/pr/src/misc/prnetdb.c, line 2149]
> nsDNSRecord::GetNextAddr()  [/builds/tinderbox/Tb-Mozilla1.8/Linux_2.4.18-14_Depend/mozilla/netwerk/dns/src/nsDNSService2.cpp, line 136]
> nsSocketTransport::RecoverFromError()  [/builds/tinderbox/Tb-Mozilla1.8/Linux_2.4.18-14_Depend/mozilla/netwerk/base/src/nsSocketTransport2.cpp, line 1223]
> nsSocketTransport::OnSocketDetached()  [/builds/tinderbox/Tb-Mozilla1.8/Linux_2.4.18-14_Depend/mozilla/netwerk/base/src/nsSocketTransport2.cpp, line 1534]
> nsSocketTransportService::DetachSocket(nsSocketTransportService::SocketContext*)()  [/builds/tinderbox/Tb-Mozilla1.8/Linux_2.4.18-14_Depend/mozilla/netwerk/base/src/nsSocketTransportService2.cpp, line 196]
> nsSocketTransportService::Run()  [/builds/tinderbox/Tb-Mozilla1.8/Linux_2.4.18-14_Depend/mozilla/netwerk/base/src/nsSocketTransportService2.cpp, line 605]
> nsThread::Main()  [/builds/tinderbox/Tb-Mozilla1.8/Linux_2.4.18-14_Depend/mozilla/xpcom/threads/nsThread.cpp, line 713]
> _pt_root()  [/builds/tinderbox/Tb-Mozilla1.8/Linux_2.4.18-14_Depend/mozilla/nsprpub/pr/src/pthreads/ptthread.c, line 223]
> libpthread.so.0 + 0x5341 (0x00d2e341)
Comment 17 Darin Fisher 2006-03-22 19:44:23 PST
Using Talkback's new "search in the top 5 frames" feature, I found 3 FF1.5 crashes and 1 TB1.5 crash involving nsDNSRecord::GetNextAddr.  By comparison, there are roughly 160 FF1.0/TB1.0/MZ1.7 crashes.
Comment 18 Colin Blake 2006-11-07 15:51:03 PST
This looks very much like bug 337418.
Comment 19 timeless 2006-11-08 00:56:27 PST
clearing fixed flag. i'm fairly certain that it isn't fixed on branch either.

and i don't think the structure was null.

the comparison is useless; you want the percentage of total crashes for a release, because the 1.7 series has been deployed much longer.
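
The normalization suggested in comment 19 can be written as a one-line ratio. The sketch below is only an illustration: `CrashShare` and its parameter names are hypothetical, and the per-release totals it needs are not stated anywhere in this bug.

```cpp
#include <cassert>

// Fraction of a release's crash reports that carry a given signature.
// Both arguments are raw report counts for the same release.
double CrashShare(unsigned long signatureCrashes, unsigned long totalCrashes) {
    assert(totalCrashes > 0); // a share is undefined without any reports
    return static_cast<double>(signatureCrashes) / static_cast<double>(totalCrashes);
}
```

With the counts from comment 17, the comparison would be `CrashShare(4, total15)` against `CrashShare(160, total10)`, where `total15` and `total10` stand for the total crash reports for the 1.5 and 1.0/1.7 releases. Those totals would have to come from Talkback.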
Comment 20 Darin Fisher 2007-06-11 00:38:35 PDT
-> reassign to default owner
Comment 21 timeless 2007-11-02 01:14:06 PDT
i agree with colin

*** This bug has been marked as a duplicate of bug 337418 ***
