Closed Bug 290190 Opened 20 years ago Closed 17 years ago

crash when network connection drops and reconnects [@ msvcrt.dll - nsDNSRecord::GetNextAddr ]

Categories

(Core :: Networking, defect, P3)

Tracking

()

RESOLVED DUPLICATE of bug 337418
mozilla1.9alpha1

People

(Reporter: darin.moz, Unassigned)

References

()

Details

(Keywords: crash)

Crash Data

Attachments

(1 file)

crash when network connection drops and reconnects, while using gmail. see: http://talkback-public.mozilla.org/talkback/fastfind.jsp?search=2&type=iid&id=5046300
Status: NEW → ASSIGNED
Priority: -- → P2
Target Milestone: --- → mozilla1.8beta2
this has happened to our product too. i couldn't figure out what to do with it.
Summary: crash when network connection drops and reconnects → crash when network connection drops and reconnects [@ msvcrt.dll - nsDNSRecord::GetNextAddr ]
*** Bug 293111 has been marked as a duplicate of this bug. ***
Also happens on Mac OS X...

Stack Signature 0xffff89c0 0d35ece4
Product ID Firefox10
Build ID 2005071117
Trigger Time 2005-07-17 19:59:30.0
Platform MacOSX
Operating System Darwin 8.2.0
Module
URL visited
User Comments
Since Last Crash 298633 sec
Total Uptime 298633 sec
Trigger Reason SIGBUS: Bus Error: (signal 10)
Source File, Line No. N/A
Stack Trace
0xffff89c0
nsDNSRecord::GetNextAddr() [/builds/tinderbox/Fx-Aviary1.0.1/Darwin_7.9.0_Depend/mozilla/netwerk/dns/src/nsDNSService2.cpp, line 122]
nsSocketTransport::OnSocketDetached() [/builds/tinderbox/Fx-Aviary1.0.1/Darwin_7.9.0_Depend/mozilla/netwerk/base/src/nsSocketTransport2.cpp, line 1442]
nsSocketTransportService::DetachSocket(nsSocketTransportService::SocketContext*)() [/builds/tinderbox/Fx-Aviary1.0.1/Darwin_7.9.0_Depend/mozilla/netwerk/base/src/nsSocketTransportService2.cpp, line 187]
nsSocketTransportService::Run() [/builds/tinderbox/Fx-Aviary1.0.1/Darwin_7.9.0_Depend/mozilla/netwerk/base/src/nsSocketTransportService2.cpp, line 548]
nsThread::Main() [/builds/tinderbox/Fx-Aviary1.0.1/Darwin_7.9.0_Depend/mozilla/xpcom/threads/nsThread.cpp, line 607]
_pt_root() [/builds/tinderbox/Fx-Aviary1.0.1/Darwin_7.9.0_Depend/mozilla/nsprpub/pr/src/pthreads/ptthread.c, line 217]
libSystem.B.dylib.88.0.0 + 0x2c3d4 (0x9002c3d4)
OS: Windows XP → All
Hardware: PC → All
Target Milestone: mozilla1.8beta2 → mozilla1.8beta5
Flags: blocking1.8rc1?
This looks high-profile. If we can get a low-risk patch, we should take it.
Flags: blocking1.8rc1? → blocking1.8rc1+
It seems like we are crashing on the memcpy in nsDNSRecord::GetNextAddr, which implies that the nsHostRecord's addr_info field is null. In that case, we expect the addr field to be non-null. The addr_info field is null and the addr field is non-null when the hostname was an IP address literal that could simply be parsed. I doubt this patch will actually help. I suspect it is more likely that the nsHostRecord data structure is getting corrupted, but I figure that this patch is worth a shot since it might help. If it does, then at least we know that addr is somehow ending up null, and we can go from there. Otherwise, we're no worse off than we are now.
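For readers following along, here is a minimal sketch of the kind of guard described above. It is not the attached patch, and the types are hypothetical stand-ins rather than the real nsHostRecord or NSPR definitions; it only assumes what the comment states, namely that addr is expected to be non-null whenever addr_info is null.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-ins for illustration only; not the Mozilla/NSPR types.
struct NetAddr { unsigned char raw[16]; };

struct HostRecord {
    const void*    addr_info = nullptr; // resolver result; null for IP address literals
    const NetAddr* addr      = nullptr; // parsed literal; expected non-null when addr_info is null
};

enum class Result { Ok, UseAddrInfo, NotAvailable };

// Copies the parsed literal into *out. Instead of letting memcpy read through
// a null pointer, it reports NotAvailable when both fields are null.
Result GetLiteralAddr(const HostRecord& rec, NetAddr* out) {
    assert(out != nullptr);
    if (rec.addr_info != nullptr) {
        // Resolved-hostname path: the caller would iterate addr_info instead.
        return Result::UseAddrInfo;
    }
    if (rec.addr == nullptr) {
        // The guard: the unexpected "both null" state becomes an error return.
        return Result::NotAvailable;
    }
    std::memcpy(out, rec.addr, sizeof(NetAddr));
    return Result::Ok;
}
```

A check like this only catches a pointer that is actually null. If the record is corrupted or already freed, the pointer can be non-null and still invalid, and the copy would crash regardless.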
Attachment #199372 - Flags: review?(cbiesinger)
Attachment #199372 - Flags: superreview?(bzbarsky)
is there some way to find out from the talkback data whether that variable was null or not?
That information is not exposed on http://talkback-reports.mozilla.org/ :-(
Comment on attachment 199372: v1 patch - bandaid
sr=bzbarsky if we come up with nothing better...
Attachment #199372 - Flags: superreview?(bzbarsky) → superreview+
Jay, can you get darin local var data and such for the relevant talkbacks?
Comment on attachment 199372: v1 patch - bandaid
r=biesi I guess. but understanding why this is not a valid pointer would be good...
Attachment #199372 - Flags: review?(cbiesinger) → review+
Attachment #199372 - Flags: approval1.8rc1?
Has anyone been able to reproduce this with a recent 1.5 branch build? Is this a problem on Win32 and Linux, or just MacOSX? I tried looking for more detailed info on these crashes, couldn't find any. But if anyone is able to reproduce this with a recent build, post your incident id so I can try to dig up any details that we might be collecting. I'm pretty sure our best bet for local var info is with Win32 crashes...if I get a chance to look it up within 1-2 days of submission (Talkback db problems force us to delete a lot of detailed info every couple of days).
How would you guys like to proceed with this bug? Wait another day to see if we can get local var data from someone who crashes on this in the next day or so? Or do you want approval on the patch before that?
If this is a low-risk patch, as it seems to be... we should take this and I can keep an eye on Talkback data going forward. If anyone is able to reproduce this consistently with older builds, we can have them retest with the latest builds with this fix to verify whether or not it actually fixed the problem.
Comment on attachment 199372: v1 patch - bandaid
we'll take this and then watch talkback.
Attachment #199372 - Flags: approval1.8rc1? → approval1.8rc1+
fixed-on-trunk, fixed1.8
Status: ASSIGNED → RESOLVED
Closed: 20 years ago
Keywords: fixed1.8
Resolution: --- → FIXED
I think the band-aid was not good enough. Reopening this bug.

L. David Baron wrote:
> This is a crash I saw while Thunderbird 1.5 was idle. (I use it as a
> feedreader, so it was probably fetching feeds.)
>
> I've also noticed that sometimes it just stops fetching new feeds and I
> have to restart it -- that behavior *may* be associated with switching
> networks between home and office, but I'm not sure. I've seen that
> quite a number of times, but I've only seen this crash once. (I haven't
> seen any similar problems with Firefox trunk, though, so it seems more
> likely to be mail-specific.)
>
> I was wondering if you thought this was related to
> https://bugzilla.mozilla.org/show_bug.cgi?id=290190 or if there was
> anything else (e.g., PR_LOGging) I should investigate.
>
> The address in libc.so.6 is a non-instruction-aligned address in the
> middle of strstr.
>
> -David
>
> Incident ID: 14252378
> Stack Signature libc.so.6 + 0x6a58c (0x00b5058c) 2215a3c4
> Product ID Thunderbird15
> Build ID 2005120113
> Trigger Time 2006-01-21 15:50:15.0
> Platform LinuxIntel
> Operating System Linux 2.6.14-1.1656_FC4
> Module libc.so.6 + (0006a58c)
> URL visited
> User Comments unknown
> Since Last Crash 193 sec
> Total Uptime 193 sec
> Trigger Reason SIGSEGV: Segmentation Fault: (signal 11)
> Source File, Line No. N/A
> Stack Trace
> libc.so.6 + 0x6a58c (0x00b5058c)
> PR_EnumerateAddrInfo() [/builds/tinderbox/Tb-Mozilla1.8/Linux_2.4.18-14_Depend/mozilla/nsprpub/pr/src/misc/prnetdb.c, line 2149]
> nsDNSRecord::GetNextAddr() [/builds/tinderbox/Tb-Mozilla1.8/Linux_2.4.18-14_Depend/mozilla/netwerk/dns/src/nsDNSService2.cpp, line 136]
> nsSocketTransport::RecoverFromError() [/builds/tinderbox/Tb-Mozilla1.8/Linux_2.4.18-14_Depend/mozilla/netwerk/base/src/nsSocketTransport2.cpp, line 1223]
> nsSocketTransport::OnSocketDetached() [/builds/tinderbox/Tb-Mozilla1.8/Linux_2.4.18-14_Depend/mozilla/netwerk/base/src/nsSocketTransport2.cpp, line 1534]
> nsSocketTransportService::DetachSocket(nsSocketTransportService::SocketContext*)() [/builds/tinderbox/Tb-Mozilla1.8/Linux_2.4.18-14_Depend/mozilla/netwerk/base/src/nsSocketTransportService2.cpp, line 196]
> nsSocketTransportService::Run() [/builds/tinderbox/Tb-Mozilla1.8/Linux_2.4.18-14_Depend/mozilla/netwerk/base/src/nsSocketTransportService2.cpp, line 605]
> nsThread::Main() [/builds/tinderbox/Tb-Mozilla1.8/Linux_2.4.18-14_Depend/mozilla/xpcom/threads/nsThread.cpp, line 713]
> _pt_root() [/builds/tinderbox/Tb-Mozilla1.8/Linux_2.4.18-14_Depend/mozilla/nsprpub/pr/src/pthreads/ptthread.c, line 223]
> libpthread.so.0 + 0x5341 (0x00d2e341)
Status: RESOLVED → REOPENED
Priority: P2 → --
Resolution: FIXED → ---
Target Milestone: mozilla1.8beta5 → mozilla1.9alpha
Status: REOPENED → ASSIGNED
Priority: -- → P1
Using the new search in the top 5 frames feature of talkback, I found 3 FF1.5 crashes and 1 TB1.5 crash involving nsDNSRecord::GetNextAddr. By comparison, there are roughly 160 FF1.0/TB1.0/MZ1.7 crashes.
Priority: P1 → P3
This looks very much like bug 337418.
clearing fixed flag. i'm fairly certain that it isn't fixed on branch either, and i don't think the structure was null. the comparison is useless: you want percentage of total crashes for a release, because the 1.7 series has been deployed much longer.
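To make the normalization point concrete, a small sketch of the calculation follows. The function name is made up for illustration, and the per-release crash totals it needs are not given anywhere in this bug, so they are left as inputs.

```cpp
#include <stdexcept>

// Share of a release's crash reports that hit one signature, in percent.
// Hypothetical helper: totalCrashes (all reports for that release) has to
// come from Talkback and is not stated in this bug.
double CrashSharePercent(long signatureCrashes, long totalCrashes) {
    if (totalCrashes <= 0 || signatureCrashes < 0 || signatureCrashes > totalCrashes) {
        throw std::invalid_argument("counts must satisfy 0 <= signature <= total and total > 0");
    }
    return 100.0 * static_cast<double>(signatureCrashes) / static_cast<double>(totalCrashes);
}
```

Comparing the raw counts (4 for the 1.5 products against roughly 160 for the 1.0/1.7 ones) says little on its own. Two releases can show the same share even when one has far more reports simply because it has been deployed longer.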
Keywords: fixed1.8
-> reassign to default owner
Assignee: darin.moz → nobody
Status: ASSIGNED → NEW
QA Contact: benc → networking
i agree with colin
Status: NEW → RESOLVED
Closed: 20 years ago → 17 years ago
Resolution: --- → DUPLICATE
Depends on: 426060
Crash Signature: [@ msvcrt.dll - nsDNSRecord::GetNextAddr ]