Closed Bug 426060 Opened 16 years ago Closed 16 years ago

Crash [@ nsDNSRecord::GetNextAddr] when network connection drops

Categories

(Core :: Networking, defect, P2)

defect

Tracking

()

RESOLVED FIXED
mozilla1.9

People

(Reporter: florian, Assigned: wtc)

References

Details

(Keywords: crash, regression)

Crash Data

Attachments

(2 files, 1 obsolete file)

From bug 337418:

Comment #54  Florian Quèze   2008-03-28 10:49:15 PDT

With my trunk build (which is already a bit old: Mozilla/5.0 (Macintosh; U;
Intel Mac OS X 10.4; en-US; rv:1.9b4pre) Gecko/2008022621 Minefield/3.0b4pre) I
sometimes get crashes with a stack like this one:

Thread 1 Crashed:
0   <<00000000>>   0xffff07fc __memcpy + 92 (cpu_capabilities.h:228)
1   libnecko.dylib 0x143e08aa nsDNSRecord::GetNextAddr(unsigned short,
PRNetAddr*) + 460 (nsDNSService2.cpp:143)
2   libnecko.dylib 0x143cf4be nsSocketTransport::RecoverFromError() + 426
(nsSocketTransport2.cpp:1242)
3   libnecko.dylib 0x143cf61d nsSocketTransport::OnSocketDetached(PRFileDesc*)
+ 173 (nsSocketTransport2.cpp:1559)
...

I guess adding back NS_ENSURE_STATE(mHostRecord->addr) (proposed in comment 29)
would fix it.

Not sure if it's better to reopen this bug, or bug 290190 (where this was
added) or file a new one.

Comment #55 Wan-Teh Chang  2008-03-28 16:30:33 PDT

Please open a new bug and cite this bug and bug 290190.
Tell us in the new bug whether the URL has a host name
or literal IP address.
Attached patch trivial patch (obsolete) — Splinter Review
Trivial patch replacing the assertion by NS_ENSURE_STATE like it was done before in bug 290190.

(In reply to comment #55)

> Tell us in the new bug whether the URL has a host name
> or literal IP address.
> 

How can I know this? I see this crash only once or twice a week so adding a printf somewhere would not be a quick way to know...

When I have this crash, it's always just after a network connection drop (bad wifi at school) so I guess returning a failure for the DNS resolution would be OK.
Attachment #312612 - Flags: superreview?(cbiesinger)
Attachment #312612 - Flags: review?(cbiesinger)
We should find out why the nsHostRecord object doesn't have the
required property:
(http://lxr.mozilla.org/seamonkey/source/netwerk/dns/src/nsHostResolver.h#89)

  a fully resolved host record has either a non-null |addr_info|
  or |addr| field.

By code inspection, I believe an nsHostRecord may get into the
state where both the |addr_info| and |addr| fields are null
if we call nsHostResolver::OnLookupComplete with |result| equal
to null.  This can happen after PR_GetAddrInfoByName fails:
http://lxr.mozilla.org/seamonkey/source/netwerk/dns/src/nsHostResolver.cpp#701

695         ai = PR_GetAddrInfoByName(rec->host, rec->af, flags);
...
701         // convert error code to nsresult.
702         nsresult status = ai ? NS_OK : NS_ERROR_UNKNOWN_HOST;
703         resolver->OnLookupComplete(rec, status, ai);

We should find out why an nsHostRecord in this state can still
be passed to nsDNSRecord::GetNextAddr.  If that is not a bug,
we should update the comment quoted above to reflect this, and
the right fix would be for nsDNSRecord::GetNextAddr to return
NS_ERROR_NOT_AVAILABLE if both mHostRecord->addr_info and
mHostRecord->addr are null.  NS_ENSURE_TRUE causes a debug
break in debug build (and returns NS_ERROR_UNEXPECTED), which
isn't the right way to handle an expected error.

On the other hand, if it is a bug for an nsHostRecord in that
state to be passed to nsDNSRecord::GetNextAddr, we should fix
the bug rather than just trying to not crash.
Christian, Boris, Darin, please see my findings in comment 2.

If we fail to get the fresh addr info for a host:

http://lxr.mozilla.org/seamonkey/source/netwerk/dns/src/nsHostResolver.cpp#701

695         ai = PR_GetAddrInfoByName(rec->host, rec->af, flags);
...
701         // convert error code to nsresult.
702         nsresult status = ai ? NS_OK : NS_ERROR_UNKNOWN_HOST;
703         resolver->OnLookupComplete(rec, status, ai);

should resolver->OnLookupComplete keep the expired rec->addr_info
or blow it away?  Right now we blow it away, and this affects
previously created nsDNSRecords that contain |rec|; their
nsDNSRecord::GetNextAddr method will start to fail because they
have no address to return now.
Assignee: nobody → wtc
(In reply to comment #3)
Don't know if the following is relevant, but just in case --

A few years ago (around Fx 0.10 or 1.0 IIRC, or was it NS6?) we kept DNS addresses for as long as the browser was online (i.e., hours or days), and that caused timeouts when the remote host changed its DNS record (e.g. because an alternate server or connection had to be brought up online because of some hardware failure). The workaround, which was completely unobvious for the newbie I was then confronted with a solid browser timeout at a given site, was to bring the browser off-line then back online. Then the bug was fixed (I don't remember the bug number) and henceforward DNS records were only cached for a few minutes.

It was in 2002, see bug 151929 comment #2 (bug still open???)
Ah, there, found it: bug 162871 fixed 2003-10-27
I'm not quite sure what the right thing to do here is, to be honest...
I keep hitting this too (on Linux), I really don't think we want to ship Firefox 3 with this crasher. We had one of these a while back as well, caused lots of crashes with shaky WiFi connections in cafes n' what not.
Flags: blocking1.9?
+'ing this as I agree with Johnny.  Also, we need to make sure that we include tests with the patch as well.  Reviewers should make sure they exist!
Flags: blocking1.9? → blocking1.9+
Priority: -- → P2
Attached patch patch v2Splinter Review
This patch is a minor improvement over Florian's patch (attachment 312612 [details] [diff] [review]).

jst, could you please test it?  Thanks.
Attachment #312612 - Attachment is obsolete: true
Attachment #313738 - Flags: review?(jst)
Attachment #312612 - Flags: superreview?(cbiesinger)
Attachment #312612 - Flags: review?(cbiesinger)
Comment on attachment 313738 [details] [diff] [review]
patch v2

r=jst, I've been running with this since friday and no crash yet. Given the nature of this I can't say for sure whether this completely fixes the problem, but it's looking good so far. Definitely worth getting in IMO.
Attachment #313738 - Flags: review?(jst) → review+
Comment on attachment 313738 [details] [diff] [review]
patch v2

Christian, could you superreview this patch.

The change to nsDNSService2.cpp should be easy to understand.
It looks like a standard crash-avoiding check, so I added a
comment to explain how that condition can be true in practice.

The change to nsHostResolver.cpp is not necessary for fixing
this crash.  It is intended to make the nsHostRecord expire
immediately if the record contains no lookup results.  (It
doesn't make sense for an empty record to expire in the
future.)  Since our code doesn't check rec->expiration
if |rec| is empty, this change is merely a code cleanup.
Attachment #313738 - Flags: superreview?(cbiesinger)
(In reply to comment #7)
Damon, I'm sorry that I won't have time to write
a test for this bug.  It seems that
mozilla/netwerk/test/TestDNS.cpp could be
extended or a new test based on it could be
created to test the fix for this bug.  If anyone
volunteers, I'll be happy to explain the bug
to him.

Attachment #313738 - Flags: superreview?(cbiesinger) → superreview+
I checked in the patch on the Mozilla trunk for Firefox 3 RC1.

Checking in nsDNSService2.cpp;
/cvsroot/mozilla/netwerk/dns/src/nsDNSService2.cpp,v  <--  nsDNSService2.cpp
new revision: 1.15; previous revision: 1.14
done
Checking in nsHostResolver.cpp;
/cvsroot/mozilla/netwerk/dns/src/nsHostResolver.cpp,v  <--  nsHostResolver.cpp
new revision: 1.23; previous revision: 1.22
done
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla1.9
Crash Signature: [@ nsDNSRecord::GetNextAddr]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: