Closed
Bug 1502641
Opened 7 years ago
Closed 7 years ago
Crash in _purecall | nsHostRecord::~nsHostRecord
Categories
(Core :: Networking: DNS, defect, P2)
Tracking
()
RESOLVED
FIXED
mozilla65
| Tracking | Status | |
|---|---|---|
| firefox-esr60 | --- | unaffected |
| firefox63 | --- | unaffected |
| firefox64 | --- | wontfix |
| firefox65 | --- | fixed |
People
(Reporter: philipp, Assigned: dragana)
Details
(Keywords: crash, regression, Whiteboard: [necko-triaged])
Crash Data
Attachments
(1 file)
|
47 bytes,
text/x-phabricator-request
|
jcristau
:
approval-mozilla-beta-
|
Details | Review |
This bug was filed from the Socorro interface and is
report bp-620d4c50-44cc-41e6-8bab-7c96c0181027.
=============================================================
Top 10 frames of crashing thread:
0 vcruntime140.dll _purecall f:\dd\vctools\crt\vcruntime\src\misc\purevirt.cpp:29
1 xul.dll void nsHostRecord::~nsHostRecord netwerk/dns/nsHostResolver.cpp:215
2 xul.dll void AddrHostRecord::~AddrHostRecord netwerk/dns/nsHostResolver.cpp:323
3 xul.dll AddrHostRecord::Release netwerk/dns/nsHostResolver.cpp:296
4 xul.dll static void nsTHashtable<nsBaseHashtableET<nsGenericHashKey<nsHostKey>, RefPtr<nsHostRecord> > >::s_ClearEntry xpcom/ds/nsTHashtable.h:477
5 xul.dll PLDHashTable::Clear xpcom/ds/PLDHashTable.cpp:360
6 xul.dll nsHostResolver::Shutdown netwerk/dns/nsHostResolver.cpp:831
7 xul.dll nsDNSService::Shutdown netwerk/dns/nsDNSService2.cpp:770
8 xul.dll nsDNSService::Observe netwerk/dns/nsDNSService2.cpp:1253
9 xul.dll nsObserverService::NotifyObservers xpcom/ds/nsObserverService.cpp:295
=============================================================
this crash signature is newly showing up on 64.0b from 32bit windows installations, in low volume though.
perhaps related to the changes from bug 1481251?
Updated•7 years ago
|
Flags: needinfo?(dd.mozilla)
| Assignee | ||
Updated•7 years ago
|
Assignee: nobody → dd.mozilla
Status: NEW → ASSIGNED
Flags: needinfo?(dd.mozilla)
Priority: -- → P1
Whiteboard: [necko-triaged]
Updated•7 years ago
|
Assignee: dd.mozilla → honzab.moz
Status: ASSIGNED → NEW
Comment 1•7 years ago
|
||
Daniel, this has started on Oct 21. Do you happen to know about any changes in that code that may have caused this?
There were no changes in DNS code immediately prior to October 21, and even the weeks before that, the DNS related changes were telemetry oriented.
The crash seems to be shutdown-related. But I fail to see what in nsHostRecord::~nsHostRecord() that can turn into a NULL dereference (all crashes say address 0x0, so it doesn't seem to be 'mCallbacks' that is null) and how that happens! :-(
Puzzling!
Flags: needinfo?(daniel)
Comment 4•7 years ago
|
||
Thanks, Daniel. As the crash rate is low and it's a shutdown crash only, I will lower the priority and look at this later. But it doesn't seem easily actionable, could be a symptom of a previews heap corruption elsewhere too.
Priority: P1 → P2
Comment 5•7 years ago
|
||
It may be some heap corruption that zeros out the vtable.
Comment 6•7 years ago
|
||
(In reply to Valentin Gosu [:valentin] from comment #5)
> It may be some heap corruption that zeros out the vtable.
_purecall is not a null deref, the crash is actually a C++ exception. this happens when you manage to call a virtual function that is unimplemented (= 0). hard to say how a vtable can start pointing there suddenly, need to study.
Updated•7 years ago
|
Keywords: csectype-nullptr
Comment 7•7 years ago
|
||
(In reply to Honza Bambas (:mayhemer) from comment #6)
> (In reply to Valentin Gosu [:valentin] from comment #5)
> > It may be some heap corruption that zeros out the vtable.
>
> _purecall is not a null deref, the crash is actually a C++ exception. this
> happens when you manage to call a virtual function that is unimplemented (=
> 0). hard to say how a vtable can start pointing there suddenly, need to
> study.
Status of bug?
Flags: needinfo?(honzab.moz)
Keywords: csectype-nullptr
Comment 8•7 years ago
|
||
Status of the bug is that I would love to try to find the regressing bug first.
> this crash signature is newly showing up on 64.0b from 32bit windows installations
This looks like a low-prob race/heap corruption and it's something we have landed on 64, it then started to manifest under a heavier load on beta. It correlates with the 22-10-2018 merge date (64: m-c -> m-b). Bit-ness suggests that this is likely related to pointer logic.
There has been 11 bugs landed in Net:DNS for 64:
https://bugzilla.mozilla.org/buglist.cgi?list_id=14423882&resolution=FIXED&query_format=advanced&bug_status=RESOLVED&bug_status=VERIFIED&bug_status=CLOSED&component=Networking%3A%20DNS&product=Core&target_milestone=mozilla64
One of them is probably causing this.
Daniel, I will ask you to look at the list above, maybe you can spot something more quickly than me that could cause this crash. Then feel free to bounce back to me, if you don't find anything. Thanks.
Assignee: honzab.moz → daniel
Flags: needinfo?(honzab.moz)
| Assignee | ||
Comment 9•7 years ago
|
||
this must be my bug 1481251 where I introduces to classes of hostRecord. I do know know how wa can en up with a virtual function that is unimplemented.
| Assignee | ||
Updated•7 years ago
|
Assignee: daniel → dd.mozilla
Status: NEW → ASSIGNED
| Assignee | ||
Comment 10•7 years ago
|
||
the crash is in AddrHostRecord destructor, I do not know how it can point to non implemented virtual function.
Comment 11•7 years ago
|
||
I agree that out of those bugs that is a likely candidate - but still not obvious what the problem is. I also noticed that some of the crashes are not shutdown triggered but instead network-change triggered. Like this: https://crash-stats.mozilla.com/report/index/d3fe2f33-4630-4a9f-bd2a-a5e110181106
Comment 12•7 years ago
|
||
My small research on _purecall suggests that this is a call on an object that is already destroyed or is in the phase between calls of derived/base destructors, while the outer dtor has already removed references to overridden methods from the vtable and made them point to pure calls.
This can be either a thread race or simply a small chance to call a virtual function from dtor. Also could be "illusion of atomic reference counter" case.
| Assignee | ||
Comment 13•7 years ago
|
||
(In reply to Honza Bambas (:mayhemer) from comment #12)
> My small research on _purecall suggests that this is a call on an object
> that is already destroyed or is in the phase between calls of derived/base
> destructors, while the outer dtor has already removed references to
> overridden methods from the vtable and made them point to pure calls.
>
> This can be either a thread race or simply a small chance to call a virtual
> function from dtor. Also could be "illusion of atomic reference counter"
> case.
Yes, I know this all, but I am not sure what is going wrong in this case.No virtual function is called in destructor, actually no functions is called at all just cleaning up of variables. The only possibility is ref counting, that I did something wrong there.
Comment 14•7 years ago
|
||
There are a few issues here, but I'm not sure which one causes the crash:
[1] nsHostRecord extends nsISupports, but NS_DECL_THREADSAFE_ISUPPORTS is done in AddrHostRecord/TypeHostRecord. Normally the base class declares and implements ISUPPORTS, and the child classes use NS_DECL_ISUPPORTS_INHERITED.
[2] NS_IMPL_ISUPPORTS(AddrHostRecord, nsISupports, AddrHostRecord, nsHostRecord) is this definition recursive? Or maybe it's an issue that nsHostRecord has pure virtual methods.
[3] We use nsCOMPtr<AddrHostRecord> and nsCOMPtr<TypeHostRecord>. Normally we should use RefPtr for concrete types, and use do_QueryObject to check their types.
[1] https://searchfox.org/mozilla-central/rev/c0b26c40769a1e5607a1ae8be37fe64df64fc55e/netwerk/dns/nsHostResolver.h#78
[2] https://searchfox.org/mozilla-central/rev/c0b26c40769a1e5607a1ae8be37fe64df64fc55e/netwerk/dns/nsHostResolver.cpp#296,569
| Assignee | ||
Comment 15•7 years ago
|
||
| Assignee | ||
Comment 16•7 years ago
|
||
(In reply to Valentin Gosu [:valentin] from comment #14)
> There are a few issues here, but I'm not sure which one causes the crash:
> [1] nsHostRecord extends nsISupports, but NS_DECL_THREADSAFE_ISUPPORTS is
> done in AddrHostRecord/TypeHostRecord. Normally the base class declares and
> implements ISUPPORTS, and the child classes use NS_DECL_ISUPPORTS_INHERITED.
> [2] NS_IMPL_ISUPPORTS(AddrHostRecord, nsISupports, AddrHostRecord,
> nsHostRecord) is this definition recursive? Or maybe it's an issue that
> nsHostRecord has pure virtual methods.
> [3] We use nsCOMPtr<AddrHostRecord> and nsCOMPtr<TypeHostRecord>. Normally
> we should use RefPtr for concrete types, and use do_QueryObject to check
> their types.
>
> [1]
> https://searchfox.org/mozilla-central/rev/
> c0b26c40769a1e5607a1ae8be37fe64df64fc55e/netwerk/dns/nsHostResolver.h#78
> [2]
> https://searchfox.org/mozilla-central/rev/
> c0b26c40769a1e5607a1ae8be37fe64df64fc55e/netwerk/dns/nsHostResolver.cpp#296,
> 569
I have change some stuff, maybe even too much...but better now than waiting for release crash stats.
| Assignee | ||
Updated•7 years ago
|
Keywords: checkin-needed
Comment 17•7 years ago
|
||
Pushed by rvandermeulen@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/b0863b729ae9
Change ref-counting for nsHostRecord, change nsCOMPtr<*HostRecord> into RefPtr<*HostRecord>, remove pure virtual functions from nsHostRecord. r=valentin
Keywords: checkin-needed
Comment 18•7 years ago
|
||
| bugherder | ||
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla65
Comment 19•7 years ago
|
||
Please nominate this for Beta approval when you get a chance.
Flags: needinfo?(dd.mozilla)
| Assignee | ||
Comment 20•7 years ago
|
||
(In reply to Ryan VanderMeulen [:RyanVM] from comment #19)
> Please nominate this for Beta approval when you get a chance.
I will, just waiting for a day for the patch to run on Nightly. Thanks
| Assignee | ||
Comment 21•7 years ago
|
||
Comment on attachment 9025668 [details]
Change ref-counting for nsHostRecord, change nsCOMPtr<*HostRecord> into RefPtr<*HostRecord>, remove pure virtual functions from nsHostRecord.
[Beta/Release Uplift Approval Request]
Feature/Bug causing the regression: Bug 1481251
User impact if declined: It cause crashes.
Is this code covered by automated tests?: No
Has the fix been verified in Nightly?: No
Needs manual test from QE?: No
If yes, steps to reproduce:
List of other uplifts needed: None
Risk to taking this patch: Medium
Why is the change risky/not risky? (and alternatives if risky): The crash is probably connected with refCounting or pure virtual functions. The patch tries to clean all possible causes of the crash. The bug this is fixing only showed up on release, probably we will not notice any problems with the patch until release.
String changes made/needed:
Flags: needinfo?(dd.mozilla)
Attachment #9025668 -
Flags: approval-mozilla-beta?
Comment 22•7 years ago
|
||
Comment on attachment 9025668 [details]
Change ref-counting for nsHostRecord, change nsCOMPtr<*HostRecord> into RefPtr<*HostRecord>, remove pure virtual functions from nsHostRecord.
this is low enough volume that I think I'd prefer to let it ride the trains.
Attachment #9025668 -
Flags: approval-mozilla-beta? → approval-mozilla-beta-
Updated•7 years ago
|
Comment 23•7 years ago
|
||
If this is low vol on Beta it will definitely be higher volume on Release. I believe all affected branches should be fixed.
You need to log in
before you can comment on or make changes to this bug.
Description
•