Closed
Bug 1438947
Opened 7 years ago
Closed 7 years ago
Crash in mozilla::net::TRR::Cancel
Categories
(Core :: Networking: DNS, defect, P1)
Tracking
()
RESOLVED
FIXED
mozilla60
Tracking | Status | |
---|---|---|
firefox-esr52 | --- | unaffected |
firefox58 | --- | unaffected |
firefox59 | --- | unaffected |
firefox60 | --- | fixed |
People
(Reporter: calixte, Assigned: bagder)
References
(Blocks 1 open bug)
Details
(Keywords: crash, regression, Whiteboard: [necko-triaged][trr])
Crash Data
Attachments
(2 files, 1 obsolete file)
This bug was filed from the Socorro interface and is
report bp-d894c6f9-e5ef-4bb2-b529-c90f60180216.
=============================================================
Top 10 frames of crashing thread:
0 xul.dll mozilla::net::TRR::Cancel netwerk/dns/TRR.cpp:1036
1 xul.dll nsHostRecord::Cancel netwerk/dns/nsHostResolver.cpp:230
2 xul.dll nsHostResolver::ClearPendingQueue netwerk/dns/nsHostResolver.cpp:599
3 xul.dll nsHostResolver::Shutdown netwerk/dns/nsHostResolver.cpp:684
4 xul.dll nsDNSService::Shutdown netwerk/dns/nsDNSService2.cpp:676
5 xul.dll mozilla::net::nsIOService::SetOffline netwerk/base/nsIOService.cpp:1173
6 xul.dll mozilla::net::nsIOService::Observe netwerk/base/nsIOService.cpp:1488
7 xul.dll nsObserverList::NotifyObservers xpcom/ds/nsObserverList.cpp:112
8 xul.dll nsObserverService::NotifyObservers xpcom/ds/nsObserverService.cpp:296
9 xul.dll nsXREDirProvider::DoShutdown toolkit/xre/nsXREDirProvider.cpp:1095
=============================================================
There is 1 crash in nightly 60 with buildid 20180216104033. In analyzing the backtrace, the regression may have been introduced by patch [1] to fix bug 1434852.
[1] https://hg.mozilla.org/mozilla-central/rev?node=6776d69d2f03
Flags: needinfo?(daniel)
Assignee | ||
Comment 1•7 years ago
|
||
Clearly TRR related.
Assignee: nobody → daniel
Flags: needinfo?(daniel)
Priority: -- → P1
Whiteboard: [necko-triaged]
Comment hidden (mozreview-request) |
Comment 3•7 years ago
|
||
mTimeout is of type nsCOMPtr<nsITimer>
So why do we have to initialize it to nullptr? I would expect the default constructor to do that already.
Flags: needinfo?(daniel)
Assignee | ||
Comment 4•7 years ago
|
||
Ehm, right. I just checked TRR.cpp:1036 and figured it looked exactly like an uninitialized mTimeout in there... So its something else!
Flags: needinfo?(daniel)
Assignee | ||
Updated•7 years ago
|
Attachment #8951786 -
Flags: review?(valentin.gosu)
Comment 5•7 years ago
|
||
It seems to me that it might already be nullptr, and maybe there's a race with it being null-ed on another thread.
Assignee | ||
Comment 6•7 years ago
|
||
That might be it, yes. I'll go the other way instead and avoid some of the early cancel+clears since they're done in the destructor anyway. Another take is coming.
Assignee | ||
Comment 7•7 years ago
|
||
Hm, I bet it isn't even necessary to Cancel it in the destructor!
Comment 8•7 years ago
|
||
(In reply to Daniel Stenberg [:bagder] from comment #7)
> Hm, I bet it isn't even necessary to Cancel it in the destructor!
No. The destructor for nsTimer does that anyway.
Comment hidden (mozreview-request) |
Comment 10•7 years ago
|
||
mozreview-review |
Comment on attachment 8951786 [details]
bug 1438947 - avoid unnecessary timer cancels + clears
https://reviewboard.mozilla.org/r/221054/#review226968
Attachment #8951786 -
Flags: review?(valentin.gosu) → review+
Comment 11•7 years ago
|
||
Pushed by daniel@haxx.se:
https://hg.mozilla.org/integration/autoland/rev/09cda613b1c4
avoid unnecessary timer cancels + clears r=valentin
Comment 12•7 years ago
|
||
bugherder |
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla60
Reporter | ||
Comment 13•7 years ago
|
||
There still are 12 crashes in nightly 60 starting with buildid 20180217100053 so after the patch landed.
Assignee | ||
Comment 14•7 years ago
|
||
In TRR::Cancel?
Reporter | ||
Comment 15•7 years ago
|
||
Yes, see http://bit.ly/2HuBoKS to get the crash reports.
Assignee | ||
Comment 16•7 years ago
|
||
The crash reports indicate that there are at least two or three different ways that lead into the TRR::Cancel crash.
1 - (example https://crash-stats.mozilla.com/report/index/4d0fc238-85ed-444f-99a2-601e90180219) Via nsHostResolver::FlushCache that gets called from nsDNSService::Observe() from mozilla::net::nsIOService::NotifyWakeup. This is a machine waking up and it gets a network-change event and flushes the DNS cache.
2 - (example https://crash-stats.mozilla.com/report/index/5a160053-88e8-4416-80ee-de0890180219) Via nsHostRecord::Cancel() and nsHostResolver::Shutdown(). This is the browser shutting down.
3 - A variation of the first one (example https://crash-stats.mozilla.com/report/index/7e58d309-5468-4704-a008-f21760180219) Via nsHostRecord::Cancel and mozilla::net::nsIOService::SetOffline. This is the browser going offline and as a part of that, it flushes the DNS cache.
I can't see how any of them can be reached without TRR being activated, which is good since TRR is still preffed off by default.
Assignee | ||
Comment 17•7 years ago
|
||
I believe this might be a race when accessing/changing the mTrrA and mTrrAAAA in the nsHostRecord objects.
Partly this happens because the code currently cancels TRR resolves from nsHostRecord::RemoveOrRefresh, which I don't think it should since we don't need to refresh TRR name resolves due to network changes.
I'm testing out a fix for this locally which I'll submit for review as soon as I feel confident.
Assignee | ||
Comment 18•7 years ago
|
||
(reviewboard refuses me to push another patch to the same bug that already had a patch previously so I'm doing it the old style as a work-around)
This patch puts a mutex around all nsHostRecord::mTrrA[AAA] accesses since they might be accessed and updated from different threads.
... and remove Cancel() from nsHostRecord::RemoveOrRefresh since we don't need to cancel TRR resolves due to network changes.
Attachment #8952965 -
Flags: review?(valentin.gosu)
Assignee | ||
Updated•7 years ago
|
Whiteboard: [necko-triaged] → [necko-triaged][trr]
Comment 19•7 years ago
|
||
Comment on attachment 8952965 [details] [diff] [review]
0001-bug-1438947-mutex-nsHostRecord-mTrrA-AAA-accesses-r-.patch
Review of attachment 8952965 [details] [diff] [review]:
-----------------------------------------------------------------
::: netwerk/dns/nsHostResolver.cpp
@@ +1076,5 @@
> RefPtr<nsHostRecord> rec(aRec);
> mLock.AssertCurrentThreadOwns();
> + {
> + MutexAutoLock trrlock(rec->mTrr_lock);
> + MOZ_ASSERT(!TRROutstanding());
Let's wrap this block in a #ifdef DEBUG, since we're only locking for this assert
::: netwerk/dns/nsHostResolver.h
@@ +189,5 @@
> enum {
> INIT, STARTED, OK, FAILED
> } mTrrAUsed, mTrrAAAAUsed;
>
> + Mutex mTrr_lock; // lock when accessing the mTrrA[AAA] pointers
Let's call it mTrrLock or mTRRLock.
Attachment #8952965 -
Flags: review?(valentin.gosu) → review+
Assignee | ||
Comment 20•7 years ago
|
||
Attachment #8952965 -
Attachment is obsolete: true
Attachment #8953043 -
Flags: review+
Assignee | ||
Updated•7 years ago
|
Keywords: checkin-needed
Comment 21•7 years ago
|
||
Pushed by archaeopteryx@coole-files.de:
https://hg.mozilla.org/integration/mozilla-inbound/rev/11edb4e3cdec
mutex nsHostRecord::mTrrA[AAA] accesses r=valentin
Keywords: checkin-needed
Comment 22•7 years ago
|
||
bugherder |
Status: REOPENED → RESOLVED
Closed: 7 years ago → 7 years ago
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•