Closed Bug 1438947 Opened 7 years ago Closed 7 years ago

Crash in mozilla::net::TRR::Cancel

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla60

Tracking Flags:

Tracking

Status

firefox-esr52

---

unaffected

firefox58

---

unaffected

firefox59

---

unaffected

firefox60

---

fixed

People

(Reporter: calixte, Assigned: bagder)

References

(Blocks 1 open bug)

Details

(Keywords: crash, regression, Whiteboard: [necko-triaged][trr])

Crash Data

Attachments

(2 files, 1 obsolete file)

bug 1438947 - avoid unnecessary timer cancels + clears 7 years ago Daniel Stenberg [:bagder] 59 bytes, text/x-review-board-request	valentin : review+	Details
0001-bug-1438947-mutex-nsHostRecord-mTrrA-AAA-accesses-r-.patch 7 years ago Daniel Stenberg [:bagder] 4.37 KB, patch	valentin : review+	Details \| Diff \| Splinter Review
v2-0001-bug-1438947-mutex-nsHostRecord-mTrrA-AAA-accesses.patch 7 years ago Daniel Stenberg [:bagder] 4.39 KB, patch	bagder : review+	Details \| Diff \| Splinter Review

Calixte Denizet (:calixte)

Reporter

Description

•

7 years ago

This bug was filed from the Socorro interface and is report bp-d894c6f9-e5ef-4bb2-b529-c90f60180216. ============================================================= Top 10 frames of crashing thread: 0 xul.dll mozilla::net::TRR::Cancel netwerk/dns/TRR.cpp:1036 1 xul.dll nsHostRecord::Cancel netwerk/dns/nsHostResolver.cpp:230 2 xul.dll nsHostResolver::ClearPendingQueue netwerk/dns/nsHostResolver.cpp:599 3 xul.dll nsHostResolver::Shutdown netwerk/dns/nsHostResolver.cpp:684 4 xul.dll nsDNSService::Shutdown netwerk/dns/nsDNSService2.cpp:676 5 xul.dll mozilla::net::nsIOService::SetOffline netwerk/base/nsIOService.cpp:1173 6 xul.dll mozilla::net::nsIOService::Observe netwerk/base/nsIOService.cpp:1488 7 xul.dll nsObserverList::NotifyObservers xpcom/ds/nsObserverList.cpp:112 8 xul.dll nsObserverService::NotifyObservers xpcom/ds/nsObserverService.cpp:296 9 xul.dll nsXREDirProvider::DoShutdown toolkit/xre/nsXREDirProvider.cpp:1095 ============================================================= There is 1 crash in nightly 60 with buildid 20180216104033. In analyzing the backtrace, the regression may have been introduced by patch [1] to fix bug 1434852. [1] https://hg.mozilla.org/mozilla-central/rev?node=6776d69d2f03

Flags: needinfo?(daniel)

Daniel Stenberg [:bagder]

Assignee

Comment 1

•

7 years ago

Clearly TRR related.

Assignee: nobody → daniel

Flags: needinfo?(daniel)

Priority: -- → P1

Whiteboard: [necko-triaged]

Comment hidden (mozreview-request)

Valentin Gosu [:valentin] (he/him)

Comment 3

•

7 years ago

mTimeout is of type nsCOMPtr<nsITimer> So why do we have to initialize it to nullptr? I would expect the default constructor to do that already.

Flags: needinfo?(daniel)

Daniel Stenberg [:bagder]

Assignee

Comment 4

•

7 years ago

Ehm, right. I just checked TRR.cpp:1036 and figured it looked exactly like an uninitialized mTimeout in there... So its something else!

Flags: needinfo?(daniel)

Daniel Stenberg [:bagder]

Assignee

Updated

•

7 years ago

Attachment #8951786 - Flags: review?(valentin.gosu)

Valentin Gosu [:valentin] (he/him)

Comment 5

•

7 years ago

It seems to me that it might already be nullptr, and maybe there's a race with it being null-ed on another thread.

Daniel Stenberg [:bagder]

Assignee

Comment 6

•

7 years ago

That might be it, yes. I'll go the other way instead and avoid some of the early cancel+clears since they're done in the destructor anyway. Another take is coming.

Daniel Stenberg [:bagder]

Assignee

Comment 7

•

7 years ago

Hm, I bet it isn't even necessary to Cancel it in the destructor!

Valentin Gosu [:valentin] (he/him)

Comment 8

•

7 years ago

(In reply to Daniel Stenberg [:bagder] from comment #7) > Hm, I bet it isn't even necessary to Cancel it in the destructor! No. The destructor for nsTimer does that anyway.

Comment hidden (mozreview-request)

Valentin Gosu [:valentin] (he/him)

Comment 10

•

7 years ago

mozreview-review

Comment on attachment 8951786 [details] bug 1438947 - avoid unnecessary timer cancels + clears https://reviewboard.mozilla.org/r/221054/#review226968

Attachment #8951786 - Flags: review?(valentin.gosu) → review+

Pulsebot

Comment 11

•

7 years ago

Pushed by daniel@haxx.se: https://hg.mozilla.org/integration/autoland/rev/09cda613b1c4 avoid unnecessary timer cancels + clears r=valentin

Dorel Luca [:dluca]

Comment 12

•

7 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/09cda613b1c4

Status: NEW → RESOLVED

Closed: 7 years ago

status-firefox60: affected → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla60

Calixte Denizet (:calixte)

Reporter

Comment 13

•

7 years ago

There still are 12 crashes in nightly 60 starting with buildid 20180217100053 so after the patch landed.

Status: RESOLVED → REOPENED

status-firefox60: fixed → affected

Resolution: FIXED → ---

Daniel Stenberg [:bagder]

Assignee

Comment 14

•

7 years ago

In TRR::Cancel?

Calixte Denizet (:calixte)

Reporter

Comment 15

•

7 years ago

Yes, see http://bit.ly/2HuBoKS to get the crash reports.

Daniel Stenberg [:bagder]

Assignee

Comment 16

•

7 years ago

The crash reports indicate that there are at least two or three different ways that lead into the TRR::Cancel crash. 1 - (example https://crash-stats.mozilla.com/report/index/4d0fc238-85ed-444f-99a2-601e90180219) Via nsHostResolver::FlushCache that gets called from nsDNSService::Observe() from mozilla::net::nsIOService::NotifyWakeup. This is a machine waking up and it gets a network-change event and flushes the DNS cache. 2 - (example https://crash-stats.mozilla.com/report/index/5a160053-88e8-4416-80ee-de0890180219) Via nsHostRecord::Cancel() and nsHostResolver::Shutdown(). This is the browser shutting down. 3 - A variation of the first one (example https://crash-stats.mozilla.com/report/index/7e58d309-5468-4704-a008-f21760180219) Via nsHostRecord::Cancel and mozilla::net::nsIOService::SetOffline. This is the browser going offline and as a part of that, it flushes the DNS cache. I can't see how any of them can be reached without TRR being activated, which is good since TRR is still preffed off by default.

Daniel Stenberg [:bagder]

Assignee

Comment 17

•

7 years ago

I believe this might be a race when accessing/changing the mTrrA and mTrrAAAA in the nsHostRecord objects. Partly this happens because the code currently cancels TRR resolves from nsHostRecord::RemoveOrRefresh, which I don't think it should since we don't need to refresh TRR name resolves due to network changes. I'm testing out a fix for this locally which I'll submit for review as soon as I feel confident.

Daniel Stenberg [:bagder]

Assignee

Comment 18

•

7 years ago

Attached patch 0001-bug-1438947-mutex-nsHostRecord-mTrrA-AAA-accesses-r-.patch (obsolete) — Details — Splinter Review

(reviewboard refuses me to push another patch to the same bug that already had a patch previously so I'm doing it the old style as a work-around) This patch puts a mutex around all nsHostRecord::mTrrA[AAA] accesses since they might be accessed and updated from different threads. ... and remove Cancel() from nsHostRecord::RemoveOrRefresh since we don't need to cancel TRR resolves due to network changes.

Attachment #8952965 - Flags: review?(valentin.gosu)

Daniel Stenberg [:bagder]

Assignee

Updated

•

7 years ago

Whiteboard: [necko-triaged] → [necko-triaged][trr]

Valentin Gosu [:valentin] (he/him)

Comment 19

•

7 years ago

Comment on attachment 8952965 [details] [diff] [review] 0001-bug-1438947-mutex-nsHostRecord-mTrrA-AAA-accesses-r-.patch Review of attachment 8952965 [details] [diff] [review]: ----------------------------------------------------------------- ::: netwerk/dns/nsHostResolver.cpp @@ +1076,5 @@ > RefPtr<nsHostRecord> rec(aRec); > mLock.AssertCurrentThreadOwns(); > + { > + MutexAutoLock trrlock(rec->mTrr_lock); > + MOZ_ASSERT(!TRROutstanding()); Let's wrap this block in a #ifdef DEBUG, since we're only locking for this assert ::: netwerk/dns/nsHostResolver.h @@ +189,5 @@ > enum { > INIT, STARTED, OK, FAILED > } mTrrAUsed, mTrrAAAAUsed; > > + Mutex mTrr_lock; // lock when accessing the mTrrA[AAA] pointers Let's call it mTrrLock or mTRRLock.

Attachment #8952965 - Flags: review?(valentin.gosu) → review+

Daniel Stenberg [:bagder]

Assignee

Comment 20

•

7 years ago

Attached patch v2-0001-bug-1438947-mutex-nsHostRecord-mTrrA-AAA-accesses.patch — Details — Splinter Review

Attachment #8952965 - Attachment is obsolete: true

Attachment #8953043 - Flags: review+

Daniel Stenberg [:bagder]

Assignee

Updated

•

7 years ago

Keywords: checkin-needed

Pulsebot

Comment 21

•

7 years ago

Pushed by archaeopteryx@coole-files.de: https://hg.mozilla.org/integration/mozilla-inbound/rev/11edb4e3cdec mutex nsHostRecord::mTrrA[AAA] accesses r=valentin

Keywords: checkin-needed

Andrei Ciure[:aciure]

Comment 22

•

7 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/11edb4e3cdec

Status: REOPENED → RESOLVED

Closed: 7 years ago → 7 years ago

status-firefox60: affected → fixed

Resolution: --- → FIXED

You need to log in before you can comment on or make changes to this bug.