Closed Bug 1422173 Opened 7 years ago Closed 7 years ago

Crash in nsDNSRecord::GetNextAddr

Tracking

()

Status:

VERIFIED FIXED

Milestone:

mozilla59

Tracking Flags:

Tracking

Status

firefox-esr52

---

unaffected

firefox57

---

unaffected

firefox58

---

unaffected

firefox59

---

verified

People

(Reporter: valentin, Assigned: valentin)

References

Details

(Keywords: crash, Whiteboard: [necko-triaged])

Crash Data

Attachments

(2 files)

Use lock when changing nsHostRecord.addr_info 7 years ago Valentin Gosu [:valentin] (he/him) 1.64 KB, patch	mayhemer : review+	Details \| Diff \| Splinter Review
sample-fix.patch 7 years ago jthemphill 25.41 KB, patch		Details \| Diff \| Splinter Review

Valentin Gosu [:valentin] (he/him)

Assignee

Description

•

7 years ago

This bug was filed from the Socorro interface and is
report bp-3ffb699b-8630-466b-9fc5-8c7d90171130.
=============================================================

Top 10 frames of crashing thread:

0 xul.dll nsDNSRecord::GetNextAddr netwerk/dns/nsDNSService2.cpp:128
1 xul.dll mozilla::net::nsSocketTransport::RecoverFromError netwerk/base/nsSocketTransport2.cpp:1793
2 xul.dll mozilla::net::nsSocketTransport::OnSocketDetached netwerk/base/nsSocketTransport2.cpp:2332
3 xul.dll mozilla::net::nsSocketTransportService::DetachSocket netwerk/base/nsSocketTransportService2.cpp:259
4 xul.dll mozilla::net::nsSocketTransportService::DoPollIteration netwerk/base/nsSocketTransportService2.cpp:1190
5 xul.dll mozilla::net::nsSocketTransportService::Run netwerk/base/nsSocketTransportService2.cpp:921
6 xul.dll nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:1033
7 xul.dll NS_ProcessNextEvent xpcom/threads/nsThreadUtils.cpp:508
8 xul.dll mozilla::ipc::MessagePumpForNonMainThreads::Run ipc/glue/MessagePump.cpp:334
9 xul.dll MessageLoop::RunHandler ipc/chromium/src/base/message_loop.cc:319

=============================================================

From what I can tell, this occurs when nsHostRecord::~nsHostRecord() is called on one thread, and GetNextAddr on another.
We can confirm by adding a lock to ~nsHostRecord when deleting addr_info.

jthemphill

Comment 1

•

7 years ago

What's the user impact? Are we clear to fix forward, or should we undo our bug 1420677 patch?

jthemphill

Comment 2

•

7 years ago

Also, does this mean that we revealed a use-after-free bug? This code is full of surprises :P

Honza Bambas (:mayhemer)

Comment 3

•

7 years ago

(In reply to Valentin Gosu [:valentin] from comment #0)
> From what I can tell, this occurs when nsHostRecord::~nsHostRecord() is
> called on one thread, and GetNextAddr on another.
> We can confirm by adding a lock to ~nsHostRecord when deleting addr_info.

Yes, because nsHostRecord is not refcounted!

Valentin Gosu [:valentin] (he/him)

Assignee

Comment 4

•

7 years ago

Actually, nsHostRecord does have NS_INLINE_DECL_THREADSAFE_REFCOUNTING and is used with a RefPtr.
But I think I found the issue. In nsHostResolver::ResolveHost we don't get a lock before nulling/changing addr_info.

Marcia Knous [:marcia]

Comment 5

•

7 years ago

155 crashes/109 installs so far. There are some UAF addresses so marking this as security sensitive.

Group: network-core-security

Valentin Gosu [:valentin] (he/him)

Assignee

Comment 6

•

7 years ago

Attached patch Use lock when changing nsHostRecord.addr_info — Details — Splinter Review

MozReview-Commit-ID: A6Qk6G3Y1SY

Attachment #8933627 - Flags: review?(honzab.moz)

Daniel Stenberg [:bagder]

Comment 7

•

7 years ago

Seems like a totally reasonable and likely explanation!

Honza Bambas (:mayhemer)

Comment 8

•

7 years ago

Comment on attachment 8933627 [details] [diff] [review]
Use lock when changing nsHostRecord.addr_info

Review of attachment 8933627 [details] [diff] [review]:
-----------------------------------------------------------------

OK, but there are at least 3 other places we access that member w/o the lock being held (like branch on presence out the lock, later access under the lock w/o rechecking; unlocked check+dereference).  We should probably audit this code in another bug.  privatizing addr_info (any other members?) on nsHostRecord and have an inline accessor asserting the lock thread ownership would be a wise thing to do.

Note that we already do keep nsHostResolver mLock and nsHostRecord addr_info_lock in a cascade.

Attachment #8933627 - Flags: review?(honzab.moz) → review+

Valentin Gosu [:valentin] (he/him)

Assignee

Comment 9

•

7 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/253690f0ab21b7f963cf2aaef144279d4318795d
Bug 1422173 - Use lock when changing nsHostRecord.addr_info r=mayhemer

jthemphill

Comment 10

•

7 years ago

> we already do keep nsHostResolver mLock and nsHostRecord addr_info_lock in a cascade

What is a cascade? Googling "multithreaded cascade" gives me nothing.

Flags: needinfo?(honzab.moz)

Honza Bambas (:mayhemer)

Comment 11

•

7 years ago

(In reply to jthemphill from comment #10)
> > we already do keep nsHostResolver mLock and nsHostRecord addr_info_lock in a cascade
> 
> What is a cascade? Googling "multithreaded cascade" gives me nothing.

Keeping both:

{ 
  AutoLock lock(outer);
  {
     AutoLock lock2(inner);
     do something...
  }
}

If you do it in reverse order on some other thread:

{ 
  AutoLock lock(inner);
  {
     AutoLock lock2(outer);
     do something...
  }
}

you deadlock.

Flags: needinfo?(honzab.moz)

Ryan VanderMeulen [:RyanVM]

Updated

•

7 years ago

status-firefox57: --- → unaffected

status-firefox58: --- → unaffected

status-firefox-esr52: --- → unaffected

Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout)

Comment 12

•

7 years ago

https://hg.mozilla.org/mozilla-central/rev/253690f0ab21

Status: NEW → RESOLVED

Closed: 7 years ago

status-firefox59: affected → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla59

Calixte Denizet (:calixte)

Comment 13

•

7 years ago

There still are 177 crashes in nightly 59 starting with buildid 20171202100103.

Status: RESOLVED → REOPENED

status-firefox59: fixed → affected

Resolution: FIXED → ---

Valentin Gosu [:valentin] (he/him)

Assignee

Comment 14

•

7 years ago

I really expected the patch would have fixed it.
At this point, there are two possibilities:
 1. this is a race where the socket thread is calling GetNextAddr and some other thread is nulling/changing a smart pointer that leads to the destruction of the objects holding the nsDNSRecord and nsHostRecord.
 2. There is an unpaired NS_RELEASE for nsHostRecord that is leading to a double free.
 3. [unlikely] some addon or DLL is doing some bad things

I will try one more fix; if that doesn't work, we probably should back out all the patches to avoid alienating users.

jthemphill

Comment 15

•

7 years ago

There’s still a data race; I have a (somewhat large) patch in the works to harden the code with an RWLock. I’ll postthe patch when I get home, and if you want to push your own more conservative fix that’s fine.

Valentin Gosu [:valentin] (he/him)

Assignee

Comment 16

•

7 years ago

Did you find the data race? I'm quite curious to see what that is.

jthemphill

Comment 17

•

7 years ago

Attached patch sample-fix.patch — Details — Splinter Review

Attachment #8934043 - Flags: review?(valentin.gosu)

Attachment #8934043 - Flags: review?(honzab.moz)

jthemphill

Comment 18

•

7 years ago

Yeah, so there are a few places where we look at addr_info but don't grab the lock. Look at the places in this patch where I didn't remove an existing lock - those are the places you can fix in a more conservative patch. There's a pretty I want to RWLock the entire nsHostResolver struct  The short answer is that nsHostResolver::HasUsableResult needs a lock, but doesn't look like it. This patch converts addr_info into a private member and uses public getters which assert the lock, per Honza's suggestion. Note the places where I didn't remove a lock; I think all of them are missing because HasUsableResult doesn't look like it needs a lock.

For https://hg.mozilla.org/mozilla-central/file/709f355a7a8c/netwerk/dns/nsHostResolver.cpp#l860 , the only options were to add a helper functions or add another nested if. I detest heavily-nested if blocks, so my patch opts for the helper function.

I don't like grabbing locks multiple times within a function. I'd probably recommend redesigning this code so that we apply RWLock to the entire nsHostRecord, and grab the lock as few times as possible. I think the old code existed before RWLock - addr_info was the only mutable data member, and the only tool the original authors had was a full-fledged Mutex. So they put this heavyweight lock on addr_info to emulate RWLock semantics.

I don't know much about writing multithreaded code. If this were Rust, I'd just put the resource inside an RWLock object, but that doesn't seem possible with the C++ XPCOM API. Is there an accepted pattern for locking resources? Is my getter-with-a-useless-argument pattern okay?

Try run here: https://treeherder.mozilla.org/#/jobs?repo=try&revision=d8df0c521d1c14e7ed325c0ff306a520b8cee85f

Valentin Gosu [:valentin] (he/him)

Assignee

Comment 19

•

7 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/90b7c07853acfea969f8834d6f6a9a3b554e695b
Bug 1422173 - Backed out changeset 34cfc821e335 (Bug 1417827) r=me

https://hg.mozilla.org/integration/mozilla-inbound/rev/c954378b4af38b10562e82161c3d20c6d229cd9e
Bug 1422173 - Backed out changeset 27719294cb73 (Bug 1420677) r=me

https://hg.mozilla.org/integration/mozilla-inbound/rev/6a51d6a8008168e80742c57eef235bd82998d691
Bug 1422173 - Add comment mentioning leaked addr_info r=me

jthemphill

Updated

•

7 years ago

Attachment #8934043 - Flags: review?(valentin.gosu)

Attachment #8934043 - Flags: review?(honzab.moz)

jthemphill

Comment 20

•

7 years ago

Cool. I'll file another bug for the race condition, and polish my patch up a bit more.

Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout)

Comment 21

•

7 years ago

https://hg.mozilla.org/mozilla-central/rev/c954378b4af3
https://hg.mozilla.org/mozilla-central/rev/90b7c07853ac
https://hg.mozilla.org/mozilla-central/rev/6a51d6a80081

Status: REOPENED → RESOLVED

Closed: 7 years ago → 7 years ago

status-firefox59: affected → fixed

Resolution: --- → FIXED

Daniel Veditz [:dveditz]

Updated

•

7 years ago

Group: network-core-security → core-security-release

Calixte Denizet (:calixte)

Comment 22

•

7 years ago

No more crashes in nightly 59.

Status: RESOLVED → VERIFIED

status-firefox59: fixed → verified

Daniel Veditz [:dveditz]

Updated

•

6 years ago

Group: core-security-release

You need to log in before you can comment on or make changes to this bug.