Closed
Bug 1422173
Opened 7 years ago
Closed 7 years ago
Crash in nsDNSRecord::GetNextAddr
Categories
(Core :: Networking: DNS, defect, P1)
Tracking
()
VERIFIED
FIXED
mozilla59
Tracking | Status | |
---|---|---|
firefox-esr52 | --- | unaffected |
firefox57 | --- | unaffected |
firefox58 | --- | unaffected |
firefox59 | --- | verified |
People
(Reporter: valentin, Assigned: valentin)
References
Details
(Keywords: crash, Whiteboard: [necko-triaged])
Crash Data
Attachments
(2 files)
1.64 KB,
patch
|
mayhemer
:
review+
|
Details | Diff | Splinter Review |
25.41 KB,
patch
|
Details | Diff | Splinter Review |
This bug was filed from the Socorro interface and is report bp-3ffb699b-8630-466b-9fc5-8c7d90171130. ============================================================= Top 10 frames of crashing thread: 0 xul.dll nsDNSRecord::GetNextAddr netwerk/dns/nsDNSService2.cpp:128 1 xul.dll mozilla::net::nsSocketTransport::RecoverFromError netwerk/base/nsSocketTransport2.cpp:1793 2 xul.dll mozilla::net::nsSocketTransport::OnSocketDetached netwerk/base/nsSocketTransport2.cpp:2332 3 xul.dll mozilla::net::nsSocketTransportService::DetachSocket netwerk/base/nsSocketTransportService2.cpp:259 4 xul.dll mozilla::net::nsSocketTransportService::DoPollIteration netwerk/base/nsSocketTransportService2.cpp:1190 5 xul.dll mozilla::net::nsSocketTransportService::Run netwerk/base/nsSocketTransportService2.cpp:921 6 xul.dll nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:1033 7 xul.dll NS_ProcessNextEvent xpcom/threads/nsThreadUtils.cpp:508 8 xul.dll mozilla::ipc::MessagePumpForNonMainThreads::Run ipc/glue/MessagePump.cpp:334 9 xul.dll MessageLoop::RunHandler ipc/chromium/src/base/message_loop.cc:319 ============================================================= From what I can tell, this occurs when nsHostRecord::~nsHostRecord() is called on one thread, and GetNextAddr on another. We can confirm by adding a lock to ~nsHostRecord when deleting addr_info.
Comment 1•7 years ago
|
||
What's the user impact? Are we clear to fix forward, or should we undo our bug 1420677 patch?
Comment 2•7 years ago
|
||
Also, does this mean that we revealed a use-after-free bug? This code is full of surprises :P
Comment 3•7 years ago
|
||
(In reply to Valentin Gosu [:valentin] from comment #0) > From what I can tell, this occurs when nsHostRecord::~nsHostRecord() is > called on one thread, and GetNextAddr on another. > We can confirm by adding a lock to ~nsHostRecord when deleting addr_info. Yes, because nsHostRecord is not refcounted!
Assignee | ||
Comment 4•7 years ago
|
||
Actually, nsHostRecord does have NS_INLINE_DECL_THREADSAFE_REFCOUNTING and is used with a RefPtr. But I think I found the issue. In nsHostResolver::ResolveHost we don't get a lock before nulling/changing addr_info.
Comment 5•7 years ago
|
||
155 crashes/109 installs so far. There are some UAF addresses so marking this as security sensitive.
Group: network-core-security
Assignee | ||
Comment 6•7 years ago
|
||
MozReview-Commit-ID: A6Qk6G3Y1SY
Attachment #8933627 -
Flags: review?(honzab.moz)
Comment 7•7 years ago
|
||
Seems like a totally reasonable and likely explanation!
Comment 8•7 years ago
|
||
Comment on attachment 8933627 [details] [diff] [review] Use lock when changing nsHostRecord.addr_info Review of attachment 8933627 [details] [diff] [review]: ----------------------------------------------------------------- OK, but there are at least 3 other places we access that member w/o the lock being held (like branch on presence out the lock, later access under the lock w/o rechecking; unlocked check+dereference). We should probably audit this code in another bug. privatizing addr_info (any other members?) on nsHostRecord and have an inline accessor asserting the lock thread ownership would be a wise thing to do. Note that we already do keep nsHostResolver mLock and nsHostRecord addr_info_lock in a cascade.
Attachment #8933627 -
Flags: review?(honzab.moz) → review+
Assignee | ||
Comment 9•7 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/253690f0ab21b7f963cf2aaef144279d4318795d Bug 1422173 - Use lock when changing nsHostRecord.addr_info r=mayhemer
Comment 10•7 years ago
|
||
> we already do keep nsHostResolver mLock and nsHostRecord addr_info_lock in a cascade
What is a cascade? Googling "multithreaded cascade" gives me nothing.
Flags: needinfo?(honzab.moz)
Comment 11•7 years ago
|
||
(In reply to jthemphill from comment #10) > > we already do keep nsHostResolver mLock and nsHostRecord addr_info_lock in a cascade > > What is a cascade? Googling "multithreaded cascade" gives me nothing. Keeping both: { AutoLock lock(outer); { AutoLock lock2(inner); do something... } } If you do it in reverse order on some other thread: { AutoLock lock(inner); { AutoLock lock2(outer); do something... } } you deadlock.
Flags: needinfo?(honzab.moz)
Updated•7 years ago
|
status-firefox57:
--- → unaffected
status-firefox58:
--- → unaffected
status-firefox-esr52:
--- → unaffected
Comment 12•7 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/253690f0ab21
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla59
Comment 13•7 years ago
|
||
There still are 177 crashes in nightly 59 starting with buildid 20171202100103.
Assignee | ||
Comment 14•7 years ago
|
||
I really expected the patch would have fixed it. At this point, there are two possibilities: 1. this is a race where the socket thread is calling GetNextAddr and some other thread is nulling/changing a smart pointer that leads to the destruction of the objects holding the nsDNSRecord and nsHostRecord. 2. There is an unpaired NS_RELEASE for nsHostRecord that is leading to a double free. 3. [unlikely] some addon or DLL is doing some bad things I will try one more fix; if that doesn't work, we probably should back out all the patches to avoid alienating users.
Comment 15•7 years ago
|
||
There’s still a data race; I have a (somewhat large) patch in the works to harden the code with an RWLock. I’ll postthe patch when I get home, and if you want to push your own more conservative fix that’s fine.
Assignee | ||
Comment 16•7 years ago
|
||
Did you find the data race? I'm quite curious to see what that is.
Comment 17•7 years ago
|
||
Attachment #8934043 -
Flags: review?(valentin.gosu)
Attachment #8934043 -
Flags: review?(honzab.moz)
Comment 18•7 years ago
|
||
Yeah, so there are a few places where we look at addr_info but don't grab the lock. Look at the places in this patch where I didn't remove an existing lock - those are the places you can fix in a more conservative patch. There's a pretty I want to RWLock the entire nsHostResolver struct The short answer is that nsHostResolver::HasUsableResult needs a lock, but doesn't look like it. This patch converts addr_info into a private member and uses public getters which assert the lock, per Honza's suggestion. Note the places where I didn't remove a lock; I think all of them are missing because HasUsableResult doesn't look like it needs a lock. For https://hg.mozilla.org/mozilla-central/file/709f355a7a8c/netwerk/dns/nsHostResolver.cpp#l860 , the only options were to add a helper functions or add another nested if. I detest heavily-nested if blocks, so my patch opts for the helper function. I don't like grabbing locks multiple times within a function. I'd probably recommend redesigning this code so that we apply RWLock to the entire nsHostRecord, and grab the lock as few times as possible. I think the old code existed before RWLock - addr_info was the only mutable data member, and the only tool the original authors had was a full-fledged Mutex. So they put this heavyweight lock on addr_info to emulate RWLock semantics. I don't know much about writing multithreaded code. If this were Rust, I'd just put the resource inside an RWLock object, but that doesn't seem possible with the C++ XPCOM API. Is there an accepted pattern for locking resources? Is my getter-with-a-useless-argument pattern okay? Try run here: https://treeherder.mozilla.org/#/jobs?repo=try&revision=d8df0c521d1c14e7ed325c0ff306a520b8cee85f
Assignee | ||
Comment 19•7 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/90b7c07853acfea969f8834d6f6a9a3b554e695b Bug 1422173 - Backed out changeset 34cfc821e335 (Bug 1417827) r=me https://hg.mozilla.org/integration/mozilla-inbound/rev/c954378b4af38b10562e82161c3d20c6d229cd9e Bug 1422173 - Backed out changeset 27719294cb73 (Bug 1420677) r=me https://hg.mozilla.org/integration/mozilla-inbound/rev/6a51d6a8008168e80742c57eef235bd82998d691 Bug 1422173 - Add comment mentioning leaked addr_info r=me
Updated•7 years ago
|
Attachment #8934043 -
Flags: review?(valentin.gosu)
Attachment #8934043 -
Flags: review?(honzab.moz)
Comment 20•7 years ago
|
||
Cool. I'll file another bug for the race condition, and polish my patch up a bit more.
Comment 21•7 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/c954378b4af3 https://hg.mozilla.org/mozilla-central/rev/90b7c07853ac https://hg.mozilla.org/mozilla-central/rev/6a51d6a80081
Status: REOPENED → RESOLVED
Closed: 7 years ago → 7 years ago
Resolution: --- → FIXED
Updated•7 years ago
|
Group: network-core-security → core-security-release
Updated•6 years ago
|
Group: core-security-release
You need to log in
before you can comment on or make changes to this bug.
Description
•