Closed Bug 1368268 Opened 9 years ago Closed 8 years ago

Crash in `anonymous namespace''::ActiveVerifier::StartTracking

Categories

(Core :: Security: Process Sandboxing, defect, P1)

x86
Windows 7
defect

Tracking

()

RESOLVED FIXED
Tracking Status
firefox-esr52 --- unaffected
firefox-esr60 --- wontfix
firefox53 --- wontfix
firefox54 --- wontfix
firefox56 --- wontfix
firefox57 --- wontfix
firefox58 --- wontfix
firefox59 --- wontfix
firefox60 --- wontfix
firefox61 --- fixed
firefox62 --- fixed

People

(Reporter: jesup, Assigned: bobowen)

Details

(Keywords: crash, csectype-uaf, sec-high, Whiteboard: [post-critsmash-triage][adv-main61+] sb+)

Crash Data

This bug was filed from the Socorro interface and is report bp-0523557b-76a6-4075-8847-5e6e00170527. ============================================================= This is crashing in the sandbox code with a UAF. Frequency is high. Tempted by sec-critical since it's crashing the sandbox with a UAF; this might be expoitable into a sandbox escape (just guessing, but sandbox UAFs make me wary.) URLs are all over the place
NI to abillings for sec level assessment, jimm for triage
Flags: needinfo?(jmathies)
Flags: needinfo?(abillings)
Flags: needinfo?(abillings)
Keywords: sec-highsec-critical
I'm going to default to "critical" here to be safe.
Flags: needinfo?(jmathies)
Whiteboard: sb?
Bob, any thoughts here? Odd that the crash happens off main thread.
Flags: needinfo?(bobowencode)
I wonder what happened in Fx53 which has made this spike up. It seems to have dropped right back down again in 54 beta (when compared to 53 beta) and there are no instances in 55. Fx55 has an update to the chromium sandbox code, so that might explain why it has disappeared. The only likely changeset appears to be: https://chromium.googlesource.com/chromium/src/+/678736a103c3e64acdce5e76910ff077b17cc6c3%5E%21/ Although it's not clear it would have resulted in a UAF. This still doesn't explain why 53 appears to be particularly bad for this, I wonder if we've improved our shared memory usage in 54. Could the spike in 53 have been the GPU process triggering the underlying problem?
Flags: needinfo?(bobowencode)
Whiteboard: sb? → sblc5, sbwc3, sbmc3
This is Windows only.
Whiteboard: sblc5, sbwc3, sbmc3 → sbwc3
A superficially modified version of the chromium patch I mentioned in comment 4, applies, builds and seems to run without any issues for Beta 54. Not sure what we want to do here given that it is late in Beta, appears to be very rare (in 54) and we don't really know if that patch will fix the issue anyway. dveditz - what do you think?
Flags: needinfo?(dveditz)
Wontfix for 53. There are about 300 crashes per week on release. We're going to build RC 54 today. So if we do end up wanting this in 54 it would take spinning a second RC. Maybe it's best to wait one more cycle.
The patch fixes a race--perhaps fx54 changes things around such that the bad timing happens less often? I don't think we need to respin 54 to pick this up, though we should keep an eye on the numbers as we release to see if we need to get this into a point-release ride-along. I'm guessing we'll be fine until 55.
Flags: needinfo?(dveditz)
Keywords: sec-criticalsec-high
(In reply to Daniel Veditz [:dveditz] from comment #8) > The patch fixes a race--perhaps fx54 changes things around such that the bad > timing happens less often? Yes it should stop the race, but as far as I can tell the issue would have resulted in an extra "leaked" ActiveVerifier, although I don't think the other one ever gets deleted either. I'll keep an eye on this as we release.
Group: core-security → dom-core-security
We've only had 4 of these reported in Fx54 so far, so this has definitely dropped back down to a very low level. Unfortunately, we've also had one in 56, so it hasn't gone away completely.
(In reply to Bob Owen (:bobowen) from comment #10) > We've only had 4 of these reported in Fx54 so far, so this has definitely > dropped back down to a very low level. > > Unfortunately, we've also had one in 56, so it hasn't gone away completely. What do you recommend to happen next?
Flags: needinfo?(bobowencode)
(In reply to Frederik Braun [:freddyb] from comment #11) > (In reply to Bob Owen (:bobowen) from comment #10) > > We've only had 4 of these reported in Fx54 so far, so this has definitely > > dropped back down to a very low level. > > > > Unfortunately, we've also had one in 56, so it hasn't gone away completely. > > What do you recommend to happen next? The problem is I don't really see how the crash is occurring. There is a change to the way the lock is created in an update to the chromium code I am working on, although there's nothing in the comments to suggest it was being done because of this sort of problem. I don't think I'll land that update until 57 now.
Flags: needinfo?(bobowencode)
Priority: -- → P3
Whiteboard: sbwc3 → sb+
Component: Security → Security: Process Sandboxing
This is still happening, though I only see a handful of crashes in the last week: bp-a472925f-b7e3-4bf7-ad17-875f40171018
Haven't seen any in 58.0 nightly, but then again I only saw one in 57.0 nightly. There are several in 57 beta but not high volume.
Hi Jim: Please reassign this bug to appropriate developer. Thanks! Wennie
Assignee: nobody → jmathies
Flags: needinfo?(jmathies)
I'll take another look at this, but I couldn't see how this could happen before. I'm also working on an update to the sandbox code, which could help in theory.
Assignee: jmathies → bobowencode
Flags: needinfo?(jmathies) → needinfo?(bobowencode)
I re-looked at this, I thought I'd found something to do with our pre-compile options, but turned out to be confusion caused by inlining. Bug 1366701, which I plan to land early in Fx59 has a small change to this code (which I mentioned in Comment 12, but have only just sorted out all the issues), so there's a chance it might resolve this. As Dan says this hasn't happened in 58a1 at all, be interesting to see what happens on Beta. I'll leave the NI for now.
This is still happening in 58: https://crash-stats.mozilla.com/report/index/ea4ac1ea-3f3d-469d-97d3-d4fd50171220 None so far in 59 but of course less users.
I think I may have a bead on this. If so, this is a gnarly bug caused by the fact that modules mean there are multiple "global" namespaces. This would be a problem upstream. Before the bug, I should point out that the underlying Chromium code has changed again -- they changed the static lock (that I'm conjecturing is at the root of the issue) from a LazyInstance to a proper singleton [1]. The move appears to be inspired by a desire to remove LazyInstance, not by a bug, but we also still don't see _any_ crashes from this in 59, which is about the time it should have hit Firefox (Bob merged it in November). So maybe all of this will be wrong and it was just some esoteric C++ crap that got resolved by the cosmetic change. But I don't think so. The deal is, the crash happens when accessing a map that is shared concurrently. The map is guarded by this lock [2]. Again, previously, it was a LazyInstance, which, for our purposes, can just be viewed as a static value like the current one. Now, the code is supposed to be robust to the fact that different modules have global namespaces. It tries to make sure that the same instance of the ActiveVerifier is shared across modules in the current process -- the main_module_verifier in [3]. This looks correct. They forgot, however, to share the _lock_, which isn't part of the ActiveVerifier instance (maybe for good reason -- it may not be safe to get the instance so that we can fetch the lock from it... without already owning the lock). So each module in the process will operate on the same ActiveVerifier, but will "lock" its contents with the module's own lock. Which isn't "locking" anything. Here's the part I'm unsure of -- we need multiple modules working on this object to run into a problem. Without fully understanding the usage of this code, I'm conjecturing that firefox.exe (which should be the module returned by GetModuleHandle(NULL)) and xul.dll must both be using this, inspiring a race. I think the answer is just to share the lock across modules the same way it currently shares the ActiveVerifier. I can hack that up pretty quick -- I'm still looking at validating some of my assumptions tho. --------------------------- [1] https://codereview.chromium.org/2667513003/ [2] https://searchfox.org/mozilla-central/rev/9f2721b19b10a9a441fa5ca7116720d2fcebcc4a/security/sandbox/chromium/base/win/scoped_handle.cc#45 [3] https://searchfox.org/mozilla-central/rev/9f2721b19b10a9a441fa5ca7116720d2fcebcc4a/security/sandbox/chromium/base/win/scoped_handle.cc#139
Assignee: bobowencode → davidp99
Damn. Missed this. https://searchfox.org/mozilla-central/rev/9f2721b19b10a9a441fa5ca7116720d2fcebcc4a/security/sandbox/chromium/base/win/scoped_handle.cc#80 // The methods required by HandleTraits. They are virtual because we need to // forward the call execution to another module, instead of letting the // compiler call the version that is linked in the current module.
We still have no sightings of this in 59. Maybe it _was_ fixed by the upstream change somehow. Until we see it again, we're going to continue to monitor but operate on the assumption that things are ok.
Bad news here, looks like this is still happening in Beta 59 and Release 58. The signature had changed slightly, so I've removed the anonymous namespace part, which also caused an issue on crash-stats because of the space.
Crash Signature: [@ `anonymous namespace''::ActiveVerifier::StartTracking] → [@ ActiveVerifier::StartTracking]
Assignee: davidp99 → nobody
Assigning unowned critical/high security bugs to triage owner. Please find an appropriate assignee for this bug.
Assignee: nobody → jmathies
I think I've finally worked out what might be going on here. After comparing some dumps with the assembly of a working version, it looked like the internal _Myhead element of the internal list of ActiveVerifier::map_ [1] was getting deleted. This element is for a null handle, so I couldn't see how until I realised that _Myhead is the dummy element used for map_.end(). So what I think is happening is either we're getting an untracked handle or more likely a double closed handle at [2]. The element isn't found and so it returns end(), an iterator pointing to _Myhead. Those CHECKs should crash the process but they don't because we've never defined OFFICIAL_BUILD and our logging is stubbed. So, we hit the map_.erase() and delete the head of map_'s interal list. Which means operations on map_ are going to crash pretty soon. We should probably define OFFICIAL_BUILD as MOZILLA_OFFICIAL and at least make the logging crash when the severity is LOG_FATAL. [1] https://searchfox.org/mozilla-central/rev/efce4ceddfbe5fe4e2d74f1aed93894bada9b273/security/sandbox/chromium/base/win/scoped_handle.cc#101 [2] https://searchfox.org/mozilla-central/rev/efce4ceddfbe5fe4e2d74f1aed93894bada9b273/security/sandbox/chromium/base/win/scoped_handle.cc#210
Assignee: jmathies → bobowencode
Status: NEW → ASSIGNED
Flags: needinfo?(bobowencode)
Priority: P3 → P1
If comment 24 is correct, then the initial fix (to at least make the crash safe) is to make those CHECKs crash. I plan to do that as a non-sec bug blocker to bug 1055227, as it is a perfectly normal patch to land and then means it's not obviously linked to a sec bug, does that sound OK to you?
Flags: needinfo?(dveditz)
That sounds fine.
Flags: needinfo?(dveditz)
(In reply to Bob Owen (:bobowen) from comment #25) > If comment 24 is correct, then the initial fix (to at least make the crash > safe) is to make those CHECKs crash. > I plan to do that as a non-sec bug blocker to bug 1055227, as it is a > perfectly normal patch to land and then means it's not obviously linked to a > sec bug, does that sound OK to you? Which bug is it? :)
Flags: needinfo?(bobowencode)
(In reply to Frederik Braun [:freddyb] from comment #27) ... > Which bug is it? :) Bug 1445167
Flags: needinfo?(bobowencode)
i.e., we can close this bug here?
We've not had it since 60b5 (I think the fix landed in 60b7), so it's looking hopeful. I was going to wait until release, but I guess we can close it. This change will only have turned it into a safe crash, probably in StopTracking. Bug 1452090 landed a change that hooks CloseHandle and DuplicateHandle to try and catch where the real problem. That's only on Nightly for now, so we may not see it and might need to turn it on for beta for a couple of cycles.
We've still had 4 of these so far in Fx60, so it hasn't gone away completely. However, the ActiveVerifier is now turned off in late Beta and Release (in Fx61), so we certainly shouldn't get any then. As this is so rare in Nightly and (early) Beta, where the Verifier is still on, I think we can close this. Just realised that I should get that change (bug 1452090) uplifted to ESR.
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Group: dom-core-security → core-security-release
Whiteboard: sb+ → [adv-main61+] sb+
Flags: qe-verify-
Whiteboard: [adv-main61+] sb+ → [post-critsmash-triage][adv-main61+] sb+
Group: core-security-release
You need to log in before you can comment on or make changes to this bug.