1368268 - Crash in `anonymous namespace''::ActiveVerifier::StartTracking

Reporter

Description

•

9 years ago

This bug was filed from the Socorro interface and is report bp-0523557b-76a6-4075-8847-5e6e00170527. ============================================================= This is crashing in the sandbox code with a UAF. Frequency is high. Tempted by sec-critical since it's crashing the sandbox with a UAF; this might be expoitable into a sandbox escape (just guessing, but sandbox UAFs make me wary.) URLs are all over the place

Randell Jesup [:jesup] (needinfo me)

Reporter

Comment 1

•

9 years ago

NI to abillings for sec level assessment, jimm for triage

Flags: needinfo?(jmathies)

Flags: needinfo?(abillings)

Al Billings [:abillings - ex-MoCo]

Updated

•

9 years ago

Flags: needinfo?(abillings)

Keywords: sec-high → sec-critical

Al Billings [:abillings - ex-MoCo]

Comment 2

•

9 years ago

I'm going to default to "critical" here to be safe.

Jim Mathies [:jimm]

Updated

•

9 years ago

Flags: needinfo?(jmathies)

Whiteboard: sb?

Jim Mathies [:jimm]

Comment 3

•

9 years ago

Bob, any thoughts here? Odd that the crash happens off main thread.

Flags: needinfo?(bobowencode)

Bob Owen (:bobowen)

Assignee

Comment 4

•

9 years ago

I wonder what happened in Fx53 which has made this spike up. It seems to have dropped right back down again in 54 beta (when compared to 53 beta) and there are no instances in 55. Fx55 has an update to the chromium sandbox code, so that might explain why it has disappeared. The only likely changeset appears to be: https://chromium.googlesource.com/chromium/src/+/678736a103c3e64acdce5e76910ff077b17cc6c3%5E%21/ Although it's not clear it would have resulted in a UAF. This still doesn't explain why 53 appears to be particularly bad for this, I wonder if we've improved our shared memory usage in 54. Could the spike in 53 have been the GPU process triggering the underlying problem?

Flags: needinfo?(bobowencode)

Ryan VanderMeulen [:RyanVM]

Updated

•

9 years ago

status-firefox53: --- → affected

status-firefox54: --- → affected

status-firefox55: --- → ?

status-firefox-esr52: --- → affected

Jim Mathies [:jimm]

Updated

•

9 years ago

Whiteboard: sb? → sblc5, sbwc3, sbmc3

Bob Owen (:bobowen)

Assignee

Comment 5

•

9 years ago

This is Windows only.

Whiteboard: sblc5, sbwc3, sbmc3 → sbwc3

Bob Owen (:bobowen)

Assignee

Comment 6

•

9 years ago

A superficially modified version of the chromium patch I mentioned in comment 4, applies, builds and seems to run without any issues for Beta 54. Not sure what we want to do here given that it is late in Beta, appears to be very rare (in 54) and we don't really know if that patch will fix the issue anyway. dveditz - what do you think?

Flags: needinfo?(dveditz)

Liz Henry (:lizzard)

Comment 7

•

9 years ago

Wontfix for 53. There are about 300 crashes per week on release. We're going to build RC 54 today. So if we do end up wanting this in 54 it would take spinning a second RC. Maybe it's best to wait one more cycle.

status-firefox53: affected → wontfix

Daniel Veditz [:dveditz]

Comment 8

•

9 years ago

The patch fixes a race--perhaps fx54 changes things around such that the bad timing happens less often? I don't think we need to respin 54 to pick this up, though we should keep an eye on the numbers as we release to see if we need to get this into a point-release ride-along. I'm guessing we'll be fine until 55.

Flags: needinfo?(dveditz)

Keywords: sec-critical → sec-high

Bob Owen (:bobowen)

Assignee

Comment 9

•

9 years ago

(In reply to Daniel Veditz [:dveditz] from comment #8) > The patch fixes a race--perhaps fx54 changes things around such that the bad > timing happens less often? Yes it should stop the race, but as far as I can tell the issue would have resulted in an extra "leaked" ActiveVerifier, although I don't think the other one ever gets deleted either. I'll keep an eye on this as we release.

Daniel Veditz [:dveditz]

Updated

•

9 years ago

Group: core-security → dom-core-security

Bob Owen (:bobowen)

Assignee

Comment 10

•

9 years ago

We've only had 4 of these reported in Fx54 so far, so this has definitely dropped back down to a very low level. Unfortunately, we've also had one in 56, so it hasn't gone away completely.

Frederik Braun [:freddy]

Comment 11

•

8 years ago

(In reply to Bob Owen (:bobowen) from comment #10) > We've only had 4 of these reported in Fx54 so far, so this has definitely > dropped back down to a very low level. > > Unfortunately, we've also had one in 56, so it hasn't gone away completely. What do you recommend to happen next?

Flags: needinfo?(bobowencode)

Bob Owen (:bobowen)

Assignee

Comment 12

•

8 years ago

(In reply to Frederik Braun [:freddyb] from comment #11) > (In reply to Bob Owen (:bobowen) from comment #10) > > We've only had 4 of these reported in Fx54 so far, so this has definitely > > dropped back down to a very low level. > > > > Unfortunately, we've also had one in 56, so it hasn't gone away completely. > > What do you recommend to happen next? The problem is I don't really see how the crash is occurring. There is a change to the way the lock is created in an update to the chromium code I am working on, although there's nothing in the comments to suggest it was being done because of this sort of problem. I don't think I'll land that update until 57 now.

Flags: needinfo?(bobowencode)

Jim Mathies [:jimm]

Updated

•

8 years ago

Priority: -- → P3

Whiteboard: sbwc3 → sb+

Daniel Veditz [:dveditz]

Updated

•

8 years ago

Component: Security → Security: Process Sandboxing

Randell Jesup [:jesup] (needinfo me)

Reporter

Updated

•

8 years ago

status-firefox54: affected → wontfix

status-firefox55: ? → ---

status-firefox56: --- → affected

status-firefox57: --- → affected

status-firefox58: --- → ?

Andrew McCreight [:mccr8]

Comment 13

•

8 years ago

This is still happening, though I only see a handful of crashes in the last week: bp-a472925f-b7e3-4bf7-ad17-875f40171018

Daniel Veditz [:dveditz]

Comment 14

•

8 years ago

Haven't seen any in 58.0 nightly, but then again I only saw one in 57.0 nightly. There are several in 57 beta but not high volume.

status-firefox56: affected → wontfix

Wennie

Comment 15

•

8 years ago

Hi Jim: Please reassign this bug to appropriate developer. Thanks! Wennie

Assignee: nobody → jmathies

Randell Jesup [:jesup] (needinfo me)

Reporter

Updated

•

8 years ago

Flags: needinfo?(jmathies)

Bob Owen (:bobowen)

Assignee

Comment 16

•

8 years ago

I'll take another look at this, but I couldn't see how this could happen before. I'm also working on an update to the sandbox code, which could help in theory.

Assignee: jmathies → bobowencode

Flags: needinfo?(jmathies) → needinfo?(bobowencode)

Bob Owen (:bobowen)

Assignee

Comment 17

•

8 years ago

I re-looked at this, I thought I'd found something to do with our pre-compile options, but turned out to be confusion caused by inlining. Bug 1366701, which I plan to land early in Fx59 has a small change to this code (which I mentioned in Comment 12, but have only just sorted out all the issues), so there's a chance it might resolve this. As Dan says this hasn't happened in 58a1 at all, be interesting to see what happens on Beta. I'll leave the NI for now.

Gian-Carlo Pascutto [:gcp]

Comment 18

•

8 years ago

This is still happening in 58: https://crash-stats.mozilla.com/report/index/ea4ac1ea-3f3d-469d-97d3-d4fd50171220 None so far in 59 but of course less users.

David Parks [:handyman]

Comment 19

•

8 years ago

I think I may have a bead on this. If so, this is a gnarly bug caused by the fact that modules mean there are multiple "global" namespaces. This would be a problem upstream. Before the bug, I should point out that the underlying Chromium code has changed again -- they changed the static lock (that I'm conjecturing is at the root of the issue) from a LazyInstance to a proper singleton [1]. The move appears to be inspired by a desire to remove LazyInstance, not by a bug, but we also still don't see _any_ crashes from this in 59, which is about the time it should have hit Firefox (Bob merged it in November). So maybe all of this will be wrong and it was just some esoteric C++ crap that got resolved by the cosmetic change. But I don't think so. The deal is, the crash happens when accessing a map that is shared concurrently. The map is guarded by this lock [2]. Again, previously, it was a LazyInstance, which, for our purposes, can just be viewed as a static value like the current one. Now, the code is supposed to be robust to the fact that different modules have global namespaces. It tries to make sure that the same instance of the ActiveVerifier is shared across modules in the current process -- the main_module_verifier in [3]. This looks correct. They forgot, however, to share the _lock_, which isn't part of the ActiveVerifier instance (maybe for good reason -- it may not be safe to get the instance so that we can fetch the lock from it... without already owning the lock). So each module in the process will operate on the same ActiveVerifier, but will "lock" its contents with the module's own lock. Which isn't "locking" anything. Here's the part I'm unsure of -- we need multiple modules working on this object to run into a problem. Without fully understanding the usage of this code, I'm conjecturing that firefox.exe (which should be the module returned by GetModuleHandle(NULL)) and xul.dll must both be using this, inspiring a race. I think the answer is just to share the lock across modules the same way it currently shares the ActiveVerifier. I can hack that up pretty quick -- I'm still looking at validating some of my assumptions tho. --------------------------- [1] https://codereview.chromium.org/2667513003/ [2] https://searchfox.org/mozilla-central/rev/9f2721b19b10a9a441fa5ca7116720d2fcebcc4a/security/sandbox/chromium/base/win/scoped_handle.cc#45 [3] https://searchfox.org/mozilla-central/rev/9f2721b19b10a9a441fa5ca7116720d2fcebcc4a/security/sandbox/chromium/base/win/scoped_handle.cc#139

Assignee: bobowencode → davidp99

David Parks [:handyman]

Comment 20

•

8 years ago

Damn. Missed this. https://searchfox.org/mozilla-central/rev/9f2721b19b10a9a441fa5ca7116720d2fcebcc4a/security/sandbox/chromium/base/win/scoped_handle.cc#80 // The methods required by HandleTraits. They are virtual because we need to // forward the call execution to another module, instead of letting the // compiler call the version that is linked in the current module.

David Parks [:handyman]

Comment 21

•

8 years ago

We still have no sightings of this in 59. Maybe it _was_ fixed by the upstream change somehow. Until we see it again, we're going to continue to monitor but operate on the assumption that things are ok.

Bob Owen (:bobowen)

Assignee

Comment 22

•

8 years ago

Bad news here, looks like this is still happening in Beta 59 and Release 58. The signature had changed slightly, so I've removed the anonymous namespace part, which also caused an issue on crash-stats because of the space.

Crash Signature: [@ `anonymous namespace''::ActiveVerifier::StartTracking] → [@ ActiveVerifier::StartTracking]

David Parks [:handyman]

Updated

•

8 years ago

Assignee: davidp99 → nobody

Daniel Veditz [:dveditz]

Comment 23

•

8 years ago

Assigning unowned critical/high security bugs to triage owner. Please find an appropriate assignee for this bug.

Assignee: nobody → jmathies

Randell Jesup [:jesup] (needinfo me)

Reporter

Updated

•

8 years ago

status-firefox57: affected → wontfix

status-firefox58: ? → fix-optional

status-firefox59: --- → affected

status-firefox60: --- → ?

Bob Owen (:bobowen)

Assignee

Comment 24

•

8 years ago

I think I've finally worked out what might be going on here. After comparing some dumps with the assembly of a working version, it looked like the internal _Myhead element of the internal list of ActiveVerifier::map_ [1] was getting deleted. This element is for a null handle, so I couldn't see how until I realised that _Myhead is the dummy element used for map_.end(). So what I think is happening is either we're getting an untracked handle or more likely a double closed handle at [2]. The element isn't found and so it returns end(), an iterator pointing to _Myhead. Those CHECKs should crash the process but they don't because we've never defined OFFICIAL_BUILD and our logging is stubbed. So, we hit the map_.erase() and delete the head of map_'s interal list. Which means operations on map_ are going to crash pretty soon. We should probably define OFFICIAL_BUILD as MOZILLA_OFFICIAL and at least make the logging crash when the severity is LOG_FATAL. [1] https://searchfox.org/mozilla-central/rev/efce4ceddfbe5fe4e2d74f1aed93894bada9b273/security/sandbox/chromium/base/win/scoped_handle.cc#101 [2] https://searchfox.org/mozilla-central/rev/efce4ceddfbe5fe4e2d74f1aed93894bada9b273/security/sandbox/chromium/base/win/scoped_handle.cc#210

Assignee: jmathies → bobowencode

Status: NEW → ASSIGNED

Flags: needinfo?(bobowencode)

Bob Owen (:bobowen)

Assignee

Updated

•

8 years ago

Priority: P3 → P1

Bob Owen (:bobowen)

Assignee

Comment 25

•

8 years ago

If comment 24 is correct, then the initial fix (to at least make the crash safe) is to make those CHECKs crash. I plan to do that as a non-sec bug blocker to bug 1055227, as it is a perfectly normal patch to land and then means it's not obviously linked to a sec bug, does that sound OK to you?

Flags: needinfo?(dveditz)

Daniel Veditz [:dveditz]

Comment 26

•

8 years ago

That sounds fine.

Flags: needinfo?(dveditz)

Frederik Braun [:freddy]

Comment 27

•

8 years ago

(In reply to Bob Owen (:bobowen) from comment #25) > If comment 24 is correct, then the initial fix (to at least make the crash > safe) is to make those CHECKs crash. > I plan to do that as a non-sec bug blocker to bug 1055227, as it is a > perfectly normal patch to land and then means it's not obviously linked to a > sec bug, does that sound OK to you? Which bug is it? :)

Flags: needinfo?(bobowencode)

Bob Owen (:bobowen)

Assignee

Comment 28

•

8 years ago

(In reply to Frederik Braun [:freddyb] from comment #27) ... > Which bug is it? :) Bug 1445167

Flags: needinfo?(bobowencode)

Frederik Braun [:freddy]

Comment 29

•

8 years ago

i.e., we can close this bug here?

Bob Owen (:bobowen)

Assignee

Comment 30

•

8 years ago

We've not had it since 60b5 (I think the fix landed in 60b7), so it's looking hopeful. I was going to wait until release, but I guess we can close it. This change will only have turned it into a safe crash, probably in StopTracking. Bug 1452090 landed a change that hooks CloseHandle and DuplicateHandle to try and catch where the real problem. That's only on Nightly for now, so we may not see it and might need to turn it on for beta for a couple of cycles.

Bob Owen (:bobowen)

Assignee

Comment 31

•

8 years ago

We've still had 4 of these so far in Fx60, so it hasn't gone away completely. However, the ActiveVerifier is now turned off in late Beta and Release (in Fx61), so we certainly shouldn't get any then. As this is so rare in Nightly and (early) Beta, where the Verifier is still on, I think we can close this. Just realised that I should get that change (bug 1452090) uplifted to ESR.

Status: ASSIGNED → RESOLVED

Closed: 8 years ago

status-firefox58: fix-optional → wontfix

status-firefox59: affected → wontfix

status-firefox60: ? → wontfix

status-firefox61: --- → fixed

status-firefox62: --- → fixed

Resolution: --- → FIXED

Daniel Veditz [:dveditz]

Updated

•

8 years ago

Group: dom-core-security → core-security-release

Ryan VanderMeulen [:RyanVM]

Updated

•

8 years ago

status-firefox-esr52: affected → unaffected

status-firefox-esr60: --- → wontfix

Al Billings [:abillings - ex-MoCo]

Updated

•

8 years ago

Whiteboard: sb+ → [adv-main61+] sb+

Andrei Vaida [:avaida]

Updated

•

8 years ago

Flags: qe-verify-

Whiteboard: [adv-main61+] sb+ → [post-critsmash-triage][adv-main61+] sb+

Daniel Veditz [:dveditz]

Updated

•

6 years ago

Group: core-security-release