<a class="header-button" href="https://bugzilla.mozilla.org/home" title="Go to home page"> Bugzilla

Andrew Overholt [:overholt]

Updated

•

8 years ago

Flags: needinfo?

Daniel Veditz [:dveditz]

Updated

•

8 years ago

Group: core-security → crypto-core-security

Comment 1

•

8 years ago

David, can you provide some guidance about next steps or help move this forward?

Flags: needinfo?(dkeeler)

Milan Sreckovic [:milan] (needinfo for best results)

Comment 2

•

8 years ago

Bas, is this what you're seeing in your branch?

Flags: needinfo?(bas)

Bas Schouten (:bas.schouten)

Comment 3

•

8 years ago

(In reply to Milan Sreckovic [:milan] from comment #2) > Bas, is this what you're seeing in your branch? Similar, yes (as in, there's a UAF of a hash table), however the UAF I'm seeing is in a completely different object and not just on shutdown. So highly unlikely to be related to this bug in any way.

Flags: needinfo?(bas)

Andrew McCreight [:mccr8]

Comment 4

•

8 years ago

I'll mark this as sec-moderate, as it is unlikely to be exploitable at this point in shutdown.

Keywords: csectype-uaf, sec-moderate

Comment 5

•

8 years ago

I've looked at this a couple of times and hit a bit of a wall. The code isn't obviously wrong and I haven't been able to reproduce this. My current theory is that maybe there's a rarely-triggered bug in the hash table code that leaves dangling pointers to hash table buckets? (see e.g. crashes like bp-c8fecd6c-bcf5-4a4c-afef-080a32161213 ). I think to move forward we either need more eyes or more expertise getting pertinent details out of crash reports (e.g. it seems to happen more on 32-bit builds).

Flags: needinfo?(dkeeler)

Comment 6

•

8 years ago

I also took a look at these crashes in the hopes of discovering something... (In reply to David Keeler [:keeler] (use needinfo?) from comment #5) > I've looked at this a couple of times and hit a bit of a wall. The code > isn't obviously wrong and I haven't been able to reproduce this. My current > theory is that maybe there's a rarely-triggered bug in the hash table code > that leaves dangling pointers to hash table buckets? (see e.g. crashes like > bp-c8fecd6c-bcf5-4a4c-afef-080a32161213 ). AFAIK 0xe5e5e5e5 is jemalloc's freed memory poison value. It's a bit worrying that in all of these crashes only the lower 32 bits are poisoned (there's a 100% correlation with crash address being 0xffffffffe5e5e5e5), but I think the crash report above is happening because the freed memory is reallocated again maybe? I discovered something interesting in some of these crash reports (for example the one in comment 0). Sometimes on another thread, ExceptionHandler::HandlePureVirtualCall() is on the stack! This means that we're crashing because of a pure virtual function call, I think. Ted, Nick, do you have any idea if we can extract some info about which virtual function the crash is originating from? Also, have you looked at such crashes before? Is it possible that the crash can be misattributed to some other code for some reason?

Flags: needinfo?(ted)

Flags: needinfo?(n.nethercote)

Nicholas Nethercote [inactive]

Comment 7

•

8 years ago

Following the same train of thought, it's very curious that I discovered that a few calls before calling NSS_Shutdown, we call evaporateAllNSSResources() <http://searchfox.org/mozilla-central/rev/4f5f9f3222f35fa4635ea96a0c963c130854ef19/security/manager/ssl/nsNSSComponent.cpp#1963> which gets to <http://searchfox.org/mozilla-central/rev/4f5f9f3222f35fa4635ea96a0c963c130854ef19/security/manager/ssl/nsNSSShutDown.cpp#176> which calls a pure virtual function! <http://searchfox.org/mozilla-central/rev/4f5f9f3222f35fa4635ea96a0c963c130854ef19/security/manager/ssl/nsNSSShutDown.h#222> Reading through the comments about this function, it seems like there may be some callers who don't adhere to the contract precisely. For example based on a cursory look, <http://searchfox.org/mozilla-central/rev/4f5f9f3222f35fa4635ea96a0c963c130854ef19/security/manager/ssl/CryptoTask.cpp#16> seems to not call ReleaseNSSReferences() when it should, but the practice seems to be common place... I couldn't really find anything concrete down this line of investigation, but perhaps David can think of something?

Comment 8

•

8 years ago

I don't have any additional insight here, sorry.

Flags: needinfo?(n.nethercote)

(not currently active) Ted Mielczarek

Comment 9

•

8 years ago

I'm pretty sure CryptoTask does the right thing, but it's possible that other implementations of nsNSSShutDownObject don't (it has definitely happened in the past). I'll take a closer look tomorrow.

Comment 10

•

8 years ago

(In reply to :Ehsan Akhgari from comment #6) > I also took a look at these crashes in the hopes of discovering something... > > (In reply to David Keeler [:keeler] (use needinfo?) from comment #5) > > I've looked at this a couple of times and hit a bit of a wall. The code > > isn't obviously wrong and I haven't been able to reproduce this. My current > > theory is that maybe there's a rarely-triggered bug in the hash table code > > that leaves dangling pointers to hash table buckets? (see e.g. crashes like > > bp-c8fecd6c-bcf5-4a4c-afef-080a32161213 ). > > AFAIK 0xe5e5e5e5 is jemalloc's freed memory poison value. It's a bit > worrying that in all of these crashes only the lower 32 bits are poisoned > (there's a 100% correlation with crash address being 0xffffffffe5e5e5e5), > but I think the crash report above is happening because the freed memory is > reallocated again maybe? This is just Breakpad sign-extending the 32-bit crash address to 64-bits, FYI. If you look in the "raw dump" tab you can see the register values for the crashing frame. For the crash in comment 0 it has: "eax": "0xe5e5e5e5", "ebp": "0xe5e5e5e5",

Flags: needinfo?(ted)

(not currently active) Ted Mielczarek

Comment 11

•

8 years ago

(In reply to :Ehsan Akhgari from comment #6) > I discovered something interesting in some of these crash reports (for > example the one in comment 0). Sometimes on another thread, > ExceptionHandler::HandlePureVirtualCall() is on the stack! This means that > we're crashing because of a pure virtual function call, I think. I think this is a red herring. Frame 12 of Thread 2 in that crash is MiniDumpWriteDump, which wouldn't call back into that function. Additionally, if you look at the raw dump, that frame was found by stack scanning.

(not currently active) Ted Mielczarek

Comment 12

•

8 years ago

Attached file crash disassembly for d01bf5ef-0c13-45ca-8d03-fa15d2161122 — Details

Here's the disassembly from MSVC. It's crashing at `5D3C8EF3 mov ebp,dword ptr [eax] ` because eax contains the poison pattern `e5e5e5e5`. The locals in `PL_HashTableDestroy` according to MSVC have he = 0xe5e5e5 (which I believe is what's in eax), n = 64, and i = 2.

Comment 13

•

8 years ago

Thanks Ted, so we're definitely dealing with a freed hash table here. David, is that something that can happen if my theory is true?

Comment 14

•

8 years ago

AFAICT, the hash table in question is NSS' dynamic object id (OID) table. It gets created on the first call to SECOID_AddEntry[0] (secoid_HashDynamicOiddata, really, but that's only called from SECOID_AddEntry). It's protected by a lock in a manner that appears to be correct (it's possible that if the process has forked, acquiring the lock when destroying the table will be skipped, but I don't think that's causing this because a) we should be in the parent process anyway and b) even if the lock were skipped, I'm fairly sure all of this is happening on the main thread, so the lock is maybe irrelevant). This suggests that either the hash table implementation is buggy or the way the secoid_* implementation uses it is incorrect in some way that isn't immediately obvious. Note for reference that we only ever change the table by calling SECOID_AddEntry from RegisterDynamicOids[1] and LoadExtendedValidationInfo[2] (each of those functions should only run once per run of Firefox). The table is destroyed only once, in SECOID_Shutdown [3]. But to answer your question, if the issue is the that the hash table has already been freed or has been corrupted by buggy code, it wouldn't be a result of incorrectly implementing nsNSSShutDownObject. [0] https://dxr.mozilla.org/mozilla-central/rev/f179934df0c1bab590c558485d419c7910e41325/security/nss/lib/util/secoid.c#1887 [1] https://dxr.mozilla.org/mozilla-central/rev/f179934df0c1bab590c558485d419c7910e41325/security/manager/ssl/nsNSSCertHelper.cpp#1779 [2] https://dxr.mozilla.org/mozilla-central/rev/f179934df0c1bab590c558485d419c7910e41325/security/certverifier/ExtendedValidation.cpp#1271 [3] https://dxr.mozilla.org/mozilla-central/rev/f179934df0c1bab590c558485d419c7910e41325/security/nss/lib/util/secoid.c#2262

Comment 15

•

8 years ago

Sorry, I have no idea what's going on here. :( The hashtable usage looks correct as you said, and I personally think it's extremely unlikely that we're hitting a bug in the hashtable code that only manifests here, since this code doesn't even seem to do anything fancy with the hashtable. The reliability of the garbage value we're seeing here across all crashes with this signature also makes theories such as memory corruption extremely unlikely. I'm out of ideas.

Comment 16

•

8 years ago

(In reply to Ted Mielczarek [:ted.mielczarek] from comment #12) > Created attachment 8821192 [details] > crash disassembly for d01bf5ef-0c13-45ca-8d03-fa15d2161122 > > Here's the disassembly from MSVC. It's crashing at `5D3C8EF3 mov > ebp,dword ptr [eax] ` because eax contains the poison pattern `e5e5e5e5`. > > The locals in `PL_HashTableDestroy` according to MSVC have he = 0xe5e5e5 > (which I believe is what's in eax), n = 64, and i = 2. My presumption from he=e5e5 would be that something removed an entry from the hashtable while we were destroying it... presumably from another thread. Even if that seems impossible.

Milan Sreckovic [:milan] (needinfo for best results)

Comment 17

•

8 years ago

"(it's possible that if the process has forked, acquiring the lock when destroying the table will be skipped, but I don't think that's causing this because a) we should be in the parent process anyway and b) even if the lock were skipped, I'm fairly sure all of this is happening on the main thread, so the lock is maybe irrelevant)" The crash is happening in MainThread. In other threads I see DecodePool threads and DOM workers, though I imagine they exist in Master and child processes - but I think profile-before-change only happens in the Master, IIRC. What is the point of the "SKIP_IF_FORKED"?

Comment 18

•

8 years ago

(In reply to Randell Jesup [:jesup] from comment #16) > ... > > My presumption from he=e5e5 would be that something removed an entry from > the hashtable while we were destroying it... presumably from another thread. > Even if that seems impossible. Or added an entry, which made the hash table grow, which meant there was some freeing going on.

Comment 19

•

8 years ago

Comment 14 suggests that the only accesses to this hashtable come from the main thread. :(

Comment 20

•

8 years ago

Still no reports of this beyond Fx50 either.

[:philipp]

Reporter

Comment 21

•

8 years ago

(In reply to Ryan VanderMeulen [:RyanVM] from comment #20) > Still no reports of this beyond Fx50 either. so far it was only common in rc builds on beta and pretty much non-existent on other pre-release builds though: https://crash-stats.mozilla.com/search/?signature=%3DPL_HashTableDestroy%20%7C%20SECOID_Shutdown&release_channel=beta&release_channel=aurora&release_channel=nightly&date=%3E%3D2016-06-01T00%3A00%3A00.000Z&_facets=version&_facets=user_comments&_facets=uptime&_facets=install_time&_facets=platform_pretty_version&_facets=useragent_locale&_columns=date&_columns=version&_columns=platform#facet-version

Comment 22

•

8 years ago

We're seeing some crashes in 51.0b builds now. Still a high crasher in release. :-(

status-firefox51: ? → affected

Comment 23

•

8 years ago

No reports on 52 yet, though.

status-firefox50: affected → wontfix

Comment 24

•

8 years ago

Looks like the Fx52 RC build is hitting this :(

status-firefox51: affected → wontfix

status-firefox52: ? → affected

status-firefox54: --- → ?

status-firefox-esr52: --- → affected

Liz Henry (:lizzard) (relman/hg->git project)

Comment 25

•

8 years ago

And a single hit on 53.0b6 now :\

status-firefox52: affected → wontfix

status-firefox53: ? → affected

tracking-firefox50: ? → ---

Updated

•

8 years ago

status-firefox53: affected → wontfix

Updated

•

8 years ago

Priority: -- → P3

Whiteboard: [psm-backlog]

Comment 26

•

7 years ago

Still happening at significant volume (36/week in 57 and 58) -- 1500(!) per week in all versions (mostly 56 release). High volume UAF, though on shutdown, which I believe is why it's moderate. Affects (shutdown) stability too. Is P3 still appropriate?

status-firefox54: ? → affected

status-firefox56: --- → affected

status-firefox57: --- → affected

status-firefox58: --- → ?

Flags: needinfo?(dkeeler)

Comment 27

•

7 years ago

It's possible bug 1372656 made this more frequent (LoadExtendedValidationInfo is now being called off the main thread), but since that landed in 57 it wouldn't account for the increased occurrences in 56 (unless that's due to 56 having a much higher volume of users?). Realistically the only way this is going to get "fixed" is by us never shutting down NSS, which depends on first moving to the sqlite db format (bug 783994).

Flags: needinfo?(dkeeler)

Julien Cristau [:jcristau]

Comment 28

•

7 years ago

> Realistically the only way this is going to get "fixed" is by us never > shutting down NSS, which depends on first moving to the sqlite db format > (bug 783994). This is software... unless there's a horribly-bad impossible-to-rescue design bug, this should be "fixable" - if we can figure out the cause. And at 1500 a week, it would seem amenable to attack by landing safety checks (even expensive ones) in Nightly, even if we have to make them Nightly-only. And it's a shutdown crash, so slower shutdown isn't a huge problem (nor are RELEASE_ASSERTs). We just have to want to resolve it enough to commit some resources to do so.

Updated

•

7 years ago

status-firefox54: affected → wontfix

status-firefox56: affected → wontfix

status-firefox57: affected → wontfix

status-firefox58: ? → ---

Andrew McCreight [:mccr8]

Comment 29

•

7 years ago

I guess my point was mostly that we're already working on (and reasonably close to) fixing all bugs in the category of "something goes wrong when we try to shut NSS down" and that it doesn't make sense to divert resources to this one bug in the meantime.

Updated

•

7 years ago

Crash Signature: [@ PL_HashTableDestroy | SECOID_Shutdown] → [@ PL_HashTableDestroy | SECOID_Shutdown] [@ arena_t::DallocSmall | static void arena_dalloc ]

Andrew McCreight [:mccr8]

Updated

•

7 years ago

Liz Henry (:lizzard) (relman/hg->git project)

Comment 32

•

7 years ago

This is showing only a couple of crashes in 61 and we're close to the end of the release cycle. No crashes showing up in 62 or 63. Calling this a wontfix for 61.

status-firefox61: --- → wontfix

status-firefox62: --- → ?

[:philipp]

Reporter

Updated

•

7 years ago

status-firefox62: ? → affected

status-firefox63: --- → affected

Liz Henry (:lizzard) (relman/hg->git project)

Updated

•

6 years ago

status-firefox62: affected → wontfix

status-firefox63: affected → fix-optional

status-firefox64: --- → affected

Daniel Veditz [:dveditz]

Comment 34

•

6 years ago

I'm not sure how this bug is different than bug 721710 and not a dupe. It's basically the same set of crashes.

status-firefox-esr52: affected → wontfix

status-firefox-esr60: --- → affected

Keywords: stalled

Daniel Veditz [:dveditz]

Updated

•

6 years ago

Group: crypto-core-security

Status: NEW → RESOLVED

Closed: 6 years ago

Resolution: --- → INACTIVE

Comment 35

•

6 years ago

Marking fix-optional for 64. We could still take a patch for 65, and if it's verified and doesn't seem risky, could still take fixes for 64 as well.

status-firefox64: affected → fix-optional

BugBot [:suhaib / :marco/ :calixte]

Updated

•

6 years ago

Keywords: stalled