Closed
Bug 1319454
Opened 8 years ago
Closed 6 years ago
Crash in PL_HashTableDestroy | SECOID_Shutdown
Categories
(Core :: Security: PSM, defect, P3)
Tracking
()
RESOLVED
INACTIVE
Tracking | Status | |
---|---|---|
firefox50 | --- | wontfix |
firefox51 | --- | wontfix |
firefox52 | --- | wontfix |
firefox-esr52 | --- | wontfix |
firefox-esr60 | --- | wontfix |
firefox53 | --- | wontfix |
firefox54 | --- | wontfix |
firefox56 | --- | wontfix |
firefox57 | --- | wontfix |
firefox61 | --- | wontfix |
firefox62 | --- | wontfix |
firefox63 | --- | wontfix |
firefox64 | --- | wontfix |
People
(Reporter: philipp, Unassigned)
References
Details
(4 keywords, Whiteboard: [psm-backlog])
Crash Data
Attachments
(1 file)
2.86 KB,
text/plain
|
Details |
This bug was filed from the Socorro interface and is
report bp-d01bf5ef-0c13-45ca-8d03-fa15d2161122.
=============================================================
Crashing Thread (0)
Frame Module Signature Source
0 nss3.dll PL_HashTableDestroy nsprpub/lib/ds/plhash.c:115
1 nss3.dll SECOID_Shutdown security/nss/lib/util/secoid.c:2239
2 nss3.dll nss_Shutdown security/nss/lib/nss/nssinit.c:1126
3 nss3.dll NSS_Shutdown security/nss/lib/nss/nssinit.c:1188
4 xul.dll nsNSSComponent::ShutdownNSS() security/manager/ssl/nsNSSComponent.cpp:1873
5 xul.dll nsNSSComponent::DoProfileBeforeChange() security/manager/ssl/nsNSSComponent.cpp:2102
6 xul.dll nsNSSComponent::Observe(nsISupports*, char const*, char16_t const*) security/manager/ssl/nsNSSComponent.cpp:1938
7 xul.dll nsObserverList::NotifyObservers(nsISupports*, char const*, char16_t const*) xpcom/ds/nsObserverList.cpp:112
8 xul.dll nsObserverService::NotifyObservers(nsISupports*, char const*, char16_t const*) xpcom/ds/nsObserverService.cpp:305
9 xul.dll nsXREDirProvider::DoShutdown() toolkit/xre/nsXREDirProvider.cpp:1228
10 xul.dll ScopedXPCOMStartup::~ScopedXPCOMStartup() toolkit/xre/nsAppRunner.cpp:1335
11 xul.dll mozilla::UniquePtr<ScopedXPCOMStartup, mozilla::DefaultDelete<ScopedXPCOMStartup> >::reset(ScopedXPCOMStartup*) obj-firefox/dist/include/mozilla/UniquePtr.h:345
12 xul.dll XREMain::XRE_main(int, char** const, nsXREAppData const*) toolkit/xre/nsAppRunner.cpp:4452
13 xul.dll XRE_main toolkit/xre/nsAppRunner.cpp:4515
14 @0xc4419f
15 @0x18f34b
[Tracking Requested - why for this release]:
this shutdown-crash signature has been around for a while (also see bug 721710) but it has become much more frequent in firefox 49 & 50: https://crash-stats.mozilla.com/signature/?product=Firefox&signature=PL_HashTableDestroy | SECOID_Shutdown&date=>%3D2016-05-22T16%3A28%3A48.000Z&date=<2016-11-22T16%3A28%3A48.000Z#graphs
on release this is currently accounting for 0.7% of all browser crashes.
(100.0% in signature vs 04.06% overall) shutdown_progress = profile-before-change
(96.06% in signature vs 00.87% overall) address = 0xffffffffe5e5e5e5
(99.21% in signature vs 37.04% overall) reason = EXCEPTION_ACCESS_VIOLATION_READ
(marking this as security sensitive as a precaution)
Flags: needinfo?
Updated•8 years ago
|
Flags: needinfo?
Updated•8 years ago
|
Group: core-security → crypto-core-security
Comment 1•8 years ago
|
||
David, can you provide some guidance about next steps or help move this forward?
Flags: needinfo?(dkeeler)
Bas, is this what you're seeing in your branch?
Flags: needinfo?(bas)
Comment 3•8 years ago
|
||
(In reply to Milan Sreckovic [:milan] from comment #2)
> Bas, is this what you're seeing in your branch?
Similar, yes (as in, there's a UAF of a hash table), however the UAF I'm seeing is in a completely different object and not just on shutdown. So highly unlikely to be related to this bug in any way.
Flags: needinfo?(bas)
Comment 4•8 years ago
|
||
I'll mark this as sec-moderate, as it is unlikely to be exploitable at this point in shutdown.
Keywords: csectype-uaf,
sec-moderate
![]() |
||
Comment 5•8 years ago
|
||
I've looked at this a couple of times and hit a bit of a wall. The code isn't obviously wrong and I haven't been able to reproduce this. My current theory is that maybe there's a rarely-triggered bug in the hash table code that leaves dangling pointers to hash table buckets? (see e.g. crashes like bp-c8fecd6c-bcf5-4a4c-afef-080a32161213 ).
I think to move forward we either need more eyes or more expertise getting pertinent details out of crash reports (e.g. it seems to happen more on 32-bit builds).
Flags: needinfo?(dkeeler)
Comment 6•8 years ago
|
||
I also took a look at these crashes in the hopes of discovering something...
(In reply to David Keeler [:keeler] (use needinfo?) from comment #5)
> I've looked at this a couple of times and hit a bit of a wall. The code
> isn't obviously wrong and I haven't been able to reproduce this. My current
> theory is that maybe there's a rarely-triggered bug in the hash table code
> that leaves dangling pointers to hash table buckets? (see e.g. crashes like
> bp-c8fecd6c-bcf5-4a4c-afef-080a32161213 ).
AFAIK 0xe5e5e5e5 is jemalloc's freed memory poison value. It's a bit worrying that in all of these crashes only the lower 32 bits are poisoned (there's a 100% correlation with crash address being 0xffffffffe5e5e5e5), but I think the crash report above is happening because the freed memory is reallocated again maybe?
I discovered something interesting in some of these crash reports (for example the one in comment 0). Sometimes on another thread, ExceptionHandler::HandlePureVirtualCall() is on the stack! This means that we're crashing because of a pure virtual function call, I think.
Ted, Nick, do you have any idea if we can extract some info about which virtual function the crash is originating from? Also, have you looked at such crashes before? Is it possible that the crash can be misattributed to some other code for some reason?
Flags: needinfo?(ted)
Flags: needinfo?(n.nethercote)
Comment 7•8 years ago
|
||
Following the same train of thought, it's very curious that I discovered that a few calls before calling NSS_Shutdown, we call evaporateAllNSSResources() <http://searchfox.org/mozilla-central/rev/4f5f9f3222f35fa4635ea96a0c963c130854ef19/security/manager/ssl/nsNSSComponent.cpp#1963> which gets to <http://searchfox.org/mozilla-central/rev/4f5f9f3222f35fa4635ea96a0c963c130854ef19/security/manager/ssl/nsNSSShutDown.cpp#176> which calls a pure virtual function! <http://searchfox.org/mozilla-central/rev/4f5f9f3222f35fa4635ea96a0c963c130854ef19/security/manager/ssl/nsNSSShutDown.h#222>
Reading through the comments about this function, it seems like there may be some callers who don't adhere to the contract precisely. For example based on a cursory look, <http://searchfox.org/mozilla-central/rev/4f5f9f3222f35fa4635ea96a0c963c130854ef19/security/manager/ssl/CryptoTask.cpp#16> seems to not call ReleaseNSSReferences() when it should, but the practice seems to be common place... I couldn't really find anything concrete down this line of investigation, but perhaps David can think of something?
![]() |
||
Comment 8•8 years ago
|
||
I don't have any additional insight here, sorry.
Flags: needinfo?(n.nethercote)
![]() |
||
Comment 9•8 years ago
|
||
I'm pretty sure CryptoTask does the right thing, but it's possible that other implementations of nsNSSShutDownObject don't (it has definitely happened in the past). I'll take a closer look tomorrow.
Comment 10•8 years ago
|
||
(In reply to :Ehsan Akhgari from comment #6)
> I also took a look at these crashes in the hopes of discovering something...
>
> (In reply to David Keeler [:keeler] (use needinfo?) from comment #5)
> > I've looked at this a couple of times and hit a bit of a wall. The code
> > isn't obviously wrong and I haven't been able to reproduce this. My current
> > theory is that maybe there's a rarely-triggered bug in the hash table code
> > that leaves dangling pointers to hash table buckets? (see e.g. crashes like
> > bp-c8fecd6c-bcf5-4a4c-afef-080a32161213 ).
>
> AFAIK 0xe5e5e5e5 is jemalloc's freed memory poison value. It's a bit
> worrying that in all of these crashes only the lower 32 bits are poisoned
> (there's a 100% correlation with crash address being 0xffffffffe5e5e5e5),
> but I think the crash report above is happening because the freed memory is
> reallocated again maybe?
This is just Breakpad sign-extending the 32-bit crash address to 64-bits, FYI. If you look in the "raw dump" tab you can see the register values for the crashing frame. For the crash in comment 0 it has:
"eax": "0xe5e5e5e5", "ebp": "0xe5e5e5e5",
Flags: needinfo?(ted)
Comment 11•8 years ago
|
||
(In reply to :Ehsan Akhgari from comment #6)
> I discovered something interesting in some of these crash reports (for
> example the one in comment 0). Sometimes on another thread,
> ExceptionHandler::HandlePureVirtualCall() is on the stack! This means that
> we're crashing because of a pure virtual function call, I think.
I think this is a red herring. Frame 12 of Thread 2 in that crash is MiniDumpWriteDump, which wouldn't call back into that function. Additionally, if you look at the raw dump, that frame was found by stack scanning.
Comment 12•8 years ago
|
||
Here's the disassembly from MSVC. It's crashing at `5D3C8EF3 mov ebp,dword ptr [eax] ` because eax contains the poison pattern `e5e5e5e5`.
The locals in `PL_HashTableDestroy` according to MSVC have he = 0xe5e5e5 (which I believe is what's in eax), n = 64, and i = 2.
Comment 13•8 years ago
|
||
Thanks Ted, so we're definitely dealing with a freed hash table here.
David, is that something that can happen if my theory is true?
![]() |
||
Comment 14•8 years ago
|
||
AFAICT, the hash table in question is NSS' dynamic object id (OID) table. It gets created on the first call to SECOID_AddEntry[0] (secoid_HashDynamicOiddata, really, but that's only called from SECOID_AddEntry). It's protected by a lock in a manner that appears to be correct (it's possible that if the process has forked, acquiring the lock when destroying the table will be skipped, but I don't think that's causing this because a) we should be in the parent process anyway and b) even if the lock were skipped, I'm fairly sure all of this is happening on the main thread, so the lock is maybe irrelevant). This suggests that either the hash table implementation is buggy or the way the secoid_* implementation uses it is incorrect in some way that isn't immediately obvious. Note for reference that we only ever change the table by calling SECOID_AddEntry from RegisterDynamicOids[1] and LoadExtendedValidationInfo[2] (each of those functions should only run once per run of Firefox). The table is destroyed only once, in SECOID_Shutdown [3].
But to answer your question, if the issue is the that the hash table has already been freed or has been corrupted by buggy code, it wouldn't be a result of incorrectly implementing nsNSSShutDownObject.
[0] https://dxr.mozilla.org/mozilla-central/rev/f179934df0c1bab590c558485d419c7910e41325/security/nss/lib/util/secoid.c#1887
[1] https://dxr.mozilla.org/mozilla-central/rev/f179934df0c1bab590c558485d419c7910e41325/security/manager/ssl/nsNSSCertHelper.cpp#1779
[2] https://dxr.mozilla.org/mozilla-central/rev/f179934df0c1bab590c558485d419c7910e41325/security/certverifier/ExtendedValidation.cpp#1271
[3] https://dxr.mozilla.org/mozilla-central/rev/f179934df0c1bab590c558485d419c7910e41325/security/nss/lib/util/secoid.c#2262
Comment 15•8 years ago
|
||
Sorry, I have no idea what's going on here. :( The hashtable usage looks correct as you said, and I personally think it's extremely unlikely that we're hitting a bug in the hashtable code that only manifests here, since this code doesn't even seem to do anything fancy with the hashtable. The reliability of the garbage value we're seeing here across all crashes with this signature also makes theories such as memory corruption extremely unlikely. I'm out of ideas.
Comment 16•8 years ago
|
||
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #12)
> Created attachment 8821192 [details]
> crash disassembly for d01bf5ef-0c13-45ca-8d03-fa15d2161122
>
> Here's the disassembly from MSVC. It's crashing at `5D3C8EF3 mov
> ebp,dword ptr [eax] ` because eax contains the poison pattern `e5e5e5e5`.
>
> The locals in `PL_HashTableDestroy` according to MSVC have he = 0xe5e5e5
> (which I believe is what's in eax), n = 64, and i = 2.
My presumption from he=e5e5 would be that something removed an entry from the hashtable while we were destroying it... presumably from another thread. Even if that seems impossible.
Comment 17•8 years ago
|
||
"(it's possible that if the process has forked, acquiring the lock when destroying the table will be skipped, but I don't think that's causing this because a) we should be in the parent process anyway and b) even if the lock were skipped, I'm fairly sure all of this is happening on the main thread, so the lock is maybe irrelevant)"
The crash is happening in MainThread. In other threads I see DecodePool threads and DOM workers, though I imagine they exist in Master and child processes - but I think profile-before-change only happens in the Master, IIRC.
What is the point of the "SKIP_IF_FORKED"?
(In reply to Randell Jesup [:jesup] from comment #16)
> ...
>
> My presumption from he=e5e5 would be that something removed an entry from
> the hashtable while we were destroying it... presumably from another thread.
> Even if that seems impossible.
Or added an entry, which made the hash table grow, which meant there was some freeing going on.
Comment 19•8 years ago
|
||
Comment 14 suggests that the only accesses to this hashtable come from the main thread. :(
Comment 20•8 years ago
|
||
Still no reports of this beyond Fx50 either.
Reporter | ||
Comment 21•8 years ago
|
||
(In reply to Ryan VanderMeulen [:RyanVM] from comment #20)
> Still no reports of this beyond Fx50 either.
so far it was only common in rc builds on beta and pretty much non-existent on other pre-release builds though:
https://crash-stats.mozilla.com/search/?signature=%3DPL_HashTableDestroy%20%7C%20SECOID_Shutdown&release_channel=beta&release_channel=aurora&release_channel=nightly&date=%3E%3D2016-06-01T00%3A00%3A00.000Z&_facets=version&_facets=user_comments&_facets=uptime&_facets=install_time&_facets=platform_pretty_version&_facets=useragent_locale&_columns=date&_columns=version&_columns=platform#facet-version
Comment 22•8 years ago
|
||
We're seeing some crashes in 51.0b builds now. Still a high crasher in release. :-(
Comment 23•8 years ago
|
||
No reports on 52 yet, though.
Comment 24•8 years ago
|
||
Looks like the Fx52 RC build is hitting this :(
status-firefox54:
--- → ?
status-firefox-esr52:
--- → affected
Comment 25•8 years ago
|
||
And a single hit on 53.0b6 now :\
Updated•8 years ago
|
![]() |
||
Updated•8 years ago
|
Priority: -- → P3
Whiteboard: [psm-backlog]
Comment 26•7 years ago
|
||
Still happening at significant volume (36/week in 57 and 58) -- 1500(!) per week in all versions (mostly 56 release). High volume UAF, though on shutdown, which I believe is why it's moderate. Affects (shutdown) stability too. Is P3 still appropriate?
status-firefox56:
--- → affected
status-firefox57:
--- → affected
status-firefox58:
--- → ?
Flags: needinfo?(dkeeler)
![]() |
||
Comment 27•7 years ago
|
||
It's possible bug 1372656 made this more frequent (LoadExtendedValidationInfo is now being called off the main thread), but since that landed in 57 it wouldn't account for the increased occurrences in 56 (unless that's due to 56 having a much higher volume of users?).
Realistically the only way this is going to get "fixed" is by us never shutting down NSS, which depends on first moving to the sqlite db format (bug 783994).
Flags: needinfo?(dkeeler)
Comment 28•7 years ago
|
||
> Realistically the only way this is going to get "fixed" is by us never
> shutting down NSS, which depends on first moving to the sqlite db format
> (bug 783994).
This is software... unless there's a horribly-bad impossible-to-rescue design bug, this should be "fixable" - if we can figure out the cause. And at 1500 a week, it would seem amenable to attack by landing safety checks (even expensive ones) in Nightly, even if we have to make them Nightly-only. And it's a shutdown crash, so slower shutdown isn't a huge problem (nor are RELEASE_ASSERTs). We just have to want to resolve it enough to commit some resources to do so.
Updated•7 years ago
|
![]() |
||
Comment 29•7 years ago
|
||
I guess my point was mostly that we're already working on (and reasonably close to) fixing all bugs in the category of "something goes wrong when we try to shut NSS down" and that it doesn't make sense to divert resources to this one bug in the meantime.
Updated•7 years ago
|
Crash Signature: [@ PL_HashTableDestroy | SECOID_Shutdown] → [@ PL_HashTableDestroy | SECOID_Shutdown]
[@ arena_t::DallocSmall | static void arena_dalloc ]
Updated•7 years ago
|
Crash Signature: [@ PL_HashTableDestroy | SECOID_Shutdown]
[@ arena_t::DallocSmall | static void arena_dalloc ] → [@ PL_HashTableDestroy | SECOID_Shutdown]
[@ arena_t::DallocSmall | static void arena_dalloc ] [@ arena_dalloc | je_free | DefaultFreeEntry]
[@ arena_dalloc | DefaultFreeEntry]
Comment 32•7 years ago
|
||
This is showing only a couple of crashes in 61 and we're close to the end of the release cycle. No crashes showing up in 62 or 63. Calling this a wontfix for 61.
status-firefox61:
--- → wontfix
status-firefox62:
--- → ?
Reporter | ||
Updated•7 years ago
|
Crash Signature: [@ PL_HashTableDestroy | SECOID_Shutdown]
[@ arena_t::DallocSmall | static void arena_dalloc ] [@ arena_dalloc | je_free | DefaultFreeEntry]
[@ arena_dalloc | DefaultFreeEntry] → [@ PL_HashTableDestroy | SECOID_Shutdown]
[@ arena_t::DallocSmall | static void arena_dalloc ]
[@ arena_dalloc | je_free | DefaultFreeEntry]
[@ arena_dalloc | DefaultFreeEntry]
[@ arena_t::DallocSmall | arena_dalloc | DefaultFreeEntry ]
[@ arena_t::…
status-firefox63:
--- → affected
Updated•6 years ago
|
Comment 34•6 years ago
|
||
I'm not sure how this bug is different than bug 721710 and not a dupe. It's basically the same set of crashes.
Updated•6 years ago
|
Group: crypto-core-security
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → INACTIVE
Comment 35•6 years ago
|
||
Marking fix-optional for 64. We could still take a patch for 65, and if it's verified and doesn't seem risky, could still take fixes for 64 as well.
Updated•5 years ago
|
You need to log in
before you can comment on or make changes to this bug.
Description
•