Crash in mozilla::recordreplay::PLDHashTableClearEntry

RESOLVED FIXED in Firefox 65

Status

()

defect
--
critical
RESOLVED FIXED
11 months ago
9 months ago

People

(Reporter: marcia, Assigned: bhackett)

Tracking

({crash, regression})

Trunk
mozilla65
Unspecified
macOS
Points:
---

Firefox Tracking Flags

(firefox-esr60 unaffected, firefox62 unaffected, firefox63 wontfix, firefox64 wontfix, firefox65 fixed)

Details

(crash signature)

Attachments

(2 attachments)

This bug was filed from the Socorro interface and is
report bp-772c4a66-d777-4ce2-9086-f80060180903.
=============================================================

Seen while looking at Mac specific crash stats - single crash which started in 20180902220539. 

Possible regression range based on build id: https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=6c83f735355d19458caa7ff34069b5676c062228&tochange=634b562ae2c3af1c07e12cca895796cc501f343d

Looks as if some stuff web replay landed in that timeframe. ni on :bhackett

Top 10 frames of crashing thread:

0 XUL mozilla::recordreplay::PLDHashTableClearEntry toolkit/recordreplay/HashTable.cpp:179
1 XUL nsHashPropertyBag::~nsHashPropertyBag xpcom/ds/PLDHashTable.cpp:335
2 XUL nsBaseChannel::~nsBaseChannel netwerk/base/nsBaseChannel.cpp:75
3 XUL <name omitted> netwerk/base/nsInputStreamChannel.h:28
4 XUL <name omitted> xpcom/ds/nsHashPropertyBag.cpp:250
5 XUL mozilla::SegmentedVector<nsCOMPtr<nsISupports>, 4096ul, mozilla::MallocAllocPolicy>::PopLastN xpcom/base/nsCOMPtr.h:313
6 XUL mozilla::dom::DeferredFinalizerImpl<nsISupports>::DeferredFinalize dom/bindings/BindingUtils.h:2937
7 XUL mozilla::IncrementalFinalizeRunnable::ReleaseNow xpcom/base/CycleCollectedJSRuntime.cpp:1398
8 XUL mozilla::CycleCollectedJSRuntime::OnGC xpcom/base/CycleCollectedJSRuntime.cpp:1540
9 XUL js::gc::GCRuntime::maybeCallGCCallback js/src/gc/GC.cpp:1794

=============================================================
Flags: needinfo?(bhackett1024)
Web replay is experimental, mac and nightly only, marking as wontfix for 63.
It's hard to be sure what's going on here, but one definite problem is that we're not using release mode assertions in the hashtable code (like we do elsewhere in toolkit/recordreplay), which will make it harder to diagnose problems here.
Assignee: nobody → bhackett1024
Flags: needinfo?(bhackett1024)
Attachment #9018819 - Flags: review?(continuation)
Attachment #9018819 - Flags: review?(continuation) → review+
Pushed by bhackett@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/16d26c24fa86
Use release mode assertions in HashTable, r=mccr8.
https://hg.mozilla.org/mozilla-central/rev/16d26c24fa86
Status: NEW → RESOLVED
Closed: 9 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla65
After looking more closely at these crashes and the associated code, I think the problem is a double free of the StableHashTableInfo associated with PLDHashTables.  When destroying this info we do some checks to see that the ops appear valid (this is done so we handle cases where the table was statically allocated and don't have record/replay information).  These checks have the effect that we shouldn't (though we might) double free the underlying allocated data, but it is possible that the second free could destroy the info for a random other PLDHashTable which was allocated in the same place.  When that other table is used in the future, its code stubs could still exist and invoke callbacks operating on a now deleted StableHashTableInfo and crash (as in https://crash-stats.mozilla.com/report/index/c81f9525-9fe9-4b14-bd54-f532b0180929) or they may have been overwritten with code stubs for a second random other PLDHashtable (as in https://crash-stats.mozilla.com/report/index/2648d6f4-2b93-41d0-a317-df8c10180904 and https://crash-stats.mozilla.com/report/index/ed1ab729-709f-4811-a29b-e4b320180929).
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
The attached patch makes a couple changes to the HashTable code:

- The pretty goofy mechanism for checking that a StableHashTableInfo pointer is valid based on a magic number in one of its fields has been replaced with a set that keeps track of all the valid StableHashTableInfo pointers.

- Instead of deleting StableHashTableInfos, they are marked as destroyed, and we check whenever using or trying to destroy them that we are not operating on a table that has been destroyed.  This shouldn't leak much memory (the StableHashTableInfo should be empty when we try to destroy it) and is intended to be temporary until we figure out what is causing the problem here.
Attachment #9020876 - Flags: review?(nfroyd)
Comment on attachment 9020876 [details] [diff] [review]
make sure there are no UAF or double frees in HashTable code

Review of attachment 9020876 [details] [diff] [review]:
-----------------------------------------------------------------

This is a little hackish, but for fixing crashes, sure, we can do this.

::: toolkit/recordreplay/HashTable.cpp
@@ +116,5 @@
>    ~StableHashTableInfo() {
>      MOZ_RELEASE_ASSERT(mHashToKey.empty());
>      DeallocateMemory(mCallbackStorage, CallbackStorageCapacity, MemoryKind::Tracked);
> +
> +    UnmarkValid();

AFAICT, since we're not deleting the hash tables anymore, this code is never called, and therefore all tables are valid.  Is that intentional?  Or are we just preparing for the day when we add the `delete` calls back in?
Attachment #9020876 - Flags: review?(nfroyd) → review+
(In reply to Nathan Froyd [:froydnj] from comment #7)
> Comment on attachment 9020876 [details] [diff] [review]
> make sure there are no UAF or double frees in HashTable code
> 
> Review of attachment 9020876 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> This is a little hackish, but for fixing crashes, sure, we can do this.
> 
> ::: toolkit/recordreplay/HashTable.cpp
> @@ +116,5 @@
> >    ~StableHashTableInfo() {
> >      MOZ_RELEASE_ASSERT(mHashToKey.empty());
> >      DeallocateMemory(mCallbackStorage, CallbackStorageCapacity, MemoryKind::Tracked);
> > +
> > +    UnmarkValid();
> 
> AFAICT, since we're not deleting the hash tables anymore, this code is never
> called, and therefore all tables are valid.  Is that intentional?  Or are we
> just preparing for the day when we add the `delete` calls back in?

Yeah, I was trying here to keep the logic consistent in anticipation of when we are able to delete these structures again.
Pushed by bhackett@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/3f6e29ae6529
Make sure there are no UAF or double frees in HashTable code, r=froydnj.
https://hg.mozilla.org/mozilla-central/rev/3f6e29ae6529
Status: REOPENED → RESOLVED
Closed: 9 months ago9 months ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.