Closed Bug 1378219 Opened 2 years ago Closed 2 years ago

Crash in nsTHashtable<T>::Contains | mozilla::RestyleManager::ProcessRestyledFrames

Categories

(Core :: Layout, defect, P1, critical)

55 Branch
All
Windows
defect

Tracking

()

RESOLVED FIXED
mozilla56
Tracking Status
firefox-esr52 --- unaffected
firefox54 --- unaffected
firefox55 - fixed
firefox56 --- fixed

People

(Reporter: philipp, Assigned: jfkthame)

References

Details

(Keywords: crash, regression)

Crash Data

Attachments

(1 file, 3 obsolete files)

This bug was filed from the Socorro interface and is 
report bp-bb471a03-b3e1-4f26-adad-0c4200170704.
=============================================================
Crashing Thread (0)
Frame 	Module 	Signature 	Source
0 	xul.dll 	nsTHashtable<detail::VoidPtrHashKey>::Contains(void const*) 	obj-firefox/dist/include/nsTHashtable.h:144
1 	xul.dll 	mozilla::RestyleManager::ProcessRestyledFrames(nsStyleChangeList&) 	layout/base/RestyleManager.cpp:1469
2 	xul.dll 	mozilla::GeckoRestyleManager::ComputeAndProcessStyleChange(nsIFrame*, nsChangeHint, mozilla::RestyleTracker&, nsRestyleHint, mozilla::RestyleHintData const&) 	layout/base/GeckoRestyleManager.cpp:3473
3 	xul.dll 	PLDHashTable::SearchTable<0>(void const*, unsigned int) 	xpcom/ds/PLDHashTable.cpp:366
4 	xul.dll 	mozilla::GeckoRestyleManager::RestyleElement(mozilla::dom::Element*, nsIFrame*, nsChangeHint, mozilla::RestyleTracker&, nsRestyleHint, mozilla::RestyleHintData const&) 	layout/base/GeckoRestyleManager.cpp:149
5 	xul.dll 	mozilla::RestyleTracker::ProcessOneRestyle(mozilla::dom::Element*, nsRestyleHint, nsChangeHint, mozilla::RestyleHintData const&) 	layout/base/RestyleTracker.cpp:95
6 	xul.dll 	nsRuleNode::WalkRuleTree(nsStyleStructID, nsStyleContext*) 	layout/style/nsRuleNode.cpp:2606
7 	xul.dll 	nsRuleNode::WalkRuleTree(nsStyleStructID, nsStyleContext*) 	layout/style/nsRuleNode.cpp:2518

reports with this signature are currently low level but regressing with firefox 55 & look related to bug 1365982.
Flags: needinfo?(jfkthame)
It looks to me like the mDestroyedFrames pointer in RestyleManager must be unexpectedly null here. AFAICS the only way that could happen is if ProcessRestyledFrames gets called recursively (so the inner call sets up a new mDestroyedFrames, and then clears it on exit, wiping it out for the outer call).

I didn't think ProcessRestyledFrames was used recursively (hence the sanity-check assertion on entry, at https://hg.mozilla.org/releases/mozilla-beta/annotate/bfd96be70455/layout/base/RestyleManager.cpp#l1341), but perhaps this isn't guaranteed to be true? If that's the case, we'd be liable to hit a crash like this, and need to use an alternate strategy; e.g. we could save and restore the mDestroyedFrames pointer in ProcessRestyledFrames, instead of assuming it is null on entry and clearing it on exit.
Flags: needinfo?(jfkthame)
If the issue is that there are cases where we end up recursively calling ProcessRestyledFrames, this should help.
Attachment #8884786 - Flags: review?(mats)
Assignee: nobody → jfkthame
Status: NEW → ASSIGNED
Priority: -- → P1
Comment on attachment 8884786 [details] [diff] [review]
Save and restore mDestroyedFrames pointer in ProcessRestyledFrames, rather than assuming it is initially null

Hmm, I see :mats is on vacation; dholbert, could you take the review here?
Attachment #8884786 - Flags: review?(mats) → review?(dholbert)
(In reply to Jonathan Kew (:jfkthame) from comment #1)
> It looks to me like the mDestroyedFrames pointer in RestyleManager must be
> unexpectedly null here. AFAICS the only way that could happen is if
> ProcessRestyledFrames gets called recursively (so the inner call sets up a
> new mDestroyedFrames, and then clears it on exit, wiping it out for the
> outer call).

Agreed.

> I didn't think ProcessRestyledFrames was used recursively (hence the
> sanity-check assertion on entry

Agreed -- it looks like this reentrancy is unexpected.  The other assertion at the beginning of this function (about having a script-blocker) means we shouldn't be reentrant via triggering arbitrary script, too, so that can't be the reason...

Superficially, the current patch seems like it *would* likely prevent this particular crash, but I'm not sure it'll actually be an improvement.  If the reentrancy is indeed unexpected & maybe potentially-dangerous, then this patch might make us proceed to crash somewhere else (perhaps in a more-exploitable way).

Two alternate solutions that come to mind which might be safer, in light of possible badness from reentrancy:
 (1) Maybe we should upgrade the current MOZ_ASSERT(!mDestroyedFrames) into a runtime assertion,  and watch for crash reports with that in the signature, to figure out how this even happens?  All of those crash reports would represent users that were potentially about to crash anyway... (via dereffing mDestroyedFrames in the outer call after it's been cleared by some unexpected inner call)
...or:
 (2) Maybe we should upgrade the current MOZ_ASSERT(!mDestroyedFrames) into an "if (!mDestroyedFrames) { MOZ_ASSERT_UNRACHABLE(...); return; }" so that opt builds will just skip the reentrant call? [not sure if that's better or worse]

I'm not confident enough in the expectations around ProcessRestyledFrames to know how best to proceed -- I'd probably defer to dbaron if he has cycles to think about this & has a stronger feeling about this, or alternately I could increase my confidence by doing a bit of code-reading/archeology, as-needed.
Flags: needinfo?(jfkthame)
:dbaron, do you have time to look at this and offer us any insight (see comment 4)?
Flags: needinfo?(jfkthame) → needinfo?(dbaron)
I vaguely recall (maybe from bug 960465 or bug 1115812) stacks in which ProcessRestyledFrames is called recursively, but I don't really remember what they are.  It shouldn't be hard to figure out by adding an assertion, assuming we hit them in our test suite (which I suspect we do).  Perhaps XBL triggered from frame construction was involved?

It would also be nice to use an RAII class like mozilla::AutoRestore rather than restoring manually.
Flags: needinfo?(dbaron)
We can't be hitting this in our test suite, AFAICT, because in that case the MOZ_ASSERT(!mDestroyedFrames) at the top of the method would have fired.
AFAICT, we can't use the actual mozilla::AutoRestore with a UniquePtr (because it wants to copy the targeted variable), so here I've provided a local version that uses Move(); that's OK in this case because we're immediately going to assign a new mDestroyedFrames after setting up the AutoRestore, so it doesn't matter that the original value has been moved out.
Attachment #8885815 - Flags: review?(dbaron)
Attachment #8884786 - Attachment is obsolete: true
Attachment #8884786 - Flags: review?(dholbert)
Comment on attachment 8885815 [details] [diff] [review]
Save and restore mDestroyedFrames pointer in ProcessRestyledFrames, rather than assuming it is initially null

Review of attachment 8885815 [details] [diff] [review]:
-----------------------------------------------------------------

::: layout/base/RestyleManager.cpp
@@ -1336,5 @@
>  RestyleManager::ProcessRestyledFrames(nsStyleChangeList& aChangeList)
>  {
>    NS_ASSERTION(!nsContentUtils::IsSafeToRunScript(),
>                 "Someone forgot a script blocker");
> -  MOZ_ASSERT(!mDestroyedFrames);

Rather than getting rid of this reentrancy assertion (the one you mentioned in comment 7), could we upgrade it to MOZ_DIAGNOSTIC_ASSERT (which is fatal in Nightly/DevEdition builds, and otherwise debug-only)?  That'll hopefully get us some crash reports from prerelease users [without harming release-build users], so we can look at the backtrace & figure out how this happens & whether it's bad & whether we can avoid it.

@@ +1367,5 @@
> +      mLocation = Move(mValue);
> +    }
> +  };
> +
> +  AutoRestoreDF restore(mDestroyedFrames);

I'm not sure the current patch's strategy is safe (temporarily-replacing mDestroyedFrames with a brand-new hash table) -- because, maybe we'll mistakenly think a frame isn't destroyed (even though it's in our replaced mDestroyedFrames map!) and do something bad with it, right?  [Let me know if this fear is unfounded -- but for now, it seems worth worrying about...]

Assuming this fear is worth worrying about -- it seems like it'd be better to do a pseudo-refcounting thing, maybe by adding a "size_t mDestroyedFramesReentrantUsages" member-var which gets incremented at the beginning of this function and decremented at the end (with an AutoRestore).  And we'd only allocate a new mDestroyedFrames if that variable is 0 at the start of the function, and we'd only destroy it if that variable is 0 at the end of the function.  Or something like that... What do you think?

(Or instead of a size_t, you could wrap mDestroyedFrames in an actually-refcounted helper struct, but I can't immediately think of a way to make that work elegantly -- if mDestroyedFrames becomes a RefPtr with local RAII RefPtr<> instances on the stack, that wouldn't quite do it, because even when the last of those goes out of scope, we'd still need to know whether that was the outermost level of recursion & whether we should release & null out mDestroyedFrames's own RefPtr.  Maybe we could keep mDestroyedFrames a raw pointer and do "NS_ADDREF(mDestroyedFrames); ... NS_RELEASE2(mDestroyedFrames);" at the start/end of the function, and rely on the fact that NS_RELEASE2 only nulls when the reference count hits 0 -- but that's not very RAII and feels unsatisfying.  There's perhaps a good way of doing this that I'm not thinking of, though.)
Comment on attachment 8885815 [details] [diff] [review]
Save and restore mDestroyedFrames pointer in ProcessRestyledFrames, rather than assuming it is initially null

r=dbaron, although please file a followup on getting a Move() version of AutoRestore into AutoRestore.h?
Attachment #8885815 - Flags: review?(dbaron) → review+
Though I missed comment 9; probably that approach is better?  I just reviewed this without looking into what mDestroyedFrames is...
Flags: needinfo?(jfkthame)
Comment on attachment 8885815 [details] [diff] [review]
Save and restore mDestroyedFrames pointer in ProcessRestyledFrames, rather than assuming it is initially null

Oh, right, it's the thing from bug 1365982.  Yeah, it's not clear that this is safe.  dholbert's approach is probably better.
Attachment #8885815 - Flags: review+ → review-
OK, it probably makes more sense to use the same mDeletedFrames table in any recursive calls, so that we continue to accumulate the list until the outermost call completes. We can do that without needing to do actual refcounting by using an RAII-style helper that will reset mDeletedFrames on exit only if it was initially null, and otherwise leave it untouched.
Flags: needinfo?(jfkthame)
I've also made the assertion into a MOZ_DIAGNOSTIC_ASSERT, to try and get some more understanding of what's actually going on; but we'll probably want to remove it if/when we've seen some stacks and figured out if they're legitimate.
Attachment #8886131 - Flags: review?(dholbert)
Attachment #8885815 - Attachment is obsolete: true
Comment on attachment 8886131 [details] [diff] [review]
Don't clear and recreate mDestroyedFrames hashtable in recursive calls to ProcessRestyledFrames, just continue to use the table from the outer scope

Review of attachment 8886131 [details] [diff] [review]:
-----------------------------------------------------------------

Nice! I knew I was overthinking something with my reference counting scheme.

One suggestion, though, to make this easier to follow / reason about:

::: layout/base/RestyleManager.cpp
@@ +1357,5 @@
> +  typedef decltype(mDestroyedFrames) DestroyedFramesT;
> +  class MOZ_RAII MaybeClearDestroyedFrames
> +  {
> +  private:
> +    DestroyedFramesT* mTargetPtr;

"DestroyedFramesT* const"

(for extra confidence that we're still operating on the same value in the constructor & in the destructor)

Though: this is hard to reason about, with all the levels of indirection (an outer pointer, which may or may not be null, pointing to a UniquePtr, which itself contains a pointer, which may or may not be null -- and there's an inverse relationship between their nullness, sorta.  And it's hard to remember whether mTargetPtr here is the pointer to the hashmap vs. the pointer to mDestroyedFrames.)

I think this would be easier to follow if you spent an additional bool (which is basically free since we're not expecting to be recursive) and represented this like so:
  DestroyedFramesT& mDestroyedFramesRef; // Reference to caller's mDestroyedFrames
  const bool mResetOnDestruction;

And then the constructor would have an init list like this:
  : mDestroyedFramesRef(aTarget),
    mResetOnDestruction(!aTarget) // reset only if target starts out null

What do you think?
(In reply to Jonathan Kew (:jfkthame) from comment #14)
> I've also made the assertion into a MOZ_DIAGNOSTIC_ASSERT, to try and get
> some more understanding of what's actually going on; but we'll probably want
> to remove it if/when we've seen some stacks and figured out if they're
> legitimate.

Thanks! Agreed -- I think we'll want to remove *either* the diagnostic assert *or* this recursion-friendly RAII struct, once we understand what's causing the recursion and have figured out whether/how we can prevent it.
Comment on attachment 8886131 [details] [diff] [review]
Don't clear and recreate mDestroyedFrames hashtable in recursive calls to ProcessRestyledFrames, just continue to use the table from the outer scope

Review of attachment 8886131 [details] [diff] [review]:
-----------------------------------------------------------------

I'd feel slightly better about the suggested "bool" version for readability, but I'll call the current strategy r=me (because it's not *too* confusing :)) in case you'd rather just stick with this way.

(But if you are interested to switch to using a bool helper, it'd probably merit one more look-over on the updated patch before landing.)
Attachment #8886131 - Flags: review?(dholbert) → review+
(In reply to Daniel Holbert [:dholbert] from comment #15)
> Comment on attachment 8886131 [details] [diff] [review]
> Don't clear and recreate mDestroyedFrames hashtable in recursive calls to
> ProcessRestyledFrames, just continue to use the table from the outer scope
> 
> Review of attachment 8886131 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Nice! I knew I was overthinking something with my reference counting scheme.
> 
> One suggestion, though, to make this easier to follow / reason about:
> 
> ::: layout/base/RestyleManager.cpp
> @@ +1357,5 @@
> > +  typedef decltype(mDestroyedFrames) DestroyedFramesT;
> > +  class MOZ_RAII MaybeClearDestroyedFrames
> > +  {
> > +  private:
> > +    DestroyedFramesT* mTargetPtr;
> 
> "DestroyedFramesT* const"
> 
> (for extra confidence that we're still operating on the same value in the
> constructor & in the destructor)
> 
> Though: this is hard to reason about, with all the levels of indirection (an
> outer pointer, which may or may not be null, pointing to a UniquePtr, which
> itself contains a pointer, which may or may not be null -- and there's an
> inverse relationship between their nullness, sorta.  And it's hard to
> remember whether mTargetPtr here is the pointer to the hashmap vs. the
> pointer to mDestroyedFrames.)
> 
> I think this would be easier to follow if you spent an additional bool
> (which is basically free since we're not expecting to be recursive) and
> represented this like so:
>   DestroyedFramesT& mDestroyedFramesRef; // Reference to caller's
> mDestroyedFrames
>   const bool mResetOnDestruction;
> 
> And then the constructor would have an init list like this:
>   : mDestroyedFramesRef(aTarget),
>     mResetOnDestruction(!aTarget) // reset only if target starts out null
> 
> What do you think?

Funnily enough, I wrote it that way before deciding the bool was redundant, and updating to this version! But I'm happy to revert to that approach for readability. Will post a fresh patch shortly.
Attachment #8886131 - Attachment is obsolete: true
Comment on attachment 8886278 [details] [diff] [review]
Don't clear and recreate mDestroyedFrames hashtable in recursive calls to ProcessRestyledFrames, just continue to use the table from the outer scope

Review of attachment 8886278 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks! r=me
Attachment #8886278 - Flags: review?(dholbert) → review+
[Tracking Requested - why for this release]: In tracking of uplift request and approval.
Jonathan, as we are approaching merge day(8/2), is there anything left to do in this bug before landing?
Flags: needinfo?(jfkthame)
No, I think we're ready to land this. We may want to do something further after we see what the diagnostic assertion comes up with (see comment 16), but we should handle that as a followup bug.
Flags: needinfo?(jfkthame)
https://hg.mozilla.org/integration/mozilla-inbound/rev/86c771ec0f6c9de8659ffad5abf1517d0a67629e
Bug 1378219 - Don't clear and recreate mDestroyedFrames hashtable in recursive calls to ProcessRestyledFrames, just continue to use the table from the outer scope. r=dholbert
Low volume, I don't think I need to track this.
https://hg.mozilla.org/mozilla-central/rev/86c771ec0f6c
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla56
Should we consider this for backport or let it ride the 56 train?
Flags: needinfo?(jfkthame)
I think it would make sense to backport to 55, and avoid shipping the potential crash here to release-channel users. The patch shouldn't change behavior in any way *except* in the case where we're about to crash, so the risk of regressing something else is minimal.
Flags: needinfo?(jfkthame)
Comment on attachment 8886278 [details] [diff] [review]
Don't clear and recreate mDestroyedFrames hashtable in recursive calls to ProcessRestyledFrames, just continue to use the table from the outer scope

Approval Request Comment
[Feature/Bug causing the regression]: 1365982
[User impact if declined]: possible null-deref crash, as seen in crashstats
[Is this code covered by automated tests?]: normal use of the code is exercised in reftests etc, but we do not currently have a testcase that reproduces the crash
[Has the fix been verified in Nightly?]: landed in Nightly, but in the absence of STR, we can only "verify" by checking crashstats
[Needs manual test from QE? If yes, steps to reproduce]: no known STR
[List of other uplifts needed for the feature/fix]: none
[Is the change risky?]: no
[Why is the change risky/not risky?]: changes behavior only in the case where we would otherwise be about to crash
[String changes made/needed]: none
Attachment #8886278 - Flags: approval-mozilla-beta?
Comment on attachment 8886278 [details] [diff] [review]
Don't clear and recreate mDestroyedFrames hashtable in recursive calls to ProcessRestyledFrames, just continue to use the table from the outer scope

avoid crash + add diagnostic assert to try and figure out what's going on, beta55+

Should be in 55.0b11
Attachment #8886278 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
You need to log in before you can comment on or make changes to this bug.