Closed Bug 1822907 Opened 1 year ago Closed 1 year ago

Crash in [@ mozilla::LinkedListElement<T>::setPreviousUnsafe]

Categories

(Core :: DOM: Animation, defect)

Unspecified
All
defect

Tracking

()

VERIFIED FIXED
113 Branch
Tracking Status
firefox-esr102 --- unaffected
firefox111 --- disabled
firefox112 --- disabled
firefox113 + verified

People

(Reporter: RyanVM, Assigned: dholbert)

References

(Blocks 1 open bug, Regression)

Details

(Keywords: crash, regression, topcrash)

Crash Data

Attachments

(3 files)

Crash report: https://crash-stats.mozilla.org/report/index/d9087aa1-9d06-493b-97a5-7fe0d0230316

MOZ_CRASH Reason: MOZ_RELEASE_ASSERT(!listElem->isInList())

Top 10 frames of crashing thread:

0  xul.dll  mozilla::LinkedListElement<mozilla::dom::Animation>::setPreviousUnsafe  mfbt/LinkedList.h:333
0  xul.dll  mozilla::LinkedList<mozilla::dom::Animation>::insertBack  mfbt/LinkedList.h:482
0  xul.dll  mozilla::dom::AnimationTimeline::NotifyAnimationUpdated  dom/animation/AnimationTimeline.cpp:84
0  xul.dll  mozilla::dom::DocumentTimeline::NotifyAnimationUpdated  dom/animation/DocumentTimeline.cpp:157
0  xul.dll  mozilla::dom::Animation::UpdateTiming  dom/animation/Animation.cpp:1745
0  xul.dll  mozilla::dom::CSSTransition::UpdateTiming  dom/animation/CSSTransition.cpp:56
1  xul.dll  mozilla::dom::Animation::Tick  dom/animation/Animation.cpp:959
2  xul.dll  mozilla::dom::CSSTransition::Tick  dom/animation/CSSTransition.cpp:193
3  xul.dll  mozilla::dom::AnimationTimeline::Tick  dom/animation/AnimationTimeline.cpp:64
4  xul.dll  mozilla::dom::DocumentTimeline::MostRecentRefreshTimeUpdated  dom/animation/DocumentTimeline.cpp:178

One possibly-helpful comment from the crash reports?

Messing around with ontain: content and content-visibility: auto. Crash occurred after disabling content-visibility: auto in the style pane of the inspector dev tool and clicking back into the website

Maybe a regression from bug 1820058?

Flags: needinfo?(dholbert)

Looking at the revisions for the files near the top of the stack, AnimationTimeline.cpp changed in bug 1814786, which landed about a week earlier, but maybe it is related? Though that specific change looks like something intended to just fix build errors, so maybe not.

(In reply to Andrew McCreight [:mccr8] from comment #3)

Looking at the revisions for the files near the top of the stack, AnimationTimeline.cpp changed in bug 1814786, which landed about a week earlier, but maybe it is related? Though that specific change looks like something intended to just fix build errors, so maybe not.

Probably not. Bug 1814786 changed the ScrollTimeline for CSSAnimations, and the preference of ScrollTimeline is off. However, the call stack looks like this is related to CSSTransition and DocumentTimeline, so bug 1814786 shouldn't be related to this one.

However, the caller of LinkedList<mozilla::dom::Animation>::insertBack checks aAnimation.IsHiddenByContentVisibility() first. Maybe this is related to content-visibility and we forgot to check some other things.

Bug 1820058 enabled the preference, so we start to crash because of this assertion on Nightly. Perhaps this is related to https://phabricator.services.mozilla.com/D150764.

Tracking for 113 as the volume of crashes is very high for Nightly.

The bug is linked to a topcrash signature, which matches the following criterion:

  • Top 10 desktop browser crashes on nightly

:boris, could you consider increasing the severity of this top-crash bug?

For more information, please visit auto_nag documentation.

Flags: needinfo?(boris.chiou)
Keywords: topcrash

The bug is marked as tracked for firefox113 (nightly). However, the bug still isn't assigned.

:fgriffith, could you please find an assignee for this tracked bug? If you disagree with the tracking decision, please talk with the release managers.

For more information, please visit auto_nag documentation.

Flags: needinfo?(fgriffith)

Daniel, I know made some tweaks to content-visibility over the past week. Can you take a look at this to see if there is a connection to that work?

Flags: needinfo?(fgriffith)

(In reply to Pascal Chevrel:pascalc from comment #7)

Tracking for 113 as the volume of crashes is very high for Nightly.

Yeah, let's call this S2 severity. We should definitely address this to avoid crashing our Nightly users, but we don't need to track for the actual release version 113, since this seems to be associated with a feature that's guarded to be Nightly-only (not riding the trains) for now, via Bug 1820058.

pascalc I think this means we can mark this tracking:- for 113, right? (I don't seem to be able to set that value, but perhaps you can.)

Severity: -- → S2
Flags: needinfo?(pascalc)
Flags: needinfo?(dholbert)
Flags: needinfo?(boris.chiou)

Let's keep the tracking flag, we ship 113 to real users on the nightly channel and we can set 113 to disabled when we reach beta and can confirm we no longer crash.

Flags: needinfo?(pascalc)
Keywords: regression
Regressed by: 1820058

I found a URL that reproduces the issue reliably for me (not sharing publicly for now). I successfully recorded a trace with rr & submitted to pernosco; once it's processed, we can investigate further. I also confirmed that the issue goes away (for me, with that URL) if I disable the content-visibility pref in about:config.

(Also notable: lots of the crash report have hulu.com URLs; I don't have a login there, so I wasn't able to test that site locally, but it's a fairly prominent video platform; definitely concerning if we're triggering a crash on pages there.)

So in my pernosco recording, it seems like we're inserting the same animation (actually a CSSTransition instance) into the same list, via mAnimationOrder.insertBack(...). First here:

https://searchfox.org/mozilla-central/rev/137075514eddc08c68ff652e9899da82e8043574/dom/animation/AnimationTimeline.cpp#97-102

void AnimationTimeline::NotifyAnimationContentVisibilityChanged(
    Animation* aAnimation, bool visible) {
  bool inList =
      static_cast<LinkedListElement<Animation>*>(aAnimation)->isInList();
  if (visible && !inList) {
    mAnimationOrder.insertBack(aAnimation);

...and then here:

https://searchfox.org/mozilla-central/rev/137075514eddc08c68ff652e9899da82e8043574/dom/animation/AnimationTimeline.cpp#78-79,83-84

void AnimationTimeline::NotifyAnimationUpdated(Animation& aAnimation) {
  if (mAnimations.EnsureInserted(&aAnimation)) {
...
    if (!aAnimation.IsHiddenByContentVisibility()) {
      mAnimationOrder.insertBack(&aAnimation);

So we're not quite managing that list properly. Notably the first snippet there (in NotifyAnimationContentVisibilityChanged) has a inList check to avoid redundant insertion, but the second snippet (in NotifyAnimationUpdated) does not. Presumably we could just add the same check, to avoid this issue, but I'm not sure yet whether the lack-of-that-check here was just an oversight vs. an unstated invariant that we're expecting to maintain.

Aha, so indeed we're not supposed to need to check whether the animation is in the list in that second instance. If the mAnimations.EnsureInserted() call succeeds, then I think we assume that aAnimation will not be in the list, since the list is a strict subset of the mAnimations hash-set.

In this case, the problem is that our first insertion in NotifyAnimationContentVisibilityChanged inserted into the mAnimationOrder list, at a time when we were not present in the mAnimations hash-set. So we violated that invariant of keeping the list as being a strict subset of the hash-set.

I think the point of NotifyAnimationContentVisibilityChanged is just to maintain the invariant described at
https://searchfox.org/mozilla-central/rev/137075514eddc08c68ff652e9899da82e8043574/dom/animation/AnimationTimeline.h#123-133
(" Animations that are hidden by content-visibility are not sampled and will only be in the hashset.") [i.e. they'll be excluded from the list]

This seems to be a case where an animation is becoming un-hidden so we're proactively inserting it into the list; but in fact it is not in the hash-set, so we shouldn't be inserting it into the list. So: we could just add a check for whether we're in the hash-set before we do that insertion.

(Having said that, maybe we have an unstated-but-legitimate expectation that we're in the hash-set if we manage to get here... not sure. We do arrive here via the animation calling GetTimeline()->NotifyAnimationContentVisibilityChanged(this, !hidden) on itself. I don't know whether we expect to guarantee that this GetTimeline() is keeping track of the animation in that hash-set.)

(In reply to Daniel Holbert [:dholbert] from comment #19)

(Having said that, maybe we have an unstated-but-legitimate expectation that we're in the hash-set if we manage to get here... not sure. We do arrive here via the animation calling GetTimeline()->NotifyAnimationContentVisibilityChanged(this, !hidden) on itself. I don't know whether we expect to guarantee that this GetTimeline() is keeping track of the animation in that hash-set.)

It looks like this^ is fine. Our aAnimation (really a CSSTransition) was in the timeline's hash-set, but it was removed due to reaching AnimationPlayState::Finished, which caused it to return false from NeedsTicks(), which gets us to DocumentTimeline::RemoveAnimation().

After that point, we recreate frames for the frame-subtree that this transition lives inside of, for some reason (maybe due to content-visibility changing on an ancestor?); and inside of InitPrimaryFrame, we call UpdateAnimationVisibility which notices that we have a content-visibility:hidden ancestor, and that ends up triggering a call to NotifyAnimationContentVisibilityChanged for our animation, for now with visible==false. Then content-visibility:hidden gets removed from our ancestor, so we get another call to NotifyAnimationContentVisibilityChanged, now with visible==true, and we end up inserting into mAnimationOrder via the first chunk in comment 17. BUT this is unnecessary & ~bad, because our animation is not tracked by mAnimations since it's in a "Finished" state. So we end up violating the invariant of having mAnimationOrder be a subset of mAnimations, and we abort later when we traverse mAnimationOrder in AnimationTimeline::Tick() and call Tick() on the animation, which makes us insert it into mAnimations and then discover that it's already present in the linked list.

So: the tl;dr is that I'm pretty sure NotifyAnimationContentVisibilityChanged just needs to check whether the animation is present in the hash set before inserting it into the list. (And if it's not in the hash set, that's fine; it just means the animation could be in a "finished" state or has otherwise removed itself from the timeline, and we don't need to worry about it.)

Here's a reduced testcase that triggers the crash for me.

Assignee: nobody → dholbert
Status: NEW → ASSIGNED

Specifically:

  • We had an obsolete reference to an array (which is now a LinkedList).
  • We had an assertion which could benefit from a message for clarity.
  • We had an arg whose name was missing the standard "a" prefix.

We have an invariant that the mAnimationOrder LinkedList is a subset of the
mAnimations hashset (omitting any animations that are hidden due to
content-visibility). This patch corrects one case where we were incorrectly
inserting an animation into the linked list when it wasn't present in the
hashset (because the animation had completed).

This patch also adds some documentation to mention this invariant, and some
assertions to enforce it in several places.

Depends on D173332

Try run with the patches:
https://treeherder.mozilla.org/jobs?repo=try&revision=a69d86dc87958c7a29f123b9f09ea1b2d8e6fbe7

Try run with a targeted revert of the fix which hopefully should trigger a crash (or trip one of my new fatal assertions) during the included WPT crashtest:
https://treeherder.mozilla.org/jobs?repo=try&revision=5c214e676ad220d8d64eff1d62588caeb4fdac63

(In reply to Boris Chiou [:boris] from comment #6)

Perhaps this is related to https://phabricator.services.mozilla.com/D150764.

This^ was correct, BTW. This was effectively just a bug in the AnimationTimeline::NotifyAnimationContentVisibilityChanged implementation that was added in that patch.

Confirmed with mozregression (using my attached testcase), for good measure -- the attached testcase starts crashing with this regression range (with the content-visibility pref manually enabled):
https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=23fe4f24d483d144c31dbaad09b6d9b51969672d&tochange=ce0a678336fb78e569dd73ef5b522253e982a10c

So: this was a regression from bug 1777478, though it was only a problem if you enable the content-visiblity pref (recently enabled-by-default on Nightly as noted above).

Regressed by: 1777478

So: builds since 104 aren't quite "unaffected"; they had the crash-inducing code. Setting status to "disabled" for those releases (since the content-visibility pref defaulted to false).

Pre-104 builds are indeed "unaffected" though.

Pushed by dholbert@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/88807aeb7ddc
part 1: Make some documentation and naming cleanups related to content-visibility and AnimationTimeline. r=hiro
https://hg.mozilla.org/integration/autoland/rev/74d3d5d35e77
part 2: When handling a content-visibility change, don't insert already-completed animations into the timeline's sampling-order list. r=hiro
Created web-platform-tests PR https://github.com/web-platform-tests/wpt/pull/39156 for changes under testing/web-platform/tests
Status: ASSIGNED → RESOLVED
Closed: 1 year ago
Resolution: --- → FIXED
Target Milestone: --- → 113 Branch
Upstream PR merged by moz-wptsync-bot

No crashes in today's Nightly. Thanks!

Status: RESOLVED → VERIFIED
Flags: in-testsuite+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: