Open Bug 1703040 Opened 4 years ago Updated 1 month ago

MOZ_ALWAYS_TRUE(isNodeContainedInRange) diagnostic assertion crash in [@ mozilla::ContentSubtreeIterator::InitWithRange]

Categories

(Core :: DOM: Selection, defect, P3)

Firefox 89
defect

Tracking

()

Tracking Status
firefox-esr78 --- wontfix
firefox88 --- wontfix
firefox89 --- wontfix
firefox109 --- wontfix

People

(Reporter: woyece, Unassigned)

References

Details

(Keywords: assertion, Whiteboard: [not-a-fission-bug])

Crash Data

Attachments

(2 files)

Maybe Fission related. (DOMFissionEnabled=1)

Crash report: crash-stats .mozilla .org/ report/ index/ d14f3daf-1067-4b90-8e88-669a60210330

MOZ_CRASH Reason: MOZ_DIAGNOSTIC_ASSERT(false) (isNodeContainedInRange)

Top 10 frames of crashing thread:

0 libxul.so mozilla::ContentSubtreeIterator::InitWithRange dom/base/ContentIterator.cpp:932
1 libxul.so mozilla::ContentSubtreeIterator::Init dom/base/ContentIterator.cpp:776
2 libxul.so mozilla::dom::Selection::Clear dom/base/Selection.cpp:1109
3 libxul.so mozilla::dom::Selection::RemoveAllRanges dom/base/Selection.cpp:1865
4 libxul.so nsFocusManager::MoveCaretToFocus dom/base/nsFocusManager.cpp:2975
5 libxul.so nsFocusManager::UpdateCaret dom/base/nsFocusManager.cpp:2945
6 libxul.so nsFocusManager::Focus dom/base/nsFocusManager.cpp:2601
7 libxul.so nsFocusManager::SetFocusInner dom/base/nsFocusManager.cpp:1718
8 libxul.so mozilla::dom::Element::Focus dom/base/Element.cpp:462
9 libxul.so mozilla::dom::HTMLElement_Binding::focus dom/bindings/HTMLElementBinding.cpp:9239

Mirko, you added this MOZ_ALWAYS_TRUE(isNodeContainedInRange) diagnostic assertion in copy&paste bug 1649121 six months ago. What conditions might cause isNodeContainedInRange to be null?

Status: UNCONFIRMED → NEW
Component: General → DOM: Selection
Ever confirmed: true
Flags: needinfo?(mbrodesser)
Keywords: assertion
Product: Firefox → Core
See Also: → 1649121
Summary: Crash in [@ mozilla::ContentSubtreeIterator::InitWithRange] → MOZ_ALWAYS_TRUE(isNodeContainedInRange) diagnostic assertion crash in [@ mozilla::ContentSubtreeIterator::InitWithRange]
Whiteboard: [not-a-fission-bug]

(In reply to Chris Peterson [:cpeterson] from comment #1)

Mirko, you added this MOZ_ALWAYS_TRUE(isNodeContainedInRange) diagnostic assertion in copy&paste bug 1649121 six months ago.

That patch removed only redundant code, an equivalent assertion existed before.

What conditions might cause isNodeContainedInRange to be null?

Checking the implementation:

When https://searchfox.org/mozilla-central/rev/d7e344e956d9da2808ea33e1fe0f963ed10c142d/dom/base/RangeUtils.cpp#99,107 returns Nothing().
This happens if one of the lines of https://searchfox.org/mozilla-central/rev/d7e344e956d9da2808ea33e1fe0f963ed10c142d/dom/base/RangeUtils.cpp#121,130,179,191 is executed.
I guess in this case it's one of the latter two lines which is executed. Not knowing more about the context of the bug, I guess it could be because of anonymous elements or Shadow DOM elements.

Flags: needinfo?(mbrodesser)

Mirko, can you take this bug?

Severity: -- → S3
Flags: needinfo?(mbrodesser)
Priority: -- → P3

I've enough work for the coming weeks assigned to myself. I can keep the ni?-request and have a closer look afterwards.

This assertion's crash volume is very low (and it's not related to Fission), so this bug is not a high priority.

There are a decent number of crashes with this signature, where the crash reason is MOZ_RELEASE_ASSERT(isSome()), rather than the diagnostic assert in comment 0. Bug 1754305 turned this into a release assert, and is now on ESR91 and 98 so the total volume might be higher.

Crash Signature: [@ mozilla::ContentSubtreeIterator::InitWithRange] → [@ mozilla::ContentSubtreeIterator::DetermineFirstContent] [@ mozilla::ContentSubtreeIterator::InitWithRange]

The bug is linked to a topcrash signature, which matches the following criterion:

  • Top 10 content process crashes on release

:edgar, could you consider increasing the severity of this top-crash bug?

For more information, please visit auto_nag documentation.

Flags: needinfo?(echen)
Keywords: topcrash

Increase the severity to S2 due to the total volume becomes higher.

Severity: S3 → S2
Flags: needinfo?(echen)

Based on the topcrash criteria, the crash signatures linked to this bug are not in the topcrash signatures anymore.

For more information, please visit auto_nag documentation.

Keywords: topcrash

The only way I can see this happening is if IsPositioned() is false for the range. We assert against that here, but maybe we should upgrade it to a diagnostic assert to be sure. If it is, it means we're winding up with unpositioned ranges in Selection::mStyledRanges.mRanges, and should probably figure out why that's happening.

But we could also just safely return null if !isNodeContainedInRange

This is actually somewhat weirder than I thought. We actually check !aRange->IsPositioned() higher in the call chain and return early. The only way this should happen if the node we're comparing is in a different tree from the start or end containers of the range. But the node comes from DetermineCandidateForFirstContent(), which looks for a node by walking the tree relative to the range's start point. And the start and end container of the range should also always be in the same tree.

My only two guesses so far are that a) we either somehow wind up with mRef for the range's start position in a different tree than mParent, or b) this has something to do with ranges crossing into anonymous content.

I haven't managed to find a way to trigger possibility a. I haven't tried to find a way to trigger b yet.

That said, the two signatures attached to this bug are possibly unrelated crashes. I haven't actually figured out where we're crashing in any of the reports for the mozilla::ContentSubtreeIterator::InitWithRange signature yet. They all report crashing on the line CacheInclusiveAncestorsOfEndContainer(); with MOZ_RELEASE_ASSERT(isSome())

(In reply to Kris Maglione [:kmag] from comment #11)

That said, the two signatures attached to this bug are possibly unrelated crashes. I haven't actually figured out where we're crashing in any of the reports for the mozilla::ContentSubtreeIterator::InitWithRange signature yet. They all report crashing on the line CacheInclusiveAncestorsOfEndContainer(); with MOZ_RELEASE_ASSERT(isSome())

On second thought, they probably are just the same crash manifesting on release builds when we try to dereference the Maybe instead of asserting. I have spent too much time debugging the other signature and my brain has turned to mush.

I'm running out of ideas for this, at the moment. It does look like there is code in Selection which could potentially create a range which spans regular content and multiple shadow roots/native anonymous roots, but I don't see any callers which might call it that way.

My best lead right now is that a majority of the crashes seem to be on ESR builds, and the majority of those seem to have the "Amazon Enterprise Access" extension installed. The only one with a URL is also at https://sellercentral.amazon.co.jp/. So this might be triggered by something that Amazon enterprise/seller code does. But I don't know how to get access to any such code, so I can't do much more with it.

In any case, I'm downgrading to S3 since the crash volume seems fairly low at this point, and seems mostly related to a particular enterprise configuration. If someone can provide me steps to reproduce, I'm happy to look into it further.

Severity: S2 → S3

Thank you, kmag. One of possible solution here is, we could add MOZ_DIAGNOSTIC_ASSERT at nsRange::DoSetRange or somewhere, then, crash it when the range crosses a boundary of shadow DOM or native anonymous subtree only in Nightly (fortunately, there are some reports from the Nightly users).

Although I'd be happy if there were a way to collect stack without crash for avoiding inconvenience of nightly and beta testers.

(In reply to Masayuki Nakano [:masayuki] (he/him)(JST, +0900) from comment #14)

Although I'd be happy if there were a way to collect stack without crash for avoiding inconvenience of nightly and beta testers.

I think we'll probably just be moving the crashes closer to the point where the actual problem occurs rather than adding new crashes most of the time, so hopefully it shouldn't be too much of an issue in practice.

Performance might be more of a concern. Hopefully that code path shouldn't be particularly hot, but it perhaps could be if we have a lot of search highlights, and there are mutations that affect their boundary data.

I think I will try to add an appropriate assertion, though. It seems worth the risk.

We are currently winding up with crashes that seem to be caused by ranges in a
selection having start and end containers in disconnected trees. Most likely
this has something to do with the ranges crossing shadow or native anonymous
boundaries, but it isn't clear.

This patch changes DoSetRange so we should hopefully assert as soon as we
get into that situation, rather than when we try to inspect the range. The
most obvious way to do this check would be to upgrade
AssertIfMismatchRootAndRangeBoundaries from a debug assertion to a
diagnostic assertion. Unfortunately, those checks are rather expensive, and
this code should be fairly hot.

Fortunately, the existing !mIsPositioned assertion should have the same
effect in all of the situations I've been able to think of. I've re-checked
the code a few times since I wrote the patch, and I believe it is sound. We
should hopefully find out within a couple of weeks if we need to do more.

Assignee: nobody → kmaglione+bmo
Status: NEW → ASSIGNED

I've managed to hit this twice, on two different platforms (Linux and macOS), by using a web GUI for Stable Diffusion and its Image Viewer extension: https://github.com/AUTOMATIC1111/stable-diffusion-webui/ and the extension is https://github.com/yfszzx/stable-diffusion-webui-images-browser

There's no solid STR here though. But maybe the extension code gives insight what kind of manipulation it is doing? It's basically a single JS file: https://raw.githubusercontent.com/yfszzx/stable-diffusion-webui-images-browser/main/javascript/images_history.js

There's a r+ patch which didn't land and no activity in this bug for 2 weeks.
:kmag, could you have a look please?
If you still have some work to do, you can add an action "Plan Changes" in Phabricator.
For more information, please visit auto_nag documentation.

Flags: needinfo?(masayuki)
Flags: needinfo?(kmaglione+bmo)

Sorry, I thought this had landed weeks ago.

Flags: needinfo?(masayuki)
Flags: needinfo?(kmaglione+bmo)
Pushed by maglione.k@gmail.com: https://hg.mozilla.org/integration/autoland/rev/c746a700bc16 Upgrade !mIsPositioned assertion to diagnostic assert. r=masayuki
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → 109 Branch

The patch that landed was mostly for diagnostics, though it should change the crash signature so it happens earlier.

Status: RESOLVED → REOPENED
Resolution: FIXED → ---

These new crashes just popped up hitting this assertion. There's other crashes under this signature but they appear unrelated.

I've been looking into this, but still haven't been able to figure out a way we could wind up with disconnected start and end positions no matter how much I look at the code.

The best lead I have so far is that nearly all of the crashes with the new signature are in the privilegedabout process, and nearly all of those are from a replaceWith call in about:logins to hide the revealed password input after a timeout. That page does make fairly heavy use of shadow DOM, but the boundary is pretty far away from the element being replaced.

I've been trying to reproduce that crash, manually selecting various things and triggering the hiding code, but I still haven't had any success. I'm going to try constructing some selections from JS and triggering it again, but I still don't have a clear idea of where the boundaries would need to be in order for it to work.

Crash Signature: [@ mozilla::ContentSubtreeIterator::DetermineFirstContent] [@ mozilla::ContentSubtreeIterator::InitWithRange] → [@ mozilla::ContentSubtreeIterator::DetermineFirstContent] [@ mozilla::ContentSubtreeIterator::InitWithRange] [@ nsRange::DoSetRange<T>]

I think that the crash reports about mozilla::ContentSubtreeIterator::DetermineFirstContent is fixed (hidden) by the patch for bug 1820445 because it makes Selection::SelectFrames abort in odd cases.

Most crash reports about nsRange::DoSetRange<T> are caused during replacing a node. Could be the new node comes from different document or replaced node was the container of the range and moving to different document?

Oops, it checks MOZ_DIAGNOSTIC_ASSERT(!mIsPositioned, "unexpected disconnected nodes"). So, this is the case when the range has valid boundaries but the common ancestor is nullptr.

aRootNode is considered by RangeUtils::ComputeRootNode. However, newCommonAncestor is considered by nsContentUtils::GetClosestCommonInclusiveAncestor which climbs up the DOM true with nsINode::GetParentNode().

aRootNode can be a shadow root or a native anonymous subtree root.

I guess that if newCommonAncestor is considered by nsContentUtils::GetCommonFlattenedTreeAncestor which uses nsINode::GetFlattenedTreeParent(), newCommonAncestor never be nullptr if aRootNode is not nullptr.

However, I have no idea how to make the situation that aRootNode is not nullptr but newCommonAncestor is nullptr.

FYI: For setting newCommonAncestor to nullptr, the containers must be different and must be reached different root nodes with nsINode::GetParentNode().

Maybe interesting - a number of crash reports show about:logins as the crashing URL. If it would be helpful to look at those reports in particular, let me know.

Sorry for removing the keyword earlier but there is a recent change in the ranking, so the bug is again linked to a topcrash signature, which matches the following criteria:

  • Top 10 desktop browser crashes on nightly
  • Top 10 content process crashes on beta

For more information, please visit BugBot documentation.

Keywords: topcrash

[@ mozilla::dom::AbstractRange::UpdateCommonAncestorIfNecessary] - signature change from bug 1828469.

Crash Signature: [@ mozilla::ContentSubtreeIterator::DetermineFirstContent] [@ mozilla::ContentSubtreeIterator::InitWithRange] [@ nsRange::DoSetRange<T>] → [@ mozilla::ContentSubtreeIterator::DetermineFirstContent] [@ mozilla::ContentSubtreeIterator::InitWithRange] [@ mozilla::dom::AbstractRange::UpdateCommonAncestorIfNecessary] [@ nsRange::DoSetRange<T>]

It seems that all crash reports are:

mozilla::dom::AbstractRange::UpdateCommonAncestorIfNecessary()
nsRange::DoSetRange<nsINode*, nsIContent*, nsINode*, nsIContent*>(mozilla::RangeBoundaryBase<nsINode*, nsIContent*> const&, mozilla::RangeBoundaryBase<nsINode*, nsIContent*> const&, nsINode*, bool)
nsRange::ContentRemoved(nsIContent*, nsIContent*)

When mIsPositioned is set to false, aStartBoundary.IsSetAndValid() or aEndBoundary.IsSetAndValid() may return false.
https://searchfox.org/mozilla-central/rev/cde3d4a8d228491e8b7f1bd94c63bbe039850696/dom/base/nsRange.cpp#850-851

If old container points second or later (i.e., offfset is 1 or larger), IsSetAndValid checks whether the previous sibling of pointing node is still in same parent and the previous sibling is not removed.
https://searchfox.org/mozilla-central/rev/cde3d4a8d228491e8b7f1bd94c63bbe039850696/dom/base/RangeBoundary.h#365

If the previous sibling of pointing child is being removed, nsRange::ContentRemoved considers new points here:
https://searchfox.org/mozilla-central/rev/cde3d4a8d228491e8b7f1bd94c63bbe039850696/dom/base/nsRange.cpp#585,600

So, if aPreviousSibling has already been removed from the parent, I think this could occur. I'll try to write a testcase.

Hmm, I cannot reproduce it even with mutation events or events which are fired at inserting specific element.

Olli, do you have any ideas to remove aPreviousSibling or AbstractRange::mStart.mRef (Previous sibling of pointing child node) immediately before the call of nsRange::ContentRemoved?

Flags: needinfo?(smaug)

Based on the topcrash criteria, the crash signatures linked to this bug are not in the topcrash signatures anymore.

For more information, please visit BugBot documentation.

Keywords: topcrash
Status: REOPENED → NEW
Target Milestone: 109 Branch → ---

Sorry for removing the keyword earlier but there is a recent change in the ranking, so the bug is again linked to a topcrash signature, which matches the following criteria:

  • Top 20 desktop browser crashes on beta
  • Top 10 content process crashes on beta

For more information, please visit BugBot documentation.

Keywords: topcrash

Based on the topcrash criteria, the crash signatures linked to this bug are not in the topcrash signatures anymore.

For more information, please visit BugBot documentation.

Keywords: topcrash

The bug assignee is inactive on Bugzilla, so the assignee is being reset.

Assignee: kmaglione+bmo → nobody
Attached image logins_crash.gif

Hello! I'm hitting this crash signature on Windows 10x64 on the about:logins page after clicking the Show Password button and entering the Windows PIN. This is intermittent and it can happen randomly after many tries or on the first try. This was tested with Firefox 122.0a1 (2023-12-10).
Steps:

  1. Open Firefox with a new profile and set a password for a random webpage.
  2. Go to about:logins page and click the Show password button a few times.

If the issue does not reproduce refresh the page and repeat the above steps. Attached a screen recording.

I can't see any recent crashes for this.

Flags: needinfo?(smaug)

There is still a few crashes like this.

Hmm, that has very different stack trace.
I think I'm a bit confused what this bug is about these days.

Hmm, is this because of https://searchfox.org/mozilla-central/rev/bf8c7a7d47debb1c22b51a72184d5c703ae57588/dom/base/RangeUtils.cpp#61-73
If there is NAC inside shadow DOM, don't we get wrong root?

Hmm, though, I'm still not sure how that could lead to the assertion failure.

Flags: needinfo?(masayuki)

(In reply to Olli Pettay [:smaug][bugs@pettay.fi] from comment #42)

Hmm, that has very different stack trace.
I think I'm a bit confused what this bug is about these days.

The root cause is, somebody (now, they should only be Selection or nsFrameSelection) update the range outside the DOM tree which the Selection belong to.

(In reply to Olli Pettay [:smaug][bugs@pettay.fi] from comment #43)

Hmm, is this because of https://searchfox.org/mozilla-central/rev/bf8c7a7d47debb1c22b51a72184d5c703ae57588/dom/base/RangeUtils.cpp#61-73
If there is NAC inside shadow DOM, don't we get wrong root?

Hmm, though, I'm still not sure how that could lead to the assertion failure.

I also don't have more idea about this. I fixed the issues:

  • Directly updating nsRange retrieved by Selection::GetRangeAt to outside the DOM tree
  • nsIFrame digging into a native anonymous subtree if Selection::Modify or nsFrameSelection retrieves next point from anonymous subtree host

There are a lot of paths which may run by Selection::Modify may still have some bugs, though.

Flags: needinfo?(masayuki)

But one should be able to pass whatever random things to Selection::Modify and Range code should be able to tolerate that, I think.

See Also: → 1890899

The volume is increasing here, especially on nightly and beta. The comments and URLs suggest this keeps happening only within about:logins, here's a few recent comments:

trying to create a username and password inside Nightly passwords for a site I am just starting to use

was editing my passwords

double clicked on a password

This signature seems related too, plenty of crashes in about:logins and even more user comments describing the crash as related to dealing with passwords. Masayuki would it be appropriate to add it to the other signatures here?

Flags: needinfo?(masayuki)

Unfortunately, I'm not sure. The difficult issue of this bug is, the root causes happen before starting the stack at crash (i.e., things already happened and the crash is just a symptom). However, it seems that the range came from Selection and failed to compute the common ancestor, so, could be caused by same issue, indeed.

Flags: needinfo?(masayuki)

Ah, but it's a crash in release channel, so, filing a new bug and apply a null-check must be better. Currently, we do so for avoiding the crash in the release channel.

You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: