MOZ_ALWAYS_TRUE(isNodeContainedInRange) diagnostic assertion crash in [@ mozilla::ContentSubtreeIterator::InitWithRange]
Categories
(Core :: DOM: Selection, defect, P3)
Tracking
()
People
(Reporter: woyece, Unassigned)
References
Details
(Keywords: assertion, Whiteboard: [not-a-fission-bug])
Crash Data
Attachments
(2 files)
Maybe Fission related. (DOMFissionEnabled=1)
Crash report: crash-stats .mozilla .org/ report/ index/ d14f3daf-1067-4b90-8e88-669a60210330
MOZ_CRASH Reason: MOZ_DIAGNOSTIC_ASSERT(false) (isNodeContainedInRange)
Top 10 frames of crashing thread:
0 libxul.so mozilla::ContentSubtreeIterator::InitWithRange dom/base/ContentIterator.cpp:932
1 libxul.so mozilla::ContentSubtreeIterator::Init dom/base/ContentIterator.cpp:776
2 libxul.so mozilla::dom::Selection::Clear dom/base/Selection.cpp:1109
3 libxul.so mozilla::dom::Selection::RemoveAllRanges dom/base/Selection.cpp:1865
4 libxul.so nsFocusManager::MoveCaretToFocus dom/base/nsFocusManager.cpp:2975
5 libxul.so nsFocusManager::UpdateCaret dom/base/nsFocusManager.cpp:2945
6 libxul.so nsFocusManager::Focus dom/base/nsFocusManager.cpp:2601
7 libxul.so nsFocusManager::SetFocusInner dom/base/nsFocusManager.cpp:1718
8 libxul.so mozilla::dom::Element::Focus dom/base/Element.cpp:462
9 libxul.so mozilla::dom::HTMLElement_Binding::focus dom/bindings/HTMLElementBinding.cpp:9239
Comment 1•4 years ago
|
||
Mirko, you added this MOZ_ALWAYS_TRUE(isNodeContainedInRange) diagnostic assertion in copy&paste bug 1649121 six months ago. What conditions might cause isNodeContainedInRange to be null?
(In reply to Chris Peterson [:cpeterson] from comment #1)
Mirko, you added this
MOZ_ALWAYS_TRUE(isNodeContainedInRange)diagnostic assertion in copy&paste bug 1649121 six months ago.
That patch removed only redundant code, an equivalent assertion existed before.
What conditions might cause
isNodeContainedInRangeto be null?
Checking the implementation:
When https://searchfox.org/mozilla-central/rev/d7e344e956d9da2808ea33e1fe0f963ed10c142d/dom/base/RangeUtils.cpp#99,107 returns Nothing().
This happens if one of the lines of https://searchfox.org/mozilla-central/rev/d7e344e956d9da2808ea33e1fe0f963ed10c142d/dom/base/RangeUtils.cpp#121,130,179,191 is executed.
I guess in this case it's one of the latter two lines which is executed. Not knowing more about the context of the bug, I guess it could be because of anonymous elements or Shadow DOM elements.
Mirko, can you take this bug?
I've enough work for the coming weeks assigned to myself. I can keep the ni?-request and have a closer look afterwards.
Comment 5•4 years ago
|
||
This assertion's crash volume is very low (and it's not related to Fission), so this bug is not a high priority.
Comment 6•3 years ago
|
||
There are a decent number of crashes with this signature, where the crash reason is MOZ_RELEASE_ASSERT(isSome()), rather than the diagnostic assert in comment 0. Bug 1754305 turned this into a release assert, and is now on ESR91 and 98 so the total volume might be higher.
Updated•3 years ago
|
Comment 7•3 years ago
|
||
The bug is linked to a topcrash signature, which matches the following criterion:
- Top 10 content process crashes on release
:edgar, could you consider increasing the severity of this top-crash bug?
For more information, please visit auto_nag documentation.
Comment 8•3 years ago
|
||
Increase the severity to S2 due to the total volume becomes higher.
Comment 9•3 years ago
|
||
Based on the topcrash criteria, the crash signatures linked to this bug are not in the topcrash signatures anymore.
For more information, please visit auto_nag documentation.
Comment 10•2 years ago
|
||
The only way I can see this happening is if IsPositioned() is false for the range. We assert against that here, but maybe we should upgrade it to a diagnostic assert to be sure. If it is, it means we're winding up with unpositioned ranges in Selection::mStyledRanges.mRanges, and should probably figure out why that's happening.
But we could also just safely return null if !isNodeContainedInRange
Comment 11•2 years ago
|
||
This is actually somewhat weirder than I thought. We actually check !aRange->IsPositioned() higher in the call chain and return early. The only way this should happen if the node we're comparing is in a different tree from the start or end containers of the range. But the node comes from DetermineCandidateForFirstContent(), which looks for a node by walking the tree relative to the range's start point. And the start and end container of the range should also always be in the same tree.
My only two guesses so far are that a) we either somehow wind up with mRef for the range's start position in a different tree than mParent, or b) this has something to do with ranges crossing into anonymous content.
I haven't managed to find a way to trigger possibility a. I haven't tried to find a way to trigger b yet.
That said, the two signatures attached to this bug are possibly unrelated crashes. I haven't actually figured out where we're crashing in any of the reports for the mozilla::ContentSubtreeIterator::InitWithRange signature yet. They all report crashing on the line CacheInclusiveAncestorsOfEndContainer(); with MOZ_RELEASE_ASSERT(isSome())
Comment 12•2 years ago
|
||
(In reply to Kris Maglione [:kmag] from comment #11)
That said, the two signatures attached to this bug are possibly unrelated crashes. I haven't actually figured out where we're crashing in any of the reports for the
mozilla::ContentSubtreeIterator::InitWithRangesignature yet. They all report crashing on the lineCacheInclusiveAncestorsOfEndContainer();withMOZ_RELEASE_ASSERT(isSome())
On second thought, they probably are just the same crash manifesting on release builds when we try to dereference the Maybe instead of asserting. I have spent too much time debugging the other signature and my brain has turned to mush.
Comment 13•2 years ago
|
||
I'm running out of ideas for this, at the moment. It does look like there is code in Selection which could potentially create a range which spans regular content and multiple shadow roots/native anonymous roots, but I don't see any callers which might call it that way.
My best lead right now is that a majority of the crashes seem to be on ESR builds, and the majority of those seem to have the "Amazon Enterprise Access" extension installed. The only one with a URL is also at https://sellercentral.amazon.co.jp/. So this might be triggered by something that Amazon enterprise/seller code does. But I don't know how to get access to any such code, so I can't do much more with it.
In any case, I'm downgrading to S3 since the crash volume seems fairly low at this point, and seems mostly related to a particular enterprise configuration. If someone can provide me steps to reproduce, I'm happy to look into it further.
Thank you, kmag. One of possible solution here is, we could add MOZ_DIAGNOSTIC_ASSERT at nsRange::DoSetRange or somewhere, then, crash it when the range crosses a boundary of shadow DOM or native anonymous subtree only in Nightly (fortunately, there are some reports from the Nightly users).
Although I'd be happy if there were a way to collect stack without crash for avoiding inconvenience of nightly and beta testers.
Comment 15•2 years ago
|
||
(In reply to Masayuki Nakano [:masayuki] (he/him)(JST, +0900) from comment #14)
Although I'd be happy if there were a way to collect stack without crash for avoiding inconvenience of nightly and beta testers.
I think we'll probably just be moving the crashes closer to the point where the actual problem occurs rather than adding new crashes most of the time, so hopefully it shouldn't be too much of an issue in practice.
Performance might be more of a concern. Hopefully that code path shouldn't be particularly hot, but it perhaps could be if we have a lot of search highlights, and there are mutations that affect their boundary data.
I think I will try to add an appropriate assertion, though. It seems worth the risk.
Comment 16•2 years ago
|
||
We are currently winding up with crashes that seem to be caused by ranges in a
selection having start and end containers in disconnected trees. Most likely
this has something to do with the ranges crossing shadow or native anonymous
boundaries, but it isn't clear.
This patch changes DoSetRange so we should hopefully assert as soon as we
get into that situation, rather than when we try to inspect the range. The
most obvious way to do this check would be to upgrade
AssertIfMismatchRootAndRangeBoundaries from a debug assertion to a
diagnostic assertion. Unfortunately, those checks are rather expensive, and
this code should be fairly hot.
Fortunately, the existing !mIsPositioned assertion should have the same
effect in all of the situations I've been able to think of. I've re-checked
the code a few times since I wrote the patch, and I believe it is sound. We
should hopefully find out within a couple of weeks if we need to do more.
Updated•2 years ago
|
Comment 17•2 years ago
•
|
||
I've managed to hit this twice, on two different platforms (Linux and macOS), by using a web GUI for Stable Diffusion and its Image Viewer extension: https://github.com/AUTOMATIC1111/stable-diffusion-webui/ and the extension is https://github.com/yfszzx/stable-diffusion-webui-images-browser
There's no solid STR here though. But maybe the extension code gives insight what kind of manipulation it is doing? It's basically a single JS file: https://raw.githubusercontent.com/yfszzx/stable-diffusion-webui-images-browser/main/javascript/images_history.js
Comment 18•2 years ago
|
||
There's a r+ patch which didn't land and no activity in this bug for 2 weeks.
:kmag, could you have a look please?
If you still have some work to do, you can add an action "Plan Changes" in Phabricator.
For more information, please visit auto_nag documentation.
Comment 19•2 years ago
|
||
Sorry, I thought this had landed weeks ago.
Comment 20•2 years ago
|
||
Comment 21•2 years ago
|
||
| bugherder | ||
Comment 22•2 years ago
|
||
The patch that landed was mostly for diagnostics, though it should change the crash signature so it happens earlier.
Updated•2 years ago
|
Comment 23•2 years ago
|
||
These new crashes just popped up hitting this assertion. There's other crashes under this signature but they appear unrelated.
Comment 24•2 years ago
|
||
I've been looking into this, but still haven't been able to figure out a way we could wind up with disconnected start and end positions no matter how much I look at the code.
The best lead I have so far is that nearly all of the crashes with the new signature are in the privilegedabout process, and nearly all of those are from a replaceWith call in about:logins to hide the revealed password input after a timeout. That page does make fairly heavy use of shadow DOM, but the boundary is pretty far away from the element being replaced.
I've been trying to reproduce that crash, manually selecting various things and triggering the hiding code, but I still haven't had any success. I'm going to try constructing some selections from JS and triggering it again, but I still don't have a clear idea of where the boundaries would need to be in order for it to work.
Updated•2 years ago
|
Comment 25•2 years ago
•
|
||
I think that the crash reports about mozilla::ContentSubtreeIterator::DetermineFirstContent is fixed (hidden) by the patch for bug 1820445 because it makes Selection::SelectFrames abort in odd cases.
Most crash reports about nsRange::DoSetRange<T> are caused during replacing a node. Could be the new node comes from different document or replaced node was the container of the range and moving to different document?
Oops, it checks MOZ_DIAGNOSTIC_ASSERT(!mIsPositioned, "unexpected disconnected nodes"). So, this is the case when the range has valid boundaries but the common ancestor is nullptr.
Hmm, couldn't reproduce this with:
aRootNode is considered by RangeUtils::ComputeRootNode. However, newCommonAncestor is considered by nsContentUtils::GetClosestCommonInclusiveAncestor which climbs up the DOM true with nsINode::GetParentNode().
aRootNode can be a shadow root or a native anonymous subtree root.
I guess that if newCommonAncestor is considered by nsContentUtils::GetCommonFlattenedTreeAncestor which uses nsINode::GetFlattenedTreeParent(), newCommonAncestor never be nullptr if aRootNode is not nullptr.
However, I have no idea how to make the situation that aRootNode is not nullptr but newCommonAncestor is nullptr.
FYI: For setting newCommonAncestor to nullptr, the containers must be different and must be reached different root nodes with nsINode::GetParentNode().
Comment 30•2 years ago
|
||
Maybe interesting - a number of crash reports show about:logins as the crashing URL. If it would be helpful to look at those reports in particular, let me know.
Comment 31•2 years ago
|
||
Sorry for removing the keyword earlier but there is a recent change in the ranking, so the bug is again linked to a topcrash signature, which matches the following criteria:
- Top 10 desktop browser crashes on nightly
- Top 10 content process crashes on beta
For more information, please visit BugBot documentation.
Comment 32•2 years ago
|
||
[@ mozilla::dom::AbstractRange::UpdateCommonAncestorIfNecessary] - signature change from bug 1828469.
It seems that all crash reports are:
mozilla::dom::AbstractRange::UpdateCommonAncestorIfNecessary()
nsRange::DoSetRange<nsINode*, nsIContent*, nsINode*, nsIContent*>(mozilla::RangeBoundaryBase<nsINode*, nsIContent*> const&, mozilla::RangeBoundaryBase<nsINode*, nsIContent*> const&, nsINode*, bool)
nsRange::ContentRemoved(nsIContent*, nsIContent*)
When mIsPositioned is set to false, aStartBoundary.IsSetAndValid() or aEndBoundary.IsSetAndValid() may return false.
https://searchfox.org/mozilla-central/rev/cde3d4a8d228491e8b7f1bd94c63bbe039850696/dom/base/nsRange.cpp#850-851
If old container points second or later (i.e., offfset is 1 or larger), IsSetAndValid checks whether the previous sibling of pointing node is still in same parent and the previous sibling is not removed.
https://searchfox.org/mozilla-central/rev/cde3d4a8d228491e8b7f1bd94c63bbe039850696/dom/base/RangeBoundary.h#365
If the previous sibling of pointing child is being removed, nsRange::ContentRemoved considers new points here:
https://searchfox.org/mozilla-central/rev/cde3d4a8d228491e8b7f1bd94c63bbe039850696/dom/base/nsRange.cpp#585,600
So, if aPreviousSibling has already been removed from the parent, I think this could occur. I'll try to write a testcase.
Hmm, I cannot reproduce it even with mutation events or events which are fired at inserting specific element.
Olli, do you have any ideas to remove aPreviousSibling or AbstractRange::mStart.mRef (Previous sibling of pointing child node) immediately before the call of nsRange::ContentRemoved?
Comment 35•2 years ago
|
||
Based on the topcrash criteria, the crash signatures linked to this bug are not in the topcrash signatures anymore.
For more information, please visit BugBot documentation.
Updated•2 years ago
|
Comment 36•2 years ago
|
||
Sorry for removing the keyword earlier but there is a recent change in the ranking, so the bug is again linked to a topcrash signature, which matches the following criteria:
- Top 20 desktop browser crashes on beta
- Top 10 content process crashes on beta
For more information, please visit BugBot documentation.
Comment 37•2 years ago
|
||
Based on the topcrash criteria, the crash signatures linked to this bug are not in the topcrash signatures anymore.
For more information, please visit BugBot documentation.
Comment 38•1 year ago
|
||
The bug assignee is inactive on Bugzilla, so the assignee is being reset.
Comment 39•1 year ago
•
|
||
Hello! I'm hitting this crash signature on Windows 10x64 on the about:logins page after clicking the Show Password button and entering the Windows PIN. This is intermittent and it can happen randomly after many tries or on the first try. This was tested with Firefox 122.0a1 (2023-12-10).
Steps:
- Open Firefox with a new profile and set a password for a random webpage.
- Go to about:logins page and click the Show password button a few times.
If the issue does not reproduce refresh the page and repeat the above steps. Attached a screen recording.
There is still a few crashes like this.
Comment 42•1 year ago
•
|
||
Hmm, that has very different stack trace.
I think I'm a bit confused what this bug is about these days.
Comment 43•1 year ago
•
|
||
Hmm, is this because of https://searchfox.org/mozilla-central/rev/bf8c7a7d47debb1c22b51a72184d5c703ae57588/dom/base/RangeUtils.cpp#61-73
If there is NAC inside shadow DOM, don't we get wrong root?
Hmm, though, I'm still not sure how that could lead to the assertion failure.
(In reply to Olli Pettay [:smaug][bugs@pettay.fi] from comment #42)
Hmm, that has very different stack trace.
I think I'm a bit confused what this bug is about these days.
The root cause is, somebody (now, they should only be Selection or nsFrameSelection) update the range outside the DOM tree which the Selection belong to.
(In reply to Olli Pettay [:smaug][bugs@pettay.fi] from comment #43)
Hmm, is this because of https://searchfox.org/mozilla-central/rev/bf8c7a7d47debb1c22b51a72184d5c703ae57588/dom/base/RangeUtils.cpp#61-73
If there is NAC inside shadow DOM, don't we get wrong root?Hmm, though, I'm still not sure how that could lead to the assertion failure.
I also don't have more idea about this. I fixed the issues:
- Directly updating
nsRangeretrieved bySelection::GetRangeAtto outside the DOM tree nsIFramedigging into a native anonymous subtree ifSelection::ModifyornsFrameSelectionretrieves next point from anonymous subtree host
There are a lot of paths which may run by Selection::Modify may still have some bugs, though.
Comment 45•1 year ago
|
||
But one should be able to pass whatever random things to Selection::Modify and Range code should be able to tolerate that, I think.
Comment 46•1 year ago
|
||
The volume is increasing here, especially on nightly and beta. The comments and URLs suggest this keeps happening only within about:logins, here's a few recent comments:
trying to create a username and password inside Nightly passwords for a site I am just starting to use
was editing my passwords
double clicked on a password
Comment 47•1 year ago
|
||
This signature seems related too, plenty of crashes in about:logins and even more user comments describing the crash as related to dealing with passwords. Masayuki would it be appropriate to add it to the other signatures here?
Unfortunately, I'm not sure. The difficult issue of this bug is, the root causes happen before starting the stack at crash (i.e., things already happened and the crash is just a symptom). However, it seems that the range came from Selection and failed to compute the common ancestor, so, could be caused by same issue, indeed.
Ah, but it's a crash in release channel, so, filing a new bug and apply a null-check must be better. Currently, we do so for avoiding the crash in the release channel.
Description
•