Closed Bug 1904682 Opened 5 months ago Closed 5 months ago

Hang in `gatherDataText` or `nsNavHistoryContainerResultNode::ReverseUpdateStats` when dragging and dropping bookmarks folders in bookmark sidebar

Categories

(Toolkit :: Places, defect)

defect

Tracking

()

RESOLVED FIXED

People

(Reporter: dholbert, Unassigned)

References

(Depends on 1 open bug)

Details

(Keywords: perf, Whiteboard: [sng])

Attachments

(2 files)

[NOTE: I'm updating this initial comment with clearer STR details from comment 2, comment 4, and comment 5]

STR:

  1. Start with a fresh profile.
    2. Bookmark 10-20 links (e.g. right-click links and press "B" and then "Enter")
    3. Open the bookmarks sidebar and rapidly/randomly click and drag inside that sidebar, to drag-and-drop bookmarks to rearrange them.
  2. Use drag-and-drop to create a recursive loop in the folder structure -- e.g.:
  • Click and drag "Bookmarks Menu" to "Bookmarks Toolbar", and then click and drag "Bookmarks Toolbar" to "Bookmarks Menu"
  • [if you're on linux] Click and drag "Bookmarks Menu" and drop it onto itself. (which creates a nested loop via bug 1904705)
  1. Click and drag "Bookmarks Menu" a few more times.

ACTUAL RESULTS:
At first, Firefox janks for several seconds, in a deeply-nested call to PlacesUtils function gatherDataText. I captured this in profiler:
https://share.firefox.dev/3VIj4Vt

And then (if you try to drag a few more times) Firefox hangs indefinitely, for many minutes at least, long enough for me to give up on it. kill -11 shows that we're doing infinite recursion in nsNavHistoryContainerResultNode::ReverseUpdateStats(int)

EXPECTED RESULTS:
No such jank/hang.

I think I ended up with a copy of the Bookmarks Menu inside of the Bookmarks Menu, which creates some sort of infinite-depth recursive folder hierarchy; and so we trigger some sort of recursive death-spiral when trying to do any operations on that entry.

Better STR here:

  1. Open bookmarks sidebar (Ctrl B) in a fresh profile.
  2. Click and drag "Bookmarks Menu" to "Bookmarks Toolbar"
  3. Expand the Bookmarks Toolbar to see its contents.
  4. Repeat step 2, either dropping the Bookmarks Menu into the Bookmarks toolbar, or (ideally) onto the Bookmarks Menu "alias" that exists there.

ACTUAL RESULTS:
Hangs which get progressively worse, as you attempt to drag. Eventually Firefox stops responding entirely.

Attached video screencast of bug

In my attached screencast (which starts with a fresh profile), the first sign of trouble is at around t=9s where I try to start a drag operation, but we don't actually show anything being dragged for a few seconds. Then that eventually gets resolved and the thing-being-dragged shows up under my cursor. But after just a couple more drag-and-drops, Firefox is permanently pegged, as shown by the "Firefox Nightly is Not Responding" dialog that appears. (The screencast ends at that point, but Firefox remained unresponsive with 100% CPU for 5+ minutes -- long enough for me to give up and kill Firefox.)

Here's the crash report that was generated when I killed Firefox during the hang - it shows that we've got infinite recursion in nsNavHistoryContainerResultNode::ReverseUpdateStats(int): bp-bb510e42-d0e2-4594-9580-ad5df0240625

Summary: Multi-second hang in `gatherDataText` when dragging and dropping bookmarks in sidebar → Multi-second hang in `gatherDataText` or `nsNavHistoryContainerResultNode::ReverseUpdateStats` when dragging and dropping bookmarks in sidebar
Summary: Multi-second hang in `gatherDataText` or `nsNavHistoryContainerResultNode::ReverseUpdateStats` when dragging and dropping bookmarks in sidebar → Hang in `gatherDataText` or `nsNavHistoryContainerResultNode::ReverseUpdateStats` when dragging and dropping bookmarks folders in bookmark sidebar

I'm testing with today's Nightly, 129.0a1 (2024-06-25) (64-bit).

I also tried testing with an old Nightly, 2019-01-01. That Nightly wouldn't let me create a direct recursive loop -- it prevents me from dropping the Bookmarks Menu in the Bookmarks Menu. But I can do a ping-pong recursive loop -- i.e. I can put the Bookmarks Menu in the Bookmarks Toolbar, and then put the Bookmarks Toolbar in the Bookmarks Menu. After I do that: if I start a drag operation on the Bookmarks Menu, I get an immediate hang in that old Nightly.

In current Nightly, these same STR (dragging one to the other, and then vice versa) doesn't seem to create as substantial of a hang, but it does make drag operations require a second or two in order to render the dragged thing under my cursor. And if I drag the Bookmarks Menu into a blank spot a few times in a row, I end up getting an indefinite hang.

(In reply to Daniel Holbert [:dholbert] from comment #5)

I'm testing with today's Nightly, 129.0a1 (2024-06-25) (64-bit).

I also tried testing with an old Nightly, 2019-01-01. That Nightly wouldn't let me create a direct recursive loop -- it prevents me from dropping the Bookmarks Menu in the Bookmarks Menu. But I can do a ping-pong recursive loop

Two additional notes:

  • The fact that we can create direct recursive loops, i.e. let a bookmark folder be directly droppped onto itself to create a nested recursive alias, appears to be a Linux-only issue and seems to be a regression that I'll file separately.
  • However, the "ping-pong" recursive loop (and resulting hang) is not a regression -- at least, it's older than 2019 -- and is not linux-specific (I confirmed I can trigger that badness on macOS, by doing the same process as shown in comment 6's screencast.
Depends on: 1904705

:mak, do you know if there's anything we can do to mitigate this? It's surprisingly easy to trigger by inadvertent dragging in the bookmarks sidebar (particularly on linux via bug 1904705), and it essentially takes down the whole browser.

Seems like perhaps we should reject the subfolder/alias-creation if we can tell that it forms a cycle? (Hopefully that's something we can detect. I'm assuming that that's the root issue here.)

Flags: needinfo?(mak)

Thanks Daniel for investigating this slowdown.

Unfortunately it may not always be trivial to detect if we're creating a loop.
One side we'd need the list of all the ancestors, and that info must be available synchronously as DnD is a synchronous operation. That means sync I/O that we prefer to avoid. We could query the ancestors when we build the view, but then we'd have to keep that list updated, that sounds like a nightmare.

The alternative is to check on drop/paste, though it's also non-trivial because the node can represent a complex query and then we'd have to execute it fully to see all of its contents. Now, imagine it's something containing 40k bookmarks and maybe even other queries that we must execute to check their children.
And then we should still gather the list of all the ancestors.
In certain views (full trees) getting ancestors is trivial because we have the list of ancestors available in the tree itself, so we can just walk up the tree, and that's what we do here for example. But flat views don't allow us to easily check the ancestors without querying them.

That means slowing down any operation while we do those checks. Or we should keep a full copy of the bookmarks tree in memory, that is likely what other browsers are doing (though I wonder if they have users with 80k bookmarks like us).
We implemented some checks that won't slow down the process, for example one can't drop a container on itself normally.
If the widget code has bugs that prevent us from disallowing a drop, that makes things even worse, as our few protection mechanisms are also broken.

Could we do better? Yes, but how expensive would it be to code that protection, versus the fact one can just remove the recursion by deleting the wrongly nested node?
As I think that will be very time consuming, we should probably instead concentrate on the ReverseUpdateStats recursion. I think in this case we could avoid it but we need to study the situation better.

Folders and queries always show 0 visits in the number of visits field, so why are we trying to update mAccessCount of ancestors at all? What would break if we not do it? It's possible in the past folders use to show the number of visits of all the children? Or maybe it is the group by date containers (but why would we care about the number of visits in a month)? Overall I don't understand why we update mAccessCount of ancestors.

The other thing ReverseUpdateStats does is updating mTime of the ancestors... That seems to make more sense, as it's the last modified time, for example adding a bookmark to a folder, will also update the last modified time of that folder and its ancestors.
Though again, query-like nodes (included folder shortcuts) aren't really "modified" by their children changing, they are just queries.

To summarize, I think (but must be verified!) we can change ReverseUpdateStats to UpdateModifedTimeOfAncestors, and the recursion can stop as soon as we hit a query-like node. Currently that means only call recursively if the parent is a real bookmark folder... and maybe also tag containers.
Updating anything else looks like a waste of time and resources.

I'll keep the needinfo oper for now, as this looks like an interesting win.

Keywords: perf
Flags: needinfo?(mak)
Whiteboard: [sng]
Flags: needinfo?(mak)

The more I look at this code, the more I think ReverseUpdateStats() is a waste of resources.
Its primary scope is to update ancestors of a container when the mTime or mAccessCount of the container change.
Though, so far, I couldn't find a single case where that is necessary.

Indeed, if the parent of the current container is a folder of bookmarks, it's obvious from the Library itself that the number of visits and Most recent visit is always reported as empty. Who cares how bookmarks inside a folder where visited as a whole, at a maximum it's interesting to know about each bookmark. Indeed we don't query for that data even initially.
If the parent is a query, there is no reason for which a query should keep track of the number of visits or the most recent visit inside it, since the query definition is, by its own nature, independent from the contained results, and dynamic.

The only remaining reason to keep that data updated, is to sort containers by most recent visit or most visited, but when we query we flatten down the list, we only return pre-sorted containers in rare cases (history grouped by host or date, tags) and in those cases we don't allow re-sorting the list.
Even if in the future we should not flat down the lists (allow searching folders, for example) sorting folders or tags by total number of visits or last visit date has few purposes, it's merely a statistical data or something for a poor kind of "clean-up" task, that would be better covered by dedicated tools (a stats dashboard or a bookmarks clean-up tool).

I'll test a patch that completely removes ReverseUpdateStats(). If in the future we'll need something like it, then we must implement it more carefully, and have tests covering it, as afaict no test today is checking it.

Depends on: 1904905

The other stack here points to GatherDataText where we are building data about a node to wrap it and rebuild it elsewhere (after drop or paste).
Here we are trying to builld a text representation of a node, and when we hit a container our representation is a list of URLs...
That's handy cause one can copy a query result or a bookmark folder, and paste a list of URIs elsewhere.
GatherDataHTML works similarly.

There is no protection against building a too large string or recursing too much here. I think at first stance we could allow recursion only for real folders, as you can't build infinite structures with folders.
That will limit a bit the reach of the feature, but I think it's not too bad, one can still copy the first level of a query or a tag and get a full representation. And when one is copying a root, we'll allow to recurse into folders, but not into other queries or shortcuts.
One can always export bookmarks as json or html anyway.

I understand these two fixes don't completely prevent the problem from returning in the future, but we can keep open an investigation to prevent loops from happening, if we ever find a way to do that efficiently.

Flags: needinfo?(mak)
Depends on: 1904909

Daniel, would you have a chance to test Nightly after the most recent fixes from Bug 1904905 and Bug 1904909?

Flags: needinfo?(dholbert)

Yup - testing in 129.0a1 (2024-07-08) (64-bit), I'm not able to trigger any noticeable hangs or trouble when following the STR here (or stress-test aggressive stepped-up versions of the STR).

I was easily able to create a horrifically tangled[1] folder structure without any hangs, and here's a brief profile that I captured of clicking-and-dragging bookmarks folders around in the bookmark sidebar with that setup:
https://share.firefox.dev/3zCDPus

No jank or noticeable delays at all. Hooray!

Happy to call this "fixed" or duped to one of those bugs or whatever makes sense; mak, I'll leave it to you to resolve this however you see fit. Thanks for the quick fixes!

[1] ("Bookmarks Toolbar" having 10 aliases of Bookmarks Menu inside of it, and ~5 aliases each for itself and for Other Bookmarks inside of it; "Bookmarks Menu" having ~10 Bookmarks Toolbar aliases, ~8 Other Bookmarks aliases, and one self-alias; and "Other Bookmarks" having 14 aliases to Bookmarks Menu and 3 aliases to Bookmarks Toolbar)

Flags: needinfo?(dholbert) → needinfo?(mak)

Thank you. I don't see much more to handle in the profile.
As I previously said, while we could somehow avoid creating loops, doing it is not trivial as it would require an always up-to-date memory graph, or paying for late checks. There's no short term plan for that, just some thoughts.
I think I'm ok resolving this for now, and if eventually a better hierarchy handling will appear in the future these issues would be surely handled by it.

Status: NEW → RESOLVED
Closed: 5 months ago
Flags: needinfo?(mak)
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: