Open Bug 1824872 Opened 2 years ago Updated 8 months ago

Firefox takes much longer to complete a history-deletion operation, if the user presses and holds `Del` key (instead of pressing once), with lots of history entries selected

Categories

(Toolkit :: Places, defect, P3)

defect

Tracking

()

Performance Impact low

People

(Reporter: dholbert, Unassigned)

References

(Blocks 2 open bugs)

Details

(Keywords: perf, Whiteboard: [snt-scrubbed][places-performance])

STR:

  1. Copy a giant places.sqlite into a fresh profile (so as not to corrupt your regular profile). I'm using my regular browsing profile's places.sqlite file.
  2. Start Firefox using that fresh profile.
  3. History | Show All HIstory
  4. Click "Older than 6 months"
  5. Select-all in the right window. For me, this shows "27000 results" or so.
  6. Now press-and-hold your delete key on your keyboard (or alternately: press it rapidly & repeatedly). Keep doing this, for tens of seconds. Watch how Firefox performs, watch its memory-usage in "top".

ACTUAL RESULTS:

  • Firefox consumes ever-increasing amounts of memory while you have Del pressed.
  • If you hold Del for long enough, Firefox crashes with OOM. ("long enough" depends on how much memory you have). When you come back from OOM, you'll find that no history has actually been deleted.
  • Or: if you don't wait that long and release Del sooner (e.g. let's say you just hold it for 10-20 of seconds), Firefox still eats up a full core of CPU and consumes a bunch of memory, and takes a while to spin back down and complete the delete-history operation (as compared to how long it takes to complete if you just do a single quick Del keypress).

EXPECTED RESULTS:

  • Firefox should complete the operation relatively quickly, roughly the same amount of time it would take if you had simply pressed and released Del. For me, that's a few seconds.

Here's a profile of some time when Firefox was spinning its wheels in response to the STR (after I've released Del): https://share.firefox.dev/3FUvUc5

And here are two crash reports from me, showing what the OOM outcome looks like here:
bp-135225da-8f77-4d7a-bc7d-1441a0230327
bp-79846f25-f2cf-45f5-95ae-fb2db0230327

Summary: Firefox OOM's or crawls to a halt, in response to the user pressing and holding `Del` key with lots of history entries are selected → Firefox OOM's or crawls to a halt, in response to the user pressing and holding `Del` key with lots of history entries selected

Quick guess at a diagnosis: it looks like we're queuing/dispatching some sort of do-the-removal task when the user presses Del, and we do this redundantly for each Del keypress that we receive, and each one maybe has some sort of persistent hold on the history entries (?), so if the user sends lots of Del keypresses, we end up stuck in a death-spiral servicing all of those requests and essentially doing nothing.

If the "hold-and-press Del" scenario seems too extreme: for what it's worth, a similar issue happens if I just press Del a few times in a row (as in "muahaha, delete with extreme prejudice") with all of my 6months-and-older history entries selected. The history-deletion operation takes a little while, during which time Firefox isn't visibly doing anything; this then may prompt the user to hit Del a few more times, which unfortunately makes things worse. I just did a variant of that and managed to lock Firefox up pretty badly for at least several minutes (I'm waiting to see if it recovers).

We're not debouncing these operations, so it's a lot less efficient than the "remove all these selected entries" case.
Each removal, does the operation at the db level then sends a single notification about the removal, that the view (and all the other observers) have to handle. And the view tries to do some very smart things like setting the selection in a meaningful way after a removal, updating commands, and it does for each.
Selecting a lot of entries and removing them will do something similar but we send an array of notifications and then the observers can decide to take shortcuts (just requery instead of trying to do each single removal).

There's a lot that could be done here, but nothing particularly trivial:

  1. debouncing events would probably be good, though there's a couple risks: a) the current API and tests were designed considering notification have been handled once the API returns, we'd need to ensure serialization (fifo) and fix a ton of tests, b) there's risk of mishandling different operations as a single one, so probably debouncing should only group related notifications (multiple page removals for example).
  2. there's quite a few observers that are acting badly, https://bugzilla.mozilla.org/show_bug.cgi?id=1533061 contains a few good examples where the observer receives the notification and decides to do some very expensive work or even main-thread IO
  3. the views are not really optimized for batch operations yet
  4. likely there's many low hanging fruits we could do in the meanwhile, that requires someone to sit down in front of the profile and identify them
Severity: -- → S3
Performance Impact: --- → ?
Keywords: perf
Priority: -- → P3

Sadly the triage calculator isn't really well constructed for this type of direct interaction with the browser UI. However given the rather unique set of circumstances needed to reproduce this issue I'm going to set the impact as low.

Performance Impact: ? → low

(In reply to Marco Bonardo [:mak] from comment #2)

We're not debouncing these operations, so it's a lot less efficient than the "remove all these selected entries" case.

Just to clarify, in case it wasn't obvious: the Steps To Reproduce here do actually involve a "remove all these selected entries" operation. (note the "select-all" in the STR). The good vs. bad scenarios are just "press and release Del" [good] vs. "press and hold Del" [bad]; but in both cases, the setup involves having all the entries selected.

So: if there exists an efficient "Remove all these selected entries" codepath, then it seems like my STR should be hitting it (perhaps hitting it repeatedly/redundantly), right? Though maybe I'm misunderstanding you.

(Just mentioning since I can imagine the quite-different STR of "select the first entry, and then press and hold Del, to iteratively remove each entry one at a time" being a substantially different & less-efficient operation. It sounded like maybe that's how you were interpreting this bug's setup. Maybe that is what happens under the hood, even; but I'm not clear if that's expected or not.)

Flags: needinfo?(mak)

I'm sorry for any miscomprehension, I understand the issue is mostly with pressing DEL and holding it down, rather than selecting a set of entries and then pressing DEL just once, that is a most common operation. The latter case seems to be slow and expensive but acceptable, considered it has to remove 27k entries, it could be further optimized of couse.

I think we're not handling the "hold down a key" part at all, and we issue multiple conflicting delete operations, that just end up wasting resouces. I can also see how a person with accessibility concerns may hold down the button longer causing the issue.

We use normal xul commands dispatching, I wonder if there's a way to ignore/recognize commands generated by a KeyboardEvent with repeat (maybe we could check commandEvent.sourceEvent.repeat?). I must experiment with that.

From a first quick investigation, while it's not possible to get the sourceEvent on basic commands (without touching the event manager code), it would still be possible to intercept keydown. Now the fact is that won't fully solve the problem, because instead of holding down DEL, one could just press it furiously, and it would likely cause the same issue. Checking for .repeat then wouldn't help.

The issue in itself, afaict, is due to the transactions manager, on every bookmark removal we generate a transaction in memory that will allow us to undo the change at wish. That means we fetch all the bookmark info and store it in memory. While we're collecting the data, the view will remain the same, the user repeats the operation (holding down or pressing DEL again), and we start again collecting the info. This can easily bring to oom with enough entries.

There's some alternatives to the current transactions system, though they may introduce other risks:

  1. a recycle bin system. May be expensive to implement and less privacy forgiving (user thinks stuff is gone, but it's not)
  2. a memory journal table. Would be more efficient to just copy over from table to table, but it may not solve the memory problem. I could imagine compressing the memory table using Snappy or similar though.

Short term, we should probably skip any remove operation until the previous one is complete. This works until the notifications are sent by Places before the end of the operation, if in the future we should delay or reorder the notifications, it will be trickier to identify an effective end of the operation. The alternative would be to store the selection status and skip operations until the selection has completely changed, but that seems hack-ish.

Flags: needinfo?(mak)
Whiteboard: [snt-scrubbed][places-performance]
Blocks: 1829296
Blocks: 1695129
See Also: → 1876957

Bug 1876957 should have improved the situation, as now we should not handle anymore duplicate removal operations.

The memory-usage-climb does seem to have been mitigated!

I do still see the duration going up by some nonlinear factor if you hold Del down, though. Using today's Nightly, I tried deleting all of my January 2024 history (in a fresh profile with a copy of my places.sqlite), which is about 9000 history-entries.

Pressing Del once:
https://share.firefox.dev/3wfPc9H
Deletion takes about 15-16s.

vs. holding Del down continuously the whole time:
https://share.firefox.dev/42GsC6h
Deletion takes ~1min.

Having said that, the "holding down Del continuously" scenario has gotten much better in the past few days, likely due to the bug you linked. Here's a profile of me doing the same thing as the last profile from comment 8, but now in slightly-outdated Nightly 2024-02-10 (before the improvement that Mak referenced):
https://share.firefox.dev/3UIdKCx

That shows us being pegged for >5min (which is when I gave up). I held Del down continuously for several minutes, and then released the key and let another minute or two go by, and then captured the profile. It's now been another 5min or so, and the delete operation(s) still haven't yet completed (>10min after I pressed Del).

So: while comment 8 shows that holding Del does still slow things down (from 15s to 1min), it's much faster than it was in previous Nightlies (>10min, maybe ~forever).

I'm happy to consider this fixed by bug 1876957. Mak, would you prefer to keep this open for comment 8? Or should I file a new bug on that?

(The real-world use-cases I was worried about, when originally filing this, were from users pressing Del just a few extra times -- either due to holding the key down for a few seconds, or due to getting impatient when waiting for the UI to update and hitting Del again. If I do both of those things with the process from comment 8, my deletion operation takes about 30s -- somewhere between the single-keypress and holding-down-continuously durations, and on the shorter side of that range. So that's ~good. I don't think we need to optimize super-much for the "user holding down del continuously, forever" use-case, as long as the operation does eventually complete in a ~reasonable amount of time, which we're in a much better place on now.)

Flags: needinfo?(mak)

We can keep this open as a thing to investigate further in the future. There are likely more low hanging fruits to be collected, and the profiles will be useful.
Thank you for confirming the situation improved as we expected.

Flags: needinfo?(mak)

Sounds good. I've updated the bug summary so that it's no longer focusing on the parts that already have been addressed.

Summary: Firefox OOM's or crawls to a halt, in response to the user pressing and holding `Del` key with lots of history entries selected → Firefox takes much longer to complete a history-deletion operation, if the user presses and holds `Del` key (instead of pressing once), with lots of history entries selected
You need to log in before you can comment on or make changes to this bug.