Closed Bug 913160 Opened 7 years ago Closed 6 years ago

Deleting history caused browser hang due to the changes in bug 894331

Categories

(Toolkit :: Places, defect)

defect
Not set
critical

Tracking

()

VERIFIED FIXED
mozilla29
Tracking Status
firefox25 --- unaffected
firefox26 + fixed
firefox27 + fixed
firefox28 + verified
firefox29 + verified
b2g-v1.2 --- fixed

People

(Reporter: speciesx, Assigned: mak)

References

Details

(Keywords: hang, regression)

Attachments

(1 file, 1 obsolete file)

Mozilla/5.0 (Windows NT 6.3; Win64; x64; rv:26.0) Gecko/20100101 Firefox/26.0
Cset:77ed46bf4c1a

Steps to Reproduce: Delet 1000+ items at once.
If i delete 1000 history items on my PC with a SSD, it cause 3 min browser hang. This happens only with the Nightly, no hangs on Fx 23.
Keywords: regression
is this a regression in FF26?
Component: History: Global → Places
Product: Core → Toolkit
(In reply to Marco Bonardo [:mak] from comment #1)
> is this a regression in FF26?

Yes.

Here i have deleted 2000 history items.

beta	cest:	e62612cff09f	-	28 sec
Aurora	cset:	b0bc9c628c80	-	30 sec 
win x86	cset:	ab5f29823236	-	9 min
win x64	cset:	ab5f29823236	-	14 min
ok, then I guess may be related to bug 894331, I need to reproduce this locally.
Assignee: nobody → mak77
Blocks: 894331
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Duplicate of this bug: 920421
Marco, do you have an update for us? I have seen the same issue today with Aurora on Linux, when I deleted a lot more entries and Firefox was frozen for about 20 minutes.
Severity: normal → critical
Flags: needinfo?(mak77)
Keywords: hang
OS: Windows 8 → All
Hardware: x86_64 → All
I hope to be able to look at this during the Summit, off-hand I can tell if we don't have a simple fix we will just backout bug 894331 from Aurora, that should be safe to do.
Flags: needinfo?(mak77)
requesting tracking for non-tolerable perf issue
Attached patch patch v1 (-w) (obsolete) — Splinter Review
This is ugly but should solve the problem.
The fact is the new batching behavior starts after a given number of changes, while the previous one was immediately notifying the view and then skipping a given number of changes.
on batch notification the view disables select events, that are very expensive on large trees (as is the history tree), so previously it was doing that immediately, while now it was doing that after the first 10 removals. Summed up with the fact removals are done in chunks that's a lot of additional work.
The patch restores the previous behavior for the specific case of removing multiple history entries and tweaks a little bit the values that were quite conservative.
We should re-evaluate the issue once we have async RemovePlaces though.

PS: I'm going to retest this when I'm back from the Summit, so far I was only able to do some generic testing. So that's why this is only feedback?
Attachment #813998 - Flags: feedback?(mano)
Comment on attachment 813998 [details] [diff] [review]
patch v1 (-w)

So, REMOVE_PAGES_CHUNKLEN is set to 300, meaning that the back-end does know it's going to do quite a lot of work. Could we optimize the back-end side to batch right away when it's explicitly requested to do this amount of work?
(In reply to Mano from comment #9)
> Comment on attachment 813998 [details] [diff] [review]
> patch v1 (-w)
> 
> So, REMOVE_PAGES_CHUNKLEN is set to 300, meaning that the back-end does know
> it's going to do quite a lot of work. Could we optimize the back-end side to
> batch right away when it's explicitly requested to do this amount of work?

how would it do? the result doesn't know about the backend, the backend is notifying onBeginUpdateBatch, but we stopped listening to that.
Could this be related to bug734643?
I'm seeing extended browser hangs when deleting several thousands of entries at once on 24 too.
24 is unaffected, it will always be slow if you remove several thousands of entries, what we will do about this is move part of the process to a separate thread, though the UI has to be updated, the observers may do work (so if you have an add-on observing history changes it will be called for each removal) and so on.
(In reply to Mano from comment #9)
> Comment on attachment 813998 [details] [diff] [review]
> patch v1 (-w)
> 
> So, REMOVE_PAGES_CHUNKLEN is set to 300, meaning that the back-end does know
> it's going to do quite a lot of work. Could we optimize the back-end side to
> batch right away when it's explicitly requested to do this amount of work?

(In reply to Marco Bonardo [:mak] from comment #10)
> (In reply to Mano from comment #9)
> > Comment on attachment 813998 [details] [diff] [review]
> > patch v1 (-w)
> > 
> > So, REMOVE_PAGES_CHUNKLEN is set to 300, meaning that the back-end does know
> > it's going to do quite a lot of work. Could we optimize the back-end side to
> > batch right away when it's explicitly requested to do this amount of work?
> 
> how would it do? the result doesn't know about the backend, the backend is
> notifying onBeginUpdateBatch, but we stopped listening to that.
Flags: needinfo?(mano)
Comment on attachment 813998 [details] [diff] [review]
patch v1 (-w)

After discussing on IRC, we will backout bug 894331 from branches and prepare a different solution for trunk.
Attachment #813998 - Flags: feedback?(mano)
Flags: needinfo?(mano)
Depends on: 937560
I must backout also Bug 902765 and Bug 914294, they were code correctness changes upon bug 894331 changes, so that's fine. Though there's some other change that touched this code, I suspect some "automatic" rewrites like nsTHashTable or nullptr. I have some merges to solve, should be nothing crazy.
I can dedicate some time for this in the next couple of days if you're overload. Just let me know.
Attached patch backout.diffSplinter Review
it was Bug 910989 that decided to reindent our constructors :( btw, 1 row change.
I'm doing a local build to verify everything is as expected, I don't this this needs review since it's a plain backout (the row I changed is uninteresting, just indentation). Though if you want to take a look feel free to.
I will ask for approval after my local testing.
Comment on attachment 830856 [details] [diff] [review]
backout.diff

Provided it isn't going to mc, this looks good.
Attachment #830856 - Flags: feedback+
Attachment #813998 - Attachment is obsolete: true
Comment on attachment 830856 [details] [diff] [review]
backout.diff

[Approval Request Comment]
Bug caused by (feature/regressing bug #): bug 894331
User impact if declined: browser hangs when deleting lots of history
Testing completed (on m-c, etc.): local
Risk to taking this patch (and alternatives if risky): it's a plain backout, very few things landed after these changes, so it should be relatively safe
String or IDL/UUID changes made by this patch: none
Attachment #830856 - Flags: approval-mozilla-beta?
Attachment #830856 - Flags: approval-mozilla-aurora?
Attachment #830856 - Flags: approval-mozilla-beta?
Attachment #830856 - Flags: approval-mozilla-beta+
Attachment #830856 - Flags: approval-mozilla-aurora?
Attachment #830856 - Flags: approval-mozilla-aurora+
Has this been reproduced in 29?
Flags: needinfo?(alice0775)
I can reproduce the browser hang in Nightly29.0a1 as well as Aurora28.0a2 and Beta27b2 with STR of Bug 933374 Comment#2 .
http://hg.mozilla.org/mozilla-central/rev/5c7fa2bfea8b
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:29.0) Gecko/20100101 Firefox/29.0 ID:20131219030202
http://hg.mozilla.org/releases/mozilla-aurora/rev/809aabadac6d
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:28.0) Gecko/20100101 Firefox/28.0 ID:20131219004003
http://hg.mozilla.org/releases/mozilla-beta/rev/5a11c57ceaa7
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:27.0) Gecko/20100101 Firefox/27.0 ID:20131216183647
Flags: needinfo?(alice0775)
Any progress on this?

This regression just caused me to have to sit for 20 mins staring at a hung nightly whilst I was hoping to be doing moco work; afraid to kill the process in case I corrupted my places DB.
Flags: needinfo?(mak77)
it's my top priority, I'm sorry for the delay.
Flags: needinfo?(mak77)
Whiteboard: [triage]
Don't know about the internal layout of the DB, but I think adding a binary field (deleted?) to the DB and flipping only this bit for all entries with one (rather) simple query should be quite fast. Then a separate thread can take over and cleanup all entries that have this bit set.

Is there a bug for the followup work? Will you reintroduce the work from Bug 894331, and where (since that bug is not reopened despite a backout)?
the problem is not the database, it's the frontend code + sucky backend API.
Btw, I think I should just backout bug 894331 until we have the solution in bug 937560.
I will do that before the merge, and I should also backout from Aurora.
Comment on attachment 830856 [details] [diff] [review]
backout.diff

I'm reasking for Aurora approval cause I need approval to backout the original patch from Aurora 28, and I will also backout from trunk this time. We will reland the original patch once we have a fix for this regression.
Attachment #830856 - Flags: approval-mozilla-aurora+ → approval-mozilla-aurora?
Attachment #830856 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
https://hg.mozilla.org/releases/mozilla-aurora/rev/11bfcce8f954

I wonder if we should change status-flags from "fixed" to "unaffected", since the backout makes those unaffected...
I will backout on inbound, and them mark the original bug as reopened and this one as fixed
Summary: Deleting history caused browser hang → Deleting history caused browser hang due to the changes in bug 894331
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Merge of trunk backout:
https://hg.mozilla.org/mozilla-central/rev/913739c56e72
Target Milestone: --- → mozilla29
Keywords: verifyme
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:28.0) Gecko/20100101 Firefox/28.0
Mozilla/5.0 (X11; Linux i686; rv:28.0) Gecko/20100101 Firefox/28.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:28.0) Gecko/20100101 Firefox/28.0

Reproduced with Nightly from 2013-12-19: I got a huge hang (~20 minutes) when deleting over 1000 items.
On latest Aurora (Build ID: 20140131004003) I got the following testing results with the attachment from bug 933374 Comment#2
(places.sqlite):
- on Windows 7 64bit: deleted 1218 items and browser hangs ~2 minutes
- Mac OS 10.8: deleted 1058 items and browser hangs ~2.5 minutes
- Ubuntu: deleted 1113 items and browser hangs ~1 minute

Although it's a big difference between 2 minutes and 20 minutes hang, is it enough to be called fixed?
Flags: needinfo?(mak77)
(In reply to Alexandra Lucinet, QA Mentor [:adalucinet] from comment #33)
> Although it's a big difference between 2 minutes and 20 minutes hang, is it
> enough to be called fixed?

Yes, the fix was making things worse, but we were already hanging here. this view is not yet optimized for thousands of removals
Flags: needinfo?(mak77)
Thanks for your quick reply.
Marking as verified on Aurora 28.
Duplicate of this bug: 956683
No longer blocks: fxdesktopbacklog
Whiteboard: [triage]
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:29.0) Gecko/20100101 Firefox/29.0
Mozilla/5.0 (X11; Linux i686; rv:29.0) Gecko/20100101 Firefox/29.0
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:29.0) Gecko/20100101 Firefox/29.0

Verified as fixed on Firefox 29 beta 4 (Build ID: 20140331125246):
- on Windows 7 64bit: deleted 1120 items and browser hangs ~2 minutes
- Mac OS 10.6: deleted 1322 items and browser hangs ~3 minutes
- Ubuntu 12.04 LTS 32bit: deleted 1041 items and browser hangs ~1 minute
Status: RESOLVED → VERIFIED
Keywords: verifyme
You need to log in before you can comment on or make changes to this bug.