Closed Bug 409723 Opened 17 years ago Closed 16 years ago

when clearing all history, can we avoid calling FindVisits()?

Categories

(Firefox :: Bookmarks & History, defect, P3)

defect

Tracking

()

RESOLVED FIXED

People

(Reporter: moco, Assigned: mak)

References

Details

(Keywords: perf, Whiteboard: [patch in 419170])

when clearing all history, can we avoid calling FindVisits()?

after thinking about bug #395299, I wonder if we could make ClearHistory() faster by:

1)  delete everything from moz_historyvisits (or drop and recreate it and the indexes, if that is faster.)
2)  delete all moz_places without corresponding moz_bookmarks
3)  delete all annos and favicons without corresponding moz_places
4)  (at this point, we might be able to vacuum, see bug #395299)

From a quick read through how ClearHistory() works, it seems like ExpireItems() is ideal for partial expiration of a few visits at a time, but it might not be efficient for expiring all of history at once.  

But, I haven't measured with a large amount of history so if ClearHistory() already fast enough then this is wontfix.

dietrich / marco, thoughts?
ExpireItems is very inefficient in case when complete history is deleted, and it could be made faster even for partial history deletion.

Looking into ClearHistory it seems, like point 2 and 3 from the comment #1 are already performed (see all those Expire*Paranoid functions). So the only remaining thing would be replacing call to ExpireItems(0, &keepGoing); with code deleting entire content of the table. 

I have tested both DELETE FROM and DROP and CREATE on table with 730000+ visits, DROP&CREATE seems to be faster, but only by 10-20%. This is not enough to justify using it instead of simple DELETE FROM.

I would suggest trying to use the same logic for partial deletion - this means deleting history visits based on date (instead of getting a list of IDs and deleting the records using IN operator afterwards) and running existing paranoid code afterwards.

The current code generates large SQL statements, in one on my tests the SQL statement to delete records from moz_places was 300kB long and simple removal of duplicate IDs from the statement shrank its size down to 9% of the original. Execution times were 50% better with shorter SQL statements. There are 3 statements with duplicate IDs generated at the moment.
the fastest way could probably be move out history table to another db attached to current connection, and delete the file. This could solve also the problem of other software wanting to clear firefox history without having to add sqlite support.

however the first things to do are what seth suggests.

> I would suggest trying to use the same logic for partial deletion - this means
> deleting history visits based on date 

yeah it would be probably faster but will need deep changes

> The current code generates large SQL statements, in one on my tests the SQL
> statement to delete records from moz_places was 300kB long and simple removal
> of duplicate IDs from the statement shrank its size down to 9% of the 
> original.

so, the first thing should be spun off a bug on queries with duplicate item ids, that should not happen at all, then we could make queries on chunk, for example if we have to delete 200 items we could do 2 queries of 100 chunks, to avoid the query grow too much in size

after that fix (necessary) we could change clearHistory with some number to see actual perf gain, and then evaluate moving to a different approach (date selection). 

the bad about date selection is that we are not using internal sqlite date format, so all dates must be calculated and binded to queries.
(In reply to comment #2)
> however the first things to do are what seth suggests.
> 
> > I would suggest trying to use the same logic for partial deletion - this means
> > deleting history visits based on date 
> 
> yeah it would be probably faster but will need deep changes

I looked at the code and think, that this should bot be that big change.

> > The current code generates large SQL statements, in one on my tests the SQL
> > statement to delete records from moz_places was 300kB long and simple removal
> > of duplicate IDs from the statement shrank its size down to 9% of the 
> > original.
> 
> so, the first thing should be spun off a bug on queries with duplicate item
> ids, that should not happen at all, then we could make queries on chunk, for
> example if we have to delete 200 items we could do 2 queries of 100 chunks, to
> avoid the query grow too much in size

I would suggest giving a try to complete removal of FindVisits. If this does not lead to a good results within a really short time, we should think about optimizing FindVisits.
 
> after that fix (necessary) we could change clearHistory with some number to see
> actual perf gain, and then evaluate moving to a different approach (date
> selection). 
> 
> the bad about date selection is that we are not using internal sqlite date
> format, so all dates must be calculated and binded to queries.

In bug #385245 I am using this to work with sqlite dates:

 WHERE  visit_date >= CAST(strftime('%s',current_date, '-2 days') AS UNSIGNED)*1000000
   AND  visit_date <  CAST(strftime('%s',current_date, '-1 days') AS UNSIGNED)*1000000
i'm spunning-off bug 412438 about having duplicated data into delete strings... 

imho this bug should only change ExpireItems to avoid calling FindVisits when we want to expire everything, and we should fill a new bug on investigating the complete removal of FindVisits, since clearHisotyr does not need to delete in chunks or by date
Blocks: 412758
Keywords: perf
Priority: -- → P3
Flags: blocking-firefox3?
This will not block the final release of Firefox 3.
Flags: wanted-firefox3+
Flags: blocking-firefox3?
Flags: blocking-firefox3-
patch in bug 419170

dropped clearhistory time for 140000 visits from 28s -> 8s

please fill a new bug to investigate the removal of FindVisits for normal expiration workflow
Assignee: nobody → mak77
Whiteboard: [patch in 419170]
Status: NEW → ASSIGNED
Depends on: 419170
Blocks: 423671
Status: ASSIGNED → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
Bug 451915 - move Firefox/Places bugs to Firefox/Bookmarks and History. Remove all bugspam from this move by filtering for the string "places-to-b-and-h".

In Thunderbird 3.0b, you do that as follows:
Tools | Message Filters
Make sure the correct account is selected. Click "New"
Conditions: Body   contains   places-to-b-and-h
Change the action to "Delete Message".
Select "Manually Run" from the dropdown at the top.
Click OK.

Select the filter in the list, make sure "Inbox" is selected at the bottom, and click "Run Now". This should delete all the bugspam. You can then delete the filter.

Gerv
Component: Places → Bookmarks & History
QA Contact: places → bookmarks
You need to log in before you can comment on or make changes to this bug.