Closed Bug 389789 Opened 17 years ago Closed 17 years ago

nsNavHistory::RemoveDuplicateURIs() should not delete moz_historyvisits, it should remap them to the moz_place id we are keeping

Categories

(Firefox :: Bookmarks & History, defect, P2)

defect

Tracking

()

RESOLVED FIXED
Firefox 3 beta2

People

(Reporter: moco, Assigned: dietrich)

References

()

Details

(Keywords: dataloss)

Attachments

(2 files, 7 obsolete files)

nsNavHistory::RemoveDuplicateURIs() should not delete moz_historyvisits, it should remap them to the moz_place id we are keeping

with (or without) marco's change for bug #386956, there is a problem with nsNavHistory::RemoveDuplicateURIs()

say in fx 2 I visited the following two pages:

http://talkback-public.mozilla.org/search/start.jsp?search=2&type=iid&id=984023'
http://talkback-public.mozilla.org/searchstart.jsp?search=2&type=iid&id=984023%27

after migrating to fx 2, we would only end up with one entry in moz_places but with two visits.   but in moz_historyvists, we would only have one visit.

this could be seen as dataloss when the history menu or history sidebar.
i'd like to try to take this

i have found a problem in my patch for bug #386956, it does delete duplicates but only if we have only 2 dupes, with 3 dupes only one of them is removed, the select  takes only 1 id from a list of duplicates... we can however use that "feature" to remap moz_historyvisits entries...

i have an half made patch to fix boh problems, will attach soon, then you can take a decision on that.

my idea is to use the "group by having count" query to get the ids to be retained, remap all dupe visits to that ids, then recount visit_count and remove all other not retained ids, it will however require at least 2 loops.
Status: NEW → ASSIGNED
Assignee: nobody → mak77
Status: ASSIGNED → NEW
I look forward to your patch.

putting this for M8 and blocking fx 3, as it is a dataloss problem (with the vists).  

Additionally, having duplicates in moz_places might cause problems.  We might want to bump the schema and have some MigrateV7Up() that looks for duplicates and removes them.  I've logged a bug on that.
Flags: blocking-firefox3?
Keywords: dataloss
Target Milestone: --- → Firefox 3 M8
Attached patch first attempt (obsolete) — Splinter Review
this is made up of a loop (select statement) with three queries in each cycle

the idea is:
- select place id (that will be retained), duplicated url and total visit count
- remap all historyvisits to that place id
- delete all other duplicated place ids
- update visit count to total visit count

the main disadvantage here (against simply deleting the visits) is that with 10 duplicated urls (each of them can have n duplicates) we end up with 31 queries (one for the loop, 3 for each duplicated url)

i need help in testing this, for functionality and speed
Flags: blocking-firefox3? → blocking-firefox3+
Target Milestone: Firefox 3 M8 → Firefox 3 M9
(In reply to comment #3)
> Created an attachment (id=274462) [details]
> first attempt
> 
> this is made up of a loop (select statement) with three queries in each cycle
> 
> the idea is:
> - select place id (that will be retained), duplicated url and total visit count
> - remap all historyvisits to that place id
> - delete all other duplicated place ids
> - update visit count to total visit count
> 
> the main disadvantage here (against simply deleting the visits) is that with 10
> duplicated urls (each of them can have n duplicates) we end up with 31 queries
> (one for the loop, 3 for each duplicated url)
> 
> i need help in testing this, for functionality and speed
> 

hm, yeah this could be a lengthy procedure. and we'd need to update bookmarks, annotations, etc as well.
Target Milestone: Firefox 3 M9 → Firefox 3 M10
i've added annos and bookmark remapping.

notice: with 10 duplicates this will still end up in 51 queries 

this function atm is only used once in morkimporter
Attachment #274462 - Attachment is obsolete: true
(In reply to comment #5)
> 
> notice: with 10 duplicates this will still end up in 51 queries 
> 
> this function atm is only used once in morkimporter
> 

Ok, that's bearable given that should only occur once. It should also be improved once bug 392193 lands.
this would be useless however if the index on url would be changed to a unique one, and the INSERTs changed into REPLACEs.
Priority: -- → P2
(In reply to comment #7)
> this would be useless however if the index on url would be changed to a unique
> one, and the INSERTs changed into REPLACEs.
> 

Per your comment in bug 381795 (https://bugzilla.mozilla.org/show_bug.cgi?id=381795#c24), it's still needed to upgrade older databases.
yep, this should be called before creating the new UNIQUE index on moz_places.url
Attached patch updated patch (obsolete) — Splinter Review
I needed this to test bug 381795, so cleaned up this patch: removed extra parentheses, and fixed broken statement parameter indexes, and re-ordered parameter indexes where not in numerical order. Other than these things, works fine. Marco, can you take a look at these changes?
Attachment #285718 - Attachment is obsolete: true
Attachment #288974 - Flags: review?(mak77)
Attached file test database (obsolete) —
This is a sqlite file that has duplicate moz_places rows, each with a bookmark, a visit and an annotation. I tested and confirmed that all records are still present and correctly mapped after running w/ this patch applied.
Blocks: 381795
Comment on attachment 288974 [details] [diff] [review]
updated patch

why this?
+    if (NS_FAILED(rv)) {
+      nsCAutoString e;
+      mDBConn->GetLastErrorString(e);
+      printf("SQL ERROR: %s\n", PromiseFlatCString(e).get());
+    }

probably has been forget there. r+ after that
Attachment #288974 - Flags: review?(mak77) → review+
Attachment #288974 - Flags: review?(sspitzer)
dietrich, here are some review comments and a couple nits.

1)  can you confirm that you tested that everything works as expected if we have three or more copies of a uri (in addition to just two copies)?

2) couldn't we also exclude ?1 from the sub select, because we'll just be re-setting id x to id x?

+  rv = mDBConn->CreateStatement(
+      NS_LITERAL_CSTRING(
+        "UPDATE moz_historyvisits "
+        "SET place_id = ?1 "
+        "WHERE place_id IN (SELECT id from moz_places WHERE url = ?2)"),
+      getter_AddRefs(updateStatement));
   NS_ENSURE_SUCCESS(rv, rv);

Something like:

+        "WHERE place_id IN (SELECT id from moz_places WHERE url = ?2 and id <> ?1)"),

That should save us some updates, right?  

Is it worth trying to figure out which is the ideal id to use as ?1

It would be the one with the most historyvisits, right?

3)  don't we need to worry about moz_items_annos for the deleted places?

4)  don't we need to worry about moz_favicons for the deleted places?

5)   +    nsCString url;

Perhaps use a auto string.  If the url is < the default storage size (kDefaultStorageSize, 64 bytes, see http://lxr.mozilla.org/seamonkey/source/xpcom/string/public/nsTString.h#558), we can save a heap allocation.

6)

+    selectStatement->GetUTF8String(1, url)

7)

nit:

+   rv = selectStatement->GetUTF8String(1, url)
+  NS_ENSURE_SUCCESS(rv, rv);
> couldn't we also exclude ?1 from the sub select, because we'll just be
> re-setting id x to id x?

yes, it's resetting place_id also on the retained one, the saving will be 1 update, not much, probably less than 1ms, but could be done

> Is it worth trying to figure out which is the ideal id to use as ?1
> 
> It would be the one with the most historyvisits, right?

atm is sqlite that chooses it with the group by clause, i think that it will retain the last (or first) inserted... some test could be done to see if an order by or a different HAVING clause can force sqlite to choose the desidered id.
exactly. in group by clause, sqlite chooses the last inserted id

> 
> > Is it worth trying to figure out which is the ideal id to use as ?1
> > 
> > It would be the one with the most historyvisits, right?
> 
> atm is sqlite that chooses it with the group by clause, i think that it will
> retain the last (or first) inserted... some test could be done to see if an
> order by or a different HAVING clause can force sqlite to choose the desidered
> id.
> 

confirming marco's comment: in my tests, sqlite chose the last record due to the grouping. however, in our queries such as mDBGetURLPageInfo, the first record in insert order is returned, and has most of the visits, bookmarks, etc. i'll change the select query to return the first record.
> yes, it's resetting place_id also on the retained one, the saving will be 1
> update, not much, probably less than 1ms, but could be done

say I have 100 pairs of duplicates.  That's 100 updates vs 200, right?
> say I have 100 pairs of duplicates.  That's 100 updates vs 200, right?

right, even if i don't know what could be the real number of dupes in a real life db

this is a possible way to get the id with MAX(visit_count):

SELECT
(SELECT id FROM moz_places h WHERE h.url=url ORDER BY visit_count DESC LIMIT 1),
url, SUM(visit_count) AS total_visit_count
FROM moz_places
GROUP BY url HAVING( COUNT(url) > 1)


sorry, typo and useless label

SELECT
(SELECT h.id FROM moz_places h WHERE h.url=url 
  ORDER BY h.visit_count DESC LIMIT 1),
url, SUM(visit_count)
FROM moz_places
GROUP BY url HAVING( COUNT(url) > 1)
(In reply to comment #13)
> dietrich, here are some review comments and a couple nits.
> 
> 1)  can you confirm that you tested that everything works as expected if we
> have three or more copies of a uri (in addition to just two copies)?

Adding a new copy of the test db w/ 3 of everything, works fine.

> 
> 2) couldn't we also exclude ?1 from the sub select, because we'll just be
> re-setting id x to id x?
...
> 
> That should save us some updates, right?  

Fixed.

> 
> Is it worth trying to figure out which is the ideal id to use as ?1
> 
> It would be the one with the most historyvisits, right?

Fixed, thanks for query Marco.

> 
> 3)  don't we need to worry about moz_items_annos for the deleted places?

No, they map to bookmark id, not place id.

> 
> 4)  don't we need to worry about moz_favicons for the deleted places?

Well, I guess it's possible for the dupes to have different favicons, but I think it'll be ok to just use the favicon of our selected id to remap everything to. Favicon should get updated properly on next visit anyways.

> 
> 5)   +    nsCString url;
> 
> Perhaps use a auto string.  If the url is < the default storage size
> (kDefaultStorageSize, 64 bytes, see
> http://lxr.mozilla.org/seamonkey/source/xpcom/string/public/nsTString.h#558),
> we can save a heap allocation.

Fixed

> 
> 6)
> 
> +    selectStatement->GetUTF8String(1, url)
> 
> 7)
> 
> nit:
> 
> +   rv = selectStatement->GetUTF8String(1, url)
> +  NS_ENSURE_SUCCESS(rv, rv);
> 

Fixed
Attached patch fix (obsolete) — Splinter Review
addresses sspitzer & marco's comments
Attachment #288974 - Attachment is obsolete: true
Attachment #289095 - Flags: review?(sspitzer)
Attachment #288974 - Flags: review?(sspitzer)
+      NS_LITERAL_CSTRING("SELECT "
+        "(SELECT id FROM moz_places h WHERE h.url=url ORDER BY visit_count
DESC LIMIT 1), "
+        "url, SUM(visit_count) AS total_visit_count "
+        "FROM moz_places "
+        "GROUP BY url HAVING( COUNT(url) > 1) ORDER BY id"),

why the order by id?
Attached patch fix (obsolete) — Splinter Review
removed "order by id", was left over from testing a previous query.
Attachment #289095 - Attachment is obsolete: true
Attachment #289097 - Flags: review?(sspitzer)
Attachment #289095 - Flags: review?(sspitzer)
Comment on attachment 289097 [details] [diff] [review]
fix

r=sspitzer, thanks dietrich.
Attachment #289097 - Flags: review?(sspitzer) → review+
Assignee: mak77 → dietrich
you can compact and uniform the query as in comment #19 avoiding the useless label and putting h. in the right places. don't know if you have mantained the first by intent though
Attached patch fix (obsolete) — Splinter Review
addresses marco's comment #26
Attachment #289097 - Attachment is obsolete: true
Attachment #289254 - Flags: review?(sspitzer)
Comment on attachment 289254 [details] [diff] [review]
fix

r=sspitzer, thanks dietrich (and marco)
Attachment #289254 - Flags: review?(sspitzer) → review+
+        "url, SUM(visit_count)"

a space is missing here at the end
Attached patch typo fixedSplinter Review
ugh. thanks marco.
Attachment #289254 - Attachment is obsolete: true
Checking in toolkit/components/places/src/nsNavHistory.cpp;
/cvsroot/mozilla/toolkit/components/places/src/nsNavHistory.cpp,v  <--  nsNavHistory.cpp
new revision: 1.193; previous revision: 1.192
done
Status: NEW → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
Bug 451915 - move Firefox/Places bugs to Firefox/Bookmarks and History. Remove all bugspam from this move by filtering for the string "places-to-b-and-h".

In Thunderbird 3.0b, you do that as follows:
Tools | Message Filters
Make sure the correct account is selected. Click "New"
Conditions: Body   contains   places-to-b-and-h
Change the action to "Delete Message".
Select "Manually Run" from the dropdown at the top.
Click OK.

Select the filter in the list, make sure "Inbox" is selected at the bottom, and click "Run Now". This should delete all the bugspam. You can then delete the filter.

Gerv
Component: Places → Bookmarks & History
QA Contact: places → bookmarks
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: