Closed
Bug 389789
Opened 18 years ago
Closed 17 years ago
nsNavHistory::RemoveDuplicateURIs() should not delete moz_historyvisits, it should remap them to the moz_place id we are keeping
Categories
(Firefox :: Bookmarks & History, defect, P2)
Firefox
Bookmarks & History
Tracking
()
RESOLVED
FIXED
Firefox 3 beta2
People
(Reporter: moco, Assigned: dietrich)
References
()
Details
(Keywords: dataloss)
Attachments
(2 files, 7 obsolete files)
120.00 KB,
application/octet-stream
|
Details | |
5.77 KB,
patch
|
Details | Diff | Splinter Review |
nsNavHistory::RemoveDuplicateURIs() should not delete moz_historyvisits, it should remap them to the moz_place id we are keeping
with (or without) marco's change for bug #386956, there is a problem with nsNavHistory::RemoveDuplicateURIs()
say in fx 2 I visited the following two pages:
http://talkback-public.mozilla.org/search/start.jsp?search=2&type=iid&id=984023'
http://talkback-public.mozilla.org/searchstart.jsp?search=2&type=iid&id=984023%27
after migrating to fx 2, we would only end up with one entry in moz_places but with two visits. but in moz_historyvists, we would only have one visit.
this could be seen as dataloss when the history menu or history sidebar.
Comment 1•18 years ago
|
||
i'd like to try to take this
i have found a problem in my patch for bug #386956, it does delete duplicates but only if we have only 2 dupes, with 3 dupes only one of them is removed, the select takes only 1 id from a list of duplicates... we can however use that "feature" to remap moz_historyvisits entries...
i have an half made patch to fix boh problems, will attach soon, then you can take a decision on that.
my idea is to use the "group by having count" query to get the ids to be retained, remap all dupe visits to that ids, then recount visit_count and remove all other not retained ids, it will however require at least 2 loops.
Status: NEW → ASSIGNED
Updated•18 years ago
|
Assignee: nobody → mak77
Status: ASSIGNED → NEW
Reporter | ||
Comment 2•18 years ago
|
||
I look forward to your patch.
putting this for M8 and blocking fx 3, as it is a dataloss problem (with the vists).
Additionally, having duplicates in moz_places might cause problems. We might want to bump the schema and have some MigrateV7Up() that looks for duplicates and removes them. I've logged a bug on that.
Comment 3•18 years ago
|
||
this is made up of a loop (select statement) with three queries in each cycle
the idea is:
- select place id (that will be retained), duplicated url and total visit count
- remap all historyvisits to that place id
- delete all other duplicated place ids
- update visit count to total visit count
the main disadvantage here (against simply deleting the visits) is that with 10 duplicated urls (each of them can have n duplicates) we end up with 31 queries (one for the loop, 3 for each duplicated url)
i need help in testing this, for functionality and speed
Updated•18 years ago
|
Flags: blocking-firefox3? → blocking-firefox3+
Assignee | ||
Updated•18 years ago
|
Target Milestone: Firefox 3 M8 → Firefox 3 M9
Assignee | ||
Comment 4•17 years ago
|
||
(In reply to comment #3)
> Created an attachment (id=274462) [details]
> first attempt
>
> this is made up of a loop (select statement) with three queries in each cycle
>
> the idea is:
> - select place id (that will be retained), duplicated url and total visit count
> - remap all historyvisits to that place id
> - delete all other duplicated place ids
> - update visit count to total visit count
>
> the main disadvantage here (against simply deleting the visits) is that with 10
> duplicated urls (each of them can have n duplicates) we end up with 31 queries
> (one for the loop, 3 for each duplicated url)
>
> i need help in testing this, for functionality and speed
>
hm, yeah this could be a lengthy procedure. and we'd need to update bookmarks, annotations, etc as well.
Updated•17 years ago
|
Target Milestone: Firefox 3 M9 → Firefox 3 M10
Comment 5•17 years ago
|
||
i've added annos and bookmark remapping.
notice: with 10 duplicates this will still end up in 51 queries
this function atm is only used once in morkimporter
Attachment #274462 -
Attachment is obsolete: true
Assignee | ||
Comment 6•17 years ago
|
||
(In reply to comment #5)
>
> notice: with 10 duplicates this will still end up in 51 queries
>
> this function atm is only used once in morkimporter
>
Ok, that's bearable given that should only occur once. It should also be improved once bug 392193 lands.
Comment 7•17 years ago
|
||
this would be useless however if the index on url would be changed to a unique one, and the INSERTs changed into REPLACEs.
Assignee | ||
Updated•17 years ago
|
Priority: -- → P2
Assignee | ||
Updated•17 years ago
|
Assignee | ||
Comment 8•17 years ago
|
||
(In reply to comment #7)
> this would be useless however if the index on url would be changed to a unique
> one, and the INSERTs changed into REPLACEs.
>
Per your comment in bug 381795 (https://bugzilla.mozilla.org/show_bug.cgi?id=381795#c24), it's still needed to upgrade older databases.
Comment 9•17 years ago
|
||
yep, this should be called before creating the new UNIQUE index on moz_places.url
Assignee | ||
Comment 10•17 years ago
|
||
I needed this to test bug 381795, so cleaned up this patch: removed extra parentheses, and fixed broken statement parameter indexes, and re-ordered parameter indexes where not in numerical order. Other than these things, works fine. Marco, can you take a look at these changes?
Attachment #285718 -
Attachment is obsolete: true
Attachment #288974 -
Flags: review?(mak77)
Assignee | ||
Comment 11•17 years ago
|
||
This is a sqlite file that has duplicate moz_places rows, each with a bookmark, a visit and an annotation. I tested and confirmed that all records are still present and correctly mapped after running w/ this patch applied.
Comment 12•17 years ago
|
||
Comment on attachment 288974 [details] [diff] [review]
updated patch
why this?
+ if (NS_FAILED(rv)) {
+ nsCAutoString e;
+ mDBConn->GetLastErrorString(e);
+ printf("SQL ERROR: %s\n", PromiseFlatCString(e).get());
+ }
probably has been forget there. r+ after that
Attachment #288974 -
Flags: review?(mak77) → review+
Assignee | ||
Updated•17 years ago
|
Attachment #288974 -
Flags: review?(sspitzer)
Reporter | ||
Comment 13•17 years ago
|
||
dietrich, here are some review comments and a couple nits.
1) can you confirm that you tested that everything works as expected if we have three or more copies of a uri (in addition to just two copies)?
2) couldn't we also exclude ?1 from the sub select, because we'll just be re-setting id x to id x?
+ rv = mDBConn->CreateStatement(
+ NS_LITERAL_CSTRING(
+ "UPDATE moz_historyvisits "
+ "SET place_id = ?1 "
+ "WHERE place_id IN (SELECT id from moz_places WHERE url = ?2)"),
+ getter_AddRefs(updateStatement));
NS_ENSURE_SUCCESS(rv, rv);
Something like:
+ "WHERE place_id IN (SELECT id from moz_places WHERE url = ?2 and id <> ?1)"),
That should save us some updates, right?
Is it worth trying to figure out which is the ideal id to use as ?1
It would be the one with the most historyvisits, right?
3) don't we need to worry about moz_items_annos for the deleted places?
4) don't we need to worry about moz_favicons for the deleted places?
5) + nsCString url;
Perhaps use a auto string. If the url is < the default storage size (kDefaultStorageSize, 64 bytes, see http://lxr.mozilla.org/seamonkey/source/xpcom/string/public/nsTString.h#558), we can save a heap allocation.
6)
+ selectStatement->GetUTF8String(1, url)
7)
nit:
+ rv = selectStatement->GetUTF8String(1, url)
+ NS_ENSURE_SUCCESS(rv, rv);
Comment 14•17 years ago
|
||
> couldn't we also exclude ?1 from the sub select, because we'll just be
> re-setting id x to id x?
yes, it's resetting place_id also on the retained one, the saving will be 1 update, not much, probably less than 1ms, but could be done
> Is it worth trying to figure out which is the ideal id to use as ?1
>
> It would be the one with the most historyvisits, right?
atm is sqlite that chooses it with the group by clause, i think that it will retain the last (or first) inserted... some test could be done to see if an order by or a different HAVING clause can force sqlite to choose the desidered id.
Comment 15•17 years ago
|
||
exactly. in group by clause, sqlite chooses the last inserted id
Assignee | ||
Comment 16•17 years ago
|
||
>
> > Is it worth trying to figure out which is the ideal id to use as ?1
> >
> > It would be the one with the most historyvisits, right?
>
> atm is sqlite that chooses it with the group by clause, i think that it will
> retain the last (or first) inserted... some test could be done to see if an
> order by or a different HAVING clause can force sqlite to choose the desidered
> id.
>
confirming marco's comment: in my tests, sqlite chose the last record due to the grouping. however, in our queries such as mDBGetURLPageInfo, the first record in insert order is returned, and has most of the visits, bookmarks, etc. i'll change the select query to return the first record.
Reporter | ||
Comment 17•17 years ago
|
||
> yes, it's resetting place_id also on the retained one, the saving will be 1
> update, not much, probably less than 1ms, but could be done
say I have 100 pairs of duplicates. That's 100 updates vs 200, right?
Comment 18•17 years ago
|
||
> say I have 100 pairs of duplicates. That's 100 updates vs 200, right?
right, even if i don't know what could be the real number of dupes in a real life db
this is a possible way to get the id with MAX(visit_count):
SELECT
(SELECT id FROM moz_places h WHERE h.url=url ORDER BY visit_count DESC LIMIT 1),
url, SUM(visit_count) AS total_visit_count
FROM moz_places
GROUP BY url HAVING( COUNT(url) > 1)
Comment 19•17 years ago
|
||
sorry, typo and useless label
SELECT
(SELECT h.id FROM moz_places h WHERE h.url=url
ORDER BY h.visit_count DESC LIMIT 1),
url, SUM(visit_count)
FROM moz_places
GROUP BY url HAVING( COUNT(url) > 1)
Assignee | ||
Comment 20•17 years ago
|
||
(In reply to comment #13)
> dietrich, here are some review comments and a couple nits.
>
> 1) can you confirm that you tested that everything works as expected if we
> have three or more copies of a uri (in addition to just two copies)?
Adding a new copy of the test db w/ 3 of everything, works fine.
>
> 2) couldn't we also exclude ?1 from the sub select, because we'll just be
> re-setting id x to id x?
...
>
> That should save us some updates, right?
Fixed.
>
> Is it worth trying to figure out which is the ideal id to use as ?1
>
> It would be the one with the most historyvisits, right?
Fixed, thanks for query Marco.
>
> 3) don't we need to worry about moz_items_annos for the deleted places?
No, they map to bookmark id, not place id.
>
> 4) don't we need to worry about moz_favicons for the deleted places?
Well, I guess it's possible for the dupes to have different favicons, but I think it'll be ok to just use the favicon of our selected id to remap everything to. Favicon should get updated properly on next visit anyways.
>
> 5) + nsCString url;
>
> Perhaps use a auto string. If the url is < the default storage size
> (kDefaultStorageSize, 64 bytes, see
> http://lxr.mozilla.org/seamonkey/source/xpcom/string/public/nsTString.h#558),
> we can save a heap allocation.
Fixed
>
> 6)
>
> + selectStatement->GetUTF8String(1, url)
>
> 7)
>
> nit:
>
> + rv = selectStatement->GetUTF8String(1, url)
> + NS_ENSURE_SUCCESS(rv, rv);
>
Fixed
Assignee | ||
Comment 21•17 years ago
|
||
Attachment #288975 -
Attachment is obsolete: true
Assignee | ||
Comment 22•17 years ago
|
||
addresses sspitzer & marco's comments
Attachment #288974 -
Attachment is obsolete: true
Attachment #289095 -
Flags: review?(sspitzer)
Attachment #288974 -
Flags: review?(sspitzer)
Reporter | ||
Comment 23•17 years ago
|
||
+ NS_LITERAL_CSTRING("SELECT "
+ "(SELECT id FROM moz_places h WHERE h.url=url ORDER BY visit_count
DESC LIMIT 1), "
+ "url, SUM(visit_count) AS total_visit_count "
+ "FROM moz_places "
+ "GROUP BY url HAVING( COUNT(url) > 1) ORDER BY id"),
why the order by id?
Assignee | ||
Comment 24•17 years ago
|
||
removed "order by id", was left over from testing a previous query.
Attachment #289095 -
Attachment is obsolete: true
Attachment #289097 -
Flags: review?(sspitzer)
Attachment #289095 -
Flags: review?(sspitzer)
Reporter | ||
Comment 25•17 years ago
|
||
Comment on attachment 289097 [details] [diff] [review]
fix
r=sspitzer, thanks dietrich.
Attachment #289097 -
Flags: review?(sspitzer) → review+
Reporter | ||
Updated•17 years ago
|
Updated•17 years ago
|
Assignee: mak77 → dietrich
Comment 26•17 years ago
|
||
you can compact and uniform the query as in comment #19 avoiding the useless label and putting h. in the right places. don't know if you have mantained the first by intent though
Assignee | ||
Comment 27•17 years ago
|
||
addresses marco's comment #26
Attachment #289097 -
Attachment is obsolete: true
Attachment #289254 -
Flags: review?(sspitzer)
Reporter | ||
Comment 28•17 years ago
|
||
Comment on attachment 289254 [details] [diff] [review]
fix
r=sspitzer, thanks dietrich (and marco)
Attachment #289254 -
Flags: review?(sspitzer) → review+
Comment 29•17 years ago
|
||
+ "url, SUM(visit_count)"
a space is missing here at the end
Assignee | ||
Comment 30•17 years ago
|
||
ugh. thanks marco.
Attachment #289254 -
Attachment is obsolete: true
Assignee | ||
Comment 31•17 years ago
|
||
Checking in toolkit/components/places/src/nsNavHistory.cpp;
/cvsroot/mozilla/toolkit/components/places/src/nsNavHistory.cpp,v <-- nsNavHistory.cpp
new revision: 1.193; previous revision: 1.192
done
Status: NEW → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
Comment 32•15 years ago
|
||
Bug 451915 - move Firefox/Places bugs to Firefox/Bookmarks and History. Remove all bugspam from this move by filtering for the string "places-to-b-and-h".
In Thunderbird 3.0b, you do that as follows:
Tools | Message Filters
Make sure the correct account is selected. Click "New"
Conditions: Body contains places-to-b-and-h
Change the action to "Delete Message".
Select "Manually Run" from the dropdown at the top.
Click OK.
Select the filter in the list, make sure "Inbox" is selected at the bottom, and click "Run Now". This should delete all the bugspam. You can then delete the filter.
Gerv
Component: Places → Bookmarks & History
QA Contact: places → bookmarks
You need to log in
before you can comment on or make changes to this bug.
Description
•