Closed Bug 327330 Opened 18 years ago Closed 18 years ago

Speed up mork history import

Categories

(Firefox :: Bookmarks & History, defect, P3)

defect

Tracking

()

RESOLVED FIXED
Firefox 2 alpha2

People

(Reporter: bryner, Assigned: bryner)

Details

(Keywords: perf)

Attachments

(1 file, 1 obsolete file)

Currently when importing history entries, we call setPageDetails() and then addVisit() to add a visit on the last visit date.  This does more queries than we really need to since in both cases it checks whether the page already exists in history.  We could combine these into a single (non-interface) method on nsNavHistory and reduce the amount of querying.
Priority: -- → P3
Target Milestone: --- → Firefox 2 alpha2
taking and resummarizing
Assignee: brettw → bryner
Summary: Optimize database access when importing history entries → Speed up mork history import
Attached patch many improvements (obsolete) — Splinter Review
The changes here are somewhat extensive.  Here are the highlights:

nsMorkReader:
- Rather than keeping a per-row hashtable of column id to value, I'm now just keeping a nsTArray<nsCString> per row, with the array indexes corresponding to a global column array.  This reduces hashtable operations significantly.

- Atom map _keys_ (not values) now use a 9-byte string buffer built into the key, rather than a heap-allocated string (8 bytes for the string representation of a 32-bit key, plus a byte for the terminating null).  This reduces heap allocations significantly for parsing the atom map.  It does increase the amount of memcpy work when we grow the atom table, however, so one optimization we could do if we need to get more speed is to come up with a heuristic for initially sizing the atom table based on the size of the file.

- Don't MorkUnescape column ids which are atom keys (in practice that's all column ids)

- Consider a colon as a terminator for a row id.  Not doing this was causing us to treat table edits as new rows, probably causing duplicate entries and lost data.

nsNavHistory:
- Added new method AddPageWithVisit, which adds a new entry to history (without checking to see whether it's already there) and optionally adds a visit, doing everything with just 2 statement executions (and the statements cached).

nsMorkHistoryImporter:
- Use AddPageWithVisit, and update for nsMorkReader changes

nsStorageFormHistory:
- Update for nsMorkReader changes
Attachment #213408 - Flags: review?(brettw)
Comment on attachment 213408 [details] [diff] [review]
many improvements

I'm a little uncomfortable with not checking URLs to see if they already exist. However, from my timing experiments we can only check 5000 URLs for existance per second on my powerbook. My history contains 10K items, so this isn't really fast enough. You should be sure the code can not call update for the same item more than once, because we'll get really confused if this happens. Also, we should be sure that this happens before bookmark or default_places import since those will both create history items.
Attachment #213408 - Flags: review?(brettw) → review+
Attached patch better patchSplinter Review
implements the post-import duplicate pruning we talked about
Attachment #213408 - Attachment is obsolete: true
Attachment #213521 - Flags: review?(brettw)
Comment on attachment 213521 [details] [diff] [review]
better patch


>+nsNavHistory::InternalAddNewPage(nsIURI* aURI, const nsAString& aTitle,
>+                                 PRBool aHidden, PRBool aTyped,
>                                  PRInt32 aVisitCount, PRInt64* aPageID)
> {
>   mozStorageStatementScoper scoper(mDBAddNewPage);
>   nsresult rv = BindStatementURI(mDBAddNewPage, 0, aURI);
>   NS_ENSURE_SUCCESS(rv, rv);
> 
>+  // title
>+  rv = mDBAddNewPage->BindStringParameter(1, aTitle);

Check for isNull and use BindNullParameter(1) if so.
Attachment #213521 - Flags: review?(brettw) → review+
checked in, trunk and branch
Status: NEW → RESOLVED
Closed: 18 years ago
Resolution: --- → FIXED
Looks like you forgot to check in the nsMorkReader.* changes, and it broke balsa.
Oh, db/morkreader isn't in the tinderbox module...
biesi checked in a fix for balsa:

Index: mozilla/db/morkreader/nsMorkReader.cpp
===================================================================
RCS file: /cvsroot/mozilla/db/morkreader/nsMorkReader.cpp,v
retrieving revision 1.3
retrieving revision 1.4
diff -r1.3 -r1.4
564c564
<   nsAutoPtr< nsTArray<nsCString> > array = new nsTArray<nsCString>(aCount);
---
>   nsAutoPtr< nsTArray<nsCString> > array(new nsTArray<nsCString>(aCount));

Didn't land on the 1.8 branch yet.

(dbaron fixed the tinderbox module in bug 328979)
Nick's fix is now on the branch.
Bug 451915 - move Firefox/Places bugs to Firefox/Bookmarks and History. Remove all bugspam from this move by filtering for the string "places-to-b-and-h".

In Thunderbird 3.0b, you do that as follows:
Tools | Message Filters
Make sure the correct account is selected. Click "New"
Conditions: Body   contains   places-to-b-and-h
Change the action to "Delete Message".
Select "Manually Run" from the dropdown at the top.
Click OK.

Select the filter in the list, make sure "Inbox" is selected at the bottom, and click "Run Now". This should delete all the bugspam. You can then delete the filter.

Gerv
Component: Places → Bookmarks & History
QA Contact: places → bookmarks
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: