Closed Bug 351351 Opened 18 years ago Closed 14 years ago

Change default history format from mork to sqlite

Categories

(Camino Graveyard :: History, defect)

All
macOS
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED
Camino2.1

People

(Reporter: hwaara, Assigned: chris)

References

Details

Attachments

(5 files, 5 obsolete files)

64.46 KB, text/plain
Details
204.59 KB, patch
nick.kreeger
: review+
Details | Diff | Splinter Review
106.18 KB, patch
Details | Diff | Splinter Review
66.50 KB, patch
stuart.morgan+bugzilla
: superreview+
Details | Diff | Splinter Review
34.45 KB, patch
stuart.morgan+bugzilla
: superreview+
Details | Diff | Splinter Review
Every time I start Camino, it hangs for a few seconds. I always suspected this is due to my large history (I've set it to remember the last 365 days).

So, today I profiled Camino with Shark. Lo and behold, mork is eating 66% (!) of the startup time for me. nsSimpleGlobalHistory is the caller.

I see a few improvements that could be made:

1) Display some kind of progress while loading history. I don't know if we can safely put the mork code on a thread, though...

2) Change our format to sqlite, like Firefox, and use the morkReader app to convert previous users.

Core already did a lot of hard work for us. There are the mozIStorage interfaces for talking to SQLite, and there's a small morkReader app that will take care of the conversion. 

I think this could greatly improve performance, and allow for more autocomplete/history features.
Note that this profile does not apply for new users, or users without a history.

I also profiled opening up Camino the first time (without a profile) and the second time, and obviously the history didn't show up there.
We'd have to ship it on OS's earlier than Tiger, and are we sure that would really solve our perf problems? 
(In reply to comment #3)
> We'd have to ship it on OS's earlier than Tiger, and are we sure that would
> really solve our perf problems? 

I actually think we already ship & link mozStorage, due to the DOM code requiring it.

Wasn't the whole reason for having mozStorage/SQLite as a replacement for mork that it was much faster? IIRC, Vlad did some investigation before the Places work started.
It can be faster, but can also be slower depending on how it's used.  It's far more featureful, and future new-world bookmarks/history backend stuff will almost certainly use it.
I think Camino would be doing good to switch to the places history backend that is sqlite-based, directly replaces the mork-based one in Gecko without any additional work, and is included with the Mozilla toolkit. Along with (or just after) a xpfe-to-toolkit switch, it should probably be easy to turn on, as we are seeing with SeaMonkey (our old UI that bases on XUL templates is the major roadblocker there, but you are probably doing stuff differently for the UI anyways).
(In reply to comment #3)
> We'd have to ship it on OS's earlier than Tiger, and are we sure that would
> really solve our perf problems? 

Since this is only about trunk and trunk requires 10.4 now, this isn't a worry any more.
Hardware: Macintosh → All
Now that Firefox3 is saving history using SQLite (Bug 242207) it may be time to
close this bug.
(In reply to comment #8)
> Now that Firefox3 is saving history using SQLite (Bug 242207) it may be time to
> close this bug.

This is a Camino bug, not a Firefox bug.
Attached patch Initial work (obsolete) — Splinter Review
Here is my first go with moving Camino's history backend to Places. It includes the necessary additions and changes to get Places to build and be linked against. I have only got it working with a srcdir build, but hopefully it will be relatively straightforward to get it working with the other build styles.

There are no visible changes to Camino, other than maybe a *slight* speedup when dealing with large history files. Places automatically migrates the mork db over when it first finds it.

The patch is probably reviewable in its current state, although I may find improvements that can be made over the next few days. It's pretty usable in the mean time, and will be useful for 1.9.1 build work that needs to be done.
Assignee: nobody → trendyhendy2000
Status: NEW → ASSIGNED
Forgot to say: the following files in camino/src/history are no longer necessary:

nsIHistoryDisplay.h
nsSimpleGlobalHistory.h
nsIHistoryItems.h
nsIHistoryObserver.h
nsSimpleGlobalHistory.cpp
Project patch also needs to remove the Mork libs (and any irrelevant associated search paths, etc.).
Just adding some more notes-to-self here, for fixing this patch up for 1.9.0 based on the 1.9.1 hacking:

There are also libplaces.dylibs going into Resources, the History prefPane needs some fixes for the static build[1] (aren't static builds fun?), and there may be more static build fixes needed once we have a static build that launches.

http://pastebin.mozilla.org/701504
(In reply to comment #13)
the History prefPane
> needs some fixes for the static build[1] (aren't static builds fun?), and there
> may be more static build fixes needed once we have a static build that
> launches.

History.prefPane's Deployment target needs to have ../dist/include/places in the Header Search paths, CmStaticApp needs to link against ../staticlib/components/libplaces.a, and nsPlacesModule needs to go into nsStaticComponents.

I'll see about respinning a complete 1.9.0 patch later on.
Attached patch v1.1 (obsolete) — Splinter Review
This patch contains the changes for comment 11-comment 14.  (I haven't yet rebuilt my 1.9.0 tree with it to sanity-check the changes, but they should be OK.)

To be perfectly honest, if hendy hasn't found any improvements or fixes to make yet, I'd prefer we start to get this into the review process and landed on 1.9.0, because no matter what we do, we want this change, it's large and needs to be in for a while before a1, and juggling both it and Dan's (also large) perf fix outside the tree at the same time has the potential for unhappiness.
Attachment #424924 - Attachment is obsolete: true
(In reply to comment #15)
> Created an attachment (id=426044) [details]
> v1.1

patch is malformed:

patch unexpectedly ends in middle of line
patch: **** malformed patch at line 4282:
For some reason diff kept dying in nsSimpleGlobalHistory.cpp (I replicated it a couple of times trying to create one that would be whole), but I finally coerced a complete one.

This is the same as 1.1, except not malformed, and with a new method on the observer in HistoryDataSource from hendy: "This method is required for nsNavHistoryObservers in 1.9.2 and later. It doesn't hurt to have it in there for earlier Geckos, as we don't do anything with it anyway."
Attachment #426044 - Attachment is obsolete: true
Attachment #426158 - Flags: review?(nick.kreeger)
Attached patch v1.2, part 1: code changes only (obsolete) — Splinter Review
For ease of reviewing, here's v1.2 divided into three chunks: 

1) the changes to Camino code files
2) the changes and file additions for build-config (project, xcconfig, Makefiles, etc.)
3) the file removals for our nsSimpleGlobalHistory impl
Comment on attachment 426158 [details] [diff] [review]
v1.2, with the whole diff, and a new method on the observer


Phew, sorry for the delay on this - the past couple of weeks have been crazy since it's release time around Songbird.

This patch looks mostly good, I'd prefer if you re-worked the gecko calls below to the format that I outlined for each block.

r+ w/ the below changes:


+# Portions created by the Initial Developer are Copyright (C) 2007

Might be worth updating the dates on these new files that have licenses.

+  NSMutableDictionary*    mHistoryItemsDictionary;    // history items indexed by url

Would be nice to know that this is strong-ref in the header.

+  NS_IMETHOD OnVisit(nsIURI *aURI, PRInt64 aVisitID, PRTime aTime, PRInt64 aSessionID,
+                     PRInt64 aReferringID, PRUint32 aTransitionType, PRUint32 *aAdded)

Please NS_ENSURE_ARG_POINTER() all the pointer in-params here please.

+        nsCOMPtr<nsINavHistoryService> histSvc = do_GetService("@mozilla.org/browser/nav-history-service;1");
+        if (!histSvc)
+          return NS_OK;

pass in a reference to a nsresult and NS_ENSURE_SUCCESS() the call to the component manager:
  
  nsresult rv;
  nsCOMPtr<nsINavHistoryService> histSvc =
    do_GetService("@mozilla.org/browser/nav-history-service;1", &rv);
  NS_ENSURE_SUCCESS(rv, rv);

+        nsCOMPtr<nsINavHistoryQuery> query;
+        histSvc->GetNewQuery(getter_AddRefs(query));
+        if (!query)
+          return NS_OK;

About the same here:
  nsCOMPtr<nsINavHistoryQuery> query;
  rv = histSvc->GetNewQuery(getter_AddRefs(query), &rv);
  NS_ENSURE_SUCCESS(rv, rv);

+        PRBool queryParametersSet = NS_SUCCEEDED(query->SetUri(aURI));
+        queryParametersSet &= NS_SUCCEEDED(query->SetMinVisits(1));
+        queryParametersSet &= NS_SUCCEEDED(query->SetMaxVisits(1));
+        if (!queryParametersSet)
+          return NS_OK;

This works, but it will be easier to track down problems if written like:
  rv = query->SetUri(aURI);
  NS_ENSURE_SUCCESS(rv, rv);

  rv = query->SetMinVisits(1);
  NS_ENSURE_SUCCESS(rv, rv);

  rv = query->SetMaxVisits(1);
  NS_ENSURE_SUCCESS(rv, rv);

+        nsCOMPtr<nsINavHistoryQueryOptions> options;
+        histSvc->GetNewQueryOptions(getter_AddRefs(options));
+        if (!options)
+          return NS_OK;

  rv = histSvc->GetNewQueryOptions(getter_AddRefs(options));
  NS_ENSURE_SUCCESS(rv, rv);

+        nsCOMPtr<nsINavHistoryResult> result;
+        histSvc->ExecuteQuery(query, options, getter_AddRefs(result));
+        if (!result)
+          return NS_OK;

  rv = histSvc->ExecuteQuery(query, options, getter_AddRefs(result));
  NS_ENSURE_SUCCESS(rv, rv);

+        nsCOMPtr<nsINavHistoryContainerResultNode> rootNode;
+        result->GetRoot(getter_AddRefs(rootNode));
+        if (!rootNode)
+          return NS_OK;

  rv = result->GetRoot(getter_AddRefs(rootNode));
  NS_ENSURE_SUCCESS(rv, rv);
  NS_ENSURE_TRUE(rootNode, NS_ERROR_UNEXPECTED);  // only if needed.

+        rootNode->SetContainerOpen(PR_TRUE);

  rv = rootNode->SetContainerOpen(PR_TRUE);
  NS_ENSURE_SUCCESS(rv, rv);

+        PRUint32 childCount = 0;
+        rootNode->GetChildCount(&childCount);
+        if (childCount != 1)
+          return NS_OK;

  rv = rootNode->GetChildCount(&childCount);
  NS_ENSURE_SUCCESS(rv, rv);
  NS_ENSURE_TRUE(childCount != 1, NS_ERROR_UNEXPECTED);  // or desired error result

+
+        nsCOMPtr<nsINavHistoryResultNode> resultNode;
+        rootNode->GetChild(0, getter_AddRefs(resultNode));
+        if(!resultNode)
+          return NS_OK;

  rv = rootNode->GetChild(0, getter_AddRefs(resultNode));
  NS_ENSURE_SUCCESS(rv, rv);
  NS_ENSURE_TRUE(resultNode, NS_ERROR_UNEXPECTED);  // if needed again


NOTE: NS_ENSURE_* Logging helps run down problems in debug builds, this is why it's the prefered why of catching exceptions from inside of gecko code. From working on core gecko code for the past 2 years (mostly C code too) I've found this is one of the best ways of tracking down gecko exceptions.

+  NS_IMETHOD OnTitleChanged(nsIURI *aURI, const nsAString & aPageTitle)
   {

Again NS_ENSURE_ARG_POINTER() |aURI|.

+    HistorySiteItem* item = HistoryItemFromURI(aURI);
+    if (!item)
+      return NS_OK;

I prefer: NS_ENSURE_TRUE(item, NS_ERROR_UNEXPECTED); // or NS_OK if needed.

+  NS_IMETHOD OnDeleteURI(nsIURI *aURI)

Again, NS_ENSURE_ARG_POINTER(aURI);

-  HistoryDataSource*    mDataSource;
+  HistoryDataSource* mDataSource;
 };
 
Please add a comment specifying that this is a |weak| reference on |mDataSource|.

+  nsCOMPtr<nsINavHistoryQuery> query;
+  mNavHistoryService->GetNewQuery(getter_AddRefs(query));
+  if (!query)
+    return;

Another way to work w/ gecko API's in ObjC code is to do something like:
  rv = mNavHistoryService->GetNewQuery(getter_AddRefs(query));
  NS_ENSURE_SUCCESS(rv, /* void */);
  NS_ENSURE_TRUE(query, /* void */);

This will still log out the gecko exceptions in debug builds - very handy when running down issues.

+  nsCOMPtr<nsINavHistoryQueryOptions> options;
+  mNavHistoryService->GetNewQueryOptions(getter_AddRefs(options));
+  if (!options)
+    return;
  rv = mNavHistoryService->GetNewQueryOptions(getter_AddRefs(options));
  NS_ENSURE_SUCCESS(rv, /* void */);

+  nsCOMPtr<nsINavHistoryResult> result;
+  mNavHistoryService->ExecuteQuery(query, options, getter_AddRefs(result));
+  if (!result)
+    return;

  rv = mNavHistoryService->ExecuteQuery(query, options, getter_AddRefs(result));
  NS_ENSURE_SUCCESS(rv, /* void */);

+  nsCOMPtr<nsINavHistoryContainerResultNode> rootNode;
+  result->GetRoot(getter_AddRefs(rootNode));
+  if (!rootNode)
+    return;
  
  rv = result->GetRoot(getter_AddRefs(rootNode));
  NS_ENSURE_SUCCESS(rv, /* void */);

+  rootNode->SetContainerOpen(PR_TRUE);
+  PRUint32 childCount = 0;
+  rootNode->GetChildCount(&childCount);

  rv = rootNode->SetContainerOpen(PR_TRUE);
  NS_ENSURE_SUCCESS(rv, /* void */);
  
  PRUint32 childCount = 0;
  rv = rootNode->GetChildCount(&childCount);
  NS_ENSURE_SUCCESS(rv, /* void */);


Let's work on this block as well:
 - (NSDate*)firstVisit
 {
+  if (!mFirstVisitDate) {
+    // First set up a sensible default in case we can't get the true date.
+    mFirstVisitDate = [[NSDate alloc] init];
+
+    nsCOMPtr<nsINavHistoryService> histSvc = do_GetService("@mozilla.org/browser/nav-history-service;1");
+    if (!histSvc)
+      return mFirstVisitDate;

  nsCOMPtr<nsINavHistoryService> histSvn =
    do_GetService("@mozilla.org/browser/nav-history-service;1", &rv);
  NS_ENSURE_SUCCESS(rv, mFirstVisitDate);

+    nsCOMPtr<nsINavHistoryQuery> query;
+    histSvc->GetNewQuery(getter_AddRefs(query));
+    if (!query)
+      return mFirstVisitDate;

  rv = histSvc->GetNewQuery(getter_AddRefs(query));
  NS_ENSURE_SUCCESS(rv, mFirstVisitDate);

+    nsCOMPtr<nsIURI> uri;
+    if (NS_FAILED(NS_NewURI(getter_AddRefs(uri), [mURL UTF8String])))
+      return mFirstVisitDate;

  rv = NS_NewURI(getter_AddRefs(uri), [mURL UTF8String]);
  NS_ENSURE_SUCCESS(rv, mFirstVisitDate);

+    if (NS_FAILED(query->SetUri(uri)))
+      return mFirstVisitDate;

  rv = query->SetUri(uri);
  NS_ENSURE_SUCCESS(rv, mFirstVisitDate);

+    nsCOMPtr<nsINavHistoryQueryOptions> options;
+    histSvc->GetNewQueryOptions(getter_AddRefs(options));
+    if (!options)
+      return mFirstVisitDate;

  rv = histSvc->GetNewQueryOptions(getter_AddRefs(options));
  NS_ENSURE_SUCCESS(mFirstVisitDate, mFirstVisitDate);

+    if (NS_FAILED(options->SetResultType(nsINavHistoryQueryOptions::RESULTS_AS_VISIT)))
+      return mFirstVisitDate;

  rv = options->SetResultType(
    nsINavHistoryQueryOptions::RESULTS_AS_VISIT));
  NS_ENSURE_SUCCESS(rv, mFirstVisitDate);

+    if (NS_FAILED(options->SetSortingMode(nsINavHistoryQueryOptions::SORT_BY_DATE_ASCENDING)))
+      return mFirstVisitDate;

  rv = options->SetSortingMode(
    nsINavHistoryQueryOptions::SORT_BY_DATE_ASCENDING);
  NS_ENSURE_SUCCESS(rv, mFirstVisitDate);

+    // We only want one result: the oldest.
+    if (NS_FAILED(options->SetMaxResults(1)))
+      return mFirstVisitDate;

  rv = options->SetMaxResults(1);
  NS_ENSURE_SUCCESS(rv, mFirstVisitDate);

+    nsCOMPtr<nsINavHistoryResult> result;
+    histSvc->ExecuteQuery(query, options, getter_AddRefs(result));
+    if (!result)
+      return mFirstVisitDate;

  rv = histSvc->ExecuteQuery(query, options, getter_AddRefs(result));
  NS_ENSURE_SUCCESS(rv, mFirstVisitDate);
  NS_ENSURE_TRUE(result, mFirstVisitDate);  // if needed.

+    nsCOMPtr<nsINavHistoryContainerResultNode> rootNode;
+    result->GetRoot(getter_AddRefs(rootNode));
+    if (!rootNode)
+      return mFirstVisitDate;

  rv = result->GetRoot(getter_AddRefs(rootNode));
  NS_ENSURE_SUCCESS(rv, mFirstVisitDate);
  NS_ENSURE_TRUE(mFirstVisitDate, mFirstVisitDate);  // if needed.

+    rootNode->SetContainerOpen(PR_TRUE);
+    PRUint32 childCount = 0;
+    rootNode->GetChildCount(&childCount);

  rv = rootNode->SetContainerOpen(PR_TRUE);
  NS_ENSURE_SUCCESS(rv, mFirstVisitDate);

  PRUint32 childCount = 0;
  rv = rootNode->GetChildCount(&childCount);
  NS_ENSURE_SUCCESS(rv, mFirstVisitDate);

+    nsCOMPtr<nsINavHistoryResultNode> resultNode;
+    if (childCount)
+      rootNode->GetChild(0, getter_AddRefs(resultNode));
+
+    PRTime firstVisit;
+    if (resultNode && NS_SUCCEEDED(resultNode->GetTime(&firstVisit))) {
+      [mFirstVisitDate autorelease];
+      mFirstVisitDate = [[NSDate dateWithPRTime:firstVisit] retain];
+    }

  if (childCount) {
    rv = rootNode->GetChild(0, getter_AddRefs(resultNode));
    NS_ENSURE_SUCCESS(rv, mFirstVisitDate);

    PRTime firstVisit;
    rv = resultNode->GetTime(&firstVisit);
    NS_ENSURE_SUCCESS(rv, mFirstVisitDate);

    [mFirstVisitDate autorelease];  // why not just |release?|.
    mFirstVisitDate = [[NSDate dateWithPRTime:firstVisit] retain];
  }
Attachment #426158 - Flags: review?(nick.kreeger) → review+
This just merges the project changes with current cvs trunk (new files added by bug 306613) so that this patch applies again.
Attachment #426419 - Attachment is obsolete: true
New patch of code changes, with review comments addressed. The other patches shouldn't need updating, except maybe the build-change one to update the years in the license blocks.
Attachment #426418 - Attachment is obsolete: true
Attachment #431304 - Flags: superreview?(stuart.morgan+bugzilla)
Attachment #431304 - Flags: review+
Comment on attachment 431304 [details] [diff] [review]
v1.3, part 1: code changes only 

>+        NS_ENSURE_TRUE(query, NS_ERROR_UNEXPECTED);

I just have to say for the record... this macro name sucks :( I had to go read the definition of the macro to convince myself it was safe to use on a pointer. I see that it's commonly used this way in core, so it's fine here, but yuck.

>+        rootNode->SetContainerOpen(PR_TRUE);
>+        PRUint32 childCount = 0;
>+        rv = rootNode->GetChildCount(&childCount);
>+        NS_ENSURE_SUCCESS(rv, rv);
>+        NS_ENSURE_TRUE(childCount != 1, NS_ERROR_UNEXPECTED);

Why != 1? Seems like it could use a comment explaining, since as a reader who doesn't know the API this is the only part of the function that didn't seem straightforward (and a glance at the header wasn't illuminating).

>+  for (uint i = 0; i < childCount; i++) {

We generally use "unsigned int".

>+// Returns the date of the first visit to this item. Finding this date involves
>+// querying Places for the earliest visit to the item's URL. Doing this when
>+// initializing the item is slower by an order of magnitude, presumably because
>+// we are already accessing the database.

Can you clarify the second sentence in this comment? Do you mean 'getting the first visit count of everything during initial load vs not getting it at all is 10x slower' or 'getting the first visit count of one item is 10x slower during initial load than it would be later on'? The second would surprise me, whereas the first makes sense to me (assuming that what I'm inferring about the DB structure from some comments in the IDL is correct).

The last part of the comment confuses (especially if you mean the second), since it's opposite what one would expect about disk locality.


sr- only because I'd like to see the updated comments. Overall this is awesome :)
Attachment #431304 - Flags: superreview?(stuart.morgan+bugzilla) → superreview-
SR comments addressed.

>Why != 1? 

Oops, that should be == 1, since we are expecting only 1 result from our query.

>Do you mean 'getting the first visit count of everything during initial load
>vs not getting it at all is 10x slower' or 'getting the first visit count of
>one item is 10x slower during initial load than it would be later on'?

I presume it is the former. Moving the query from init to firstVisit meant loading about:history went from ~50 seconds to less than 5.
Attachment #431304 - Attachment is obsolete: true
Attachment #432361 - Flags: superreview?(stuart.morgan+bugzilla)
Comment on attachment 432361 [details] [diff] [review]
Fix v1.4, part 1: code changes only

sr=smorgan
Attachment #432361 - Flags: superreview?(stuart.morgan+bugzilla) → superreview+
Comment on attachment 430886 [details] [diff] [review]
v1.2, part 2: build-related changes (merged to trunk)

sr=smorgan on build changes too. Ship it!
Attachment #430886 - Flags: superreview+
(In reply to comment #27)
> (From update of attachment 430886 [details] [diff] [review])
> sr=smorgan on build changes too. Ship it!

So shipped!

(without the MOZ_MORKREADER in confvars.sh, because MOZ_PLACES=1 does that for us)

[8:39pm] • sauron applauds hendy
[8:39pm] hendy: well done everyone

Everyone be sure to do a full rebuild of your trees to pick up the changes that make the new Core stuff build.
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Flags: camino2.1a1? → camino2.1a1+
Resolution: --- → FIXED
Target Milestone: --- → Camino2.1
Some numbers, for the record (and because Stuart asked about the size Saturday):

First column is the 2010-03-13 nightly, second is 2010-03-14 nightly.

Camino.app  47,563,856 B  47,847,289 B  +283,433 B  +0.59%
Disk image  16,556,278 B  16,652,678 B  + 96,400 B  +0.58%

Note the size numbers include the savings we got from no longer linking/shipping libxpautocomplete (since bug 545341 also landed in time for the 3-14 nightly).

Ts on cb-minibinus01 + ~40ms, + 2.6%
Ts on cb-xserve01    +  ~5ms, + 0.8%
Ts on cb-xserve04    + ~20ms, + 3.1%

So, in summary, we're about 1/2 a percent larger and between 1 and 3 percent slower at startup.
are history-related ops once we're started faster though?
(In reply to comment #29)
> between 1 and 3 percent slower at startup.

Ouch :(

hendy, our history loading is currently synchronous on startup, right? Perhaps we should file a bug to do it asynchronously instead (the way autocomplete works).
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: