Typing in location bar is horrendously slow with large history file after new autocomplete landed

RESOLVED FIXED in Camino2.1

Status

defect
--
major
RESOLVED FIXED
10 years ago
9 years ago

People

(Reporter: bugzilla-graveyard, Assigned: stuart.morgan+bugzilla)

Tracking

(Depends on 1 bug)

Trunk
Camino2.1
All
macOS
Dependency tree / graph
Bug Flags:
camino2.1a1 +

Details

Attachments

(8 attachments, 20 obsolete attachments)

20.85 KB, patch
mikepinkerton
: superreview+
Details | Diff | Splinter Review
12.89 KB, patch
mikepinkerton
: superreview+
Details | Diff | Splinter Review
5.35 KB, patch
mikepinkerton
: superreview+
Details | Diff | Splinter Review
23.03 KB, text/plain
Details
21.80 KB, text/plain
Details
31.64 KB, patch
mikepinkerton
: superreview+
Details | Diff | Splinter Review
2.11 KB, text/plain
Details
Fix
43.73 KB, patch
mikepinkerton
: superreview+
Details | Diff | Splinter Review
I just grabbed my first nightly after the autocomplete landing. Typing in the location bar is absolutely unusable. It takes ~ 2 seconds for each letter to appear.

I need to make a canonical "large" history.dat file available for testing like we have for bookmarks.
Severity: blocker → major
Assignee: nobody → dan.j.weber
Blocks: 166288
Can you try with a smaller value for the chunk size constant, and see if that goes any better?
Posted patch Patch 1 (obsolete) — Splinter Review
The problem is that no where else in Camino is ALL of the history data loaded at one time. When this is done using loadLazily on the HistoryDataSource, there is a significant pause and the main thread is tied up. This first-draft patch does not use HistoryDataSource, but instead loads the history data using a chunked asynchronous technique, similar to how the bookmark and history data is searched. Note that this patch does not include the optimization for the common case, where the old search string is a substring of the new search string. That is being worked on separately. Thus, while this patch does allow the main thread to update and won't tie up the GUI, autocompleting from 10k+ history items is still relatively slow.
Attachment #390935 - Flags: review?(stuart.morgan+bugzilla)
Posted patch Patch 2 (obsolete) — Splinter Review
Attachment #390935 - Attachment is obsolete: true
Attachment #390940 - Flags: review?(stuart.morgan+bugzilla)
Attachment #390935 - Flags: review?(stuart.morgan+bugzilla)
Comment on attachment 390940 [details] [diff] [review]
Patch 2

I really really don't want to have another class start leaking Gecko objects from its API. Seems like the asynchronous load would be great as an alternative to (the extremely poorly named) loadLazily in the HistoryDataSource itself, with a callback on completion to its caller. Plus, then you can share the implementation instead of copying it (probably by having a helper that takes a max number to process, and having both versions call that, but I haven't thought that all the way through).

>+  nsISupports*            mCurrentHistoryItem;

There's no need for this to be a member variable at all.

>+  nsISimpleEnumerator*    mHistoryEnumerator;

This is an imposter member variable; it's only really valid during the execution of one method. Instead, you can box the pointer with NSValue and then have it be an argument to the chunked loader function.

> +  int itemCount = 0;
> +  while (itemCount < 500) {

Just use a for loop; there's no harm in incrementing in the unexpected error-getting-thisItem case.

>+    mHistoryEnumerator->GetNext(&mCurrentHistoryItem);

You are leaking the "returned" nsISupports here. Core memory management is really confusing at first though, so I'm not surprised ;)

Like Cocoa objects, core objects are refcounted. There's no autorelease pool though, so ownership is trickier; the convention is that Get---- functions give you a refcounted object. nsCOMPtr is a smart pointer class that manages reference counting for you; basically, when you assign to one it increments the refcount of the thing it's getting, and when the nsCOMPtr is destroyed (usually by going out of scope) the refcount is decremented. When you see "getter_AddRefs" wrapping an output argument, as in the code you based this on, that's a set of typecasting tricks that makes it work out so that the nsCOMPtr doesn't end up adding an extra reference on top of the one that the Get---- function makes and transfers to you.

So if you can't use an nsCOMPtr, you are responsible for calling NS_RELEASE on things you get back from Get-ers.

In this case, you can just make mCurrentHistoryItem an nsCOMPtr when you make it into a local.

>+  if (historyItems) {  

Whitespace at end of line.

>+    historyItems->GetItemEnumerator(&mHistoryEnumerator);

Here you'll have to manage ownership manually; the chunked loader will need to be responsible for NS_RELEASE-ing this when the load is finished.
Attachment #390940 - Flags: review?(stuart.morgan+bugzilla) → review-
Posted patch Patch 3 (obsolete) — Splinter Review
There are still some outstanding issues, but I need a break from this for a bit:
- the autocomplete data starts loading on launch and can take a while if you have a large history file; you can use autocomplete in the meantime, but not all results will show up until all the data is loaded
- if you delete a history item, it will still show up in autocomplete until the next launch
- again, if you have a large history file: on quit, an enormous tree is dealloc'ed and it takes a second
- I don't know if url/title matching is 100% where it should be. Please test and criticize

On another note, I've been testing Safari 4 with 15k history items and it doesn't handle it too well. Beachballs a lot in history view, thanks to that "amazing" coverflow view. Autocomplete beachballs for several seconds the first couple times you use it, and after that it's snappy. Of course it's not even searching titles. In a way, this makes me happy : )
Attachment #390940 - Attachment is obsolete: true
Attachment #393004 - Flags: review?(stuart.morgan+bugzilla)
Comment on attachment 393004 [details] [diff] [review]
Patch 3

Just so I'm straight on how the in-flight stuff interacts, how does this fit in with the bug 504876 patch?

>+  AutoCompleteTrieNode*           mAutoCompleteDataRoot;    // owned

I think we should avoid using the term auto-complete for the data inside the bookmark/history system, favoring a generic term (search term index?) that makes it clear it's generally applicable to doing searches.

>+// Autocomplete
>+- (NSArray *)bookmarksMatchingString:(NSString *)searchString;

We've been opting for quality over consistency in comments for new methods (so that things don't get even worse), so even though the rest of the header doesn't have real comments this new method should.

>+    if ([searchString hasPrefix:scheme] && [searchString length] > [scheme length]) {

May as well swap the order here, to save a bit of work.

>+      NSRange schemeRange = [searchString rangeOfString:scheme];
>+      searchString = [searchString substringFromIndex:NSMaxRange(schemeRange)];

You already know it's a prefix, so you can just use [scheme length].

>+      truncatedScheme = scheme;

Break after this, since there's no need to keep checking other schemes.

>+  for (unsigned i = 0; [mBookmarkResultsInProgress count] < kMaxResultsPerHeading && i < [bookmarkData count]; i++) {
>+    if (truncatedScheme) {
>+      if (![[[bookmarkData objectAtIndex:i] url] hasPrefix:truncatedScheme])
>+        continue;
>+    }
>+    AutoCompleteResult *result = [self autoCompleteResultForItem:[bookmarkData objectAtIndex:i]];
>+    if (![mBookmarkResultsInProgress containsObject:result]) 
>+      [mBookmarkResultsInProgress addObject:result];
>+  }
>+  [identical code, but for history]

I see a helper method calling out to be born :)

>+@interface AutoCompleteTrieNode : NSObject

Needs a class-level comment

>+  NSString*              mString;

Is there something more descriptive this could be named?.

>+// Searches the tree and returns an array of all items
>+// that match the given string.
>+- (NSArray *)itemsForString:(NSString *)string;

You should clarify that "match" means prefix matching.

>+//- (void)removeItem:(id)item withString:(NSString *)string atDepth:(unsigned)depth;

?

>+- (NSArray *)itemsForString:(NSString *)string atDepth:(unsigned)depth;
>+- (NSArray *)keyArrayForItem:(id)item;
>+- (NSArray *)formsOfURL:(NSString *)url;

All the private methods should have comments describing them too.

>+  if (node) {
>+    [node addItem:item];
>+    [node insertItem:item withString:string atDepth:++depth];
>+  } else {
>+    node = [[[AutoCompleteTrieNode alloc] init] autorelease];
>+    [mChildren setObject:node forKey:key];
>+    [self insertItem:item withString:string atDepth:depth];
>+  }

Why the non-incrementing recursion in the case where you need to make a new node, rather than:
if (!node) {
  // make the node and add it to mChildren
}
[node addItem:item];
[self insertItem:item withString:string atDepth:depth]

>+  /*NSArray *keys = [self keyArrayForItem:item];
>+  NSEnumerator *keyEnumerator = [keys objectEnumerator];
>+  NSString *key;
>+  while ((key = [keyEnumerator nextObject])) {
>+    if ([key length] > 0)
>+      [self removeItem:item withString:key atDepth:0];
>+  }*/

?

>+- (NSArray *)itemsForString:(NSString *)string
>+{
>+  NSArray *words = [string componentsSeparatedByString:@" "];
>+  NSMutableArray *items = [[self itemsForString:[words objectAtIndex:0] atDepth:0] mutableCopy];
>+  for (unsigned i = 1; i < [words count]; i++) {
>+    NSString *word = [words objectAtIndex:i];
>+    if ([word length] > 0) {
>+      NSArray *moreItems = [self itemsForString:word atDepth:0];
>+      NSEnumerator *itemEnumerator = [items objectEnumerator];
>+      id item;
>+      while ((item = [itemEnumerator nextObject])) {
>+        if (![moreItems containsObject:item])
>+          [items removeObject:item];

Can you add some explanatory comments here? I'm not clear on what this is doing.

>+  NSString *keyString = [item title];
>+  NSArray *urls = [self formsOfURL:[item url]];
>+  NSEnumerator *urlEnumerator = [urls objectEnumerator];
>+  NSString *url;
>+  while ((url = [urlEnumerator nextObject]))
>+    keyString = [keyString stringByAppendingFormat:@" %@", url];
>+  keyString = [keyString lowercaseString];
>+  return [keyString componentsSeparatedByString:@" "];

Rather than making a big string you don't want, just split the title and then append each URL form; URLs can't contain spaces, so you don't need to split them.

>+  NSURL *nsURL = [NSURL URLWithString:url];

Declare this when you need it, not up here.

>+  [urls addObject:url];
>+  ...
>+      [urls removeLastObject];

Rather than add and then maybe remove it, just keep track of whether you want to add it and then do so after if necessary. I'm confused by the intent here though... the full form of the URL only goes in if it doesn't have a scheme you recognize?

>+    NSRange schemeRange = [url rangeOfString:scheme];
>+    if (schemeRange.location == 0...

[url hasPrefix:scheme];

>+      url = [url substringFromIndex:NSMaxRange(schemeRange)];

Assigning to input parameters tends to be confusing; make a new local instead and initialize it to url.

>+  NSString *nsHost = [nsURL host];

Just "host"; it's not a NSHost.

>+    NSString *restOfURL = [url substringFromIndex:NSMaxRange([url rangeOfString:nsHost])];
>+    [urls addObject:[nsHost stringByAppendingString:restOfURL]];

Isn't this the same as stripping off the parameter (with the really rare exception of in-url login information)?

>+    NSRange lastDot = [nsHost rangeOfString:@"." options:NSBackwardsSearch];

Why do you need this, rather than just stopping when you can't find any more dots?

>+      [urls addObject:[nsHost stringByAppendingString:restOfURL]];

This worries me a bit, size-wise, but I don't have any good suggestions for alternatives at the moment.

I do think though, that if this is structure is going to contain whole URLs, that we should probably use a Patricia trie rather than a pure trie. Many URLs are long strings that only share a relatively short prefix with everything else, and breaking them down to full nodes for individual characters will be a *lot* of memory overhead.

>+- (unsigned)numberOfVisits;

unsigned int

>+  unsigned          mNumberOfVisits;

Same
Attachment #393004 - Flags: review?(stuart.morgan+bugzilla) → review-
(In reply to comment #6)
> I do think though, that if this is structure is going to contain whole URLs,
> that we should probably use a Patricia trie rather than a pure trie. Many URLs
> are long strings that only share a relatively short prefix with everything
> else, and breaking them down to full nodes for individual characters will be a
> *lot* of memory overhead.

I've looked into this and created some test cases, and the problem I see is that we need to be able to get all items for a key or any part of the key. In a trie, "mozilla.com" would be stored under each letter so the item will show up when the user types "m" or "mozil", for example. In a patricia tree, since new nodes are created only when the prefixes diverge, how would we fetch items for a part of the key where it doesn't diverge?

I agree that the memory required for a trie with thousands of URLs is too much. In my original code for this bug, I only built a "trie" up to the first two characters, and then simply searched an array when the search string was longer. Even with 20k history items, this seemed to be fast enough, and it used much less memory. I'm going to go ahead and rework this to use my original, simpler approach unless a better idea comes up.
A patricia trie is doable, but maybe not worth the effort right now; I'll file a follow-up bug for that (and it doesn't even have to be assigned to you ;) ).

I don't think we need to go all the way back to a depth of two though. If we want to put a max depth, let's use something longer (20?) to get most of the domain part in the trie, and just have URLs dangling. But that suggests another approach that should be trivial to implement: slice the path off of the URL, use the trie for just the domain, and then do an array-scan doing prefix searching on the path part. Does that seem like a reasonable short-term compromise?
Posted patch Patch 4 (obsolete) — Splinter Review
Attachment #394462 - Flags: review?(stuart.morgan+bugzilla)
Comment on attachment 394462 [details] [diff] [review]
Patch 4

>+    // Load autocomplete data.
>+    mBookmarkSearchData = [[SearchDataStorage alloc] init];
>+    NSArray* allBookmarks = [[self bookmarkRoot] allChildBookmarks];
>+    NSEnumerator* bookmarkEnumerator = [allBookmarks objectEnumerator];
>+    Bookmark* bookmark;
>+    while ((bookmark = [bookmarkEnumerator nextObject])) {
>+      [mBookmarkSearchData insertItem:bookmark];
>+    }

I forget, is this method running on the main thread? If so, how long does this step take for a large bookmark set?

>+#import "CHBrowserView.h"

What is this import needed for? Seems odd to have a backend class importing a view class.

> const unsigned int kNumberOfItemsPerChunk = 100;
>+const unsigned int kNumberOfItemsToLoadPerChunk = 200;

These need clearer names and/or some comments, since the difference is not clear.

>+// Starts loading history data asynchronously.
>+- (void)loadHistorySearchData;
>+
>+// Loads one chunk of the history data according to kNumberOfItemsToLoadPerChunk,
>+// and inserts each item into the history search storage.
>+- (void)loadChunkOfHistoryFromEnumerator:(NSValue *)enumerator;

What happened to making this part of the history system (per comment 4)?

>+      HistorySiteItem *historyItem = [[[HistorySiteItem alloc] initWithDataSource:nil historyItem:thisItem] autorelease];
>+      [mHistorySearchData insertItem:historyItem];

If the raw history loading does need to stay in this class, you might want to add an alternate API for populating the search data that bypasses the creation of a HistorySiteItem, since that's doing a bunch of work that you don't need (string and date creation for members you'll never access). It would be worth profiling to see how much time that's taking (either with sample, or we can talk about Shark sometime next week).

>+// Returns an array of prefixes for a key, the longest of which is specified by
>+// kMaxPrefixLength. The longer the prefixes are, the faster the search will be,
>+// since prefixes are stored in a dictionary associated with the item they were
>+// derived from. However, we need to store not just the maximum-length prefix,
>+// but all prefixes of a key. For example, the key "mozilla" has the prefixes:
>+// "m", "mo", "moz", "mozi", "mozil", "mozill", "mozilla". Thus, longer prefixes
>+// require significantly more memory.
>+- (NSArray *)prefixesForKey:(NSString *)key;

The memory cost here is going to be crazy; making autocomplete faster at the expense of making the entire app slower due to increased memory pressure is not an option.

>+  if (!schemesToIgnore)
>+    schemesToIgnore = [[NSArray alloc] initWithObjects:@"ftp://", @"http://", @"https://", nil];

Why have a list of schemes to ignore, rather than ignoring all schemes?

>+  if (!topLevelDomainsToIgnore)
>+    topLevelDomainsToIgnore = [[NSArray alloc] initWithObjects:@"com", @"co.uk", nil];

This is way too short a list. The temporary fix (if we need this at all) should be to ignore just the last fragment, whatever it is. We can wire in a true eTLD system later if necessary.



I feel like we're just thrashing on index implementations here. We need performance and numbers for construction, search, and shutdown for approaches, as well as memory cost for the structure being built, for a consistent (large) data set, so that we can really understand the costs and tradeoffs. We also need to profile the slow parts and really see where the cost is coming from in each case so that we can figure out if we can improve an existing approach, rather than throwing away and rewriting each time.

Ping me in channel when you get a chance, and we can talk about how to get hard data for the implementations that have been written so far.
Attachment #394462 - Flags: review?(stuart.morgan+bugzilla) → review-
There are two problems with this new implementation:
1. It's really slow ('strings -a history.dat | grep -c http://' returns '22378' for me). I keep 90 days of history and (used to) love the fact I could find most things by typing a couple of characters of the URL from the last time I visited. Now, it's not just slow, it's clunky - bookmarks triumph over history, and the dropdown gets stuck open even after the page has been selected and loaded (until the URL is cleared from the location bar).

2. It has changed the functionality. I used to be able to choose wikipedia from my recent links and add the page I was after to the end, eg. http://en.wikipedia.org/wiki/ + SSID => http://en.wikipedia.org/wiki/SSID. Now, however, when I type the SSID, it's just ignored and I get directed to the main page of wikipedia. I think this is a bug, but perhaps the implementers of this new approach think it's a feature?

FWIW, these two combined make me very inclined to regress back to the last working version of Camino that didn't have this new "feature"
(In reply to comment #11)
> 1. It's really slow

Thus this bug.

> bookmarks triumph over history,

Nothing at all to do with this bug.

> and the dropdown gets stuck open even after the page has been selected and
> loaded

Also a completely different (already filed) bug.

> 2. It has changed the functionality.

I've never seen what you are talking about. It is, however, completely unrelated to this bug.

If you want to have general discussions, please use the forums, not a random bug that's unrelated to most of what you are talking about.

> FWIW, these two combined make me very inclined to regress back to the last
> working version of Camino that didn't have this new "feature"

If putting up with bugs and partially-finished changes bothers you, then using nightly builds is pretty clearly the wrong choice. That's why we have a stable version.
We should try to get this for a1.
Flags: camino2.1a1?
Posted patch Patch 5 (obsolete) — Splinter Review
Ok, coming back to this after a little break I've improved the trie a lot, particularly the add method, which is a lot faster now. Right now, I only deal with history items, but adding bookmarks back in later shouldn't be hard. With about 18.5k history items, the trie is built in about 10 seconds on my machine, and that's with generous time given to other things on the thread. I tried to make minimal changes to other classes this time around, working with the HistoryDataSource pretty much as it is.

The way this works is at startup, the trie building process is started. Autocomplete will work right off the bat, but the trie will have to completely finish for all results to show up. For most people's history, that will be instantly. For people with a lot of history, it will take several seconds. Behavioral wise, I think this is about as good as it's going to get.
Attachment #393004 - Attachment is obsolete: true
Attachment #394462 - Attachment is obsolete: true
Attachment #423112 - Flags: review?(stuart.morgan+bugzilla)
Before I forget, we'll want to monitor startup changes; we don't want to make startup noticeably slower for people with normal histories (7-14 days, quitting occasionally), but we also don't want to make our heavy history users keep screaming at us :)

I'll try to take this patch on a test spin sometime this week with my 10 MB file.
I get a build failure with this patch:

Undefined symbols:
  ".objc_class_name_Trie", referenced from:
      literal-pointer@__OBJC@__cls_refs@Trie in AutoCompleteDataSource.o
ld: symbol(s) not found

around line 26631 in the buildlog

buildlog: http://dev.l-c-n.com/_b/buildlog_b506345.txt.zip
(that one is with the 10.5sdk, I get the same error with the 10.4usdk)
Yeah, it's missing the Xcode project change. Dan, for the next cycle (you can wait until I've done the initial review, rather than respinning now), make sure the Xcode project change is made in or copied to your source dir, not the OBJDIR, and make sure you are in the source dir when you run diff. (I know, it's a pain :( if you've forgotten the details, ping me on IM sometime.)
Sorry, I have forgotten how to do this, unfortunately, and didn't realize people would actually want to apply the patch.

If anyone does get this working, I've noticed that items will randomly get deleted from autocomplete results after you visit them, but only if Camino has been open for some time. This has to do with the fact that there are a bunch of different history datasources. Hopefully I will have a new patch up soon that addresses this, as well as adds the Xcode files.
@Smokey -- build succeeded now (and runs under a default profile) ! Thanks.
Now gotta test a little with that 150days history before I blow it away by accident…
Comment on attachment 423112 [details] [diff] [review]
Patch 5

So far I've only looked at the trie; I'll get to the rest as I have time, but I wanted to give at least partial feedback now to shorten the cycle.

>+//  Created by Daniel Weber on 1/21/10.
>+//  Copyright 2010 __MyCompanyName__. All rights reserved.

The license block on the Trie files needs fixing; you may as well do that now. Copy from a recent-ish file like one of the *BookmarkConverter classes, and fix the name and year.

>+#import "Trie.h"

The header shouldn't import itself

>+@interface Trie : NSObject {
>+  TrieNode *mRoot;
>+}
>+
>+- (void)addItem:(id)item;
>+- (void)removeItem:(id)item;
>+- (NSArray *)itemsForString:(NSString *)string;

You need a class-level comment and method-level comments for all the methods (including your private methods), and the same goes for TrieNode. An overview of the approach in a comment at the top of the implementation file would be helpful for a class like this as well.

>+const unsigned int kMaxPrefixLength = 5;

This is really the max trie depth, right? If so, calling it that would make the purpose much clearer.

>+  TrieNode*              mParent;

Unused.

>+  BOOL                   isLeaf;
>+  BOOL                   isWord;

Also unused.

>+- (NSArray *)itemsForString:(NSString *)string atDepth:(unsigned)depth;

"String" as a selector word and variable name doesn't really convey any meaning. Is this a key? Anything more specific would be better.

>+    mChildren = [[NSMutableDictionary alloc] init];

If I'm not mistaken, mChildren is only actually used up to depth-1; you are paying the memory cost of the NSMutableDictionary for all the leaf nodes even though they won't need it. Instead, you can create it conditionally in addItem:withKey:, and all the places that search or iterate it will Just Work with the nil value if it hasn't been created.

>+    if ([item visitCount] < [testItem visitCount])

With a really generic name like "Trie", and an id type on item, the call to visitCount is pretty surprising; similar things happen in other parts of the Trie code. There are a few solutions:
1) Rename this HistoryTrie, and make it special-purpose. I don't think we want to do that though; the code is very close to being completely generic.
2) Define a protocol in the Trie header (TrieItem or something), defining methods for all the queries you need to make (e.g., here: sortOrder or something). Have HistoryItem implement that protocol with passthrough calls to things like visitCount.
3) Make the calls key-value based instead, and have the Trie init take the keys to use as arguments. This would require some re-jiggering though, since the TrieNode would either have to have a pointer back to the Trie, to get the keys when necessary, or the relevant values would need to be passed in to the TrieNode by the trie (so, addItem:withKey:sortOrder:...)

3 would be ideal, since it wouldn't require changing other classes just to store them in a container, but if there's anything nasty there that I'm not seeing, 2 would be okay. (If 3 doesn't work out, let me know why so we can see if there's a solution).

>+    [node addItem:item];
[...]
>+    [node addItem:item withKey:[key substringFromIndex:1]];

It looks like you are doing multiple storage here. Do you have numbers on the memory cost vs. query time tradeoff of doing that vs collecting the sub-lists at query time and then merge-sorting by the ordering number (visitCount)?

>+  // is in the trie. Only "example.com" would return a match. It is up to the
>+  // user of the trie to truncate schemes from search strings and then check the

truncate -> remove (since truncating a string means shortening from the end).

>+    // If the url does not have a valid scheme, add it to the trie so it shows
>+    // up in search results. For example, mailto:jsmith@example.com will match
>+    // the string "mailto".

What does "valid" mean here? How is mailto not a valid scheme?

>+    // If a host is found, we iterate through each subdomain and add a copy of

"subdomain" isn't really accurate; I think you mean domain fragment.

>+    // to any fragment of the domain. The final subdomain, or top level domain,

"final subdomain" is also not actually what you mean here. Just say "The top level domain"

>+    // we just ignore the final part of the URL after the last dot, which isn't
>+    // 100% compatible with two-part TLDs such as co.uk.

"isn't 100% compatible with" -> "isn't correct for". Go ahead and add a TODO: right after this to consider replacing the approximation with a real eTLD lookup.

>+    // If prefix is longer, we need to truncate it to the max prefix length
>+    // and then check each item in the array to make sure it really does match.

For the way this will actually be used, you can potentially save a *lot* of work here by having the client pass in a maxResults parameter with their query. Because you already have them sorted in the order the caller wants, and they are only using a handful of results, if they tell you how many they want you can simply stop when you have enough and not do huge amount of extra string generation comparison. (Think about someone typing "amazon" after spending several hours the day before visiting tons of different product pages, and how much of the total work will be done to generate results you will immediately discard.) 

The wrinkle here is the multi-word search, where you can't just stop before knowing if your small result set matches the rest of the words. Options:
1) In the multi-word case, pass a no-limit value into this helper method from itemsForString:, and leave that case as it currently is. That's still inefficient, but only in the rare multi-word query case.
2) Rearrange the multi-word code as follows:
- get the arrays for all words from the trie, without doing the prefix filtering
- do the filter-by-other-lists (note, you may want to use indexOfObjectIdenticalTo: to test instead of containsObject:. I think, but you'd need to verify, that there's no case where you'd have two different history item instances that were equal for your purposes)
- then, finally, do the relatively expensive generate-all-the-keys-and-see-if-it's-really-a-match step, applying the cap here.

2 shouldn't be too bad if you split the "is it a real match" part into a helper method, but I'd be okay with that being a follow-up if you'd rather do 1 now.
Comment on attachment 423112 [details] [diff] [review]
Patch 5

Sorry, I thought I'd have time for the rest of this patch sooner.

>-@class HistoryItem;
>+@class HistoryItem, Trie;

We generally do these one per line.

+  BOOL                    isHistoryLoading;

mIsHistoryLoading

>++ (AutoCompleteDataSource *)sharedDataSource;

Needs a short method comment (even though it's a fairly standard pattern).

>+#import "nsCOMPtr.h"
>+#include "nsIBrowserHistory.h"
>+#include "nsIHistoryItems.h"
>+#include "nsIServiceManager.h"
>+#include "nsISimpleEnumerator.h"

This makes me kind of sad; per comment 4 this seems like it would better belong in the history code. However, given that all the history code in flux at the moment we should probably just file a follow-up bug about moving this, and proceed as-is for now.

>+const float kLoadNextChunkDelay = 0.1;

I believe the correct type here is NSTimeInterval. It would also be good to put a unit in the variable name for maximum clarity.

>+static NSTimeInterval time1, time2;

Remove all the timing-related code.

>+  [mHistoryTrie release];
>+  mHistoryTrie = [[Trie alloc] init];

Why create an empty Trie in init if you are immediately going to do this? Ei

>+  nsCOMPtr<nsIBrowserHistory> globalHistory = do_GetService("@mozilla.org/browser/global-history;2");
>+  if (!globalHistory)
>+    return;
>+  nsCOMPtr<nsIHistoryItems> historyItems = do_QueryInterface(globalHistory);
>+  if (!historyItems)
>+    return;

You can skip the |if (!globalHistory)| check; it's safe to do_QI a null pointer, and you'll get a null pointer back, so the second check covers both cases.

>   } else {
>+    [self performSelector:@selector(queueHistoryChange:) withObject:changeInfo afterDelay:1.0];

This is wasteful (since you already will know exactly when it's done), and also doesn't preserve order (what if a history item is added, then removed, and both those things are queued?). Instead, you should use the mHistoryChangeQueue that you've thoughtfully provided yourself, but aren't using in this patch ;)

I'd suggest renaming this method handleHistoryChange: (since it only sometimes queues), and moving all of the implementation of the non-queuing case to a method called something like updateTrieWithHistoryChange:. Then have the |else| clause just add the dictionary to the end of the mHistoryChangeQueue array, and make a new method (historyLoadingComplete maybe) that flips the boolean and then walks the array, calling the new updateTrieWithHistoryChange: on each item, then clearing the array. Call that method from the completion case of the chunked loader where you currently just change the boolean.

>+- (void)historyChanged:(NSNotification *)notification
> {
>+  [self queueHistoryChange:[notification userInfo]];
> }

In fact, you could just axe what I've called handleHistoryChange: above, and move its body into this method.

>+// If not nil, this userInfo object is the parent of the changed item.
>+extern NSString* const kNotificationHistoryDataSourceChangedUserInfoChangedRoot;

If it's the parent, why call it root instead of parent? Seems like a potential source of confusion.

>+  //if (root || itemOnly)
>+  //{

Remove, don't comment out.

>+    [self notifyChange:kHistoryChangeIconLoaded item:nil root:theItem itemOnly:YES];

This seems wrong; if only the item is changing, then the item should be non-nil and the root nil. You may have to adjust listeners, but that should be fine.

In fact, can't you axe itemOnly: as a parameter and a notification object and convey the same information by whether root(/parent) is nil or not? i.e., if the root is non-nil, then it and everything under it may be affected; if root in nil, then only the item changed.


Also, make sure to strip trailing whitespace from lines when you re-spin.

Overall this patch looks really good. I think we're quite close here :)
Attachment #423112 - Flags: review?(stuart.morgan+bugzilla) → review-
The patch landed in 548476 definately helped on the typing speed in the location bar.
I should note though, that if you write anything that matches your bookmarks/history, then it is quick. If you write anything that isn't matched, then it is slow. Quite funny.
(In reply to comment #24)
> I should note though, that if you write anything that matches your
> bookmarks/history, then it is quick. If you write anything that isn't matched,
> then it is slow. Quite funny.

I noticed the same thing. However, even if it does match your bookmarks/history, CPU usage still seems higher than it should be (50-80%). At least it's faster though.
(In reply to comment #24)
> I should note though, that if you write anything that matches your
> bookmarks/history, then it is quick. If you write anything that isn't matched,
> then it is slow. Quite funny.

Yeah, I've noticed things like this a bit, too.

For instance, my search shortcut is "g", so "g foo" won't match anything for me (I do have one bookmark title where there's a "G C" scenario, but once I've typed "g f" (or any letter other than "c"), shouldn't autocomplete know it won't match?  Or is it still tying to process everything with "g" even as I'm slowly typing "foo"?

Hmm, if I wait a few seconds for "g " to bring up the one match, typing everything after that seems to proceed with more-or-less normal speed, which I guess implies that we *are* still trying to match everything with "g" while I've already proceeded on to typing all of "foo".  Any way we can short-circuit that initial search when we know we have lots of additional input waiting?  Or can we only try to make that initial search faster?
I don't know if this helps, but I also realized that if I type really quickly, more often than not, the text shows up quickly?

Also, the thing I wrote up above doesn't always seem to be the case. Sometimes its quick when it matches something, other times it don't. Still can't seem to figure out the nominator.
Posted patch updated patch, wip (obsolete) — Splinter Review
This is very much a WIP, but it does build against the current trunk and essentially work. Major changes:
- History loading rewritten for 1.9.2
- Bookmark searching re-added (the previous patches ignored bookmarks, which I hadn't realized) in a rough state, using a Trie.
- Trie partially generalized to allow it to hold bookmarks.

There's still plenty to do here; I haven't addressed most of my review comments, and I have a list of other things that should be changed that I found as I've been working on getting it updated. Also, the bookmark loading is going to be really painful for huge bookmark files right now, since the bookmark trie is built synchronous blob.
Comment on attachment 469109 [details] [diff] [review]
updated patch, wip

Just FYI, you've somehow managed to make spurious path and name changes to what appears to be every single plist in the project ;)
Lovely. Must have been a weird side effect of the symlink I made for that path.
Posted patch updated patch, wip 2 (obsolete) — Splinter Review
WIP 2, now with less project file corruption.

I've rewritten the search using the approach I laid out at the end of comment 21, so searching should be even faster now, even for the multi-word case. I tried with our huge bookmark file, and it only took 1-2 seconds to build the trie on my iMac, so that's definitely livable for a1. Searching was instant in my testing.

I don't have a huge history file though; did one ever get posted? Can someone send me one?

I have a bunch still to clean up here (and then follow-up bugs to file), but if people could start kicking the tires on this patch--especially people with lots of history and/or bookmarks--that would be great.
Attachment #469109 - Attachment is obsolete: true
As soon as I can get my build environment back up and running, I'll try this patch.

I can also provide a Giant History File to anyone who's interested.
My current places.sqlite file, for the record, is about 85 MB.

I finally got my build back up and running, applied this patch, and re-launched. Launch time is noticeably slower -- maybe a second or two -- but I can actually type in the location bar now without cringing. That's progress. It isn't quite as instantaneous as it was before we added bookmarks search, but it's WAY better with this patch than it's been at any time since we added bookmarks search.
So far, it works well, typing is much faster (1751 bookmarks and 2 weeks history). I'm trying to rebuild a large history file to check performance.

One thing I noticed: matching on Japanese [*] page titles. That works for the curent code, but fails with this patch.
There is a simple testfile here:
http://dev.l-c-n.com/camino/bm-test.html

The page title reads 'かたかな' (with Jpn IME active, type 'katakana', hit return to accept the characters. The autocomplete dropdown should appear.

[*] CJK and probably other non-roman languages.
When quiting the browser last night, I noticed the following logging in Console:

8/31/10 12:02:55 AM	Camino[58766]	Exception caught in OnVisit: *** Collection <NSCFArray: 0x1ced7df0> was mutated while being enumerated.<CFArray 0x1ced7df0 [0xa0682ee0]>{type = mutable-small, count = 1150, values = (
	0 : HistoryItem 0x1d009b60, url http://english.aljazeera.net/, title AJE - Al Jazeera English
	1 : HistoryItem 0x1cecd0b0, url http://lcn.phiw.dev/, title [ design and fine-art photography ] - La Chatte Noire
............

(multiple entries like that, one for each day in my history)
During the session I had sorted my history.
1. I think I was wrong in my assessment about that logging: It seems to happen more frequently. Right now, I loaded a page from bookmarks that I visit every while and then ( like every once 3 or 4 days). Console was open and I could see the log entry being added. I think it happened when the entry was moved through the history (from 'day of last visit' to today).

That log entry is quite big btw.

2. The issue of matching with Japanese characters (comment 37). It seems to work after a browser restart, but _not_ right after bookmarking the page. Although some pages don't seem to show up.
Bookmarks aren't yet live-updated, only indexed at launch.
Posted patch updated patch, wip 3 (obsolete) — Splinter Review
Updated WIP; not many functional changes, but I've factored a bunch of stuff out of the Trie, and I've made it handle surrogate pairs correctly instead of corrupting them.

Braindump of the current state:

Things I need to do before landing this:
- Re-implement the history loading (by sharing with the history menu's shared source, and making that support async loading). The current patch's history listening doesn't work right for any changes to items that already existed at load, which I missed during initial review.
- Watch for bookmark changes.
- Fix handling of schemes.

Things to do in follow-ups:
- Re-implement the "last search was a prefix of this search" optimization, but in the Trie.
- Make searching async and cancelable again.
- Run all keywords through unicode canonicalization (and possibly diacritical stripping); current code will not handle accents correctly in all cases.
- Use a real segmenter for titles. This patch will almost certainly be a regression over trunk for title searches in languages that don't use spaces to separate words (as in, you will probably only be able to search for the beginning of the title).
Attachment #470195 - Attachment is obsolete: true
This is the first easily separable piece, and I'm happy with the current state of it, so it seems like a good place to start getting pieces reviewed and landed.

(I know this doesn't add it to the project; I'll do that later when it starts being used.)
Attachment #471115 - Flags: superreview?(mikepinkerton)
(In reply to comment #39)
> Created attachment 471114 [details] [diff] [review]
> updated patch, wip 3
> 
> Updated WIP; not many functional changes, but I've factored a bunch of stuff
> out of the Trie, and I've made it handle surrogate pairs correctly instead of
> corrupting them.

Was this WIP supposed to be a complete, working patch?  It doesn't build; it dies with warnings in AutoCompleteTextField and AutoCompleteDataSource: http://pastebin.mozilla.org/780140
Odd, I thought that was on by default. It must depend on the build environment then, since it works here. I'll have to switch to raw pointers and manual ref counting then.
+@interface TrieNode : NSObject
+{

be consistent with { placement. In the header, to put it on the same line as the declaration.


+  NSString* string;
+  while ((string = [stringEnumerator nextObject])) {

Can you put the declaration inside the while loop, eg:

while ((NSString* string = [stringEnumerator....)) {

}

That keeps the scoping limited as if you used a for loop.
Comment on attachment 471115 [details] [diff] [review]
trie implementation [landed]

sr=pink
Attachment #471115 - Flags: superreview?(mikepinkerton) → superreview+
Comment on attachment 471115 [details] [diff] [review]
trie implementation [landed]

Landed http://hg.mozilla.org/camino/rev/cd3234f41dd1
Attachment #471115 - Attachment description: trie implementation → trie implementation [landed]
Oops, missed the comments. I'll incorporate them into the next patch.
(In reply to comment #39)
> Created attachment 471114 [details] [diff] [review]
> updated patch, wip 3

> Things I need to do before landing this:
> - Re-implement the history loading (by sharing with the history menu's shared
> source, and making that support async loading).

This reminded me...I don't know how much of it is the fault of testing a debug build and how much of it is the fault of an absolutely enormous history file, but launching with this patch is *really* slow. Like, 15-20 seconds slow. And hitting Cmd-Y is still painful. Is the above to-do item going to fix that at all?
(In reply to comment #43)
> Can you put the declaration inside the while loop

Sadly, no, that doesn't compile. That part of why enumeration loops are so ugly, and fast enumeration is therefore such a style win. Sadly, I don't think it's worth dropping 10.4 for ;)
(In reply to comment #47)
> Is the above to-do item going to fix that at all?

No, it's already asynchronous. Define "launching"?

If you can get me a huge history file, I can take a look.

> And hitting Cmd-Y is still painful. 

Nothing planned for this bug will have any effect on that.
(In reply to comment #49)
> (In reply to comment #47)
> > Is the above to-do item going to fix that at all?
> 
> No, it's already asynchronous. Define "launching"?

Starting from a state where Camino is not running (even with only half a dozen tabs to restore). I haven't tried it with zero tabs to restore but since it seems equally slow with 10 tabs or 100 tabs, I doubt that's making a difference. The browser window doesn't even appear for at least 10-15 seconds, whereas before this patch (with a regular nightly build) it's nearly instantaneous.

I've sent you an e-mail with a URL for my giant history file.
Posted patch updated patch, wip 4 (obsolete) — Splinter Review
No real progress here, just rebased against trunk, and with the nsCOMPtr removed from the ObjC class so that it will hopefully build for everyone.
Attachment #471114 - Attachment is obsolete: true
Posted file crashlog (obsolete) —
With WIP 4 I had this crash while paging down a document with <spacebar. (page 2 of http://arstechnica.com/microsoft/news/2010/09/inside-internet-explorer-9-redmond-gets-back-in-the-game.ars/)
8 tabs open, all sites I regularly visit.
Yeah, the changes to itemRemoved notifications are wrong. I'll go over that section in the next round.
Well done and much thanks! Things are moving much faster.

FWIW, sometimes the first letter[1] takes a little while (0.5-3 seconds?) to appear, but after that it's as quick as I type.

1: Or first few letters, depending on how you fast you type.
(In reply to comment #50)
> The browser window doesn't even appear for at least 10-15 seconds,

Calling SetContainerOpen during the initial setup takes me 15 seconds, so that's where the problem is. I'm not sure how much can be done about that... maybe doing N queries with time ranges of some number of days.

I take it clicking on the History menu or going to the History view hangs for that long too?
(In reply to comment #55)
> Calling SetContainerOpen during the initial setup takes me 15 seconds, so
> that's where the problem is. I'm not sure how much can be done about that...
> maybe doing N queries with time ranges of some number of days.
> 
> I take it clicking on the History menu or going to the History view hangs for
> that long too?

I haven't timed it, but launching seems like it takes a little longer than showing history.
I took this in a short spin in my debug build(!), and I have to say, even there I pretty much agree with Nathan.  The location bar is definitely usable for typing again :D

I didn't have a chance to do much more testing for other sorts of possible regressions, since I kept crashing.

Interestingly--and I don't know if this is related to debug vs release or 10.5 vs 10.6, but my top frames were:

Application Specific Information:
objc[41474]: FREED(id): message retain sent to freed object=0x1837ac40

Thread 0 Crashed:
0   libobjc.A.dylib               	0x97ce0bfa _objc_error + 116
1   libobjc.A.dylib               	0x97ce0c30 __objc_error + 52
2   libobjc.A.dylib               	0x97cdf637 _freedHandler + 58
3   com.apple.CoreFoundation      	0x95edc0b6 CFRetain + 86
4   com.apple.CoreFoundation      	0x95eabc42 CFDictionaryAddValue + 530
5   com.apple.CoreFoundation      	0x95eac254 CFDictionaryCreate + 132

(The reason I mention this here at all is because that's the same top 3 frames of the infamous bug 470159 and all its related crashes.)
Setting aside the random crashes, wip 4 has worked well so far. As noted typing speed is quite good on my side.

One somewhat puzzling behaviour I've noticed though: autocomplete returning different results than it usually does after a (clean) restart.
Example: typing 'ar' would normally return (from Bookmarks)
1 - armscontrolwonk.com
2 - ardisson.org/afkar/
I've now noticed several times that both these sites didn't show up at all under bookmarks; the 1st one would show up as second entry under history, but with a wrong page title '401 - authorisation required' - which would match a history entry that is several weeks old when the site was closed for maintenance/backend upgrades. I have since then regularly loaded that page.

Restarting the browser again restores the 'normal/correct/expected' pattern.
Philippe (ref #58): I may have imagined it but at one point I though that it was pulling up history quicker than bookmarks, and filling bookmarks in later (in a Google Suggest kind of way).

I'm unable to reproduce that now; does this code have some kind of optimisation in it that says "Start returning results ASAP and fill in the rest when you get them?"  If so, perhaps the "fill in the rest when you get them" code has a glitch?
Posted patch updated patch, wip5 (obsolete) — Splinter Review
Minor WIP update; I fixed the crash, and did some other tweaking/fixing in the history notification code. Should allow for more meaningful testing while I do the next part.
Attachment #475748 - Attachment is obsolete: true
Another easily separable piece to split out for review now: this adds richer information to the history notifications.
Attachment #476658 - Flags: superreview?(mikepinkerton)
(In reply to comment #58)
> I've now noticed several times that both these sites didn't show up at all
> under bookmarks

That's super-bizarre, since one of the limitations of the current WIP is that the bookmark trie is never updated after launch, and bookmark search currently happens before history so that it will "win" for any duplicate results.

(In reply to comment #59)
> I'm unable to reproduce that now; does this code have some kind of optimisation
> in it that says "Start returning results ASAP and fill in the rest when you get
> them?"

No, searching is completely synchronous in the WIP patches.
Is there any reason why performance would degrade with time?

I've been running Version 2.1a1pre (1.9.2.11pre 20100916000758) since Thu Sep 16 19:50:29 2010:
ps -ax -O lstart,stime,time | grep Camino
22687 Thu Sep 16 19:50:29 2010      16:09.85  89:43.93   ??  U    /Applications/Camino.app/Contents/MacOS/Camino -psn_0_22324553

and I'm seeing performance more characteristic of the previous build. More thoroughly:
1. If I type in a new tab, I see slow behaviour
2. Once the drop list (finally) appears, things seem fine
3. If I highlight and _overtype_, performance is fine, BUT
4. If I highlight and delete (= blank again), performance is like (1)

I've attached a sample in case something is weird for me.
Posted file Better sample for comment #63 (obsolete) —
Sorry, previous sample looks pretty empty, this one might be better. It was run whilst I typed in the location bar (although since Sampling is pretty quick, I don't know how much of it it caught)
Attachment #476781 - Attachment is obsolete: true
> 651 -[AutoCompleteDataSource loadSearchableData]

You aren't running with my patch.
(In reply to comment #62)
> (In reply to comment #58)
> > I've now noticed several times that both these sites didn't show up at all
> > under bookmarks
> 
> That's super-bizarre, since one of the limitations of the current WIP is that
> the bookmark trie is never updated after launch, and bookmark search currently
> happens before history so that it will "win" for any duplicate results.

Perhaps I wasn't completely clear: the behaviour was seen during an entire session; that is, it didn't change during that session (those session where never really long given the random crashes).

With WIP 5, I haven't been able to repro it. At least not yet :-)

--
Seen in console with WIP 5:
9/20/10 8:58:01 AM	Camino[18503]	ChimericalConsole loaded
9/20/10 8:58:01 AM	Camino[18503]	*** Starting History Load ***
9/20/10 8:58:01 AM	Camino[18503]	*** A ***
9/20/10 8:58:01 AM	Camino[18503]	*** B ***
9/20/10 8:58:01 AM	Camino[18503]	*** C ***
9/20/10 8:58:01 AM	Camino[18503]	*** D ***
9/20/10 8:58:01 AM	Camino[18503]	*** E ***
9/20/10 8:58:01 AM	Camino[18503]	*** F ***
9/20/10 8:58:01 AM	Camino[18503]	*** G ***
9/20/10 8:58:01 AM	Camino[18503]	*** Synchronous History Load Done ***
9/20/10 8:58:07 AM	Camino[18503]	*** Asynchronous History Load Done ***

A second startup, a couple of hours later, took a bit longer: around 12 seconds.
Whoops, didn't mean to leave all that logging in there :)
I really don't see what the problem with fixing this is. We already know that no-matter what the first letter typed will take time because it has to search the whole database. We already know that if something is found it only looks at the possibilities its chosen that match so it doesn't take a long time to look through them as you type the next letter. Currently if you type a letter that doesn't match anything then it  searches the whole database. Then if you type another letter it searches again. Or so it seems. The solution appears to be as simple as keeping track of the last letter's results and if they did not bring anything up you know that no-matter whats typed next there will be 0 results so it simply should not do the search.
(In reply to comment #66)
> > 651 -[AutoCompleteDataSource loadSearchableData]
> 
> You aren't running with my patch.

Seriously? That's pretty funny :-)  There must be some kind of placebo effect going on! ;-)

I thought I had observed significant performance improvements from the Thursday nightly.

FWIW, I have endeavoured to build from source with your patch, and I still see a line with 'AutoCompleteDataSource loadSearchableData' in my sample. Am I doing something very wrong or am I just confused? Happy to upload the sample if it helps. OTOH, this is my first build from source so it's more likely a mistake on my part.

BTW, my build is displaying similar performance to the nightly I was running (probably slightly slower, perhaps because of debug).
(In reply to comment #69)
> I really don't see what the problem with fixing this is. [...] The solution appears to be as
> simple as keeping track of the last letter's results and if they did not bring
> anything up you know that no-matter whats typed next there will be 0 results so
> it simply should not do the search.

That fixes part of the problem for some people, but isn't even close to enough; we couldn't ship with just that change. There's an old partial patch for doing that (bug 504876) but this work will obsolete it so I don't see any value in continuing it. If you want to, feel free.

The reason this is taking time to fix isn't that I don't know what I'm doing, it's that we are an all volunteer project, currently I'm the only one with the time and expertise to work on this, and my time is very limited.

(In reply to comment #70)
> FWIW, I have endeavoured to build from source with your patch, and I still see
> a line with 'AutoCompleteDataSource loadSearchableData' in my sample. Am I
> doing something very wrong or am I just confused?

You can't have loadSearchableData in a build that includes my patch. Either the support forms or the IRC channel should be able to help you figure out what's going wrong.
(In reply to comment #71)
> You can't have loadSearchableData in a build that includes my patch. Either the
> support forms or the IRC channel should be able to help you figure out what's
> going wrong.

IRC (http://caminobrowser.org/contact/#development) is better than the forum for this type of thing, although we're a little light on coverage currently during the daytime in your $TZ.
With thanks to Philippe and Chris, I have now applied the correct patch (!) and I am seeing even better performance than before :-)  I don't observe the delay on the first letter any more and typing the rest of the word/URL is quite quick.

Thanks you team Camino!
Comment on attachment 476658 [details] [diff] [review]
history notification improvements [landed]

sr=pink
Attachment #476658 - Flags: superreview?(mikepinkerton) → superreview+
Comment on attachment 476658 [details] [diff] [review]
history notification improvements [landed]

Landed notification improvement: http://hg.mozilla.org/camino/rev/4c39c8dc4110
Attachment #476658 - Attachment description: history notification improvements → history notification improvements [landed]
Posted patch updated patch, wip 6 (obsolete) — Splinter Review
Sorry for the long delay. Significant changes since the last WIP:
- Asynchronous loading has been merged into HistoryDataSource. That means that if you remove an item from history, it will stop autocompleting right away, instead of sticking around until relaunch. (Note: "immediately" only applies after the original Trie construction is finished, so if you have a lot of history, you will have to wait a while after launch to test this).
- It also means I can share backends with the History menu (and later, in another bug I'll file once some more of this lands, partially with History Manager). So once history has finished loading on launch, there will not be a hang when clicking on the History menu. Memory usage will also be lower since there won't be two copies of all history items.
- Trie creation is throttled more smoothly, and is also more efficient, so less eating of the CPU during the async work post-launch.


Some rough numbers for reference, with Chris's ginormous history file:
- hang at startup: 15-20 seconds (unchanged from last WIP)
- finishing loading history: 25-30 seconds (low CPU usage)
- building the history trie: ~3.5-4 mins (~50% of one CPU)
Attachment #423112 - Attachment is obsolete: true
Attachment #475790 - Attachment is obsolete: true
Attachment #476657 - Attachment is obsolete: true
Attachment #476782 - Attachment is obsolete: true
A small separable piece: this is just the perf improvement to the trie (and a smidge of style cleanup).
Attachment #482854 - Flags: superreview?(mikepinkerton)
A larger separable piece: this adds an asynchronous loading mode to HistoryDataSource. It's not used in this fragment patch, but it has been tested in the context of the current overall WIP.
Attachment #482860 - Flags: superreview?(mikepinkerton)
(In reply to comment #76)
> Created attachment 482853 [details] [diff] [review]
> updated patch, wip 6

wip 6 triggers this:
10/14/10 9:47:12 PM	Camino[21541]	Exception caught in OnTitleChanged: *** -[NSURL initWithString:relativeToURL:]: nil string parameter

I'm not sure (yet) how though :-(.
First log entry occurred shortly after I fired up the build.

Works smoothly otherwise.
Every time? If so, you could run under gdb and break on -[NSException raise] and objc_exception_throw to see where. From code inspection I'm not finding where that would happen.
(In reply to comment #80)

It is - so far - completely random. Console just logged 35 entries or so over a timespan of 2 seconds (I don't think it is the result of something I actively did). For the whole day, I have about 85 entries.

(the only other 'new' code this build includes is bug 601440).
I made a fresh build with gcc 4.0/10.4sdk (comment 79 was with gcc4.2/10.5 sdk).

Running with Troubleshoot Camino (set to open about:blank on launch)

str:
1. open Camino with Trc
2. load the 'technews' tab group (from the bookmarks menu)

wait a fraction of a moment to allow console to log...

10/15/10 10:33:47 AM	Camino[85073]	Exception raised during posting of notification.  Ignored.  exception: '*** -[NSURL initWithString:relativeToURL:]: nil string parameter'  invoked observer method: '*** -[AutoCompleteDataSource historyChanged:]'  observer: 0x1bc75d0  notification name: 'history_changed'

(repeated 12 times)

I tested the same with a completely new OS user account and got the same result.
Stuart, I get these two exceptions during "loading" (the debug build's profile has 3 windows of a few tabs each, and I copied places.sqlite from my regular profile so that it has a current large history):

Oct 14 23:07:43 Qalaat-Samaan Camino[90919]: Exception raised during posting of notification.  Ignored.  exception: '*** -[NSURL initWithString:relativeToURL:]: nil string parameter'  invoked observer method: '*** -[AutoCompleteDataSource historyChanged:]'  observer: 0x16bee770  notification name: 'history_changed'

Oct 14 23:24:00 Qalaat-Samaan Camino[90919]: Exception caught in OnDeleteURI: *** -[NSCFDictionary removeObjectForKey:]: attempt to remove nil key

When I catch these in gdb, I get the attached results (four backtraces; the first two seem to be the same for the NSURL exception, and the third seems to be a different trace for NSURL exception; the fourth is for removeObjectForKey exception).

At this point, under gdb I don't see any windows in that Camino.  Note also we've got one of those NS_FOO failed warnings right before we hit the first exception, assuming things aren't too out-of-order due to buffering.
(In reply to comment #83)
> At this point, under gdb I don't see any windows in that Camino.

That's not strictly true; they were well-hidden behind other windows.  I ran it again (without re-replacing the places.sqlite, so any startup maintenance that had run last run had happened), and it looked like most pages were at least partially loaded before I hit the NSURL exception.
(In reply to comment #83)
> When I catch these in gdb, I get the attached results (four backtraces; the
> first two seem to be the same for the NSURL exception, and the third seems to
> be a different trace for NSURL exception; the fourth is for removeObjectForKey
> exception).

Bleh, I was reading that output wrong; the logspam about the exception shows up only *after* continuing, so it's two backtraces for the NSURL exception and two for the other one.
Comment on attachment 482860 [details] [diff] [review]
history data source async loading

Oops, I'm an idiot. I removed some safety checks when I was restructuring. I'll need to make a new version of the stand-alone patch too.
Attachment #482860 - Attachment is obsolete: true
Attachment #482860 - Flags: superreview?(mikepinkerton)
Posted patch updated patch, wip 6 v2 (obsolete) — Splinter Review
This should restore the equivalent of the old safety checks. I'm curious to find out what's happening though before we paper back over this, so I added #if DEBUG logging that should tell us what sites the exception (and the assertion, which may well be the ultimate source) are happening for.
Assignee: dan.j.weber → stuart.morgan+bugzilla
Attachment #482853 - Attachment is obsolete: true
Status: NEW → ASSIGNED
Here's my scary logging results.  Still using the same history file I copied over to the test profile last night.

I trimmed everything that wasn't obviously related to history (all the DOMWINDOW crap, "missing dictionary" logspam, repeat plug-in discovery, etc.).  "Loading" is from startup until it appeared that all pages had finished loading, "After Loading" is from that point until I went to Cmd-Q (examining the initial flood of logspam), and "On Quit" is stuff that appeared after I hit Cmd-Q.  "Open tabs" is the list of restored tabs (tabs that loaded on launch) direct from this profile's WindowState.plist

The "History title change notification" entries seem mostly to be for iframe-like things that, AFAIK, shouldn't even be showing up in history, except the MXR link in there is clearly not an iframe (I also saw one of these notifications for a bug URL in a second run, which again isn't likely to be an iframe).

The "History removal notification" entries look like they're about URLs that are in fact not in history any longer.

The "0 result nodes in OnVisit" entries look like they're all for the pages that are currently being loaded (the tabs being restored on launch).
(In reply to comment #88)
> The "History title change notification" entries seem mostly to be for
> iframe-like things that, AFAIK, shouldn't even be showing up in history

Okay, I feel better know that there are normal cases that can happen.

> The "History removal notification" entries look like they're about URLs that
> are in fact not in history any longer.

Hm, I wonder why didn't have history items for them then. Maybe:

> The "0 result nodes in OnVisit" entries look like they're all for the pages
> that are currently being loaded (the tabs being restored on launch).

This is worrying. Maybe the 1-visit query logic is wrong somehow? I'll make a test patch that does the query with out it if the first one fails and logs some info.
(In reply to comment #90)
> (In reply to comment #88)
> > The "History title change notification" entries seem mostly to be for
> > iframe-like things that, AFAIK, shouldn't even be showing up in history
> 
> Okay, I feel better know that there are normal cases that can happen.

Is it possible that you're hitting a variant of bug 148794 or bug 293417 (or bug 421131) ?
Comment on attachment 483795 [details] [diff] [review]
history data source async loading, v2 [landed]

Doh, pressed Return in the wrong field. This is just the async loading (see above) with the safety checks re-added.
Attachment #483795 - Flags: superreview? → superreview?(mikepinkerton)
The failures in onTitleChanged (comment 79) are presumably partly related to bug 553156, since those are the cases the safety checks were already complaining about in March.  I'm concerned by the fact that that check seems to fail pretty randomly for no good reason (including on a second visit, where it really shouldn't).
Comment on attachment 482854 [details] [diff] [review]
trie improvements [landed]

sr=pink
Attachment #482854 - Flags: superreview?(mikepinkerton) → superreview+
Comment on attachment 483795 [details] [diff] [review]
history data source async loading, v2 [landed]

this is more of a rubber-stamp than an sr, as I'm not too familiar with what's been going on here.

I would like to see some more function-level comments that describe the implementation and what's going on, as it looks reasonably complex and more top-level comments would help maintainability in the future.
Attachment #483795 - Flags: superreview?(mikepinkerton) → superreview+
Comment on attachment 482854 [details] [diff] [review]
trie improvements [landed]

Trie improvement landed as http://hg.mozilla.org/camino/rev/73ac95b72851
Attachment #482854 - Attachment description: trie improvements → trie improvements [landed]
Comment on attachment 483795 [details] [diff] [review]
history data source async loading, v2 [landed]

Landed as http://hg.mozilla.org/camino/rev/a207f88ff418 with more function-level and high-level comments as requested.
Attachment #483795 - Attachment description: history data source async loading, v2 → history data source async loading, v2 [landed]
Also landed http://hg.mozilla.org/camino/rev/e8516c0f640d since I forget to hg add the new file in my commit workspace :P
Posted patch updated patch, wip 7 (obsolete) — Splinter Review
Same as wip6, but rebased against the new trunk, and with one of the query restrictions on the history add query removed so we can get better logging of the weird failure cases.
Attachment #483390 - Attachment is obsolete: true
Posted file exception+gdb output (obsolete) —
(In reply to comment #100)
> Created attachment 488429 [details] [diff] [review]
> updated patch, wip 7

Stuart, this patch throws exceptions on any attempt to edit any bookmark properties:

Exception raised during posting of notification.  Ignored.  exception: '*** -[NSURL initWithString:relativeToURL:]: nil string parameter'  invoked observer method: '*** -[AutoCompleteDataSource bookmarkChanged:]'  observer: 0x160104e0  notification name: 'bookmark_changed'

Backtrace attached.

philippe also had this WIP eat his tab-group shortcuts and discard any editing attempts; I couldn't repro in my debug build, but I believe the static/opt could behave differently (and/or the 10.5sdk)
An opt build with 10.5sdk/gcc4.2 - running on 10.6

* at startup, ~1 second after ChimericalConsole is loaded, I always get one
*** -[NSURL initWithString:relativeToURL:]: nil string parameter

and randomly during the session, I get again one as above, or
Mozilla has caught an Obj-C exception [NSInvalidArgumentException: *** -[NSURL initWithString:relativeToURL:]: nil string parameter]

* attempting to edit a bookmark in the BM manager logs:
11/7/10 11:08:38 AM	Camino[87747]	Exception detected while handling key input.
Visually, the input is not accepted (somehow the field remains focussed after pressing enter).
Cmd-I on the affected BM shows that the input was actually accepted.
(happens for both tab groups and regular bookmarks).

An opt build with 10.4usdk/gcc4.0

* at startup I get an endless series of

11/7/10 12:27:42 PM	Camino[67256]	Exception raised during posting of notification.  Ignored.  exception: '*** -[NSURL initWithString:relativeToURL:]: nil string parameter'  invoked observer method: '*** -[AutoCompleteDataSource bookmarkChanged:]'  observer: 0x1b644ec0  notification name: 'bookmark_changed'

This does not happen with TroubleShoot Camino (but in that case I have 0 history).

The bookmark editing issue does not happen with the 10.4sdk.

----
I'm not clear why my tab group shortcuts suddenly vanished this AM. I didn't see anything logged in console. Yesterday, with a previous build they were working properly.
Sorry, I must have botched something during the rebase to ToT. I'll spin a new patch when I get a chance.
Comment on attachment 488429 [details] [diff] [review]
updated patch, wip 7

Doh, I didn't botch a merge, I'm just an idiot. I apparently had some random early stage in-progress changes in the tree and forgot while bouncing between patches.
Attachment #488429 - Attachment is obsolete: true
Posted patch updated patch, wip 7v2 (obsolete) — Splinter Review
Assuming no other half-finished code slipped in, this should be usable.
(In reply to comment #105)
> Created attachment 489415 [details] [diff] [review]
> updated patch, wip 7v2

This works fine now.
When all is said and done, we'll need some people to take things for a spin on PPCs, because apparently, even though we haven't heard from anyone, the performance on PPCs of the current impl is absolutely deathly: http://forums.mozillazine.org/viewtopic.php?f=12&t=2033081
Posted patch updated patch, wip 8 (obsolete) — Splinter Review
Now with async bookmark loading, bookmark change watching, async handling of history and bookmark changes, and an attempt at handling schemes reasonably.

I think other than needing to do some reworking once my big HistoryDataSource patch lands this is pretty much ready to go for a1 as it stands now, so please give it a good tire-kicking.

There's still plenty that could be done (some more perf improvements, better handling of non-ascii title queries, returning to async search), but I think they can all be done post-alpha.
Some weird logging & sequence of events happened today (see attached log)
1. start up browser
1a. (browser opens with a blank tab, but two tabs were still open on quit - session saving didn't kick in)
2. during the session, clicking on links in NNW wouldn't open in Camino.
3. quit browser (with 1 open tab)
4. open browser again --> post crash session recovery dialog.
5. browser starts up with the two tabs that were open before [1]…
6. everything runs fine, no data loss that I can detect.

Prior session (before [1] above) I had 2 Flash crashes on
http://videos.arte.tv/fr/videos/faiseurs_d_euro-3577250.html
then some more browsing around, including adding one bookmark. The 2 crashes are the only out-of-the-ordinary that happened.

Note: the collection listed in the log is the one that was open in the bookmark manager (Bookmark Menu)

Wip 8 has been running smoothly ever since it was posted - no other patches in the build (up-to-date build.
Looks like we may have thrown an exception during initial load, which could cause lots of weird behavior. Given than this patch's stuff is all async I'm not convinced it's actually related.
Posted patch FixSplinter Review
I think we're finally ready to go here. I'll file bugs for all the TODOs once this lands.

I'm afraid this is another large patch, but I don't have a great way to break it up since all the intermediate stages are hopelessly out of date at this point.

Here's the overview of what it does:
- It switches the autocomplete system to using two tries, one for bookmarks and one for history
- It makes the autocomplete system a shared object, and adds history/bookmark change listeners, so the tries aren't being rebuilt all the time.
- It makes all the bookmark and history trie-building async (it does all the bookmarks, then all the history, to avoid CPU contention). Because of this, it adds a extra layer to change notification handling, passing them all through a helper object that can be paused and resumed (so that changes that happen while the trie is still being built are processed when it's done).

To make scheme handling work out there is some trickery with reserved unicode characters; hopefully the comments there are clear about what's being done and why.
Attachment #488719 - Attachment is obsolete: true
Attachment #489415 - Attachment is obsolete: true
Attachment #490559 - Attachment is obsolete: true
Attachment #500708 - Flags: superreview?(mikepinkerton)
Comment on attachment 500708 [details] [diff] [review]
Fix

sr=pink

i wish this all didn't have to be on the main thread, but it would still need to be chunked in order to avoid locking the tries for the entire build time. maybe moving one day to GCD might help this some (since it's already chunked)

are we allowed to use fast-enumeration yet?
Attachment #500708 - Flags: superreview?(mikepinkerton) → superreview+
Landed http://hg.mozilla.org/camino/rev/49f6d269589e

I'll update with links to follow-up bugs shortly.
OS: Mac OS X → Windows 7
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Flags: camino2.1a1? → camino2.1a1+
OS: Windows 7 → Mac OS X
Resolution: --- → FIXED
Follow-ups:
Bug 504876: Optimize the common case--incremental typing--where subsequent queries share a prefix with the previous query.
Bug 622559: Try to fix the long-ish hang at startup for people with large history.
Bug 622560: Make autocomplete queries async again
Bug 622563: Optimize updates that don't change keywords
Bug 622565: Improve non-ASCII autocomplete
Bug 622569: Use a segmenter for generating title keywords
You need to log in before you can comment on or make changes to this bug.