Closed Bug 572030 Opened 14 years ago Closed 14 years ago

Use a memory cache for keywords associations

Categories

(Toolkit :: Places, defect)


Tracking


RESOLVED FIXED
mozilla2.0b4
Tracking Status
blocking2.0 --- -

People

(Reporter: mak, Assigned: mak)

References

Details

(Keywords: perf)

Attachments

(1 file, 3 obsolete files)

Attached patch patch v1.0 (obsolete) — Splinter Review
We don't expect to have an incredible amount of keywords.  Since one of the most common operations is checking whether a keyword exists, a memory hash (keyword->url) can be populated on first need and kept in sync.

The attached patch switches all keyword methods over to the hash. Open issues:
- test_sorting.js is failing on keywords sorting; check why (maybe the same keyword is set on multiple bookmarks?)
- removing a bookmark could remove the keyword if it's the only bookmark associated with that url; the hash needs to be updated in that case.
- setting a keyword for a bookmark will set it for all bookmarks with the same URI (the idl was unclear about this, but most of the keywords functionality is per URI, and the comments were indeed pointing half in one direction, half in the other).

An alternative approach could instead keep a flag that invalidates the cache on changes and refills the hash when it is invalid.
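The two approaches can be sketched as follows. This is a minimal illustration, not the actual Places implementation; the class and method names are assumptions chosen to mirror the bug's terminology.

```javascript
// Sketch of a lazily-populated keyword -> url cache, with both strategies
// from the comment above: keep it in sync on writes, or invalidate wholesale.
// Names (KeywordCache, fetchAllFromDb) are illustrative, not Places APIs.
class KeywordCache {
  constructor(fetchAllFromDb) {
    this._fetchAll = fetchAllFromDb; // simulated DB read, hit at most once
    this._map = null;                // keyword -> url, populated on first need
  }

  _ensurePopulated() {
    if (this._map === null) {
      this._map = new Map(this._fetchAll());
    }
  }

  // The common fast path: look a keyword up without touching the DB again.
  getURIForKeyword(keyword) {
    this._ensurePopulated();
    return this._map.get(keyword) ?? null;
  }

  // Strategy 1: keep the hash in sync on every write.
  setKeywordForURI(keyword, url) {
    this._ensurePopulated();
    this._map.set(keyword, url);
  }

  // Strategy 2 (the alternative above): drop the cache, refill on next access.
  invalidate() {
    this._map = null;
  }
}
```

After the first lookup, every subsequent call is answered from memory; the DB read only repeats if the cache was invalidated.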
Keywords: perf
Assignee: nobody → mak77
Status: NEW → ASSIGNED
Attached patch patch v1.1 (obsolete) — Splinter Review
This fixes everything I've found.

The new behavior is that a keyword is associated with a url; this was the right behavior from the start. So when a keyword is associated with a bookmark, it is associated with all bookmarks with that uri, and when removed from a bookmark, it is removed from all bookmarks with that uri. Unfortunately the database schema does not model this well, but the old code was not much better than the new one.

The test is picky enough to check sync between the hash and the db.
From places stats I've seen that almost every user has fewer than 32 keywords, some users have between 32 and 64, very rare users have about 100, and 1 user has 600 (but I don't care about that use-case, because it just means the user is not using keywords for what they were intended).

The advantage of this approach is that we hit the db only on the first request; subsequent ones are handled from memory, so the locationbar's getShortcutOrURI can be fast and does not cause locking (see the blocked bug). Plus I can implement the correct handling described above, since all checks are in memory.

The disadvantage is that on bookmark addition/removal I have to do a bit of additional work to check whether the hash has to be updated. That work mostly hits the memory hash.
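The bookmark-removal bookkeeping can be sketched like this: a keyword must survive as long as any other bookmark still points at the same url, and only the in-memory structures are consulted. This is an assumed illustration, not the patch's code; `removeBookmark` and the Map shapes are hypothetical.

```javascript
// Sketch: update the keyword cache when a bookmark is removed.
// bookmarks:    Map itemId -> { url }
// keywordCache: Map keyword -> url
// All checks happen against the in-memory hashes, not the DB.
function removeBookmark(bookmarks, keywordCache, idToRemove) {
  const removed = bookmarks.get(idToRemove);
  bookmarks.delete(idToRemove);

  // Does any remaining bookmark still use this url?
  const stillUsed = [...bookmarks.values()].some(b => b.url === removed.url);
  if (!stillUsed) {
    // Last bookmark for this url is gone: drop its keyword(s) from the cache.
    for (const [keyword, url] of keywordCache) {
      if (url === removed.url) keywordCache.delete(keyword);
    }
  }
}
```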
Attachment #451182 - Attachment is obsolete: true
Attachment #452123 - Flags: review?(dietrich)
Flags: in-testsuite?
Comment on attachment 452123 [details] [diff] [review]
patch v1.1

asking ui-review on the behavior change.

While now it is possible to associate the same keyword to different URIs and different keywords to the same URI, with this patch the keyword<->URI association would be a simple 1:1.
I'd like to know whether this makes more or less sense from a UX point of view.
Attachment #452123 - Flags: ui-review?(faaborg)
(In reply to comment #2)
> with this patch the keyword<->URI association would be a simple 1:1.
Autocomplete code will probably need to change, as it currently shows one entry for each keyword it finds. So if a uri is bookmarked multiple times with a keyword, we'll want to make sure only one result shows up.

Also, autocomplete will find multiple bookmarks with the same keyword, so at least autocomplete/test_keyword_search.js needs to be updated so it does not expect both results, assuming only one url can have a keyword.

Also, does/should this do anything with the search-engine keyword namespace?
Previously the last-created-keyword-to-URI mapping would win, right?
For just typing a keyword into the location bar without selecting any results, it seems to use the oldest bookmark by creation time or id, not whichever one was given the keyword last. (Create two bookmarks with keywords, give a new keyword to the one that was winning, then switch it back: it's still the same one.)

The suggestions will show each keyworded bookmark to pick.
(In reply to comment #5)
> The suggestions will show each keyworded bookmark to pick.

That doesn't make much sense if both uris are the same; and if the uris are different, the user is using keywords as tags, which still doesn't make sense.
Comment on attachment 452123 [details] [diff] [review]
patch v1.1

a very small fraction of our user base will likely be vocal on the change, but a 1:1 mapping will make it easier for users to anticipate the results of their actions, and also sets us up better for potentially transferring existing keywords to a ubiquity-ish graphical command line in the future.
Attachment #452123 - Flags: ui-review?(faaborg) → ui-review+
As long as someone can have something like:

foo http://foo.com/q?%s
foo2 http://foo.com/q?style=2+%s

I agree with comment 8
(In reply to comment #9)
> foo http://foo.com/q?%s
> foo2 http://foo.com/q?style=2+%s

yeah, those are different uris
I think a much better approach would be to just cache the entire list of keywords in an array, add a method like keywordExists to the nsINavBookmarksService interface which looks the keyword up in that cache, and call that function from PlacesUtils.getURLAndPostDataForKeyword before actually hitting the DB. This won't cause any behavioral change whatsoever. I'll write a patch which does that tonight.
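The keywords-only alternative could look roughly like this. It is a hedged sketch of the idea, not the proposed patch; `makeKeywordExistsCache` and `loadKeywordsFromDb` are illustrative names, and the real check would live behind the nsINavBookmarksService interface.

```javascript
// Sketch: cache only the keyword strings, and consult the cache as a cheap
// pre-filter before hitting the DB. A miss here means the keyword definitely
// does not exist, so the expensive lookup can be skipped entirely.
function makeKeywordExistsCache(loadKeywordsFromDb) {
  let keywords = null; // filled on first call, rebuilt when the set changes
  return {
    keywordExists(keyword) {
      if (keywords === null) keywords = loadKeywordsFromDb();
      // ~100 linear string comparisons is still far cheaper than disk I/O.
      return keywords.includes(keyword);
    },
    invalidate() { keywords = null; },
  };
}
```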
Assignee: mak77 → ehsan
Blocks: 581978
So, from the newsgroup discussion, we found that there are interesting use-cases for having a keyword linked to multiple uris (like having a "map" keyword that can be linked to Google maps as well as Live maps). So we don't want behavioral changes for now; these are nice replacements for actions.

Ehsan's approach was my first try. There is probably no reason for an additional interface, since the check can just be done inside getURIForKeyword. But we would still spend a bunch of time building the cache, and add one additional hash check before the old code path. That's why I thought of making that time, which we have to spend regardless, more useful by caching uris too, so the cost is paid just once; but that has the problem with non-1:1 relations.

Still, filling the cache and keeping it updated is a pain regardless of the approach; we have users with >100 keywords.
> we have users with >100 keywords.

You're going to have a lot more with Firefox Sync -- I migrated my entire delicious.com database which has 144 tags into Places keywords (resulting in bug 582396).

It's not trivial to migrate tagged bookmarks because you need to import/export/merge bookmarks in JSON, not the default HTML format, but there are conversion tools out there on the web like http://delicious-to-firefox3.heroku.com/
(In reply to comment #12)
> So, from ng, we found that there are interesting use-cases for having a keyword
> linked to multiple uris (like having a "map" keyword that can be linked to
> Google maps as well as Live maps). So we don't want behavioral changes for now,
> these are nice replacements for actions.
> 
> Ehsan's approach was the first try. Probably there is no reason for an
> additional interface since the check can just be done inside getURIForKeyword.
> But we are still spending a bunch of time building the cache, and adding one
> additional step hash check before the old stuff. That's why I thought of making
> that time, that we have to spend regardless, more useful, caching uris too, so
> it was an expense to do just once, but has the problem with non 1:1 relations.

Caching the URIs in addition to keywords means that we need to take care of updating the cache any time the places entries for that change, and that's not so much fun.  If we only cache the keywords, we only need to rebuild the cache when the keywords set changes.

> Still, filing the cache and maintaining it updated is a pain regardless the
> approach, we have users with >100 keywords.

I don't see why that is.  100 string comparisons is still a huge win compared to hitting the disk, right?
(In reply to comment #13)
> You're going to have a lot more with Firefox Sync -- I migrated my entire
> delicious.com database which has 144 tags into Places keywords (resulting in
> bug 582396).

So, why are those being converted to keywords rather than proper tags?

(In reply to comment #14)
> Caching the URIs in addition to keywords means that we need to take care of
> updating the cache any time the places entries for that change, and that's not
> so much fun.  If we only cache the keywords, we only need to rebuild the cache
> when the keywords set changes.

We can't know that, because keywords are removed by a trigger when they are no longer used. Otherwise any bookmark removal would have to check whether a keyword is associated with it, and if we had to read that from the db, any bookmark change would become very slow. Maybe we could put the bookmark id as the hash value; that would still help cache updates a bit.

> > Still, filing the cache and maintaining it updated is a pain regardless the
> > approach, we have users with >100 keywords.
> 
> I don't see why that is.  100 string comparisons is still a huge win compared
> to hitting the disk, right?

First, we would hit the disk on the first call to read all of them, and there's no shortcut. The only one I could think of is copying the table to a temp table; I'm not sure it's cheaper though.
Keeping the cache updated is expensive for the above reasons: any bookmark removal can cause changes, and we would have to read the keyword association for each removed bookmark. So my above hypothesis of caching bookmark ids rather than URIs could help here a bit.
So apparently things are more complicated than I realized first.  Marco, are you willing to pick this up again?
Assignee: ehsan → nobody
Assignee: nobody → mak77
Summary: Use a memory cache for keyword to uri associations → Use a memory cache for keywords associations
>> You're going to have a lot more with Firefox Sync -- I migrated my entire
>> delicious.com database which has 144 tags into Places keywords (resulting in
>> bug 582396).

> So, why are those being converted to keywords rather than proper tags?

Sorry, they're not. I came into this bug from 563538, which this blocks, and didn't read it carefully.

I had 15 search keywords, which I just deleted, and I still get startup hangs every time with the profile I sent you. Keywords may still be slow, but they're not the cause of my hangs.
ok, thanks for the follow-up.
Attachment #452123 - Attachment is obsolete: true
Attachment #452123 - Flags: review?(dietrich)
Attached patch patch v2.0 (obsolete) — Splinter Review
this adds an itemId->keyword hash, since a bookmark can have just one keyword but the opposite is not true. As I said before, storing only keywords would pay the same price for less benefit, since then on each bookmark removal we would have to hit the db to check whether it has a keyword. On the other hand, this approach requires walking all hash entries until we find a keyword, but that's still better than locking on the db, and the common cases have only a few entries.

Note: this applies on top of the temp tables removal. Making this land first is not hard though, it's just a matter of fixing the queries, but I have to order the patches in my queue somehow.
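The v2.0 trade-off described above can be sketched like this. It is an assumed illustration of the data structure, not the patch itself; `ItemKeywordCache` and its methods are hypothetical names.

```javascript
// Sketch: itemId -> keyword hash. A bookmark has at most one keyword, so the
// itemId is a valid unique key; the reverse lookup (keyword -> items) has to
// walk all entries, which is fine for the small sets most users have.
class ItemKeywordCache {
  constructor() {
    this._byItemId = new Map(); // itemId -> keyword
  }

  setKeywordForItem(itemId, keyword) {
    this._byItemId.set(itemId, keyword);
  }

  // On bookmark removal no DB hit is needed: the hash itself tells us
  // whether the item had a keyword at all.
  removeItem(itemId) {
    this._byItemId.delete(itemId);
  }

  // Linear walk over all entries until the keyword is found.
  getItemsForKeyword(keyword) {
    const items = [];
    for (const [itemId, kw] of this._byItemId) {
      if (kw === keyword) items.push(itemId);
    }
    return items;
  }
}
```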
Attachment #463269 - Flags: review?(dietrich)
Comment on attachment 463269 [details] [diff] [review]
patch v2.0

notify once when updating a keyword, as talked about on irc. r=me otherwise.
Attachment #463269 - Flags: review?(dietrich) → review+
Attached patch patch v2.1Splinter Review
Attachment #463269 - Attachment is obsolete: true
Comment on attachment 464066 [details] [diff] [review]
patch v2.1

asking approval; this code is hit from various parts of the browser ui and the locationbar, and this avoids hitting the database for each request.
Attachment #464066 - Flags: approval2.0?
Comment on attachment 464066 [details] [diff] [review]
patch v2.1

a=sdwilsh
Attachment #464066 - Flags: approval2.0? → approval2.0+
This bug's patch, which landed this morning, looks like it might have
caused nsINavBookmarksService.getKeywordForURI failures in a bunch of
xpcshell tests.  Here are a couple of examples (the first on OS X, the
second on Linux):

http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1281372345.1281373414.16202.gz
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1281371418.1281372723.13221.gz
Yes, Marco has a fix for this, but build asked him to wait to push it because IT is playing with the proxy for hg (bug 585645)
http://hg.mozilla.org/mozilla-central/rev/5037343f92fc
and follow-up
http://hg.mozilla.org/mozilla-central/rev/ed6dc0b505cc
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Flags: in-testsuite? → in-testsuite+
Resolution: --- → FIXED
Target Milestone: --- → mozilla2.0b4
backed out because it conflicts with the temp tables patch; will invert the patches and reland separately without this dependency.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
http://hg.mozilla.org/mozilla-central/rev/e1d20276ef6d
Status: REOPENED → RESOLVED
Closed: 14 years ago14 years ago
Resolution: --- → FIXED
Hi, question, did this make keywords case-sensitive?
(In reply to comment #29)
> Hi, question, did this make keywords case-sensitive?

For sure that would be an unwanted effect; you should immediately file a regression bug. Feel free to assign it to me and ask for blocking on it.
keywords should always be lowercased internally and case-insensitive; I suspect there is a typo here:

1.320 +  searchData.keyword.Assign(aKeyword);

It should not use aKeyword, but the lowercased keyword. Maybe it's safer to rename these confusing variables. Also, I thought we had a test for this; looks like we HAVE to add one.
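The invariant described in that comment is simply that keywords are lowercased once on the way in and again at lookup, so storage and comparison always agree. A minimal sketch, assuming hypothetical `setKeyword`/`getItemForKeyword` helpers (the real code is C++ in nsNavBookmarks):

```javascript
// Sketch of the case-insensitivity invariant: normalize before storing,
// and store the normalized `keyword`, not the raw `aKeyword` argument
// (using aKeyword here is exactly the suspected bug above).
function setKeyword(keywordsByItem, itemId, aKeyword) {
  const keyword = aKeyword.toLowerCase();
  keywordsByItem.set(itemId, keyword);
}

// Normalize the same way before comparing, so lookups are case-insensitive.
function getItemForKeyword(keywordsByItem, aKeyword) {
  const keyword = aKeyword.toLowerCase();
  for (const [itemId, kw] of keywordsByItem) {
    if (kw === keyword) return itemId;
  }
  return null;
}
```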
Depends on: 590081