Open Bug 166793 Opened 22 years ago Updated 16 years ago

Think about memoising database results.

Categories

(Bugzilla :: Bugzilla-General, enhancement)

2.17
enhancement
Not set
normal

Tracking

()

People

(Reporter: CodeMachine, Unassigned)

Details

We should think long term about memoising results from the database in order to
gain performance benefits.

In other words, remember the database results as long as you're sure they
haven't changed.

This is kind of like the versioncache, but is much more reliable, and is fully
compatible with transactions.

The first way to do things is to use serialised transactions, so we know the
database won't appear to change beneath us during the transaction.  In this
case, any functions that have memoised can register an observer with the
"transaction manager".  When the transaction completes, the transaction manager
flushes the memoes by calling the observers.

Apparently some databases have a NOTIFY feature to observe changes, which we
could use.  With this, you could hold onto data between transactions (until you
get notified of changes), which will be increasingly useful when mod_perl
support arrives.  Although only some databases support this, it's a performance
feature only, so we could comfortably support it.

I think it could be racy, but I think as long as we use serialised transactions
we are OK, because if we haven't been notified of changes at the start of a
transaction we won't be during the transaction.   We should check this out if we
go down this road.  But in any case the chance of this is assumedly much smaller
than the chance of getting races in our current system with no transactions.
Priority: -- → P4
Target Milestone: --- → Future
Sample API

Plan 1: Within Serialised Transactions Only

Bugzilla::Transactions::RegisterMemoisation($tableslistref, $observerref)

If we're in a serialised transaction, register the observer, return true.
Else, do nothing, return false.

Bugzilla::Transactions::EndTransaction

Call all observers, flush the observer list.

Plan 2: Between Transactions With NOTIFY

Bugzilla::Transactions::RegisterMemoisation($tableslistref, $observerref)

Register the observer.
Remember what tables the observer watches.
Increment the reference counts for the NOTIFY for these tables.
If we want NOTIFY on new tables (ie ref counts go from 0->1), do so.

Bugzilla::Transactions::Notify($changedtablelistref)

Work out all observers that were watching an of these tables.
Call the observers.
Decrease the reference counts for each watched table.
UNNOTIFY tables whose refcount is now 0.
Remove all the observers we found.


This doesn't take into account my notion that the use of a memoised value might
need to be within a serialised transaction, even though its computation might
have been a previous serialised transaction.
Actually we don't need refcounts.  Whenever we are notified about a table change
all the observers watching the table go away.

I'm not entirely sure to what granularity you can notify but I think it would be
at least as fine grained as table by table.

One application of this is remembering about groups to speed up the new groups
system.  For example if you remember there are no groups at all (which means
watching the uncommonly changed groups tables), you know everyone can see every bug.

Similarly, if you remember a user is in all groups, you can remember they can
access all bugs.  Changing the users table is probably more common, but still
probably not too common, especially if we split off user prefs.
Eww.

This iws _way_ overdesigned.

Look, you're forgetting something important - with the exception of buglist.cgi,
our queries are really quite simple.

buglist.cgi wouldn't benefit from this, because the filtering would be too
expensive, and when we have to query, getting the summary out of the db when we
get the result is baiscally free.

NOTIFY is pg only, and we don't support that. Plus, there aren't perl bindings
for it AFAIK.

You're solving a non problem. I can get the product_id based on a product at the
rate of 1500-1700 times _per second_. If I preprepare the statement handle, then
I can do so at almost 6500 queries _per second_, and just using prepare_cached
gives me 4400.  We get an increase of about 20% on the other case when I get rid
of the perl wrappers I currently have.

Yes, hashs are faster. But honestly, bugzilla has other problems in reaching
6500 queries per second.
It was only the other day you talked about remembering results.  How is this
unnecessary?

As for NOTIFY, which part of "long term" didn't you read?

> buglist.cgi wouldn't benefit from this, because the filtering would be too
> expensive, and when we have to query, getting the summary out of the db when
> we get the result is baiscally free.

I don't understand what you're talking about.

My point with the buglist stuff, is that say we spend X seconds doing a query.
now, of that X seconds, the vast majority is spent on the query constraints
rather than just getting the summary out of the db, so a cache wouldn't help us...
The primary example of what I was talking about _was_ about caching query
conditions, ie remembering what groups a user was in.
Reassigning all of my "future" targetted bugs to indicate that I'm not presently
working on them, and someone else could feel free to work on them. (sorry for
the spam if you got this twice, it didn't take right the first time)
Assignee: justdave → nobody
QA Contact: mattyt-bugzilla → default-qa
Target Milestone: Future → ---
Assignee: nobody → general
Priority: P4 → --
You need to log in before you can comment on or make changes to this bug.