Last Comment Bug 391294 - Shared Database implementation really slow on network file systems
: Shared Database implementation really slow on network file systems
Status: RESOLVED FIXED
:
Product: NSS
Classification: Components
Component: Libraries (show other bugs)
: trunk
: All All
: -- normal (vote)
: 3.12
Assigned To: Robert Relyea
:
Mentors:
Depends on:
Blocks: 217538
  Show dependency treegraph
 
Reported: 2007-08-07 17:31 PDT by Robert Relyea
Modified: 2013-12-20 04:03 PST (History)
2 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---


Attachments
Several performance improvements, most notably the use of a cache for shared file systems. (35.89 KB, patch)
2007-09-11 15:17 PDT, Robert Relyea
no flags Details | Diff | Splinter Review
Version 2: Several performance improvements, most notably the use of a cache for shared file systems. (32.86 KB, patch)
2007-09-11 17:07 PDT, Robert Relyea
nelson: review-
Details | Diff | Splinter Review
Version 3: Several performance improvements, most notably the use of a cache for shared file systems (36.04 KB, patch)
2007-09-21 12:18 PDT, Robert Relyea
nelson: review+
Details | Diff | Splinter Review

Description Robert Relyea 2007-08-07 17:31:15 PDT
sqlite3 is pretty slow when running on Shared file systems. Combine that issue with the bug 391292, and you get really slow response on shared file system.
Comment 1 Nelson Bolyard (seldom reads bugmail) 2007-08-07 18:17:30 PDT
Bob, 
by "Shared file systems", I gather you mean network-accessed file systems?
Comment 2 Robert Relyea 2007-09-11 15:17:49 PDT
Created attachment 280518 [details] [diff] [review]
Several performance improvements, most notably the use of a cache for shared file systems.

This patch contains the following changes:

1) Creation of a cached copy of the database using sqlite's temporary table functionality. This cache is only used for read operations, and only in a shared filesystem. Whether or not the sqlite database is in a "shared filesystem" is determined by the relative performance between getting the access status of a non-existant file between the filesystem where the sqlite database resides and the local temp filesystem. Profile determined the majority of the time spent on shared filesystems was trying to determine if a rollback file existed.

2) changing the underlying model from one new database handle per operation to
a common one opened at startup. This is necessary because the cache is only visible in the handle it was created in and disappears when that handle is closed. The code now documents the correct thread sharing rules for database
 handles. Write operations still go through their own database handle (also bypassing the cache). NOTE: This also significately increased the performance of the non-shared database case. 

3) Creation of explicit indexes for the most commonly accessed attributes. The combination of 2 and 3 moves the sqlite performance back to sub-second response for listing my over 100 entry database file when running on a local filesystem. (and down to under 2 seconds when running on a shared filesystem).

How the cache works.

On startup, if we find that the filesystem the database lives on significantly slower than the one the temporary database files live on, we initialize a temporary Table which is a copy of our database.

Whenever the table is older than 10 seconds, we create a new copy before we continue. NOTE: since we are using SQLITE primitives to create the cache, SQLITE guarrentees our cache will be self-consistant. Monitors protect access to the
cache (as well as access to the shared readonly database handle).

All writes happen in a transaction. On the closing of a transaction, we update the database so local threads will immediately see our change.

Some global 'cosmetic' changes in the patch as well:

-Removal of SQLITE_THREAD_SHARED_DB define. This code attempted to do what this patch now does correctly in change 2 (namely go to a single database).
-PRLock is now a PRMonitor (sigh).
-sqlite3_pepare changed to sqlite3_prepare_v2 (slightly nicer semantic on database SCHEMA changes when running).
-rolled several sqlite3_open();sqlite3_busy_timeout(); calls into a single function. this function now takes open flags. Currently the flags are unused, but in the future when we have access to sqlite3 3.5.0 we can make the global ReadDB handle truly readonly.
-sdb_openDBLocal() now has a new optional paramter: table. This tells the underlying code what table to use (the direct database table, or the temporary cache table). Only FindObjectsInit and GetAttributes uses the table (all other accesses are either write accesses or access the metaData table, which is not cached).

nelson, if you are to overloaded, feel free to pass this off to someone else.

bob
Comment 3 Robert Relyea 2007-09-11 15:21:10 PDT
Oh, one more thing: some of the comments have /*. and .*/. These were not in the original code, I added them to get the diff to format more nicely for humans.
Comment 4 Robert Relyea 2007-09-11 15:23:23 PDT
I never answered nelson's question: Yes shared filesystem == network filesystem in the comments above.

bob
Comment 5 Robert Relyea 2007-09-11 16:33:43 PDT
Comment on attachment 280518 [details] [diff] [review]
Several performance improvements, most notably the use of a cache for shared file systems.

There is a minor flaw in this patch I'll attach a new one shortly.

(no big deal, just a crash in the sdb_init;).
Comment 6 Robert Relyea 2007-09-11 17:07:25 PDT
Created attachment 280531 [details] [diff] [review]
Version 2: Several performance improvements, most notably the use of a cache for shared file systems.

OK, this one passes all.sh for all database types.
Comment 7 Nelson Bolyard (seldom reads bugmail) 2007-09-12 17:30:15 PDT
All file systems are shared.  Please don't use "shared" to mean "network" 
or "slow".
Comment 8 Robert Relyea 2007-09-12 18:34:24 PDT
err.. that's common usage just google "shared filesystem" and you'll find all sorts of networked filesystem products (/share on unix meant things shared amoung serveral machines, etc.).

I do recognize that talking about shared filesystem (meaning networked filesystem) and shared database (meaning shared among processes on single machine) could lead to confusion, so I'll try to remember to use the term 'networked filesystem' (though that may be confused with a particular implementation;). If I slip blame in 22 years of prior training....;)


bob
Comment 9 Nelson Bolyard (seldom reads bugmail) 2007-09-12 21:20:48 PDT
Function sdb_strndup will undoubtedly be the source of many UMR error 
reports, since it copies data past the terminating NULL character of the 
input string.  Think of all the complaints we've had about UMRs in our
optimized RC4 code, and multiply that by some constant greater than 1.
Comment 10 Robert Relyea 2007-09-13 10:05:27 PDT
I can fix that, I didn't since it is a static function with exactly one caller that already guarrentees that it can't go passed then end and the function documents the restriction (in case someone copies it somewhere).

Still the cost to fix it and remove the restriction is pretty low.

bob
Comment 11 Nelson Bolyard (seldom reads bugmail) 2007-09-17 20:20:04 PDT
Comment on attachment 280531 [details] [diff] [review]
Version 2: Several performance improvements, most notably the use of a cache for shared file systems.

Here are my full review comments, in addition to comment 7 and comment 9
above.  I think these comments add up to r-, but am willing to be persuaded
otherwise. 

1. In struct SDBPrivateStr, there was a PRLock named "lock", and there are
numerous comments "access protected by lock" or "use protected by lock".
Your patch eliminates that lock, and adds a PRMonitor named dbMon, but the 
comments all still say "protected by lock".  They should be changed to 
say "protected by dbMon".

2. In sdb_getTempDir(), if the second call to sqlite3_exec() fails, don't 
you still want to drop the table?  Isn't this a leak of sorts?

3.  In sdb_measureAccess(), delta is signed and can be less than zero under 
infrequent and improbable circumstances.  When it is, you return a negative 
value, but the comment says:
+    /* always return 1 or greater */
+    if (delta == 0) delta = 1;
I suggest
+    if (delta <= 0) delta = 1;

4. In sdb_openDBLocal(), given that NSS will use the "system" sqlite3 on 
some platforms, and that (according to the comments, paraphased here) 
"Older versions couldn't share sqlite3 handles across threads at all", 
shouldn't there be some sqlite3 version test here, to avoid trying to 
share handles with versions that don't support it?

5. Please explain why sdb_openDBLocal uses a monitor instead of a lock.
I gather there is some circumstance in which a thread can call this function
while it is already holding the monitor.  Please explain how that can happen.
Perhaps a stack would provide a succinct explanation.  It's OK to explain it 
in a bug comment, especially if the explanation is lengthy, but a code 
comment near the declaration of struct SDBPrivateStr might be even better.  

6.     sqlerr = sqlite3_busy_timeout(*sqlDB, 1000);  
1000 is a magic number.  What are the units?  
Let's have a manifest symbolic constant instead.  
I suggest defining it near the top of the file.

7. I don't know what sqlite3_prepare_v2 does, so I'm ignoring it in this
review, perhaps to our mutual peril. :-/

8. In sdb_init() we see:

    if (enableCache) {
	/* try to set the temp store to memory. don't fail if we
         * can't do this. */

but all the error cases inside that basic block "goto loser".  It appears
to me that going to loser means that sdb_init fails.  The sdb_p and sdb 
structures don't get initialized in that case.  So, either the comments
are wrong, or the code is wrong, I think.  (but if not, please explain).
Comment 12 Robert Relyea 2007-09-18 11:29:45 PDT
Thanks fo the review!

> 1. In struct SDBPrivateStr, there was a PRLock named "lock"....
> They should be changed to say "protected by dbMon".

good point, I'll do that in a corrected patch.

> 2. In sdb_getTempDir(), if the second call to sqlite3_exec() fails, don't 
> you still want to drop the table?  Isn't this a leak of sorts?

You are correct, I'll move the test below the drop.

> 3.  In sdb_measureAccess(), delta is signed and can be less than zero under 

Actually delta is a PRIntervalTime which is defined a a PRUint32 (unsigned).
In the rare case the 'next' is actually less than 'time', it means we rolled over, and we want to treat the  result as an unsigned vaule.

> 4. In sdb_openDBLocal(), given that NSS will use the "system" sqlite3 on 
> some platforms, ....... shouldn't there be some sqlite3 version test here....

Good point, I should be able to add that. SQLITE also has an interface that tells us it it was compiled with thread safety was well...

> 5. Please explain why sdb_openDBLocal uses a monitor instead of a lock.

Here's the little that is in the code currently (above the declaration of the monitor):

 * and is used for all read operation. It's use is protected by a monitor. This
 * is because an operation can span the use of FindObjectsInit() through the
 * call to FindObjectsFinal(). In the intermediate time it is possible to call
 * other operations like NSC_GetAttributeValue */

sdb_openLocalDB()/sdb_closeLocalDB() are typically called in pairs within a single function. The only exception is sdb_FindObject*. The sdb_FindObjectsInit() function opens the database, initialized the sqlite statment, and then returns. sdb_FindObjects repeatedly executes the sqlite
statement until it has either fulfilled the callers request for objects, or it
has run out of objects to provice. The caller can repeatedly call sdb_FindObjects() to get more objects. When the caller has found everything the
caller is looking for, the caller the calls sdb_FindObjectsFinal() which frees the sqlite statement and 'closes' the database (calls sdb_closeLocalDB()). In the middle of this operation, the caller could call other sdb_ functions (like sdb_GetAttributeValue() using the objects the caller had just acquired). Since the caller is holding the lock over whole sdb_FindObjects* operation, a regular
lock would block at sdb_GetAttributeValue().

In practice the only place this happens is update. Softoken (in all other cases) calls sdb_FindObjectsInit(), repeatedly sdb_FindObjects() and finally sdb_FindObjectsFinal() without an intervening call.

> 6.     sqlerr = sqlite3_busy_timeout(*sqlDB, 1000);  

Good point, I didn't even know the units until just now (milliseconds).

> 7. I don't know what sqlite3_prepare_v2 does,

sqlite3_prepare_v2 works just like sqlite3_prepare except that it automatically
recompiles the statement if it gets a SQLITE_SCHEMA event. It basically means
that sqlite3_prepare_v2 will succeed in more cases than sqlite3_prepare.

> 8. In sdb_init() we see:
> /* try to set the temp store to memory. don't fail if we
>  * can't do this. */

That comment only applies to the statement immediately following the comment. (sqlite_exec with no error check).

The pragma tells the database, use memory instead of disk for your temp
temp storage. The rest of the block is setting up the actual cache, whose failure to set up is currently fatal. That semantic can be relaxed if you want.

Let me know your preference and I'll get a new patch to address the other issues shortly.

Thanks

bob




	







 
Comment 13 Nelson Bolyard (seldom reads bugmail) 2007-09-18 12:35:42 PDT
#3, if PRIntervalTime is signed, I think there are some bugs in other code...
#5, thanks for that explanation
#6, One second seems way too short for DB operations in the general case,
especially when processes are contending for a slow resource. 
If this statement sets a timeout that lasts for (say) the lifetime of 
the database handle, then I think the value needs to be increased.
#7, thanks for that explanation
#8, The comment in confusing, in part because it occurs at the beginning of 
a basic block.  It appears to be applicable to the whole basic block.  
I suggest splitting the comment into two comments.  Before the call, say
"try to set the temp store to memory.", and after the call say 
"ignore failure of preceding call".  
Comment 14 Robert Relyea 2007-09-18 14:37:13 PDT
> #6
any suggestions to what to increase it to?

>#8
No problem, I'll adjust the comment.

bob
Comment 15 Robert Relyea 2007-09-21 12:18:49 PDT
Created attachment 281853 [details] [diff] [review]
Version 3: Several performance improvements, most notably the use of a cache for shared file systems

Updated based on review comments.

I did not add code for the 'older' versions of shared db because the patch to fix the older versions is 3.3.1 and this code requires at least 3.3.9 just to build (based on the prepare_v2 interface).
Comment 16 Nelson Bolyard (seldom reads bugmail) 2007-09-24 21:04:38 PDT
Comment on attachment 281853 [details] [diff] [review]
Version 3: Several performance improvements, most notably the use of a cache for shared file systems

Bob, I gather that 3.3.1 and 3.3.9 are sqlite versions?  
You're saying that any version of sqlite3 with prepare_v2 also supports 
the sharing of sqlite3 handles across threads, yes?
This r+ assumes that the answers to both of those questions is: yes.
Comment 17 Robert Relyea 2007-10-02 13:35:15 PDT
yes and yes.

Checking in sdb.c;
/cvsroot/mozilla/security/nss/lib/softoken/sdb.c,v  <--  sdb.c
new revision: 1.5; previous revision: 1.4
done
Comment 18 Peter (Stig) Edwards 2013-12-20 04:03:27 PST
Hello and thank you for nss,
 
RHEL6 uses curl built with NSS and I noticed when curl (either the command line) or libcurl is used to make HTTPS requests/connections that the code in softoken/sdb.c:sdb_measureAccess that checks the existence of directories causes an increase in the dentry cache size and slab size, if many HTTPS connections/requests are made this increase can be significant.
 
  When sdb_measureAccess() is called from sdb_init it uses the environment variable NSS_SDB_USE_CACHE to control if the cache is used or not ("yes" or "no"), however in s_open it is called without examining the environment variable NSS_SDB_USE_CACHE.  Could the call to sdb_measureAccess() also be skipped in s_open if NSS_SDB_USE_CACHE is set to "yes" or "no"?
 
Thank you.
 
More details in https://bugzilla.redhat.com/show_bug.cgi?id=1044666

Note You need to log in before you can comment on or make changes to this bug.