Last Comment Bug 834449 - Sqlite.jsm should allow cleaning up of old cached statements
: Sqlite.jsm should allow cleaning up of old cached statements
Status: RESOLVED FIXED
[MemShrink]
:
Product: Toolkit
Classification: Components
Component: General (show other bugs)
: unspecified
: All All
: -- normal (vote)
: mozilla21
Assigned To: Richard Newman [:rnewman]
:
Mentors:
Depends on: 833609
Blocks: 832067
  Show dependency treegraph
 
Reported: 2013-01-24 13:45 PST by Gregory Szorc [:gps]
Modified: 2013-02-06 03:41 PST (History)
4 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
Part 1: correctly decrement pending statement count. v1 (2.71 KB, patch)
2013-01-24 19:48 PST, Richard Newman [:rnewman]
gps: review-
Details | Diff | Review
Part 2: allow cleanup of cached statements. v1 (3.40 KB, patch)
2013-01-24 19:48 PST, Richard Newman [:rnewman]
no flags Details | Diff | Review
Part 1: rework tracking of pending statements. v2 (7.69 KB, patch)
2013-01-24 22:02 PST, Richard Newman [:rnewman]
gps: review+
Details | Diff | Review
Part 2: allow cleanup of cached statements. v2 (3.35 KB, patch)
2013-01-24 22:02 PST, Richard Newman [:rnewman]
gps: review+
Details | Diff | Review
Part 3: finalize all the things. v1 (10.23 KB, patch)
2013-01-25 10:42 PST, Richard Newman [:rnewman]
no flags Details | Diff | Review
Part 3: finalize all the things. v2 (9.66 KB, patch)
2013-01-25 14:49 PST, Richard Newman [:rnewman]
gps: review+
Details | Diff | Review

Description Gregory Szorc [:gps] 2013-01-24 13:45:57 PST
+++ This bug was initially created as a clone of Bug #833609 +++

Follow-up from initial landing to implement some form of automatic statement cleanup. This could entail any of the following:

* Public APIs to flush out all cached statement objects that aren't in use.
* Automatically destroying cached statements that haven't been used in N seconds.
* LRU for cached statements / cap on # cached statements.
* ...
Comment 1 Richard Newman [:rnewman] 2013-01-24 14:12:13 PST
I recall discussion with sdwilsh and mak around trying to cache intelligently. I don't think it's an automatically solvable problem. 

I started hacking in a cache-dropping call, which allows individual applications to choose the best time. 

The only thing I could see being better is a scoped statement cache (connection wrapper), but this is really no more helpful without finalizers.
Comment 2 Marco Bonardo [::mak] 2013-01-24 14:21:29 PST
(In reply to Richard Newman [:rnewman] from comment #1)
> I recall discussion with sdwilsh and mak around trying to cache
> intelligently. I don't think it's an automatically solvable problem. 

we can try!

> I started hacking in a cache-dropping call, which allows individual
> applications to choose the best time. 

that may be more complicated than what we need, we can make this opt-in in case.

> The only thing I could see being better is a scoped statement cache
> (connection wrapper), but this is really no more helpful without finalizers.

what do you mean by finalizers?

Btw, in js we have Sqlite.jsm that wraps the connection, in cpp we have StatementCache, that may be "easily" modified to drop unused statements.
Comment 3 Gregory Szorc [:gps] 2013-01-24 14:30:46 PST
I agree that "automatic intelligent caching" is often an oxymoron. I feel the best solution to this problem is to make a number of options and APIs available to consumers and let the app make the most suitable call. Surely any solution that allows any type of cache invalidation is better than one that does not (as long as it isn't automatic).
Comment 4 Richard Newman [:rnewman] 2013-01-24 17:27:06 PST
(In reply to Marco Bonardo [:mak] from comment #2)
> > I recall discussion with sdwilsh and mak around trying to cache
> > intelligently. I don't think it's an automatically solvable problem. 
> 
> we can try!

Concur, but I don't really want to block on doing so :D


> > I started hacking in a cache-dropping call, which allows individual
> > applications to choose the best time. 
> 
> that may be more complicated than what we need, we can make this opt-in in
> case.

I think the opposite — having a call which says "drop all of those statements I asked you to cache" seems a lot simpler than having some system which independently decides whether to drop them.

I say this because it was about five lines of code for me to do it in my work on Bug 832067 :)

The current alternative is that statements are automatically finalized when the connection is closed, which is a heavyweight version of the same.


> > The only thing I could see being better is a scoped statement cache
> > (connection wrapper), but this is really no more helpful without finalizers.
> 
> what do you mean by finalizers?

I mean some kind of scoping construct such that finalization of statements can occur on a lifecycle boundary. And it would be convenient if the lifecycle stages of that object could automatically do so, as with Java/C++ finalizers and destructors.

I propose we emulate this in the short term by having a caller say "I'm done with the busy phase, you can uncache now".

We could also do it by reifying that phase as a connection object — Greg and I planned for UnopenedConnection/OpenedConnection/etc. to be that boundary, but probably without actually closing the DB connection.
Comment 5 Gregory Szorc [:gps] 2013-01-24 18:06:03 PST
I'm fine with the initial patch being a new public API to drop all cached statements.
Comment 6 Richard Newman [:rnewman] 2013-01-24 19:48:26 PST
Created attachment 706241 [details] [diff] [review]
Part 1: correctly decrement pending statement count. v1
Comment 7 Richard Newman [:rnewman] 2013-01-24 19:48:57 PST
Created attachment 706242 [details] [diff] [review]
Part 2: allow cleanup of cached statements. v1
Comment 8 Gregory Szorc [:gps] 2013-01-24 20:09:56 PST
Comment on attachment 706241 [details] [diff] [review]
Part 1: correctly decrement pending statement count. v1

Review of attachment 706241 [details] [diff] [review]:
-----------------------------------------------------------------

This breaks logging, no? One of the purposes of the counter aspect of inProgressCounter was to keep a running tally and unique identifier for each statement. I concede the variable name is a little wonky and should probably be changed.
Comment 9 Gregory Szorc [:gps] 2013-01-24 20:13:54 PST
Comment on attachment 706242 [details] [diff] [review]
Part 2: allow cleanup of cached statements. v1

Review of attachment 706242 [details] [diff] [review]:
-----------------------------------------------------------------

Cancelling until you realize inProgressStatements lives up to its name.

::: toolkit/modules/Sqlite.jsm
@@ +603,5 @@
> +   */
> +  discardCachedStatements: function () {
> +    // Do this for now. Eventually we might want to track which statements
> +    // are in use (which inProgressStatements does not actually) and only
> +    // clear the others.

inProgressStatements does in fact track which statements are in use! If it's in the map, it's in progress.

@@ +608,5 @@
> +    if (this._inProgressCounter) {
> +      throw new Error("Queries in progress; not clearing cache.");
> +    }
> +
> +    this._log.debug("Discarding cached statements.");

Would you mind tossing the number discarded into the log message?

::: toolkit/modules/tests/xpcshell/test_sqlite.js
@@ +155,5 @@
>  
>    yield c.close();
>  });
>  
> +add_task(function test_cleanup_while_active() {

s/cleanup/discard/g
Comment 10 Richard Newman [:rnewman] 2013-01-24 20:33:09 PST
Look more closely. The things tracked aren't statement objects, they're pending statement objects. You can't tell from the pending instances which storage statement instances are still in use: all you can do is blindly cancel them. 

Rather than rework everything to track which things we can and can't finalize, I opted to do the simplest thing.
Comment 11 Gregory Szorc [:gps] 2013-01-24 20:42:54 PST
I prefer to do it right. Either create a mapping of statement to in progress count or implement a proper in progress counter and only clear when it is 0.
Comment 12 Richard Newman [:rnewman] 2013-01-24 22:02:01 PST
Created attachment 706263 [details] [diff] [review]
Part 1: rework tracking of pending statements. v2
Comment 13 Richard Newman [:rnewman] 2013-01-24 22:02:42 PST
Created attachment 706264 [details] [diff] [review]
Part 2: allow cleanup of cached statements. v2
Comment 14 Gregory Szorc [:gps] 2013-01-24 22:18:04 PST
Comment on attachment 706263 [details] [diff] [review]
Part 1: rework tracking of pending statements. v2

Review of attachment 706263 [details] [diff] [review]:
-----------------------------------------------------------------

r+ with comments addressed. I love when refactorings don't have to touch existing tests!

::: toolkit/modules/Sqlite.jsm
@@ +177,5 @@
>    this._anonymousStatements = new Map();
>    this._anonymousCounter = 0;
> +
> +  // A map from statement index to mozIStoragePendingStatement, to allow for
> +  // canceling.

I think you mean "finalizing." "Canceling" is for in progress statements.

@@ +186,3 @@
>    this._inProgressStatements = new Map();
> +
> +  // Increments for each executed statement for the life of the connection. 

Trailing WS.

@@ +342,5 @@
> +      return 0;
> +    }
> +
> +    let out = 0;
> +    for (let v of this._inProgressStatements.values()) {

.values() works?! It isn't documented on MDN.

@@ +770,5 @@
>          }
>        },
>      });
>  
> +    this._recordStatementBeginning(statement, pending, index);

You only call this here. Is it worth inlining to avoid yet more statement overhead?

::: toolkit/modules/tests/xpcshell/test_sqlite.js
@@ +184,5 @@
> +    actual = c.inProgress();
> +  });
> +
> +  do_check_eq(actual, 1);
> +  do_check_eq(c._statementCounter, c._initialStatementCount + 2);

check c.inProgress() after execution as well.

Please also queue up multiple statements with different SQL and check an inProgress() > 1 case.
Comment 15 Gregory Szorc [:gps] 2013-01-24 22:20:02 PST
Comment on attachment 706264 [details] [diff] [review]
Part 2: allow cleanup of cached statements. v2

Review of attachment 706264 [details] [diff] [review]:
-----------------------------------------------------------------

Would review again.
Comment 16 Richard Newman [:rnewman] 2013-01-24 22:49:11 PST
(In reply to Gregory Szorc [:gps] from comment #14)

> > +  // A map from statement index to mozIStoragePendingStatement, to allow for
> > +  // canceling.
> 
> I think you mean "finalizing." "Canceling" is for in progress statements.

Nope. The pending statements are what we cancel when we finish up; see:

  _finalize: function (deferred) {
    this._log.debug("Finalizing connection.");
    // Cancel any pending statements.
    for (let [k, statement] of this._pendingStatements) {
      statement.cancel();
    }

StorageStatements produce StoragePendingStatements. We cancel the latter so we can finalize the former. We have to track both because we can't go from SPS -> SS.


> .values() works?! It isn't documented on MDN.

This is as close to docs as we get :/

http://wiki.ecmascript.org/doku.php?id=harmony:simple_maps_and_sets



> > +    this._recordStatementBeginning(statement, pending, index);
> 
> You only call this here. Is it worth inlining to avoid yet more statement
> overhead?

I broke it out for symmetry and isolation of access; I figure it's more important that we don't screw up future changes in bookkeeping than save a single method call when we're already doing expensive stuff.

(The right solution, of course, is "JIT chrome code!")


> check c.inProgress() after execution as well.
> 
> Please also queue up multiple statements with different SQL and check an
> inProgress() > 1 case.

KK.
Comment 17 Gregory Szorc [:gps] 2013-01-25 00:13:36 PST
(In reply to Richard Newman [:rnewman] from comment #16)
> (In reply to Gregory Szorc [:gps] from comment #14)
> > > +  // A map from statement index to mozIStoragePendingStatement, to allow for
> > > +  // canceling.
> > 
> > I think you mean "finalizing." "Canceling" is for in progress statements.
> 
> Nope. The pending statements are what we cancel when we finish up; see:

Yeah, I just read the patch wrong.

Please commit.
Comment 19 Marco Bonardo [::mak] 2013-01-25 01:56:48 PST
I think this is more complicated than needed, there is nothing wrong with finalizing an in-progress statement, once you have invoked executeAsync on a statement you can already call finalize() on it.

That said, I'm not sure how you are going to use this API, if you don't want to cache statements why not just using execute() instead of executeCached()? cachins statements, executing them and clearing them is useful only if you do many repeated calls or the same statement.
Comment 20 Marco Bonardo [::mak] 2013-01-25 01:58:21 PST
and the API is lying, since it's called discardCachedStatements, but doesn't discard the in-progress ones.  So it's likely the caller expects to have dropped all of them when it's untrue.
Comment 21 Gregory Szorc [:gps] 2013-01-25 09:34:29 PST
(In reply to Marco Bonardo [:mak] from comment #19)
> I think this is more complicated than needed, there is nothing wrong with
> finalizing an in-progress statement, once you have invoked executeAsync on a
> statement you can already call finalize() on it.

This I did not know! We can probably drop all the code checking for in-progress state then? Please confirm.

> That said, I'm not sure how you are going to use this API, if you don't want
> to cache statements why not just using execute() instead of executeCached()?
> cachins statements, executing them and clearing them is useful only if you
> do many repeated calls or the same statement.

The current implementation almost perfectly satisfies FHR's requirements! FHR has periodic bursts of high SQL usage with the same prepared statement but with different bound parameters. It is advantageous for us to cache statements for this reason. However, these bursts are few and far between. So, it makes sense for FHR to issue a clean-up afterwards. This is exactly what we've done: we clean up after startup and after generating a payload.
Comment 22 Richard Newman [:rnewman] 2013-01-25 10:25:21 PST
(In reply to Marco Bonardo [:mak] from comment #20)
> and the API is lying, since it's called discardCachedStatements, but doesn't
> discard the in-progress ones.  So it's likely the caller expects to have
> dropped all of them when it's untrue.

I dunno, It seems sane to me to imply "except for those that are currently executing", and the docstring says so.

Still, now academic!


(In reply to Marco Bonardo [:mak] from comment #19)
> I think this is more complicated than needed, there is nothing wrong with
> finalizing an in-progress statement, once you have invoked executeAsync on a
> statement you can already call finalize() on it.

ORLY! Very well then :D


> That said, I'm not sure how you are going to use this API, if you don't want
> to cache statements why not just using execute() instead of executeCached()?
> cachins statements, executing them and clearing them is useful only if you
> do many repeated calls or the same statement.

As Greg said: it's to act as an efficiency boundary after periods of excessive work (during which time we want to cache statements), without actually closing the database connection.
Comment 23 Richard Newman [:rnewman] 2013-01-25 10:42:34 PST
Created attachment 706480 [details] [diff] [review]
Part 3: finalize all the things. v1
Comment 24 Marco Bonardo [::mak] 2013-01-25 13:09:30 PST
Thank you for the clarification, the use-case looks fine then.

Yes I confirm that once you have called asyncExecute you can immediately call finalize. I'll tell you more, you can also immediately call asyncClose() (though not our case here).
Comment 25 Marco Bonardo [::mak] 2013-01-25 13:15:26 PST
Comment on attachment 706480 [details] [diff] [review]
Part 3: finalize all the things. v1

Review of attachment 706480 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/modules/Sqlite.jsm
@@ +231,5 @@
>  
>    /**
> +   * @return (integer) the number of executing queries.
> +   */
> +  get inProgress() {

this name is poorly choosen, inProgress doesn't mean a thing, should be inProgressStatementsCount or activeStatementsCount... btw, what's the usefulness of this? why should a consumer be interested in the number of statements in progress, since it changes continuously and you have poor control over what's in progress since everything is async and can happen at any time. looks like a so volatile data that I can barely think of use-cases.
Since this is a pseudoAPI, we should manage it like that, and don't add things that complicate it with poor benefit.
Comment 26 Richard Newman [:rnewman] 2013-01-25 14:16:06 PST
(In reply to Marco Bonardo [:mak] from comment #25)

> btw, what's the usefulness of this?

It's a supported way for test code to find out the number of concurrent queries (which can verify implementation invariants, such as "queries are not serialized") -- see its five uses in test_sqlite.js.

There used to be a comment explaining that. I can move it into the test file instead.

> why should a consumer be interested in the number of
> statements in progress, since it changes continuously and you have poor
> control over what's in progress since everything is async and can happen at
> any time.

Well, it's read-only, so "poor control" isn't really a good argument IMO; it's for introspection. That it changes continuously and we have poor control is actually an argument in favor of being able to monitor it, no?

I could certainly imagine that we might want to snapshot how many concurrent statements we execute as part of a telemetry probe, or to hold off on doing some work until it drops to zero, or even just logging it as part of debugging.
Comment 27 Marco Bonardo [::mak] 2013-01-25 14:19:40 PST
(In reply to Richard Newman [:rnewman] from comment #26)
> There used to be a comment explaining that. I can move it into the test file
> instead.

APIs added for tests is not a good use-case, imo.
At that point I'd prefer an "internal" _inProgressStatementsCount getter, it still sucks but at least wouldn't look like a supported API.

> That it changes continuously and we have poor
> control is actually an argument in favor of being able to monitor it, no?

no, it's an argument that is totally pointless thinking to monitor it, imo :)

> I could certainly imagine that we might want to snapshot how many concurrent
> statements we execute as part of a telemetry probe

one. All statements are serialize, this can only be 1 or 0.
Comment 28 Marco Bonardo [::mak] 2013-01-25 14:21:41 PST
(In reply to Marco Bonardo [:mak] from comment #27)
> one. All statements are serialize, this can only be 1 or 0.

OR better, this is tracking statements you "planned" to execute, but they will be executed in a fifo queue, so this is even lying more... you may call it statementsThatWillMaybeExecuteCount :)
Comment 29 Richard Newman [:rnewman] 2013-01-25 14:25:16 PST
(In reply to Marco Bonardo [:mak] from comment #27)

> > I could certainly imagine that we might want to snapshot how many concurrent
> > statements we execute as part of a telemetry probe
> 
> one. All statements are serialize, this can only be 1 or 0.

I have a test proving that _pendingStatements.size reaches 2. That doesn't necessarily mean that they're both running against the database, but they're both extant, which is something we do care about. Not enough to leave it in the public API, but we do care.


> OR better, this is tracking statements you "planned" to execute, but they
> will be executed in a fifo queue, so this is even lying more... you may call
> it statementsThatWillMaybeExecuteCount :)

They're "statements on which executeAsync have been called". They might not interleave returned results due to storage limitations, but they're concurrently active, and there's nothing in Sqlite.jsm that demands that they not be interleaved.
Comment 30 Marco Bonardo [::mak] 2013-01-25 14:31:42 PST
(In reply to Richard Newman [:rnewman] from comment #29)
> They're "statements on which executeAsync have been called". They might not
> interleave returned results due to storage limitations, but they're
> concurrently active

Not a limitation, it's a wanted feature.  They are still not in-progress as the pseudoAPI tries to tell, they are pending or queued, whatever is more clear. I think we should not confuse the consumers that may fear to have 10 pending statements for perf reasons, when actually they will just queue up nicely.
Comment 31 Richard Newman [:rnewman] 2013-01-25 14:49:14 PST
Created attachment 706609 [details] [diff] [review]
Part 3: finalize all the things. v2
Comment 32 Gregory Szorc [:gps] 2013-01-25 14:52:18 PST
Comment on attachment 706609 [details] [diff] [review]
Part 3: finalize all the things. v2

Review of attachment 706609 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good.
Comment 33 Marco Bonardo [::mak] 2013-01-25 14:55:05 PST
Thumbs up!
Comment 34 Richard Newman [:rnewman] 2013-01-25 15:04:57 PST
Thanks chaps!
Comment 35 Richard Newman [:rnewman] 2013-01-25 15:05:54 PST
https://hg.mozilla.org/services/services-central/rev/2e916d9dae06

Note You need to log in before you can comment on or make changes to this bug.