Sqlite.jsm should allow cleaning up of old cached statements

RESOLVED FIXED in mozilla21

Status

()

Toolkit
General
RESOLVED FIXED
4 years ago
4 years ago

People

(Reporter: gps, Assigned: rnewman)

Tracking

unspecified
mozilla21
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [MemShrink])

Attachments

(3 attachments, 3 obsolete attachments)

(Reporter)

Description

4 years ago
+++ This bug was initially created as a clone of Bug #833609 +++

Follow-up from initial landing to implement some form of automatic statement cleanup. This could entail any of the following:

* Public APIs to flush out all cached statement objects that aren't in use.
* Automatically destroying cached statements that haven't been used in N seconds.
* LRU for cached statements / cap on # cached statements.
* ...
I recall discussion with sdwilsh and mak around trying to cache intelligently. I don't think it's an automatically solvable problem. 

I started hacking in a cache-dropping call, which allows individual applications to choose the best time. 

The only thing I could see being better is a scoped statement cache (connection wrapper), but this is really no more helpful without finalizers.
(In reply to Richard Newman [:rnewman] from comment #1)
> I recall discussion with sdwilsh and mak around trying to cache
> intelligently. I don't think it's an automatically solvable problem. 

we can try!

> I started hacking in a cache-dropping call, which allows individual
> applications to choose the best time. 

that may be more complicated than what we need, we can make this opt-in in case.

> The only thing I could see being better is a scoped statement cache
> (connection wrapper), but this is really no more helpful without finalizers.

what do you mean by finalizers?

Btw, in js we have Sqlite.jsm that wraps the connection, in cpp we have StatementCache, that may be "easily" modified to drop unused statements.
(Reporter)

Comment 3

4 years ago
I agree that "automatic intelligent caching" is often an oxymoron. I feel the best solution to this problem is to make a number of options and APIs available to consumers and let the app make the most suitable call. Surely any solution that allows any type of cache invalidation is better than one that does not (as long as it isn't automatic).
(In reply to Marco Bonardo [:mak] from comment #2)
> > I recall discussion with sdwilsh and mak around trying to cache
> > intelligently. I don't think it's an automatically solvable problem. 
> 
> we can try!

Concur, but I don't really want to block on doing so :D


> > I started hacking in a cache-dropping call, which allows individual
> > applications to choose the best time. 
> 
> that may be more complicated than what we need, we can make this opt-in in
> case.

I think the opposite — having a call which says "drop all of those statements I asked you to cache" seems a lot simpler than having some system which independently decides whether to drop them.

I say this because it was about five lines of code for me to do it in my work on Bug 832067 :)

The current alternative is that statements are automatically finalized when the connection is closed, which is a heavyweight version of the same.


> > The only thing I could see being better is a scoped statement cache
> > (connection wrapper), but this is really no more helpful without finalizers.
> 
> what do you mean by finalizers?

I mean some kind of scoping construct such that finalization of statements can occur on a lifecycle boundary. And it would be convenient if the lifecycle stages of that object could automatically do so, as with Java/C++ finalizers and destructors.

I propose we emulate this in the short term by having a caller say "I'm done with the busy phase, you can uncache now".

We could also do it by reifying that phase as a connection object — Greg and I planned for UnopenedConnection/OpenedConnection/etc. to be that boundary, but probably without actually closing the DB connection.
(Reporter)

Comment 5

4 years ago
I'm fine with the initial patch being a new public API to drop all cached statements.
Assignee: nobody → rnewman
Status: NEW → ASSIGNED
Summary: Sqlite.jsm should automatically clean up old statements → Sqlite.jsm should allow cleaning up of old cached statements
Created attachment 706241 [details] [diff] [review]
Part 1: correctly decrement pending statement count. v1
Attachment #706241 - Flags: review?(gps)
Created attachment 706242 [details] [diff] [review]
Part 2: allow cleanup of cached statements. v1
Attachment #706242 - Flags: review?(gps)
Blocks: 832067
(Reporter)

Comment 8

4 years ago
Comment on attachment 706241 [details] [diff] [review]
Part 1: correctly decrement pending statement count. v1

Review of attachment 706241 [details] [diff] [review]:
-----------------------------------------------------------------

This breaks logging, no? One of the purposes of the counter aspect of inProgressCounter was to keep a running tally and unique identifier for each statement. I concede the variable name is a little wonky and should probably be changed.
Attachment #706241 - Flags: review?(gps) → review-
(Reporter)

Comment 9

4 years ago
Comment on attachment 706242 [details] [diff] [review]
Part 2: allow cleanup of cached statements. v1

Review of attachment 706242 [details] [diff] [review]:
-----------------------------------------------------------------

Cancelling until you realize inProgressStatements lives up to its name.

::: toolkit/modules/Sqlite.jsm
@@ +603,5 @@
> +   */
> +  discardCachedStatements: function () {
> +    // Do this for now. Eventually we might want to track which statements
> +    // are in use (which inProgressStatements does not actually) and only
> +    // clear the others.

inProgressStatements does in fact track which statements are in use! If it's in the map, it's in progress.

@@ +608,5 @@
> +    if (this._inProgressCounter) {
> +      throw new Error("Queries in progress; not clearing cache.");
> +    }
> +
> +    this._log.debug("Discarding cached statements.");

Would you mind tossing the number discarded into the log message?

::: toolkit/modules/tests/xpcshell/test_sqlite.js
@@ +155,5 @@
>  
>    yield c.close();
>  });
>  
> +add_task(function test_cleanup_while_active() {

s/cleanup/discard/g
Attachment #706242 - Flags: review?(gps)
Look more closely. The things tracked aren't statement objects, they're pending statement objects. You can't tell from the pending instances which storage statement instances are still in use: all you can do is blindly cancel them. 

Rather than rework everything to track which things we can and can't finalize, I opted to do the simplest thing.
(Reporter)

Comment 11

4 years ago
I prefer to do it right. Either create a mapping of statement to in progress count or implement a proper in progress counter and only clear when it is 0.
Created attachment 706263 [details] [diff] [review]
Part 1: rework tracking of pending statements. v2
Attachment #706241 - Attachment is obsolete: true
Attachment #706263 - Flags: review?(gps)
Created attachment 706264 [details] [diff] [review]
Part 2: allow cleanup of cached statements. v2
Attachment #706242 - Attachment is obsolete: true
Attachment #706264 - Flags: review?(gps)
(Reporter)

Comment 14

4 years ago
Comment on attachment 706263 [details] [diff] [review]
Part 1: rework tracking of pending statements. v2

Review of attachment 706263 [details] [diff] [review]:
-----------------------------------------------------------------

r+ with comments addressed. I love when refactorings don't have to touch existing tests!

::: toolkit/modules/Sqlite.jsm
@@ +177,5 @@
>    this._anonymousStatements = new Map();
>    this._anonymousCounter = 0;
> +
> +  // A map from statement index to mozIStoragePendingStatement, to allow for
> +  // canceling.

I think you mean "finalizing." "Canceling" is for in progress statements.

@@ +186,3 @@
>    this._inProgressStatements = new Map();
> +
> +  // Increments for each executed statement for the life of the connection. 

Trailing WS.

@@ +342,5 @@
> +      return 0;
> +    }
> +
> +    let out = 0;
> +    for (let v of this._inProgressStatements.values()) {

.values() works?! It isn't documented on MDN.

@@ +770,5 @@
>          }
>        },
>      });
>  
> +    this._recordStatementBeginning(statement, pending, index);

You only call this here. Is it worth inlining to avoid yet more statement overhead?

::: toolkit/modules/tests/xpcshell/test_sqlite.js
@@ +184,5 @@
> +    actual = c.inProgress();
> +  });
> +
> +  do_check_eq(actual, 1);
> +  do_check_eq(c._statementCounter, c._initialStatementCount + 2);

check c.inProgress() after execution as well.

Please also queue up multiple statements with different SQL and check an inProgress() > 1 case.
Attachment #706263 - Flags: review?(gps) → review+
(Reporter)

Comment 15

4 years ago
Comment on attachment 706264 [details] [diff] [review]
Part 2: allow cleanup of cached statements. v2

Review of attachment 706264 [details] [diff] [review]:
-----------------------------------------------------------------

Would review again.
Attachment #706264 - Flags: review?(gps) → review+
(In reply to Gregory Szorc [:gps] from comment #14)

> > +  // A map from statement index to mozIStoragePendingStatement, to allow for
> > +  // canceling.
> 
> I think you mean "finalizing." "Canceling" is for in progress statements.

Nope. The pending statements are what we cancel when we finish up; see:

  _finalize: function (deferred) {
    this._log.debug("Finalizing connection.");
    // Cancel any pending statements.
    for (let [k, statement] of this._pendingStatements) {
      statement.cancel();
    }

StorageStatements produce StoragePendingStatements. We cancel the latter so we can finalize the former. We have to track both because we can't go from SPS -> SS.


> .values() works?! It isn't documented on MDN.

This is as close to docs as we get :/

http://wiki.ecmascript.org/doku.php?id=harmony:simple_maps_and_sets



> > +    this._recordStatementBeginning(statement, pending, index);
> 
> You only call this here. Is it worth inlining to avoid yet more statement
> overhead?

I broke it out for symmetry and isolation of access; I figure it's more important that we don't screw up future changes in bookkeeping than save a single method call when we're already doing expensive stuff.

(The right solution, of course, is "JIT chrome code!")


> check c.inProgress() after execution as well.
> 
> Please also queue up multiple statements with different SQL and check an
> inProgress() > 1 case.

KK.
(Reporter)

Comment 17

4 years ago
(In reply to Richard Newman [:rnewman] from comment #16)
> (In reply to Gregory Szorc [:gps] from comment #14)
> > > +  // A map from statement index to mozIStoragePendingStatement, to allow for
> > > +  // canceling.
> > 
> > I think you mean "finalizing." "Canceling" is for in progress statements.
> 
> Nope. The pending statements are what we cancel when we finish up; see:

Yeah, I just read the patch wrong.

Please commit.
https://hg.mozilla.org/services/services-central/rev/0f28b10053de
https://hg.mozilla.org/services/services-central/rev/dce840cda6b8
Whiteboard: [MemShrink] → [MemShrink][fixed in services]
I think this is more complicated than needed, there is nothing wrong with finalizing an in-progress statement, once you have invoked executeAsync on a statement you can already call finalize() on it.

That said, I'm not sure how you are going to use this API, if you don't want to cache statements why not just using execute() instead of executeCached()? cachins statements, executing them and clearing them is useful only if you do many repeated calls or the same statement.
and the API is lying, since it's called discardCachedStatements, but doesn't discard the in-progress ones.  So it's likely the caller expects to have dropped all of them when it's untrue.
(Reporter)

Comment 21

4 years ago
(In reply to Marco Bonardo [:mak] from comment #19)
> I think this is more complicated than needed, there is nothing wrong with
> finalizing an in-progress statement, once you have invoked executeAsync on a
> statement you can already call finalize() on it.

This I did not know! We can probably drop all the code checking for in-progress state then? Please confirm.

> That said, I'm not sure how you are going to use this API, if you don't want
> to cache statements why not just using execute() instead of executeCached()?
> cachins statements, executing them and clearing them is useful only if you
> do many repeated calls or the same statement.

The current implementation almost perfectly satisfies FHR's requirements! FHR has periodic bursts of high SQL usage with the same prepared statement but with different bound parameters. It is advantageous for us to cache statements for this reason. However, these bursts are few and far between. So, it makes sense for FHR to issue a clean-up afterwards. This is exactly what we've done: we clean up after startup and after generating a payload.
Flags: needinfo?(mak77)
(In reply to Marco Bonardo [:mak] from comment #20)
> and the API is lying, since it's called discardCachedStatements, but doesn't
> discard the in-progress ones.  So it's likely the caller expects to have
> dropped all of them when it's untrue.

I dunno, It seems sane to me to imply "except for those that are currently executing", and the docstring says so.

Still, now academic!


(In reply to Marco Bonardo [:mak] from comment #19)
> I think this is more complicated than needed, there is nothing wrong with
> finalizing an in-progress statement, once you have invoked executeAsync on a
> statement you can already call finalize() on it.

ORLY! Very well then :D


> That said, I'm not sure how you are going to use this API, if you don't want
> to cache statements why not just using execute() instead of executeCached()?
> cachins statements, executing them and clearing them is useful only if you
> do many repeated calls or the same statement.

As Greg said: it's to act as an efficiency boundary after periods of excessive work (during which time we want to cache statements), without actually closing the database connection.
Created attachment 706480 [details] [diff] [review]
Part 3: finalize all the things. v1
Attachment #706480 - Flags: review?(gps)
Thank you for the clarification, the use-case looks fine then.

Yes I confirm that once you have called asyncExecute you can immediately call finalize. I'll tell you more, you can also immediately call asyncClose() (though not our case here).
Flags: needinfo?(mak77)
Comment on attachment 706480 [details] [diff] [review]
Part 3: finalize all the things. v1

Review of attachment 706480 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/modules/Sqlite.jsm
@@ +231,5 @@
>  
>    /**
> +   * @return (integer) the number of executing queries.
> +   */
> +  get inProgress() {

this name is poorly choosen, inProgress doesn't mean a thing, should be inProgressStatementsCount or activeStatementsCount... btw, what's the usefulness of this? why should a consumer be interested in the number of statements in progress, since it changes continuously and you have poor control over what's in progress since everything is async and can happen at any time. looks like a so volatile data that I can barely think of use-cases.
Since this is a pseudoAPI, we should manage it like that, and don't add things that complicate it with poor benefit.
(In reply to Marco Bonardo [:mak] from comment #25)

> btw, what's the usefulness of this?

It's a supported way for test code to find out the number of concurrent queries (which can verify implementation invariants, such as "queries are not serialized") -- see its five uses in test_sqlite.js.

There used to be a comment explaining that. I can move it into the test file instead.

> why should a consumer be interested in the number of
> statements in progress, since it changes continuously and you have poor
> control over what's in progress since everything is async and can happen at
> any time.

Well, it's read-only, so "poor control" isn't really a good argument IMO; it's for introspection. That it changes continuously and we have poor control is actually an argument in favor of being able to monitor it, no?

I could certainly imagine that we might want to snapshot how many concurrent statements we execute as part of a telemetry probe, or to hold off on doing some work until it drops to zero, or even just logging it as part of debugging.
(In reply to Richard Newman [:rnewman] from comment #26)
> There used to be a comment explaining that. I can move it into the test file
> instead.

APIs added for tests is not a good use-case, imo.
At that point I'd prefer an "internal" _inProgressStatementsCount getter, it still sucks but at least wouldn't look like a supported API.

> That it changes continuously and we have poor
> control is actually an argument in favor of being able to monitor it, no?

no, it's an argument that is totally pointless thinking to monitor it, imo :)

> I could certainly imagine that we might want to snapshot how many concurrent
> statements we execute as part of a telemetry probe

one. All statements are serialize, this can only be 1 or 0.
(In reply to Marco Bonardo [:mak] from comment #27)
> one. All statements are serialize, this can only be 1 or 0.

OR better, this is tracking statements you "planned" to execute, but they will be executed in a fifo queue, so this is even lying more... you may call it statementsThatWillMaybeExecuteCount :)
(In reply to Marco Bonardo [:mak] from comment #27)

> > I could certainly imagine that we might want to snapshot how many concurrent
> > statements we execute as part of a telemetry probe
> 
> one. All statements are serialize, this can only be 1 or 0.

I have a test proving that _pendingStatements.size reaches 2. That doesn't necessarily mean that they're both running against the database, but they're both extant, which is something we do care about. Not enough to leave it in the public API, but we do care.


> OR better, this is tracking statements you "planned" to execute, but they
> will be executed in a fifo queue, so this is even lying more... you may call
> it statementsThatWillMaybeExecuteCount :)

They're "statements on which executeAsync have been called". They might not interleave returned results due to storage limitations, but they're concurrently active, and there's nothing in Sqlite.jsm that demands that they not be interleaved.
(In reply to Richard Newman [:rnewman] from comment #29)
> They're "statements on which executeAsync have been called". They might not
> interleave returned results due to storage limitations, but they're
> concurrently active

Not a limitation, it's a wanted feature.  They are still not in-progress as the pseudoAPI tries to tell, they are pending or queued, whatever is more clear. I think we should not confuse the consumers that may fear to have 10 pending statements for perf reasons, when actually they will just queue up nicely.
Created attachment 706609 [details] [diff] [review]
Part 3: finalize all the things. v2
Attachment #706480 - Attachment is obsolete: true
Attachment #706480 - Flags: review?(gps)
Attachment #706609 - Flags: review?(gps)
(Reporter)

Comment 32

4 years ago
Comment on attachment 706609 [details] [diff] [review]
Part 3: finalize all the things. v2

Review of attachment 706609 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good.
Attachment #706609 - Flags: review?(gps) → review+
Thumbs up!
Thanks chaps!
https://hg.mozilla.org/services/services-central/rev/2e916d9dae06
(Reporter)

Comment 36

4 years ago
https://hg.mozilla.org/mozilla-central/rev/0f28b10053de
https://hg.mozilla.org/mozilla-central/rev/dce840cda6b8
https://hg.mozilla.org/mozilla-central/rev/2e916d9dae06
Status: ASSIGNED → RESOLVED
Last Resolved: 4 years ago
Resolution: --- → FIXED
Whiteboard: [MemShrink][fixed in services] → [MemShrink]
Target Milestone: --- → mozilla21
You need to log in before you can comment on or make changes to this bug.