Bug 599641 (opened 14 years ago, closed 13 years ago)

Run ANALYZE after expiration

Categories: Toolkit :: Places
Type: defect
Priority: Not set
Severity: normal

Tracking
Status: RESOLVED FIXED
Target Milestone: mozilla5
blocking2.0: -

People
(Reporter: sdwilsh, Assigned: sdwilsh)

Details
(Keywords: perf, Whiteboard: [places-next-wanted])

Attachments
(1 file, 3 obsolete files)

We should run ANALYZE after running expiration so the query planner gives us better results in the future.  This data is persisted AFAICT, and right after expiration seems like a good time to gather it.

Note: this wouldn't change existing precompiled queries, but the next time we compile them (say, after a restart) they would benefit.
This should give us better perf results for users (though likely not measurable by Talos), so requesting blocking.  Really trivial to add too.
blocking2.0: --- → ?
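For illustration, a minimal sketch of the idea (assuming the async storage API available to Places code at the time; how the connection is obtained and the bare ANALYZE statement are illustrative, not the eventual patch):

// Illustrative sketch only: run ANALYZE asynchronously once an expiration
// pass finishes, so updated statistics are available the next time queries
// are compiled.
const Ci = Components.interfaces;
const Cu = Components.utils;
Cu.import("resource://gre/modules/PlacesUtils.jsm");

let db = PlacesUtils.history
                    .QueryInterface(Ci.nsPIPlacesDatabase)
                    .DBConnection;
let stmt = db.createAsyncStatement("ANALYZE");
stmt.executeAsync({
  handleResult: function (aResultSet) {},
  handleError: function (aError) {
    Cu.reportError("ANALYZE failed: " + aError.message);
  },
  handleCompletion: function (aReason) {
    // sqlite_stat1 is now up to date for future query planning.
  }
});
stmt.finalize();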
We'd also need to do this whenever our schema changes, FWIW.
This depends on how long ANALYZE needs to run; if it's a heavy I/O task we should absolutely avoid it after expiration and add it to PlacesDBUtils instead, so it runs on idle.
Indeed, I think we should just run it on idle; the schema changes will be picked up on the first idle too.
(In reply to comment #3)
> this depends on the time analyze needs to run, if it's an heavy io task we
> should absolutely avoid it after expiration and add it to PlacesDBUtils instead
> so it runs on idle.
> Indeed I think we should just run it on idle, the schema changes will be picked
> up on the first idle too.
Don't we do a big expiration on idle still, or did that change?
(In reply to comment #4)
> Don't we do a big expiration on idle still, or did that change?

Yes, but I doubt we need to run ANALYZE more than once a day; expiration can potentially run at any idle (or at none if history is clean).

We also probably want SQLITE_ENABLE_STAT2
(In reply to comment #5)
> yes, but I doubt we need to run analyze more than once a day, expiration can
> potentially run at any idle (or at none if history is clean).
We should maybe reconsider that due to mobile.

> We also probably want SQLITE_ENABLE_STAT2
Possibly.  I haven't done much other research into it just yet.
(In reply to comment #6)
> (In reply to comment #5)
> > yes, but I doubt we need to run analyze more than once a day, expiration can
> > potentially run at any idle (or at none if history is clean).
> We should maybe reconsider that due to mobile.

I don't think so; notice I said "potentially".  Expiration can figure out whether it needs to run on idle, and often it won't.
(In reply to comment #7)
> don't think so, notice I said "potentially", expiration can understand if it
> needs to run on idle or not, and often it won't do it.
Oh, right.  It's all coming back to me now.

Even then, I think it makes sense to do the ANALYZE: if we just expired a bunch of stuff, our statistics have likely changed quite a bit, so the optimizer could certainly benefit from this.
Not blocking on this without quantification of how much of a win it is.  Even then I'm not sure we should block on it; we could instead take it on approval.
blocking2.0: ? → -
Keywords: perf
Well, in bug 595530 just running ANALYZE moves the query time from 12s to 100ms!

Now, that could turn out to be a bug in SQLite, but it looks like SQLite 3.7.x relies on this table much more heavily than previous versions did.

Also, it doesn't look like a heavy operation, so doing it with expiration should be fine (I'd say once a day).  But at this point it looks like we could want it for all databases, as we do for VACUUM.
(In reply to comment #9)
> Not blocking on this without quantification of how much win. Even then I'm not
> sure we should block on it, but instead take it on approval.
would comment 10 help change that blocking to a +, or should we still go with the approval route?
Whiteboard: [places-next-wanted]
Assignee: nobody → sdwilsh
Status: NEW → ASSIGNED
Whiteboard: [places-next-wanted] → [needs patch][places-next-wanted]
It should be noted that ANALYZE takes time of a similar order of magnitude to VACUUM, though it is faster.

There are no privacy implications if we just use STAT1; it only stores some simple things (number of rows in an index):
CREATE TABLE sqlite_stat1(tbl,idx,stat);

However, STAT2 stores 10 samples of data from the specified table:
CREATE TABLE sqlite_stat2(tbl,idx,sampleno,sample);
(only the first 24 bytes of each sample are used, but that could still contain identifying information about the user)
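For a concrete picture of what STAT1 holds, here is an illustrative dump of the table after ANALYZE (the values shown are invented; setup is as in the earlier sketch):

// Illustrative only: inspect the ANALYZE statistics.  The "stat" column is
// a text field holding the estimated index row count followed by average
// rows-per-key figures.
// Assumes Ci and PlacesUtils as in the earlier sketch.
let db = PlacesUtils.history
                    .QueryInterface(Ci.nsPIPlacesDatabase)
                    .DBConnection;
let stmt = db.createStatement("SELECT tbl, idx, stat FROM sqlite_stat1");
try {
  while (stmt.executeStep()) {
    // e.g. "moz_places | moz_places_url_uniqueindex | 12000 1" (made up)
    dump(stmt.row.tbl + " | " + stmt.row.idx + " | " + stmt.row.stat + "\n");
  }
} finally {
  stmt.finalize();
}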
We have a couple of options:
1) When a user deletes something from a table, run ANALYZE tblname
2) When a user deletes something from a table, run DELETE FROM sqlite_stat2 WHERE tbl = 'tblname'
3) Like (2) but only delete the rows from sqlite_stat2 where sample matches the user data.

(2) is going to be faster than (1), but we lose the benefit of running ANALYZE.  (3) is rather hard since only the first 24 bytes are used, and URLs are almost certainly longer than that.

I filed bug 646278 on STAT2, but we may want to wait to do that until we get SQLite 3.7.6 which has some nice improvements with it (when using STAT2).


Something we may want to consider in the future is to populate the stat tables with data after creating a database.
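A hedged sketch of that idea: sqlite_stat1 is writable like an ordinary table once it exists, so a fresh database could be seeded with canned statistics (the index name and numbers below are placeholders, not measured values):

// Illustrative only; assumes Ci and PlacesUtils as in the earlier sketch.
let db = PlacesUtils.history
                    .QueryInterface(Ci.nsPIPlacesDatabase)
                    .DBConnection;
db.executeSimpleSQL("ANALYZE");  // ensures sqlite_stat1 exists
db.executeSimpleSQL("DELETE FROM sqlite_stat1");
db.executeSimpleSQL(
  "INSERT INTO sqlite_stat1 (tbl, idx, stat) " +
  "VALUES ('moz_places', 'moz_places_url_uniqueindex', '10000 1')"
);
// The planner presumably reads these values when the schema is loaded, so
// this would need to happen before the interesting queries are prepared.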
Hm, (1) doesn't sound like an option if ANALYZE takes time similar to VACUUM.
Can we do (2) without causing any trouble to SQLite?
Both are going to slow down the deletion process.
We should try building a sqlite_stat2 table for the db and see what the samples usually contain before declaring a privacy issue; maybe the samples are not that identifiable.  The privacy cases we should consider most are probably bookmark titles and history URLs and titles.
(In reply to comment #13)
> hm, 1) doesn't sound like an option if analyze takes similar time to vacuum.
Note that it is "ANALYZE tblname", which is going to be faster than a bare "ANALYZE".

> Can we do 2) without causing any trouble to sqlite?
Yes, but it tends to make bad choices once it finds a hole.  I believe a fix will be in for 3.7.6 though.

> Both are going to slowdown deletion process.
This won't be terrible if these APIs are async.

> We should try to build a SQLITE_STAT2 of the db and see what usually the sample
> contains before calling for privacy issues, maybe the sample is not that
> identifiable. Privacy cases we should consider most are probably bookmarks
> titles and history urls and titles.
Like I said in comment 12, it contains the first 48 bytes (first 48 characters in our case) of each sample for each index.  This will result in us having the first 48 characters of a URL there, but that's actually easy to delete, which makes me lean towards (3).

Regardless, I think STAT2 stuff should be done in a follow-up for Firefox 6, since we won't be taking SQLite 3.7.6 until then anyway.
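To make option (3) concrete, a hedged sketch; the prefix length quoted in this thread varies (24 vs. 48 bytes), so the value here is a placeholder, and deletedURLSpec is a hypothetical variable:

// Illustrative only: when a URL is removed, drop any sqlite_stat2 sample
// matching the prefix ANALYZE would have stored for it.
// Assumes Ci and PlacesUtils as in the earlier sketch.
let db = PlacesUtils.history
                    .QueryInterface(Ci.nsPIPlacesDatabase)
                    .DBConnection;
let stmt = db.createAsyncStatement(
  "DELETE FROM sqlite_stat2 " +
  "WHERE tbl = 'moz_places' AND sample = substr(:url, 1, 48)"
);
stmt.params.url = deletedURLSpec;  // hypothetical: the removed page's spec
stmt.executeAsync();
stmt.finalize();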
Attached patch ANALYZE after expiration v0.1 (obsolete) — Splinter Review
Before I go down this road I want to get some high-level feedback.  This approach only ever runs ANALYZE on three tables: moz_places, moz_bookmarks, and moz_historyvisits.  I'm only doing these three because they are the ones that are used most and, in the case of moz_bookmarks, have the biggest chance of being different for each user.

If this approach looks good, I can start to work on a test for this.

I think we will also want to run ANALYZE on moz_inputhistory after we tweak it for faster location bar searches, but I haven't looked that far into it just yet.
Attachment #523149 - Flags: feedback?(mak77)
Whiteboard: [needs patch][places-next-wanted] → [needs feedback mak][places-next-wanted]
(In reply to comment #15)
> I think we will also want to run ANALYZE on moz_inputhistory after we tweak it
> for faster location bar searches, but haven't looked that far into that just
> yet.
Actually, we have QUERY_EXPIRE_INPUTHISTORY so we should probably run it during expiration too.
Attached patch ANALYZE after expiration v0.2 (obsolete) — Splinter Review
Attachment #523149 - Attachment is obsolete: true
Attachment #523162 - Flags: feedback?(mak77)
Attachment #523149 - Flags: feedback?(mak77)
Additionally, SQLite will gain the ability to run ANALYZE and specify an index, so we can go so far as to update specific indexes when we change their values (such as the frecency index in nsNavHistory::DecayFrecency).  I'll file a follow-up for that.
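As a hedged example of what that might look like once SQLite supports it (the index name is taken from the Places schema of the time and is illustrative):

// Illustrative only: refresh statistics for a single index, e.g. after
// nsNavHistory::DecayFrecency changes frecency values.
// Assumes Ci and PlacesUtils as in the earlier sketch.
let db = PlacesUtils.history
                    .QueryInterface(Ci.nsPIPlacesDatabase)
                    .DBConnection;
let stmt = db.createAsyncStatement("ANALYZE moz_places_frecencyindex");
stmt.executeAsync();
stmt.finalize();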
Comment on attachment 523162 [details] [diff] [review]
ANALYZE after expiration v0.2

I think the choice of tables is fine; mostly we care about the location bar, so the involved tables sound like a good choice.

>diff --git a/toolkit/components/places/nsPlacesExpiration.js b/toolkit/components/places/nsPlacesExpiration.js

>+  TIMED_OVERLIMIT: 1 << 6, // just like TIMED, but also when we too much history

typo

>+  QUERY_ANALYZE_MOZ_BOOKMARKS: {
>+    sql: "ANALYZE moz_bookmarks",
>+    actions: ACTION.CLEAN_SHUTDOWN | ACTION.DEBUG
>+  },

Running this on clean shutdown worries me a bit; maybe it's not slow, but do we really need to run it at every shutdown?
We did a lot of work to reduce shutdown work, and this doesn't sound 100% needed.
I'd prefer running it on idle.

>+  QUERY_ANALYZE_MOZ_INPUTHISTORY: {
>+    sql: "ANALYZE moz_inputhistory",
>+    actions: ACTION.TIMED | ACTION.CLEAR_HISTORY | ACTION.IDLE | ACTION.DEBUG
>+  },

I think you want to add TIMED_OVERLIMIT; I hope this is not too slow.

Overall this sounds fine.
Attachment #523162 - Flags: feedback?(mak77) → feedback+
(In reply to comment #19)
> running this on clean shutdown worries me a bit, maybe it's not slow, but do we
> really need to run it at any shutdown?
> We did a lot of work to reduce shutdown work, and this doesn't sound like 100%
> needed.
I did this during shutdown so it'd be properly analyzed for the next startup, but I'm OK with punting on this.
Whiteboard: [needs feedback mak][places-next-wanted] → [places-next-wanted]
Attached patch ANALYZE after expiration v1.0 (obsolete) — Splinter Review
Attachment #523162 - Attachment is obsolete: true
Attachment #524725 - Flags: review?(mak77)
Whiteboard: [places-next-wanted] → [needs review mak][places-next-wanted]
Comment on attachment 524725 [details] [diff] [review]
ANALYZE after expiration v1.0

>diff --git a/toolkit/components/places/nsPlacesExpiration.js b/toolkit/components/places/nsPlacesExpiration.js
>+  CLEAN_SHUTDOWN: 1 << 3,  // happens at shutdown when the db has a CLEAN state

"when the db status is not DIRTY" (indeed it happens for UNKNOWN status too)

>+  // The following queries are used to adjust the sqlite3_stat table to help the
>+  // query planner create better queries.  These should always be run LAST, and
>+  // are therefore at the end of the object.

The name of the table is sqlite_stat1 according to the docs, but this should also affect sqlite_stat2 when we will enable it, right?
Then probably better say "...used to update the sqlite statistics tables to help..."

>diff --git a/toolkit/components/places/tests/expiration/test_analyze_runs.js b/toolkit/components/places/tests/expiration/test_analyze_runs.js

>+function test_timed()

>+    setInterval(3600);

I think I made a poor choice when naming this helper, sigh :(

>+    do_check_analyze_ran("moz_places", false);
>+    do_check_analyze_ran("moz_bookmarks", false);
>+    do_check_analyze_ran("moz_historyvisits", false);
>+    do_check_analyze_ran("moz_inputhistory", true);
>+    run_next_test();
>+  };
>+
>+  Services.obs.addObserver(observer, PlacesUtils.TOPIC_EXPIRATION_FINISHED,
>+                           false);
>+  setInterval(5);
>+  force_expiration_start();

Hmm, this looks bogus (even if it is not), most likely because the name of the helper sucks (re-sigh).
Btw, I think you don't need to start the service, because you add a visit in run_test and that will notify and start the component.
Add a comment on setInterval to explain "Set a low interval and wait for a timed expiration step."
Won't these 5 seconds make the test really slow to finish? I doubt we want a single xpcshell test taking 5 seconds...

>+  shutdownExpiration();

I really suck at choosing names for helpers :(
We should fix them in a follow-up, to make them readable and properly named (start_expiration_component, set_expiration_interval, force_shutdown_expiration, ...)

>+function run_test()

>+  let bs = PlacesUtils.bookmarks;
>+  bs.insertBookmark(bs.unfiledBookmarksFolder, TEST_URI, bs.DEFAULT_INDEX,

PlacesUtils.unfiledBookmarksFolderId (since tests are examples for add-on authors, that's a better choice to avoid xpcom crossing)

>+                    TEST_TITLE);
>+  let hs = PlacesUtils.history;
>+  hs.addVisit(TEST_URI, Date.now() * 1000, null, hs.TRANSITION_TYPED, false, 0);

Really a nit: hs and bs don't seem that useful since they're used only a couple of times, and you could wrap the arguments onto new lines instead.

r+ but check the timed test please, since I'd not want a single test taking that much time, and if it doesn't take that much time then you're evaluating the wrong expiration type.
Attachment #524725 - Flags: review?(mak77) → review+
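For readers following along, do_check_analyze_ran presumably just checks whether sqlite_stat1 has a row for the given table; a minimal sketch under that assumption (not the actual test helper):

// Hypothetical sketch: a table counts as analyzed if sqlite_stat1 has at
// least one row for it.
const Ci = Components.interfaces;
const Cu = Components.utils;
Cu.import("resource://gre/modules/PlacesUtils.jsm");

function do_check_analyze_ran(aTableName, aExpected) {
  let db = PlacesUtils.history
                      .QueryInterface(Ci.nsPIPlacesDatabase)
                      .DBConnection;
  let analyzed = false;
  try {
    let stmt = db.createStatement(
      "SELECT 1 FROM sqlite_stat1 WHERE tbl = :tbl"
    );
    stmt.params.tbl = aTableName;
    try {
      analyzed = stmt.executeStep();
    } finally {
      stmt.finalize();
    }
  } catch (ex) {
    // sqlite_stat1 does not exist yet, so nothing has been analyzed.
  }
  do_check_eq(analyzed, aExpected);
}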
(In reply to comment #22)
> "when the db status is not DIRTY" (indeed it happens for UNKNOWN status too)
This is why I added these comments :)  That was non-obvious to me.

> The name of the table is sqlite_stat1 according to the docs, but this should
> also affect sqlite_stat2 when we will enable it, right?
> Then probably better say "...used to update the sqlite statistics tables to
> help..."
Well, right now it's sqlite_stat1 (I had it wrong before, which is why it said sqlite3_stat), and if we enable stat2, we'll have test failures (since the stat1 table would not be updated).  I've fixed the comment.

> Humm this looks bogus, even if it is not, most likely because the name of the
> helper sucks (re-sigh)
The helper is indeed named "setInterval".  I've reduced the time from five seconds to three.  I want to make sure we don't manage to run expiration twice while we test this, hence the somewhat long delay.

> Btw, I think you don't need to start the service, because you add a visit in
> run_test and that will notify and start the component.
This is true; I was originally not adding a history visit.

> really nit: hs and bs don't seem that useful since used only a couple times and
> you could go newline with arguments.
I use the variables twice, so I think it is worthwhile to have them.

> r+ but check the timed test please, since I'd not want a single test taking
> that much time, and if it doesn't take that much time then you're evaluating
> the wrong expiration type.
It takes a while, but I've reduced it like I said above.
Per review comments.  I will land this on Places shortly.
Attachment #524725 - Attachment is obsolete: true
Whiteboard: [needs review mak][places-next-wanted] → [places-next-wanted]
http://hg.mozilla.org/projects/places/rev/f548afb7caca
Flags: in-testsuite+
Whiteboard: [places-next-wanted] → [fixed-in-places][places-next-wanted]
http://hg.mozilla.org/mozilla-central/rev/f548afb7caca
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Whiteboard: [fixed-in-places][places-next-wanted] → [places-next-wanted]
Target Milestone: --- → mozilla2.2
Blocks: 649123