Bug 599641 (opened 14 years ago, closed 13 years ago)

Run ANALYZE after expiration

Categories: Toolkit :: Places
Type: defect
Priority: Not set
Severity: normal

Tracking
Status: RESOLVED FIXED
Target Milestone: mozilla5
blocking2.0: -

People
(Reporter: sdwilsh, Assigned: sdwilsh)

Details
(Keywords: perf, Whiteboard: [places-next-wanted])

Attachments
(1 file, 3 obsolete files)

We should run ANALYZE after running expiration so the query planner gives us better results in the future.  This data is persisted AFAICT, and right after expiration seems like a good time to gather it.

Note: this wouldn't change existing precompiled queries, but the next time we compile them (say, after a restart) they would benefit.
This should give us better perf results for users (though likely not measurable by Talos), so requesting blocking.  Really trivial to add too.
blocking2.0: --- → ?
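For illustration, a minimal sketch of the idea (assuming the async storage API available to Places code at the time; how the connection is obtained and the bare ANALYZE statement are illustrative, not the eventual patch):

// Illustrative sketch only: run ANALYZE asynchronously once an expiration
// pass finishes, so updated statistics are available the next time queries
// are compiled.
const Ci = Components.interfaces;
const Cu = Components.utils;
Cu.import("resource://gre/modules/PlacesUtils.jsm");

let db = PlacesUtils.history
                    .QueryInterface(Ci.nsPIPlacesDatabase)
                    .DBConnection;
let stmt = db.createAsyncStatement("ANALYZE");
stmt.executeAsync({
  handleResult: function (aResultSet) {},
  handleError: function (aError) {
    Cu.reportError("ANALYZE failed: " + aError.message);
  },
  handleCompletion: function (aReason) {
    // sqlite_stat1 is now up to date for future query planning.
  }
});
stmt.finalize();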
We'd also need to do this whenever our schema changes, FWIW.
This depends on how long ANALYZE needs to run; if it's a heavy I/O task we should absolutely avoid it after expiration and add it to PlacesDBUtils instead, so it runs on idle.
Indeed, I think we should just run it on idle; the schema changes will be picked up on the first idle too.
(In reply to comment #3)
> this depends on the time analyze needs to run, if it's an heavy io task we
> should absolutely avoid it after expiration and add it to PlacesDBUtils instead
> so it runs on idle.
> Indeed I think we should just run it on idle, the schema changes will be picked
> up on the first idle too.
Don't we do a big expiration on idle still, or did that change?
(In reply to comment #4)
> Don't we do a big expiration on idle still, or did that change?

Yes, but I doubt we need to run ANALYZE more than once a day; expiration can potentially run at any idle (or at none if history is clean).

We also probably want SQLITE_ENABLE_STAT2
(In reply to comment #5)
> yes, but I doubt we need to run analyze more than once a day, expiration can
> potentially run at any idle (or at none if history is clean).
We should maybe reconsider that due to mobile.

> We also probably want SQLITE_ENABLE_STAT2
Possibly.  I haven't done much other research into it just yet.
(In reply to comment #6)
> (In reply to comment #5)
> > yes, but I doubt we need to run analyze more than once a day, expiration can
> > potentially run at any idle (or at none if history is clean).
> We should maybe reconsider that due to mobile.

I don't think so; notice I said "potentially".  Expiration can figure out whether it needs to run on idle, and often it won't.
(In reply to comment #7)
> don't think so, notice I said "potentially", expiration can understand if it
> needs to run on idle or not, and often it won't do it.
Oh, right.  It's all coming back to me now.

Even then, I think it makes sense to do the ANALYZE: if we just expired a bunch of stuff, our statistics have likely changed quite a bit, so the optimizer could certainly benefit from this.
Not blocking on this without quantification of how much of a win it is.  Even then I'm not sure we should block on it; we could instead take it on approval.
blocking2.0: ? → -
Keywords: perf
Well, in bug 595530 just running ANALYZE moves the query time from 12s to 100ms!

Now, that could turn out to be a bug in SQLite, but it looks like SQLite 3.7.x relies on this table much more heavily than previous versions did.

Also, it doesn't look like a heavy operation, so doing it with expiration should be fine (I'd say once a day).  But at this point it looks like we could want it for all databases, as we do for VACUUM.
(In reply to comment #9)
> Not blocking on this without quantification of how much win. Even then I'm not
> sure we should block on it, but instead take it on approval.
would comment 10 help change that blocking to a +, or should we still go with the approval route?
Whiteboard: [places-next-wanted]
Assignee: nobody → sdwilsh
Status: NEW → ASSIGNED
Whiteboard: [places-next-wanted] → [needs patch][places-next-wanted]
It should be noted that ANALYZE takes time of a similar order of magnitude to VACUUM, though it is faster.

There are no privacy implications if we just use STAT1; it only stores some simple things (number of rows in an index):
CREATE TABLE sqlite_stat1(tbl,idx,stat);

However, STAT2 stores 10 samples of data from the specified table:
CREATE TABLE sqlite_stat2(tbl,idx,sampleno,sample);
(only the first 24 bytes of each sample are used, but that could still contain identifying information about the user)
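For a concrete picture of what STAT1 holds, here is an illustrative dump of the table after ANALYZE (the values shown are invented; setup is as in the earlier sketch):

// Illustrative only: inspect the ANALYZE statistics.  The "stat" column is
// a text field holding the estimated index row count followed by average
// rows-per-key figures.
// Assumes Ci and PlacesUtils as in the earlier sketch.
let db = PlacesUtils.history
                    .QueryInterface(Ci.nsPIPlacesDatabase)
                    .DBConnection;
let stmt = db.createStatement("SELECT tbl, idx, stat FROM sqlite_stat1");
try {
  while (stmt.executeStep()) {
    // e.g. "moz_places | moz_places_url_uniqueindex | 12000 1" (made up)
    dump(stmt.row.tbl + " | " + stmt.row.idx + " | " + stmt.row.stat + "\n");
  }
} finally {
  stmt.finalize();
}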
We have a couple of options:
1) When a user deletes something from a table, run ANALYZE tblname
2) When a user deletes something from a table, run DELETE FROM sqlite_stat2 WHERE tbl = 'tblname'
3) Like (2) but only delete the rows from sqlite_stat2 where sample matches the user data.

(2) is going to be faster than (1), but we lose the benefit of running ANALYZE.  (3) is rather hard since only the first 24 bytes are used, and URLs are almost certainly longer than that.

I filed bug 646278 on STAT2, but we may want to wait to do that until we get SQLite 3.7.6 which has some nice improvements with it (when using STAT2).


Something we may want to consider in the future is to populate the stat tables with data after creating a database.
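A hedged sketch of that idea: sqlite_stat1 is writable like an ordinary table once it exists, so a fresh database could be seeded with canned statistics (the index name and numbers below are placeholders, not measured values):

// Illustrative only; assumes Ci and PlacesUtils as in the earlier sketch.
let db = PlacesUtils.history
                    .QueryInterface(Ci.nsPIPlacesDatabase)
                    .DBConnection;
db.executeSimpleSQL("ANALYZE");  // ensures sqlite_stat1 exists
db.executeSimpleSQL("DELETE FROM sqlite_stat1");
db.executeSimpleSQL(
  "INSERT INTO sqlite_stat1 (tbl, idx, stat) " +
  "VALUES ('moz_places', 'moz_places_url_uniqueindex', '10000 1')"
);
// The planner presumably reads these values when the schema is loaded, so
// this would need to happen before the interesting queries are prepared.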
Hm, (1) doesn't sound like an option if ANALYZE takes time similar to VACUUM.
Can we do (2) without causing any trouble to SQLite?
Both are going to slow down the deletion process.
We should try building a sqlite_stat2 table for the db and see what the samples usually contain before declaring a privacy issue; maybe the samples are not that identifiable.  The privacy cases we should consider most are probably bookmark titles and history URLs and titles.
(In reply to comment #13)
> hm, 1) doesn't sound like an option if analyze takes similar time to vacuum.
Note that it is "ANALYZE tblname", which is going to be faster than a bare "ANALYZE".

> Can we do 2) without causing any trouble to sqlite?
Yes, but it tends to make bad choices once it finds a hole.  I believe a fix will be in for 3.7.6 though.

> Both are going to slowdown deletion process.
This won't be terrible if these APIs are async.

> We should try to build a SQLITE_STAT2 of the db and see what usually the sample
> contains before calling for privacy issues, maybe the sample is not that
> identifiable. Privacy cases we should consider most are probably bookmarks
> titles and history urls and titles.
Like I said in comment 12, it contains the first 48 bytes (first 48 characters in our case) of each sample for each index.  This will result in us having the first 48 characters of a URL there, but that's actually easy to delete, which makes me lean towards (3).

Regardless, I think STAT2 stuff should be done in a follow-up for Firefox 6, since we won't be taking SQLite 3.7.6 until then anyway.
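To make option (3) concrete, a hedged sketch; the prefix length quoted in this thread varies (24 vs. 48 bytes), so the value here is a placeholder, and deletedURLSpec is a hypothetical variable:

// Illustrative only: when a URL is removed, drop any sqlite_stat2 sample
// matching the prefix ANALYZE would have stored for it.
// Assumes Ci and PlacesUtils as in the earlier sketch.
let db = PlacesUtils.history
                    .QueryInterface(Ci.nsPIPlacesDatabase)
                    .DBConnection;
let stmt = db.createAsyncStatement(
  "DELETE FROM sqlite_stat2 " +
  "WHERE tbl = 'moz_places' AND sample = substr(:url, 1, 48)"
);
stmt.params.url = deletedURLSpec;  // hypothetical: the removed page's spec
stmt.executeAsync();
stmt.finalize();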
Attached patch ANALYZE after expiration v0.1 (obsolete) — Splinter Review
Before I go down this road I want to get some high-level feedback.  This approach only ever runs ANALYZE on three tables: moz_places, moz_bookmarks, and moz_historyvisits.  I'm only doing these three because they are the ones that are used most and, in the case of moz_bookmarks, have the biggest chance of being different for each user.

If this approach looks good, I can start to work on a test for this.

I think we will also want to run ANALYZE on moz_inputhistory after we tweak it for faster location bar searches, but I haven't looked that far into it just yet.
Attachment #523149 - Flags: feedback?(mak77)
Whiteboard: [needs patch][places-next-wanted] → [needs feedback mak][places-next-wanted]
(In reply to comment #15)
> I think we will also want to run ANALYZE on moz_inputhistory after we tweak it
> for faster location bar searches, but haven't looked that far into that just
> yet.
Actually, we have QUERY_EXPIRE_INPUTHISTORY so we should probably run it during expiration too.
Attached patch ANALYZE after expiration v0.2 (obsolete) — Splinter Review
Attachment #523149 - Attachment is obsolete: true
Attachment #523162 - Flags: feedback?(mak77)
Attachment #523149 - Flags: feedback?(mak77)
Additionally, SQLite will gain the ability to run ANALYZE and specify an index, so we can go so far as to update specific indexes when we change their values (such as the frecency index in nsNavHistory::DecayFrecency).  I'll file a follow-up for that.
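As a hedged example of what that might look like once SQLite supports it (the index name is taken from the Places schema of the time and is illustrative):

// Illustrative only: refresh statistics for a single index, e.g. after
// nsNavHistory::DecayFrecency changes frecency values.
// Assumes Ci and PlacesUtils as in the earlier sketch.
let db = PlacesUtils.history
                    .QueryInterface(Ci.nsPIPlacesDatabase)
                    .DBConnection;
let stmt = db.createAsyncStatement("ANALYZE moz_places_frecencyindex");
stmt.executeAsync();
stmt.finalize();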
Comment on attachment 523162 [details] [diff] [review]
ANALYZE after expiration v0.2

I think the choice of tables is fine; mostly we care about the location bar, so the involved tables sound like a good choice.

>diff --git a/toolkit/components/places/nsPlacesExpiration.js b/toolkit/components/places/nsPlacesExpiration.js

>+  TIMED_OVERLIMIT: 1 << 6, // just like TIMED, but also when we too much history

typo

>+  QUERY_ANALYZE_MOZ_BOOKMARKS: {
>+    sql: "ANALYZE moz_bookmarks",
>+    actions: ACTION.CLEAN_SHUTDOWN | ACTION.DEBUG
>+  },

Running this on clean shutdown worries me a bit; maybe it's not slow, but do we really need to run it at every shutdown?
We did a lot of work to reduce shutdown work, and this doesn't sound 100% needed.
I'd prefer running it on idle.

>+  QUERY_ANALYZE_MOZ_INPUTHISTORY: {
>+    sql: "ANALYZE moz_inputhistory",
>+    actions: ACTION.TIMED | ACTION.CLEAR_HISTORY | ACTION.IDLE | ACTION.DEBUG
>+  },

I think you want to add TIMED_OVERLIMIT; I hope this is not too slow.

Overall this sounds fine.
Attachment #523162 - Flags: feedback?(mak77) → feedback+
(In reply to comment #19)
> running this on clean shutdown worries me a bit, maybe it's not slow, but do we
> really need to run it at any shutdown?
> We did a lot of work to reduce shutdown work, and this doesn't sound like 100%
> needed.
I did this during shutdown so it'd be properly analyzed for the next startup, but I'm OK with punting on this.
Whiteboard: [needs feedback mak][places-next-wanted] → [places-next-wanted]
Attached patch ANALYZE after expiration v1.0 (obsolete) — Splinter Review
Attachment #523162 - Attachment is obsolete: true
Attachment #524725 - Flags: review?(mak77)
Whiteboard: [places-next-wanted] → [needs review mak][places-next-wanted]
Comment on attachment 524725 [details] [diff] [review]
ANALYZE after expiration v1.0

>diff --git a/toolkit/components/places/nsPlacesExpiration.js b/toolkit/components/places/nsPlacesExpiration.js
>+  CLEAN_SHUTDOWN: 1 << 3,  // happens at shutdown when the db has a CLEAN state

"when the db status is not DIRTY" (indeed it happens for UNKNOWN status too)

>+  // The following queries are used to adjust the sqlite3_stat table to help the
>+  // query planner create better queries.  These should always be run LAST, and
>+  // are therefore at the end of the object.

The name of the table is sqlite_stat1 according to the docs, but this should also affect sqlite_stat2 when we will enable it, right?
Then probably better say "...used to update the sqlite statistics tables to help..."

>diff --git a/toolkit/components/places/tests/expiration/test_analyze_runs.js b/toolkit/components/places/tests/expiration/test_analyze_runs.js

>+function test_timed()

>+    setInterval(3600);

I think I made a poor choice when naming this helper, sigh :(

>+    do_check_analyze_ran("moz_places", false);
>+    do_check_analyze_ran("moz_bookmarks", false);
>+    do_check_analyze_ran("moz_historyvisits", false);
>+    do_check_analyze_ran("moz_inputhistory", true);
>+    run_next_test();
>+  };
>+
>+  Services.obs.addObserver(observer, PlacesUtils.TOPIC_EXPIRATION_FINISHED,
>+                           false);
>+  setInterval(5);
>+  force_expiration_start();

Hmm, this looks bogus (even if it is not), most likely because the name of the helper sucks (re-sigh).
Btw, I think you don't need to start the service, because you add a visit in run_test and that will notify and start the component.
Add a comment on setInterval to explain "Set a low interval and wait for a timed expiration step."
Won't these 5 seconds make the test really slow to finish? I doubt we want a single xpcshell test taking 5 seconds...

>+  shutdownExpiration();

I really suck at choosing names for helpers :(
We should fix them in a follow-up, to make them readable and properly named (start_expiration_component, set_expiration_interval, force_shutdown_expiration, ...)

>+function run_test()

>+  let bs = PlacesUtils.bookmarks;
>+  bs.insertBookmark(bs.unfiledBookmarksFolder, TEST_URI, bs.DEFAULT_INDEX,

PlacesUtils.unfiledBookmarksFolderId (since tests are examples for add-on authors, that's a better choice to avoid xpcom crossing)

>+                    TEST_TITLE);
>+  let hs = PlacesUtils.history;
>+  hs.addVisit(TEST_URI, Date.now() * 1000, null, hs.TRANSITION_TYPED, false, 0);

Really a nit: hs and bs don't seem that useful since they're used only a couple of times, and you could wrap the arguments onto new lines instead.

r+ but check the timed test please, since I'd not want a single test taking that much time, and if it doesn't take that much time then you're evaluating the wrong expiration type.
Attachment #524725 - Flags: review?(mak77) → review+
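For readers following along, do_check_analyze_ran presumably just checks whether sqlite_stat1 has a row for the given table; a minimal sketch under that assumption (not the actual test helper):

// Hypothetical sketch: a table counts as analyzed if sqlite_stat1 has at
// least one row for it.
const Ci = Components.interfaces;
const Cu = Components.utils;
Cu.import("resource://gre/modules/PlacesUtils.jsm");

function do_check_analyze_ran(aTableName, aExpected) {
  let db = PlacesUtils.history
                      .QueryInterface(Ci.nsPIPlacesDatabase)
                      .DBConnection;
  let analyzed = false;
  try {
    let stmt = db.createStatement(
      "SELECT 1 FROM sqlite_stat1 WHERE tbl = :tbl"
    );
    stmt.params.tbl = aTableName;
    try {
      analyzed = stmt.executeStep();
    } finally {
      stmt.finalize();
    }
  } catch (ex) {
    // sqlite_stat1 does not exist yet, so nothing has been analyzed.
  }
  do_check_eq(analyzed, aExpected);
}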
(In reply to comment #22)
> "when the db status is not DIRTY" (indeed it happens for UNKNOWN status too)
This is why I added these comments :)  That was non-obvious to me.

> The name of the table is sqlite_stat1 according to the docs, but this should
> also affect sqlite_stat2 when we will enable it, right?
> Then probably better say "...used to update the sqlite statistics tables to
> help..."
Well, right now it's sqlite_stat1 (I had it wrong before, which is why it said sqlite3_stat), and if we enable stat2, we'll have test failures (since the stat1 table would not be updated).  I've fixed the comment.

> Humm this looks bogus, even if it is not, most likely because the name of the
> helper sucks (re-sigh)
The helper is indeed named "setInterval".  I've reduced the time from five seconds to three.  I want to make sure we don't manage to run expiration twice while we test this, hence the somewhat long delay.

> Btw, I think you don't need to start the service, because you add a visit in
> run_test and that will notify and start the component.
This is true; I was originally not adding a history visit.

> really nit: hs and bs don't seem that useful since used only a couple times and
> you could go newline with arguments.
I use the variables twice, so I think it is worthwhile to have them.

> r+ but check the timed test please, since I'd not want a single test taking
> that much time, and if it doesn't take that much time then you're evaluating
> the wrong expiration type.
It takes a while, but I've reduced it like I said above.
Per review comments.  I will land this on Places shortly.
Attachment #524725 - Attachment is obsolete: true
Whiteboard: [needs review mak][places-next-wanted] → [places-next-wanted]
http://hg.mozilla.org/projects/places/rev/f548afb7caca
Flags: in-testsuite+
Whiteboard: [places-next-wanted] → [fixed-in-places][places-next-wanted]
http://hg.mozilla.org/mozilla-central/rev/f548afb7caca
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Whiteboard: [fixed-in-places][places-next-wanted] → [places-next-wanted]
Target Milestone: --- → mozilla2.2
Blocks: 649123