Bug 870171 (Closed) - Expire old data
Opened 11 years ago · Closed 11 years ago
Categories: Firefox Health Report Graveyard :: Client: Android, defect, P3
Tracking: (not tracked)
Resolution: RESOLVED FIXED in Firefox 27
People: (Reporter: rnewman, Assigned: mcomella)
Whiteboard: [qa+]
Attachments: (5 files, 5 obsolete files)
No description provided.
Assignee | Comment 1 • 11 years ago
It is my understanding that we generate documents from the past six months [1]. Would old data be considered anything older than six months?

(Also, I think this should probably be a constant, rather than MILLISECONDS_PER_SIX_MONTHS.)

[1]: https://mxr.mozilla.org/mozilla-central/source/mobile/android/base/background/healthreport/upload/AndroidSubmissionClient.java#106
Assignee: nobody → michael.l.comella
Status: NEW → ASSIGNED
Reporter | Comment 2 • 11 years ago
Yup!
Assignee | Comment 3 • 11 years ago
https://github.com/mozilla-services/android-sync/pull/335

f? - Specifically, I'd like to know whether my deleteEvents call (part 3) is in an appropriate location and called at an appropriate frequency (i.e. should a query be made on every upload, as I have done?) before I start writing a new file to test AndroidSubmissionClient. The remaining parts can be looked over in the actual review if you'd like.
Flags: needinfo?(rnewman)
Reporter | Comment 4 • 11 years ago
Don't forget to prune things other than events.

On every upload is probably too often; consider that we might upload several times a day. I suggest doing it *after* upload, and not every time -- perhaps once a week? I leave it to you to figure out the best way to do periodic cleanup.

Consider also that we have a few other cleanup tasks we might want to do:

* Rolling up or pruning session data. Bug 885650.
* Pruning and vacuuming the database when it gets too big. (If you're hitting thousands of events, we probably want to prune sooner than 6 months.)
* Periodic vacuuming -- expensive, but possibly necessary to defeat fragmentation.
Flags: needinfo?(rnewman)
Assignee | Comment 5 • 11 years ago
Note to self: prune orphaned (and NULL?) addons too.
Assignee | Comment 6 • 11 years ago
Closing old PR due to a large rewrite. Here is the new link: https://github.com/mozilla-services/android-sync/pull/344
Assignee | Comment 7 • 11 years ago
f? - There is more coming (see my GH comment for details) but I figured this would be easier to review in chunks. Note the PR link has changed.
Flags: needinfo?(rnewman)
Assignee | Comment 9 • 11 years ago
Here is my periodic cleanup and pruning plan. Richard, can you verify this is reasonable?

## Summary

For an estimate of 2.25 MB of FHR data, we should prune when data is older than six months or exceeds a count of 31,000 events. Pruning should occur weekly, and vacuuming should occur either monthly or when over 4200 events have been deleted since the previous vacuum.

## What to prune

My profile (which probably represents just-under-average usage) is 44 days old and has 976 events, an average of 22 events/day. These events are session records and search counts. For the other tables, I have 2 measurements, 5 fields, 5 addons, and 10 environments. Since fields and measurements can only be deleted when they are not in use, they are not ideal to analyze for pruning. On mobile, addons and environments will also be fairly static and are negligible compared to events entries, so we will only analyze events for our pruning.

Overall, the size of the data now and, naively assuming constant growth, after 1/2 year and 1 year (for a total of 8030 events), is:

Data type            | 44 days | 1/2 year | 1 year
Events               | 48 kB   | 199 kB   | 398 kB
Events & its indices | 72 kB   | 298 kB   | 597 kB
All                  | 120 kB  | 498 kB   | 995 kB

However, this is for 2 measurements (as opposed to the 12 currently available, plus others that will presumably be added). We will assume the usage per measurement is average (though, granted, these measurements may not contain as much text or may be taken less frequently, and vice versa) and multiply the event storage sizes by 6 for these additional measurements (keeping the size of the remaining content constant), which gives:

Data type            | 44 days | 1/2 year | 1 year
Events               | 288 kB  | 1194 kB  | 2388 kB
Events & its indices | 432 kB  | 1788 kB  | 3582 kB
All                  | 480 kB  | 1988 kB  | 3971 kB

That's 132 events/day, 5856 events/44 days, 24090 events/(1/2) year, and 48180 events/year.

We really only upload data at most 6 months old, so we can start pruning data after ~24090 events are captured, which is a DB size of 1988 kB (~1.94 MB). That seems reasonable compared to the 51.6 MB of data my Fennec currently takes up (not including the apk size). However, since my usage is probably just under average, we'll likely want to raise this limit for power users:

n = (c - 200) / k

where
  n = number of events
  c = cap in kB
  k ~= 0.074 kB/event (solved for 1.94 MB)

Samples:
  2.25 MB ~= 31067 events
  2.50 MB ~= 34519 events
  2.75 MB ~= 37971 events
  3.00 MB ~= 41423 events

My choice would be a 2.25 MB cap, which means we prune the oldest events numbering greater than ~31000 (or, of course, events older than six months).

## Frequency

If we pruned weekly, we would recover, on average (assuming the 6x multiplier above), ~68.7 kB (432 kB / 44 days * 7 days), which is ~924 events (132 events * 7). If we vacuum monthly, that's ~294.5 kB (432 kB / 44 * 30 days), which is ~3960 events (132 events * 30).

Additionally, we should have a condition to vacuum if a certain number of events are deleted through weekly pruning, to prevent issues with excessive usage. Since the expected average usage per month is ~3960 events, we can make this value 4200 events.

## Unknowns

* Performance: I'm unsure how to account for the degradation of insert/query performance due to the increasing size of the database, and whether pruning to fewer events is worthwhile as a performance boost.
* Indices: Do these actually bloat in constant space? Probably worse, because they are layered "date, env, field".
* Document upload size: 2.25 MB of event data seems like a lot to upload daily, especially if the generated document is not aggregated (see bug 885650).
Flags: needinfo?(rnewman)
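The cap formula in comment 9 can be transcribed as a short sketch. This is illustrative only (the constant and function names are mine, not from any patch), with k anchored to the half-year estimate above (1988 kB total, ~200 kB of non-event content, 24090 events); the sample counts in the comment use a slightly different rounding of k, so only the half-year anchor point reproduces exactly here.

```python
# Hypothetical sketch of n = (c - 200) / k from comment 9.
NON_EVENT_KB = 200.0  # estimated size of non-event content, per the analysis
KB_PER_EVENT = (1988.0 - NON_EVENT_KB) / 24090.0  # k ~= 0.074 kB/event

def event_cap_for_size(cap_kb):
    """Events that fit under a total-DB-size cap given in kB."""
    return round((cap_kb - NON_EVENT_KB) / KB_PER_EVENT)

for cap_mb in (2.25, 2.50, 2.75, 3.00):
    print("%.2f MB -> ~%d events" % (cap_mb, event_cap_for_size(cap_mb * 1024)))
```
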
Reporter | Comment 10 • 11 years ago
I'm generally in favor of very aggressive pruning. Consider:

* At-least-linear space and cost of document generation due to number of rows. We walk a cursor to spit out document fragments; the fewer rows, the less work we do, and the fewer rows, the fewer days and environments we have to process. Killing an event row will kill more than one event's worth of space: perhaps it'll kill a whole day and a whole environment, which is a big win with little cost.
* At-least-linear cost of including old environments: they have their own section in the doc as well as contributing to each day's data.
* Database fragmentation, inserts, indices. You mention these, and yes, they're a concern, particularly on smaller phones.
* Compound wire size. That is, if you hang on to just a 1KB day's data for six months rather than three, you're spending 90KB on extra wire data (plus the extra environments you can't discard). This is compressed, but still, it's spending someone else's money.

So my feeling is that:

* Six months is too long on Android, at least for session data, just due to sheer volume. Bug 885650 again.
* We get compounded gains the more aggressively we collect, so we should *really* aggressively consider cutting down on data -- perhaps by maintaining alternative aggregates locally (e.g., do we really need six months of daily version history, when we could have a simple JSON timeline encoding do the job in 1/50th the space?).
* Retaining more data uses more than linear space, but does not (IMO; jjensen might disagree!) linearly improve our ability to do analysis. This is particularly true when you consider FHR as a user-facing value prop: would we really spend 4MB of a user's disk space on a feature with this much benefit, when 2MB offers almost as much?

In other words, we should really look at these numbers and say "seriously, we're suggesting storing multiple MB and 25,000 database rows in order to show accurate search counts from March when you open about:healthreport?". The pruning point should be set where this impact trades off against the maximum user value.

My broad-stroke feeling is to set hard limits like:

* 20000 events? ... a number that we should tweak by timing synthetic document generation on lowish-end phones for varying DB sizes. If we're hammering your shitty phone flash for 10 seconds, it's too much.
* 50 environments? ... just because of sheer payload overhead. If we can figure out a scheme to discard these intelligently...
* 3 months (or 2 months?) ... because data older than that is of less value, right?

When Fig merges it should dramatically reduce the number of sessions stored -- right now each Awesomebar access is a session split. But even so...

Regarding frequency: perhaps do the routine date-wise prune every week, and check for event/env count every day? Include a hysteresis curve (prune down to 18,000 when we hit 20,000) to avoid bouncing every day. We shouldn't limit pruning because we're afraid of an expensive delete: that row needs to be deleted at some point. It might as well be now, so we save N days' document generations!

Regarding vacuuming: vacuum is expensive. (It's a full DB copy, basically.) As such, we should really measure to decide when to do it: see option 3. http://lzone.de/How%20to%20Vacuum%20SQLite

jjensen: what are your feelings about aggressive pruning on Android? Would you consider 1 month, or 3 months, of retained data enough to deliver the user benefit of FHR? Does your answer change if we add some mechanism for long-term aggregation? You still have an open needinfo on Bug 885650, so feel free to answer this one instead.
Flags: needinfo?(rnewman) → needinfo?(jjensen)
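The hysteresis idea in comment 10 (hit a hard limit, prune down to a lower watermark so the daily check does not bounce against the limit) can be sketched as follows. The limits and names are the illustrative numbers from the comment, not values from any landed patch:

```python
# Hypothetical sketch of the hysteresis curve suggested in comment 10:
# check counts daily, but once the hard limit is reached, prune down to a
# lower watermark so tomorrow's check does not immediately trip again.
HARD_LIMIT = 20000      # illustrative, per comment 10
LOW_WATERMARK = 18000   # illustrative, per comment 10

def events_to_prune(event_count):
    """How many of the oldest events to delete, or 0 if under the limit."""
    if event_count < HARD_LIMIT:
        return 0
    return event_count - LOW_WATERMARK
```

With these numbers, a profile sitting at 19,999 events is left alone, while one that reaches 20,000 is cut back by 2,000 in one pass instead of by one event per day.
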
Assignee | Comment 11 • 11 years ago
f? - Parts 6-11, which remove the extraneous Receivers and test as discussed on IRC. https://github.com/mozilla-services/android-sync/pull/344

Concerns:

1) I was unable to test AnnouncementsService. It shouldn't be making network requests, but the application does not change state without doing so, so there was nothing to test to see whether the request would have been made (while bailing out before a request is actually made). There doesn't seem to be any way to insert a mock into the code, so it seems that without changing the main code, there is no way to test AnnouncementsService. I'm not sure such changes would be worthwhile because they'll probably require additional testing.

2) We store some intents in the AlarmManager, and while I checked that they exist, I was unable to check that these same intents will actually start the associated service - startService() only starts the Service whose test case this is, and PendingIntent.send() does not seem to fire in a reasonable amount of time (I slept for 2 seconds, but that has its own problems).

3) As far as I could find, there is no proper way to check that AlarmManager has a PendingIntent stored other than checking that the PendingIntent exists (ref: http://stackoverflow.com/questions/4713592/how-to-check-if-alarm-is-set). I've done this, but it seems like there should be a better way. Additionally, to do this I had to add some code which is unnecessary in the main codebase, namely PendingIntent.cancel() calls (though this added code does not change the functionality), which also seems incorrect.
Flags: needinfo?(nalexander)
Comment 12 • 11 years ago
Hi all,

Sorry for missing the needinfo on this -- I was on PTO for a while and am still going through bugmail.

Philosophically, I am not adamantly opposed to pruning before six months. However, I'd like to only act after a) a broader discussion with more stakeholders, i.e. the product owner, and b) a review of some of the existing FHR data we have, to better see what the real-world situation is and what the effects could be. To that end, I've added Brendan Colloran to this bug.

What I suggest is that we a) come up with a better summary of the likely growth rate of data, similar to what Michael did in comment #9, and b) come to a recommendation and get buy-in from the product owner, Karen. I'll send a note to the interested parties about this separately and update this ASAP.
Flags: needinfo?(jjensen)
Comment 13 • 11 years ago
Feedback provided at https://github.com/mozilla-services/android-sync/pull/335.
Flags: needinfo?(nalexander)
Assignee | Comment 14 • 11 years ago
(In reply to Richard Newman [:rnewman] from comment #10)
> Regarding vacuuming: vacuum is expensive. (It's a full DB copy, basically.)
>
> As such, we should really measure to decide when to do it: see option 3.

It seems the "auto_vacuum" pragma is set to full by default (on my Galaxy Nexus, at least in our tests). I would argue we don't want this behavior if we're going to be vacuuming regularly by hand anyway (over time, auto_vacuum may increase fragmentation, reduce insert speed, etc., at the benefit of a smaller overall DB size). Additionally, with the auto_vacuum pragma enabled, "PRAGMA freelist_count" always returns 0, so we can't actually measure the free page ratio as per option 3 and would have to vacuum on time alone.

In order to disable it, we have to set the pragma to 0 and then vacuum the database [1]. My solution: we can blindly do this after each vacuum, so that existing DBs will be moved to auto_vacuum = 0 (off) after their first vacuum. With a DB version bump, onCreate can also run "pragma auto_vacuum = 0; vacuum;" before creating the tables, to disable this functionality for new databases.

Does this sound reasonable?

[1]: http://www.sqlite.org/pragma.html#pragma_auto_vacuum
Flags: needinfo?(rnewman)
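The behavior described here is easy to reproduce outside Android. The sketch below uses Python's sqlite3 module on a throwaway file (not the Android API, so all names are illustrative): with auto_vacuum = FULL, `PRAGMA freelist_count` stays 0 after deletes, so a free-page ratio cannot be measured; after setting the pragma to 0 and vacuuming (the pragma change only takes effect once a VACUUM rebuilds the file), freed pages show up on the freelist.

```python
# Demonstration: freelist_count under auto_vacuum=FULL vs. auto_vacuum=0.
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "fhr-demo.db")
db = sqlite3.connect(path, isolation_level=None)  # autocommit, so VACUUM works

db.execute("PRAGMA auto_vacuum = FULL")  # must be set before any table exists
db.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
db.executemany("INSERT INTO events (payload) VALUES (?)",
               [("x" * 100,) for _ in range(5000)])
db.execute("DELETE FROM events")
with_auto = db.execute("PRAGMA freelist_count").fetchone()[0]  # always 0

db.execute("PRAGMA auto_vacuum = 0")
db.execute("VACUUM")  # the pragma change takes effect with this VACUUM
db.executemany("INSERT INTO events (payload) VALUES (?)",
               [("x" * 100,) for _ in range(5000)])
db.execute("DELETE FROM events")
without_auto = db.execute("PRAGMA freelist_count").fetchone()[0]  # now > 0

print("freelist with auto_vacuum:", with_auto)
print("freelist without auto_vacuum:", without_auto)
```
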
Assignee | Comment 16 • 11 years ago
It seems onCreate is called from within a transaction, so vacuum cannot be run on new databases without compensating. Options:

1) Have a first-run preference that is set true in onCreate(). onOpen will check for this preference and, if true, set it to false and change the pragma. This is gross because we need to give HealthReportDatabaseStorage a SharedPreferences instance, and since we're overriding a method called from Android code, we can't just pass it in as a parameter.

2) Treat newly created databases as existing databases and turn auto_vacuum off on the first vacuum, since we're eventually going to vacuum anyway. The one downside is that the first vacuum has to be entirely time dependent (since we can't read the free page count), but we'd have to deal with that in (1) anyway for existing databases.

I prefer (2). (1) seems a bit hackish and more complex for what seem to be very minor gains for some portion of the users (although I haven't benchmarked it, so I can't be sure). What do you think?
Flags: needinfo?(rnewman)
Assignee | Comment 18 • 11 years ago
This patch aligns with Part 20 of the github PR: https://github.com/mozilla-services/android-sync/pull/344

Please also f? on Parts 13 - 20 and the 3 followups of the PR.

I have not yet completed the testing; as nalexander previously mentioned, I was being a bit overzealous. I'm hoping to test:

* HealthReportPruneService.isIntentValid
* attemptPruneBy* and attemptVacuum work as intended (they prune when time has expired, and do not when it hasn't)
* BroadcastService correctly sets and starts the PruneService alarm.

Much of this I have already done for Upload and it shouldn't be too difficult to adapt here - I will do this after the feedback.

In the meantime, I have tried to test by hand that the PruneService runs, but have been unable to fire the event reliably. I tried making the prune check interval very short and using PendingIntent.send(), but neither seems to work properly. I once got an error (PruneService was not found) and another time it worked properly, so I'm not sure what to think. Do you have any ideas?

Note also that the constants I added are not yet finalized - I still need to do performance testing for those values. The ones I included are just placeholders.
Attachment #795639 - Flags: feedback?(rnewman)
Assignee | Comment 19 • 11 years ago
Note to self: test vacuuming on low-end devices to ensure we don't OOM every time it's called.
Reporter | Comment 20 • 11 years ago
Comment on attachment 795639 [details] [diff] [review]
01: m-c Patch

Review of attachment 795639 [details] [diff] [review]:
-----------------------------------------------------------------

::: mobile/android/base/GeckoApp.java
@@ +1565,3 @@
>     GeckoPreferences.broadcastAnnouncementsPref(context);
>     GeckoPreferences.broadcastHealthReportUploadPref(context);
> +   GeckoPreferences.broadcastHealthReportPrune(context);

We really have to unify these. At this point we're instantiating six service instances.

Also, let's not do this here. What do you think about doing this prune broadcast after a session ends in BrowserHealthRecorder?

::: mobile/android/base/GeckoPreferences.java
@@ +395,5 @@
>     // broadcast settings for all profiles). See Bug 882182.
>     GeckoProfile profile = GeckoProfile.get(context);
>     if (profile != null) {
> +      intent.putExtra("profileName", profile.getName())
> +            .putExtra("profilePath", profile.getDir().getAbsolutePath());

Nit: align the .putExtras.

@@ +442,5 @@
>
> +  public static void broadcastHealthReportPrune(final Context context) {
> +    final String action = HealthReportConstants.ACTION_HEALTHREPORT_PRUNE;
> +    final Intent intent = new Intent(action)
> +        .putExtra("branch", GeckoApp.PREFS_NAME);

One line here is fine.

@@ +445,5 @@
> +    final Intent intent = new Intent(action)
> +        .putExtra("branch", GeckoApp.PREFS_NAME);
> +    fillIntentWithProfileInfo(context, intent);
> +    Log.d(LOGTAG, "Broadcast: " + action + ", " + GeckoApp.PREFS_NAME);
> +    context.sendBroadcast(intent, GlobalConstants.PER_ANDROID_PACKAGE_PERMISSION);

Code reuse between this fragment and your reworked broadcastPrefAction?
Attachment #795639 - Flags: feedback?(rnewman)
Assignee | Comment 21 • 11 years ago
(In reply to Richard Newman [:rnewman] from comment #20)
> We really have to unify these. At this point we're instantiating six service
> instances.

Unify the broadcasts or the background services, or both?

> Also, let's not do this here. What do you think about doing this prune
> broadcast after a session ends in BrowserHealthRecorder?

Sounds reasonable to me - in theory, we shouldn't have such an epic browser session that we need to prune before the browser is backgrounded for the first time.
Assignee | Comment 22 • 11 years ago
Is this what you had in mind for the code reuse?
Attachment #795639 - Attachment is obsolete: true
Attachment #799876 - Flags: feedback?(rnewman)
Reporter | Comment 23 • 11 years ago
Comment on attachment 799876 [details] [diff] [review]
01a: m-c Patch

Review of attachment 799876 [details] [diff] [review]:
-----------------------------------------------------------------

Yeah, pretty much, though I gently question the amount of logging work!
Attachment #799876 - Flags: feedback?(rnewman) → feedback+
Reporter | Comment 24 • 11 years ago
(In reply to Michael Comella (:mcomella) from comment #21)
> > We really have to unify these. At this point we're instantiating six service
> > instances.
>
> Unify the broadcasts or the background services or both?

Eventually, probably both. This is pretty much a channel for kicking off relevant background work, so we might start that by combining the broadcast receivers, then the broadcasts themselves, and then perhaps even the background services. (Right now we issue three broadcasts, which start three broadcast receiver services, then three background services.)

> Sounds reasonable to me - in theory, we shouldn't have such an epic browser
> session that we need to prune before the browser is backgrounded for the
> first time.

Exactly.
Assignee | Comment 25 • 11 years ago
(In reply to Richard Newman [:rnewman] from comment #23)
> Yeah, pretty much, though I gently question the amount of logging work!

I was keeping what was there (without asking for info as arguments) - should I have reduced it at all?

(In reply to Richard Newman [:rnewman] from comment #24)
> Eventually, probably both. This is pretty much a channel for kicking off
> relevant background work, so we might start that by combining the broadcast
> receivers, then the broadcasts themselves, and then perhaps even the
> background services.

I feel it would be best to do this in a followup. Do you concur?
Assignee | Comment 26 • 11 years ago
Start prune service in onPause.
Attachment #799876 - Attachment is obsolete: true
Reporter | Comment 27 • 11 years ago
(In reply to Michael Comella (:mcomella) from comment #25)
> I was keeping what was there (without asking for info as arguments) - should
> I have reduced it at all?

The mfinkle in me says "delete it".

> I feel it would be best to do this in a followup. Do you concur?

Oh hell yes.
Assignee | Comment 28 • 11 years ago
Remove logging statement. Also, filed bug 913107 to unify BroadcastReceivers and Broadcasts.
Attachment #799891 - Attachment is obsolete: true
Attachment #800259 - Flags: review?(rnewman)
Reporter | Comment 29 • 11 years ago
Comment on attachment 800259 [details] [diff] [review]
01c: Patch

Review of attachment 800259 [details] [diff] [review]:
-----------------------------------------------------------------

::: mobile/android/base/GeckoApp.java
@@ +1980,5 @@
>
> +  // In theory, the first browser session will not run long enough that we need to prune
> +  // during it and we'd rather run it when the browser is inactive so we wait until here to
> +  // register the prune service.
> +  GeckoPreferences.broadcastHealthReportPrune(this);

Let's move this right to the end of the method.
Attachment #800259 - Flags: review?(rnewman) → review+
Assignee | Comment 30 • 11 years ago
Move r+; move PruneService start broadcast to end of the method.
Attachment #800259 - Attachment is obsolete: true
Attachment #800867 - Flags: review+
Reporter | Comment 31 • 11 years ago
Comment on attachment 800867 [details] [diff] [review]
01d: Patch

And by "the end" I kinda meant "in the background runnable". Sorry for being unclear.

We should do as little work as possible in onPause; it blocks the next activity.
Attachment #800867 - Flags: review+
Assignee | Comment 32 • 11 years ago
Move Prune Service start broadcast to background runnable.
Attachment #800867 - Attachment is obsolete: true
Attachment #800949 - Flags: review?(rnewman)
Reporter | Updated • 11 years ago
Attachment #800949 - Flags: review?(rnewman) → review+
Assignee | Comment 33 • 11 years ago
r? - github [1], parts 23-26.

I abstracted the storage code away into a PrunePolicyStorage class, as (generally) discussed on IRC.

I am currently writing integration tests for the PrunePolicyStorage class and will follow those up with unit tests for attemptPruneBySize, attempt..., etc.

[1]: https://github.com/mozilla-services/android-sync/pull/344
Assignee | Comment 34 • 11 years ago
Add 27-29 to the review (they are the unit tests).
Flags: needinfo?(rnewman)
Assignee | Comment 35 • 11 years ago
Proposed constants:

## Time between...

Prune by size - daily: We're not sure how much data we might collect in a day, so it's good to ensure yesterday's collection was reasonable.

Expiration - weekly: This isn't urgent per se - it is not attempting to clean up excessive amounts of data, just unused data - so a week seems reasonable.

Vacuum - monthly: Honestly, no good reason, but I believe that's what desktop does [1]. Without writing an involved benchmarking suite, it may be hard to come up with good estimates. For a rough benchmarking test, I tried a series of inserts (10000) into a table with one integer column, followed by a series of deletes (3000 random rows). I noticed (1) vacuuming took longer on a larger file size, despite being unfragmented (as a DB copy, this is to be expected), and (2) vacuuming 7000 events inserted directly was more or less the same as vacuuming 7000 events after 3000 were deleted (from 10000 events). Thus, with this naive usage pattern, it seems the vacuuming frequency doesn't matter too much. However, I imagine this would change with extensive fragmentation in a realistic usage pattern, or with the real, more complex database.

## Event expiration duration

Up to jjensen and how useful information older than this value is. My database on Nightly is roughly 144K and three months old - I would cap this value at 3 months for now and discuss in bug 885650, as a followup, what this value should be (or whether older data should be aggregated instead).

## freelist_count/page_count ratio limit - 0.1

Without spending considerable time writing a benchmarking suite, I'm not sure how to get this value, so I took the value from when vacuuming was first implemented on desktop [2]. However, after searching mxr, it seems they don't use this methodology anymore [3] - would you happen to know why? Are we repeating a past mistake?

## Max __ count / __ count after pruning...

Events/Environments: I am going to benchmark how long document generation takes on both modern and crappy hardware with various numbers of these to determine the amount, and report back.

Again, to save time, I neglected to benchmark the amount of time between vacuums and (considerably more challenging) the freelist_count/page_count ratio limit. If this is worthwhile to do before landing, or as a followup, let me know.

What do you think of the values I gave and their motivations?

[1]: https://mxr.mozilla.org/mozilla-central/source/storage/src/VacuumManager.cpp?rev=8c5869c1f670#39
[2]: http://hg.mozilla.org/mozilla-central/rev/18a6c37bb588#l1.34
[3]: https://mxr.mozilla.org/mozilla-central/search?string=freelist_count&find=&findi=&filter=^[^\0]*%24&hitlimit=&tree=mozilla-central
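The vacuum trigger proposed here (a freelist/page ratio threshold, backed by a time-based interval) can be sketched as a small policy function. The constants are the placeholders from this comment, and the function name is illustrative, not from the patch:

```python
# Hypothetical sketch of the vacuum decision discussed in comment 35:
# vacuum when the free-page ratio crosses the limit (option 3 from the
# linked article), or when the monthly interval has elapsed regardless.
import time

FREELIST_RATIO_LIMIT = 0.1                 # placeholder, per the comment
VACUUM_INTERVAL_SECONDS = 30 * 24 * 3600   # "monthly", also a placeholder

def should_vacuum(freelist_count, page_count, last_vacuum_s, now_s=None):
    """True when the freelist ratio or the elapsed time warrants a vacuum."""
    now_s = time.time() if now_s is None else now_s
    if page_count > 0 and freelist_count / page_count >= FREELIST_RATIO_LIMIT:
        return True
    return now_s - last_vacuum_s >= VACUUM_INTERVAL_SECONDS
```

Note that, as comment 14 points out, the ratio branch only works once auto_vacuum is off; with auto_vacuum enabled, freelist_count is always 0 and only the time branch ever fires.
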
Assignee | Comment 36 • 11 years ago
Benchmarks for max event count on my debug build running on my Galaxy Nexus, 3 runs per event count, where event count is additional events over pre-populated storage (i.e. storage.insertTextualEvents(count)):

Event count | Avg. time (millis)
          0 |    48
       5000 |  5345
      10000 | 10182
      15000 | 13877

Ouch. I'm going to rebench on a non-debug build once the clobber runs.

Keep in mind this bench is without fragmentation, only with notable entries in the event_textual table, and with unrealistic usage on the other tables.

Testing source: http://pastebin.mozilla.org/3039477
Reporter | Comment 37 • 11 years ago
(In reply to Michael Comella (:mcomella) from comment #36)
> Ouch. I'm going to rebench on a non-debug build once the clobber runs.

Assuming I'm reading your test code right, your findings are "document generation is linear in number of events, approx. 1 msec per event". That would be good… if we didn't have so many events!

Try it just grabbing the event cursor and walking it (no processing), or profiling these runs. I'm curious how much of this is JSON document construction and serialization versus DB walking.

But this does imply that right now we should keep no more than ~5000 events, which is ~100 days of session data. On a less capable (read: typical) device, with searches and other recorded data, that probably drops to one month.

(Notably, that's less than our release cycle, which implies that we should treat high-volume data differently so as to avoid losing low-volume high-utility events: Bug 885650.)
Flags: needinfo?(rnewman)
Reporter | Comment 38 • 11 years ago
(In reply to Michael Comella (:mcomella) from comment #35)
> Proposed constants:

These all seem sane, but see comments inline.

> Time between...
> Prune by size - daily: We're not sure how much data we might collect in
> a day so it's good to ensure yesterday's collection was reasonable

We need to benchmark expiration itself. It *should* be on the order of number of items deleted, in which case we should prune as frequently as possible — we don't want to trash a user's experience by doing a big delete on their single-core phone.

> Expiration - weekly: This isn't urgent persay - this is not attempting
> to clean up excessive amounts of data, just unused data - so a week seems
> reasonable

There's also the point that expiration might render size-based pruning unnecessary (and it definitely does for steady-state accrual). The frequency of doing this depends on how cheap it is to figure out whether we should do any work (by computing counts).

> Vacuum - monthly

Note that we shouldn't vacuum without having done a thorough prune first: copying is expensive in all kinds of ways, so we should copy as little as possible. As such, these have a sequence dependency, despite us not wanting to run them back-to-back for performance reasons. My hope is that the free page ratio will pretty much indicate that you've just done a bunch of deletes, and thus we get this for free, but time will tell.

Bug 870170 should be addressed very soon after this, so that we can tune these numbers.

> freelist_count/page_count ratio limit - 0.1: Without spending considerable
> time writing a benchmarking suite, I'm not sure how to get this value, so I
> took the value from when vacuuming was first implemented on desktop [2].
> However, after searching mxr, it seems like they don't use this methodology
> anymore [3] - would you happen to know why? Are we repeating a past mistake?

mak, can you speak to this?

> Max __ count / __ count after pruning...
> Events/Environments: I am going benchmark how long document generation
> takes on both modern and crappy hardware with various numbers of these to
> determine the amount and report back.

Get a reasonable hysteresis…
Flags: needinfo?(mak77)
Assignee | Comment 39 • 11 years ago
Followup to comment 36, Galaxy Nexus, non-debug build [1]:

Event count | Avg. time (millis)
          0 |    36
       5000 |  4923
      10000 |  9093
      15000 | 13765

And it's more or less the same. Huh, I wonder if I set up my build environment correctly. In any case, that seems pretty bad - I'll try to get some benchmarks on an older device.

I guess it's worth noting that I'm running this in the context of a JUnit test (as it was easier to set up), which could potentially influence the device via extra memory use, garbage collection, and other background processes I'm unaware of.

[1]: http://pastebin.mozilla.org/3039853
Assignee | Comment 40 • 11 years ago
> Try it just grabbing the event cursor and walking it (no processing), or profiling these runs.
> I'm curious how much of this is JSON document construction and serialization versus DB walking.

Galaxy Nexus, non-debug build:

Event count | Time - 1 run (millis)
          0 |   2
       5000 |  55
      10000 | 133
      15000 | 258

D:

Test source: http://pastebin.mozilla.org/3039976
Reporter | Comment 41 • 11 years ago
(In reply to Michael Comella (:mcomella) from comment #39)
> Followup to comment 36, Galaxy Nexus, non-debug build [1]:
> And it's more or less the same. Huh, I wonder if I set up my build
> environment correctly. In any case, that seems pretty bad - I'll try to get
> some benchmarks on an older device.

I/O doesn't care about debug/opt :P

> I guess it's worth noting that I'm running this in the context of a JUnit
> test (as it was easier to set up), which could potentially influence the
> device via extra memory use, garbage collection, and other background
> processes I'm unaware of.

I doubt it.

(In reply to Michael Comella (:mcomella) from comment #40)
> D:

So raw table walks are cheap...

> Test source: http://pastebin.mozilla.org/3039976

... but make sure you're doing all the same work that the generator does (ordering, grouping, ranges, limits…). Either some of that is turning this into a bad query, or serialization is really expensive.
Comment 42 • 11 years ago
(In reply to Richard Newman [:rnewman] from comment #38)
> > freelist_count/page_count ratio limit - 0.1: Without spending considerable
> > time writing a benchmarking suite, I'm not sure how to get this value, so I
> > took the value from when vacuuming was first implemented on desktop [2].
> > However, after searching mxr, it seems like they don't use this methodology
> > anymore [3] - would you happen to know why? Are we repeating a past mistake?
>
> mak, can you speak to this?

With the very large page size we use on desktop (32k pages), the freelist count was basically never a good ratio to figure out fragmentation, because pages were hardly ever completely free. It *may* work if you use the default 1024-byte page size.

Moreover, even if you have a large freelist, it doesn't mean the db is fragmented (the free pages may all be in the same area); vice versa, having only a few doesn't mean it's not fragmented. For example, auto_vacuum moves all of the free pages to the end of the db and removes them automatically; nonetheless, auto-vacuumed databases are heavily fragmented (that's why we stopped using auto_vacuum a long time ago).

Honestly, there is no good ratio to figure out fragmentation; that's why we moved to a guessed time-based interval (1 month). It's a guess in both cases, but at least with a time-based vacuum we know it will happen.
Flags: needinfo?(mak77)
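mak's point about the freelist being a poor fragmentation signal is easy to reproduce with plain SQLite. The sketch below is in Python's sqlite3 purely for illustration (the actual client code is Java on the Android API); the table name and row sizes are made up. A contiguous delete pushes whole pages onto the freelist without shrinking the file, and VACUUM empties the freelist again — while a scattered delete can leave the freelist empty even though the file is just as sparse, which is exactly why the ratio is unreliable.

```python
import os
import sqlite3
import tempfile

def freelist_ratio(conn):
    """freelist_count / page_count for the open database."""
    free = conn.execute("PRAGMA freelist_count").fetchone()[0]
    total = conn.execute("PRAGMA page_count").fetchone()[0]
    return free / total if total else 0.0

path = os.path.join(tempfile.mkdtemp(), "health.db")
conn = sqlite3.connect(path)
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany("INSERT INTO events (payload) VALUES (?)",
                 [("x" * 1024,) for _ in range(1000)])
conn.commit()

# A contiguous delete frees whole b-tree pages onto the freelist,
# but the file itself does not shrink.
conn.execute("DELETE FROM events WHERE id <= 800")
conn.commit()
ratio_after_delete = freelist_ratio(conn)

# VACUUM rebuilds the database and empties the freelist, so the ratio
# says nothing about how fragmented the remaining pages are.
conn.execute("VACUUM")
ratio_after_vacuum = freelist_ratio(conn)
```

Note that deleting every other row instead of a contiguous range may free no pages at all, despite leaving the file half empty — the freelist ratio would then read 0 on a badly bloated database.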
Reporter
Comment 43•11 years ago
> with the very large page size we use on desktop (32k pages) the freelist
> count was basically never a good ratio to figure out fragmentation, cause
> pages were hardly being completely free. It *may* work if you use the
> default 1024 bytes page size.

We can figure this out, mcomella:
http://developer.android.com/reference/android/database/sqlite/SQLiteDatabase.html#getPageSize%28%29

> Moreover, even if you have a large freelist,
> it doesn't mean the db is fragmented (they may all be in the same area), or
> viceversa, having a few doesn't mean it's not fragmented.

We're only partly interested in fragmentation; we also want compaction, because steady-state disk usage is a concern. From that perspective auto_vacuum is good, so this is a tradeoff between auto_vacuum's tendency to fragment and the unbounded growth when you turn it off.

> moved to a guessed time based ratio (1 month), it's a guess in both cases,
> but at least with time based vacuum we know it will happen.

Gotcha.
Reporter
Comment 44•11 years ago
I should expand: for some users, events will arrive at approximately the same rate, so the free list will do its job and we'll have steady-state disk usage but increasing fragmentation. For those users, the vacuum aims to reduce fragmentation.

For irregular users (or unpredictable usage) there will be spiky patterns, and we want to ensure that those peaks don't keep an inflated database size.
Assignee
Comment 45•11 years ago
> ... but make sure you're doing all the same work that the generator does
> (ordering, grouping, ranges, limits…).

Event count | Time - 1 run (millis)
0 | 3
5000 | 271
10000 | 394
15000 | 626

Conclusion: serialization is expensive.

Source: http://pastebin.mozilla.org/3046893
Query taken from: https://mxr.mozilla.org/mozilla-central/source/mobile/android/base/background/healthreport/HealthReportDatabaseStorage.java?rev=e881258c3ac5#1224
Assignee
Comment 46•11 years ago
> It *may* work if you use the default 1024 bytes page size.
SQLiteDatabase.getPageSize() returns 4096 bytes on my Galaxy Nexus.
Even if this were the desired 1024 byte page size, I'm not sure we can guarantee this across devices. There is SQLiteDatabase.setPageSize(), but that must be called on database creation (before any data is inserted).
Reporter
Comment 47•11 years ago
(In reply to Michael Comella (:mcomella) from comment #46)
> Even if this were the desired 1024 byte page size, I'm not sure we can
> guarantee this across devices. There is SQLiteDatabase.setPageSize(), but
> that must be called on database creation (before any data is inserted).

Vacuum also applies page size changes.

(In reply to Michael Comella (:mcomella) from comment #45)
> Conclusion: serialization is expensive.

Sounds like we need some profiling and a follow-up…
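The "vacuum also applies page size changes" behavior is standard SQLite: on an already-populated database, `PRAGMA page_size = N` is only a request, and the file is actually rewritten with the new page size by the next VACUUM (this requires a non-WAL journal mode). A sketch using Python's sqlite3 — illustrative only, since the client would go through the Android SQLiteDatabase API:

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "health.db")
conn = sqlite3.connect(path)
# Before the first write, the page size can be set directly.
conn.execute("PRAGMA page_size = 4096")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany("INSERT INTO events (payload) VALUES (?)",
                 [("x" * 256,) for _ in range(500)])
conn.commit()

# On an existing database this is only a pending request...
conn.execute("PRAGMA page_size = 1024")
# ...which the next VACUUM honors while rebuilding the file.
conn.execute("VACUUM")
page_size = conn.execute("PRAGMA page_size").fetchone()[0]
```

This is why the vacuum step can double as the mechanism for migrating existing profiles to a different page size.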
Assignee
Comment 48•11 years ago
Document generation profiling followup: bug 916297.
Assignee
Comment 49•11 years ago
r? - gh parts 30-31 - Dealing with page sizes for vacuums. Still TODO in vacuuming: only vacuum if we've pruned.
Flags: needinfo?(rnewman)
Assignee
Comment 50•11 years ago
> Note that we shouldn't vacuum without having done a thorough prune first:
> copying is expensive in all kinds of ways, so we should copy as little as
> possible. As such, these have a sequence dependency, despite us not wanting
> to run them back-to-back for performance reasons.
r? - gh part 32 - Don't vacuum unless we've pruned.
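The sequence dependency — only vacuum after a prune has actually deleted rows — can be kept across service runs with a persisted flag. Here is a minimal sketch in Python for brevity; in the real client this is Java backed by SharedPreferences, the `needsCleanup` key name comes from comment 51, and the method names are made up:

```python
import sqlite3

class PrunePolicy:
    def __init__(self, conn, prefs):
        self.conn = conn
        self.prefs = prefs  # dict stands in for Android SharedPreferences

    def prune_events_before(self, time_millis):
        """Expire old events; remember that a vacuum is now worthwhile."""
        cur = self.conn.execute(
            "DELETE FROM events WHERE date < ?", (time_millis,))
        self.conn.commit()
        if cur.rowcount > 0:
            self.prefs["needsCleanup"] = True
        return cur.rowcount

    def vacuum_if_needed(self):
        """VACUUM copies the whole database, so skip it unless a prune
        has deleted something since the last vacuum."""
        if not self.prefs.get("needsCleanup", False):
            return False
        self.conn.execute("VACUUM")
        self.prefs["needsCleanup"] = False
        return True
```

In the real client the flag would be committed to disk so it survives process death between the prune and the (much later) vacuum.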
Assignee
Comment 51•11 years ago
I realized that part 32 could be better tested with a TestPrunePolicyDatabaseStorage class - we could test to ensure we set the "needsCleanup" preference as expected (which is a good idea since we'll never vacuum if this code does not work correctly). I'll add the test.
Comment 53•11 years ago
(In reply to Richard Newman [:rnewman] from comment #47)
> Vacuum also applies page size changes.

It still may not be a good idea to change it: whatever page size is in use has very likely been tested and found to be the fastest on that storage (IIRC it changed recently for that reason).
Assignee
Comment 54•11 years ago
r? - gh parts 30-31 and followup - Only vacuum if pruning has occurred.
Assignee
Updated•11 years ago
Flags: needinfo?(rnewman)
Assignee
Comment 55•11 years ago
r? - gh parts 30, followup, 31. Note that part 31 is not tested.

https://github.com/mozilla-services/android-sync/pull/344
Assignee
Comment 56•11 years ago
(In reply to Richard Newman [:rnewman] from comment #38)
> We need to benchmark expiration itself. It *should* be on the order of
> number of items deleted, in which case we should prune as frequently as
> possible — we don't want to trash a user's experience by doing a big delete
> on their single-core phone.

Note that this benchmarking leaves out pruneEnvironments (just because it is harder to insert the appropriate dummy data), but I make the naive assumption that the duration of expiration will be at most double the event prune, and probably much less.

pruneEvents benchmarking, 1 run, Galaxy Nexus:

10,000 total events
Count deleted | Time (millis)
0 | 105
2500 | 459
5000 | 524
7500 | 601

20,000 total events
Count deleted | Time (millis)
0 | 182
2500 | 623
5000 | 804
10000 | 995

Test source code: https://pastebin.mozilla.org/3090121

Will follow up with an analysis and perhaps more benchmarks.
Assignee
Comment 57•11 years ago
Some more realistic pruneEvents benchmarks, Galaxy Nexus, 1 run:

Total/deleted - time (millis)
20000/2000 - 638
25000/2000 - 783
30000/2000 - 838

And deleteDataBefore (expiration) benchmarks, Galaxy Nexus, 1 run:

Total/deleted - time (millis)
20000/2000 - 281
25000/2000 - 422
30000/2000 - 507
20000/5000 - 339
25000/5000 - 421
30000/5000 - 504

Huh. I wonder if we shouldn't just expire data daily (since it seems to run so quickly) and axe pruning based on size. I'll try benching on a slower device.
Reporter
Comment 58•11 years ago
(In reply to Michael Comella (:mcomella) from comment #57)
> Huh. I wonder if we shouldn't just expire data daily (since it seems to run
> so quickly) and axe pruning based on size.

If the numbers are *tiny* for, e.g., 6000 events, then yeah. Otherwise, weekly?
Flags: needinfo?(rnewman)
Assignee
Comment 59•11 years ago
(And it seems a Nexus S out-performs my Galaxy Nexus - I think I'll need an older device)
> Huh. I wonder if we shouldn't just expire data daily (since it seems to run
> so quickly) and axe pruning based on size.
Oops, sorry, this didn't make much sense.
Assignee
Comment 60•11 years ago
r? - gh Part 32: Remove fragmentation-based cleanup code (hope you enjoyed reviewing it the first time too! ;)

https://github.com/mozilla-services/android-sync/pull/344
Flags: needinfo?(rnewman)
Assignee
Comment 61•11 years ago
I tried benchmarking on a Galaxy Chat, a Droid Bionic, and a Sony Ericsson Xperia, the Xperia being the only one that was slower than the Galaxy Nexus on 20000/2000 (330). However, it was faster on 30000/5000 (433 at its worst). Thus, it seems we can use the Galaxy Nexus as a benchmark, at least with these numbers and usage patterns.

Notably, as the disk fills up (as it likely will on older devices with smaller disks), access speeds may decrease with increased fragmentation, so we should keep this in mind when choosing these constants.
Assignee
Comment 62•11 years ago
Comment 61 refers to the deleteDataBefore (expiration) benchmarks.
Assignee
Comment 63•11 years ago
Re-evaluation of the constants:

Max events: 20,000 - ~1.64 MB (using my analysis from comment 9), and the computation speed seems negligible. The smaller the DB, the faster the vacuums are too.

Max environments: 50 - pretty arbitrary, but (via comment 9) I currently have 10, and they seem fairly cheap.

Prune-by-size frequency: Daily - probably about 1-2 seconds of IO time a day; doesn't seem too bad to me.

Expiration frequency: Weekly - doesn't need to be urgent, and pruning-by-size makes this less necessary.

Vacuum: Monthly - it's what desktop does and it's pretty expensive.

Event expiration duration: 6 months - really up to jjensen, depending on how useful the older information is (as per comment 35). Also bug 885650 to aggregate data. We can leave this at 6 months for now since we have prune-by-size in place.

Event/environment count after pruning: TBD - I need to figure out how to hysteresis. :)

rnewman, what say you?
Reporter
Comment 64•11 years ago
(In reply to Michael Comella (:mcomella) from comment #63)
> Re-evaluation of the constants:
>
> Max events: 20,000 - ~1.64 MB (using my analysis from comment 9), and the
> computation speed seems negligible. The smaller the DB, the faster the
> vacuums are too.

Let's halve this. 10k events is a lot.

> Max environments: 50 - pretty arbitrary, but (via comment 9) I currently
> have 10, and they seem fairly cheap.

They're the biggest thing we have, but yeah.

> Vacuum: Monthly - it's what desktop does and it's pretty expensive

Sure.

> Event/environment count after pruning: TBD - I need to figure out how to
> hysteresis. :)

Heh, just prune down to X once you hit Y, where Y >> X!

> rnewman, what say you?

WFM.
Flags: needinfo?(rnewman)
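The "prune down to X once you hit Y" hysteresis is straightforward: two watermarks mean the expensive delete fires occasionally instead of on every insert past the limit. A sketch in Python with illustrative constants and schema (not the shipped ones):

```python
import sqlite3

MAX_EVENT_COUNT = 10000     # Y: high watermark that triggers the prune
PRUNED_EVENT_COUNT = 8000   # X: low watermark we prune down to (X << Y)

def prune_events_by_size(conn):
    """If the table has grown past Y, delete the oldest rows until only
    X remain; returns the number of rows deleted."""
    count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
    if count <= MAX_EVENT_COUNT:
        return 0
    to_delete = count - PRUNED_EVENT_COUNT
    # Oldest-first deletion, keyed on the event date then id.
    conn.execute(
        "DELETE FROM events WHERE id IN ("
        "  SELECT id FROM events ORDER BY date ASC, id ASC LIMIT ?)",
        (to_delete,))
    conn.commit()
    return to_delete
```

Because the table is cut back to X rather than Y, subsequent inserts have Y − X headroom before the prune triggers again.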
Assignee
Comment 65•11 years ago
r+ via github/IRC/IRL.

https://github.com/mozilla-services/android-sync/pull/344
Attachment #807444 - Flags: review+
Assignee
Comment 66•11 years ago
https://hg.mozilla.org/integration/fx-team/rev/66541e3d93b7
https://hg.mozilla.org/integration/fx-team/rev/debb2fa19aec
Assignee
Comment 67•11 years ago
Follow-up concerns:
* Provide testing instructions for QA (and ensure it is easily testable)
* The added tests can take a while to run since they do a lot of DB operations; this may be a concern when the tests get onto tbpl (bug 709353) - we can look into this if it becomes an issue
* Reassess constants (bug 918527)
Assignee
Comment 68•11 years ago
> https://hg.mozilla.org/integration/fx-team/rev/66541e3d93b7
Whoops, forgot the commit message and reviewer (rnewman).
Comment 69•11 years ago
https://hg.mozilla.org/mozilla-central/rev/66541e3d93b7
https://hg.mozilla.org/mozilla-central/rev/debb2fa19aec
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 27
Assignee
Comment 70•11 years ago
# QA Test Plan

## Background info

The FHR prune service runs in the background to delete unneeded info and clean up the FHR database in order to reduce size on disk, reduce upload size, and improve performance. The prune service currently has three specific functions:

1) Delete data from the database when the number of events/environments recorded grows too large ("prune by size")
2) Delete data from the database when that data exceeds a certain age ("expiration")
3) Vacuum the database ("vacuum" or, more generally, "cleanup")

The service runs after a certain fixed interval, and each function will only trigger after a certain duration has passed (note that this duration is independent for each function). This means the service can potentially run without performing any action. Note also that (3) should only run after (1) or (2) occur (this state is reset after (3) runs), since we only need to vacuum the database if something has been deleted from it.

## Testing plan

The service is unit tested well enough that if the service runs, it should (hopefully) run properly. However, it is difficult to unit test the interaction between Android services, and thus whether or not it will correctly run.

By default, the prune service should run once a day. Using the prune-test-addon [1], you can set this value much lower (by default, it is 60 seconds). Additionally, menu items are added that enable (1), (2), and (3) to run.

When the service runs successfully, there should be some relevant log output from each function with log tag PrunePolicy. If there is no output from PrunePolicy, there may be error output from log tag HealthReportPruneService. If there is no output from either, the service has probably not run and is likely broken.

[1]: https://github.com/mcomella/healthreport-prune-test-addon
Whiteboard: [qa+]
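The per-function scheduling described in the test plan (the service ticks often; each function runs only when its own interval has elapsed) reduces to a small timestamp check. A sketch in Python — the interval values echo the constants discussed in comments 63 and 64, and the pref key names are made up:

```python
MILLIS_PER_DAY = 24 * 60 * 60 * 1000
PRUNE_BY_SIZE_INTERVAL = 1 * MILLIS_PER_DAY    # daily
EXPIRATION_INTERVAL = 7 * MILLIS_PER_DAY       # weekly
VACUUM_INTERVAL = 30 * MILLIS_PER_DAY          # monthly

def attempt(prefs, key, interval_millis, now_millis, action):
    """Run `action` only if `interval_millis` has passed since the run
    recorded under `key`; otherwise this service tick is a no-op for
    that function."""
    if now_millis - prefs.get(key, 0) < interval_millis:
        return False
    action()
    prefs[key] = now_millis  # persisted in the real client
    return True
```

Because each function tracks its own timestamp, a single tick may run prune-by-size but skip expiration and vacuum, which matches the "service can run without performing any action" behavior QA should expect.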
Comment 71•11 years ago
The add-on does not create the buttons because a ")" is missing from the end of https://github.com/mcomella/healthreport-prune-test-addon/blob/master/bootstrap.js#L186. Make sure to add it when testing.
Comment 72•11 years ago
Tested on Nightly 2013-09-24 using LG Nexus 4 (Android 4.2.2) with a clean profile and sync set up. We don't have any device with an older profile that holds a lot of health report data to test this with.

We are seeing the broadcast messages being sent, "preferences committed" and the alarm set to 60000, but we don't see any messages with the tags "PrunePolicy" or "HealthReportPruneServices". Please see the attached logs.

Is there a way to create health report data on a clean profile? Please let us know if we can help in any other way.
Reporter
Updated•11 years ago
Flags: needinfo?(michael.l.comella)
Assignee
Comment 73•11 years ago
> The add-on does not create the buttons...

My apologies - I pushed an update to github.

> Is there a way to create healthreportdata on a clean profile?

Quickly, no. By hand, each search in the awesomebar will generate a "search" event. If the device or browser configuration changes, a new "environment" will be created. Notably, adding and removing addons will add entries to the "addons" table and create a new "environment". For specifics on what constitutes an environment, see [1].

If it would help, I can give you my FHR database from my personal Nightly, which you should be able to insert into the Fennec profile by hand on a rooted device.

> but we don't see any messages with the tags "PrunePolicy" or "HealthReportPruneServices".

I'll look into this.

[1]: https://mxr.mozilla.org/mozilla-central/source/mobile/android/base/background/healthreport/Environment.java?rev=79088e422daf#42
Flags: needinfo?(michael.l.comella)
Comment 74•11 years ago
Can you please provide us your FHR database?
Flags: needinfo?(michael.l.comella)
Assignee
Comment 75•11 years ago
This should go right into the base Mozilla profile directory as "health.db".
Flags: needinfo?(michael.l.comella)
Assignee
Comment 76•11 years ago
> > but we don't see any messages with the tags "PrunePolicy" or "HealthReportPruneServices".
It turns out this was an issue with GeckoLogger's verbosity. By running
`adb shell setprop log.tag.GeckoHealth VERBOSE` and restarting the Fennec process, the PrunePolicy and HealthReportPruneService output should be visible.
Note that the test addon has been updated again and you should `git pull` before testing.
Comment 78•11 years ago
We tested on the Samsung Galaxy Note (Android 4.0.4), copied the database, and made sure that the data was copied using the dump command. We tested all three options of the add-on; you can find the output in the log file. We didn't see any error messages.

On the HTC Desire (Android 2.2) the add-on does not add the buttons to the menu list.
Updated•6 years ago
Product: Firefox Health Report → Firefox Health Report Graveyard