Closed Bug 721898 Opened 8 years ago Closed 8 years ago

Remove history truncation code

Categories

(Firefox for Android :: General, defect, P2)

11 Branch
ARM
Android
defect

Tracking

()

RESOLVED FIXED
Firefox 14
Tracking Status
firefox11 --- affected
firefox12 --- affected
blocking-fennec1.0 --- beta+
fennec 11+ ---

People

(Reporter: elan, Assigned: gcp)

References

(Depends on 1 open bug)

Details

(Keywords: perf)

Attachments

(1 file)

mconnor mentioned this in showcase and we all thought it would be a good idea. this bug is to give it legs.
Is this heading to beta?
tracking-fennec: --- → ?
Keywords: perf
Assignee: nobody → gpascutto
tracking-fennec: ? → 11+
Priority: -- → P2
We already truncate History (at 250 entries), don't we?
I think the limit should be as high as possible for the device. Some users have 250 Gmail history items!
Blocks: 708005
gcp mentioned that apparently Android's own browser doesn't truncate bookmarks. I'm not sure if that's true, because it uses the same table for both, but I don't know how it prunes.

IMO we shouldn't truncate bookmarks, at least for any sane quantity; yes, there are some users with 30,000 bookmarks, but we should probably limit how many we download in Sync (as we plan to for History: Bug 720304), rather than arbitrarily deleting some ein Fennec.

Truncating history is another matter, because it's more likely that users will have a lot of it, but I don't have any numbers for what quantities have what effect on UX, and whether this is a problem that would be better solved by database tuning and custom indices.
>gcp mentioned that apparently Android's own browser doesn't truncate bookmarks. 
>I'm not sure if that's true, because it uses the same table for both, but I don't 
>know how it prunes.

http://hi-android.info/src/android/provider/Browser.java.html
see truncateHistory

lucasr reimplemented this in LocalBrowserDB. I gather from the discussion on IRC that this should have been in BrowserProvider instead, as Sync won't use BrowserDB.
Summary: Truncate History and Bookmark items to 300 for Sync in Fennec Native → Truncate History to 300 items in BrowserProvider
(In reply to Gian-Carlo Pascutto (:gcp) from comment #5)

> lucasr reimplemented this in LocalBrowserDB. I gather from the discussion on
> IRC that this should have been in BrowserProvider instead, as Sync won't use
> BrowserDB.

On the other hand, the next history visit generated by Fennec will cause a truncation, no?
Yes. But Sync (and Profile Migration, once it's been rewritten) will want to know what the internal limit is, so they don't start pushing stuff in the DB that will be deleted immediately after.

This functionality was added to BrowserDB last week, so this is also something that needs to move to BrowserProvider instead.
(In reply to Gian-Carlo Pascutto (:gcp) from comment #7)
> Yes. But Sync (and Profile Migration, once it's been rewritten) will want to
> know what the internal limit is, so they don't start pushing stuff in the DB
> that will be deleted immediately after.

The limit doesn't help us here for anything but an upper bound on how many records to fetch. We might be pulling down a record that should replace whatever's in the database.

I wrote down some notes on data loss bugs here:

https://bugzilla.mozilla.org/show_bug.cgi?id=720304#c1
(In reply to Richard Newman [:rnewman] from comment #8)

> The limit doesn't help us here for anything but an upper bound on how many
> records to fetch. We might be pulling down a record that should replace
> whatever's in the database.

The implication of what I just wrote: whatever pruning Fennec does, it should be based on frecency, not recency.

And the implication of my comments in the other bug: pruning should be elastic and delayed, so that a sync doesn't trigger truncation.
>The limit doesn't help us here for anything but an upper bound on how many records 
>to fetch. We might be pulling down a record that should replace whatever's in the 
>database.

You can resolve in Sync what records will end up living inside the limit and push only those. Saves you some inserts into the DB. But see also below.

>The implication of what I just wrote: whatever pruning Fennec does, it should be based on 
>frecency, not recency.

Don't you risk running into some kind of lock where all sites have high visits counts * reasonable recency, so there's no actual room to add new sites that you've just visited once? Because you prune them, their visit count can't go up, etc....

>And the implication of my comments in the other bug: pruning should be elastic and 
>delayed, so that a sync doesn't trigger truncation.

That's problematic. Presuming that Sync runs in the background, it runs the risk that something in the foreground triggers a truncation in the history while it's busy. Currently, this will happen when visiting a site, for example.
(In reply to Gian-Carlo Pascutto (:gcp) from comment #10)

> Don't you risk running into some kind of lock where all sites have high
> visits counts * reasonable recency, so there's no actual room to add new
> sites that you've just visited once? Because you prune them, their visit
> count can't go up, etc....

Yes. This is why pruning sucks; we can't keep everything, so do we keep useful things or recent things? A bias in either direction is bad. And of course pretty much every sync user is going to immediately hit whatever history threshold we set; I doubt there's a few percent of desktop users that doesn't have more than 250 history items.

We just have to make sure that Fennec's frecency algorithm is useful and addresses this situation. There will still be sites that are visited _just_ infrequently enough to slip through the net, of course, no matter what tuning happens. This is a real UX concern for Fennec. People care about both ends of the spectrum.

These concerns are why I'm in favor of database tuning and applying indexing techniques to get our truncation limit as high as possible, or even making history expiration a tunable parameter for users: the higher it is, the less likely users are to encounter undesirable behaviors.

> >And the implication of my comments in the other bug: pruning should be elastic and 
> >delayed, so that a sync doesn't trigger truncation.
> 
> That's problematic. Presuming that Sync runs in the background, it runs the
> risk that something in the foreground triggers a truncation in the history
> while it's busy. Currently, this will happen when visiting a site, for
> example.

Yup.

What we could do is have two related pruning regimes. For example, Fennec's UI actions could prune down to 600 during use.

During routine use by Sync users, Sync itself could notify the content provider that a sync is complete (delete with a custom URI parameter on the CP URI) to say "OK, you can prune down to the lower limit now".

Soon after startup, if Fennec decides that Sync isn't configured (a check it does already in order to show the sign-up banner) it can prune down to 300.

The consequence of this is that most users will hover around the 300-record mark, spiking to 600 if they are heavy users without quitting Fennec (but that actually makes a lot of sense to me! Keep my history if I'm still browsing!).

And we can tune those numbers up as we make things better.
Target Milestone: Firefox 11 → ---
Just a nnote. We do not want to truncate to a tiny number. It will kill the usefullness of the awesomebar. We need to try to save as much data as possible. Bug 725914 fixed a slow query that affected the speed of awesomebar and top-sites. Let's re-evaluate based on the current speed to get a higher number. 300 is way too few IMO.
Summary: Truncate History to 300 items in BrowserProvider → Truncate history to a manageable number of items in BrowserProvider
Note that Sync currently limits its fetched items to match Fennec. Whatever we tune this to, we should adjust Sync's limit to match.
(In reply to Richard Newman [:rnewman] from comment #13)
> Note that Sync currently limits its fetched items to match Fennec. Whatever
> we tune this to, we should adjust Sync's limit to match.

Richard, maybe the limit should be fetched from the Content Provider instead of hardcoding it in sync?
(In reply to Lucas Rocha (:lucasr) from comment #14)

> Richard, maybe the limit should be fetched from the Content Provider instead
> of hardcoding it in sync?

That works for me. (I try not to sign other people up for work in *every* bug! :D)
No longer blocks: 723257
Depends on: 723257
Just capturing the current thinking in a comment: The ideal situation would be not truncating at all. We currently use a 100 item LIMIT in the SQL query.
(In reply to Mark Finkle (:mfinkle) from comment #16)
> Just capturing the current thinking in a comment: The ideal situation would
> be not truncating at all. We currently use a 100 item LIMIT in the SQL query.

Should I regard that as the plan of record, and start planning larger-limit-with-batched-downloads for Sync?

I'm guessing this is why this bug now depends on in-database frecency values, mm?
(In reply to Richard Newman [:rnewman] from comment #17)
> (In reply to Mark Finkle (:mfinkle) from comment #16)
> > Just capturing the current thinking in a comment: The ideal situation would
> > be not truncating at all. We currently use a 100 item LIMIT in the SQL query.
> 
> Should I regard that as the plan of record, and start planning
> larger-limit-with-batched-downloads for Sync?

Depends on the performance we see with Sync'd DBs. We will be watching and adjust as needed, but planning couldn't hurt.

> I'm guessing this is why this bug now depends on in-database frecency
> values, mm?

We initially thought in-provider frecency bug was about helping with truncation - getting the best data. We thought a no-limit DB would not bee the in-provider frecency. Is that wrong?
(In reply to Mark Finkle (:mfinkle) from comment #18)

> Depends on the performance we see with Sync'd DBs. We will be watching and
> adjust as needed, but planning couldn't hurt.

Filed Bug 730142.

I'm in no hurry to raise our fetch limit right now, because it makes the initial sync more expensive, but it's good to know that what we fetch will stick around!

> We initially thought in-provider frecency bug was about helping with
> truncation - getting the best data. We thought a no-limit DB would not need
> the in-provider frecency. Is that wrong?

Having frecency in the DB will make a LIMIT query more useful, just as it would make truncation less damaging. It's not required, but I would imagine that it would help.

Unless you're retrieving everything and sorting in memory, having some way to select the most valuable rows can only improve the experience (and reduce the amount of runtime calculation of scores -- you query more often than you write).
blocking-fennec1.0: --- → +
Status: NEW → ASSIGNED
Let's try to match our default behavior and not truncate at all.
Status: ASSIGNED → RESOLVED
blocking-fennec1.0: + → -
Closed: 8 years ago
Resolution: --- → WONTFIX
>Let's try to match our default behavior and not truncate at all.

That's not what is going to happen: the BrowserDB/LocalDB does the truncation. So there will be no truncation until anything from the GUI calls through that API, at which point we will cut to 250 entries.

If you don't want a cut, at the very least the cut in BrowserDB must go.

If the current behavior (which is essentially racy!) is OK, feel free to re-close the bug.
Status: RESOLVED → REOPENED
Resolution: WONTFIX → ---
There, that title should be more accurate.
Summary: Truncate history to a manageable number of items in BrowserProvider → Specify history truncation behavior, ensure that implementation matches specification
making the title even more clear
Summary: Specify history truncation behavior, ensure that implementation matches specification → Remove history truncation code
Depends on: 743923
Note that without this patch, we will cut the history DB and (for users without sync) cause all history to be lost, so this is essentially a dataloss bug.
Attachment #614203 - Flags: review?(lucasr.at.mozilla)
Comment on attachment 614203 [details] [diff] [review]
Patch 1. Remove history truncation code

Review of attachment 614203 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good. Given the data-loss aspect, maybe you should num this bug again to be reconsidered as a blocker? It's a low-risk change anyway.
Attachment #614203 - Flags: review?(lucasr.at.mozilla) → review+
Agree on re-nominating it as a beta blocker. I think the - in comment 21 was based on a misunderstanding of the current behavior anyway.
blocking-fennec1.0: - → ?
blocking-fennec1.0: ? → beta+
https://hg.mozilla.org/mozilla-central/rev/1c3e7cbe44ee
Status: REOPENED → RESOLVED
Closed: 8 years ago8 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 14
You need to log in before you can comment on or make changes to this bug.