Closed Bug 1126244 Opened 7 years ago Closed 9 months ago

Create a maximum reader mode cache size and evict records when necessary

Categories

(Firefox for Android Graveyard :: Reader View, defect)

35 Branch
All
Android
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED INCOMPLETE

People

(Reporter: Margaret, Assigned: vivek)

References

Details

Attachments

(2 files, 2 obsolete files)

Attached image Pocket screenshot
In bug 1113454, we're going to start automatically downloading and storing content for reading list items. However, we don't currently support any maximum cache size, so the user's storage space could theoretically fill up with reading list articles.

antlam, do you have any ideas about how we should expose this to the user? I attached a screenshot of the Pocket UI for doing this.
Flags: needinfo?(alam)
I've been thinking about this and it seems like it should be in Settings, under it's own item. There, we can tuck a lot of these settings for Reading List/Mode.
Flags: needinfo?(alam)
Assignee: nobody → margaret.leibovic
Depends on: 1129029
tracking-fennec: --- → ?
I started looking into this, and I was thinking it would be pretty straightforward to add logic for this in ReaderMode.jsm, but then I realized that we need some extra information in order to know which articles to purge from the cache.

So, maybe ReaderMode.jsm should stay pretty dumb. We could add helper methods to get back the size of the cache directory, but then maybe the reading list client logic should be in charge of saying which articles are okay to boot from the cache if we need to make more room.

We currently have some limits in place for how big of a DOM we are willing to parse, but I wonder how large these cache files are in practice. Sounds like a good place for telemetry probes! I could create a patch with some probes that we can uplift to get some more data, since luckily this cache refactoring is in 37. We won't have that much data from only Android users, but it could be interesting.
Flags: needinfo?(rnewman)
I mostly agree that the cache itself should be dumb; eviction should be largely based on the state of the RL item (archived, deleted, unread), which the cache doesn't know.

(The "mostly" is because _in principle_ this should be a cache, and caches can be re-filled when necessary. That's not the case here for two reasons: things on the internet have a bad habit of going away, and also we're not always online when we'd need to re-fill it. So we can/should do better than a strict cache -- we should probably call this an 'archive', not a 'cache'!)


To do a good job of eviction we need to know:

* How much space we want to reclaim:
  * The current cache size
  * The desired max
* The entire set of records, including:
  * Their state (=> to what extent we're OK expiring each item)
  * Their size on disk (because we'd rather throw away one big item than ten small ones).

We don't track the cached size in the DB, but we do track the state and also the content status (fetched, failed).

So things to do:

* Add telemetry, yes. Both individual item sizes and also total cache size. I'll file a dep.

* Figure out how we're going to do eviction: either have the cache send sizes to Java, or have Java send a preference list to JS, and get back a message saying what happened. Let's leave that as this bug.

* Update the database when we do evict. We'd need a content status like "evicted". We'll need to do that for this bug.

* Add a mechanism for removing a single item from the cache when it's deleted from the CP. I'll file a follow-up bug for that.
Flags: needinfo?(rnewman)
Summary: Create a maximum reader mode cache size → Create a maximum reader mode cache size and evict records when necessary
Depends on: 1133157
See Also: → 1133158
tracking-fennec: ? → 38+
Blocks: 1138494
Assignee: margaret.leibovic → vivekb.balakrishnan
Attached patch 1126244_part1.patch (obsolete) — Splinter Review
First part of the patch:
* Added new status EVICTED 
* Added basic messaging to request for freeing the cache from Java passing a list of urls
* Added response from JS to the discarded items as EVICTED
Attachment #8577286 - Flags: review?(margaret.leibovic)
Comment on attachment 8577286 [details] [diff] [review]
1126244_part1.patch

Review of attachment 8577286 [details] [diff] [review]:
-----------------------------------------------------------------

Nice work, this is going in the right direction.

Some questions we need to answer:
* When are we going to actually send this "Reader:ClearCache" message to JS?
* Should we check if we need to free up space when we're storing new items in the cache?
* How are we going to incrementally clear items, rather than just blow away the entire cache?
* What should the lifecycle look like for an item in the cache? If we add a new STATUS_EVICTED, we'll never add an item back to the cache, but I don't think that's the correct behavior. However, we don't want to evict things from the cache, and then in the next content provider change try adding those items right back to the cache. This means that we likely need to change the "Reader:FetchContent" logic.

::: mobile/android/base/ReadingListHelper.java
@@ +272,5 @@
> +        final String[] removed = message.optStringArray("urls", null);
> +        ThreadUtils.postToBackgroundThread(new Runnable() {
> +            @Override
> +            public void run() {
> +                readingListAccessor.markReadingListItemsAsEvicted(context.getContentResolver(), removed);

Instead of creating a new message and a new ReadingListAccessor API, we should be able to just update the items with a new status. This should happen automatically if we send a "Reader:UpdateList" message with the URL and new status.

::: mobile/android/base/db/BrowserContract.java
@@ +377,5 @@
>          public static final int STATUS_FETCH_FAILED_TEMPORARY = 1;
>          public static final int STATUS_FETCH_FAILED_PERMANENT = 2;
>          public static final int STATUS_FETCH_FAILED_UNSUPPORTED_FORMAT = 3;
>          public static final int STATUS_FETCHED_ARTICLE = 4;
> +        public static final int STATUS_EVICTED = 5;

For simplicity, I think we could just use STATUS_UNFETCHED here. However, this does raise on interesting question about our fetching logic - we should try to avoid getting into a cycle of trying to fetch content for items, but then finding out we have no room in the cache.

::: mobile/android/base/db/LocalReadingListAccessor.java
@@ +71,5 @@
>      }
>  
>      @Override
> +    public Cursor getReadingListToDiscard(ContentResolver cr) {
> +        // Return fetched, un-evicted, non-deleted, non-archived items, sorted by date added, oldest first.

You don't need to say "un-evicted" here, since fetched items cannot be evicted.

@@ +77,5 @@
> +        return cr.query(mReadingListUriWithProfile,
> +                new String[] { ReadingListItems.URL },
> +                ITEMS_TO_EVICT,
> +                null,
> +                SORT_ORDER_OLDEST_FIRST);

Nit: Indent these lines to align with mReadingListUriWithProfile (similar to the other methods around here).

::: mobile/android/chrome/content/Reader.js
@@ +40,5 @@
> +        let removed = [];
> +        for (let index = 0; index < urls.length; index++) {
> +          ReaderMode.removeArticleFromCache(urls[index]).catch(e => Cu.reportError("Error removing article from cache: " + e));
> +          removed.push(urls[index]);
> +        }

If I'm following this correctly, this logic will remove all articles from the cache, which I don't think is what we want.

I think we should add a new method to ReaderMode.jsm that takes this array of URLs and starts removing things until we have a certain amount of free space available (since Reader.js shouldn't know about the cache implementation details).
Attachment #8577286 - Flags: review?(margaret.leibovic) → feedback+
Attached patch 1126244.patch (obsolete) — Splinter Review
A new patch to clear earliest added N items from the cache if it is full.
As agreed over irc, max cache size is fixed at 1MB
The current implementation verifies that the cache limit has not exceeded after new contents are downloaded. It would evict items from cache based on their size starting from the largest.
Attachment #8577286 - Attachment is obsolete: true
Attachment #8579677 - Flags: review?(margaret.leibovic)
Comment on attachment 8579677 [details] [diff] [review]
1126244.patch

Review of attachment 8579677 [details] [diff] [review]:
-----------------------------------------------------------------

I want to spend some more time looking at this, but I also want to share these comments sooner rather than later. I think it would also be good for rnewman to look at this.

::: mobile/android/base/ReadingListHelper.java
@@ +93,5 @@
>                  handleReadingListStatusRequest(callback, message.getString("url"));
>                  break;
>              }
> +            case "Reader:DiscardedFromCache": {
> +                handleDiscardedItems(message);

Instead of creating a new message and new methods to handle this, let's just refactor "Reader:UpdateList" to handle updating multiple items, and just update the status of all those items.

@@ +294,5 @@
>          ThreadUtils.postToBackgroundThread(new Runnable() {
>              @Override
>              public void run() {
>                  final Cursor c = readingListAccessor.getReadingListUnfetched(context.getContentResolver());
> +                int count = c.getCount();

Nit: final.

@@ +326,5 @@
> +        try {
> +            while (c.moveToNext() && count > 0) {
> +                String url = c.getString(c.getColumnIndexOrThrow(ReadingListItems.URL));
> +                urlList.add(url);
> +                count--;

Instead of keeping track of a count like this, you could just make getReadingListToDiscard take a count parameter, then have the query use a LIMIT to only return the desired number of items.

::: mobile/android/base/db/LocalReadingListAccessor.java
@@ +163,5 @@
> +
> +        final int updated =  cr.update(mReadingListUriWithProfile, values, selection, urls);
> +
> +        Log.d(LOG_TAG, "Updated " + updated + " reading list rows to evicted state.");
> +    }

If you refactor the update methods as I mentioned previously, you can get rid of this new markReadingListItemsAsEvicted method.

::: mobile/android/chrome/content/Reader.js
@@ +39,5 @@
> +      case "Reader:ClearCache": {
> +        let urls = JSON.parse(aData);
> +        let removed = [];
> +        let totalSize = 0;
> +        let sizes = new Array();

Why the inconsistency between how you create new empty arrays here (removed vs. sizes)?

@@ +44,5 @@
> +
> +        Promise.all(urls.map(ReaderMode.getArticleSizeFromCache.bind(ReaderMode))).then(fileSizes => {
> +          // Order the returned sizes in descending order.
> +          sizes = fileSizes.sort(function (a, b) {
> +            return a['size'] < b['size'] ? 1 : a['size'] > b['size'] ? -1 : 0;

Whoa, this is tricky. I think we should just some local variables to clarify what's going on here :)

@@ +51,5 @@
> +          ReaderMode.getTotalCacheSize().then(entrySize => {
> +            // Calculate the total size.
> +            for (let index = 0; index < entrySize.length; index++) {
> +              totalSize = totalSize + entrySize[index];
> +            }

This is a bit confusing... I think we should just change getTotalCacheSize() to resolve to the total cache size, not an array.

::: toolkit/components/reader/ReaderMode.jsm
@@ +204,5 @@
> +      throw e;
> +    });
> +
> +    this.log("Paths found = " + paths.length);
> +    return Promise.all(paths.map(this._getSizeFromPath));

I believe you could use .reduce here instead of .map to return the total size, instead of an array.
Attachment #8579677 - Flags: feedback?(rnewman)
Comment on attachment 8579677 [details] [diff] [review]
1126244.patch

* Addressed review comments from Margaret.
* As agreed over irc, retained the existing api to update content status to evicted state instead of using the generic updateList api.
*now the file size in cache are found using resoved_url and reading list items are updated using id.
Attachment #8579677 - Flags: review?(margaret.leibovic)
Attachment #8579677 - Flags: feedback?(rnewman)
Attachment #8579677 - Flags: feedback?
Attachment #8579677 - Flags: review?(margaret.leibovic) → review?(rnewman)
Status: NEW → ASSIGNED
Comment on attachment 8579677 [details] [diff] [review]
1126244.patch

Review of attachment 8579677 [details] [diff] [review]:
-----------------------------------------------------------------

::: mobile/android/base/ReadingListHelper.java
@@ +268,5 @@
>      /**
> +     * Updates a reading list items for the removed urls to evicted state.
> +     */
> +    private void handleDiscardedItems(final NativeJSObject message) {
> +        final String[] removed = message.optStringArray("urls", null);

If you're using optStringArray, you should be prepared for it to not be present, and short-cut in that case.

@@ +321,5 @@
> +     */
> +    private void freeCacheIfLimitExceeded(int count) {
> +        ThreadUtils.assertOnBackgroundThread();
> +        final Cursor c = readingListAccessor.getReadingListToDiscard(context.getContentResolver());
> +        final List<String> urlList = new ArrayList<>();

You should allocate this prior to creating the cursor, or inside the try. If you OOM here, the cursor won't be closed.

@@ +323,5 @@
> +        ThreadUtils.assertOnBackgroundThread();
> +        final Cursor c = readingListAccessor.getReadingListToDiscard(context.getContentResolver());
> +        final List<String> urlList = new ArrayList<>();
> +        try {
> +            while (c.moveToNext() && count > 0) {

As Margaret mention, just use a limit on the query.

@@ +324,5 @@
> +        final Cursor c = readingListAccessor.getReadingListToDiscard(context.getContentResolver());
> +        final List<String> urlList = new ArrayList<>();
> +        try {
> +            while (c.moveToNext() && count > 0) {
> +                String url = c.getString(c.getColumnIndexOrThrow(ReadingListItems.URL));

Get the column index once.

::: mobile/android/base/db/LocalReadingListAccessor.java
@@ +71,5 @@
>      }
>  
>      @Override
> +    public Cursor getReadingListToDiscard(ContentResolver cr) {
> +        // Return fetched, un-evicted, non-deleted, non-archived items, sorted by date added, oldest first.

Are you assuming that archived or deleted items have already been removed from the cache?

Is that an accurate assumption?

@@ +74,5 @@
> +    public Cursor getReadingListToDiscard(ContentResolver cr) {
> +        // Return fetched, un-evicted, non-deleted, non-archived items, sorted by date added, oldest first.
> +        // This allows us to discard items that are at the bottom of list.
> +        return cr.query(mReadingListUriWithProfile,
> +                new String[] { ReadingListItems.URL },

You should return _id here. It's the accurate way to remove something from the list.

Consider also that resolved_url is really the key for the cache -- all items in the cache have been resolved, and the document URL probably wasn't the input URL.

@@ +157,5 @@
> +            return;
> +        }
> +
> +        final ContentValues values = new ContentValues();
> +        values.put(ReadingListItems.CONTENT_STATUS, ReadingListItems.STATUS_EVICTED);

This can be constant. See the similar thing we do for deletion in ReadingListProvider.

@@ +160,5 @@
> +        final ContentValues values = new ContentValues();
> +        values.put(ReadingListItems.CONTENT_STATUS, ReadingListItems.STATUS_EVICTED);
> +        final String selection = DBUtils.computeSQLInClause(urls.length, ReadingListItems.URL);
> +
> +        final int updated =  cr.update(mReadingListUriWithProfile, values, selection, urls);

Use IDs instead of URLs.

::: mobile/android/chrome/content/Reader.js
@@ +47,5 @@
> +          sizes = fileSizes.sort(function (a, b) {
> +            return a['size'] < b['size'] ? 1 : a['size'] > b['size'] ? -1 : 0;
> +          });
> +
> +          ReaderMode.getTotalCacheSize().then(entrySize => {

You should have a rejection handler for this promise.
Attachment #8579677 - Flags: review?(rnewman)
Attachment #8579677 - Flags: review?(margaret.leibovic)
Attachment #8579677 - Flags: feedback?
Attachment #8579677 - Flags: feedback+
Attached patch 1126244.patchSplinter Review
It seems that I'd uploaded the wrong patch last time.
Anyway, this new patch addresses the review comments from both Margaret and Richard
In short this patch does
* Added LIMIT to fetch query.
* File sizes are found using resoved_url and reading list items are updated using id.
* Changed the optStringArray to StringArray as the Reader.js always sends removed items in DiscardedFromCache message
Attachment #8579677 - Attachment is obsolete: true
Attachment #8581832 - Flags: review?(rnewman)
Attachment #8581832 - Flags: feedback?(margaret.leibovic)
 
> ::: mobile/android/base/db/LocalReadingListAccessor.java
> @@ +71,5 @@
> >      }
> >  
> >      @Override
> > +    public Cursor getReadingListToDiscard(ContentResolver cr) {
> > +        // Return fetched, un-evicted, non-deleted, non-archived items, sorted by date added, oldest first.
> 
> Are you assuming that archived or deleted items have already been removed
> from the cache?
> 
> Is that an accurate assumption?
> 

Yes, I assumed that they will be addressed in separate bugs if it is yet to be implemented.
Comment on attachment 8581832 [details] [diff] [review]
1126244.patch

Review of attachment 8581832 [details] [diff] [review]:
-----------------------------------------------------------------

This is looking pretty good, but I have some doubts about exactly when we should be triggering this.

I think we should be trying to clean up:

1. When we get storage pressure events.
2. When we add content to the cache, we should estimate whether we're well over a useful threshold.

I think we should consider:

1. How much storage the user is allowing (it's not universal and hard-coded!)
2. How much free space there is.


That is: if I'm on a 64GB device, let me cache every reading list item. If I'm on a Shitphone, we might need to be really aggressive, and only keep your most recent small articles.

::: mobile/android/base/ReadingListHelper.java
@@ +261,5 @@
>          });
>      }
>  
>      /**
> +     * Updates a reading list items for the removed urls to evicted state.

Marks items that were discarded as evicted in the DB.

@@ +307,5 @@
>                  } finally {
>                      c.close();
>                  }
> +
> +                freeCacheIfLimitExceeded(count);

This doesn't seem right to me.

So I start with a reading list with two items in. I add three items via the share overlay. I launch Fennec.

This task tries to fetch content. It sees I've added three items, so it immediately throws away the two older items I have.

I strongly feel that the decision about when to throw stuff away should be informed by disk space concerns, not by numbers of items.

::: mobile/android/base/db/LocalReadingListAccessor.java
@@ +30,2 @@
>      private static final String SORT_ORDER_RECENT_FIRST = "COALESCE(" + ReadingListItems.SERVER_STORED_ON + ", " + ReadingListItems.ADDED_ON + ") DESC";
> +    private static final String SORT_ORDER_OLDEST_FIRST = "COALESCE(" + ReadingListItems.SERVER_STORED_ON + ", " + ReadingListItems.ADDED_ON + ") ASC  LIMIT ";

You should document that you're using an awful, evil hack here -- that is, including a LIMIT clause in a sortOrder string.

I would much rather have the LIMIT passed as URL parameter to the ContentResolver, which IIRC we do elsewhere.

::: mobile/android/base/db/ReadingListAccessor.java
@@ +24,5 @@
>      int getCount(ContentResolver cr);
>  
>      Cursor getReadingListUnfetched(ContentResolver cr);
>  
> +    Cursor getReadingListToDiscard(ContentResolver cr, int count);

I'd use wording like "getReadingListDiscardCandidates" here -- we're not committing to discarding them, right?

::: mobile/android/chrome/content/Reader.js
@@ +14,5 @@
>    STATUS_FETCH_FAILED_PERMANENT: 2,
>    STATUS_FETCH_FAILED_UNSUPPORTED_FORMAT: 3,
>    STATUS_FETCHED_ARTICLE: 4,
> +  STATUS_EVICTED: 5,
> +  MAX_CACHE_SIZE: 1048576, // 1MB (1024 * 1024)

That seems too small. Consider trying to be smarter here -- can we get free disk space? Are we storing in the Android cache?

@@ +51,5 @@
> +
> +        Promise.all(items.map(ReaderMode.getArticleSizeFromCache.bind(ReaderMode))).then(fileSizes => {
> +          // Order the returned sizes in descending order.
> +          sizes = fileSizes.sort(function (previous, current) {
> +            if (previous['size'] < current['size']) {

Always double quotes for strings.

@@ +63,5 @@
> +
> +          // Calculate the total size.
> +          ReaderMode.getTotalCacheSize().then(totalSize => {
> +            // Remove items from the sorted sizes until cache condition is invalid.
> +            while (totalSize > this.MAX_CACHE_SIZE && sizes.length > 0) {

&& sizes.length

@@ +67,5 @@
> +            while (totalSize > this.MAX_CACHE_SIZE && sizes.length > 0) {
> +               let itemToEvict = sizes.pop();
> +               ReaderMode.removeArticleFromCache(itemToEvict['url']).catch(e => Cu.reportError("Error removing article from cache: " + e));
> +               totalSize = totalSize - itemToEvict['size'];
> +               removed.push(itemToEvict['id']);

Pretty sure you don't want to do this if the removal failed...?

@@ +71,5 @@
> +               removed.push(itemToEvict['id']);
> +            }
> +
> +            // Only update Java side if cached items were discarded.
> +            if (removed.length > 0) {

if (removed.length) {

::: toolkit/components/reader/ReaderMode.jsm
@@ +188,5 @@
> +   * Get total size of cache.
> +   *
> +   * @return {Promise}
> +   * @resolves When the total size of all files in cache is calculated.
> +   * @rejects OS.File.Error

Philosophical questions: do we need to calculate the entire size?

Do we need to calculate it fresh every time?

@@ +197,5 @@
> +    let dir = OS.Path.join(OS.Constants.Path.profileDir, "readercache");
> +    let iterator = new OS.File.DirectoryIterator(dir);
> +
> +    yield iterator.forEach(entry => {
> +        paths.push(entry.path);

Two fewer spaces of indent.
Attachment #8581832 - Flags: review?(rnewman)
Attachment #8581832 - Flags: feedback?(margaret.leibovic)
Attachment #8581832 - Flags: feedback+
With reading list sync disabled, this is a lower priority, although I still think it's important to land this, since we can still use a lot of storage for the local reading list cache.
tracking-fennec: 38+ → ?
tracking-fennec: ? → Nightly+
tracking-fennec: Nightly+ → ---
We're not going to continue investing in the reading list, so I don't think we should do this as planned.

However, I do think we should consider continuing to cache bookmarked reader view pages, and if we do that, we'll want to way to make sure we evict these items from the cache.
Blocks: 1234328
We have completed our launch of our new Firefox on Android. The development of the new versions use GitHub for issue tracking. If the bug report still reproduces in a current version of [Firefox on Android nightly](https://play.google.com/store/apps/details?id=org.mozilla.fenix) an issue can be reported at the [Fenix GitHub project](https://github.com/mozilla-mobile/fenix/). If you want to discuss your report please use [Mozilla's chat](https://wiki.mozilla.org/Matrix#Connect_to_Matrix) server https://chat.mozilla.org and join the [#fenix](https://chat.mozilla.org/#/room/#fenix:mozilla.org) channel.
Status: ASSIGNED → RESOLVED
Closed: 9 months ago
Resolution: --- → INCOMPLETE
Product: Firefox for Android → Firefox for Android Graveyard
You need to log in before you can comment on or make changes to this bug.