Closed Bug 1246159 Opened 4 years ago Closed 4 years ago

Create histogram probe for disk space used by saved reader view cache

Categories

(Firefox for Android :: General, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
Firefox 49
Tracking Status
firefox49 --- fixed

People

(Reporter: barbara, Assigned: ahunt)

References

(Blocks 1 open bug)

Details

Attachments

(3 files)

No description provided.
ahunt, you should get familiar with our reader view telemetry situation :)

I would put this probe somewhere in delayed startup, near the other histogram probes we have:
http://mxr.mozilla.org/mozilla-central/source/mobile/android/chrome/content/browser.js#603

We create a "readercache" directory in ReaderMode.jsm:
http://mxr.mozilla.org/mozilla-central/source/toolkit/components/reader/ReaderMode.jsm?force=1#442

The number of items in the cache should be the number of files in this directory.

You could expose a helper method on ReaderMode.jsm to help you get at this information more easily.
Assignee: nobody → ahunt
Attachment #8718564 - Flags: review?(margaret.leibovic)
Comment on attachment 8718564 [details]
MozReview Request: Bug 1246159 - Pre: import cleanup r=me

https://reviewboard.mozilla.org/r/34601/#review31315

::: toolkit/components/reader/ReaderMode.jsm:461
(Diff revision 1)
> +      return iterator.forEach(

Iterating through a directory on startup seems like a bad idea, even though that's what I suggested myself :/

I'm curious how much we really care about getting details on the high end here. Maybe we should just stop at 50 or so and make that the max?

Alternately, should we add another probe record the size of the cache directory? That would help us answer questions about how much disk space we're taking up on users' devices.

::: toolkit/components/telemetry/Histograms.json:3427
(Diff revision 1)
> +    "n_buckets": 10,

You can also add `"cpp_guard": "ANDROID"` since this probe is only used on Android (I'm guilty of not doing this myself on probes I've added recently, I should start doing this myself).
Comment on attachment 8718564 [details]
MozReview Request: Bug 1246159 - Pre: import cleanup r=me

I'd like rnewman's feedback on the performance implications here.

Maybe now is the time to invest in making this cache better by including a list of saved URLs somewhere, as opposed to needing the traverse this entire directory every time we need to know which URLs are in the cache.
Attachment #8718564 - Flags: feedback?(rnewman)
(In reply to :Margaret Leibovic from comment #3)

> Iterating through a directory on startup seems like a bad idea, even though
> that's what I suggested myself :/

Yup.

The best thing you can do here is not need to touch the disk much.

The only way to do that is to do bookkeeping.

If you're not storing stuff in the Android cache dir (but you should!), this is easy: maintain — in prefs, in a database, or in a flat file — a count of items and kilobytes. Update it every time something is deleted or added.

(Bear in mind that once reading list items are bookmarks, you'll need to _consider_ whether to keep the items in the cache after a deletion, and you'll need to update bookkeeping accordingly. You can't just drop on each deletion, because bookmarks can be duplicated, so a deletion doesn't mean the cache entry isn't needed.)

If you're storing stuff in the Android cache dir (and you should!) you'll want a sentinel file of some kind to detect when Android has discarded the contents of the cache. If that's missing or changed, or if you ever get a cache miss when retrieving something from the cache, invalidate your bookkeeping and walk the filesystem to repopulate it.


> I'm curious how much we really care about getting details on the high end
> here. Maybe we should just stop at 50 or so and make that the max?

The histogram bucketing method will discard granularity at the high end, so indeed, all you need to do is figure out which bucket the measurement ends up in. E.g., if the top bucket is 40+, then you can stop counting at 40.


> Alternately, should we add another probe record the size of the cache
> directory? That would help us answer questions about how much disk space
> we're taking up on users' devices.

You could use StatFs to do this[1], but you still can't get around the cost of recursively stat'ing the contents of the directory. Bookkeeping should be used to avoid having to do this more than once, and you should certainly not do it on startup.

[1] <http://developer.android.com/reference/android/os/StatFs.html>
Status: NEW → ASSIGNED
Summary: Create histogram probe the # items in your reader view cache → Create histogram probe for the # items in the Reader View disk cache
https://reviewboard.mozilla.org/r/34601/#review31785

::: toolkit/components/reader/ReaderMode.jsm:453
(Diff revision 1)
> +    return OS.File.exists(dir).then(exists => {

Double check your promise error handling.

::: toolkit/components/reader/ReaderMode.jsm:463
(Diff revision 1)
> +            count++;

Bouncing around this iterator just to increment a count is expensive. Use `nextBatch`, something like this:

```
try {
  let count = yield iterator.nextBatch().length;
} finally {
  iterator.close();
}

```

Note that you can walk the array results of `nextBatch` to get sizes, too.
Attachment #8718564 - Flags: feedback?(rnewman)
(In reply to Richard Newman [:rnewman] from comment #5)
> (In reply to :Margaret Leibovic from comment #3)
> 
> > Iterating through a directory on startup seems like a bad idea, even though
> > that's what I suggested myself :/
> 
> Yup.
> 
> The best thing you can do here is not need to touch the disk much.
> 
> The only way to do that is to do bookkeeping.
> 
> If you're not storing stuff in the Android cache dir (but you should!), this
> is easy: maintain — in prefs, in a database, or in a flat file — a count of
> items and kilobytes. Update it every time something is deleted or added.
> 
> (Bear in mind that once reading list items are bookmarks, you'll need to
> _consider_ whether to keep the items in the cache after a deletion, and
> you'll need to update bookkeeping accordingly. You can't just drop on each
> deletion, because bookmarks can be duplicated, so a deletion doesn't mean
> the cache entry isn't needed.)
> 
> If you're storing stuff in the Android cache dir (and you should!) you'll
> want a sentinel file of some kind to detect when Android has discarded the
> contents of the cache. If that's missing or changed, or if you ever get a
> cache miss when retrieving something from the cache, invalidate your
> bookkeeping and walk the filesystem to repopulate it.
> 

I'll probably introduce a table to track our readercache items in Bug 1234315, so we'll be able to do a simple count() to get the number of items after that.

I'm guessing we'd want to look into using the Android cache dir separately. It looks like it'll be tricky to correctly maintain the readercache once we're storing the reading-list in bookmarks (especially if the bookmark is deleted by another device).
Depends on: 1234315
Review commit: https://reviewboard.mozilla.org/r/52287/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/52287/
Attachment #8718564 - Attachment description: MozReview Request: Bug 1246159 - Create histogram probe to measure items in reader view cache r=margaret → MozReview Request: Bug 1246159 - Pre: import cleanup r=me
Attachment #8751898 - Flags: review?(michael.l.comella)
Attachment #8751899 - Flags: review?(michael.l.comella)
Attachment #8718564 - Flags: review?(margaret.leibovic)
Comment on attachment 8718564 [details]
MozReview Request: Bug 1246159 - Pre: import cleanup r=me

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/34601/diff/1-2/
I've repurposed this into a probe to measure how much disk space we're using (since we have UI telemetry probes for the actual usage of reader view / saving). If we get large numbers here we'd probably want to look into cache cleanup sooner rather than later.
Summary: Create histogram probe for the # items in the Reader View disk cache → Create histogram probe for disk space used by saved reader view cache
Comment on attachment 8718564 [details]
MozReview Request: Bug 1246159 - Pre: import cleanup r=me

Sorry for the erronous review request - reviewboard was confused by the obsolete previous patch...
Attachment #8718564 - Flags: review?(margaret.leibovic)
Comment on attachment 8751898 [details]
MozReview Request: Bug 1246159 - Pre: implement SavedReaderViewHelper.getBytesUsed r?mcomella

https://reviewboard.mozilla.org/r/52287/#review49269

::: mobile/android/base/java/org/mozilla/gecko/reader/SavedReaderViewHelper.java:211
(Diff revision 1)
>              return pageURL;
>          }
>      }
> +
> +    /**
> +     * Obtain the total disk space used for saved reader view items, in KB.

nit: You should mention we return MAX_VALUE on overflow.

::: mobile/android/base/java/org/mozilla/gecko/reader/SavedReaderViewHelper.java:213
(Diff revision 1)
>      }
> +
> +    /**
> +     * Obtain the total disk space used for saved reader view items, in KB.
> +     */
> +    public synchronized int getDiskSpacedUsedKB() {

I don't see any overview comments about your threading story – why does this need to be synchronized?

::: mobile/android/base/java/org/mozilla/gecko/reader/SavedReaderViewHelper.java:214
(Diff revision 1)
> +        // Overflow is highly unlikely (we will hit device storage limits before we hit integer limits),
> +        // but we should still handle this for correctness.

nit: I think this comment could be combined with the other one on overflow when we actually check overflow

::: mobile/android/base/java/org/mozilla/gecko/reader/SavedReaderViewHelper.java:220
(Diff revision 1)
> +        // but we should still handle this for correctness.
> +
> +        final Iterator<String> keys = mItems.keys();
> +        long bytes = 0;
> +
> +        while (keys.hasNext()) {

nit: `for (final String pageURL : mItems.keys())` is a bit more succinct.
Attachment #8751898 - Flags: review?(michael.l.comella) → review+
Comment on attachment 8751899 [details]
MozReview Request: Bug 1246159 - Add histogram probe for disk space used by reader view cache r?mcomella

https://reviewboard.mozilla.org/r/52289/#review49271

::: toolkit/components/telemetry/Histograms.json:3636
(Diff revision 1)
> +    "kind": "exponential",
> +    "high": 50000,
> +    "n_buckets": 20,

I don't actually know what these fields stand for – perhaps you'd like to get Margaret chime in too?
Attachment #8751899 - Flags: review?(michael.l.comella) → review+
(In reply to Michael Comella (:mcomella) from comment #14)
> Comment on attachment 8751899 [details]
> MozReview Request: Bug 1246159 - Add histogram probe for disk space used by
> reader view cache r?mcomella
> 
> https://reviewboard.mozilla.org/r/52289/#review49271
> 
> ::: toolkit/components/telemetry/Histograms.json:3636
> (Diff revision 1)
> > +    "kind": "exponential",
> > +    "high": 50000,
> > +    "n_buckets": 20,
> 
> I don't actually know what these fields stand for – perhaps you'd like to
> get Margaret chime in too?

Margaret: do you have any opinions on the histogram values here? I don't actually have any idea how much space we're likely to be using, but even getting close to 50mb (the upper limit for the histogram) signifies that cache management/cleanup would be useful?
Flags: needinfo?(margaret.leibovic)
(In reply to Andrzej Hunt :ahunt from comment #15)
> (In reply to Michael Comella (:mcomella) from comment #14)
> > Comment on attachment 8751899 [details]
> > MozReview Request: Bug 1246159 - Add histogram probe for disk space used by
> > reader view cache r?mcomella
> > 
> > https://reviewboard.mozilla.org/r/52289/#review49271
> > 
> > ::: toolkit/components/telemetry/Histograms.json:3636
> > (Diff revision 1)
> > > +    "kind": "exponential",
> > > +    "high": 50000,
> > > +    "n_buckets": 20,
> > 
> > I don't actually know what these fields stand for – perhaps you'd like to
> > get Margaret chime in too?
> 
> Margaret: do you have any opinions on the histogram values here? I don't
> actually have any idea how much space we're likely to be using, but even
> getting close to 50mb (the upper limit for the histogram) signifies that
> cache management/cleanup would be useful?

I'm not an expert here, so I don't have strong opinions. I usually follow the guidance specified here:
https://developer.mozilla.org/en-US/docs/Mozilla/Performance/Adding_a_new_Telemetry_probe#Declaring_a_Histogram

50mb of storage definitely sounds like something we should clean up!
Flags: needinfo?(margaret.leibovic)
https://reviewboard.mozilla.org/r/52287/#review49269

> nit: `for (final String pageURL : mItems.keys())` is a bit more succinct.

It doesn't seem like there's any way of using for-each with JSONObject - keys() returns an Iterator which for-each can't handle, the alternative is getNames() which returns a JSONArray, which also doesn't support for-each.
https://hg.mozilla.org/mozilla-central/rev/0d8baf8b4e76
https://hg.mozilla.org/mozilla-central/rev/6afaaa5cf43b
https://hg.mozilla.org/mozilla-central/rev/199fac6509aa
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 49
You need to log in before you can comment on or make changes to this bug.