Closed Bug 1113454 Opened 7 years ago Closed 7 years ago

Download and cache reader mode content in the background

Categories

(Firefox for Android Graveyard :: Reader View, defect)

All
Android
defect
Not set
normal

Tracking

(firefox38 fixed)

RESOLVED FIXED
Firefox 38
Tracking Status
firefox38 --- fixed

People

(Reporter: Margaret, Assigned: Margaret)

References

Details

Attachments

(1 file, 2 obsolete files)

This builds on top of bug 1093172. We should find a time to periodically go through the status UNFETCHED items in the reading list provider and attempt to download and cache their content.
We'll also need bug 1107588 if we want this to work for URLs that redirect.
Depends on: 1107588
Blocks: 1117830
No longer blocks: readerv2
Assignee: nobody → margaret.leibovic
No longer blocks: 1117830
/r/2985 - Bug 1113454 - Download and cache reader mode content in the background

Pull down this commit:

hg pull review -r 2b1d7e59ecb8a2b070e4e947886eb491948fc069
Posted a WIP. Outstanding issues:

* This patch only observes content changes while BrowserApp is running. We should probably add some delayed update logic to check for unfetched items some amount of time after startup.
* We should batch requests to fetch content, so that we don't try downloading dozens of articles in the background (in the case of first encounter with this logic, or after a sync).
* We should send a message back to Java with the updated meta data (excerpt, title, etc). I think we could re-purpose the "Reader:AddToList" message to be more of a "Reader:InsertOrUpdate", although we'd have to split out some of the logic if we still want the "This item is already in your reading list" toast.
* I just realized this patch doesn't actually update the CONTENT_STATUS after we attempt to fetch an article, so we'll need to to that :) (also, updating the status for error cases).
* A test. I think I could add something to testReadingListCache to add an item to the content provider, then check to make sure it ended up in the cache.

Follow-up bugs:

* We should allow the user to specify a maximum cache size and purge old articles as necessary.
* Should we maybe only do this over wifi? Or have that as an option?


Am I missing anything?
Flags: needinfo?(rnewman)
Status: NEW → ASSIGNED
Version: Firefox 35 → Trunk
For clarity's sake: "phase one" = Gecko fetches content. "phase two" = Java fetches content, Gecko readerifies it. (Phase three might be that we ditch Readability.js altogether.)

> * This patch only observes content changes while BrowserApp is running. We
> should probably add some delayed update logic to check for unfetched items
> some amount of time after startup.

I'm not sure this is necessary during phase one. We can't fetch unless the Gecko thread is alive, and that'll only happen after browser startup.

If you want to catch items that are added while BrowserApp is still in memory but paused (e.g., via the share overlay), how 'bout having the share overlay check the GeckoThread's state?

Things can either arrive while BA is alive or not. If the latter, they'll be fetched on next launch. If the former, trigger the fetch immediately?

Or are you concerned about downloading when on on wi-fi etc.?

> * We should batch requests to fetch content, so that we don't try
> downloading dozens of articles in the background (in the case of first
> encounter with this logic, or after a sync).

You mean have a queue, and only work through it if each fetch doesn't fail for some kind of painful network reason?

> * We should allow the user to specify a maximum cache size and purge old
> articles as necessary.

Agreed. (Needs a new DB state, too -- otherwise we'll re-fetch, purge, re-fetch, purge.)

Probably also max size per article, some kind of sane bandwidth limitation, ...


> * Should we maybe only do this over wifi? Or have that as an option?

Having it as an option makes sense to me. That gets easier when we have a real fetching service (phase two) -- we can listen for network state and fetch as soon as we have Wi-Fi, regardless of whether Gecko's running.

Next up: file bugs for these things?
Flags: needinfo?(rnewman)
Attachment #8554628 - Flags: feedback?(rnewman)
Depends on: 1126244
Depends on: 1126246
(In reply to Richard Newman [:rnewman] from comment #4)
> For clarity's sake: "phase one" = Gecko fetches content. "phase two" = Java
> fetches content, Gecko readerifies it. (Phase three might be that we ditch
> Readability.js altogether.)
> 
> > * This patch only observes content changes while BrowserApp is running. We
> > should probably add some delayed update logic to check for unfetched items
> > some amount of time after startup.
> 
> I'm not sure this is necessary during phase one. We can't fetch unless the
> Gecko thread is alive, and that'll only happen after browser startup.
> 
> If you want to catch items that are added while BrowserApp is still in
> memory but paused (e.g., via the share overlay), how 'bout having the share
> overlay check the GeckoThread's state?
> 
> Things can either arrive while BA is alive or not. If the latter, they'll be
> fetched on next launch. If the former, trigger the fetch immediately?
> 
> Or are you concerned about downloading when on on wi-fi etc.?

I think I wasn't clear here. With the current version of the patch here, we only fetch content when a ContentObserver that lives in BrowserApp notices a change. So if BrowserApp isn't alive, we won't do a check to fetch content until the next time a content change event happens. I was suggesting here that we need another time besides content change to decide to fetch content (e.g. some random amount of time after the browser/gecko starts up).

> > * We should batch requests to fetch content, so that we don't try
> > downloading dozens of articles in the background (in the case of first
> > encounter with this logic, or after a sync).
> 
> You mean have a queue, and only work through it if each fetch doesn't fail
> for some kind of painful network reason?

Catching network errors is a good reason for this, but I'm also a bit concerned about the memory/performance implications of kicking off a lot of workers to be parsing content in the background (since the ReaderMode APIs for this are asynchronous).
Depends on: 1117226
A new updated version built on top of bug 1117226.
Attachment #8554628 - Attachment is obsolete: true
Attachment #8554628 - Flags: feedback?(rnewman)
Attachment #8555314 - Flags: feedback?(rnewman)
Comment on attachment 8555314 [details] [diff] [review]
WIP - Download and cache reader mode content in the background

Review of attachment 8555314 [details] [diff] [review]:
-----------------------------------------------------------------

This looks great. Pretty much just your own TODO left!

::: mobile/android/base/ReadingListHelper.java
@@ +36,1 @@
>      public ReadingListHelper(Context context) {

I think this is a good opportunity to have BrowserApp pass the profile as an argument to RLH.

BA has the profile immediately before it calls this constructor, and we fetch it a few times throughout this file.

The profile (and, consequently, the DB) shouldn't change for the lifetime of this instance.

Let's just keep a final member for the DB, grab the name in the constructor, and kill another few singleton accesses!
Attachment #8555314 - Flags: feedback?(rnewman) → feedback+
Comment on attachment 8554628 [details]
MozReview Request: bz://1113454/margaret

/r/2985 - Bug 1113454 - Download and cache reader mode content in the background. r=rnewman

Pull down this commit:

hg pull review -r 8eca44e4cfc16e2e3805564c4c371f823d52caa2
Attachment #8554628 - Attachment is obsolete: false
Attachment #8554628 - Flags: review?(rnewman)
Attachment #8555314 - Attachment is obsolete: true
Depends on: 1129725
https://reviewboard.mozilla.org/r/2985/#review2733

Good enough to land, but read through the issues.

::: mobile/android/base/ReadingListHelper.java
(Diff revision 2)
> +        final BrowserDB db = profile.getDB();

I think I commented on this before, but can we replace the `profile` member with a `db` member, and just do this once?

::: mobile/android/chrome/content/Reader.js
(Diff revision 2)
> +        // TODO: Distinguish between temorary and permanent failures.

Nit: temporary.

::: mobile/android/chrome/content/Reader.js
(Diff revision 2)
> +          status: this.STATUS_FETCH_FAILED_UNSUPPORTED_FORMAT,

I worry a little that this is overly broad, and that a null returned from `_dACA` doesn't imply an unsupported format. Keep your eye on this.

::: mobile/android/base/ReadingListHelper.java
(Diff revision 2)
> +                                GeckoEvent.createBroadcastEvent("Reader:FetchContent", json.toString()));

What stops us calling fetchContent() twice, resulting in duplicate messages, resulting in duplicate fetches?
(In reply to Richard Newman [:rnewman] from comment #9)
> https://reviewboard.mozilla.org/r/2985/#review2733
> 
> Good enough to land, but read through the issues.
> 
> ::: mobile/android/base/ReadingListHelper.java
> (Diff revision 2)
> > +        final BrowserDB db = profile.getDB();
> 
> I think I commented on this before, but can we replace the `profile` member
> with a `db` member, and just do this once?

Sure, I can update that.

> ::: mobile/android/chrome/content/Reader.js
> (Diff revision 2)
> > +        // TODO: Distinguish between temorary and permanent failures.
> 
> Nit: temporary.

I'll just get rid of this comment, I filed bug 1129725 about this.

> ::: mobile/android/chrome/content/Reader.js
> (Diff revision 2)
> > +          status: this.STATUS_FETCH_FAILED_UNSUPPORTED_FORMAT,
> 
> I worry a little that this is overly broad, and that a null returned from
> `_dACA` doesn't imply an unsupported format. Keep your eye on this.

I believe that a null returned from _dACA does indeed mean this isn't an article, and right now that's the format we support (it throws an exception if something else goes wrong). But yes, we can look out for this and update it in the future when we start supporting different kinds of things.

> ::: mobile/android/base/ReadingListHelper.java
> (Diff revision 2)
> > +                                GeckoEvent.createBroadcastEvent("Reader:FetchContent", json.toString()));
> 
> What stops us calling fetchContent() twice, resulting in duplicate messages,
> resulting in duplicate fetches?

I suppose this makes an assumption that content changes won't happen too frequently, but that's probably not an assumption we should make. I'll file a follow-up bug to throttle this somehow.
Comment on attachment 8554628 [details]
MozReview Request: bz://1113454/margaret

r+ with the two suggested changes.
Attachment #8554628 - Flags: review?(rnewman) → review+
https://hg.mozilla.org/mozilla-central/rev/be61d1a3818e
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 38
Attachment #8554628 - Attachment is obsolete: true
Attachment #8618937 - Flags: review+
How difficult would it be to modify the behavior of the caching to cache both the 'reader mode' view and the original web view, rather than only the reader mode view?

The problem with only caching the reader mode view is that some saved webpages render better in the original web view rather than the reader mode view. Some webpages are even unreadable in the reader mode view (e.g. reddit comment pages).

When a user has added a bunch of articles to their reading list and then goes offline, it would be a bad experience to open an article and find out you can't actually read it because the reader mode view didn't render the page well. It would be great if the user could tap the reader mode icon to turn reader mode off and get the cached original web view instead.

It seems that Firefox on Android already caches open tabs to some extent, so hopefully adding this functionality wouldn't require a ton of new code.

I'm interested in contributing if others think this is a good idea, but don't have the time to implement this themselves. Should I create a new bug for this? (I'm new here)
Product: Firefox for Android → Firefox for Android Graveyard
You need to log in before you can comment on or make changes to this bug.