Closed Bug 1288330 Opened 4 years ago Closed 4 years ago

Evaluate: Running fathom locally vs. metadata back-end service

Categories

(Firefox for Android :: General, defect, P1)

All
Android
defect

Tracking

()

RESOLVED FIXED
Iteration:
1.4

People

(Reporter: sebastian, Assigned: sebastian)

References

Details

(Whiteboard: [MobileAS])

Attachments

(1 obsolete file)

Sooner or later we want to extract metadata and meaning from web pages. Right now there are two obvious choices:

* Running fathom locally
* Querying the page metadata service (which is using fathom internally)

Fathom:
https://github.com/mozilla/fathom

page-metadata-service:
https://github.com/mozilla/page-metadata-service/


* What are the advantages / disadvantages of both approaches?
** CPU, battery consumption, network traffic, online/offline, retry/fallback, more complex rules and classification, visited pages vs. unknown pages
* Are there other (maybe temporary) options (embedly)?
* How much effort is it to implement those solutions?
* Which one can get us implementing features sooner?
Hey ahunt, I think you've already started to evaluate some of those things?
Assignee: nobody → ahunt
See Also: → 1288448
Note that pages behind login/access control — everything from Facebook, to TheClymb and other membership purchase sites, to forums — are hard to process in a hosted service. That alone makes me think that content extraction has to occur on the device to some degree.
(In reply to Richard Newman [:rnewman] from comment #2)
> Note that pages behind login/access control — everything from Facebook, to
> TheClymb and other membership purchase sites, to forums — are hard to
> process in a hosted service. That alone makes me think that content
> extraction has to occur on the device to some degree.

I came to a similar conclusion: we should do as much as possible on device, even if only for the following reasons (the login/access control aspect is an additional advantage):

- no additional data usage
-- To be extra clever we should save/cache the page image (which is likely to be the bulk of data usage)
- no need to send history outside of the device
- faster / immediate availability (as opposed to waiting for remote extraction)

We will need a separate solution for unvisited pages (suggestions, and also synced bookmarks), but either way it looks like we definitely need local processing, since the bulk of what we show will be previously visited pages.

(I'm preparing some more detailed notes on all of this, but TL;DR: local processing is best for most of the content we need, and we probably need some form of hosted service for suggestions.)
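
A rough sketch of the split described above, just to make the decision concrete. This is not code from Firefox or Fathom; `Page`, `Source`, and `choose_source` are hypothetical names, and the only rule encoded is the one from this comment: visited pages are processed locally, unvisited pages (suggestions, synced bookmarks) need a hosted service.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Source(Enum):
    LOCAL = "local"    # extract on device from the page the user already loaded
    REMOTE = "remote"  # query a hosted metadata service


@dataclass
class Page:
    url: str
    visited: bool  # True if the page has been loaded on this device


def choose_source(page: Page, online: bool) -> Optional[Source]:
    """Pick where metadata for a page should come from (hypothetical helper)."""
    if page.visited:
        # Visited pages, including ones behind a login, are already on the
        # device: no extra data usage and no history leaves the device.
        return Source.LOCAL
    if online:
        # Unvisited pages (suggestions, synced bookmarks) have no local
        # content to extract from, so a hosted service is assumed here.
        return Source.REMOTE
    # Offline and never visited: nothing can be extracted yet.
    return None
```

For example, `choose_source(Page("https://example.com/forum", visited=True), online=False)` picks local extraction, while an unvisited suggestion is only resolvable when the device is online.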
Backlog until we need it :)
Whiteboard: [MobileAS Backlog]
Rank: 1
Rank: 1 → 2
Whiteboard: [MobileAS Backlog] → [MobileAS]
Priority: -- → P3
Assignee: ahunt → nobody
Priority: P3 → P2
Assignee: nobody → s.kaspari
Priority: P2 → P1
Status: NEW → ASSIGNED
See Also: → 1301715
Blocks: 1301717
See Also: → 1301717
See Also: → 1301718
No longer blocks: 1301717
Attachment #8789727 - Attachment is obsolete: true
I filed new bugs for the first implementation:

* Bug 1301715 - Extracting metadata
* Bug 1301717 - Storing metadata
* Bug 1301718 - Storing the image(s)
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Iteration: --- → 1.4