Closed Bug 1330973 Opened 3 years ago Closed 3 years ago

Store website 'description' in metadata store

Categories

(Firefox for Android :: General, defect, P1)

All
Android
defect

Tracking

RESOLVED FIXED
Firefox 54
Iteration:
1.17
Tracking Status
firefox54 --- fixed

People

(Reporter: sebastian, Assigned: sebastian)

Details

(Whiteboard: [MobileAS])

Attachments

(2 files)

Looking through the data needed to implement the scoring algorithm for highlights (bug 1312016), the only missing piece is the website's description. There's already a page-metadata-parser rule to extract the description; we just need to store it.
I strongly suggest first measuring how big these are, and coming up with a strategy for cleaning them up, before you start storing them.

Land a feature that just accumulates the size of each string you would have stored, the time at which you would have stored it, and the numeric ID of the page. Then you can figure out the impact on a regular user.
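
A minimal sketch of that kind of measurement, in JavaScript. Everything here is hypothetical: the `DescriptionSizeLog` class, its method names, and the choice of UTF-8 byte length as the size metric are assumptions for illustration, not part of any existing Firefox module.

```javascript
// Hypothetical sketch: record how big each description would have been,
// without storing the description itself.
class DescriptionSizeLog {
  constructor() {
    this.entries = [];   // one record per would-be write
    this.totalBytes = 0; // running total across all records
  }

  // pageId: numeric ID of the page.
  // description: the string that would have been stored.
  // timestamp: when the write would have happened (defaults to now).
  record(pageId, description, timestamp = Date.now()) {
    // Assumption: size is measured as the UTF-8 encoded byte length.
    const bytes = new TextEncoder().encode(description).length;
    this.entries.push({ pageId, bytes, timestamp });
    this.totalBytes += bytes;
    return bytes;
  }

  // Aggregate numbers to judge the impact on a regular user.
  summary() {
    const count = this.entries.length;
    const maxBytes = this.entries.reduce((m, e) => Math.max(m, e.bytes), 0);
    const meanBytes = count === 0 ? 0 : this.totalBytes / count;
    return { count, totalBytes: this.totalBytes, maxBytes, meanBytes };
  }
}
```

Calling `record()` at the point where the description would otherwise be written, and inspecting `summary()` later, gives the total, the largest, and the average size without persisting any page text.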

You're going to get a lot of duplicate strings (every Gmail page is "Google's approach to email"; every Amazon product page has the same string for title, keywords, description), and many of them are going to be big.
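
To put a number on the duplication, one could compare the total size of all descriptions against the size of the distinct ones. The following is only a sketch: `duplicationStats` is a hypothetical helper, and it keeps the strings in memory in a `Set`, which is fine for a one-off measurement on a sample but is an assumption that would not hold for production telemetry (hashing the strings would be one alternative).

```javascript
// Hypothetical sketch: estimate how many bytes are spent on repeated
// description strings in a sample.
function duplicationStats(descriptions) {
  const encoder = new TextEncoder();
  const seen = new Set();
  let totalBytes = 0;  // bytes if every description is stored as-is
  let uniqueBytes = 0; // bytes if each distinct string is stored once

  for (const text of descriptions) {
    const bytes = encoder.encode(text).length;
    totalBytes += bytes;
    if (!seen.has(text)) {
      seen.add(text);
      uniqueBytes += bytes;
    }
  }

  return {
    totalBytes,
    uniqueBytes,
    duplicateBytes: totalBytes - uniqueBytes,
    uniqueCount: seen.size,
  };
}
```

A large `duplicateBytes` relative to `totalBytes` would suggest that storing the raw strings per page is wasteful.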

If you're indexing the page metadata table (which presumably you are, so you can query by URL), you will end up with even more duplication.

Mentat uses a separate FTS value table with some quite scary view/trigger tricks to abstract away duplication and avoid storing strings more than once. Don't walk into the spiderweb.
Actually, at the end of the day, the scoring algorithm is only interested in the length of the description. So I might get away with just storing that for now.
(In reply to Sebastian Kaspari (:sebastian) from comment #2)
> Actually, at the end of the day, the scoring algorithm is only interested in
> the length of the description. So I might get away with just storing that
> for now.

At the moment, it seems like they're primarily interested in the presence of a description, as opposed to its length. So you _might_ get away with just a boolean flag.
(In reply to :Grisha Kruglov from comment #3)
> (In reply to Sebastian Kaspari (:sebastian) from comment #2)
> > Actually, at the end of the day, the scoring algorithm is only interested in
> > the length of the description. So I might get away with just storing that
> > for now.
> 
> At the moment, it seems like they're primarily interested in the presence of
> a description, as opposed to its length. So you _might_ get away with just a
> boolean flag.

Unless we envision a need to store and display descriptions at some point, at which point Richard's suggestion of a count is a good one. But that seems like a very thorny sort of problem to me, given _how_ a lot of the web uses these description fields.
Iteration: 1.13 → 1.14
(In reply to :Grisha Kruglov from comment #3)
> At the moment, it seems like they're primarily interested in the presence of
> a description, as opposed to its length. So you _might_ get away with just a
> boolean flag.

The current implementation uses the description length of all "candidates" and then normalizes them into the [0, 1] interval based on the min/max values. So I'd need the actual length here.

https://github.com/mozilla/activity-stream/blob/master/common/recommender/Baseline.js#L119
https://github.com/mozilla/activity-stream/blob/master/common/recommender/Baseline.js#L21
https://github.com/mozilla/activity-stream/blob/master/common/recommender/Baseline.js#L134
https://github.com/mozilla/activity-stream/blob/master/common/recommender/Baseline.js#L61
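
For illustration, the min/max normalization described above can be sketched as follows. This is not a copy of Baseline.js: the function name `normalizeDescriptionLengths` and the handling of the all-equal case are assumptions made for this example.

```javascript
// Hypothetical sketch: map the description lengths of all candidates
// into the [0, 1] interval based on the observed min and max.
function normalizeDescriptionLengths(lengths) {
  if (lengths.length === 0) {
    return [];
  }
  const min = Math.min(...lengths);
  const max = Math.max(...lengths);
  const range = max - min;
  // Assumption: when every candidate has the same length there is nothing
  // to distinguish them, so map all of them to 0 instead of dividing by 0.
  return lengths.map(len => (range === 0 ? 0 : (len - min) / range));
}
```

With only a boolean "has description" flag, every candidate with a description would collapse to the same value, which is why the actual length is needed as input.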
Assignee: nobody → s.kaspari
Status: NEW → ASSIGNED
Iteration: 1.14 → 1.15
Comment on attachment 8832956
Bug 1330973 - WebsiteMetadata.jsm: Add description length to Metadata JSON.

https://reviewboard.mozilla.org/r/109202/#review115432
Attachment #8832956 - Flags: review?(gkruglov) → review+
Comment on attachment 8832957
Bug 1330973 - Use website description length for highlights ranking.

https://reviewboard.mozilla.org/r/109204/#review115436
Attachment #8832957 - Flags: review?(gkruglov) → review+
Iteration: 1.15 → 1.16
Pushed by s.kaspari@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/c7220d8cf144
WebsiteMetadata.jsm: Add description length to Metadata JSON. r=Grisha
https://hg.mozilla.org/integration/autoland/rev/93af928758e7
Use website description length for highlights ranking. r=Grisha
https://hg.mozilla.org/mozilla-central/rev/c7220d8cf144
https://hg.mozilla.org/mozilla-central/rev/93af928758e7
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 54
Iteration: 1.16 → 1.17