Closed Bug 864918 Opened 11 years ago Closed 8 years ago

Create interests database with appropriate tables/indices

Component: Toolkit :: Places
Type: defect
Priority: Not set
Severity: normal
Status: RESOLVED INCOMPLETE
Reporter: Mardak
Assignee: Unassigned
Whiteboard: p=0
Attachments: 2 files, 2 obsolete files

Attached patch v1 (obsolete)
There are three tables to add:

1) interests and related metadata
2) pairs of interests and hosts to remember which are associated with each other
3) per-interest visit counts at per-day granularity, kept for up to 30 days

The implicit and explicit indices added are:

1) moz_interests integer primary key id for joining the other tables
2) moz_interests unique string interest for lookup by name
3) moz_interests namespace index for lookup by namespace
4) moz_interests_hosts primary key (interest_id, host_id) for lookup by both
5) moz_interests_visits primary key (interest_id, day) for lookup by both
6) moz_hosts frecency index for selecting top hosts by frecency
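The schema described above might look roughly like the following. It is sketched with Python's built-in sqlite3 module rather than Sqlite.jsm so it can run standalone. Only the table names and the columns named in the list (id, interest, namespace, interest_id, host_id, day, visits, frecency) come from the patch description; the column types, the index names, the encoding of day as days since the Unix epoch, the minimal moz_hosts stand-in, and the recent_visits helper are assumptions.

```python
import sqlite3

# Sketch only: table names and the columns named in the patch description are
# from the bug; types, index names, and the "day" encoding are assumptions.
SCHEMA = """
-- Minimal stand-in for the Places moz_hosts table (assumed columns).
CREATE TABLE moz_hosts (
  id INTEGER PRIMARY KEY,
  host TEXT NOT NULL UNIQUE,
  frecency INTEGER NOT NULL DEFAULT 0
);

-- 1) INTEGER PRIMARY KEY id for joins; 2) UNIQUE adds an implicit index on interest.
CREATE TABLE moz_interests (
  id INTEGER PRIMARY KEY,
  interest TEXT NOT NULL UNIQUE,
  namespace TEXT
);

-- 3) Explicit index for lookup by namespace (index name is hypothetical).
CREATE INDEX moz_interests_namespace_idx ON moz_interests (namespace);

-- 4) Interest/host pairs; the composite primary key indexes both columns.
CREATE TABLE moz_interests_hosts (
  interest_id INTEGER NOT NULL REFERENCES moz_interests (id),
  host_id INTEGER NOT NULL REFERENCES moz_hosts (id),
  PRIMARY KEY (interest_id, host_id)
);

-- 5) Per-day visit counts; "day" is assumed to be days since the Unix epoch.
CREATE TABLE moz_interests_visits (
  interest_id INTEGER NOT NULL REFERENCES moz_interests (id),
  day INTEGER NOT NULL,
  visits INTEGER NOT NULL DEFAULT 0,
  PRIMARY KEY (interest_id, day)
);

-- 6) Index for selecting top hosts by frecency (index name is hypothetical).
CREATE INDEX moz_hosts_frecency_idx ON moz_hosts (frecency);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    """Create all tables and indices on an empty database."""
    conn.executescript(SCHEMA)


def recent_visits(conn: sqlite3.Connection, interest_id: int, today: int, days: int = 30):
    """Return (day, visits) rows for one interest within the trailing `days` window."""
    return conn.execute(
        "SELECT day, visits FROM moz_interests_visits "
        "WHERE interest_id = ? AND day > ? AND day <= ? ORDER BY day",
        (interest_id, today - days, today),
    ).fetchall()
```

Calling create_schema(sqlite3.connect(":memory:")) builds everything on a scratch database. The 30-day limit is not enforced by the schema itself; recent_visits simply reads a window that the (interest_id, day) primary key can serve directly.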
Attachment #740952 - Flags: review?(mak77)
Depends on: 864925
Comment on attachment 740952
v1

Clearing the review request per the decision to split the component out of Places into its own toolkit component.
Attachment #740952 - Flags: review?(mak77)
Summary: Update tables and indices for storing/accessing interests → Create interests database with appropriate tables/indices
Because we're creating a new database instead of adding tables to Places, we'll probably need a service that does the creation, initialization, and (future) migration work. We can probably do something similar to FormHistory:

http://mxr.mozilla.org/mozilla-central/source/toolkit/components/satchel/FormHistory.jsm

FormHistory also handles non-database-management services. We might still want to keep separate objects, even within the same Interests.jsm, to keep some encapsulation.
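A service like this has to create the database on first run and upgrade it later. One common way to track that in SQLite is the user_version pragma. The sketch below uses Python's sqlite3 module, not the FormHistory.jsm or Sqlite.jsm APIs, and applies numbered migration steps until the stored version matches the code. The function name open_interests_db, the MIGRATIONS list, and the v1 to v2 "threshold" column are hypothetical.

```python
import sqlite3

# Hypothetical migration steps: MIGRATIONS[i] upgrades the schema from version i to i + 1.
MIGRATIONS = [
    # v0 -> v1: create the initial tables (abbreviated to one table here).
    """
    CREATE TABLE moz_interests (
      id INTEGER PRIMARY KEY,
      interest TEXT NOT NULL UNIQUE,
      namespace TEXT
    );
    CREATE INDEX moz_interests_namespace_idx ON moz_interests (namespace);
    """,
    # v1 -> v2: an illustrative future change (hypothetical column).
    "ALTER TABLE moz_interests ADD COLUMN threshold INTEGER NOT NULL DEFAULT 0;",
]
SCHEMA_VERSION = len(MIGRATIONS)


def open_interests_db(path: str) -> sqlite3.Connection:
    """Open (or create) the database and migrate it to SCHEMA_VERSION."""
    conn = sqlite3.connect(path)
    (version,) = conn.execute("PRAGMA user_version").fetchone()
    if version > SCHEMA_VERSION:
        conn.close()
        raise RuntimeError(f"schema v{version} is newer than supported v{SCHEMA_VERSION}")
    for step in range(version, SCHEMA_VERSION):
        # Each step and its version bump run in one explicit transaction,
        # so a failed step leaves the previous version intact.
        try:
            conn.executescript(
                "BEGIN;\n" + MIGRATIONS[step] + f"\nPRAGMA user_version = {step + 1};\nCOMMIT;"
            )
        except sqlite3.Error:
            conn.rollback()
            conn.close()
            raise
    return conn
```

Keeping the schema logic in one function like this leaves room for the separate, non-database objects mentioned above to live alongside it without touching migration code.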
You probably want to use Sqlite.jsm if you're doing DB stuff yourself; it should make your life easier.
Sqlite.jsm will also keep you from shooting yourself in the foot, which is really easy to do with the current Storage API. The only example of Sqlite.jsm usage nowadays is the health report, even if (imo) that database went a bit too far towards normalization, with too many relational tables and indices. You can get something simpler here. I can try to do a schema review when it's ready.
We rolled a custom execute/query function, somewhat based on Sqlite.jsm, with support for list params, output packaging, and caching:

https://github.com/Mardak/up-central/blob/88e235ae05bb0709523dfbf6912d657df46ab265/toolkit/components/interests/PlacesInterestsStorage.jsm#L475

If we're using Sqlite.jsm, we'll probably change our custom query function to call executeCached with the appropriate onRow handler.
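List-parameter expansion and row packaging can be illustrated outside Sqlite.jsm. The Python sketch below is not the up-central code linked above, and expand_list_params and execute_query are hypothetical names. A list-valued named parameter such as :ids is rewritten to :ids_0, :ids_1, ... so it can be used inside IN (...), and each row is packaged as a dict keyed by column name. Rows are either collected or passed to an on_row callback, loosely in the spirit of the onRow handler mentioned above.

```python
import re
import sqlite3
from typing import Any, Callable, Dict, List, Mapping, Optional, Tuple

# Matches named placeholders such as :ids. Simplification: it does not skip
# colons that appear inside SQL string literals.
_NAMED_PARAM = re.compile(r":(\w+)")


def expand_list_params(sql: str, params: Mapping[str, Any]) -> Tuple[str, Dict[str, Any]]:
    """Rewrite list-valued :name placeholders into :name_0, :name_1, ..."""
    flat: Dict[str, Any] = {}

    def replace(match) -> str:
        name = match.group(1)
        value = params[name]
        if isinstance(value, (list, tuple)):
            keys = [f"{name}_{i}" for i in range(len(value))]
            flat.update(zip(keys, value))
            return ", ".join(":" + key for key in keys)
        flat[name] = value
        return match.group(0)

    return _NAMED_PARAM.sub(replace, sql), flat


def execute_query(
    conn: sqlite3.Connection,
    sql: str,
    params: Optional[Mapping[str, Any]] = None,
    on_row: Optional[Callable[[Dict[str, Any]], None]] = None,
) -> List[Dict[str, Any]]:
    """Run a statement and package each row as {column_name: value}.

    If on_row is given, rows are streamed to it and an empty list is returned;
    otherwise all packaged rows are returned.
    """
    expanded_sql, flat = expand_list_params(sql, params or {})
    cursor = conn.execute(expanded_sql, flat)
    names = [column[0] for column in cursor.description or ()]
    rows: List[Dict[str, Any]] = []
    for row in cursor:
        packaged = dict(zip(names, row))
        if on_row is not None:
            on_row(packaged)
        else:
            rows.append(packaged)
    return rows
```

For the caching part, Python's sqlite3 already keeps a per-connection cache of prepared statements (the cached_statements argument to sqlite3.connect), which plays a role similar to executeCached, so the sketch does not add its own.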
Sqlite.jsm is not complete, so if you have generally useful helpers, they can be evaluated for inclusion, though we want to keep it as simple as possible.
Attached patch schema (obsolete)
Here are the tables and indices we're currently moving into an interests-specific database.

Note that we just added moz_interests_frecent_hosts as a copy of moz_hosts limited to the top hosts. We're still working out the details of keeping it roughly up to date (on daily idle), but probably more important is that host ids can change. We might need to change moz_interests_hosts's host_id to a plain host string. Potentially this table could be made TEMP?
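If the frecent-hosts copy were made TEMP and keyed by host string rather than host id, refreshing it (for example on daily idle) could look like the sketch below. It assumes moz_hosts is reachable from the same connection; for a separate interests database that would mean attaching the Places database (e.g. with ATTACH DATABASE). The column set, the refresh_frecent_hosts name, and the default limit of 200 are hypothetical.

```python
import sqlite3

DEFAULT_TOP_HOSTS = 200  # hypothetical cutoff for "the top ones"


def refresh_frecent_hosts(conn: sqlite3.Connection, limit: int = DEFAULT_TOP_HOSTS) -> None:
    """Rebuild the TEMP snapshot of the highest-frecency hosts, keyed by host string."""
    # TEMP tables are private to this connection and vanish when it closes.
    conn.execute(
        "CREATE TEMP TABLE IF NOT EXISTS moz_interests_frecent_hosts ("
        " host TEXT PRIMARY KEY,"  # host string, so host id changes don't matter
        " frecency INTEGER)"
    )
    with conn:  # DELETE + INSERT commit together, or roll back together on error
        conn.execute("DELETE FROM temp.moz_interests_frecent_hosts")
        conn.execute(
            "INSERT INTO temp.moz_interests_frecent_hosts (host, frecency) "
            "SELECT host, frecency FROM moz_hosts "
            "ORDER BY frecency DESC LIMIT ?",
            (limit,),
        )
```

The trade-off of TEMP is that the snapshot has to be rebuilt every time the connection is reopened, but it never needs a schema migration and never goes stale across sessions.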
Attachment #740952 - Attachment is obsolete: true
Attachment #778187 - Flags: feedback?(mak77)
Instead of maintaining a copy of moz_hosts (which we'd need to keep in sync), we'd like to explore having our own moz_interests_hosts table, containing the hosts that have at least one page we've successfully categorized.

Instead, we're now thinking of augmenting our moz_interests_visits table to include a host_id.

This:

 * makes the code simpler
 * allows us to give the user useful feedback about what caused certain interests to be triggered

One potential concern is that this may lead to additional disk usage, because of the extra data we'd capture.

Our moz_interests_visits table looks like this today:

[interest_id, day, visits]

We want it to capture this data instead:

[interest_id, host_id, day, visits]

This may mean multiple rows for the same interest_id/day pair, one per host.
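The proposed [interest_id, host_id, day, visits] layout could be sketched as below with Python's sqlite3. The primary key widens to (interest_id, host_id, day), so one interest/day pair can have one row per host. The old per-day view is recovered with SUM(...) GROUP BY day, and the same rows answer which hosts triggered an interest. The function names, column types, and day encoding are assumptions; the increment uses INSERT OR IGNORE followed by UPDATE rather than the newer ON CONFLICT upsert syntax, for portability across SQLite versions.

```python
import sqlite3


def create_visits_table(conn: sqlite3.Connection) -> None:
    """Per-host variant of moz_interests_visits (column types are assumptions)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS moz_interests_visits ("
        " interest_id INTEGER NOT NULL,"
        " host_id INTEGER NOT NULL,"
        " day INTEGER NOT NULL,"  # assumed: days since the Unix epoch
        " visits INTEGER NOT NULL DEFAULT 0,"
        " PRIMARY KEY (interest_id, host_id, day))"
    )


def record_visit(conn: sqlite3.Connection, interest_id: int, host_id: int,
                 day: int, count: int = 1) -> None:
    """Add `count` visits to the (interest, host, day) row, creating it if needed."""
    with conn:
        conn.execute(
            "INSERT OR IGNORE INTO moz_interests_visits (interest_id, host_id, day, visits) "
            "VALUES (?, ?, ?, 0)",
            (interest_id, host_id, day),
        )
        conn.execute(
            "UPDATE moz_interests_visits SET visits = visits + ? "
            "WHERE interest_id = ? AND host_id = ? AND day = ?",
            (count, interest_id, host_id, day),
        )


def daily_totals(conn: sqlite3.Connection, interest_id: int):
    """Old-style (day, visits) view: sum the per-host rows for each day."""
    return conn.execute(
        "SELECT day, SUM(visits) FROM moz_interests_visits "
        "WHERE interest_id = ? GROUP BY day ORDER BY day",
        (interest_id,),
    ).fetchall()


def top_hosts_for_interest(conn: sqlite3.Connection, interest_id: int, limit: int = 5):
    """Hosts that contributed the most visits to an interest (for user feedback)."""
    return conn.execute(
        "SELECT host_id, SUM(visits) AS total FROM moz_interests_visits "
        "WHERE interest_id = ? GROUP BY host_id ORDER BY total DESC, host_id LIMIT ?",
        (interest_id, limit),
    ).fetchall()
```

The extra disk cost is one row per distinct host per interest per day instead of one row per interest per day.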

This would apply to both the user's historical data (which in our first version will be only the last 30 days or so) and the user's data going forward.

This would be our version of the places database's moz_historyvisits, only for classified pages.

We expect this table to grow at a much slower rate, though the growth is unbounded.
We also expect moz_interests_hosts to grow unbounded.

Our question is: is this acceptable?
Attached patch v1 code changes
Attachment #778187 - Attachment is obsolete: true
Attachment #778187 - Flags: feedback?(mak77)
Attachment #798590 - Flags: review?(mak77)
Attached patch v1 tests
Attachment #798591 - Flags: review?(mak77)
Whiteboard: p=0
Attachment #798590 - Flags: review?(mak77) → feedback?
Attachment #798591 - Flags: review?(mak77) → feedback?
No longer blocks: fxdesktopbacklog
Flags: firefox-backlog+
I believe we aren't looking to add interests to Places soon, or at least not in the state proposed in this bug.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → INCOMPLETE
Attachment #798590 - Flags: feedback?
Attachment #798591 - Flags: feedback?