717136 - Improved representation for history entries and visits

Reporter

Description

•

14 years ago

(Filing this in General, because it affects multiple products.)

telliott, atoll, and I discussed how brain-damaged the existing history representation is. We have one record per URI, and each record contains an array of visit objects. Each looks like this:

{
    id:"5qRsgXWRJZXr",
    title:"Foobar Baz",
    histUri:"http://foo.example.com/bar/baz",
    visits:[{type:1, date:1319149012372425}]
}

There are two dumb things about this.

Firstly, adding a visit requires altering the whole record. This is a side-effect of random access, but it means that Twitter with its 4,000 visits -- and frequent re-visits! -- gets re-uploaded all the time. 4,000 visits are 120KB in cleartext. This bug does not address that.

Secondly, the representation of visits is inefficient -- the type is usually redundant, the object representation verbose, and the date unnecessarily precise.

Imagine instead:

visits: {1: [1319149012372425, 12314, 878782, 1232113]}

Store the type (a fixed set of types!) as a key, the first timestamp, and then timestamp offsets in a sorted sequence. This alone would save 20-30 bytes of cleartext per visit, and be vastly more efficient to parse and process.

There is an array of better representations for timelines, of course, but advanced data structures should be balanced against client complexity. It's clear, however, that there's a lot of room for improvement.
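For illustration only, the grouped representation proposed in this description could be sketched in Python as follows. This is not code from Sync: the function names `compact_visits` and `expand_visits` are hypothetical, and the sketch assumes that each offset is measured from the first (earliest) timestamp of its type. That is one reading of the example; offsets measured from the previous visit would be an equally plausible reading.

```python
from collections import defaultdict


def compact_visits(visits):
    """Group visit dicts by type; store the earliest date, then offsets from it.

    Assumption: offsets are relative to the first timestamp of each type.
    """
    by_type = defaultdict(list)
    for visit in visits:
        by_type[visit["type"]].append(visit["date"])

    compact = {}
    for visit_type, dates in by_type.items():
        dates.sort()
        first = dates[0]
        compact[visit_type] = [first] + [d - first for d in dates[1:]]
    return compact


def expand_visits(compact):
    """Invert compact_visits, returning visit dicts sorted by date."""
    visits = []
    for visit_type, sequence in compact.items():
        first = sequence[0]
        visits.append({"type": visit_type, "date": first})
        for offset in sequence[1:]:
            visits.append({"type": visit_type, "date": first + offset})
    return sorted(visits, key=lambda v: v["date"])
```

Note that if the result is serialized as JSON, the type key has to be a string ("1" rather than 1), since JSON object keys are always strings.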

Gregory Szorc [:gps]

Updated

•

13 years ago

Blocks: 745408

Gregory Szorc [:gps]

Comment 1

•

13 years ago

Should we morph this into "devise new history record representation?"

Marco Bonardo [:mak]

Comment 2

•

13 years ago

(In reply to Richard Newman [:rnewman] from comment #0)
> Firstly, adding a visit requires altering the whole record. This is a
> side-effect of random access, but it means that Twitter with its 4,000
> visits -- and frequent re-visits! -- gets re-uploaded all the time. 4,000
> visits are 120KB in cleartext. This bug does not address that.

Why is it not possible to do incremental updates? Crypto reasons server side?

> Secondly, the representation of visits is inefficient -- the type is usually
> redundant

So, I couldn't figure out if your proposal is to have visits grouped by type, or completely dismiss the type, cause type is actually useful (*cough* frecency).

Moreover, currently Sync doesn't sync from_visit, killing basically any form of referrer and redirects support; this proposal makes that issue basically unfixable, ...

Then maybe we should just stop syncing redirect sources, 404 pages and so on, since we don't support syncing them properly.

Marco Bonardo [:mak]

Comment 3

•

13 years ago

re: dates, fwiw we could even crop them at seconds, we don't need that precision.
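A minimal sketch of what cropping at seconds would mean, assuming the date values are microseconds since the Unix epoch (the 16-digit example value in comment 0 is consistent with that). The helper name `crop_to_seconds` is hypothetical.

```python
MICROSECONDS_PER_SECOND = 1_000_000


def crop_to_seconds(date_us):
    """Drop sub-second precision from a microsecond timestamp."""
    return date_us // MICROSECONDS_PER_SECOND


date_us = 1319149012372425
date_s = crop_to_seconds(date_us)
# The cropped value needs 10 decimal digits instead of 16.
print(len(str(date_us)), len(str(date_s)))
```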

Gregory Szorc [:gps]

Comment 4

•

13 years ago

The default behavior of Sync is that any single record is a full representation of that record. If we wanted to switch to a model where core "places" are being synced in a different location/record/collection from visits to those places, that would be theoretically doable.

Richard Newman [:rnewman]

Reporter

Comment 5

•

13 years ago

(In reply to Marco Bonardo [:mak] from comment #2)
> So, I couldn't figure out if your proposal is to have visits grouped by
> type, or completely dismiss the type, cause type is actually useful (*cough*
> frecency).

Group by type. (See {1: [...]} in Comment 0.) This saves (asymptotically) 8 chars per visit.

Richard Newman [:rnewman]

Reporter

Updated

•

13 years ago

Blocks: 726049

Richard Newman [:rnewman]

Reporter

Updated

•

13 years ago

Whiteboard: [sync:history]

Richard Newman [:rnewman]

Reporter

Updated

•

10 years ago

Component: General → Firefox Sync: Cross-client

Richard Newman [:rnewman]

Reporter

Comment 6

•

9 years ago

Bug 1302797 discusses adding the creating/visiting/etc. device to synced data. Bug 1288858 discusses adding container metadata to visits.

Blocks: 1288858, 1302797

Richard Newman [:rnewman]

Reporter

Updated

•

9 years ago

Depends on: 623667

Richard Newman [:rnewman]

Reporter

Comment 7

•

8 years ago

Filed Bug 1384685 to cover adding icon references (and presumably other kinds of images) to Sync.

Blocks: 1384685, 623667

No longer depends on: 623667

Summary: Improved representation for history visits → Improved representation for history entries and visits

Nobody; OK to take it and work on it

Assignee

Updated

•

7 years ago

Component: Firefox Sync: Cross-client → Sync

Product: Cloud Services → Firefox

Lina Butler (ex-Mozilla)

Updated

•

7 years ago

Blocks: 1516329

Ana Medinac

Updated

•

6 years ago

Type: defect → enhancement

Priority: -- → P3

BMO Automation

Updated

•

3 years ago

Severity: normal → S3

Mr. Beedell, Roke Julian Lockhart (RJLB)

Comment 8

•

2 years ago

https://bugzilla.mozilla.org/show_bug.cgi?id=717136#c0

bugzilla@twinql.com, I don't agree with what you've stated:

We have one record per URI, and each record contains an array of visit objects. Each looks like this:
{
    id:"5qRsgXWRJZXr",
    title:"Foobar Baz",
    histUri:"http://foo.example.com/bar/baz",
    visits:[{type:1, date:1319149012372425}]
}
There are two dumb things about this:

Adding a visit requires altering the whole record. This is a side-effect of random access, but it means that Twitter with its 4,000 visits -- and frequent re-visits! -- gets re-uploaded all the time. 4,000 visits are 120KB in cleartext. This bug does not address that.

The representation of visits is inefficient -- the type is usually redundant, the object representation verbose, and the date unnecessarily precise.

Imagine instead:
{visits: {1: [1319149012372425, 12314, 878782, 1232113]}}
Store the type (a fixed set of types!) as a key, the first timestamp, and then timestamp offsets in a sorted sequence.

Because, for me:

The more precise the date recorded, the better, because I use my history for auditing purposes. The sole compromise possible here in my eyes is the ability for the user to choose how precise they want their entry records to be.
The key names for the JSON objects provide significantly better backward compatibility when the format updates, because the order doesn't matter (consider the difference in frequency of breakage when consuming values from an array versus a hashtable API). Additionally, an introspecting user can interpret these values significantly more easily. Although a dedicated export format could compensate for that, it doesn't fix the issue in terms of privacy, because the user won't understand what these values represent.

Regardless, if you're to store the values numerically, wouldn't it be more storage-efficient to base-36 encode them? Consider the difference between

{visits: {1: [1319149012372425, 12314, 878782, 1232113]}}

and

{visits: {1: [CZLKOUJQO9, 9I2, IU2M, QEPD]}}

I don't know what the effect on the CPU would be when quickly scrolling down the history, though. However, this brings me to the crux of my disagreement: your proposed modifications would save a few MiB in total for most people. Are any of these tradeoffs really worth that? I'm aware that not everyone has the latest 2 x 4 TiB NVMe SSD storage solution, but when would this ever be an issue?
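As a rough check of the base-36 idea, here is a small Python sketch. `to_base36` is a hypothetical helper written for this comparison, not an existing Sync or Firefox function. One assumption worth flagging: JSON has no base-36 number literal, so the encoded values would have to travel as quoted strings, and the sketch counts those quotes.

```python
import json

DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def to_base36(n):
    """Encode a non-negative integer as an upper-case base-36 string."""
    if n < 0:
        raise ValueError("only non-negative integers are supported")
    if n == 0:
        return "0"
    out = []
    while n:
        n, remainder = divmod(n, 36)
        out.append(DIGITS[remainder])
    return "".join(reversed(out))


decimal = [1319149012372425, 12314, 878782, 1232113]
encoded = [to_base36(n) for n in decimal]

# Compact separators, so only the payload characters are compared.
decimal_json = json.dumps({"visits": {"1": decimal}}, separators=(",", ":"))
base36_json = json.dumps({"visits": {"1": encoded}}, separators=(",", ":"))

print(len(decimal_json), decimal_json)
print(len(base36_json), base36_json)
```

Under these assumptions, the quoting overhead cancels most of the saving on the short offsets in this example, so the gain comes mostly from the long leading timestamp.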

Mark Hammond [:markh] [:mhammond]

Comment 9

•

2 years ago

Note that the comment you are replying to refers to how Sync copies visits between devices, but your examples talk about how history is used once on the device. Sync has special requirements that differ from those on the device. On the device we have a vastly more efficient storage mechanism backed by sqlite and with extensive use of complicated indexes etc.

Categories

(Firefox :: Sync, enhancement, P3)

Tracking

()

People

(Reporter: rnewman, Unassigned)

References

(Blocks 5 open bugs)

Details

(Whiteboard: [sync:history])

Crash Data

Security

(public)
