Improved representation for history entries and visits
Categories
(Firefox :: Sync, enhancement, P3)
People
(Reporter: rnewman, Unassigned)
References
(Blocks 5 open bugs)
Details
(Whiteboard: [sync:history])
Comment 8•2 years ago
bugzilla@twinql.com, I don't agree with what you've stated:
> We have one record per URI, and each record contains an array of visit objects. Each looks like this:
> { id:"5qRsgXWRJZXr", title:"Foobar Baz", histUri:"http://foo.example.com/bar/baz", visits:[{type:1, date:1319149012372425}] }
> There are two dumb things about this:
> - Adding a visit requires altering the whole record. This is a side-effect of random access, but it means that Twitter with its 4,000 visits -- and frequent re-visits! -- gets re-uploaded all the time. 4,000 visits are 120KB in cleartext. This bug does not address that.
> - The representation of visits is inefficient -- the type is usually redundant, the object representation verbose, and the date unnecessarily precise.
> Imagine instead:
> {visits: {1: [1319149012372425, 12314, 878782, 1232113]}}
> Store the type (a fixed set of types!) as a key, the first timestamp, and then timestamp offsets in a sorted sequence.
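For concreteness, here is a minimal sketch of one way to implement the quoted compact encoding. It assumes the "offsets" are deltas between consecutive sorted timestamps (the quote is ambiguous on this point), and the function names are invented for illustration; this is not the actual Sync implementation.

```python
def compact_visits(visits):
    """Convert [{'type': t, 'date': micros}, ...] into {t: [first, delta1, delta2, ...]}."""
    by_type = {}
    for v in sorted(visits, key=lambda v: v["date"]):
        by_type.setdefault(v["type"], []).append(v["date"])
    return {
        t: [dates[0]] + [b - a for a, b in zip(dates, dates[1:])]
        for t, dates in by_type.items()
    }

def expand_visits(compact):
    """Inverse transform: rebuild the verbose visit list."""
    visits = []
    for t, seq in compact.items():
        ts = seq[0]
        visits.append({"type": t, "date": ts})
        for delta in seq[1:]:
            ts += delta
            visits.append({"type": t, "date": ts})
    return visits
```

Under this reading, each visit after the first costs only the (typically small) gap to its predecessor rather than a full 16-digit microsecond timestamp.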
Because, for me:
- The more precise the recorded date, the better, because I use my history for auditing. The only acceptable compromise, in my view, is letting the user choose how precise they want their history entries to be.
- Named keys in the JSON objects give significantly better backward compatibility when the format changes, because field order doesn't matter (compare how often consumers break when reading positional values out of an array versus named values out of a hashtable). A user inspecting their own data can also interpret named values far more easily. A dedicated export format could mitigate that, but it wouldn't fix the privacy issue: the user still won't understand what these values represent.
Regardless, if you're to store the values numerically, wouldn't it be more storage-efficient to base-36 encode them? Consider the difference between
{visits: {1: [1319149012372425, 12314, 878782, 1232113]}}
and
{visits: {1: [CZLKOUJQO9, 9I2, IU2M, QEPD]}}
I don't know what the effect on the CPU would be when quickly scrolling through the history, though. However, this brings me to the crux of my disagreement: your proposed modifications would save only a few MiB in total for most people. Are any of these tradeoffs really worth that? I'm aware that not everyone has the latest 2 x 4 TiB NVMe SSD storage solution, but when would this ever be an issue?
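To make the base-36 comparison concrete, here is a small sketch of the encoding (the encoder/decoder names are invented for illustration). The outputs happen to match the digits in the example above, so those can be checked directly.

```python
import string

DIGITS = string.digits + string.ascii_uppercase  # "0"-"9" then "A"-"Z": base 36

def to_base36(n):
    """Encode a non-negative integer as an uppercase base-36 string."""
    if n == 0:
        return "0"
    out = []
    while n:
        n, r = divmod(n, 36)
        out.append(DIGITS[r])
    return "".join(reversed(out))

def from_base36(s):
    """Decode an uppercase base-36 string back to an integer."""
    return int(s, 36)

raw = [1319149012372425, 12314, 878782, 1232113]
encoded = [to_base36(n) for n in raw]
# encoded == ["CZLKOUJQO9", "9I2", "IU2M", "QEPD"]
```

One caveat worth noting: in JSON the base-36 values would have to be strings, so each one gains two quote characters, which eats into the savings for short offsets like "9I2".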
Comment 9•2 years ago
Note that the comment you are replying to refers to how sync copies visits between devices, but your examples talk about how history is used locally on the device. Sync has special requirements that differ from local storage: on the device we have a vastly more efficient storage mechanism backed by SQLite, with extensive use of indexes.
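To illustrate the distinction, here is a much-simplified sketch of row-per-visit local storage. The table and column names are invented for this example; Firefox's real Places schema is different and considerably more elaborate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pages (
        id    INTEGER PRIMARY KEY,
        url   TEXT UNIQUE NOT NULL,
        title TEXT
    );
    CREATE TABLE visits (
        id         INTEGER PRIMARY KEY,
        page_id    INTEGER NOT NULL REFERENCES pages(id),
        visit_type INTEGER NOT NULL,
        visit_date INTEGER NOT NULL  -- microseconds since the epoch
    );
    CREATE INDEX visits_by_date ON visits(visit_date);
""")

page_id = conn.execute(
    "INSERT INTO pages (url, title) VALUES (?, ?)",
    ("http://foo.example.com/bar/baz", "Foobar Baz"),
).lastrowid
conn.execute(
    "INSERT INTO visits (page_id, visit_type, visit_date) VALUES (?, ?, ?)",
    (page_id, 1, 1319149012372425),
)
```

The key difference from the sync record format: recording a new visit is a single indexed row insert, not a rewrite and re-upload of the page's entire visit array.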