Open Bug 408991 Opened 17 years ago Updated 2 years ago

Find a way to efficiently extract bookmarks paths

Categories

(Toolkit :: Places, enhancement, P3)

enhancement

Tracking

()

People

(Reporter: mak, Unassigned)

References

(Blocks 3 open bugs)

Details

(Whiteboard: [fxsearch])

This is something i already pointed out some time ago, current bookmarks implementation uses a parent-position relationship to build a n-level tree and this has some pro & cons:

pros:
 - fast inserts
 - compact table
cons:
 - to find the grandparent of a child you have to use recursion
 - to find all childrens of a folder (and his subfolders) you have to do complex loops
 - to count all childrens of a folder (and his subfolders) you have to do complex loops
 - you can end up with cycles

But for nested structures in an SQL database the best choice would be to use a nested set preordered tree, these are php/mysql article on the subject:
http://www.phpriot.com/articles/nested-trees-1/1
http://www.phpriot.com/articles/nested-trees-2 
implementing this in other languages is not a problem since the main concept is sql based

table changes required would be adding 3 columns: level, left, right, and a index on them, but probably one could discard position

pros:
 - you can always know if a node is child of another 
 - you find all childrens of a folder with a single query
 - you count all childrens of a folder with a single query
 - you easily move from one node to another 
 - you cannot create cycles
cons:
 - less compact table (2 more columns)
 - slow inserts: on every insert you have to update more than one node
Blocks: 488968
Blocks: 523523
Assignee: nobody → mak77
Summary: investigate the use of a nested tree for bookmarks table → Find a way to efficiently extract bookmarks paths (nested tree for bookmarks table?)
I would recommend trying to get a Labs' TestPilot study going to help determine just how nested our user's bookmarks hierarchies are.

The querys that will run to determine children and ancestors may not be as hindered as we think by recursion. MPTT requires some pretty hefty queries for writes, and in my experience tends to be quite fragile (with or without a comprehensive test suite).
Bug 451915 - move Firefox/Places bugs to Firefox/Bookmarks and History. Remove all bugspam from this move by filtering for the string "places-to-b-and-h".

In Thunderbird 3.0b, you do that as follows:
Tools | Message Filters
Make sure the correct account is selected. Click "New"
Conditions: Body   contains   places-to-b-and-h
Change the action to "Delete Message".
Select "Manually Run" from the dropdown at the top.
Click OK.

Select the filter in the list, make sure "Inbox" is selected at the bottom, and click "Run Now". This should delete all the bugspam. You can then delete the filter.

Gerv
Component: Places → Bookmarks & History
QA Contact: places → bookmarks
Assignee: mak77 → nobody
Component: Bookmarks & History → Places
Product: Firefox → Toolkit
QA Contact: bookmarks → places
An alternative approach could be to save a cache of all the parents of each folder, in a separate closure table, this cache should be updated every time we move a folder in the hierarchy, then we should update the moved folder and all of its descendants folders.
At first glance it could be faster and require less disk space than a nested tree but it would have other disadvantages:
* harder to check coherence (with nested trees we can run some aritmetic to check if the values are out of sync)
* will only give us ancestors path, this is what we need atm, but a nested tree would also give us possibility to implement a richer bookmarks API

Nested trees are usually expensive, but we don't have so long hierarchies in bookmarks. If we can go for them (not too heavy perf loss) they are the best choice for our APIs, otherwise we can try this alternative implementation.

I was also thinking about a leaf-less nested tree implementation, where we only track folders, ignoring the fact they could have flat children, i can't tell offhand if it's easier to do (or doable at all but sounds possible), at first glance it could be faster to update but would lose some functionality (checking number of children, telling if a bookmark is descendant of a certain ancestor...). It sounds pretty similar to the closure table approach above, but should not have the coherence check problem.

So for now i'd go for the richer implementation, evaluate perf loss, and if it's not acceptable start looking into less functional implementations.
Summary: Find a way to efficiently extract bookmarks paths (nested tree for bookmarks table?) → Find a way to efficiently extract bookmarks paths
Target Milestone: Future → ---
(In reply to comment #2)
> The querys that will run to determine children and ancestors may not be as
> hindered as we think by recursion.

it will make use async damn hard, since recursion has to be done in code, so you have to query -> code -> query -> code. suppose you have 10 000 bookmarks to render and you want the path for each one of them.
unless you want to recurse at each change, and cache the value for each folder. but at this point updates will still be slow, and the stytem will still be fragile (plus we would miss a way to check coherence of the cached data). At that point i'd prefer the leaf-less implementation.
I do that when I have used MPTT, we tracked all nodes in a separate table called "hierarchy", thus making a distinction between data and hierarchy. This served us well, but we stil had to to a single join to scoop up the data:

table hierarchy
id, bookmark_id, sibling_index, has_children, depth, _right, _left, parent_id

The select: SELECT * FROM hierarchy WHERE _left BETWEEN 239 AND 278 - and of course JOIN the bookmarks 

writes were expensive, but the update query can be written pretty optimized as you only have to update _left, _right and depth:

UPDATE hierarchy
  SET _left = %d,
      _right = %d,
      depth = %d
  WHERE id = %d

I think the MozStorage async api can take an array of statements and synchronously iterate. The tricky thing was always that the next update needed to operate on a row that was updated microseconds beforehand. We had some issues with postgres where this moved nodes around in random ways. The funny thing was that when did our initial testing with SQLite and it never exhibited the random bug. This screams "test driven developm,ent" of even the tinest private methods.:)
(In reply to comment #2)
> I would recommend trying to get a Labs' TestPilot study going to help determine
> just how nested our user's bookmarks hierarchies are.

Test pilot study is here: https://testpilot.mozillalabs.com/testcases/a-week-life/results.html
Priority: -- → P3
Blocks: 469421, 469441
Whiteboard: [fxsearch]
Most of the discussion here may not be valid anymore, we still need a way to get paths, it could likely be simplified to an adjacency table. Ideally we'd need that info mirrored in memory to be able to remove GetFolderIdForItem
Blocks: 1449581
Blocks: 1420246
Blocks: 1535881

Is this a duplicate of bug 1591790?

Flags: needinfo?(mak)

(In reply to Mathew Hodson from comment #11)

Is this a duplicate of bug 1591790?

it's a partial overlap. bug 1591790 made an opt-in version of this, using a recursive query. The scope here was use a non-recursive query and return the path to a bookmark for each bookmark every time. So this still applies, but it's less critical now.

Flags: needinfo?(mak)
No longer blocks: 469441
Severity: normal → --
See Also: → 1591790
You need to log in before you can comment on or make changes to this bug.