408991 - Find a way to efficiently extract bookmarks paths

Reporter

Description

•

17 years ago

This is something i already pointed out some time ago, current bookmarks implementation uses a parent-position relationship to build a n-level tree and this has some pro & cons:

pros:
 - fast inserts
 - compact table
cons:
 - to find the grandparent of a child you have to use recursion
 - to find all childrens of a folder (and his subfolders) you have to do complex loops
 - to count all childrens of a folder (and his subfolders) you have to do complex loops
 - you can end up with cycles

But for nested structures in an SQL database the best choice would be to use a nested set preordered tree, these are php/mysql article on the subject:
http://www.phpriot.com/articles/nested-trees-1/1
http://www.phpriot.com/articles/nested-trees-2 
implementing this in other languages is not a problem since the main concept is sql based

table changes required would be adding 3 columns: level, left, right, and a index on them, but probably one could discard position

pros:
 - you can always know if a node is child of another 
 - you find all childrens of a folder with a single query
 - you count all childrens of a folder with a single query
 - you easily move from one node to another 
 - you cannot create cycles
cons:
 - less compact table (2 more columns)
 - slow inserts: on every insert you have to update more than one node

Marco Bonardo [:mak]

Reporter

Comment 1

•

16 years ago

an interesting document from mysql:
http://dev.mysql.com/tech-resources/articles/hierarchical-data.html

Marco Bonardo [:mak]

Reporter

Updated

•

15 years ago

Blocks: 488968

Marco Bonardo [:mak]

Reporter

Updated

•

15 years ago

Blocks: 523523

Marco Bonardo [:mak]

Reporter

Updated

•

15 years ago

Assignee: nobody → mak77

Marco Bonardo [:mak]

Reporter

Updated

•

15 years ago

Summary: investigate the use of a nested tree for bookmarks table → Find a way to efficiently extract bookmarks paths (nested tree for bookmarks table?)

David Dahl :ddahl

Comment 2

•

15 years ago

I would recommend trying to get a Labs' TestPilot study going to help determine just how nested our user's bookmarks hierarchies are.

The querys that will run to determine children and ancestors may not be as hindered as we think by recursion. MPTT requires some pretty hefty queries for writes, and in my experience tends to be quite fragile (with or without a comprehensive test suite).

Gervase Markham [:gerv]

Comment 3

•

15 years ago

Bug 451915 - move Firefox/Places bugs to Firefox/Bookmarks and History. Remove all bugspam from this move by filtering for the string "places-to-b-and-h".

In Thunderbird 3.0b, you do that as follows:
Tools | Message Filters
Make sure the correct account is selected. Click "New"
Conditions: Body   contains   places-to-b-and-h
Change the action to "Delete Message".
Select "Manually Run" from the dropdown at the top.
Click OK.

Select the filter in the list, make sure "Inbox" is selected at the bottom, and click "Run Now". This should delete all the bugspam. You can then delete the filter.

Gerv

Component: Places → Bookmarks & History

QA Contact: places → bookmarks

Marco Bonardo [:mak]

Reporter

Updated

•

15 years ago

Assignee: mak77 → nobody

Component: Bookmarks & History → Places

Product: Firefox → Toolkit

QA Contact: bookmarks → places

Marco Bonardo [:mak]

Reporter

Comment 4

•

15 years ago

An alternative approach could be to save a cache of all the parents of each folder, in a separate closure table, this cache should be updated every time we move a folder in the hierarchy, then we should update the moved folder and all of its descendants folders.
At first glance it could be faster and require less disk space than a nested tree but it would have other disadvantages:
* harder to check coherence (with nested trees we can run some aritmetic to check if the values are out of sync)
* will only give us ancestors path, this is what we need atm, but a nested tree would also give us possibility to implement a richer bookmarks API

Nested trees are usually expensive, but we don't have so long hierarchies in bookmarks. If we can go for them (not too heavy perf loss) they are the best choice for our APIs, otherwise we can try this alternative implementation.

I was also thinking about a leaf-less nested tree implementation, where we only track folders, ignoring the fact they could have flat children, i can't tell offhand if it's easier to do (or doable at all but sounds possible), at first glance it could be faster to update but would lose some functionality (checking number of children, telling if a bookmark is descendant of a certain ancestor...). It sounds pretty similar to the closure table approach above, but should not have the coherence check problem.

So for now i'd go for the richer implementation, evaluate perf loss, and if it's not acceptable start looking into less functional implementations.

status1.9.1: --- → wontfix

status1.9.2: --- → wontfix

Summary: Find a way to efficiently extract bookmarks paths (nested tree for bookmarks table?) → Find a way to efficiently extract bookmarks paths

Target Milestone: Future → ---

Marco Bonardo [:mak]

Reporter

Comment 5

•

15 years ago

(In reply to comment #2)
> The querys that will run to determine children and ancestors may not be as
> hindered as we think by recursion.

it will make use async damn hard, since recursion has to be done in code, so you have to query -> code -> query -> code. suppose you have 10 000 bookmarks to render and you want the path for each one of them.

Marco Bonardo [:mak]

Reporter

Comment 6

•

15 years ago

unless you want to recurse at each change, and cache the value for each folder. but at this point updates will still be slow, and the stytem will still be fragile (plus we would miss a way to check coherence of the cached data). At that point i'd prefer the leaf-less implementation.

David Dahl :ddahl

Comment 7

•

15 years ago

I do that when I have used MPTT, we tracked all nodes in a separate table called "hierarchy", thus making a distinction between data and hierarchy. This served us well, but we stil had to to a single join to scoop up the data:

table hierarchy
id, bookmark_id, sibling_index, has_children, depth, _right, _left, parent_id

The select: SELECT * FROM hierarchy WHERE _left BETWEEN 239 AND 278 - and of course JOIN the bookmarks 

writes were expensive, but the update query can be written pretty optimized as you only have to update _left, _right and depth:

UPDATE hierarchy
  SET _left = %d,
      _right = %d,
      depth = %d
  WHERE id = %d

I think the MozStorage async api can take an array of statements and synchronously iterate. The tricky thing was always that the next update needed to operate on a row that was updated microseconds beforehand. We had some issues with postgres where this moved nodes around in random ways. The funny thing was that when did our initial testing with SQLite and it never exhibited the random bug. This screams "test driven developm,ent" of even the tinest private methods.:)

David Dahl :ddahl

Comment 8

•

15 years ago

(In reply to comment #2)
> I would recommend trying to get a Labs' TestPilot study going to help determine
> just how nested our user's bookmarks hierarchies are.

Test pilot study is here: https://testpilot.mozillalabs.com/testcases/a-week-life/results.html

Marco Bonardo [:mak]

Reporter

Comment 9

•

8 years ago

most of the links expired, good ref atm:
http://stackoverflow.com/questions/192220/what-is-the-most-efficient-elegant-way-to-parse-a-flat-table-into-a-tree/192462#192462

Marco Bonardo [:mak]

Reporter

Updated

•

7 years ago

Priority: -- → P3

Marco Bonardo [:mak]

Reporter

Updated

•

6 years ago

Blocks: 469421, 469441

Marco Bonardo [:mak]

Reporter

Updated

•

6 years ago

Whiteboard: [fxsearch]

Marco Bonardo [:mak]

Reporter

Comment 10

•

6 years ago

Most of the discussion here may not be valid anymore, we still need a way to get paths, it could likely be simplified to an adjacency table. Ideally we'd need that info mirrored in memory to be able to remove GetFolderIdForItem

Marco Bonardo [:mak]

Reporter

Updated

•

6 years ago

Blocks: 1449581

Mike Conca [:mconca]

Updated

•

6 years ago

Blocks: 1420246

Marco Bonardo [:mak]

Reporter

Updated

•

5 years ago

Blocks: 1535881

alex_mayorga

Updated

•

4 years ago

status-firefox73: --- → affected

status-firefox74: --- → affected

status-firefox75: --- → affected

status-firefox76: --- → affected

Mark Banner (:standard8)

Updated

•

4 years ago

status-firefox73: affected → ---

status-firefox74: affected → ---

status-firefox75: affected → ---

status-firefox76: affected → ---

Mathew Hodson

Comment 11

•

3 years ago

Is this a duplicate of bug 1591790?

Flags: needinfo?(mak)

Marco Bonardo [:mak]

Reporter

Comment 12

•

3 years ago

(In reply to Mathew Hodson from comment #11)

Is this a duplicate of bug 1591790?

it's a partial overlap. bug 1591790 made an opt-in version of this, using a recursive query. The scope here was use a non-recursive query and return the path to a bookmark for each bookmark every time. So this still applies, but it's less critical now.

Flags: needinfo?(mak)

Mathew Hodson

Updated

•

3 years ago

No longer blocks: 469441

Severity: normal → --

status1.9.1: wontfix → ---

status1.9.2: wontfix → ---