speed up rdf sequence building (bookmark parse)

NEW
Assigned to

Status

defect
18 years ago
11 months ago

People

(Reporter: tingley, Assigned: tingley)

Tracking

({perf})

Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(3 attachments)

Assignee

Description

18 years ago
I mentioned this to Waterson last week and he wanted to see the code, so
hence this bug.  

I've been spending some time trying to cut down the time taken in the
construction of the initial bookmarks datasource when the file is parsed.
Despite the fix for bug 69009, there is still an O(n**2) algorithm inherent
in the RDF code in this case.  InMemoryDataSourceImpl::Assert() contains an
implicit duplicate check, which is currently an O(n) operation.  Even if
word done on bug 35817 decreased this to O(log n), we'd still be doing work
which is entirely pointless -- if a sequence is being build through the RDF
container AppendElement() API, there's no chance of a collision, since the
property value is incremented each time.  (Their actually will be a
collision if we roll over the 10 digits that GetNextValue() allows for
_nnn; this is a risk I'm willing to take.)

I added an "UncheckedAssert()" method to a new interface called
nsIRDFHintableDataSource (the reasons for keeping it separate have little
to do with design and a lot to do with trying to limit tree cruft).

Also, I noticed that we spend a lot of time in GetNextValue() repeatedly
loading the nextVal property, incrementing it, and writing it back.  I
theorized that if the container was made aware that a number of adds we
being made in quick succession, and bypass this datasource-thrashing.  I
added an nsIRDFHintableContainer with a pair of batching APIs, and hooked
them up to nsBookmarkService.cpp.

Numbers.  The 1000+ item bookmark list from bug 69009 and I are old friends
by now; note that the flatter the bookmarks file, the greater the speedup
(a hierarchical bookmarks file contains a series of smaller nested
sequences, on which performance isn't as bad).

Since there's issues with file caching, etc; I ran each five times in hope
of some sort of convergence.  All numbers are in thousands of ticks of the
PR_Now() clock.

Original:               523     537    540     510     519
+container batching:    405     397    390     388     410
+unchecked assertions:  351     359    370     349     369

I'll post the code shortly.  It's a work in progress; many nits such as:
- common Assert/UncheckedAssert code can be factored out
- UncheckedAssert might be better off adding to the tail, rather than head,
  of the list
- container batching could probably benefit some synchronization
- etc
Assignee

Comment 2

18 years ago
Assignee

Comment 3

18 years ago
Assignee

Comment 4

18 years ago
There's a less ambitious/scary alternative, which I tested at one point and I
believe generates decent numbers -- GetNextValue() should use
datasource->Change() rather than Unassert()/Assert().  This, along with
implementing LockedChange() in InMemoryDataSource, cuts the amount of time spent
updating nextVal in half.  

I've got code for this somewhere; I'll try to dig it up.
Assignee

Updated

18 years ago
Blocks: 7251
Keywords: perf
Assignee

Comment 5

18 years ago
More modest results produced by switching GetNextValue over to Change() and
adding a LockedChange() method to the InMem datasource.

Original:               523     537    540     510     519
+LockedChange           464     454    461     448     465

Updated

17 years ago
Keywords: mozilla1.1

Comment 6

16 years ago
tever is not RDF QA anymore
QA Contact: tever → nobody
Assignee

Comment 7

16 years ago
Clearing the mozilla1.1 keyword.

I'm not thrilled with the approach in that patch any more.  I'd rather fix this
as part of a general improvement of the container implementation that pulled
nextVal out of the graph and did a better job of optimizing lookups for ordinal
arcs.
Keywords: mozilla1.1

Comment 8

15 years ago
(In reply to comment #7)
> Clearing the mozilla1.1 keyword.
> 
> I'm not thrilled with the approach in that patch any more.  I'd rather fix this
> as part of a general improvement of the container implementation that pulled
> nextVal out of the graph and did a better job of optimizing lookups for ordinal
> arcs.

Looking for any possible speedups in the upcoming releases.
QA Contact: nobody → rdf

Updated

11 months ago
Product: Core → Core Graveyard
You need to log in before you can comment on or make changes to this bug.