Closed Bug 105966 Opened 23 years ago Closed 1 month ago

speed up rdf sequence building (bookmark parse)

Categories

(Core Graveyard :: RDF, defect)

x86
Linux
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED INCOMPLETE

People

(Reporter: tingley, Assigned: tingley)

References

Details

(Keywords: perf)

Attachments

(3 files)

I mentioned this to Waterson last week and he wanted to see the code, so
hence this bug.  

I've been spending some time trying to cut down the time taken in the
construction of the initial bookmarks datasource when the file is parsed.
Despite the fix for bug 69009, there is still an O(n**2) algorithm inherent
in the RDF code in this case.  InMemoryDataSourceImpl::Assert() contains an
implicit duplicate check, which is currently an O(n) operation.  Even if
word done on bug 35817 decreased this to O(log n), we'd still be doing work
which is entirely pointless -- if a sequence is being build through the RDF
container AppendElement() API, there's no chance of a collision, since the
property value is incremented each time.  (Their actually will be a
collision if we roll over the 10 digits that GetNextValue() allows for
_nnn; this is a risk I'm willing to take.)

I added an "UncheckedAssert()" method to a new interface called
nsIRDFHintableDataSource (the reasons for keeping it separate have little
to do with design and a lot to do with trying to limit tree cruft).

Also, I noticed that we spend a lot of time in GetNextValue() repeatedly
loading the nextVal property, incrementing it, and writing it back.  I
theorized that if the container was made aware that a number of adds we
being made in quick succession, and bypass this datasource-thrashing.  I
added an nsIRDFHintableContainer with a pair of batching APIs, and hooked
them up to nsBookmarkService.cpp.

Numbers.  The 1000+ item bookmark list from bug 69009 and I are old friends
by now; note that the flatter the bookmarks file, the greater the speedup
(a hierarchical bookmarks file contains a series of smaller nested
sequences, on which performance isn't as bad).

Since there's issues with file caching, etc; I ran each five times in hope
of some sort of convergence.  All numbers are in thousands of ticks of the
PR_Now() clock.

Original:               523     537    540     510     519
+container batching:    405     397    390     388     410
+unchecked assertions:  351     359    370     349     369

I'll post the code shortly.  It's a work in progress; many nits such as:
- common Assert/UncheckedAssert code can be factored out
- UncheckedAssert might be better off adding to the tail, rather than head,
  of the list
- container batching could probably benefit some synchronization
- etc
There's a less ambitious/scary alternative, which I tested at one point and I
believe generates decent numbers -- GetNextValue() should use
datasource->Change() rather than Unassert()/Assert().  This, along with
implementing LockedChange() in InMemoryDataSource, cuts the amount of time spent
updating nextVal in half.  

I've got code for this somewhere; I'll try to dig it up.
Blocks: 7251
Keywords: perf
More modest results produced by switching GetNextValue over to Change() and
adding a LockedChange() method to the InMem datasource.

Original:               523     537    540     510     519
+LockedChange           464     454    461     448     465
Keywords: mozilla1.1
tever is not RDF QA anymore
QA Contact: tever → nobody
Clearing the mozilla1.1 keyword.

I'm not thrilled with the approach in that patch any more.  I'd rather fix this
as part of a general improvement of the container implementation that pulled
nextVal out of the graph and did a better job of optimizing lookups for ordinal
arcs.
Keywords: mozilla1.1
(In reply to comment #7)
> Clearing the mozilla1.1 keyword.
> 
> I'm not thrilled with the approach in that patch any more.  I'd rather fix this
> as part of a general improvement of the container implementation that pulled
> nextVal out of the graph and did a better job of optimizing lookups for ordinal
> arcs.

Looking for any possible speedups in the upcoming releases.
QA Contact: nobody → rdf
Product: Core → Core Graveyard
Status: NEW → RESOLVED
Closed: 1 month ago
Resolution: --- → INCOMPLETE
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: