Open Bug 105966 Opened 22 years ago Updated 5 years ago
speed up rdf sequence building (bookmark parse)
I mentioned this to Waterson last week and he wanted to see the code, so hence this bug. I've been spending some time trying to cut down the time taken in the construction of the initial bookmarks datasource when the file is parsed. Despite the fix for bug 69009, there is still an O(n**2) algorithm inherent in the RDF code in this case. InMemoryDataSourceImpl::Assert() contains an implicit duplicate check, which is currently an O(n) operation. Even if word done on bug 35817 decreased this to O(log n), we'd still be doing work which is entirely pointless -- if a sequence is being build through the RDF container AppendElement() API, there's no chance of a collision, since the property value is incremented each time. (Their actually will be a collision if we roll over the 10 digits that GetNextValue() allows for _nnn; this is a risk I'm willing to take.) I added an "UncheckedAssert()" method to a new interface called nsIRDFHintableDataSource (the reasons for keeping it separate have little to do with design and a lot to do with trying to limit tree cruft). Also, I noticed that we spend a lot of time in GetNextValue() repeatedly loading the nextVal property, incrementing it, and writing it back. I theorized that if the container was made aware that a number of adds we being made in quick succession, and bypass this datasource-thrashing. I added an nsIRDFHintableContainer with a pair of batching APIs, and hooked them up to nsBookmarkService.cpp. Numbers. The 1000+ item bookmark list from bug 69009 and I are old friends by now; note that the flatter the bookmarks file, the greater the speedup (a hierarchical bookmarks file contains a series of smaller nested sequences, on which performance isn't as bad). Since there's issues with file caching, etc; I ran each five times in hope of some sort of convergence. All numbers are in thousands of ticks of the PR_Now() clock. Original: 523 537 540 510 519 +container batching: 405 397 390 388 410 +unchecked assertions: 351 359 370 349 369 I'll post the code shortly. It's a work in progress; many nits such as: - common Assert/UncheckedAssert code can be factored out - UncheckedAssert might be better off adding to the tail, rather than head, of the list - container batching could probably benefit some synchronization - etc
There's a less ambitious/scary alternative, which I tested at one point and I believe generates decent numbers -- GetNextValue() should use datasource->Change() rather than Unassert()/Assert(). This, along with implementing LockedChange() in InMemoryDataSource, cuts the amount of time spent updating nextVal in half. I've got code for this somewhere; I'll try to dig it up.
More modest results produced by switching GetNextValue over to Change() and adding a LockedChange() method to the InMem datasource. Original: 523 537 540 510 519 +LockedChange 464 454 461 448 465
tever is not RDF QA anymore
QA Contact: tever → nobody
Clearing the mozilla1.1 keyword. I'm not thrilled with the approach in that patch any more. I'd rather fix this as part of a general improvement of the container implementation that pulled nextVal out of the graph and did a better job of optimizing lookups for ordinal arcs.
(In reply to comment #7) > Clearing the mozilla1.1 keyword. > > I'm not thrilled with the approach in that patch any more. I'd rather fix this > as part of a general improvement of the container implementation that pulled > nextVal out of the graph and did a better job of optimizing lookups for ordinal > arcs. Looking for any possible speedups in the upcoming releases.
You need to log in before you can comment on or make changes to this bug.