Closed Bug 35817 Opened 24 years ago Closed 20 years ago

optimize in-mem RDF for small # of props leading to large # of targets

Categories

(Core Graveyard :: RDF, defect, P3)

Tracking

(Not tracked)

Status: RESOLVED FIXED
Target Milestone: Future

People

(Reporter: waterson, Assigned: mozilla)

References

(Blocks 1 open bug)

Details

(Keywords: perf, Whiteboard: [nsbeta3-])

Attachments

(2 files, 7 obsolete files)

The current in-memory datasource implementation is not well tuned for large
datasources. It needs the following:

1. an arena-based fixed size allocator from which Assertion objects are allocated.

2. each "source" should have a hashtable of "properties" that lead out of it if
there are more than (say) ten distinct properties.

3. each triple <source, property, target> should be hashed for quick lookup if
there are more than (say) ten distinct targets for a <source, property> pair.

Right now, a lot of the operations degenerate into linear-time lookups, which
kills performance in large datasources.
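For illustration, here is a minimal sketch of the lookup structure that points 2
and 3 describe, using standard C++ containers and made-up names rather than the
actual nsInMemoryDataSource types (point 1, the arena allocator, is orthogonal
and omitted):

// Illustrative sketch only -- not the real nsInMemoryDataSource code.
// It shows the shape of the per-source property table (point 2) and the
// per-<source, property> target hash (point 3).  All names are made up.
#include <cstddef>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

using Node = std::string;                   // stands in for nsIRDFNode*

constexpr std::size_t kHashThreshold = 10;  // the "(say) ten" above

struct TargetSet {
    std::vector<Node> list;             // few targets: linear scan is cheap
    std::unordered_set<Node> hash;      // many targets: O(1) membership test
    bool hashed = false;

    void Add(const Node& target) {
        if (!hashed && list.size() >= kHashThreshold) {
            hash.insert(list.begin(), list.end());
            list.clear();
            hashed = true;
        }
        if (hashed) hash.insert(target);
        else        list.push_back(target);
    }
    bool Has(const Node& target) const {
        if (hashed) return hash.count(target) != 0;
        for (const Node& t : list)
            if (t == target) return true;
        return false;
    }
};

struct GraphSketch {
    // source -> (property -> targets); the real proposal would only switch
    // a source over to a property hash past ~10 distinct properties.
    std::unordered_map<Node, std::unordered_map<Node, TargetSet>> sources;

    void Assert(const Node& s, const Node& p, const Node& t) {
        sources[s][p].Add(t);
    }
    bool HasAssertion(const Node& s, const Node& p, const Node& t) const {
        auto si = sources.find(s);
        if (si == sources.end()) return false;
        auto pi = si->second.find(p);
        if (pi == si->second.end()) return false;
        return pi->second.Has(t);
    }
};

The point is simply that HasAssertion() and GetTarget() stop being linear walks
once the hash layers kick in.
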
Right now, the entire list of newsgroups for a server gets put into the 
subscribe datasource, which is an in-memory datasource.

That's a big in-memory datasource for a server like news.mcom.com.

Before waterson spends time on this, we should make sure this work would be 
worth his time.

I also have to make sure I'll continue to use the in-memory datasource, and not 
a custom one.
Status: NEW → ASSIGNED
Target Milestone: --- → M18
use PLHash for this, not nsHashTable.

steal from waterson, as this hurts the subscribe dialog.
Assignee: waterson → sspitzer
Status: ASSIGNED → NEW
notes from irc conversation with waterson:

<waterson> anyway, if you're going to add a hashtable to remember the properties 
in in-memory datasource
<waterson> it shouldn't be too hard
<waterson> have you looked thru that code at all?
<sspitzer> yes, I was debugging through it
<waterson> ok
<sspitzer> when trying to figure out why my HasAssertion() wasn't doing what I 
thought it would.
<sspitzer> I'll get the HasAssertion() riddle figured out
<waterson> so there are two top-level hashtables keyed by the "source"
<sspitzer> and then, when I get time, add the hashtable.
<waterson> and "target" respectively
<sspitzer> ok...
<waterson> when you assert
<waterson> it creates an Assertion object
<waterson> that gets added to two linked lists
<waterson> the "forward" links, which comes from the source-indexed table
<waterson> and
<waterson> the "backwards" links, which comes from the target-indexed table
<sspitzer> (this is good, keep going...)
<waterson> so when doing GetTarget, for example
<waterson> it looks up the source in the source-indexed table
<waterson> and just walks through the list comparing properties until it finds a 
match
<waterson> and returns the target
<waterson> so
<waterson> there are two cases to optimize
<waterson> source with many properties
<waterson> source with few properties, but many targets
<waterson> your case is the latter
<waterson> so you'd want to index on both property and target.
<sspitzer> gotcha.
<sspitzer> any need to make both fast?
<sspitzer> how about other in memory datasources, like bookmarks?
<sspitzer> or, are we "good enough" for them?
<waterson> well, we could make both go fast if we used a tree structure instead 
of hashtables
<sspitzer> isn't there some tree datastructures in xpcom/ds?
<waterson> there are
<waterson> ideally, we'd make both cases go fast
<waterson> because there are some graphs where you do have a lot of properties 
<waterson> on the same source
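For reference, the layout described above looks roughly like this (an
illustrative C++ sketch with made-up names, not the actual code; refcounting,
locking, and cleanup are omitted):

// Sketch of the existing (unoptimized) layout: two top-level hash tables,
// one keyed by source and one by target, whose entries head singly linked
// Assertion chains.
#include <string>
#include <unordered_map>

using Node = std::string;

struct Assertion {
    Node source, property, target;
    Assertion* mNext;      // "forward" chain, hanging off the source
    Assertion* mInvNext;   // "backwards" chain, hanging off the target
};

struct DataSourceSketch {
    std::unordered_map<Node, Assertion*> forwardArcs;  // keyed by source
    std::unordered_map<Node, Assertion*> reverseArcs;  // keyed by target

    void Assert(const Node& s, const Node& p, const Node& t) {
        Assertion* as = new Assertion{s, p, t, nullptr, nullptr};
        as->mNext = forwardArcs[s];          // push onto the source's list
        forwardArcs[s] = as;
        as->mInvNext = reverseArcs[t];       // and onto the target's list
        reverseArcs[t] = as;
    }

    // GetTarget degenerates into a linear walk over all arcs out of the
    // source, which is what hurts once the fan-out gets large.
    const Node* GetTarget(const Node& s, const Node& p) const {
        auto it = forwardArcs.find(s);
        if (it == forwardArcs.end()) return nullptr;
        for (Assertion* as = it->second; as; as = as->mNext)
            if (as->property == p) return &as->target;
        return nullptr;
    }
};
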
Status: NEW → ASSIGNED
David, can you make the call on whether we want to try this for beta3?
Assignee: sspitzer → bienvenu
Status: ASSIGNED → NEW
I'll defer to Chris and/or Alec. Is this still an issue, do you think?
I think rjc is working on this under the guise of another bug.
Assignee: bienvenu → rjc
Nominating for PR3 to get this on the radar that a decision needs to be 
made since there appears to be some sentiment that it's necessary and some 
confusion over whether it's being worked on.
Keywords: nsbeta3, perf
triage team:
nsbeta3+, very important to mail
Whiteboard: [nsbeta3+]
I've been playing around with this bug for a while now and think it's time to
suggest that it's marked "nsbeta3-"... 

I have a 90% rewrite of RDF's nsInMemoryDataSource which I've been playing with
for a while... it uses "smart" arcs (simple arrays which convert over to hash
tables when a size boundary is crossed)... however, it's significantly slower
(about 50%) than the current implementation in most cases (especially for
insertions) and uses more memory.
Whiteboard: [nsbeta3+]
nsbeta3- for hard P3 bug we'll have to live with.
Whiteboard: [nsbeta3-]
This milestone has passed.
Target Milestone: M18 → ---
Hey Chris.  Is this bug still valid?  And if so, who should it be assigned to?
Taking for now, and FUTURE-ing. Will pull it back if I get inspired.
Assignee: rjc → waterson
Keywords: helpwanted
Target Milestone: --- → Future
Status: NEW → ASSIGNED
Chris, mind if I take this bug from you?  <grin>  I've felt inspired lately.
Assignee: waterson → rjc
Status: ASSIGNED → NEW
New interface for RDF's in-memory data source
nsInMemoryDataSource.cpp diffs - basically, for RDF SEQs which pass a size
focal point, add a new assertion into the graph (invisible to the outside
world) which is actually a PLDHash on the associated "properties" arc to all
the values.

Reading in a largish bookmark file (2300 URLs + 100 folders) I see a 20-30%
improvement in graph population.
nsInMemoryDataSource.cpp - entire file for those who don't enjoy applying large
diffs
Waterson/Tingley, care to run with these changes for a bit?

(Note that these changes are primarily for RDF SEQ usage.  A more aggressive hack 
job for really large graphs which don't use RDF SEQs derives from these changes 
without too much of a brain spasm.)
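Very roughly, the idea looks like this: once a source's arc list crosses the
focal point, the flat assertion chain is swapped for a per-source hash keyed by
property, each entry holding that property's (short) chain.  The sketch below is
illustrative C++ with standard containers standing in for PLDHash and made-up
names; it is not the patch itself:

#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

using Node = std::string;

constexpr std::size_t kSeqListLimit = 8;  // analogous to the patch's focal point

struct Arc { Node property, target; };

struct SourceArcs {
    bool hashed = false;
    std::vector<Arc> chain;                             // plain arc list
    std::unordered_map<Node, std::vector<Arc>> byProp;  // "hidden" hash

    void Add(const Node& p, const Node& t) {
        if (!hashed && chain.size() >= kSeqListLimit) {
            // cross the focal point: index the existing arcs by property
            for (const Arc& a : chain)
                byProp[a.property].push_back(a);
            chain.clear();
            hashed = true;
        }
        if (hashed) byProp[p].push_back(Arc{p, t});
        else        chain.push_back(Arc{p, t});
    }

    // GetTarget no longer walks every arc out of the source; it jumps to
    // the chain for the requested property first.
    const Node* GetTarget(const Node& p) const {
        if (hashed) {
            auto it = byProp.find(p);
            return (it != byProp.end() && !it->second.empty())
                       ? &it->second.front().target : nullptr;
        }
        for (const Arc& a : chain)
            if (a.property == p) return &a.target;
        return nullptr;
    }
};
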
Status: NEW → ASSIGNED
Oopsy daisy.
Attachment #57287 - Attachment is obsolete: true
Oopsy daisy.
Attachment #57289 - Attachment is obsolete: true
Yeah, I'll give it a spin over the weekend.

Is it that much harder to keep the mutation threshold internally, rather than
requiring this extra interface and the external comparison against nextVal? 
Generalizing this will probably require keeping some sort of per-source count
inside the datasource itself...

Actually, as long as the slow duplicate-checking code is in there that walks all
the arcs when Asserting, we could probably get the count for free -- keep track
of how many properties we check, and mutate if we see k of them.
In an earlier version (I have about eight on my desktop at the moment <whee>), I
did have conversion going on entirely internally.

However, you don't get a count for free really... you have to walk over the
*entire* assertion list (which the code doesn't really do unless it happens to
have a complete mismatch) taking certain conditions into account (for example,
you wouldn't want to create a hash{propertyArc} if something was just using lots
of #child arcs to indicate containment, as that's a more pathological case --
the more aggressive "brain spasm" case I alluded to) or maintain internal counts
(which seemed bogus, maintaining consistency just for the focal conversion
point).   Having all the decision logic internal regarding containers also meant
another trip across the graph (via calling back out into
RDFContainerUtils::IsASeq for example) which was another added expense.
All that said, being even more aggressive might end up needing internal counts.
Time will tell.  Truthfully, the extra interface is probably needed anyway...
for example, it would be beneficial to be able to have the XUL template builder
"push" what it knows about containment down through RDF... doing so would allow
the in-memory datasource to better know what to optimize for.
Oops, you're right, no free counts.

I didn't get as much of a chance to test this over the weekend as I had planned,
because I'm a lazy beast.

Couple random things:

- do we really want to pre-allocate mObservers every time?  any idea on the
bloat tradeoff?  The cost of checking |if (mObservers)| seems pretty low, and
we'd still have to check |mObservers->Count() != 0| if we always allocated it.

- if the approach we're taking at this point is to rely on external hinting as
to what a "hot" resource is, there's no reason the API should pretend to be
tailored to containers, as there's nothing in the code (other than the name)
that makes it so.  I'd just change nsIRDFInMemoryDataSource to
nsIRDFOptimizedDataSource (or something) and EnsureFastContainment ->
OptimizeForSource() (or something; see the sketch below), and then go through the code looking for
spots where we can make use of it (certain spots in the XUL template code come
to mind).

- it looks like there's a frequently repeated chunk of code that checks for the
existence of the hash and pulls out either the hashed assertion chain or the
"normal" one, depending on state.  this should probably be factored out to clean
things up...

By the way, what generated these diffs?  I had to convert \r to \n before they
would apply...
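For concreteness, the rename suggested above might look something like the
following as a plain C++ abstract interface (a hypothetical sketch only, not
what the patch actually declares; the patch's real interface is
nsIRDFInMemoryDataSource with EnsureFastContainment):

// Hypothetical shape of the generalized hinting API.
typedef unsigned int nsresult;   // stand-in for the real nsresult typedef
class nsIRDFResource;            // forward declaration is enough here

class nsIRDFOptimizedDataSource {
public:
    virtual ~nsIRDFOptimizedDataSource() {}

    // Hint that aSource is "hot" (lots of arcs lead out of it), so the
    // datasource may build an internal index for it ahead of time.
    virtual nsresult OptimizeForSource(nsIRDFResource* aSource) = 0;
};
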
>  - do we really want to pre-allocate mObservers every time?

I certainly consider it a fine idea.  Bloat should be minimal;  verifying that
would be worthwhile, though.


>  there's no reason the API should pretend to be tailored to containers

Ah, but that's EXACTLY what these changes are.  It (currently) IS specifically
for RDF_SEQ containment and not applicable (as is) for other data sets such as a
node with lots of attributes or other ways of specifying containment.  Again,
#child is an example...  you wouldn't want to turn a linked list of

hash{parent node} -> [#child] -> [target #1]
                 \-> [#child] -> [target #2]
                 \-> [#child] -> [target #3]

into just a linked list again (same thing) off of a secondary hash:

hash{parent node} -> hash{#child} -> [target #1]
                                 \-> [target #2]
                                 \-> [target #3]

which the code currently would do.  This is the pathological case.  What should
really happen is more like:

hash{parent node} -> [#child] -> hash[target nodes]

>  - it looks like there's a frequently repeated chunk of code

True, although the frequency is not huge (my opinion)... a few lines of
duplication.  As you'll notice, the in-memory datasource has other issues like
this as well, such as with enumerating/updating its internal linked lists. 
Cleanup can happen after the fact; the current goal is performance.

>  - By the way, what generated these diffs?  I had to convert \r to \n before they
>  would apply...

Latest beta of MacCVS.  :)  [What?!?!  You don't use a Mac.  <grin>]  Sorry for
the inconvenience.
> The cost of checking |if (mObservers)| seems pretty low, and we'd still have
to check |mObservers->Count() != 0| if we always allocated it.


BTW, I forgot to add that this should be changed as well;  the count of the # of
observers only changes at two points:  AddObserver & RemoveObserver.  Instead of
calling mObservers->Count() all the time, it should only be done at those two
points and cached as a member variable of the in-memory datasource.  :)
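Something along these lines, with std::vector standing in for the real observer
array and illustrative member names:

// Cache the observer count in a member instead of asking the array each time
// (with the real observer array, Count() is a call through the array object,
// hence the cache).
#include <vector>

class Observer;   // stands in for nsIRDFObserver

class InMemoryDataSourceSketch {
    std::vector<Observer*> mObservers;
    int mNumObservers = 0;    // updated only in AddObserver / RemoveObserver

public:
    void AddObserver(Observer* aObserver) {
        mObservers.push_back(aObserver);
        mNumObservers = static_cast<int>(mObservers.size());
    }
    void RemoveObserver(Observer* aObserver) {
        for (auto it = mObservers.begin(); it != mObservers.end(); ++it) {
            if (*it == aObserver) {
                mObservers.erase(it);
                break;
            }
        }
        mNumObservers = static_cast<int>(mObservers.size());
    }
    // Hot paths (Assert, Change, ...) test the cached count instead of
    // calling Count() on every notification.
    bool HasObservers() const { return mNumObservers != 0; }
};
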
Blocks: 53521
Sorry it took me so long to get around to this. My big concern here is that
we're bloating the Assertion object: there tend to be a lot of these. Any way we
could do this without increasing its size? For example, using a |union|?
> My big concern here is that we're bloating the Assertion object:
> there tend to be a lot of these. Any way we could do this without
> increasing its size? For example, using a |union|?


Using a |union| is a fine idea, I was just lazy and didn't. However, there are
still a few extra bytes of bloat due to a new boolean flag which can't be
inside the |union|.

Waterson:  Here's a new diff, care to review?
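The layout under discussion is roughly the following (an illustrative sketch;
the exact fields inside the real patch's |union| differ):

// A "hash entry" assertion reuses the storage of a normal assertion's
// target/next fields for its hash-table pointer, so the struct stays the
// same size; only the flag is extra.  Types and names are illustrative.
struct PLDHashTableStub;    // stands in for PLDHashTable
struct Resource;            // stands in for nsIRDFResource / nsIRDFNode

struct AssertionSketch {
    Resource* mSource;
    Resource* mProperty;

    union {
        struct {
            Resource*        mTarget;
            AssertionSketch* mNext;      // "forward" chain
            AssertionSketch* mInvNext;   // "backwards" chain
        } as;
        struct {
            PLDHashTableStub* mPropertyHash;   // per-source hash of arcs
        } hash;
    } u;

    bool mHashEntry;    // the few extra bytes that can't live in the union
};
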
Attachment #57283 - Attachment is obsolete: true
Attachment #57284 - Attachment is obsolete: true
Attachment #57292 - Attachment is obsolete: true
Attachment #57293 - Attachment is obsolete: true
Oops, shame that it compiles on Mac but not on Linux.  :(
Darn COMPtr inside of class problem.  
Yeah, here's the love.	 ;)
Attachment #61376 - Attachment is obsolete: true
Comment on attachment 61395 [details] [diff] [review]
RDF optimizations with |union| usage, tweaked for Linux

Weird name. Focal reviews on the brain? ;-) (How about RDF_SEQ_MAX_LIST or
RDF_SEQ_LIST_LIMIT?)

>+#define RDF_SEQ_MUTATION_FOCALPOINT   8
>+


Couple places where you left commented-out or #if 0'd code in the patch. Just
remove?

>+/*
>+        if (doomed && (doomed->mFlags & HASH_ENTRY))
>+        {
>+            // rjc: just forward thinking;  if/when we have
>+            // multi-level hash-assertions, we'll have to
>+            // handle this case;  no big deal, though
>+        }
>+*/

And...

>+#if 0
>+    // Implementation methods
>+    Assertion*
>+    GetArcs(PLDHashTable *hash, nsIRDFNode* u) {
>+        PLDHashEntryHdr* hdr = PL_DHashTableOperate(hash, u, PL_DHASH_LOOKUP);
>+        return PL_DHASH_ENTRY_IS_BUSY(hdr)
>+            ? NS_REINTERPRET_CAST(Entry*, hdr)->mAssertions
>+            : nsnull; }
>+
>+    void
>+    SetArcs(PLDHashTable *hash, nsIRDFNode* u, Assertion* as) {
>+        PLDHashEntryHdr* hdr = PL_DHashTableOperate(hash, u,
>+                                                    as ? PL_DHASH_ADD : PL_DHASH_REMOVE);
>+        if (as && hdr) {
>+            Entry* entry = NS_REINTERPRET_CAST(Entry*, hdr);
>+            entry->mNode = u;
>+            entry->mAssertions = as;
>+        } }
>+#endif

And...

>+//          PL_DHashTableOperate(root->mForwardHash,
>+//                               aProperty, PL_DHASH_REMOVE);
>+            PL_DHashTableRawRemove(root->u.hash.mForwardHash, hdr);
>+
Attachment #61395 - Flags: superreview+
Sure, "RDF_SEQ_LIST_LIMIT" it is.  I'll also remove the commented out chunks.  :)
Comment on attachment 61395 [details] [diff] [review]
RDF optimizations with |union| usage, tweaked for Linux

r=tingley with a bunch of nits.

-  2) If/when RDF containers become commonplace, consider implementing
-     a special case for them to improve access time to individual
-     elements.

Rather than removing this outright, I'd replace it with a comment about the
other case... something like
  2) Optimize lookups for datasources which have a small number of properties +
     fanning out to a large number of targets.

+		    PLDHashTable*   mForwardHash; 
maybe it's just me, but I have trouble keeping mForwardHash and mForwardArcs
straight when reading this code.  mForwardHash is mapping from a source to a
set of properties; would mProperties, mPropertyHash, or even mForwardProps be
more descriptive?

The same issue with the assertion-level |DeleteForwardArcsEntry()| routine
(which at the very least should match the name of the hash table it's
clearing entries for).

+		    Assertion*	    mInvNext;

I get the willies when I see code that blindly accesses a union without
regards to the type -- as is the case with the LockedAssert() code that
sets this field.  Admittedly, it should be impossible for an assertion
that's using |u.mForwardHash| to end up having its mInvNext set (I hope),
but it still makes me uneasy.  I dunno, pull it out of the union, add
|NS_ASSERTION(!mHashEntry, "ack!");|, whatever; anything to help me sleep
better :)  Perhaps I'm overreacting.

+    nsIRDFService		*mRDFService;

Can we use a single static here instead of adding 4 bytes per datasource?

+    nsresult rv = rv = nsServiceManager::GetService(kRDFServiceCID,

typo.

-		<SETTING><NAME>MWFTP_Post_password</NAME><VALUE>0</VALUE></SETTING>
+		<SETTING><NAME>MWFTP_Post_password</NAME><VALUE>265rt2cn65mdaci^Qp'^Z&#140;^E^E@&#204;&#191;&#255;&#216;&#240;</VALUE></SETTING>

no wonder no one understands the mac build system :)

+	     PL_DHashTableEnumerate(aAssertion->u.hash.mForwardHash,
+		 DeleteForwardArcsEntry, &aAllocator);

indentation.

* the union declaration and several other spots in the code (especially the
  NS_ADDREF calls) are using hard tabs; nothing else in the file does.

* match the bracing style in the patch to nsInMemoryDataSource.cpp to the rest of the file.

Was there ever any quantification on the bloat tradeoff  (now extra 4 bytes for
mNumObservers + allocating mObservers every time)?  Hopefully the number of
datasources is small enough that saving the extra function call/pointer check
is worth it. 

We'll need to open a bug eventually for optimizing the other case as well
(one/few properties --> many targets).
Attachment #61395 - Flags: review+
Added suggested comment at top o' file, cleaned up indentation (my editor is 
messing with me) and formatting a bit, renamed mForwardHash to be mPropertyHash 
(although once the next set of RDF optimizations is written, that name too will 
be incorrect;  however, it's fine for this round) and renamed 
DeleteForwardArcsEntry to be DeletePropertyHashEntry, etc.



>  nsIRDFService              *mRDFService;
>  Can we use a single static here instead of adding 4 bytes per datasource?

Please open another bug for that, and assign it to me, Chris, or yourself.  :)



> We'll need to open a bug eventually for optimizing the other case as well
> (one/few properties --> many targets).

Either that, or we can just keep this bug open.
(I was planning on the latter.)
Summary: optimize access time for InMemoryDataSource::HasAssertion → optimize in-mem RDF for small # of props leading to large # of targets
>> nsIRDFService              *mRDFService;
>> Can we use a single static here instead of adding 4 bytes per datasource?
>
>Please open another bug for that, and assign it to me, Chris, or yourself.  :)

Why wait?

"How hard can it be?" - Homer J. Simpson

/be


Brendan:  Why wait?
rjc:      Why now?
Because it's easy, later means never too often, and we are getting very serious
about < 0 tolerance for bloat regressions.  Why did you make it a member,
anyway?  That effort was nearly the same as making it a static, it seems to me.
 We are already too far down the slippery slope.

/be
Brendan:  Since you are around and have free time this Sunday evening, here 'ya
go then: review this and I'll check it in to get rid of the unused usage.
r=mcafee for unused RDF service patch
Comment on attachment 63781 [details] [diff] [review]
Remove unused RDF service reference

Cool -- I didn't realize it was unused.  sr=brendan@mozilla.org.  Recording
mcafee's r= too.

/be
Attachment #63781 - Flags: superreview+
Attachment #63781 - Flags: review+
Indeed, the RDF service reference was just a vestigial remainder of earlier 
hacking on the in-memory datasource which slipped through the cracks.  Thanks for 
the review(s), everybody.
tever is not RDF QA anymore
QA Contact: tever → nobody
Seems like someone forgot to actually close this bug; the last two diffs seem to 
be in the trunk.
Status: ASSIGNED → RESOLVED
Closed: 20 years ago
Keywords: helpwanted
Resolution: --- → FIXED
Product: Core → Core Graveyard