Closed Bug 232710 Opened 21 years ago Closed 20 years ago

Implement global fact store update interfaces

Categories

(Core Graveyard :: RDF, enhancement)

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: nrm, Assigned: mozilla)

Details

User-Agent:       
Build Identifier: Mozilla/5.0 (Windows; U; Win98; en-US; rv:1.6) Gecko/20040113

This is a factory bug. Dump needs, analysis and references
to specific subsequent bugs here.

The Mozilla rdf fact store (the in-memory sum
of all current datasources) needs to be easily sync'able,
refresh'able, flush'able and clear'able, as for any caching
system, proxying or mirror system, or buffer. Those four
functions (S R F C) should be possible both globally and on
a per-datasource basis.

sync:    the factstore pushes all changes out synchronously.
refresh: the factstore reloads all remote content.
flush:   the factstore pushes all changes out once only.
clear:   the factstore is emptied.

The fact store is a modifiable mirror duplicate of external
data. Since the fact store has no persistent state of its own,
modifications must be fully re-distributed if they are to persist.
This is a distributed update (transaction) problem.  A big lever
(an interface?) is required for programmatic control of these
updates. Other distributed systems have such big levers, eg sync(1)
and SQL COMMIT.
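As a rough sketch only, such a lever might look like the interface below. The
name nsIRDFFactStoreControl and the method signatures are invented here purely
for discussion; no such interface exists in the tree.

// Hypothetical sketch of the "big lever": the four operations (S R F C),
// both store-wide and scoped to a single datasource. Invented for
// discussion only, not an existing Mozilla API.
#include "nsISupports.h"
#include "nsIRDFDataSource.h"

class nsIRDFFactStoreControl : public nsISupports {
public:
    // sync: push all pending changes out and wait for completion.
    NS_IMETHOD Sync() = 0;
    // refresh: re-read all external content, replacing in-memory facts.
    NS_IMETHOD Refresh() = 0;
    // flush: push all pending changes out once, without waiting.
    NS_IMETHOD Flush() = 0;
    // clear: empty the store.
    NS_IMETHOD Clear() = 0;

    // The same operations scoped to one datasource.
    NS_IMETHOD SyncDataSource(nsIRDFDataSource *aDataSource) = 0;
    NS_IMETHOD RefreshDataSource(nsIRDFDataSource *aDataSource) = 0;
    NS_IMETHOD FlushDataSource(nsIRDFDataSource *aDataSource) = 0;
    NS_IMETHOD ClearDataSource(nsIRDFDataSource *aDataSource) = 0;
};

The per-datasource methods deliberately mirror the store-wide ones, so the same
code path can serve both scopes.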

What is this functionality for?  To enable easy development and
rational design of Mozilla-based applications that manipulate
RDF data from disparate sources.

Bugs so far: bug 230825, bug 80720, bug 122846


Reproducible: Always
Steps to Reproduce:

shorter comment than the previous one, which ended up in a killed tab, fuck.

(In reply to comment #0)

> The Mozilla rdf fact store (the in-memory sum
> of all current datasources)

There is no such thing. Mozilla RDF datasources are globally unique, but the
security constraints on them are not.

> needs to be easily sync'able,
> refresh'able, flush'able and clear'able, as for any caching
> system, proxying or mirror system, or buffer. Those four
> functions (S R F C) should be possible both globally and on
> a per-datasource basis.

It should *not* be possible globally, due to security constraints.
The per-datasource basis is dealt with in other bugs.

> sync:    the factstore pushes all changes out synchronously.
You said the s-word.

> refresh: the factstore reloads all remote content.
> flush:   the factstore pushes all changes out once only.
How would this be different from sync?

> clear:   the factstore is emptied.
This would erase the user's bookmarks, for example, if your proposal went in
like this.

<...> 
> What is this functionality for?  To enable easy development and
> rational design of Mozilla-based applications that manipulate
> RDF data from disparate sources.

Not even Mozilla itself uses more than, say, 4 datasources for a single task.
I don't see a big burden in controlling your synchronisations on a
per-datasource basis, especially as you could easily wrap that. (I actually
need that in my own RDF application.) Or just put them into a composite
datasource and fall back to bug 230825.
I'm not too much against fixing that, as the composite datasource is the main
entry point for XUL templates.

This one, however, more or less introduces a security problem, so I would
a) WONTFIX it, or
b) DUPE it back on bug 230825 as a workaround.

Sigh. This is an enhancement request and therefore forward-looking.
Enhancements do not imply the ruination of the status quo.
To address the remarks in comment #1:

> there's no such thing as a fact store.

It's a Computer Science concept, not an implementation term.
Sets of Mozilla datasources clearly have fact store-like properties, most
obviously in their support of Assert(). Ok, so they're partitioned,
except for nsIRDFCompositeDataSource.

> It should *not* be possible globally [ to execute S R F C operations ]

Inside the chrome a developer should have that option. Outside the
chrome, I agree.

> You said the s-word.

The XMLHttpRequest object can be used synchronously. Email can be sent
synchronously. There are many other examples. Synchronous use is
sometimes useful. This bug does not request or require the removal of
standard asynchronous behaviour. It is an enhancement request. And
in any event I mean "synchronise with the external sources", not
"implement in a synchronous fashion". The former wording can be
implemented synchronously or asynchronously.

> This would erase the user's bookmarks

Only in an application that implements bookmarks. Composer? Mail?
But see the next point.

> Not even Mozilla itself uses more than, say, 4 datasources

The argument here is for group action on whatever datasources
are available, not on whatever datasources are used by mozilla.
In a data-oriented application there is a clear need to synchronise
one or more datasources as a unit.

Your point is well made that sometimes a subset of datasources might need
acting upon, rather than every single one. Implementing SRFC or an
equivalent per-datasource, plus some wrapper code for coordination, might
be enough. No case yet for WONTFIX, though.

- N.
The subset case for 'refresh' is addressed in bug 235866,
if the subset is exactly one nsIRDFCompositeDataSource.

If the subset intersects one or more composite datasources
incompletely, then there is no bug for that yet.

- N.
The word "global" in the summary is bizarre. From the comments here, do you mean
not a global datasource, but some theoretical abstraction of any datasources
that happen to be in use?

> This is a factory bug. Dump needs, analysis and references
> to specific subsequent bugs here.

Do you mean a tracking bug? If you're gonna use bugzilla like this, take the
time to learn the proper terminology.

> The fact store is a modifiable mirror duplicate of external
> data. Since the fact store has no persistent state of its own,

Can you please stop using random computer-science terminology when mozilla
already has well-defined terms for things? You want "RDF Datasource" or
"datasource" for short.

Datasources do not always mirror remote data. The localstore and chrome
datasources, for example, *are* the data they represent. The data is serialized
to disk, but decisions about how and when to flush to disk are the responsibility
of the datasource implementation, and shouldn't be touched by client code.

> modifications must be fully re-distributed if they are to persist.
> This is a distributed update (transaction) problem.  A big lever

Since each datasource defines its own persistence model, this is not possible as
a general abstraction. For example, the history datasource is dynamically
generated from data stored in a mork database, and updates are immediately
flushed back to the mork backend. There is no intermediate step to worry about.
I am writing an rdf-triples file-store backend that will also perform immediate
updates. In these cases, the flush-refresh-transactions model makes no sense.

We already have the nsIRDFRemoteDataSource interface, which provides control
over persistence where applicable. Iterating over the members of a composite
datasource and manually reloading/persisting them, as appropriate, is not hard
and is a lot better than trying to hack persistence on to a generic interface.
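For concreteness, a minimal sketch of that iteration, using only the existing
interfaces (the helper name is illustrative and error handling is mostly
elided):

// Walk the members of a composite datasource and persist those that
// expose nsIRDFRemoteDataSource; the rest are skipped.
#include "nsCOMPtr.h"
#include "nsIRDFCompositeDataSource.h"
#include "nsIRDFRemoteDataSource.h"
#include "nsISimpleEnumerator.h"

static nsresult
FlushCompositeMembers(nsIRDFCompositeDataSource *aComposite)
{
    nsCOMPtr<nsISimpleEnumerator> members;
    nsresult rv = aComposite->GetDataSources(getter_AddRefs(members));
    if (NS_FAILED(rv)) return rv;

    PRBool hasMore;
    while (NS_SUCCEEDED(members->HasMoreElements(&hasMore)) && hasMore) {
        nsCOMPtr<nsISupports> supports;
        members->GetNext(getter_AddRefs(supports));

        // Only datasources with external persistence implement
        // nsIRDFRemoteDataSource; for everything else this is a no-op.
        nsCOMPtr<nsIRDFRemoteDataSource> remote = do_QueryInterface(supports);
        if (remote)
            remote->Flush();
    }
    return NS_OK;
}

Calling Refresh(PR_FALSE) instead of Flush() gives the reload case.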

As you can see from existing bugs, there are problems with the RDF model that
need to be solved, in particular relating to non-trusted access.

> "global" is bizarre terminology

Probably is. I do mean "some theoretical abstraction of all datasources that may
be in use" as you point out.

> Is this a tracking bug?

No, more like bug 206358 where issues are beaten out, yielding other bugs. If
there's better terminology than 'factory bug', then please advise.

> Datasources do not always mirror remote data

I said "external", not "remote". But I stand corrected - the
in-memory-datasource and window-mediator datasources have no externally-stored
equivalent. All other datasources do.

I think saying that the in-memory copy (e.g. chrome) is the "real" copy is just a
matter of perspective.  Clearly the only copy that is "real" when mozilla is
shut down is the serialised copy. Certainly the in-memory copy is the most
usable copy once mozilla gets going.

> each datasource defines its own persistence model

My point exactly. It is very complex for users of datasources to understand all
the (semi-documented, or entirely undocumented) persistence models. It is easier
to understand that flush() will serialise out data in all cases. If that happens
to be implemented as a no-op for some datasources, because they are
self-synchronising internally, then there is no harm in that. sync(1) is an
equivalent on Unix. You can still sync a filesystem that is uncached. What the
user gets is a reliable interface that saves having to study datasource
implementation details - details that are not available in APIs anyway.
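A sketch of a wrapper that gives flush() exactly those semantics over an
arbitrary datasource (the helper name is invented here):

// Serialise if the datasource knows how; quietly do nothing if it is
// self-synchronising. The no-op case falls out of the QueryInterface check.
#include "nsCOMPtr.h"
#include "nsIRDFDataSource.h"
#include "nsIRDFRemoteDataSource.h"

static void
FlushIfPossible(nsIRDFDataSource *aDataSource)
{
    nsCOMPtr<nsIRDFRemoteDataSource> remote = do_QueryInterface(aDataSource);
    if (remote)
        remote->Flush();
}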

> avoid Comp.Sci. terms.

No one would balk at the use of "stack" in a bug. "fact store" is the equivalent
term for an unordered collection of tuples. But I take your point about being
specific about datasources.

> We already have the nsIRDFRemoteDataSource interface, which provides control
> over persistence where applicable. Iterating over the members of a composite
> datasource and manually reloading/persisting them, as appropriate, is not hard
> and is a lot better than trying to hack persistence on to a generic interface.

You correctly point out the main issue is remote datasources. What the current
interface does not provide, nor does nsIRDFCompositeDataSource, is the case
where two or more XUL templates (for example) have some datasources in common.
That is common in master-detail and cross-tab GUI screens. If a
user makes changes, then they will want to see all changes flushed out as a
result of a single action. There is no need to force the app developer to write
a custom multi-collection iterator for each such screen when a single API call
would do.
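To make that concrete, below is roughly the helper every such screen would
otherwise carry by hand. The function name is invented here and only existing
interfaces are used; datasources shared between templates are flushed once.

// Flush each distinct datasource behind a set of composite datasources
// exactly once, so a datasource shared by several templates is not
// flushed repeatedly.
#include "nsCOMArray.h"
#include "nsCOMPtr.h"
#include "nsIRDFCompositeDataSource.h"
#include "nsIRDFDataSource.h"
#include "nsIRDFRemoteDataSource.h"
#include "nsISimpleEnumerator.h"

static nsresult
FlushSharedDataSourcesOnce(nsCOMArray<nsIRDFCompositeDataSource> &aComposites)
{
    nsCOMArray<nsIRDFDataSource> seen;    // already-flushed datasources

    for (PRInt32 i = 0; i < aComposites.Count(); ++i) {
        nsIRDFCompositeDataSource *composite = aComposites.ObjectAt(i);
        nsCOMPtr<nsISimpleEnumerator> members;
        if (NS_FAILED(composite->GetDataSources(getter_AddRefs(members))))
            continue;

        PRBool hasMore;
        while (NS_SUCCEEDED(members->HasMoreElements(&hasMore)) && hasMore) {
            nsCOMPtr<nsISupports> supports;
            members->GetNext(getter_AddRefs(supports));

            nsCOMPtr<nsIRDFDataSource> ds = do_QueryInterface(supports);
            if (!ds || seen.IndexOf(ds) >= 0)
                continue;                 // shared datasource, already done
            seen.AppendObject(ds);

            nsCOMPtr<nsIRDFRemoteDataSource> remote = do_QueryInterface(ds);
            if (remote)
                remote->Flush();
        }
    }
    return NS_OK;
}

The single API call this bug asks for would amount to roughly this, provided
once, centrally.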

In summary, this bug seeks to be about supporting data-oriented processing in
new mozilla applications rather than about renovating the way mozilla works
internally. It's a higher layer of abstraction. I appreciate that other work on
the RDF model might be required before this proposed abstract layer can be
implemented fully.

Another perspective is to note how the RDF interfaces are somewhat fragmented.
nsIRDFRemoteDataSource, nsIRDFPurgeableDataSource and
nsIRDFPropagatableDataSource all have very few methods each, and all attempt to
deal with replication or de-replication issues of one kind or another.

This bug only advocates that certain functionality be made available. It would be
nice to see an implementation that provides an overall solution that
(eventually) could consolidate these fragmented interfaces.

Neil, what do you think of all this?

- N.

The only part of this which I've understood would be some sort of global
datasource enumerator, although I haven't been able to think of a use case.
Resolving WONTFIX.

Yes, we want to document the persistence model of the various datasources when
we document them, along with their vocabulary and so on.
But we won't force any persistence model on any datasource; that would be just
wrong, and would lead to really bad performance in general.
Plus, the "global" term in this bug just doesn't work in the security model of
the browser.
Status: UNCONFIRMED → RESOLVED
Closed: 20 years ago
Resolution: --- → WONTFIX
Product: Core → Core Graveyard