Closed Bug 216797 Opened 22 years ago Closed 8 years ago

specify, document and fix RDF Commands goop

Categories

(Core Graveyard :: RDF, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: tingley, Assigned: tingley)

References

Details

First suggested in bug 174583 comment 4, and more recently in bug 216788. Nobody uses the Command system except a few specialized datasources (bookmarks, search), so we can reduce bloat by splitting it off into nsIRDFCommandDataSource. Note that the CompositeDataSource needs to implement this interface, since it is mostly accessed from javascript through |foo.database|. Patch should be ready tonight.
As I mentioned in bug 174583 comment 5, this smells like "busy-work" for negligible gain. I would want to see hard measurement(s) of any suggested savings (taking into account the addition of C++ code footprint for things such as the Composite datasource as well as extra QIs in both C++ and JS to support "yet-another-interface".
One bug definitly benefiting from this would be bug 122846, scriptability of rdf. The common judgement on exposing methods to js was "don't without explicit use". So one conclusion was to not expose the command methods. That'd be much easier if they were on a private interface. Which is what drove me into getting them separated. From a design perspective, I never understood the general idea of commands and they still seem to be rather tightly bound to their ds implementation. Kinda justifies the idea of not having them around at each and every end to me.
this is so not busy work. this is making RDF, as implemented in mozilla, more easily accessible to the general public, by making the API sane enough to freeze.
It is a good idea to ask questions when something isn't well understood before drawing conclusions. By virtue of using a common vocabulary of RDF nodes (which is how the current datasources implementing the RDF command APIs work), aggregation of operations across multiple datasources becomes possible... which is the exact opposite of the last statement in comment 2. Most people, when writing a DS, implement the interface then tack on interface-specific methods for their particular purpose. That makes RDF aggregation impossible for various tasks... thus, the command APIs. This bug (as I read it) is about reducing bloat... and I don't necessarily see much, if any, bloat reduction here... instead, I see a "squeeze the balloon here, but ignore when it expands over there" situation. If Chase can obtain a significant bloat-reduction win, then great. Otherwise, I would suggest that precious developer time would be better spent elsewhere. As for Alec's comment about making RDF more easily accessible... "yeah, right, whatever". If your goal is to make RDF more easily accessibly, figure out how to get people up to speed on what a graph is, how it functions, how to leverage aggregation, etc. Trying to factor out a few interface methods (which are virtually required for fully functional, read/write datasources) doesn't win you anything.
I don't understand how commands are required for read/write datasources. I also don't understand why you object to moving commands to a seperate interface. the point in reducing bloat is to remove implementations of DoCommand() for all the datasource implementations that don't use them. this may balance out with the extra QI's needed, but it seems like a valid architectural change to me.
Alec: I object to additional interfaces without obvious benefit. Until someone can show how bloat is *reduced* by creating more interfaces in a situation such as this, I'm going to push against it. What's the current situation? As I see it, we are talking about moving two or three methods where, if not used [the reason why you seem to want the separate interface... a "valid architectural change" for you], the C++ code default is usually either to simply "return" or to forward to an "mInner" in-memory datasource, a small # of execution bytes. Is it a "valid architectural change" for you if bloat increases? Yes, you are saving a few bytes for those DSs that don't use commands, but commands ARE used in some of the DSs as well. So, not only is there the additional interface, but adding QI'ing in all relevant spots of usage (JS and C++) as well as addition code in places such as the composite datasource, etc. Where is the bloat win with such a change? There is another reason for me (besides an issue of bloat) to not want to see the interface addition. As I mentioned, RDF aggregation practically REQUIRES something equivalent to the RDF command APIs. If your datasource doesn't use RDF aggregation, great, no big deal. As various code sites (Mail, for example) have done in the past, just add new interface-specific methods to allow whatever functionality you need. However, once you start down that path, you have made the firm decision to never support RDF aggregation. That is a shame though, because future-proofing the code would mean trying to build in support just in case. Granted, the RDF command APIs aren't exactly the "sweetest" design. Historically, Warren added them I believe. However, they do serve a practical purpose for RDF as having them on the main RDF interface means that they need to at least be considered by the developer. I'll try and briefly give you an imaginary scenario: let's say you wanted to aggregate Mail & News folders into Bookmarks. Today, bookmarks uses RDF commands. Mail doesn't though, so you are in a world of pain, also known as "damn, what to do, I've got all these methods for selection, deletion, etc". However, by adding support for RDF commands, aggregation would just "happen". That's the "win" with RDF aggregation (and therefore RDF commands)... datasources can serve up their data, merged anywhere, into the UI via RDF templates. I don't want to see the RDF interfaces become heavily fragmented and thereby more confusing. Going along with your "it seems like a valid architectural change to me" rational, would you break out other methods such as "GetAllResources, ArcLabelsIn, ArcLabelsOut, etc" that aren't quite as commonly used as "GetTarget, GetTargets, etc" onto their own interface(s) as well? Such a similar move would (to me) seem to go against what you said in comment 3 where "this is making RDF, as implemented in mozilla, more easily accessible to the general public..." Just food for thought. :^)
Hrm. Actually. Is that example broken in nsCompositeDataSource? If any of the aggregated datasources report a command to be disabled, the composite datasource claims that the command is disabled. So in the example of bookmarks and mail folders (or a rdfxmlds) in one bin, the composite ds would report disabled, though executing the command would prolly succeed and do something. (looked a tad closer, the return values for IsCommandEnabled for the various datasources not bothering about commands are more or less random. Evil minds would take that as a sign that this is not the right place to have it) Anyway, I still have to see a good example of a datasource other than the bookmarks ds to make something with the command add-bookmark. Even if we aggregated two bookmark sources (one company, one private shared, one local on the machine, for example), add-bookmark can have several meanings. And may even yield strange results. Looking at nsBookmarksService::DoCommand, I don't see anything that serves well for aggregation. To clarify, I never understood this to be a bug primarily about bloat. It is about design. There is no way that a comment in nsIRDFDataSource could indicate what calling DoCommand would do to the nsIRDFDataSource implementation. And for most implementations, it doesn't do anything. That justifies moving that functionality out of nsIRDFDataSource. If a datasource of a specific service would want to implement commands, that'd be part of it's contract and the commands and their impact should be stated therein. Whether nsCompositeDataSource should really implement commands, and how, is kinda open. I bet at least DoCommand should check for more return values so that an aggregated datasource could indicated that the command is handled. I am not convinced that there is a general answer how commands and aggregation interact. I see a "ease of use" argument for having composite ds implement commands, though. On the aspect of QIs, there is a good chance to have at least partially dom classinfo for datasources, so js callers wouldn't need to call QI.
> To clarify, I never understood this to be a bug primarily about bloat. Beginning with comment 1 (re-read it if you care to), this bug began with a concern about addressing bloat. > It is about design. A good design is about more than just how things currently work, of course. The architecture should try and plan for the future. For reasons I stated previously, I have heard nothing yet that makes me want to consider factoring RDF's interface. Anyway, from the historical implementation, if one datasource says "no, disallow" a command, the command is rejected. (BTW: To give another example of common vocabularies and aggregation, consider a "delete" command sent across a composite datasource aggregating two other datasources which implement "file" and "ftp" support.)
Ok, the word "bloat" is misleading, and I probably should have used something else. I have partial numbers, though they are not completed. From what I've got so far, it seems likely that this change will be a bloat (conventional sense) win, although a small one -- the amount of code saved from removing dummy Command implementations is greater than the amount of new interface overhead (which is tiny) plus the amount of extra code to QI in a couple of new places. There is also interface bloat and conceptual bloat, which is more what I was thinking when I wrote the sentence. As in: nsIRDFDataSource is a bloated interface. The Command APIs are understood by very few people, are essentially undocumented, are rarely used, and (in my opinion) provide functionality that is not integral to the operation of a datasource. (It seems strange to consider them a crucial part of the RDF experience, when there is -nothing- in rdf/ that actually implements them, except for the Composite.) Splitting them off will make the code slightly smaller and simplify the datasource API, with no loss of functionality. But about the future. [n.b. rjc's usage of "aggregation", as I read it, seems to be in the sense of data aggregation, not XPCOM aggregation. Please let me know if I'm interpreting this wrong.] The scenarios that rjc is describing for possible future use seem like pipe dreams to me. The Commands APIs let you define certain named operations on a datasource; the implementation of a given command must be tailored to the structure of a particular datasource. But more importantly, the semantics of the commands is arbitrary, and defined entirely by the implementor of that datasource. To "aggregate" multiple datasources, each with their own command set, would require that these commands -- by coincidence, perhaps -- have the same semantics. My file system datasource might support a "delete" command that takes 1 argument; Axel's FTP datasource might support a "delete" that took 2 arguments. The meaning of "delete" might not even be the same! Trying to combine these would be a code maintenance nightmare. At the moment, the datasources which actually do implement Commands don't seem to have much in the way of documentation or formalization of the commands they support. If there actually were two datasources in agreement, there's absolutely nothing preventing a code change to one of them that changed the "signature" (how many/what type of parameters were expected in aArguments). This strikes me as an end-run around XPCOM: if datasource implementations Foo and Bar have a shared set of operations that can be performed on them, stick those operations in an interface and sleep soundly with the knowledge that things aren't going to mutate during the night. [I might even go so far as to argue -- contrary to rjc -- that Commands actually make "RDF aggregation" *harder*, by introducing usage-specific semantics.] But in case I'm wrong in my assessment of Commands, I don't even understand how splitting this off precludes anything. The current CompositeDataSourceImpl::IsCommandEnabled logic loops over the datasources and does this: + PRBool enabled = PR_TRUE; + rv = mDatasSources[i]->IsCommandEnabled(aSources, aCommand, aArguments, + &enabled); + if (NS_FAILED(rv) && (rv != NS_ERROR_NOT_IMPLEMENTED)) + return(rv); + if (! enabled) { + *aResult = PR_FALSE; + return NS_OK; + } Because |enabled| is true by default, if an IsCommandEnabled implementation just returns NS_ERROR_NOT_IMPLEMENTED, the composite still considers itself to support the command. It's only in the case that a datasource implements IsCommandEnabled *and* does not support that particular command that |aResult| is set false. Given this behavior, adding a QI to nsIRDFCommandDataSource is harmless. If the QI fails, the composite behaves exactly as if it had found a "stub" implementation that returned NS_ERROR_NOT_IMPLEMENTED. Furthermore, it would standardize the behavior of a gaggle of interfaces that don't support Commands in a variety of different ways -- failure codes, empty enumerators, forward to an inner InMemoryDataSource, etc. I'll post the numbers when I post the patch (I need to test the bookmarks stuff more, and as I'm away for the weekend this will get pushed out a couple days); in the meantime, the only argument against this that I'm really conducive to is that it's a waste of my time. But I will keep my own counsel on that issue.
I'll be interested in seeing what the size of bloat change is... I still expect it to be negligible (as I said in comment 1). :^) Yes, when I say aggregation, I am speaking of RDF aggregation using one or more datasources. > The scenarios that rjc is describing for possible future use seem like pipe dreams to me. Heh! The funny part is, there was a time in the past where this was definitely NOT a pipe dream, it was implemented. We had composite datasources that aggregated all of the following: bookmarks, file, ftp, internet search, and local search datasources! Yes, that is FIVE separate datasources. And, it was done in multiple locations: the bookmarks window, the bookmarks sidebar, the personal toolbar, the "Find in bookmarks" dialog, etc. To be fair, I am not sure of the current state of the world, as I left Netscape in February of this year. When I read "pipe dream", I hear "that'll never happen". However, the truth is this: without the RDF command APIs, it *definitely* won't happen. With the APIs, the possibility is there... and the decision is in the hands of the developer. > I might even go so far as to argue -- contrary to rjc -- that Commands actually make "RDF aggregation" *harder*, by introducing usage-specific semantics. There is really no more difference here than with using RDF resources. There has always been a push to standardize RDF vocabularies, such as Dublin Core. > To "aggregate" multiple datasources, each with their own command set, would require that these commands -- by coincidence, perhaps -- have the same semantics. My file system datasource might support a "delete" command that takes 1 argument; Axel's FTP datasource might support a "delete" that took 2 arguments. The meaning of "delete" might not even be the same! For any one specific command, the list of arguments could be such that one datasource would only respond some of them, and another datasource might respond to a different set. Check out how the bookmark's datasource handles looking for parameters for any given command. While you can term that to be "harder", during the development of the browser that flexibility was essential. Do you feel it is harder because it isn't documented? Sadly, sometimes the documentation is left up to the implementation. Anyway, Chase, your major point that I'd like to address was this: "But in case I'm wrong in my assessment of Commands, I don't even understand how splitting this off precludes anything." Splitting off the commands into an interface doesn't preclude much... but only if you have privileged access to also get to the new command interface. As Axel points out, he wants to see the command APIs on a separate interface so that external datasources *won't* have access to them. That's bad. Re-read comment 6, please. My point is that if Commands are factored out, it becomes a detriment to one of the significant features of RDF: aggregation. Without the command API, RDF is reduced to triplet assertions. There is no other way in the current APIs to express such things as commands acting against multiple resources from multiple data sources (imagine a XUL tree built from RDF data which allows multiple selection, with 2 or more rows selected) with arbitrary parameters as an example. I much prefer an RDF API that allows the type of functionality that we found was needed to write a fully functional product... and I want to enable others in the future to do similiar things.
Robert if you are "not sure of the state of the world" then I really don't see how your objections to splitting this interface are relevant. You haven't had to deal with the RDF code in forever, and are basing your objections on blue sky visions we had very early in the design of this product. We may get there eventually, but right now we have the real world issue of bloat and code maintenance. People who HAVE been working on this product for the last few years all agree that nsIRDFDataSource is an overcomplicated interface - the "overhead" to splitting the interface is minimal (a few extra QI's) but the benefits are great - a clearer interface which separates out actions (commands) from queries, and so forth. The benefits go beyond bloat, they are about readability and maintenance of existing code - stuff that has gone on for the last few years without you. Furthermore, commands can still be "supported" by making the composite datasource behave as though a command was rejected (as you describe, "no, disallow") when a QI fails. As you admit, this does NOT preclude much. All this does is add a finer granularity to the RDF APIs instead of having this enormous monolithic design. Finally, you're trying to justify the existence of commands for this blue sky, and we've got security and policy issues to address right now. If in the future, we start making security decisions about commands, protecting external datasources and so forth, that's where you should be complaining - don't enforce policy via API design. Let this code refactoring happen, and we can discuss the command policy crap somewhere else.
> People who HAVE been working on this product for the last few years all agree Alec: You know, the above statement includes me, thank you very f****** much... and no, we DON'T all agree. Get off your frickin' high horse. Step up to the plate and show hard proof about how this is a wonderful bloat win or quiet down. As RDF module owner (until I can convince Chase to do it! haha), I have the right to question the changes, and the burden is on *you*, not me, to prove their worth.
Whatever. My points have been made, and they didn't include bloat.
"Whatever" indeed, Alec. Since you have very limited experience with RDF aggregation and RDF commands (as evidenced by your statement in comment 5), I'll put the appropriate weight on your "points". Feel free to return to your 'happy little world' of trying to solve the bloat problem one byte at a time. Fortunately, I value Chase's opinions in this area a lot more and will keep an open mind to what patches/data he comes up with regarding bloat, although as he states in comment 9 any bloat win will probably be "a small one" at best. He and I can talk about some other ideas regarding reducing memory footprint. I'll also be talking with Axel out-of-band (as to try and keep the "buzz" to a minimum) of this bug about security issues he is dealing with.
*sigh* looks like I need to reiterate after all. 1) Everyone (Axel, Chase, myself) except Robert has agreed that code bloat is the least of our concerns at this point. That said, adding an interface in and of itself does not INCREASE bloat except by perhaps 4 bytes for the extra vtable. So lets drop this stupid bloat argument.. nobody cares! Robert you seem to be saying that the rest of us are worried about bloat. If anything we're saying that bloat is negligable relative to the value of a clear API. 2) Nobody (except Robert in comment 10) is suggesting dropping the command APIs entirely. We don't need to argue for their existence or not. 2a) [regarding the value of commands, orthogonal to this bug, orthogonal to 2] Nobody has shown an actual reason that the command APIs are actually needed/required for proper RDF aggregation or "fully functional read/write datasources". Even Robert, in comment 4, refers to them as "virtually required" and "practically requir[ed]" in comment 6, but gives no specific reason why they are actually required, in practice. Examples have been given as to how they are useful, but most datasources that are actually used in the product do not in fact use commands. I say this with the authority of my own extensive experience writing the mail folder and message datasources, not to mention years of ownership and maintenance on the global history datasource. As Dave Barry would say, no I'm not making this up. Mozilla is a software product, not a research project, and the use of RDF in practice of making this software product has demonstrated that commands have very little value. (not none, but IMO, less than has been argued here) Most actions taken on a datasource are either proprietary to that datasource (And thus are semantically clearer when using a seperate interface anyway) or are expressible with the "base" API in nsIRDFDataSource such as Unassert, Move, and Change. I would actually go so far as to say that these methods should be in an nsIRDFMutableDataSource, but we don't need to argue that in this bug. 3) It is common practice to keep "public" interfaces fairly small (lets say 3-10 methods) because the overhead of an additional interface is small relative to the value of having a well partitioned API. The value of such a well partitioned API, in my personal opionion, is that it provides a logical grouping which makes the API both easier to learn, and makes client code easier to understand, because the semantics of the particular interface in question are implied by the name of the interface pointer in use. In addition, it provides syntactic enforcement of semantics of an API. The "value" is not a number like bloat. The "value" is a conceptual advantage of one API design over another. I have been involved with the freezing of many public interfaces in the last 2-3 years and this is a well understood value amongst those who have been leading the interface-freezing effort. [For example: nsIArray and nsIMutableArray provide obvious semantic and syntactic indications of when an array should and should not be modified. The compiler enforces that you can't call AppendElement() on an nsIArray. Also see nsIProperties and nsIPersistentProperties. Another interesting one is nsIChromeRegistry and nsIXULChromeRegistry. If you still aren't bored, see nsIGlobalHistory and nsIBrowserHistory. But wait! There's also nsIDocShell, nsIWebShell and nsIWebNavigation.]
Just a quick note: just as nsIXULChromeRegistry inherits from nsIChromeRegistry, the "RDF command datasource" or whatever you want to call it can inherit from nsIRDFDataSource, saving the (minimal) cost of the extra vtable, plus making certain kinds of uses even less painful.
1) my point from the get-go on this bug has been that bloat isn't a factor for change here 2) I have never suggested dropped the command APIs in any way > I say this with the authority of my own extensive experience Your "authority of extensive experience"? No, Alec, I'm afraid not. In fact, you have basically no experience writing a datasource that takes advantage of RDF aggregation, I believe... Mail/News doesn't. In comparison, I have written or helped write all of the following (most from the "get-go"): bookmarks, file, ftp, local search, internet search, global history, and related links... as well as other datasources that were written for investigation of possible new functionality both years ago as well as within the last year, including Appletalk (around five years ago) and LDAP-based shared bookmarks (within the last year). Many of those datasources have been shown and implemented at various points to function together via RDF aggregation. I tire of giving you examples. If I don't, you point out how I haven't. If I do, you try out different tactics such as "oh, that's blue sky work". So, in the end, I feel little need to try and "prove" this area to you. 3) Taking about API design at this stage of the game is several years too late to be very interesting without any REAL measureable benefit.
lets try to move on some common grounds here: There are issues with commands. They are heavily underspecified, and the implementations do whatever they came up with. Whatever works, really does so by coincidence. I adjust the topic to make this bug more about fixing the issues than about religious wars on where to put stuff. That should come out naturally in the end. Scenario for specifying the behaviour: D aggregated datasources, R members of the aSources, A members of aArguments. The specification for commands in a native service should drop out for D=1. Issue 1: is a command really a resource? It could be a literal, a atom, even a string, I'd say. I guess this is "pointers are easier to check, get a unique one", so it may be whatever. I don't see an immediate reason to change this from a resource, just wanna make sure the list is as complete as possible. Issue 2: Error handling. There are D x R x A places where an error can occur. Which errors stop the execution of the command sequence, which don't. A bad parameter may or may not be fatal, and out of memory probably is. Is a disabled command a fatal error or just ignorable? This would not be a problem if we didn't have aggregation of both datasources and sources. Note that any fatal error leaves the datasource in a partially modified state. Issue 3: GetAllCmds doesn't take the parameters and may yield results that conflict with DoCommand or IsCommandEnabled, as the latter depend on the arguments. Issue 4: Aggregation is not specified. - DoCommand: DoCommand is called on each aggregated datasource. Success is required. Bug, disabled commands do what? There is no example in the code, but this isn't handled at all. This may want to call into IsCommandEnabled. - IsCommandEnabled: CompositeDataSourceImpl currently expects that each data source has the command enabled. I'd vote the other way around. I would expect that one can argue both ways, sadly enough, we can't make it implementation dependent. It's not at all clear how to force that into one bool, as there are at least "enabled", "disabled" and "unknown" for each datasource and source. - GetAllCmds: This currently just piles the commands, which may be ok. Issue 2 is the severest one IMHO. There is hardly alot of use for having the source as array, as the calling js needs to create the array in the first place. Having commands as batches does look like it would give you better performance for serialising changes back to disk or so, but you should do that with a timer anyway, see bookmarks. Thus, I see good reasons to get away from the array in the first param. That should ease some of the problems with Issue 4, too. On the question of ease-of-use and interfaces. If a js datasource would not implement these methods, the composite ds or any C++ wouldn't get an NS_ERROR_NOT_IMPLEMENTED, but the conversion of the "is not a function" exception. Not sure what error code that is right now, dbradley assured me that it'd not be NOT_IMPLEMENTED. Wherever and however commands end up, we should make not implementing commands as easy and straightforward as possible for js.
Summary: Split off nsIRDFDataSource Command goop into its own interface → specify, document and fix RDF Commands goop
Robert? One thing that I think should be added is that, AFAICT, only synchronous commands can be implemented. (Which is why I never used them in the first place.)
I saw that mailnews actually uses rdf commands on composite datasources. Sadly enough, commands are not implemented consistently in mailnews either. And the comments don't fit to the code and all. CCing Scott, someone over there should be able to clarify the expected behaviour. Scott, this bug morphed a few times, so you may wanna read from the bottom up. Unspecified aspects for rdf commands AFAICT are listed in comment #18
now that we have a new RDF owner, what are the chances of getting RDF commands cleaned up/removed?
alecf: see http://www.axel-hecht.de/mozwiki/RdfDataSource and friends. Please comment ;) I am especially interested in your thoughts on the triplevisitor interface, instead of the current enumeration interfaces.
QA Contact: rdf
If we haven't already removed commands, that is the only sane option.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → WONTFIX
Product: Core → Core Graveyard
You need to log in before you can comment on or make changes to this bug.