Closed Bug 37275 Opened 24 years ago Closed 24 years ago

fix the progids

Categories

(Core :: XPCOM, defect, P3)

Tracking

RESOLVED FIXED

People

(Reporter: jband_mozilla, Assigned: rayw)

Details

(Whiteboard: [nsbeta3+])

We should convert the existing progids to have the form:

"unique_module_name.unique_name_in_module.number"

I do *not* think that they ought all start with "mozilla".

They certainly should not have URL syntax.

We need to do this sooner rather than later. The progid is a part of the 
contract of the component forever.

We need to get rid of the funky overloading of the progid to do dispatching:
"component://some_component_that_does_dispatching?additional_info"

This is not appropriate.

We should use the category manager for that. Some dispatch function is free to 
register as many progids as it wants in order to accomplish its funky thing.

Besides the stuff like this that rdf does, look at sspitzer's comment in 
http://bugzilla.mozilla.org/show_bug.cgi?id=4263 He has made the commandline arg 
dispatcher require that services which want to be started from the command line 
must declare a screwy progid.

Do we really want to do this?
it's not that screwy.

a progid of that form is currently required, because we are basically doing this:

progID = "component://netscape/commandlinehandler/general-startup";
progID += <your command line argument>; /* for example, "-jsconsole" */
nsCOMPtr<nsICmdLineHandler> handler = do_GetService(progID, &rv);

we do tricks like that all over the place.

I can fix my code to play nice with 
"unique_module_name.unique_name_in_module.number", and make it so all the 
progIDs for all commandline handlers have the same format.

"commandlinehandler.jsconsole.1"

or what ever.  same trick.  different way of building the progID.
I'm wondering if we should try to
1) prefix some of them with "mozilla" or something - since something like the 
command line service is specific to mozilla, but would not be used by another 
consumer of XPCOM
2) however we do the initial prefix, it would be excellent if we could sync this 
up with the components and sub-components as defined by chrome, such as 
global, communicator, communicator/prefs and so forth. (because chrome is 
basically the only other place that we distinguish between parts of the project 
in a text string)
I do have to agree with jband though - I think progids should be atomic strings 
that are not "dynamic" - I'm guilty of doing this myself simply because we 
didn't have categories up until recently.
But I've posted to .xpcom before and said that any time you have to do a dynamic 
progid, you should stop, and think categories instead.
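The "think categories instead" advice can be sketched roughly as follows. This is only an illustration: `CategoryTable`, its methods, and the category name `command-line-handler` are hypothetical stand-ins, not the real XPCOM category manager interface. The point is that the command-line argument becomes a lookup key, while each handler keeps a fixed, atomic progid.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for a category manager: category -> entry -> progid.
// It only illustrates the idea; it is not the real XPCOM interface.
class CategoryTable {
public:
    // Record that `entry` within `category` is served by `progid`.
    void AddEntry(const std::string& category, const std::string& entry,
                  const std::string& progid) {
        table_[category][entry] = progid;
    }

    // Fetch the progid registered for `entry`. Returns false if the category
    // or the entry is unknown, leaving *progid untouched.
    bool GetEntry(const std::string& category, const std::string& entry,
                  std::string* progid) const {
        auto cat = table_.find(category);
        if (cat == table_.end()) return false;
        auto it = cat->second.find(entry);
        if (it == cat->second.end()) return false;
        *progid = it->second;
        return true;
    }

private:
    std::map<std::string, std::map<std::string, std::string>> table_;
};
```

With something like this, the dispatcher would look up "-jsconsole" in the category and then ask for the resulting progid as a plain string, instead of building the progid by concatenation.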
I think we should dump the url syntax fast and clean.

The "two dot" scheme is what MS uses. I think the trailing number has 
potential for *great* future versioning benefit.

However, it has been suggested that the limitation of only two identifiers is 
unnecessary. I agree. Likelihood of uniqueness is important. Simple dot 
separated identifiers with a trailing number is good. We should allow (and 
encourage?):

 org.mozilla.xpconnect.runtime.1
..and...
 com.evilmen.really_evil_men.evil_module.bfg.1
In the meantime several components use a mozilla.<name>.1 scheme already. One
with moz:
mozilla.consoleservice.1
moz.jsloader.1
mozilla.scripterror.1
mozilla.jssample.1
mozilla.categorymanager.1

I like the Javaesque scheme. I have been using digitalcreations.* for now, but I
like com.digicool.* or org.zope.* better.
Do you really need the TLD at the front of the progID? It doesn't convey much 
information and is not likely to be useful in preventing collisions.

One thing I have always disliked about Java's package syntax is that "." is too 
overloaded. I would have preferred
<dotted-package-name>::<class-name> [ ::<static-member> ]
But the word was that it reminded Gosling too strongly of C++ :-).

I think that it would be well worthwhile syntactically separating the 
"installation granularity" component name from other information. The version 
should be distinct too. So how about, say,
evilmen.really_evil_men.evil_module:bfg.powerpack,1
i.e.
<dotted-package-name>:<dotted-component-name>,<version>
We need some rules. Mostly, the particular syntax does not matter - it just 
needs to be consistent.

- I'd like to have the identifier subparts be legal identifiers in most 
languages.

- I doubt that anyone will want or need to parse the different subparts *except* 
to distinguish the trailing number. I see nothing wrong with using all dots. I 
think that having different punctuation to mean different things is not helpful 
and just sets us up for more instances of non-compliance and more likelihood of 
parsers confused by their real-world input. I just don't see benefit in what 
roc+moz@cs.cmu.edu suggested.

- The Javaish "com.company.division.module.class.number" scheme is clear and 
based on a commonly understood convention. One of the rules of 
particular importance is that if one adds a progid into such a namespace then 
one had better have approval of the 'owner' of the namespace. There is a lot of 
risk that novices will add "mozilla.mymodule.1" (cuz' we're *all* mozilla 
contributors). A longer more explicit name will help, but stressing the point 
that people need to make up their own names and not inject themselves into 
organizations is important.

We should *soon* implement code in the component manager's ProgIDToCLSID routine 
to allow translation with no trailing number. i.e. Given that foo.bar.1 and 
foo.bar.2 and foo.bar.8 are registered, then if someone asks for foo.bar they 
should get the CLSID for foo.bar.8 .  I think most clients will not need to 
specify the number if we do things right.
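A minimal sketch of the versionless lookup proposed above, assuming a simple in-memory table. `Registry`, `SplitTrailingVersion`, and `ResolveProgID` are hypothetical names, CLSIDs are shown as plain strings, and trying an exact match before the "highest trailing number" search is an assumption of this sketch, not something settled in this bug.

```cpp
#include <cctype>
#include <map>
#include <string>

// Hypothetical registry mapping progid strings to CLSID strings.
using Registry = std::map<std::string, std::string>;

// Split "foo.bar.8" into base "foo.bar" and version 8. Returns false when the
// last dot-separated part is missing or is not made only of decimal digits.
bool SplitTrailingVersion(const std::string& progid, std::string* base,
                          unsigned long* version) {
    std::string::size_type dot = progid.rfind('.');
    if (dot == std::string::npos || dot + 1 == progid.size()) return false;
    unsigned long v = 0;
    for (std::string::size_type i = dot + 1; i < progid.size(); ++i) {
        if (!std::isdigit(static_cast<unsigned char>(progid[i]))) return false;
        v = v * 10 + static_cast<unsigned long>(progid[i] - '0');
    }
    *base = progid.substr(0, dot);
    *version = v;
    return true;
}

// Resolve a progid: exact match first, otherwise the registered entry whose
// base equals `progid` and whose trailing number is highest.
bool ResolveProgID(const Registry& reg, const std::string& progid,
                   std::string* clsid) {
    auto exact = reg.find(progid);
    if (exact != reg.end()) {
        *clsid = exact->second;
        return true;
    }
    bool found = false;
    unsigned long best = 0;
    for (const auto& entry : reg) {
        std::string base;
        unsigned long v = 0;
        if (SplitTrailingVersion(entry.first, &base, &v) && base == progid &&
            (!found || v > best)) {
            found = true;
            best = v;
            *clsid = entry.second;
        }
    }
    return found;
}
```

Note that the result depends on what happens to be registered, so the same request can resolve differently from one installation to the next.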
If the "package name" is clearly identified within the progID, then you can 
trivially map the progID to a package. I thought this might be useful if progIDs 
need to interact with an installer/package manager, or with other package-aware 
subsystems (e.g. chrome). Maybe there won't be any such interaction, but why bet 
against it?

If the version number is separated out within the progID, by say a comma, then 
you can later introduce dotted version numbers if you decide they're useful.

> Given that foo.bar.1 and foo.bar.2 and foo.bar.8 are registered then if someone
> asks for foo.bar they should get the CLSID for foo.bar.8

Maybe, but there's a big iceberg lurking here: release to release, bug for bug 
compatibility is too hard and does not work. If component A has been tested and 
debugged with version 5 of component B, and works well with it, then you often 
don't want it to start using B v6 when that suddenly appears. There are a number 
of interesting documents at Microsoft about this.

In my opinion, when someone asks for a component from a progID, they should 
provide their own progID as well. That would allow the component manager to use 
a flexible policy, from "always use latest version" to "use latest version known 
to work according to the latest published compatibility matrix".
Since clients are asking for progids of particular versions, I think that 
major.minor version numbers are too messy and error prone. I don't 
think we should suggest or allow for anything other than integer version 
numbers. KISS

We should decide the issue of "closest matching". There is something to be said 
for the form "foo.bar.v1" so that all subparts are valid identifiers in many 
languages. But, I still think that language mappings (e.g. xpconnect) should not 
try to let the user represent the ProgID as anything other than a quoted 
string. So, "v1" as opposed to "1" is not really a plus (just thought I'd bring 
it up - though I'd vote against it).

I'm really voting for minimal divergence from the MS model. I like the removal 
of the "two dot" restriction and a convention of reverse domain name, etc. But 
otherwise I see little to gain from divergence.
I'd just like to add my agreement with the objection to toplevel domains in
progids (i.e. org., com., etc) - these provide no value other than to
distinguish com.foo and org.foo, a collision I highly doubt will happen often.

.. not to mention that "com" could be confused with XPCOM and "net" could be
confused with necko (network library) to the fresh developer.

I also like the idea of delimiting the object name from the version with a
different character, such as ',' - mostly because of what jband just said.
mainly I'm worried about someone stupidly naming their component foo.bar when
there is already a foo.bar.1 - then when someone asks for "foo.bar" do we give
them foo.bar.1 or foo.bar? foo.bar is an exact match, but foo.bar.1 has a later
version number if you think about it that way.
What happens if someone makes a foo.bar.goo or foo.bar.4.goo? what happens then
when someone asks for foo.bar? 

if we said that it's comma delimited, it seems like we could do some better
tests to at least see if the comma is there, so the above objects would be
foo.bar,1 (foo.bar version 1)
foo.bar (foo.bar no comma => no version)
foo.bar.goo (foo.bar.goo no comma => no version)
foo.bar.4.goo (foo.bar.4.goo no comma => no version)
foo.bar.5.goo,2  (foo.bar.5.goo version 2)
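The comma test described above could look roughly like this. It is a sketch under the assumption that the version is a plain non-negative integer; `ParsedProgID` and `ParseCommaProgID` are hypothetical names, not existing XPCOM code.

```cpp
#include <cctype>
#include <string>

// Result of splitting a comma-delimited progid.
struct ParsedProgID {
    std::string name;       // everything before the comma, or the whole string
    bool has_version;       // true only when a ",<digits>" suffix was present
    unsigned long version;  // meaningful only when has_version is true
};

// Parse "<name>,<version>". A progid with no comma simply has no version.
// Returns false for malformed input: an empty name, or a comma that is not
// followed by one or more decimal digits.
bool ParseCommaProgID(const std::string& progid, ParsedProgID* out) {
    std::string::size_type comma = progid.find(',');
    out->name = progid.substr(0, comma);  // npos yields the whole string
    out->has_version = false;
    out->version = 0;
    if (out->name.empty()) return false;
    if (comma == std::string::npos) return true;
    if (comma + 1 == progid.size()) return false;
    for (std::string::size_type i = comma + 1; i < progid.size(); ++i) {
        if (!std::isdigit(static_cast<unsigned char>(progid[i]))) return false;
        out->version = out->version * 10 +
                       static_cast<unsigned long>(progid[i] - '0');
    }
    out->has_version = true;
    return true;
}
```

Because dots never carry version meaning here, names such as foo.bar.4.goo stay unambiguous: no comma means no version.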
Depends on: 36666
While I'm all for improving on COM, the reason for doing things differently in
this case isn't at all clear to me after reading all this.

Specifically, what needs do MS-style progids not address?
Ray, this is a good thing to fix.

I am with jband: just stick with COM's progid semantics. It is easy.
Assignee: dp → rayw
then we should not map a.b.c to a.b.c.1, it should be a straight strcmp
Changing Platform and OS to "All", since this is obviously an XP issue.

Also see bug 26013, which is either a dup or a dependency.
OS: Windows NT → All
Hardware: PC → All
I think we should set the precedent of some standard prefix to the module name.  
It doesn't have to be long, like mozilla-.  It could be short, like MZ- or NS-.  
It looks a bit uglier, but is much cleaner for the long term to identify who 
introduced a progid and prevent collisions.  Otherwise, we have to know all of 
the simple names that might be in use by small sets of users who will be really 
up in arms if we collide, and may even try to preemptively define, start using, 
and distribute simple names.
Um, can we please *not* resort to cryptic initialisms? When you talk about
remembering words versus remembering abbreviated forms, remembering words is
almost always easier for most humans.

I don't see how words vs. initials is going to do anything to solve the
collision problem. That problem probably needs to be solved by creating an
official repository of progIDs in which new progIDs can be registered.
A central repository does not handle the problem well. There is a social problem 
- people will ignore a central repository or be ignorant of its existence. The 
whole point of a reverse domain name convention is to avoid the need for a 
central repository.
The reverse domain name system does not avoid the need for a central repository,
it just leverages the use of an existing repository. There are a few quirky
problems with that--like what happens when a domain changes ownership, what if
someone doesn't want to affiliate their work with a domain name--but nothing too
major.

And as alecf notes, the need to differentiate between com, net, org, and the
like is debatable. So if we whack that part off, then we're back to the MS
style.

I guess, in essence, you're contending that having the org, net, com, etc. there
at the front of the identifier makes it more likely that people will "get the
idea" and go to some effort to make their progIDs unique. I'm not sure if I
agree or not. The reverse domain name convention certainly has a rather spotty
following where it has been proclaimed for Java package naming.
"And as alecf notes, the need to differentiate between com, net, org, and the
like is debatable. So if we whack that part off, then we're back to the MS
style."

Not quite -- there would be at least three dots and four components: 
mozilla.library.class.version (or sun.library.class.version, e.g.).  Or more, if 
you were at self.sunlabs.com and wanted to distinguish yourself from some other 
part of that empire via sunlabs.self.library.class.version.  MS has only 
library.class.version, IIRC.

I like conventions that cause people to "get the idea", too -- memes can be good 
and powerful.  Isn't the objection of some Java developers to the reverse FQDN 
based on long-winded-ness in the package statement of every single Java class?  
See http://www.flex-compiler.lcs.mit.edu/Harpoon/HOWTO/code-conventions.html, 
which anyway puts the Flex team's rejection of the Sun standard off to their 
being "too lazy".

Braden, can you comment on what other reasons people give for not following the 
official Java convention?  Also, if the objections arise from peculiarities of 
the Java language, perhaps we can make distinctions based on where and how often 
progids will be used.

I'd prefer shorter prog-ids too, all things equal.  But since collisions are a 
worry, and given that we don't believe yet another central registering authority 
is the right solution, why not leverage the almost fully-qualified domain name?

Separate comment: I agree with alecf that unless we use , or another character 
than . to separate version from what comes before, *and* we allow an arbitrary 
number of non-version parts > 2, then we are asking for trouble by implicitly 
mapping some.long.prog.id to some.long.prog.id.8.

/be
First, forget that I, as a compromise, suggested an abbreviation (unless someone 
again suggests having no ownership prefix at all).

Second, I searched for COM progids on the internet and in a Windows registry.  
They did not seem to follow a single standard, and often clear ownership was not 
there.

Third, the Java scheme does solve contentious issues such as, "who gets to use 
<some overloaded term or trademark>".  I am aware of a few cases where it is not 
followed in current practice.  Where it is not followed, the deviant example 
still usually identifies the ownership, avoids collision, and implicitly assumes 
responsibility for collisions caused by not following the standard.

Fourth, the (major) version number is good in some cases, but version can often 
be a function of the module, rather than of the component, which might make 
alternative forms preferred in other cases.  I doubt that automatic matching 
between versionless and versioned names is a good idea, when we can let this 
sort of thing be solved in a mostly module-specific way at registration time by 
having extra progids, thus keeping better flexibility by avoiding hard-coding a 
magic interpretation of dot-delimited parts at all.  On the other hand, it would 
seem important to understand the versioning that is likely to occur in the 
future on existing modules, as it is likely to affect the immediate names we 
choose.

Fifth, does anyone have an opinion on how to deal with conflict resolution when 
multiple dlls supply an implementation of the same progid (more likely to occur 
over time than initially)?  Who wins?  Is it whoever registers last, or does the 
second registration attempt fail?  Either seems somewhat problematic for the 
user, even if they are easy to program.  Do we need something more?  For 
example, should related registrations in the same dll be transactional -- all or 
nothing?
ProgIDs should uniquely identify an implementation so there is never an issue of 
multiple DLLs providing the same ProgID.
rayw: There exist conventions for progIDs, and they are described in the
standard COM references (i.e., Box, Rogerson). It is true that they are not
always followed. Regardless of what convention is decided on here, expect that
trend to extend to XPCOM progIDs.

It's my understanding that the versionless -> versioned mapping described here
is reflected in the COM conventions. For that reason, I think it is safe and
sensible to incorporate it into XPCOM as well. I don't see why the version
number should ever be a function of the module. If I'm requesting an interface,
there's no reason I need to care about the module's version as long as the
components have been versioned correctly. And I agree with those who've
suggested we stick to a single integer version number. That eliminates ambiguity
about when and how the number should be changed: if you changed the component,
you increment the version before you release it. Simple.

roc: Should, yes. Must, no. CIDs provide the universal uniqueness requirement.
In the progID namespace, it is up to the individual component implementors to
make sure collisions don't happen. In the event of a collision, which CID gets
returned for the duplicated progID is undefined: it's up to the implementation
(XPCOM). So it's in component implementors' interest to take it upon themselves
to ensure they aren't stepping on anyone's toes.

In light of this, I would suggest that the implementation randomize which CID is
returned in the situation where multiple components on the system have the same
progID. That way we can't have a situation where one component implementor uses
XPCOM's behavior to step on another component implementor's toes.
I'd like to add that the only *really* good reason I can see to even *have*
progIDs is to be able to request the "latest version available" of a component.
Otherwise, I might as well just request it by its CID.
roc: 
No, in our system *CLSID*s are supposed to uniquely identify an implementation, 
not so for ProgIDs. It is reasonable for new-and-improved implementations to 
register themselves using the ProgID previously registered for some 
other implementation. Doing so implies a promise of full compatibility from the 
client point of view. We *expect* that this will occur.

To rayw's question five: I would say that the last one to register wins. Unless 
you go back to verify the existence of previous registrants, the newest one is 
the one most likely to still exist next time the app runs! There is 
also the assumption that the module is being intentionally added 
into the system. Unfortunately we currently make no promises about registration 
order when a full re-do of autoregistration happens. This is a problem.
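As a rough illustration of "last one to register wins", here is a sketch with a hypothetical `ProgIDTable` class. CLSIDs are shown as plain strings, and the real component manager is of course more involved.

```cpp
#include <map>
#include <string>

// Hypothetical progid -> CLSID table.
class ProgIDTable {
public:
    // Register a progid. A later call with the same progid silently replaces
    // the earlier mapping, so the last registrant wins.
    void Register(const std::string& progid, const std::string& clsid) {
        table_[progid] = clsid;
    }

    // Look up the CLSID currently mapped to `progid`. Returns false if the
    // progid was never registered, leaving *clsid untouched.
    bool Lookup(const std::string& progid, std::string* clsid) const {
        auto it = table_.find(progid);
        if (it == table_.end()) return false;
        *clsid = it->second;
        return true;
    }

private:
    std::map<std::string, std::string> table_;
};
```

Under this policy the outcome depends entirely on the order of the `Register` calls, which is why an unspecified autoregistration order is a problem.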

FWIW, the xpti system I recently implemented (for .xpt files) has clear rules 
for registration ordering that take into account previous registrations first 
and then file type, size, and name to make for a predictable ordering of 
registration. It places more trust in older (presumably working) files than new 
ones. However, that system is different because it works with files of interface 
type data that is not *supposed* to change. So changed information about a given 
interface is suspect. This is not the case here where we are mapping key strings 
to preferred component factories.
here is my reason for opposing mozilla.rdf.parser.1 instead of rdf.parser.1

A progid is used by many applications. There could be many vendors implementing 
the progid using their own CIDs. If an application cares that it gets the 
implementation from mozilla or SUN, it should use the CID. As long as SUN's 
implementation and mozilla's implementation are compatible, they both say "I do 
rdf.parser" as their progid. That is the whole idea behind having the progid.

Uniqueness is an issue. A secondary one to the primary issue of sharing. If 
component vendors start marking progids with their trademark, then sharing will 
be lost. Since MS lives with this, I vote for MS style.

dp: That kind of transparency is clearly needed for interfaces. I do not see
that it is needed for components. The fact that I am even requesting a component
says that I want *some particular party's* implementation. The level of
genericness you're suggesting would appear to make such a request impossible to
fulfill reliably. And is the fallout from your claim that it would just be okay
to have two rdf.parser.1's on the system, or would we have different vendors for
the same component having to make sure their version number is coordinated with
all the other vendors for that component?

You say "MS lives with this". Well, looking at my Windows registry, I have a
*whole lot* of progIDs that begin with something like either "Microsoft" or
"MS". Apparently Microsoft has found it necessary to put their vendor name in
the progID even in the absence of a dot separator for it. So do we want
"mozillaRdf.parser.1", or "mozilla.rdf.parser.1"? I prefer the latter. In the
event that someone *is* interested in the component name independent of the
vendor name, the latter form makes that information easy to get.
braden: "*some particular party's* implementation": isn't that defined well by 
the CID? If so, then why a progid?
dp: The CID corresponds to a single implementation. This will be a specific
version. The progID simply provides a means for requesting the latest version.
If multiple vendors are thrown into the mix, then you also need to provide a
means of specifying *whose* latest version. I really think we don't want to go
there.

brendan: Right--not quite back to the MS convention; I misspoke. I do favor
starting with the vendor name. And no, I don't know of any reasons for not
following the Sun convention other than laziness or ignorance. Whether or not we
stick org/com/net/edu/gub/etc. on the front is not something I have strong
feelings about either way. I personally don't think it's necessary, but if folks
here think it would make the convention more likely to be followed, then we
might as well do it.

Exactly what trouble are we asking for by mapping a request for
"some.long.prog.id" to "some.long.prog.id.8"? Doesn't MS do it this way? If we
don't have a way for clients to request the "latest version" of a component, why
have progIDs at all? Why not just use CIDs and dispense with this whole
"potential collision" mess?
Whatever name convention we end up with for ProgIDs - and even if the version 
number magic does not happen - they also serve the purpose of allowing a 
reasonable way to impersonate and supersede components without the 
drastic step of reusing the CLSID. They provide a useful level of indirection 
and a human readable form. I disagree with a "we don't need them if they don't 
do X" argument. It would be confusing if we had to call each US chief executive 
"George Washington". One level of indirection is good here. Even without number 
magic a new implementation might register as the component that does "foo.bar.2" 
AND "foo.bar.1".

If we stick with two dots, then let's fix any place with a leading "mozilla" and 
encourage people to lead with their - presumably unique - module name.
*** Bug 26013 has been marked as a duplicate of this bug. ***
Braden, regarding what trouble brews if we allow a.b.c, a.b.c.1, a.b.c.d, and
a.b.c.d.e.f.3:

alecf has sketched some confusing hypothetical cases in this bug already, but 
maybe they're too stupid to live in the real world.

Formally, the extension to MS-COM that allows for an arbitrary number of dotted 
name parts makes for ambiguities in deciding what a progid names (exact vs. 
latest version) without also looking in the registry to see what's installed and 
registered; i.e., it makes purely syntactic analysis (no semantic feedback) 
useless.  Big whoop, I suppose.

Dot for version separator is not how I would have done things, but I admit MS 
COM's progid precedent weighs heavily upon us.

Everyone: I'm posting a summary of the comments in this bug, preceded by my 
shorter summary of open issues and my recommendations and rationales, to 
mozilla.xpcom. Can we take this to the newsgroup?  I really want message 
threading and a better composition UI than this damn six-line textarea, don't 
you guys too?

/be
I should add, hastily, that my formal problem with a.b.c being syntactically
ambiguous supposes that c could be a version name or a class name (a.b.c would 
mean "the latest version of class a.b.c").  But if we restrict version names to
be natural numbers, *and* if we require all other name parts to conform to the 
usual lexical rules for identifiers in modern C-like programming languages (or 
something like that), then there is no syntactic ambiguity.  Your syntax parser 
just has to classify natural number literals differently from identifiers.  Duh!
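The classification described above might be sketched like this. `PartKind`, `ClassifyPart`, and `IsWellFormedProgID` are hypothetical names, and the rule that a version may appear only as the final part (and never as the only part) is an assumption added for this sketch.

```cpp
#include <cctype>
#include <string>
#include <vector>

enum class PartKind { Identifier, Version, Invalid };

// Classify one dot-separated part: all digits means a version number, a
// C-like identifier means a name part, anything else is invalid.
PartKind ClassifyPart(const std::string& part) {
    if (part.empty()) return PartKind::Invalid;
    bool all_digits = true;
    for (char ch : part) {
        if (!std::isdigit(static_cast<unsigned char>(ch))) all_digits = false;
    }
    if (all_digits) return PartKind::Version;
    unsigned char first = static_cast<unsigned char>(part[0]);
    if (!(std::isalpha(first) || first == '_')) return PartKind::Invalid;
    for (char ch : part) {
        unsigned char c = static_cast<unsigned char>(ch);
        if (!(std::isalnum(c) || c == '_')) return PartKind::Invalid;
    }
    return PartKind::Identifier;
}

// A progid is treated as well formed when it consists of one or more
// identifiers, optionally followed by exactly one trailing version number.
bool IsWellFormedProgID(const std::string& progid) {
    std::vector<std::string> parts;
    std::string::size_type start = 0;
    for (;;) {
        std::string::size_type dot = progid.find('.', start);
        if (dot == std::string::npos) {
            parts.push_back(progid.substr(start));
            break;
        }
        parts.push_back(progid.substr(start, dot - start));
        start = dot + 1;
    }
    for (std::vector<std::string>::size_type i = 0; i < parts.size(); ++i) {
        PartKind kind = ClassifyPart(parts[i]);
        if (kind == PartKind::Invalid) return false;
        // A number is only allowed as the last part, after at least one name.
        if (kind == PartKind::Version && (i == 0 || i + 1 != parts.size())) {
            return false;
        }
    }
    return true;
}
```

With these rules a.b.c can only be a versionless name and a.b.c.1 can only be version 1 of a.b.c, so the parser never needs to consult the registry to tell them apart.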

So I'm not really bugged by overloaded-dot, to bring this back to a very early 
roc+moz@cs.cmu.edu comment.  We aren't starting from a clean slate.  We could do 
worse than to extend MS-COM progids in XP-COM in a way that, as Braden showed 
by his perusal of his Windows registry, just codifies a first-part naming 
convention that MS itself uses.

/be
brendan: I think the potential syntax constraints you've described are very
sensible. But I think that if we establish and document a convention for
versioned and versionless progIDs, it's fair to say "you takes your chances" to
people who don't follow the convention. That said, I don't favor an arbitrary
number of dotted parts. I think we should decide on what resolution we think is
necessary and codify it.

To summarize my position... MS COM uses 2 parts plus an optional version number.
However, Microsoft's (and some other vendors') practice of prepending the first
part with characters signifying the vendor name suggests utility in a distinct
3rd part for the vendor name.
*** Bug 35547 has been marked as a duplicate of this bug. ***
Blocks: 35548
I have posted a list to the newsgroups of the progids from the registry that 
would have to change.  It is not clear to me that the proposed 4-part name is 
adequate.  I will wait for a response.
Status: NEW → ASSIGNED
Target Milestone: --- → M18
Per beta2 PDT mtg, this will be needed for nsbeta3.  Adding that keyword.
Keywords: nsbeta3
Making nsbeta3+ status.
Whiteboard: [nsbeta3+]
wasn't this the carpool that went in last night?
Yes.  It is done (except for the related filters bug which is being fixed now).
Status: ASSIGNED → RESOLVED
Closed: 24 years ago
Resolution: --- → FIXED