Open Bug 502685 Opened 15 years ago Updated 2 years ago

Add method documentation for the PL_Hash* methods

Categories

(NSPR :: NSPR, defect)

defect

Tracking

(Not tracked)

People

(Reporter: Waldo, Unassigned)

Details

Attachments

(1 file, 3 obsolete files)

Most are reasonably self-explanatory, but a few (the raw methods) are less obvious.
Attached patch Add docs (obsolete) — Splinter Review
js/src/jshash seems to have forked, and this will need to be upstreamed anyway, so I'm only touching NSPR's copy for now.
Attachment #387352 - Flags: review?(ted.mielczarek)
Attachment #387352 - Flags: review?(ted.mielczarek) → review+
Comment on attachment 387352 [details] [diff] [review]
Add docs

Looks sensible. I don't have time to check this in for you right now unfortunately.
Comment on attachment 387352 [details] [diff] [review]
Add docs

I will SR this patch.  
I've glanced at it and have some initial comments.

1. The subject of this patch is added documentation, but this patch also 
modifies plhash.c and the modifications to that file appear to be unrelated to 
docummentation, so I will assume that it is accidental, and not related to 
this bug.

2. One new comment says, in part:

>+ * keys and values.  allocOps enables support for custom memory-management
>+ * strategies for the table and its entries; if allocOps is null, use default
>+ * allocator ops built on top of malloc().  

It's actually built on top of PR_Malloc, which sometimes uses malloc underneath 
and at other times uses another allocator.  I think it is best not to document
what allocator the default alloc ops use, so that we retain flexibility there.
But if we were to document that, it would be incorrect to say "on top of malloc".  It's wrong to mix PR_Malloc and PR_Free with malloc and free.  
Things allocated with PR_Malloc must be allocated with PR_Free, and things 
allocated with malloc must be freed with free.  

I'll complete this review later.
Attachment #387352 - Flags: superreview?(nelson)
(In reply to comment #3)
> 1. The subject of this patch is added documentation, but this patch also 
> modifies plhash.c and the modifications to that file appear to be unrelated
> to docummentation, so I will assume that it is accidental, and not related to
> this bug.

Not accidental, incidental.  The added assertion was gleaned while I worked at writing the method's documentation; the whitespace fixes were simply because I have a tab-indent setting different from what most people have (and my editor doesn't happen to respect modelines for tab display), and the two lines were displaying in a confusing manner.


> 2. One new comment says, in part:
> 
> >+ * keys and values.  allocOps enables support for custom memory-management
> >+ * strategies for the table and its entries; if allocOps is null, use default
> >+ * allocator ops built on top of malloc().  
> 
> It's actually built on top of PR_Malloc, which sometimes uses malloc
> underneath and at other times uses another allocator.  I think it is best not
> to document what allocator the default alloc ops use, so that we retain
> flexibility there.
> But if we were to document that, it would be incorrect to say "on top of
> malloc".  It's wrong to mix PR_Malloc and PR_Free with malloc and free.  
> Things allocated with PR_Malloc must be allocated with PR_Free, and things 
> allocated with malloc must be freed with free.  

Fair point, should have noted it myself, but do note that that comment was pre-existing, just organized a little differently within an expanded comment.
Comment on attachment 387352 [details] [diff] [review]
Add docs

I have a number of requests for changes or additional explanation.

>+ * Associates the provided value with the key in the hash table, returning the
>+ * entry in the table for the pair.

That seems to be saying "There is a key in the hash table, and this call associates the provided value with that key.".  But we know that's not right.
I suggest something like "Creates an association in the hash table between the 
provided "key" and the provided "value".  

> PR_EXTERN(PLHashEntry *)
> PL_HashTableAdd(PLHashTable *ht, const void *key, void *value);

The documentation should also say something about how the length of the "key"
and "value" are determined, since that is not explicitly provided in this call.
(This is perhaps the most inobvious part of the whole scheme.)

The documentation should say whether the function makes its own copy of the 
"key" and "value" values so that the caller is free to destroy the copies it
passed in as soon as the call returns, or whether the caller is obligated to 
preserve the memory containing those original copies for the lifetime of the 
hash table entry thus created.

>+ *   This function invalidates all existing pointers to canonical pointers to
>+ *   entries returned by the RawLookup functions.

Let's find a better term than "canonical".  Maybe "predecessor"?  Also, let's 
distinguish between "pointer to" and "address of".  This sentence might become:
  "This function invalidates all existing addresses of predecessor's pointers 
   to entries returned ..."
This change, or a similar one, should be made to ALL places where that phrase
"pointer(s) to canonical pointer(s)" is found.

>+ * Returns the value associated with the given key in the hash table, or NULL if
>+ * there was no association.  

It returns the address of the value.  Also, it should be explained that this is 
not a new copy of the value.  The caller is obliged to not modify the value nor 
destroy it.  Some explanation should be given for how to determine when the 
returned value is no longer valid, i.e. for how long after the call returns that 
the caller may presume that the value at the returned address remains valid. 

Since this text is repeated in several places, these corrections must be made to
every copy. 

>+ * rv is a bitwise or of a non-empty subset of the following values,determining
>+ * what action will subsequently be taken:
>+ *
>+ *  HT_ENUMERATE_NEXT
>+ *    Continue enumeration with the next entry in the hash table.
>+ *  HT_ENUMERATE_UNHASH
>+ *    If the value includes HT_ENUMERATE_UNHASH, the entry that was passed to f
>+ *    is spliced out of the chain in which it resides in the hash table.  The
>+ *    entry is *not* deallocated; it must be set aside in some manner so that it
>+ *    can be properly deallocated at some future time.
>+ *  HT_ENUMERATE_REMOVE
>+ *    Same as HT_ENUMERATE_UNHASH, except that the entry is set aside to be
>+ *    deallocated when enumeration is finished and before this method returns.
>+ *  HT_ENUMERATE_STOP
>+ *    Stop enumerating entries once all other HT_ENUMERATE_* bits are
>+ *    processed.

Are all combinations of these flags valid?  

What happens if both _UNHASH and _REMOVE are 1?

Perhaps it is worthwhile to explain what happens if HT_ENUMERATE_NEXT is 0
and either _UNHASH or _REMOVE is non-zero.


>+ * Returns the pointer to the "canonical" pointer to the entry whose value
>+ * corresponds to keyHash and key.  This pointer might be to the first pointer
>+ * in a bucket chain, or it might be to a "next" pointer within a chain.  If a

I would suggest:
  Returns the address of the predecessor's pointer to the entry whose value 
  corresponds to keyHash and key.  The predecessor's pointer might be in the 
  head of a chain, pointing to the first entry in the chain, or it might be 
  the "next" pointer in another table entry in the chain.

I would also suggest putting that definition of "predecessor's pointer" near the 
top of the file, and then referring to it, rather than copying it in many places.  

>+ * value corresponding to keyHash and key was found, the entry for that value
>+ * will be the result of dereferencing the returned pointer.  If no such value
>+ * was found, the result of dereferencing the returned pointer will be NULL.

Under what circumstances will this function return NULL?  

>+ * Adds the given key-value correspondence to the hash table, and place the
>+ * canonical pointer to it 

make that "place the address of the new hash table entry"

                               in the location designated by hep (which must have
>+ * been returned by one of the other hash table methods and which must not have
>+ * been invalidated), unless the hash table is full and needs to be rehashed, in
>+ * which case hep is ignored.

>+ * Removes the given key-value correspondence from the hash table.  hep must be
>+ * a valid pointer to the canonical pointer to the given entry, as returned from
>+ * one of the RawLookup functions, and it must be the case that *hep == he.

or else what?
Attachment #387352 - Flags: superreview?(nelson) → superreview-
(In reply to comment #5)
> (From update of attachment 387352 [details] [diff] [review])
> I suggest something like "Creates an association in the hash table between
> the provided "key" and the provided "value".  

That's reasonable, with slight tweaks to word ordering.


> > PR_EXTERN(PLHashEntry *)
> > PL_HashTableAdd(PLHashTable *ht, const void *key, void *value);
> 
> The documentation should also say something about how the length of the "key"
> and "value" are determined, since that is not explicitly provided in this
> call. (This is perhaps the most inobvious part of the whole scheme.)
>
> The documentation should say whether the function makes its own copy of the 
> "key" and "value" values so that the caller is free to destroy the copies it
> passed in as soon as the call returns, or whether the caller is obligated to 
> preserve the memory containing those original copies for the lifetime of the 
> hash table entry thus created.

I don't see how this can even be described here; this is entirely dependent upon how the user-provided hash and compare functions interpret the generic pointers upon which they and the hash table operate.  This is a problem for the user to understand in his own code -- and if he can't do that, well, he probably shouldn't be riding the NSPR train, to borrow a saying from SpiderMonkey.


> >+ *   This function invalidates all existing pointers to canonical pointers to
> >+ *   entries returned by the RawLookup functions.
> 
> Let's find a better term than "canonical".  Maybe "predecessor"?  Also,
> let's distinguish between "pointer to" and "address of".  This sentence might
> become:
>   "This function invalidates all existing addresses of predecessor's pointers 
>    to entries returned ..."
> This change, or a similar one, should be made to ALL places where that phrase
> "pointer(s) to canonical pointer(s)" is found.

Hmm, predecessor is confusing to me.  Here's the nomenclature I propose: the RawLookup functions return *locations* containing pointers to entries.  In retrospect the "canonical" distinction is not actually necessary to the interface, only how such pointers behave and may be used, so long as it is made clear when the returned locations remain valid.


> >+ * Returns the value associated with the given key in the hash table, or NULL if
> >+ * there was no association.  
> 
> It returns the address of the value.

Incorrect, it returns PLHashEntry.value, which is a void* which per PL_HashTableRawAdd, and which is the value passed into that method, not a location which contains it.


> Also, it should be explained that this
> is not a new copy of the value.  The caller is obliged to not modify the
> value nor destroy it.  Some explanation should be given for how to determine
> when the returned value is no longer valid, i.e. for how long after the call
> returns that the caller may presume that the value at the returned address
> remains valid. 

The interpretation of the void* values in the hash table API is, again, a detail left to the user to determine, so its exposition here is impossible.


> >+ * rv is a bitwise or of a non-empty subset of the following values,determining
> >+ * what action will subsequently be taken:
> >+ *
> >+ *  HT_ENUMERATE_NEXT
> >+ *    Continue enumeration with the next entry in the hash table.
> >+ *  HT_ENUMERATE_UNHASH
> >+ *    If the value includes HT_ENUMERATE_UNHASH, the entry that was passed to f
> >+ *    is spliced out of the chain in which it resides in the hash table.  The
> >+ *    entry is *not* deallocated; it must be set aside in some manner so that it
> >+ *    can be properly deallocated at some future time.
> >+ *  HT_ENUMERATE_REMOVE
> >+ *    Same as HT_ENUMERATE_UNHASH, except that the entry is set aside to be
> >+ *    deallocated when enumeration is finished and before this method returns.
> >+ *  HT_ENUMERATE_STOP
> >+ *    Stop enumerating entries once all other HT_ENUMERATE_* bits are
> >+ *    processed.
> 
> Are all combinations of these flags valid?  

Yes, although I guess NEXT is more the absence of STOP than a meaningful flag.  I changed the text a little to make this clearer.


> What happens if both _UNHASH and _REMOVE are 1?

> Perhaps it is worthwhile to explain what happens if HT_ENUMERATE_NEXT is 0
> and either _UNHASH or _REMOVE is non-zero.

Does this text make it clearer that REMOVE subsumes the function of UNHASH and that NEXT isn't quite a bit to be or'd that could conflict with those other bits?


> I would suggest:

I think I've addressed this in a different manner.


> >+ * value corresponding to keyHash and key was found, the entry for that value
> >+ * will be the result of dereferencing the returned pointer.  If no such value
> >+ * was found, the result of dereferencing the returned pointer will be NULL.
> 
> Under what circumstances will this function return NULL?  

No mention is made of the function returning NULL, therefore the reader must assume (correctly) that it doesn't.  Right?  I don't feel a need to describe a situation which will never happen.


> >+ * Removes the given key-value correspondence from the hash table.  hep must be
> >+ * a valid pointer to the canonical pointer to the given entry, as returned from
> >+ * one of the RawLookup functions, and it must be the case that *hep == he.
> 
> or else what?

Garbage in, garbage out.  This is like passing a pointer to freed memory to a function -- maybe it'll work, maybe it won't.  We can't say what will happen, so we describe no permissible use in which a client might do this, and therefore clients should know not to do it.
Attachment #387352 - Attachment is obsolete: true
Attachment #388608 - Flags: review?(nelson)
A bug in SpiderMonkey's uses of these hash tables points me to another requirement: PL_HashTableRawAdd doesn't check whether an entry already exists for key-keyHash.  It instead assumes that hep is where such a new entry should be stored -- if an association for key-keyHash exists in the table, a phantom duplicate entry is going to be added, which is clearly bad.  We can guard against this to a small extent with a cheap assertion that hep doesn't contain a pointer to an entry for key (the most likely possibility, if this location -- returned by a lookup for key/keyHash -- actually does contain an entry for key/keyHash), so do that.  (The other option would be to add an assertion that uses PL_HashTableRawLookupConst, but it seems less likely we want that overhead even if it's only in case of DEBUG.)
Attachment #388608 - Attachment is obsolete: true
Attachment #388995 - Flags: review?(nelson)
Attachment #388608 - Flags: review?(nelson)
I really don't like all this abstract discussion of "locations", and 
invalidation of "locations".  We're talking about singly linked lists here.  
hep always points to the "next" pointer in a predecessor link, or the next 
pointer at the head of the list.  The documentation should clearly reflect 
that reality.  

Jeff, have you tried building FF with your patch (or running FF with a 
patched version of NSPR) for a while to see if your assertion ever hits?
(In reply to comment #8)
> I really don't like all this abstract discussion of "locations", and 
> invalidation of "locations".  We're talking about singly linked lists here.  
> hep always points to the "next" pointer in a predecessor link, or the next 
> pointer at the head of the list.  The documentation should clearly reflect 
> that reality.  

Why?  Users shouldn't have to care about the precise (internal) meaning of a location, and telling them adds mental overhead to understanding the hash table methods.  Also, how is this actually useful to know from the point of view of someone just using a hash table?  A PLHashEntry** might as well be an opaque handle for all the API user cares (really should have been, if this fast-and-loose raw model were to be exposed, but it's a little late for that now).


> Jeff, have you tried building FF with your patch (or running FF with a 
> patched version of NSPR) for a while to see if your assertion ever hits?

I generally have been, yes, but I forgot to do so with the last version, which has a minor problem (by scale, not by consequence except in debug builds) with the most recently added assertion, easily fixed in this version.
Attachment #389231 - Flags: review?(nelson)
Comment on attachment 388995 [details] [diff] [review]
Recognize one more API requirement

This patch was superseded by a newer patch.
Attachment #388995 - Attachment is obsolete: true
Attachment #388995 - Flags: review?(nelson)
> I don't see how this can even be described here; this is entirely dependent
> upon how the user-provided hash and compare functions interpret the generic
> pointers upon which they and the hash table operate. 

Right, that's exactly what the comments need to explain.  The reader of 
these comments in the header will be looking for answers to the questions 
I posed in comment 5.  The answer to those questions is: it depends on how
these other caller-provided functions work.  That's what needs explanation.

A statement in the headers to the effect that pointers point to "locations"
only states the obvious, a "truism" and sheds no light on the issue.  
Our objective here is to help the coder understand what these functions do
without him having to read through all the source to reverse engineer it.
In this case, it is the hidden aspect of the implementation, that it uses
a singly-linked list, that we want to clarify.  The message is not that 
some single "location" may change, invalidating any private copies of it,
but rather is that the entire linked list may be reordered at will by certain functions, invalidating any copies of any pointers found anywhere in that chain.  That's the message to get across.  "If your code is holding onto 
copies of pointers from this list, they may all be invalidated by calls to
these functions".    

Let's worry less about abstraction and more about clarifying the coder's 
understanding of the operation of these functions without having to study
their source code.  

/Nelson Bolyard - NSPR module co-owner

The bug assignee is inactive on Bugzilla, so the assignee is being reset.

Assignee: jwalden → nobody
Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: