Closed Bug 1050009 Opened 5 years ago Closed 5 years ago

Initialize pldhash tables with a length, not a capacity


(Core :: XPCOM, defect)

Not set





(Reporter: njn, Assigned: njn)




(1 file)

The pldhash initialization functions take a |capacity| parameter, which
determines the initial table capacity. To store |N| items you need to specify a
capacity of |ceil(N / 0.75)|. 

In quite a few places we have a fixed, known number of things (N) that
immediately get inserted into the table, but |N| is specified as the capacity
without doing the division by 0.75. These tables will incur an unnecessary
rehashing operation.

In contrast, js::HashTable takes a |length| parameter, which matches what most
people expect. So let's do the same for pldhash.
This patch does the following.

- Changes the pldhash initialization functions to take an initial length rather
  than an initial capacity.

- Gives that length argument a sensible default. This required changing the
  position of the fallible_t argument in the fallible PL_DHashTableInit()



- I had to adjust lots of pldhash creations. I generally tried to make it so
  that the tables would end up the same size as they currently are.

  - If the caller clearly thought it was a length parameter rather than a
    capacity parameter -- e.g. they had a list of N things that were
    immediately inserted, and used |N| as the parameter -- I left it unchanged.
    Such a table will end up the same capacity as before, but we'll avoid a
    resizing operation along the way. Of course, most of these cases aren't
    visible in the patch, but there are dozens of them.

  - If the caller used PL_DHASH_MIN_SIZE or |16|, I removed the argument so it
    uses the default (which results in a capacity of 16). |16| appeared a lot
    of times, so I figure lots of people are using it as a quasi-default. Later
    on I plan to try reducing the default capacity to 8, and these changes will
    cause such a change have more of an effect.

  - Otherwise, I modified the argument (and renamed it appropriately if
    necessary, e.g. FOO_SIZE --> FOO_LENGTH), so the initial capacity ended up
    the same. Usually this involved halving it, but for some values it was a
    different change (e.g. reducing 10 to 5 would halve the capacity from 16 to
    8, so I instead reduced to 8 to give a capacity of 16).

    In many of these cases it's hard to tell if the caller was
    treating the parameter as a length or a capacity, but I figure I might as
    well assume capacity since that's how old code actually behaved.

I carefully audited every single pldhash instantiation (including those done
via nsTHashtable and its sub-classes) by temporarily adding dummy parameters to
the initialization functions, which forced me to inspect and modify every one
of them.

I've also confirmed via debugging printfs that the distribution of hash table
capacities is pretty much unchanged. Doing a precise comparison is hard, but
the rough check is reassuring that things are working as expected.
Attachment #8468941 - Flags: review?(roc)
Blocks: 1050036
Comment on attachment 8468941 [details] [diff] [review]
Initialize pldhash tables with a length, not a capacity

Review of attachment 8468941 [details] [diff] [review]:

Bit of a rubber-stamp review, to be honest, but it makes sense to me.

::: xpcom/glue/pldhash.cpp
@@ +765,5 @@
>      MallocSizeOf aMallocSizeOf, void* aArg /* = nullptr */)
>  {
> +  fprintf(stderr, "pld: %u / %u\n",
> +          aTable->entryCount,
> +          (uint32_t)1 << (PL_DHASH_BITS - aTable->hashShift));

Don't forget to remove this.
Attachment #8468941 - Flags: review?(roc) → review+
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla34
You need to log in before you can comment on or make changes to this bug.