Closed Bug 240934 Opened 21 years ago Closed 21 years ago

Investigate speeding up PresShell::AlreadyInQueue


(Core :: Layout, defect)

Not set





(Reporter: bzbarsky, Assigned: bzbarsky)



(Keywords: perf)


(2 files)

In bug 229391 Brendan says:

  AlreadyInQueue sure could use a hash queue.  One implementation idea: drop the
  nsVoidArray, create a PLHashTable whose entry struct derives from PLHashEntry
  but mixes in PRCList, so live entries can be kept in a doubly-linked list;
  both unlink and search are thus O(1).

I assume PLHashTable would be preferable to PLDHash because it does not move
entries about in memory?  What are the benefits of this approach over just
sticking the reflow commands in both an array and a PLHashTable/PLDHash and just
using the former to ensure ordering?

And lastly, is this something we want to bother with?  See profile results in
bug 229391 for why I think the answer is yes.
PLDHashTable containing a pointer to the reflow command, in addition to the
nsVoidArray used currently, would use 2 words per entry and would fuse entry
allocations into one power-of-two sized array.  This will win on space unless
the table is badly underloaded.  See (k=3 in the
scenario at hand).

From, viz,

alpha < (esize - 1) / (esize + k)

We have esize = 2, k = 3, and an underloaded table has alpha < .25.  Since .25
is the default minimum alpha, no such underloaded table can exist unless you use

So forget the PLHashTable/PRCList idea, it loses ;-).  Thanks,

Blocks: 203448
Keywords: perf
Oops, math is hard:

alpha < (esize - 1) / (esize + k)

We have esize = 2, k = 3, and an underloaded table has alpha < .20.  Since .25
is the default minimum alpha, it's unlikely that a table could be underloaded in
the [.20, .25] range, but we can use PL_DHashTableSetAlphaBounds to prevent it
from operating in that range.  See the PL_DHASH_MIN_ALPHA macro in pldhash.h.


Argh, my brain is backward still.  What I meant was that if we care, we can let
the table be less loaded than .25 alpha, but not less than .20, and its space
utilization compared to PLHashTable (apart from any cycle concerns with
malloc'ing a lot in the PLHashTable case) will still win.

Blocks: 195408, 237977
Attached patch Proposed patchSplinter Review
Comment on attachment 147475 [details] [diff] [review]
Same as diff -b (For review only)

The changes to CancelReflowCommandInternal are not strictly needed, but I just
couldn't resist.
Attachment #147475 - Flags: superreview?(brendan)
Attachment #147475 - Flags: review?(dbaron)
This makes this function all but disappear from profiles in the bugs this one
blocks (we're talking on the order of 1000 times less time spent in it in some
of those testcases).
Attachment #147475 - Flags: review?(dbaron) → review?(roc)
Comment on attachment 147475 [details] [diff] [review]
Same as diff -b (For review only)

Can you add a comment on AlreadyInQueue that if it returns PR_FALSE, then it
assumes the command will be added to the queue, so the caller must ensure it is
in fact added?
Attachment #147475 - Flags: review?(roc) → review+
I always wondered whether the linear search through the array was a problem :-)
Only when you make the array a few thousand items long and put them in one by
one, searching each time.... ;)

I'll add the comment.
Comment on attachment 147475 [details] [diff] [review]
Same as diff -b (For review only)

>+ReflowCommandHashHashKey(PLDHashTable *table, const void *key)
>+  const nsHTMLReflowCommand* command =
>+    NS_STATIC_CAST(const nsHTMLReflowCommand*, key);
>+  return
>+    NS_PTR_TO_INT32(command->GetTarget()) ^
>+    (command->GetType() << 12) ^
>+    (NS_PTR_TO_INT32(command->GetChildListName()) << 20);

Question/comment, more than nit: Which low-order bits do you expect to be
best-distributed?  nsReflowType is just 2 bits, probably doesn't matter where
you xor it.  The << 20 seems a bit large to me. 

>+ReflowCommandHashMatchEntry(PLDHashTable *table, const PLDHashEntryHdr *entry,
>+                            const void *key)
>+   const ReflowCommandEntry *e =
>+     NS_STATIC_CAST(const ReflowCommandEntry *, entry);
>+   const nsHTMLReflowCommand *command =
>+     NS_STATIC_CAST(const nsHTMLReflowCommand *, key);
>+   const nsHTMLReflowCommand *command2 = e->mCommand;
>+   return command->GetTarget() == command2->GetTarget() &&
>+     command->GetType() == command2->GetType() &&
>+     command->GetChildListName() == command2->GetChildListName();

Uber-nit: swap command and command2 so e->command is loaded earlier, the "2nd"
command is the key, not the entry's key. etc.  Just a thought. in any event -- looks good.

Attachment #147475 - Flags: superreview?(brendan) → superreview+
> Which low-order bits do you expect to be best-distributed?

The target, probably...  The other two quantities can only take on a rather
small set of values, really (one being an enum and the other coming from a
smallish number of nsIAtom* possibilities, which will be sequential).

So the high-order bits of the child list name really don't matter much (since
they'll be identical across child lists).  That mostly justifies the <<20.  I'll
change the <<12 to be <<17 (in case we end up with more reflow types, which is

> swap command and command2

Will do.
If frames are 4- or 8-byte aligned, why not put the reflow type in the low order
bits (xor, future proof a bit)?  I'm assuming you can have different type reflow
commands targeting the same frame.  With multiplicative hash, you want the low
order bits to be best-distributed.

Actually, in the cases when this really matters almost all the reflow commands
will be the same type -- style changed.

Perhaps the right thing to do is simply to bit-shift the target ptr 2 to the right?
Made that change (shifting 2 right for the target) and checked in.  No Tp impact
(as expected).
Closed: 21 years ago
Resolution: --- → FIXED
Assignee: nobody → bzbarsky
Product: Core → Core Graveyard
Component: Layout: Misc Code → Layout
Product: Core Graveyard → Core
You need to log in before you can comment on or make changes to this bug.