Closed Bug 87229 Opened 23 years ago Closed 23 years ago

should nsRuleNode be using jump tables instead of switch statements?

Categories

(Core :: CSS Parsing and Computation, defect, P3)

defect

Tracking

()

RESOLVED FIXED
mozilla0.9.3

People

(Reporter: waterson, Assigned: waterson)

References

Details

(Keywords: perf, Whiteboard: [whitebox])

Attachments

(3 files)

dbaron and I were examining a jprof profile of an iBench run, and noticed that
nsRuleNode::GetStyleData() appeared to have an inordinate amount of time spent
_in the function_. There are two big inlined switch statements that degenerate
to long, sequential test-and-jump runs of instructions. Perhaps using jump
tables (specifically for the switch statement in nsRuleNode::GetStyleData, and
the inlined nsCachedStyleData::GetStyleData() method) might improve this?

I'll take a crack at hacking that up and see if it has any effect.
*** Bug 87230 has been marked as a duplicate of this bug. ***
Status: NEW → ASSIGNED
Priority: -- → P3
Target Milestone: --- → mozilla0.9.3
Keywords: perf
So I went ahead and re-ordered the NS_STYLE_INHERIT_* bits, as it didn't appear
that they were in any particular order. Now the bits ``line up'', so
GetBitForSID() is trivial:

  1 << (aSid - 1)

Also, moved the struct info table into the content/shared area so that (in
theory) it will link properly in a dynamic build. (I've been hacking on a static
build because that's what I have.) Now to see if this makes any difference...
(FWIW, I just disassembled some of the Win32 code, and its switch statements
_do_ use jump tables, so it's not likely this will be of any benefit there.)
Wow, even windows can generate some insanely stupid code though. Here's part of 
nsRuleNode::ComputeStyleData()

  switch (aSID) {
  case eStyleStruct_Font:
    return ComputeFontData((nsStyleFont*)aStartStruct,
                           (const nsCSSFont&) aStartData,
                           aContext, aHighestNode, aRuleDetail, aInherited);
  case eStyleStruct_Display:
    return ComputeDisplayData((nsStyleDisplay*)aStartStruct,
                              (const nsCSSDisplay&)aStartData,
                              aContext, aHighestNode, aRuleDetail, aInherited);
  case eStyleStruct_Visibility:
    return ComputeVisibilityData((nsStyleVisibility*)aStartStruct,
                                 (const nsCSSDisplay&)aStartData,
                                 aContext, aHighestNode, aRuleDetail,
                                 aInherited);

...

And here's the code it generates:

00000000 <?
ComputeStyleData@nsRuleNode@@IAEPBUnsStyleStruct@@ABW4nsStyleStructID@@PAU2@ABUn
sCSSStruct@@PAVnsIStyleContext@@PAV1@ABW4RuleDetail@1@H@Z>:
   0:	55                   	push   %ebp
   1:	8b ec                	mov    %esp,%ebp
   3:	8b 45 08             	mov    0x8(%ebp),%eax
   6:	8b 00                	mov    (%eax),%eax
   8:	48                   	dec    %eax
   9:	83 f8 13             	cmp    $0x13,%eax
   c:	0f 87 25 02 00 00    	ja     237 <$L16401+0x19>
  12:	ff 24 85 00 00 00 00 	jmp    *0x0(,%eax,4)

00000019 <$L16344>:
  19:	ff 75 20             	pushl  0x20(%ebp)
  1c:	ff 75 1c             	pushl  0x1c(%ebp)
  1f:	ff 75 18             	pushl  0x18(%ebp)
  22:	ff 75 14             	pushl  0x14(%ebp)
  25:	ff 75 10             	pushl  0x10(%ebp)
  28:	ff 75 0c             	pushl  0xc(%ebp)
  2b:	e8 00 00 00 00       	call   30 <$L16344+0x17>
  30:	e9 04 02 00 00       	jmp    239 <$L16401+0x1b>

00000035 <$L16347>:
  35:	ff 75 20             	pushl  0x20(%ebp)
  38:	ff 75 1c             	pushl  0x1c(%ebp)
  3b:	ff 75 18             	pushl  0x18(%ebp)
  3e:	ff 75 14             	pushl  0x14(%ebp)
  41:	ff 75 10             	pushl  0x10(%ebp)
  44:	ff 75 0c             	pushl  0xc(%ebp)
  47:	e8 00 00 00 00       	call   4c <$L16347+0x17>
  4c:	e9 e8 01 00 00       	jmp    239 <$L16401+0x1b>

00000051 <$L16350>:
  51:	ff 75 20             	pushl  0x20(%ebp)
  54:	ff 75 1c             	pushl  0x1c(%ebp)
  57:	ff 75 18             	pushl  0x18(%ebp)
  5a:	ff 75 14             	pushl  0x14(%ebp)
  5d:	ff 75 10             	pushl  0x10(%ebp)
  60:	ff 75 0c             	pushl  0xc(%ebp)
  63:	e8 00 00 00 00       	call   68 <$L16350+0x17>
  68:	e9 cc 01 00 00       	jmp    239 <$L16401+0x1b>

0000006d <$L16353>:
  6d:	ff 75 20             	pushl  0x20(%ebp)
  70:	ff 75 1c             	pushl  0x1c(%ebp)
  73:	ff 75 18             	pushl  0x18(%ebp)
  76:	ff 75 14             	pushl  0x14(%ebp)
  79:	ff 75 10             	pushl  0x10(%ebp)
  7c:	ff 75 0c             	pushl  0xc(%ebp)
  7f:	e8 00 00 00 00       	call   84 <$L16353+0x17>
  84:	e9 b0 01 00 00       	jmp    239 <$L16401+0x1b>

...

At least it's a jump table, but it jumps to the same damn 28-byte sequence 20 
times! :-)
Well, maybe it does make a small difference on Windows. It's hard to tell,
because I'm doing iBench from the zdnet site over DSL, but here are my numbers:

Before: 211.49/83.39/18.30
After:  206.06/82.61/17.63

I'll post some Linux numbers tomorrow.
It occurred to me just now that we could just make the style struct IDs BE the
bits, and then you wouldn't even need GetBitForSID.
Seems to be a marginal improvement on Linux. It looks like the gcc on RH7.1 is
generating jump tables for switch statements, too.

Before 189.31/60.03/18.47
After  186.69/59.85/18.12

Again, noise may account for more swing than this patch, but at least it seems
to move in a good direction. hyatt, what do you think?

(We should also probably do a jump table instead of
nsRuleNode::ComputeStyleData(), and do the casts in each of the ComputeXXXData()
members, just to save a few hundred bytes of code space.)
You can make the SIDs be the bits, because then the jump tables won't work.
Er, ``*can't* make the SIDs be the bits.''
sr=hyatt
Keywords: patch
r=attinasi
Ok, verified that the fancy C++ compiles on gcc-2.7.2.3.
Okay, checked in.
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
Whiteboard: [whitebox]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: