Closed
Bug 87229
Opened 24 years ago
Closed 24 years ago
should nsRuleNode be using jump tables instead of switch statements?
Categories
(Core :: CSS Parsing and Computation, defect, P3)
Core
CSS Parsing and Computation
Tracking
()
RESOLVED
FIXED
mozilla0.9.3
People
(Reporter: waterson, Assigned: waterson)
References
Details
(Keywords: perf, Whiteboard: [whitebox])
Attachments
(3 files)
11.73 KB,
patch
|
Details | Diff | Splinter Review | |
17.90 KB,
patch
|
Details | Diff | Splinter Review | |
114.40 KB,
patch
|
Details | Diff | Splinter Review |
dbaron and I were examining a jprof profile of an iBench run, and noticed that
nsRuleNode::GetStyleData() appeared to have an inordinate amount of time spent
_in the function_. There are two big inlined switch statements that degenerate
to long, sequential test-and-jump runs of instructions. Perhaps using jump
tables (specifically for the switch statement in nsRuleNode::GetStyleData, and
the inlined nsCachedStyleData::GetStyleData() method) might improve this?
I'll take a crack at hacking that up and see if it has any effect.
Assignee | ||
Updated•24 years ago
|
Status: NEW → ASSIGNED
Priority: -- → P3
Target Milestone: --- → mozilla0.9.3
Assignee | ||
Comment 2•24 years ago
|
||
Assignee | ||
Comment 3•24 years ago
|
||
Assignee | ||
Comment 4•24 years ago
|
||
So I went ahead and re-ordered the NS_STYLE_INHERIT_* bits, as it didn't appear
that they were in any particular order. Now the bits ``line up'', so
GetBitForSID() is trivial:
1 << (aSid - 1)
Also, moved the struct info table into the content/shared area so that (in
theory) it will link properly in a dynamic build. (I've been hacking on a static
build because that's what I have.) Now to see if this makes any difference...
Assignee | ||
Comment 5•24 years ago
|
||
(FWIW, I just disassembled some of the Win32 code, and its switch statements
_do_ use jump tables, so it's not likely this will be of any benefit there.)
Assignee | ||
Comment 6•24 years ago
|
||
Wow, even windows can generate some insanely stupid code though. Here's part of
nsRuleNode::ComputeStyleData()
switch (aSID) {
case eStyleStruct_Font:
return ComputeFontData((nsStyleFont*)aStartStruct,
(const nsCSSFont&) aStartData,
aContext, aHighestNode, aRuleDetail, aInherited);
case eStyleStruct_Display:
return ComputeDisplayData((nsStyleDisplay*)aStartStruct,
(const nsCSSDisplay&)aStartData,
aContext, aHighestNode, aRuleDetail, aInherited);
case eStyleStruct_Visibility:
return ComputeVisibilityData((nsStyleVisibility*)aStartStruct,
(const nsCSSDisplay&)aStartData,
aContext, aHighestNode, aRuleDetail,
aInherited);
...
And here's the code it generates:
00000000 <?
ComputeStyleData@nsRuleNode@@IAEPBUnsStyleStruct@@ABW4nsStyleStructID@@PAU2@ABUn
sCSSStruct@@PAVnsIStyleContext@@PAV1@ABW4RuleDetail@1@H@Z>:
0: 55 push %ebp
1: 8b ec mov %esp,%ebp
3: 8b 45 08 mov 0x8(%ebp),%eax
6: 8b 00 mov (%eax),%eax
8: 48 dec %eax
9: 83 f8 13 cmp $0x13,%eax
c: 0f 87 25 02 00 00 ja 237 <$L16401+0x19>
12: ff 24 85 00 00 00 00 jmp *0x0(,%eax,4)
00000019 <$L16344>:
19: ff 75 20 pushl 0x20(%ebp)
1c: ff 75 1c pushl 0x1c(%ebp)
1f: ff 75 18 pushl 0x18(%ebp)
22: ff 75 14 pushl 0x14(%ebp)
25: ff 75 10 pushl 0x10(%ebp)
28: ff 75 0c pushl 0xc(%ebp)
2b: e8 00 00 00 00 call 30 <$L16344+0x17>
30: e9 04 02 00 00 jmp 239 <$L16401+0x1b>
00000035 <$L16347>:
35: ff 75 20 pushl 0x20(%ebp)
38: ff 75 1c pushl 0x1c(%ebp)
3b: ff 75 18 pushl 0x18(%ebp)
3e: ff 75 14 pushl 0x14(%ebp)
41: ff 75 10 pushl 0x10(%ebp)
44: ff 75 0c pushl 0xc(%ebp)
47: e8 00 00 00 00 call 4c <$L16347+0x17>
4c: e9 e8 01 00 00 jmp 239 <$L16401+0x1b>
00000051 <$L16350>:
51: ff 75 20 pushl 0x20(%ebp)
54: ff 75 1c pushl 0x1c(%ebp)
57: ff 75 18 pushl 0x18(%ebp)
5a: ff 75 14 pushl 0x14(%ebp)
5d: ff 75 10 pushl 0x10(%ebp)
60: ff 75 0c pushl 0xc(%ebp)
63: e8 00 00 00 00 call 68 <$L16350+0x17>
68: e9 cc 01 00 00 jmp 239 <$L16401+0x1b>
0000006d <$L16353>:
6d: ff 75 20 pushl 0x20(%ebp)
70: ff 75 1c pushl 0x1c(%ebp)
73: ff 75 18 pushl 0x18(%ebp)
76: ff 75 14 pushl 0x14(%ebp)
79: ff 75 10 pushl 0x10(%ebp)
7c: ff 75 0c pushl 0xc(%ebp)
7f: e8 00 00 00 00 call 84 <$L16353+0x17>
84: e9 b0 01 00 00 jmp 239 <$L16401+0x1b>
...
At least it's a jump table, but it jumps to the same damn 28-byte sequence 20
times! :-)
Assignee | ||
Comment 7•24 years ago
|
||
Well, maybe it does make a small difference on Windows. It's hard to tell,
because I'm doing iBench from the zdnet site over DSL, but here are my numbers:
Before: 211.49/83.39/18.30
After: 206.06/82.61/17.63
I'll post some Linux numbers tomorrow.
Comment 8•24 years ago
|
||
It occurred to me just now that we could just make the style struct IDs BE the
bits, and then you wouldn't even need GetBitForSID.
Assignee | ||
Comment 9•24 years ago
|
||
Seems to be a marginal improvement on Linux. It looks like the gcc on RH7.1 is
generating jump tables for switch statements, too.
Before 189.31/60.03/18.47
After 186.69/59.85/18.12
Again, noise may account for more swing than this patch, but at least it seems
to move in a good direction. hyatt, what do you think?
(We should also probably do a jump table instead of
nsRuleNode::ComputeStyleData(), and do the casts in each of the ComputeXXXData()
members, just to save a few hundred bytes of code space.)
Assignee | ||
Comment 10•24 years ago
|
||
You can make the SIDs be the bits, because then the jump tables won't work.
Assignee | ||
Comment 11•24 years ago
|
||
Er, ``*can't* make the SIDs be the bits.''
Assignee | ||
Comment 12•24 years ago
|
||
Comment 13•24 years ago
|
||
sr=hyatt
Comment 14•24 years ago
|
||
r=attinasi
Assignee | ||
Comment 15•24 years ago
|
||
Ok, verified that the fancy C++ compiles on gcc-2.7.2.3.
Assignee | ||
Comment 16•24 years ago
|
||
Okay, checked in.
Status: ASSIGNED → RESOLVED
Closed: 24 years ago
Resolution: --- → FIXED
Updated•22 years ago
|
Whiteboard: [whitebox]
You need to log in
before you can comment on or make changes to this bug.
Description
•