Closed
Bug 87229
Opened 23 years ago
Closed 23 years ago
should nsRuleNode be using jump tables instead of switch statements?
Categories
(Core :: CSS Parsing and Computation, defect, P3)
Core
CSS Parsing and Computation
Tracking
()
RESOLVED
FIXED
mozilla0.9.3
People
(Reporter: waterson, Assigned: waterson)
References
Details
(Keywords: perf, Whiteboard: [whitebox])
Attachments
(3 files)
11.73 KB,
patch
|
Details | Diff | Splinter Review | |
17.90 KB,
patch
|
Details | Diff | Splinter Review | |
114.40 KB,
patch
|
Details | Diff | Splinter Review |
dbaron and I were examining a jprof profile of an iBench run, and noticed that nsRuleNode::GetStyleData() appeared to have an inordinate amount of time spent _in the function_. There are two big inlined switch statements that degenerate to long, sequential test-and-jump runs of instructions. Perhaps using jump tables (specifically for the switch statement in nsRuleNode::GetStyleData, and the inlined nsCachedStyleData::GetStyleData() method) might improve this? I'll take a crack at hacking that up and see if it has any effect.
Assignee | ||
Updated•23 years ago
|
Status: NEW → ASSIGNED
Priority: -- → P3
Target Milestone: --- → mozilla0.9.3
Assignee | ||
Comment 2•23 years ago
|
||
Assignee | ||
Comment 3•23 years ago
|
||
Assignee | ||
Comment 4•23 years ago
|
||
So I went ahead and re-ordered the NS_STYLE_INHERIT_* bits, as it didn't appear that they were in any particular order. Now the bits ``line up'', so GetBitForSID() is trivial: 1 << (aSid - 1) Also, moved the struct info table into the content/shared area so that (in theory) it will link properly in a dynamic build. (I've been hacking on a static build because that's what I have.) Now to see if this makes any difference...
Assignee | ||
Comment 5•23 years ago
|
||
(FWIW, I just disassembled some of the Win32 code, and its switch statements _do_ use jump tables, so it's not likely this will be of any benefit there.)
Assignee | ||
Comment 6•23 years ago
|
||
Wow, even windows can generate some insanely stupid code though. Here's part of nsRuleNode::ComputeStyleData() switch (aSID) { case eStyleStruct_Font: return ComputeFontData((nsStyleFont*)aStartStruct, (const nsCSSFont&) aStartData, aContext, aHighestNode, aRuleDetail, aInherited); case eStyleStruct_Display: return ComputeDisplayData((nsStyleDisplay*)aStartStruct, (const nsCSSDisplay&)aStartData, aContext, aHighestNode, aRuleDetail, aInherited); case eStyleStruct_Visibility: return ComputeVisibilityData((nsStyleVisibility*)aStartStruct, (const nsCSSDisplay&)aStartData, aContext, aHighestNode, aRuleDetail, aInherited); ... And here's the code it generates: 00000000 <? ComputeStyleData@nsRuleNode@@IAEPBUnsStyleStruct@@ABW4nsStyleStructID@@PAU2@ABUn sCSSStruct@@PAVnsIStyleContext@@PAV1@ABW4RuleDetail@1@H@Z>: 0: 55 push %ebp 1: 8b ec mov %esp,%ebp 3: 8b 45 08 mov 0x8(%ebp),%eax 6: 8b 00 mov (%eax),%eax 8: 48 dec %eax 9: 83 f8 13 cmp $0x13,%eax c: 0f 87 25 02 00 00 ja 237 <$L16401+0x19> 12: ff 24 85 00 00 00 00 jmp *0x0(,%eax,4) 00000019 <$L16344>: 19: ff 75 20 pushl 0x20(%ebp) 1c: ff 75 1c pushl 0x1c(%ebp) 1f: ff 75 18 pushl 0x18(%ebp) 22: ff 75 14 pushl 0x14(%ebp) 25: ff 75 10 pushl 0x10(%ebp) 28: ff 75 0c pushl 0xc(%ebp) 2b: e8 00 00 00 00 call 30 <$L16344+0x17> 30: e9 04 02 00 00 jmp 239 <$L16401+0x1b> 00000035 <$L16347>: 35: ff 75 20 pushl 0x20(%ebp) 38: ff 75 1c pushl 0x1c(%ebp) 3b: ff 75 18 pushl 0x18(%ebp) 3e: ff 75 14 pushl 0x14(%ebp) 41: ff 75 10 pushl 0x10(%ebp) 44: ff 75 0c pushl 0xc(%ebp) 47: e8 00 00 00 00 call 4c <$L16347+0x17> 4c: e9 e8 01 00 00 jmp 239 <$L16401+0x1b> 00000051 <$L16350>: 51: ff 75 20 pushl 0x20(%ebp) 54: ff 75 1c pushl 0x1c(%ebp) 57: ff 75 18 pushl 0x18(%ebp) 5a: ff 75 14 pushl 0x14(%ebp) 5d: ff 75 10 pushl 0x10(%ebp) 60: ff 75 0c pushl 0xc(%ebp) 63: e8 00 00 00 00 call 68 <$L16350+0x17> 68: e9 cc 01 00 00 jmp 239 <$L16401+0x1b> 0000006d <$L16353>: 6d: ff 75 20 pushl 0x20(%ebp) 70: ff 75 1c pushl 0x1c(%ebp) 73: ff 75 18 pushl 0x18(%ebp) 76: ff 75 14 pushl 0x14(%ebp) 79: ff 75 10 pushl 0x10(%ebp) 7c: ff 75 0c pushl 0xc(%ebp) 7f: e8 00 00 00 00 call 84 <$L16353+0x17> 84: e9 b0 01 00 00 jmp 239 <$L16401+0x1b> ... At least it's a jump table, but it jumps to the same damn 28-byte sequence 20 times! :-)
Assignee | ||
Comment 7•23 years ago
|
||
Well, maybe it does make a small difference on Windows. It's hard to tell, because I'm doing iBench from the zdnet site over DSL, but here are my numbers: Before: 211.49/83.39/18.30 After: 206.06/82.61/17.63 I'll post some Linux numbers tomorrow.
Comment 8•23 years ago
|
||
It occurred to me just now that we could just make the style struct IDs BE the bits, and then you wouldn't even need GetBitForSID.
Assignee | ||
Comment 9•23 years ago
|
||
Seems to be a marginal improvement on Linux. It looks like the gcc on RH7.1 is generating jump tables for switch statements, too. Before 189.31/60.03/18.47 After 186.69/59.85/18.12 Again, noise may account for more swing than this patch, but at least it seems to move in a good direction. hyatt, what do you think? (We should also probably do a jump table instead of nsRuleNode::ComputeStyleData(), and do the casts in each of the ComputeXXXData() members, just to save a few hundred bytes of code space.)
Assignee | ||
Comment 10•23 years ago
|
||
You can make the SIDs be the bits, because then the jump tables won't work.
Assignee | ||
Comment 11•23 years ago
|
||
Er, ``*can't* make the SIDs be the bits.''
Assignee | ||
Comment 12•23 years ago
|
||
Comment 13•23 years ago
|
||
sr=hyatt
Comment 14•23 years ago
|
||
r=attinasi
Assignee | ||
Comment 15•23 years ago
|
||
Ok, verified that the fancy C++ compiles on gcc-2.7.2.3.
Assignee | ||
Comment 16•23 years ago
|
||
Okay, checked in.
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
Updated•22 years ago
|
Whiteboard: [whitebox]
You need to log in
before you can comment on or make changes to this bug.
Description
•