Closed Bug 87229 Opened 24 years ago Closed 24 years ago

should nsRuleNode be using jump tables instead of switch statements?

Categories

(Core :: CSS Parsing and Computation, defect, P3)

defect

Tracking

()

RESOLVED FIXED
mozilla0.9.3

People

(Reporter: waterson, Assigned: waterson)

References

Details

(Keywords: perf, Whiteboard: [whitebox])

Attachments

(3 files)

dbaron and I were examining a jprof profile of an iBench run, and noticed that nsRuleNode::GetStyleData() appeared to have an inordinate amount of time spent _in the function_. There are two big inlined switch statements that degenerate to long, sequential test-and-jump runs of instructions. Perhaps using jump tables (specifically for the switch statement in nsRuleNode::GetStyleData, and the inlined nsCachedStyleData::GetStyleData() method) might improve this? I'll take a crack at hacking that up and see if it has any effect.
*** Bug 87230 has been marked as a duplicate of this bug. ***
Status: NEW → ASSIGNED
Priority: -- → P3
Target Milestone: --- → mozilla0.9.3
Keywords: perf
So I went ahead and re-ordered the NS_STYLE_INHERIT_* bits, as it didn't appear that they were in any particular order. Now the bits ``line up'', so GetBitForSID() is trivial: 1 << (aSid - 1) Also, moved the struct info table into the content/shared area so that (in theory) it will link properly in a dynamic build. (I've been hacking on a static build because that's what I have.) Now to see if this makes any difference...
(FWIW, I just disassembled some of the Win32 code, and its switch statements _do_ use jump tables, so it's not likely this will be of any benefit there.)
Wow, even windows can generate some insanely stupid code though. Here's part of nsRuleNode::ComputeStyleData() switch (aSID) { case eStyleStruct_Font: return ComputeFontData((nsStyleFont*)aStartStruct, (const nsCSSFont&) aStartData, aContext, aHighestNode, aRuleDetail, aInherited); case eStyleStruct_Display: return ComputeDisplayData((nsStyleDisplay*)aStartStruct, (const nsCSSDisplay&)aStartData, aContext, aHighestNode, aRuleDetail, aInherited); case eStyleStruct_Visibility: return ComputeVisibilityData((nsStyleVisibility*)aStartStruct, (const nsCSSDisplay&)aStartData, aContext, aHighestNode, aRuleDetail, aInherited); ... And here's the code it generates: 00000000 <? ComputeStyleData@nsRuleNode@@IAEPBUnsStyleStruct@@ABW4nsStyleStructID@@PAU2@ABUn sCSSStruct@@PAVnsIStyleContext@@PAV1@ABW4RuleDetail@1@H@Z>: 0: 55 push %ebp 1: 8b ec mov %esp,%ebp 3: 8b 45 08 mov 0x8(%ebp),%eax 6: 8b 00 mov (%eax),%eax 8: 48 dec %eax 9: 83 f8 13 cmp $0x13,%eax c: 0f 87 25 02 00 00 ja 237 <$L16401+0x19> 12: ff 24 85 00 00 00 00 jmp *0x0(,%eax,4) 00000019 <$L16344>: 19: ff 75 20 pushl 0x20(%ebp) 1c: ff 75 1c pushl 0x1c(%ebp) 1f: ff 75 18 pushl 0x18(%ebp) 22: ff 75 14 pushl 0x14(%ebp) 25: ff 75 10 pushl 0x10(%ebp) 28: ff 75 0c pushl 0xc(%ebp) 2b: e8 00 00 00 00 call 30 <$L16344+0x17> 30: e9 04 02 00 00 jmp 239 <$L16401+0x1b> 00000035 <$L16347>: 35: ff 75 20 pushl 0x20(%ebp) 38: ff 75 1c pushl 0x1c(%ebp) 3b: ff 75 18 pushl 0x18(%ebp) 3e: ff 75 14 pushl 0x14(%ebp) 41: ff 75 10 pushl 0x10(%ebp) 44: ff 75 0c pushl 0xc(%ebp) 47: e8 00 00 00 00 call 4c <$L16347+0x17> 4c: e9 e8 01 00 00 jmp 239 <$L16401+0x1b> 00000051 <$L16350>: 51: ff 75 20 pushl 0x20(%ebp) 54: ff 75 1c pushl 0x1c(%ebp) 57: ff 75 18 pushl 0x18(%ebp) 5a: ff 75 14 pushl 0x14(%ebp) 5d: ff 75 10 pushl 0x10(%ebp) 60: ff 75 0c pushl 0xc(%ebp) 63: e8 00 00 00 00 call 68 <$L16350+0x17> 68: e9 cc 01 00 00 jmp 239 <$L16401+0x1b> 0000006d <$L16353>: 6d: ff 75 20 pushl 0x20(%ebp) 70: ff 75 1c pushl 0x1c(%ebp) 73: ff 75 18 pushl 0x18(%ebp) 76: ff 75 14 pushl 0x14(%ebp) 79: ff 75 10 pushl 0x10(%ebp) 7c: ff 75 0c pushl 0xc(%ebp) 7f: e8 00 00 00 00 call 84 <$L16353+0x17> 84: e9 b0 01 00 00 jmp 239 <$L16401+0x1b> ... At least it's a jump table, but it jumps to the same damn 28-byte sequence 20 times! :-)
Well, maybe it does make a small difference on Windows. It's hard to tell, because I'm doing iBench from the zdnet site over DSL, but here are my numbers: Before: 211.49/83.39/18.30 After: 206.06/82.61/17.63 I'll post some Linux numbers tomorrow.
It occurred to me just now that we could just make the style struct IDs BE the bits, and then you wouldn't even need GetBitForSID.
Seems to be a marginal improvement on Linux. It looks like the gcc on RH7.1 is generating jump tables for switch statements, too. Before 189.31/60.03/18.47 After 186.69/59.85/18.12 Again, noise may account for more swing than this patch, but at least it seems to move in a good direction. hyatt, what do you think? (We should also probably do a jump table instead of nsRuleNode::ComputeStyleData(), and do the casts in each of the ComputeXXXData() members, just to save a few hundred bytes of code space.)
You can make the SIDs be the bits, because then the jump tables won't work.
Er, ``*can't* make the SIDs be the bits.''
sr=hyatt
Keywords: patch
r=attinasi
Ok, verified that the fancy C++ compiles on gcc-2.7.2.3.
Okay, checked in.
Status: ASSIGNED → RESOLVED
Closed: 24 years ago
Resolution: --- → FIXED
Whiteboard: [whitebox]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: