Closed
Bug 581177
Opened 14 years ago
Closed 14 years ago
Have 64 bits for nsINode flags
Categories
(Core :: General, defect, P1)
Core
General
Tracking
()
RESOLVED
FIXED
mozilla5
People
(Reporter: mstange, Assigned: bzbarsky)
References
Details
Attachments
(13 files, 1 obsolete file)
3.91 KB,
text/plain
|
Details | |
870 bytes,
text/plain
|
Details | |
13.81 KB,
patch
|
sicking
:
review+
|
Details | Diff | Splinter Review |
12.64 KB,
patch
|
sicking
:
review+
|
Details | Diff | Splinter Review |
4.39 KB,
patch
|
sicking
:
review+
|
Details | Diff | Splinter Review |
14.40 KB,
patch
|
sicking
:
review+
|
Details | Diff | Splinter Review |
8.21 KB,
patch
|
sicking
:
review+
|
Details | Diff | Splinter Review |
7.58 KB,
patch
|
sicking
:
review+
|
Details | Diff | Splinter Review |
7.07 KB,
patch
|
sicking
:
review+
|
Details | Diff | Splinter Review |
5.89 KB,
patch
|
sicking
:
review+
|
Details | Diff | Splinter Review |
952 bytes,
text/plain
|
Details | |
481 bytes,
application/octet-stream
|
Details | |
502 bytes,
application/octet-stream
|
Details |
All 32 bits are taken, I need a new bit in bug 572689, and this seemed like the fastest of the three solutions bz proposed in bug 572689 comment 23. This patch makes node flags similar to frame state flags which were made 64 bit in bug 570837 and bug 571211. The cast here seems a little dangerous to me: nsSlots* newSlots = CreateSlots(); if (newSlots) { mFlagsOrSlots = reinterpret_cast<nsNodeFlags>(newSlots); In 32 bit builds, sizeof(newSlots) will be 4 and sizeof(nsNodeFlags) will be 8. Is this guaranteed to work anyway?
Attachment #459586 -
Flags: review?(bzbarsky)
Assignee | ||
Comment 1•14 years ago
|
||
If we're just looking for a quickfix, wouldn't separating flags and slots (which we want to do anyway) be better? I guess not on 64-bit....
> In 32 bit builds, sizeof(newSlots) will be 4 and sizeof(nsNodeFlags) will be 8.
> Is this guaranteed to work anyway?
It should, yes. I think.
Assignee | ||
Comment 2•14 years ago
|
||
Then again, the worries about that make me think we really should separate slots from flags and be done.
While separating slots from flags seems like the quickest fix, and one that i'm fine with doing to unblock bug 572689, it's a very short term one. We'll undoubtedly run into the same problem in a matter of months, if not weeks, again. I'm not super exited to use a PRUint64. Won't that result in more code at callsites for 32bit builds? Alternatively we could have GetFlags and GetFlags2 that simply operate on two separate members.
Assignee | ||
Comment 4•14 years ago
|
||
Or switch some things that are currently flags to just |bool x : 1| or |PRUint32 x : 1| members? This would be especially easy for, say, the "is an element" flag, which no one really accesses via GetFlags/SetFlags other than nsGenericElement and its subclasses setting it initially.... That would have the same effect as using a PRUint64 memory-wise, but lead to better code for accessing the bits, right?
Yup, that sounds like a great idea. We should definitely check that bool:1 compiles to a compact bitfield on both recent GCCs and MSVCs though. Unless someone already has...
Reporter | ||
Comment 7•14 years ago
|
||
(In reply to comment #4) > Or switch some things that are currently flags to just |bool x : 1| or > |PRUint32 x : 1| members? Would you mind writing that patch? You seem to have a better understanding of what's necessary than I do.
Reporter | ||
Comment 8•14 years ago
|
||
(In reply to comment #4) > Or switch some things that are currently flags to just |bool x : 1| or > |PRUint32 x : 1| members? Can you make a list of these things? I can try to write the patch. > That would have > the same effect as using a PRUint64 memory-wise, but lead to better code for > accessing the bits, right? Wouldn't it use more memory in a 64 bit build, where the existing flags bitfield is already 64bits wide?
Assignee | ||
Comment 9•14 years ago
|
||
> Can you make a list of these things? Offhand, NODE_IS_ELEMENT and NODE_HAS_ID and NODE_MAY_HAVE_CLASS and NODE_MAY_HAVE_STYLE can go this way easily. > Wouldn't it use more memory in a 64 bit build, Ah, true. Fwiw, I'm sorry on the lag responding to comment 7; I just hadn't found time to do that patch-writing yet. :(
Fwiw, I'm fine with using more memory in 64bit builds. We'll rejiggle this stuff soon enough anyway I'm sure
Reporter | ||
Comment 11•14 years ago
|
||
(In reply to comment #6) > We should definitely check that bool:1 > compiles to a compact bitfield on both recent GCCs and MSVCs though. Unless > someone already has... I've checked GCC 4.2.1 on Mac OS 10.6, it works there. (In reply to comment #9) > > Can you make a list of these things? > > Offhand, NODE_IS_ELEMENT and NODE_HAS_ID and NODE_MAY_HAVE_CLASS and > NODE_MAY_HAVE_STYLE can go this way easily. Actually, is there a reason to convert those now? I could just add the new flag I need in bug 572689 as bool mNodeMayHaveRenderingObservers : 1; without touching existing flags. That would unblock -moz-element from this bug.
Reporter | ||
Updated•14 years ago
|
Assignee: mstange → nobody
Reporter | ||
Updated•14 years ago
|
Attachment #459586 -
Attachment is obsolete: true
Attachment #459586 -
Flags: review?(bzbarsky)
jseward's hot field profiler pointed out that, on 64-bit, the current solution bloats nsINode by 64 bits when we actually have all the bits we need already (since the flags are a union with a pointer).
Reporter | ||
Comment 14•14 years ago
|
||
(In reply to comment #13) > jseward's hot field profiler pointed out that, on 64-bit, the current solution > bloats nsINode by 64 bits What causes this? The fact that nsIContent starts with a 64 bit member variable (the primary frame pointer) which needs to be aligned to the next 64 bit multiple?
Status: ASSIGNED → NEW
Assignee | ||
Comment 15•14 years ago
|
||
Yes, exactly. I'll take this and see what I can do.
Assignee: nobody → bzbarsky
Priority: -- → P1
Assignee | ||
Comment 16•14 years ago
|
||
The results I get from this program: gcc 32-bit (mac+linux): 4 8 8 12 12 12 gcc 64-bit (mac+linux): 8 16 16 16 16 16 MSVC 32-bit: 4 12 8 12 12 12 MSVC 64-bit: 8 16 16 16 16 16 So in other words, bool:1 will pack with itself fine on both, but going from unsigned int to bool aligns to a 4-byte boundary on MSVC.
Assignee | ||
Comment 17•14 years ago
|
||
So I started doing the bool thing, then happened to look at the webkit code and realized that they have in the meantime switched to an unsigned int flag word, as part of a checkin to improve performance. See http://trac.webkit.org/changeset/58914 So I went and looked at what code gcc generates for a getter that returns bool for two cases: boolean bitfield members and an unsigned int member with explicit mask. For the former case, you get code like this for accessing bit 14; the bitfield is in 4(%rdi): movzbl 5(%rdi), %eax shrb $6, %al andl $1, %eax For the latter case, you get code like this for accessing bit 14; the int is in 0(%rdi): movl (%rdi), %edx shrl $14, %edx andl $1, %edx Which _seems_ like a wash to me, if the movzbl is just as fast as the movl. Which is a bit of an if. I also looked at initialization (which the webkit folks explicitly mention in the checkin comment there). If I'm initializing two bits (bit 0 and bit 10), then the code for the boolean bitfield looks like this (with the pointer to the object whose members these things are in %rax): movb $1, 4(%rax) movzbl 5(%rax), %edx andl $-128, %edx orl $4, %edx movb %dl, 5(%rax) The corresponding code for the unsigned int is: movl $1025, (%rax) So indeed initialization is faster for an integer member. That's 64-bit code all along; 32-bit equivalents coming up.
Assignee | ||
Comment 18•14 years ago
|
||
For 32-bit, access to boolean bitfield bit 14 looks like this (bitfield is in 4(%ecx)): movl 4(%ecx), %eax shrl $14, %eax andl $1, %eax while the bit 14 access for the unsigned int (which is in (%ecx)) looks like this: movl (%ecx), %edx shrl $14, %edx andl $1, %edx One difference is that for the int the move into %edx actually happened up front and then was reused for both bit accesses in my test, whereas for the bitfield the dereference of %ecx happened before each bit access. For initialization of the boolean bitfield we have (object pointer lives in %eax): movl 4(%eax), %edx andl $-32768, %edx orl $1025, %edx movl %edx, 4(%eax) while for the int the initialization is: movl $1025, (%eax) I did check, and reordering the members didn't change this behavior. So conclusions.... Codegen for accessing the int seems to be more consistent across 32 and 64-bit, and most likely better than code for accessing the bool bitfield. Initializing the bool bitfield is _definitely_ worse than initializing an int. I'll attach my test file in a sec.
Assignee | ||
Comment 19•14 years ago
|
||
Assignee | ||
Updated•14 years ago
|
Attachment #522930 -
Attachment mime type: application/octet-stream → text/plain
Assignee | ||
Comment 20•14 years ago
|
||
I just tested, and the codegen for the bitfield is the same if I use |unsigned int| instead of bool. So it looks like just having a bitfield makes gcc do byte-by-byte operations. Given that, I think we should simply have two 32-bit flag words; one for arbitrary flags and one for boolean flags only; effectively the last paragraph of comment 3. I'll write that patch series tomorrow.
Assignee | ||
Comment 21•14 years ago
|
||
Oh, and all that was with gcc 4.2 on Mac. I also looked at 4.4 on Linux; some of the register allocation details differ and it's trickier about instruction ordering, but the overall picture is the same: bitfield initialization is slow and byte by byte; reading from a bitfield involves byte ops.
Using bit fields produces a whole lot more readable code though :( It seems like initialization was the only part that was meaningfully different. Aren't we attempting to construct nodes rarely enough that this won't matter anyway. I would expect malloc costs etc to be >> the cost of initializing these fields.
Assignee | ||
Comment 23•14 years ago
|
||
Talked to sicking a bit, and I think I'm sticking with the two-field plan for now, but abstracting it away enough that we can switch as desired. Patch series coming up that frees up 6 bits in the old flag word and leaves space for 24 more boolean flags. I only moved over boolean flags that were not set or tested for in conjunction with other flags and whose setters could be protected.
Assignee | ||
Updated•14 years ago
|
Whiteboard: [need review]
Comment 32•14 years ago
|
||
I just remembered that I think we should include clang/llvm in the compiler tests performed here. The good part is that we have someone to fix clang for us if it can't generate good code for us. :-)
This is a modified version of the test program with the methods out of line to make them easy to compare individually.
Interesting things to note: * LLVM produces the same code for direct bit access and mask. * I think the llvm code is as good as gcc, except for the method three where it avoids a setne. * In the combined case gcc does a really smart job at reusing the al register.
Comment on attachment 523223 [details] [diff] [review] part 1. Separate mFlags and mSlots into separate members. Stop storing flags in slots. >diff --git a/content/xul/content/src/nsXULElement.cpp b/content/xul/content/src/nsXULElement.cpp >@@ -1,8 +1,9 @@ >+ > /* -*- Mode: C++; tab-width: 4; indent-tabs-mode: nil; c-basic-offset: 4 -*- > * > * ***** BEGIN LICENSE BLOCK ***** > * Version: MPL 1.1/GPL 2.0/LGPL 2.1 Revert this. r=me with that.
Attachment #523223 -
Flags: review?(jonas) → review+
Comment on attachment 523226 [details] [diff] [review] part 2. Separate mParent from the flags it used to cohabit with. >diff --git a/content/base/public/nsIContent.h b/content/base/public/nsIContent.h >+inline nsIContent* >+nsINode::GetParent() const >+{ >+ return NS_LIKELY(GetBoolFlag(ParentIsContent)) ? >+ static_cast<nsIContent*>(mParent) : >+ nsnull; >+} Why this change? It seems useful to be able to call GetParent and forward the returned pointer to somewhere without having to include nsIContent.h. You should be able to keep the old reinterpret_cast. r=me with that changed back.
Attachment #523226 -
Flags: review?(jonas) → review+
Attachment #523227 -
Flags: review?(jonas) → review+
Attachment #523229 -
Flags: review?(jonas) → review+
Attachment #523228 -
Flags: review?(jonas) → review+
Attachment #523231 -
Flags: review?(jonas) → review+
Comment on attachment 523232 [details] [diff] [review] part 8. Compress the flags so it's clear what's free. But why didn't you do the "may have class" flag as well? Or a large number of the other ones for that matter?
Attachment #523232 -
Flags: review?(jonas) → review+
Attachment #523230 -
Flags: review?(jonas) → review+
Assignee | ||
Comment 40•14 years ago
|
||
> Revert this. Done. > Why this change? It seemed cleaner, but if you think that there are use cases for not including nsIContent, sure. I'll revert that. > But why didn't you do the "may have class" flag as well? I generally converted, for now, the flags that had the following properties: 1) Only gotten or set in isolation, not with other flags. 2) Only needed a protected setter. "may have class" is set from nsSVGClass, so failed the second test for the moment.
Assignee | ||
Comment 41•14 years ago
|
||
http://hg.mozilla.org/mozilla-central/rev/b7c8a433b9e5 http://hg.mozilla.org/mozilla-central/rev/9ac889e59c24 http://hg.mozilla.org/mozilla-central/rev/c516d83c8519 http://hg.mozilla.org/mozilla-central/rev/45eb833bbc84 http://hg.mozilla.org/mozilla-central/rev/bb540a56e9ee http://hg.mozilla.org/mozilla-central/rev/f4a7ad868b50 http://hg.mozilla.org/mozilla-central/rev/b8ec52d26284 http://hg.mozilla.org/mozilla-central/rev/2d69df8b61dc
Status: NEW → RESOLVED
Closed: 14 years ago
Flags: in-testsuite-
Resolution: --- → FIXED
Whiteboard: [need review]
Target Milestone: --- → mozilla2.2
Comment 42•14 years ago
|
||
(In reply to comment #40) > > Why this change? > > It seemed cleaner, but if you think that there are use cases for not including > nsIContent, sure. I'll revert that. You forgot to remove + * Implemented in nsIContent.h
Assignee | ||
Comment 43•14 years ago
|
||
Good catch. I pushed http://hg.mozilla.org/mozilla-central/rev/27d6a4a5e20f to do that and also to rev the nsINode and nsIContent IIDs (something I'd been sure I did in the patches above, but apparently forgot).
You need to log in
before you can comment on or make changes to this bug.
Description
•