Closed Bug 581177 Opened 10 years ago Closed 9 years ago

Have 64 bits for nsINode flags

Categories

(Core :: General, defect, P1)

defect

Tracking

()

RESOLVED FIXED
mozilla5

People

(Reporter: mstange, Assigned: bzbarsky)

References

Details

Attachments

(13 files, 1 obsolete file)

3.91 KB, text/plain
Details
870 bytes, text/plain
Details
13.81 KB, patch
sicking
: review+
Details | Diff | Splinter Review
12.64 KB, patch
sicking
: review+
Details | Diff | Splinter Review
4.39 KB, patch
sicking
: review+
Details | Diff | Splinter Review
14.40 KB, patch
sicking
: review+
Details | Diff | Splinter Review
8.21 KB, patch
sicking
: review+
Details | Diff | Splinter Review
7.58 KB, patch
sicking
: review+
Details | Diff | Splinter Review
7.07 KB, patch
sicking
: review+
Details | Diff | Splinter Review
5.89 KB, patch
sicking
: review+
Details | Diff | Splinter Review
952 bytes, text/plain
Details
481 bytes, application/octet-stream
Details
502 bytes, application/octet-stream
Details
Attached patch v1 (obsolete) — Splinter Review
All 32 bits are taken, I need a new bit in bug 572689, and this seemed like the fastest of the three solutions bz proposed in bug 572689 comment 23.

This patch makes node flags similar to frame state flags which were made 64 bit in bug 570837 and bug 571211.

The cast here seems a little dangerous to me:

    nsSlots* newSlots = CreateSlots();
    if (newSlots) {
      mFlagsOrSlots = reinterpret_cast<nsNodeFlags>(newSlots);

In 32 bit builds, sizeof(newSlots) will be 4 and sizeof(nsNodeFlags) will be 8. Is this guaranteed to work anyway?
Attachment #459586 - Flags: review?(bzbarsky)
If we're just looking for a quickfix, wouldn't separating flags and slots (which we want to do anyway) be better?  I guess not on 64-bit....

> In 32 bit builds, sizeof(newSlots) will be 4 and sizeof(nsNodeFlags) will be 8.
> Is this guaranteed to work anyway?

It should, yes.  I think.
Then again, the worries about that make me think we really should separate slots from flags and be done.
While separating slots from flags seems like the quickest fix, and one that i'm fine with doing to unblock bug 572689, it's a very short term one. We'll undoubtedly run into the same problem in a matter of months, if not weeks, again.

I'm not super exited to use a PRUint64. Won't that result in more code at callsites for 32bit builds?

Alternatively we could have GetFlags and GetFlags2 that simply operate on two separate members.
Or switch some things that are currently flags to just |bool x : 1| or |PRUint32 x : 1| members?  This would be especially easy for, say, the "is an element" flag, which no one really accesses via GetFlags/SetFlags other than nsGenericElement and its subclasses setting it initially....  That would have the same effect as using a PRUint64 memory-wise, but lead to better code for accessing the bits, right?
fwiw, webkit uses bool:1 for this sort of thing.
Yup, that sounds like a great idea. We should definitely check that bool:1 compiles to a compact bitfield on both recent GCCs and MSVCs though. Unless someone already has...
(In reply to comment #4)
> Or switch some things that are currently flags to just |bool x : 1| or
> |PRUint32 x : 1| members?

Would you mind writing that patch? You seem to have a better understanding of what's necessary than I do.
(In reply to comment #4)
> Or switch some things that are currently flags to just |bool x : 1| or
> |PRUint32 x : 1| members?

Can you make a list of these things? I can try to write the patch.

> That would have
> the same effect as using a PRUint64 memory-wise, but lead to better code for
> accessing the bits, right?

Wouldn't it use more memory in a 64 bit build, where the existing flags bitfield is already 64bits wide?
> Can you make a list of these things?

Offhand, NODE_IS_ELEMENT and NODE_HAS_ID and NODE_MAY_HAVE_CLASS and NODE_MAY_HAVE_STYLE can go this way easily.

> Wouldn't it use more memory in a 64 bit build,

Ah, true.

Fwiw, I'm sorry on the lag responding to comment 7; I just hadn't found time to do that patch-writing yet.  :(
Fwiw, I'm fine with using more memory in 64bit builds. We'll rejiggle this stuff soon enough anyway I'm sure
(In reply to comment #6)
> We should definitely check that bool:1
> compiles to a compact bitfield on both recent GCCs and MSVCs though. Unless
> someone already has...

I've checked GCC 4.2.1 on Mac OS 10.6, it works there.

(In reply to comment #9)
> > Can you make a list of these things?
> 
> Offhand, NODE_IS_ELEMENT and NODE_HAS_ID and NODE_MAY_HAVE_CLASS and
> NODE_MAY_HAVE_STYLE can go this way easily.

Actually, is there a reason to convert those now? I could just add the new flag I need in bug 572689 as bool mNodeMayHaveRenderingObservers : 1; without touching existing flags. That would unblock -moz-element from this bug.
That sounds just fine.  Go for it!
Assignee: mstange → nobody
Attachment #459586 - Attachment is obsolete: true
Attachment #459586 - Flags: review?(bzbarsky)
jseward's hot field profiler pointed out that, on 64-bit, the current solution bloats nsINode by 64 bits when we actually have all the bits we need already (since the flags are a union with a pointer).
(In reply to comment #13)
> jseward's hot field profiler pointed out that, on 64-bit, the current solution
> bloats nsINode by 64 bits

What causes this? The fact that nsIContent starts with a 64 bit member variable (the primary frame pointer) which needs to be aligned to the next 64 bit multiple?
Status: ASSIGNED → NEW
Yes, exactly.

I'll take this and see what I can do.
Assignee: nobody → bzbarsky
Priority: -- → P1
Blocks: 439258
The results I get from this program:

gcc 32-bit (mac+linux): 4  8  8 12 12 12
gcc 64-bit (mac+linux): 8 16 16 16 16 16
MSVC 32-bit:            4 12  8 12 12 12
MSVC 64-bit:            8 16 16 16 16 16

So in other words, bool:1 will pack with itself fine on both, but going from unsigned int to bool aligns to a 4-byte boundary on MSVC.
So I started doing the bool thing, then happened to look at the webkit code and realized that they have in the meantime switched to an unsigned int flag word, as part of a checkin to improve performance.  See http://trac.webkit.org/changeset/58914

So I went and looked at what code gcc generates for a getter that returns bool for two cases: boolean bitfield members and an unsigned int member with explicit mask.

For the former case, you get code like this for accessing bit 14; the bitfield is in 4(%rdi):

	movzbl	5(%rdi), %eax
	shrb	$6, %al
	andl	$1, %eax

For the latter case, you get code like this for accessing bit 14; the int is in 0(%rdi):

	movl	(%rdi), %edx
	shrl	$14, %edx
	andl	$1, %edx

Which _seems_ like a wash to me, if the movzbl is just as fast as the movl.  Which is a bit of an if.

I also looked at initialization (which the webkit folks explicitly mention in the checkin comment there).  If I'm initializing two bits (bit 0 and bit 10), then the code for the boolean bitfield looks like this (with the pointer to the object whose members these things are in %rax):

	movb	$1, 4(%rax)
	movzbl	5(%rax), %edx
	andl	$-128, %edx
	orl	$4, %edx
	movb	%dl, 5(%rax)

The corresponding code for the unsigned int is:

	movl	$1025, (%rax)

So indeed initialization is faster for an integer member.  That's 64-bit code all along; 32-bit equivalents coming up.
For 32-bit, access to boolean bitfield bit 14 looks like this (bitfield is in 4(%ecx)): 

	movl	4(%ecx), %eax
	shrl	$14, %eax
	andl	$1, %eax

while the bit 14 access for the unsigned int (which is in (%ecx)) looks like this:

	movl	(%ecx), %edx
	shrl	$14, %edx
	andl	$1, %edx

One difference is that for the int the move into %edx actually happened up front and then was reused for both bit accesses in my test, whereas for the bitfield the dereference of %ecx happened before each bit access.

For initialization of the boolean bitfield we have (object pointer lives in %eax):

	movl	4(%eax), %edx
	andl	$-32768, %edx
	orl	$1025, %edx
	movl	%edx, 4(%eax)

while for the int the initialization is:

	movl	$1025, (%eax)

I did check, and reordering the members didn't change this behavior.

So conclusions....  Codegen for accessing the int seems to be more consistent across 32 and 64-bit, and most likely better than code for accessing the bool bitfield.  Initializing the bool bitfield is _definitely_ worse than initializing an int.  I'll attach my test file in a sec.
Attachment #522930 - Attachment mime type: application/octet-stream → text/plain
I just tested, and the codegen for the bitfield is the same if I use |unsigned int| instead of bool.  So it looks like just having a bitfield makes gcc do byte-by-byte operations.  Given that, I think we should simply have two 32-bit flag words; one for arbitrary flags and one for boolean flags only; effectively the last paragraph of comment 3.  I'll write that patch series tomorrow.
Oh, and all that was with gcc 4.2 on Mac.  I also looked at 4.4 on Linux; some of the register allocation details differ and it's trickier about instruction ordering, but the overall picture is the same: bitfield initialization is slow and byte by byte; reading from a bitfield involves byte ops.
Using bit fields produces a whole lot more readable code though :(

It seems like initialization was the only part that was meaningfully different. Aren't we attempting to construct nodes rarely enough that this won't matter anyway. I would expect malloc costs etc to be >> the cost of initializing these fields.
Talked to sicking a bit, and I think I'm sticking with the two-field plan for now, but abstracting it away enough that we can switch as desired.

Patch series coming up that frees up 6 bits in the old flag word and leaves space for 24 more boolean flags.  I only moved over boolean flags that were not set or tested for in conjunction with other flags and whose setters could be protected.
Whiteboard: [need review]
I just remembered that I think we should include clang/llvm in the compiler tests performed here.  The good part is that we have someone to fix clang for us if it can't generate good code for us.  :-)
This is a modified version of the test program with the methods out of line to make them easy to compare individually.
Interesting things to note:

* LLVM produces the same code for direct bit access and mask.
* I think the llvm code is as good as gcc, except for the method three where it avoids a setne.
* In the combined case gcc does a really smart job at reusing the al register.
Comment on attachment 523223 [details] [diff] [review]
part 1.  Separate mFlags and mSlots into separate members.  Stop storing flags in slots.

>diff --git a/content/xul/content/src/nsXULElement.cpp b/content/xul/content/src/nsXULElement.cpp
>@@ -1,8 +1,9 @@
>+
> /* -*- Mode: C++; tab-width: 4; indent-tabs-mode: nil; c-basic-offset: 4 -*-
>  *
>  * ***** BEGIN LICENSE BLOCK *****
>  * Version: MPL 1.1/GPL 2.0/LGPL 2.1

Revert this. r=me with that.
Attachment #523223 - Flags: review?(jonas) → review+
Comment on attachment 523226 [details] [diff] [review]
part 2.  Separate mParent from the flags it used to cohabit with.

>diff --git a/content/base/public/nsIContent.h b/content/base/public/nsIContent.h
>+inline nsIContent*
>+nsINode::GetParent() const
>+{
>+  return NS_LIKELY(GetBoolFlag(ParentIsContent)) ?
>+         static_cast<nsIContent*>(mParent) :
>+         nsnull;
>+}

Why this change? It seems useful to be able to call GetParent and forward the returned pointer to somewhere without having to include nsIContent.h. You should be able to keep the old reinterpret_cast.

r=me with that changed back.
Attachment #523226 - Flags: review?(jonas) → review+
Comment on attachment 523232 [details] [diff] [review]
part 8.  Compress the flags so it's clear what's free.

But why didn't you do the "may have class" flag as well? Or a large number of the other ones for that matter?
Attachment #523232 - Flags: review?(jonas) → review+
> Revert this.

Done.

> Why this change?

It seemed cleaner, but if you think that there are use cases for not including nsIContent, sure.  I'll revert that.

> But why didn't you do the "may have class" flag as well?

I generally converted, for now, the flags that had the following properties:

1)  Only gotten or set in isolation, not with other flags.
2)  Only needed a protected setter.

"may have class" is set from nsSVGClass, so failed the second test for the moment.
(In reply to comment #40)
> > Why this change?
> 
> It seemed cleaner, but if you think that there are use cases for not including
> nsIContent, sure.  I'll revert that.

You forgot to remove

+   * Implemented in nsIContent.h
Good catch.  I pushed http://hg.mozilla.org/mozilla-central/rev/27d6a4a5e20f to do that and also to rev the nsINode and nsIContent IIDs (something I'd been sure I did in the patches above, but apparently forgot).
You need to log in before you can comment on or make changes to this bug.