Closed Bug 599251 Opened 14 years ago Closed 14 years ago

nanojit: make Register a non-numeric type

Categories

(Core Graveyard :: Nanojit, defect)

x86
All
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: n.nethercote, Assigned: n.nethercote)

References

Details

(Whiteboard: fixed-in-nanojit, fixed-in-tracemonkey)

Attachments

(2 files, 2 obsolete files)

When I started looking at bug 599247 I was disturbed to find that many of the types on all the parameters in the i386 codegen functions are wrong -- many parameters that should be 'Register' are 'int32_t', and vice versa.  Also, the ordering of the disp/base/index/scale parameters is inconsistent between different functions, which is asking for trouble.

This reminded me of another thing that has bitten me, which is typing 'r' instead of 'rmask(r)' or vice versa.  Ie. mixing up the Register and RegisterMask types.

Both these problems can be avoided by making Register a non-integer type.  The attached patch is only partial and very rough, but gives the idea.  Specifically:

- Register is a wrapper around a uint32_t:

    struct Register {
        uint32_t n;     // the register number
    };

- In various places you have to write 'r.n' instead of 'r', eg. when accessing arrays that are indexed by registers.  This isn't very onerous.

- Each Register value now looks like this:

    static const Register EAX = { 0 };  // return value, scratch

And that's most of it.

I think this is a good thing to do.  The only problem is that it requires numerous changes to every back-end.  I can do i386, X64 and ARM myself, but the other four would have to be fixed up by their respective maintainers.

This bug depends on bug 599245 because that cleans up the Nativei386.cpp functions, many of will be touched by this bug as well.
There's no attached patch.

My one concern here is that we verify that all interesting compilers will actually registerize these structs appropriately; at one point in the past I found compilers (I forget which) that apparently decided, "struct? that's not going to fit in a register! let's always put it on the stack!" (regardless of size)

(One of my many pet peeves with C++ is that there's no way to declare a "not-compatible-with-int" enum, which is really what's called for here... sigh)
If we're worried about compiler smarts then the usual trick is to use a struct in DEBUG builds and an int in non-DEBUG builds.  To do that requires mainly that Register is an ADT with functions (including construction logic) that are not methods.
(In reply to comment #1)
> "struct? that's not going to fit in a register! 

I've seen the same, but only in regards to multiple fields being used in the struct.  

I think this change is worth pursuing, so if you're game Nick, we can work together to ensure that all the compilers we worry about do the optimization.
Lars's suggestion will work, actually.  Register is currently a value type and is used as such in all our code, so making it into an opaque ADT for debug-only should be feasible if optimizers balk on struct { int }.
(In reply to comment #4)
> Lars's suggestion will work, actually.

That's what I've done, I'll put up a patch later today.
Here's a draft patch that updates the i386 and X64 back-ends, and partly updates the ARM back-end (leaving it in a broken state).  I only partly did the ARM back-end because (a) I have a very slow virtual machine, (b) it still uses macros for codegen which makes it really hard to work out which values should be integers and which should be Registers.

How should we proceed?  From a TM point-of-view we can't land this until ARM is fixed, which puts the onus on Jacob;  in comparison PPC, MIPS, Sparc and SH4 can be fixed at leisure by their respective maintainers.

Additional thoughts on how to proceed here would be welcome.
Attachment #479021 - Attachment is obsolete: true
Attachment #479675 - Flags: feedback?(edwsmith)
Attachment #479675 - Flags: feedback?(stejohns)
For the problem of mixing RegisterMask and Register, it seems more natural to me to promote RegisterMask to a non-int type. Perhaps it could be done at the same time.
But the initial problem is mixing offset values and register values, hence it is justified to also change Register to non-int.

Just a comment on the patch, you should use a macro for constructing such as:
#ifdef DEBUG
#define Register_INIT(x) { x }
#else
#define Register_INIT(x) (x)

Lars mentionned this point also "... (including construction logic)"
njn, since you're already changing Registers, would you mind rename them to rEAX, rEBX, rRAX, etc. to fix Bug 570726?

I'm sorry I didn't update Bug 570726 because I'm busy on something else.

Thanks!
OS: Mac OS X → Windows XP
(In reply to comment #8)
> For the problem of mixing RegisterMask and Register, it seems more natural to
> me to promote RegisterMask to a non-int type. Perhaps it could be done at the
> same time.
> But the initial problem is mixing offset values and register values, hence it
> is justified to also change Register to non-int.

There are two problems:  mixing offsets with Register, and mixing Register with RegisterMask.  Changing Register to non-int fixes both of them.  I could change RegisterMask to a struct as well but that would just be extra work for no benefit.

> Just a comment on the patch, you should use a macro for constructing such as:
> #ifdef DEBUG
> #define Register_INIT(x) { x }
> #else
> #define Register_INIT(x) (x)
> 
> Lars mentionned this point also "... (including construction logic)"

I originally did this, then I discovered that "int x = { 3 }" is completely legitimate C++ (C, too).  So I figured there is no point introducing an extra macro.
Ok, but the point here is to have an abstract type.
Thus the contruction mechanism should not be neglected, and currently there is no contructor from int to Register. 
If we do not want automatic promotion, a macro can do the work, but there is perhaps other alternatives. Making explicit the shape of the type removes the abstraction and any possible further change.
For instance statements like this in you patch:
Register r5 = { 5 };
would become:
Register r5 = Register_INIT(5);

And statements like this:
Register r = { (sizeof(RegisterMask) == 4) ? msbSet32(mask) : msbSet64(mask) };
Register r = Register_INIT((sizeof(RegisterMask) == 4) ? msbSet32(mask) : msbSet64(mask));

For the regmask, it is an option of course, but errors like this:
Register r = Register_INIT( (sizeof(RegisterMask) == 4) ? msbSet32(mask) : mask );
would be avoided if the Regmask is non-int.
(In reply to comment #11)
> Ok, but the point here is to have an abstract type.

I disagree?  The point is not to have an abstract type, we are not hiding the internal representation of the type, we are just making it be like an integer but cannot be used interchangeably with an integer.  It's quite important that register be integer-like, eg. in i386 various registers have to have specific integer values to match the ISA encoding.

If we add Register_INIT() there's nothing stopping anyone from using the '{ 5 }' form anyway, and Register_INIT() is longer, so I didn't bother with it.
Comment on attachment 479675 [details] [diff] [review]
patch (against TM 54638:da5a3ef843f8)

-            Register reg:7;
+            uint32_t regnum:7;

I think I'll miss the ability to do "p *ins" in gdb and see the register printed by name rather than int, but the benefits of the patch outweigh this niceity.

(In reply to comment #11)
> Register r = { (sizeof(RegisterMask) == 4) ? msbSet32(mask) : msbSet64(mask) };
> Register r = Register_INIT((sizeof(RegisterMask) == 4) ? msbSet32(mask) :
> msbSet64(mask));

As a person reading the code, I like not having to read "Register_INIT(...)" when it doesn't do anything of interest.  { value } is clear; that it works on plain int in C/C++ is great.  (you learn something new every day).

(In reply to comment #7)
> How should we proceed?  From a TM point-of-view we can't land this until ARM is
> fixed, which puts the onus on Jacob;  in comparison PPC, MIPS, Sparc and SH4
> can be fixed at leisure by their respective maintainers.
> 
> Additional thoughts on how to proceed here would be welcome.

Could we organize the code so each backend can select the struct or non-struct wrapper?  Then, backends that aren't done yet can continue to work normally until they are updated, including ARM.  Or, maybe extend the #ifdef in NativeCommon.h to list the backends that are already fixed.

I'm surprised the "typedef uint32_t Register" definition in NativeCommon.h doesn't conflict with the enum definitions in each NativeXXX.h.  But only a little bit... C is lax.

a REGNAME function might be handy for debugging, maybe.  (in gdb:  p REGNAME(r) instead of p regNames[r.n])... or, gpn(r) would work if it was an inline function instead of a macro.
Attachment #479675 - Flags: feedback?(edwsmith) → feedback+
Just a question in regards to NativeCommon.h; Native.h appears to serve a similar purpose.  

Should we move the class/struct declarations from latter to the former?
Attachment #479675 - Flags: feedback?(stejohns) → feedback+
(In reply to comment #13)
> 
> Could we organize the code so each backend can select the struct or non-struct
> wrapper?  Then, backends that aren't done yet can continue to work normally
> until they are updated, including ARM.  Or, maybe extend the #ifdef in
> NativeCommon.h to list the backends that are already fixed.

Nice idea, I'll try to work out a way to do it.

> I'm surprised the "typedef uint32_t Register" definition in NativeCommon.h
> doesn't conflict with the enum definitions in each NativeXXX.h.  But only a
> little bit... C is lax.
> 
> a REGNAME function might be handy for debugging, maybe.  (in gdb:  p REGNAME(r)
> instead of p regNames[r.n])... or, gpn(r) would work if it was an inline
> function instead of a macro.

I'll add something.

(In reply to comment #14)
> Just a question in regards to NativeCommon.h; Native.h appears to serve a
> similar purpose.  
> 
> Should we move the class/struct declarations from latter to the former?

They don't serve the same purpose.  NativeCommon.h contains stuff that is used in all NativeXYZ.{h,cpp} files.  Native.h is what is included by files that want to import a back-end but are being agnostic about which one it is.  A picture may help:

              NativeCommon.h
          /         |          \
  Nativei386.h    NativeX64.h    ...
          \         |          /
                 Native.h

Make sense?
(In reply to comment #13)
> 
> I'm surprised the "typedef uint32_t Register" definition in NativeCommon.h
> doesn't conflict with the enum definitions in each NativeXXX.h.  

It does conflict... which is why I removed the enum from the back-ends I've done so far.
Here's a patch that does i386 and X64 properly, and modifies the other back ends the smallest amount possible to preserve the existing behaviour.  I had to change Register from an enum to a uint32_t, and there's a constant NJ_USE_UINT32_REGISTER that the unchanged back-ends have to define.

I tried testing on the Tamarin buildbot.  I was stymied by the fact that the build logs aren't visible outside the Adobe firewall :(  wmaddox helped me a bit, but I figure the platform maintainers are in a better position to fix my syntax errors!  (From what wmaddox told me, though, I'm worried that the Sparc compiler doesn't accept the "int x = { 3 }" syntax.)  android-compile-sandbox succeeded, so I think the ARM back-end works, at least.

I tried turning gpn() into a function but had compile troubles.  Is typing "regnames[r.n]" much harder than "regnames[r]"?
Attachment #479675 - Attachment is obsolete: true
Attachment #480001 - Flags: review?(edwsmith)
(In reply to comment #7)
> (b) it still uses macros for codegen which makes it really hard to work out which values should
> be integers and which should be Registers.

In bug 592223 I proposed converting all the macros to in-line functions. No, I haven't made any progress yet, but if it's becoming something important (rather than a nice-to-have) I can re-prioritize it.
(In reply to comment #17)

> I tried testing on the Tamarin buildbot.  I was stymied by the fact that the
> build logs aren't visible outside the Adobe firewall :(

Sorry.  I knew this would be a problem and created bug 581664 back in July; I will expedite it now.

> I tried turning gpn() into a function but had compile troubles.  Is typing
> "regnames[r.n]" much harder than "regnames[r]"?

Nope, not much harder. to punt on gpn() is fine with me.
Attachment #480704 - Flags: review?(rreitmai)
Comment on attachment 480001 [details] [diff] [review]
patch, v2 (against TM 54657:b7a7105dc80f)

Looks good.  I'm doing a TR sandbox run now, but I also tried this test case on the solaris builder and it compiled fine:

// test.c
#include <stdio.h>
int main(int argc, char **argv)
{
        int x = { 1 };
        printf("%d\n",x);
}
Attachment #480001 - Flags: review?(edwsmith) → review+
Attachment #480704 - Flags: review?(rreitmai) → review+
http://hg.mozilla.org/tracemonkey/rev/e339ce37e157
http://hg.mozilla.org/tracemonkey/rev/b75687a3151e
Whiteboard: fixed-in-nanojit → fixed-in-nanojit, fixed-in-tracemonkey
(In reply to comment #21)

> // test.c
> #include <stdio.h>
> int main(int argc, char **argv)
> {
>         int x = { 1 };
>         printf("%d\n",x);
> }

unfortunately, if you use CC (C++ compiler), and change the line to
static int x = { 1 };

Solaris Studio will report an error: Expected an expression.
OS: Windows XP → All
Shoot.  I could have sworn that is what I was testing.  I just tried this, and it worked fine.  What are we doing differently?  Is this a bug in Solaris Studio?

// test.cpp
#include <stdio.h>
int main(int argc, char **argv)
{
    static int x = { 1 };
    printf("%d\n",x);
}

$ CC -V
CC: Sun C++ 5.9 SunOS_sparc Patch 124863-17 2009/10/27

$ CC -o test ./test.cpp
$ ./test
1
Blocks: 602387
Sorry, my bad.
It failed on
static const int x = { 1 };
not
int x = { 1 };
or
static int x = { 1 };
or
const int x = { 1 };
(In reply to comment #25)
> Is this a bug in Solaris Studio?

Absolutely (assuming our reading of the C/C++ standard is correct). That said, we still probably need to work around it...
(In reply to comment #26)
This is crazy.
If it is inside a namespace,
const int x = { 1 }; also fails with Solaris Studio C++.

So, unless we use
#if defined(__SUNPRO_CC) && (defined(NJ_USE_UINT32_REGISTER) || !defined(DEBUG))
Register rEAX = 1;
...
#else
Register rEAX = { 1 };
...
#endif

We have to compromise some performance for Solaris Studio.
I mean, to get around the bug of the compiler, we need either always define Register as struct, or use "static Register" instead of "static const Register".

Filed Bug 603560.
http://hg.mozilla.org/mozilla-central/rev/e339ce37e157
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Product: Core → Core Graveyard
You need to log in before you can comment on or make changes to this bug.