Closed Bug 585604 Opened 14 years ago Closed 14 years ago

Avoid some ARM CPU arch related runtime tests depending on the build target

Categories

(Core Graveyard :: Nanojit, enhancement)

ARM
All
enhancement
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: glandium, Assigned: glandium)

References

Details

(Whiteboard: fixed-in-nanojit, fixed-in-tamarin, fixed-in-tracemonkey)

Attachments

(2 files, 3 obsolete files)

Attached patch Proposed patch (against m-c) (obsolete) — Splinter Review
Currently, some parts of the code are protected with _config.arm_arch tests. Depending on the ARM target used by the compiler, some of these tests may be pointless. For example, when the target is armv7t, it's pointless to test whether the runtime arch is armv5t or more, considering other parts of the code will already have failed on armv5t because it's compiled for armv7t.

The proposed patch introduces a ARM_ARCH_AT_LEAST macro that does both build time and run time tests. Optimization will get rid of useless branches.
Attachment #464047 - Attachment is patch: true
Attachment #464047 - Attachment mime type: application/octet-stream → text/plain
Attachment #464047 - Flags: review?(Jacob.Bramley)
Blocks: 552624
Comment on attachment 464047 [details] [diff] [review]
Proposed patch (against m-c)

Nice!

>+#define ARM_ARCH_AT_LEAST(wanted) ((NJ_COMPILER_ARM_ARCH >= wanted) || (_config.arm_arch >= wanted))

The parameters in the expansion ought to be bracketed by themselves in case a compound expression is passed that changes the meaning.

----

Otherwise, it looks good.

Could we have the same functionality for VFP? From memory, I think that condition is tested rather more frequently than the architecture version.
Attachment #464047 - Flags: review?(Jacob.Bramley) → review+
(In reply to comment #1)
> Comment on attachment 464047 [details] [diff] [review]
> Proposed patch (against m-c)
> 
> Nice!
> 
> >+#define ARM_ARCH_AT_LEAST(wanted) ((NJ_COMPILER_ARM_ARCH >= wanted) || (_config.arm_arch >= wanted))
> 
> The parameters in the expansion ought to be bracketed by themselves in case a
> compound expression is passed that changes the meaning.
> 
> ----
> 
> Otherwise, it looks good.

I actually spotted 3 more arm_arch in NativeARM.h.

> Could we have the same functionality for VFP? From memory, I think that
> condition is tested rather more frequently than the architecture version.

It should be possible, but the only gain would be for architectures without vfp (though arguably, they are the slower ones). Also, I'm not sure there would be a better build time test than testing arch < 7.
That would look like this. (totally untested, and i don't have access to enough different arm systems to do a full test).
Assignee: nobody → mh+mozilla
Attachment #464047 - Attachment is obsolete: true
Attachment #464054 - Flags: review?(Jacob.Bramley)
The definition of ARM_VFP changes existing logic:

   #define ARM_VFP ((NJ_COMPILER_ARM_ARCH >= 7) && (_config.arm_vfp))

Existing code merely tested the runtime flag, whereas the new code requires the compiler arch to be >= 7. This meant that previously we could compile for a baseline architecture, then opportunistically use VFP at JIT time if present. The new code precludes that. (I'm not sure if it's an important use case or not, but it *is* a difference.)

Perhaps you really meant

    #define ARM_VFP ((NJ_COMPILER_ARM_ARCH >= 7) || (_config.arm_vfp))


?
> Existing code merely tested the runtime flag, whereas the new code requires the
> compiler arch to be >= 7. This meant that previously we could compile for a
> baseline architecture, then opportunistically use VFP at JIT time if present.
> The new code precludes that. (I'm not sure if it's an important use case or
> not, but it *is* a difference.)

My reasoning was flawed. I was thinking the other way around, and thought that it would make sense to keep the possibility to disable vfp when supported. Obviously, this is not compatible with being able to enable it at runtime when supported while nanojit was built for an older target.
If keeping vfp disableable at all times, then using the ARM_VFP is simply not possible, either way.
Thinking a bit more about it, maybe we could make the definitions different for normal and debug builds.
Something like:
#ifdef DEBUG
#define ARM_VFP ((NJ_COMPILER_ARM_ARCH >= 7) || (_config.arm_vfp))
#else
#define ARM_VFP (_config.arm_vfp)
#endif

The same could be done for ARM_ARCH_AT_LEAST.
Summary: Avoid runtime test for CPU arch when building for that arch or more → Avoid some ARM CPU arch related runtime tests depending on the build target
ARMv6+VFP is important.

(In reply to comment #2)
> It should be possible, but the only gain would be for architectures without vfp
> (though arguably, they are the slower ones). Also, I'm not sure there would be
> a better build time test than testing arch < 7.

Hmm. If you build the C code with VFP enabled then you will always want to generate VFP code. If you build without VFP, you might still want to JIT-compile code with VFP support if you detect it at run time.

Something like this would do the trick:

#define ARM_VFP ((NJ_COMPILER_ARM_VFP) || (_config.arm_vfp))

I'd leave out the architecture stuff for ARM_VFP. (I didn't check that NJ_COMPILER_ARM_VFP actually exists.)
NJ_COMPILER_ARM_VFP unfortunately doesn't exist. So instead, should that be NJ_COMPILER_ARM_ARCH >= 6 ?
(In reply to comment #7)
> NJ_COMPILER_ARM_VFP unfortunately doesn't exist. So instead, should that be
> NJ_COMPILER_ARM_ARCH >= 6 ?

That breaks ARMv6 devices without VFP? Actually, I suppose ARMv6+VFP devices could simply keep the run-time check so checking for >=7 isn't a big deal. Perhaps you were right before anyway.

Alternatively, the attached patch implements NJ_COMPILER_ARM_VFP for GCC and RVCT. It's not tested, however, and I don't know the runes for Visual Studio.
cf. comment 5, I added some #ifdef DEBUG.

As for vfp, I kept it with NJ_COMPILER_ARM_ARCH >= 7 for the moment.
Attachment #464054 - Attachment is obsolete: true
Attachment #464805 - Flags: review?(Jacob.Bramley)
Attachment #464054 - Flags: review?(Jacob.Bramley)
Hum, the previous patch was totally not the right one.
Attachment #464805 - Attachment is obsolete: true
Attachment #464806 - Flags: review?(Jacob.Bramley)
Attachment #464805 - Flags: review?(Jacob.Bramley)
Attachment #464806 - Flags: review?(Jacob.Bramley) → review+
IIUC, you're assuming that it's known ahead of time which ARM target a binary is being compiled for, eg. that this binary will only run on armv7-a targets.  It's not clear to me that this is true.  I don't know for sure, but I can imagine this breaking Mozilla and/or Adobe's build processes for ARM.

The flexibility provided by the run-time tests seems desirable to me, and
the run-time cost is not high -- a few extra branches, and a slightly
larger binary.
(In reply to comment #11)
> IIUC, you're assuming that it's known ahead of time which ARM target a binary
> is being compiled for, eg. that this binary will only run on armv7-a targets. 
> It's not clear to me that this is true.  I don't know for sure, but I can
> imagine this breaking Mozilla and/or Adobe's build processes for ARM.

Any given compiler has a baseline target. For example, Debian's gcc targets armv4t. I think Mozilla's targets armv5t. And I know there is an Ubuntu port targeting armv7-something. If *you* don't know for sure where a binary will run, how it is compiled will definitely limit that for you. That's the point of this patch: if your compiler is generating armv7 code, your binary is very likely not to work on armv6 and less. It's then pointless for it to be able to assemble armv5t instructions (except when debugging, but the #ifdef DEBUG takes care or that).

> The flexibility provided by the run-time tests seems desirable to me, and
> the run-time cost is not high -- a few extra branches, and a slightly
> larger binary.

The runtime tests stay, except the ones below the baseline.
> If *you* don't know for sure where a binary will
> run, how it is compiled will definitely limit that for you. That's the point of
> this patch: if your compiler is generating armv7 code, your binary is very
> likely not to work on armv6 and less. It's then pointless for it to be able to
> assemble armv5t instructions (except when debugging, but the #ifdef DEBUG takes
> care or that).

It's unclear to me what you mean by "binary" here.  Taking Mozilla as an example, there's the Firefox binary, and then there's the code generated at run-time by the JIT compiler.  Which are you referring to?

Maybe I'm missing something about this bug.  AFAICT you're possibly saving a handful of arch tests at run-time.  That doesn't seem like much of a benefit.
(In reply to comment #13)
> > If *you* don't know for sure where a binary will
> > run, how it is compiled will definitely limit that for you. That's the point of
> > this patch: if your compiler is generating armv7 code, your binary is very
> > likely not to work on armv6 and less. It's then pointless for it to be able to
> > assemble armv5t instructions (except when debugging, but the #ifdef DEBUG takes
> > care or that).
> 
> It's unclear to me what you mean by "binary" here.  Taking Mozilla as an
> example, there's the Firefox binary, and then there's the code generated at
> run-time by the JIT compiler.  Which are you referring to?

I mean the executable generated by the build process.

> Maybe I'm missing something about this bug.  AFAICT you're possibly saving a
> handful of arch tests at run-time.  That doesn't seem like much of a benefit.

It has no benefit if your target is armv5t, it can have if your target is armv7. The hidden benefit is that the support for armv4t from bug 552624 would have no runtime impact at all, since obviously the build target for all current nanojit users is armv5 or more (anything below that would fail to match njcpudetect.h)
FWIW, Debian too is starting a port for armv7+vfp.

May I go further and land this ?
(In reply to comment #0)
> Created attachment 464047 [details] [diff] [review]
> Proposed patch (against m-c)
>
> The proposed patch introduces a ARM_ARCH_AT_LEAST macro that does both build
> time and run time tests. Optimization will get rid of useless branches.

The patch looks fine, but I think the rationale used when the previous setup was conceived, was that the fields in _config could/would be declared as static const under the right build configuration, and so the optimization would happen anyway.  

Does that not work for some reason?  If there's a logistical reason for preferring #defined predicates, then we may want to use it in Nativei386.cpp as well, e.g. for Mac which always has SSE2.
(In reply to comment #16)
> The patch looks fine, but I think the rationale used when the previous setup
> was conceived, was that the fields in _config could/would be declared as static
> const under the right build configuration, and so the optimization would happen
> anyway.

That would force the JIT to always compile for the target given at build time, which is not the intent of this patch. This patch still allows the JIT to compile for various targets, except it won't compile for targets "lower" than the one used by gcc/msvc/whatever when building nanojit.

> Does that not work for some reason?  If there's a logistical reason for
> preferring #defined predicates, then we may want to use it in Nativei386.cpp
> as well, e.g. for Mac which always has SSE2.

Well, for such an extension of the ARM case to other platforms to make sense, you'd need to have more than two code paths.
I see now, thanks for the explanation.  I don't see a reason not to land it.
http://hg.mozilla.org/tamarin-redux/rev/956ee477776a
Whiteboard: fixed-in-nanojit → fixed-in-nanojit, fixed-in-tamarin
http://hg.mozilla.org/tracemonkey/rev/d65beb604088
Whiteboard: fixed-in-nanojit, fixed-in-tamarin → fixed-in-nanojit, fixed-in-tamarin, fixed-in-tracemonkey
http://hg.mozilla.org/mozilla-central/rev/d65beb604088
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Product: Core → Core Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: