Closed Bug 534765 Opened 10 years ago Closed 9 years ago

[nj,arm] implement EXPANDED_LOADSTORE for ARM

Categories

(Core Graveyard :: Nanojit, defect)

defect
Not set

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: vlad, Assigned: vlad)

References

Details

(Whiteboard: fixed-in-nanojit, fixed-in-tamarin, fixed-in-tracemonkey)

Attachments

(1 file, 4 obsolete files)

NJ_EXPANDED_LOADSTORE support (8-bit and 16-bit loads/stores, 32-bit float loads/stores) aren't implemented for ARM; I started implementing the same for an unrelated before seeing that support was in nj already for x86.  Attached is my work for ARM, totally untested, for a starting point; it's missing support for 32-bit float loads/stores (the VFP case is easy, but the softfloat case isn't, since it might need to call out to functions to convert).

Jacob, does this look right?  Also, are the 32f cases painful to handle?
for non-vfp, I would suggest handling the 32f cases in SoftFloatFilter (like doubles are), then all the backend sees are plain 32bit int load/store.  SoftFloatFilter of course needs helpers for float32<-->64.
Attached patch updated (obsolete) — Splinter Review
Updated; this also includes an implementation of f2i, from andreas's patch in another bug.  f32 stuff doesn't work correctly yet, but the 16-bit/8-bit stuff seems to work fine.
Attachment #417588 - Attachment is obsolete: true
Attached patch updated againSplinter Review
Now with stupid bugs fixed, f32 stuff seems to work fine.
Attachment #417625 - Attachment is obsolete: true
Comment on attachment 417638 [details] [diff] [review]
updated again

>diff --git a/js/src/jstracer.cpp b/js/src/jstracer.cpp
>--- a/js/src/jstracer.cpp
>+++ b/js/src/jstracer.cpp

Your changes in jstracer don't seem to apply any more.

>+            if (!isS12(dr)) {

This will conflict with Nicholas's code in bug 527178, as that modifies the same code. In his latest patch, isS12 no longer exists.

>+                if (!isS8(offset >> 2) || (offset&3) != 0) {

This is more strict than it needs to be. Yes, the original code was wrong too. In this case, I would use something like this:
if (((offset & ~0x3fc) == 0) || ((-offset & ~0x3fc) == 0))

Some macros need updating too as they include assertions.

Nicholas mentioned that he wanted to go through and check all these cases, so it might be worth talking to him (if you haven't already).

>+                NanoAssertMsg(0, "ld32f not supported with non-VFP, fix me");

Did you mean to implement the soft-float version in this patch, or in a future one?

>+                Register rb = findRegFor(base, GpRegs);
>+
>+                if (value->isconstq()) {
>+                    underrunProtect(LD32_size*2 + 8);
>+
>+                    // XXX use another reg, get rid of dependency
>+                    STR(IP, rb, dr);
>+                    asm_ld_imm(IP, value->imm64_0(), false);
>+                    STR(IP, rb, dr+4);
>+                    asm_ld_imm(IP, value->imm64_1(), false);
>+
>+                    return;
>+                }

This bit doesn't look right. If I understand correctly, this code is storing a 32-bit float, so this should be a single STR (+asm_ld_imm).

It's also confusing that a 32-bit float is stored by asm_store64, even though the source value had 64 bits. I don't suppose it matters much, but a comment might be helpful, particularly in the case where value->isconstq(), because that never needs to handle a 64-bit value at all.

>+#define STRB(_d,_n,_off) do {                                           \
>+        NanoAssert(IsGpReg(_d) && IsGpReg(_n));                         \
>+        NanoAssert(isS12(_off));                                        \
>+        underrunProtect(4);                                             \
>+        if ((_off)<0)   *(--_nIns) = (NIns)( COND_AL | (0x54<<20) | ((_n)<<16) | ((_d)<<12) | ((-(_off))&0xFFF) ); \
>+        else            *(--_nIns) = (NIns)( COND_AL | (0x5C<<20) | ((_n)<<16) | ((_d)<<12) | ((_off)&0xFFF) ); \

I don't like the if/else indenting. Apart from my own personal preference, it's not consistent with the other macros.

>+/*
>+ * The following instructions can only be used with S14 as the
>+ * single-precision register; that limitation can be removed if
>+ * needed, but we'd have to teach NJ about all the single precision
>+ * regs, and their encoding is strange (top 4 bits usually in a block,
>+ * low bit elsewhere).
>+ */

The encoding isn't that important because we can hide it behind macros anyway. Would it be hard to teach NJ about the single-precision registers? It wasn't worth it before, but now we do some single-precision stuff it might become useful. In any case, that's something for another patch.

Do we plan to have single-precision maths, or just loads and stores?

>+#define FSITOD(_Dd,_Sm) do {                                            \
>+        underrunProtect(4);                                             \
>+        NanoAssert(ARM_VFP);                                            \
>+        NanoAssert(IsFpReg(_Dd) && ((_Sm) == FpSingleScratch));         \

You called it "S14" in the comment, and both are defined, but can we be consistent and use exclusively "FpSingleScratch" or "S14"?

>+#define FMSR(_Sn,_Rd) do {                                              \
>+        underrunProtect(4);                                             \
>+        NanoAssert(ARM_VFP);                                            \
>+        NanoAssert(((_Sn) == FpSingleScratch) && IsGpReg(_Rd));         \
>+        *(--_nIns) = (NIns)( COND_AL | (0xE0<<20) | (0x7<<16) | ((_Rd)<<12) | (0xA<<8) | (0<<7) | (0x1<<4) ); \
>+        asm_output("fmsr %s,%s", gpn(_Sn), gpn(_Rd));                  \
>+    } while (0)

You defined FMSR twice.

>+#define FCVTSD(_Sd,_Dm) do {                        \
>+        underrunProtect(4);                         \
>+        NanoAssert(ARM_VFP);                        \
>+        NanoAssert(((_Sd) == S14) && IsFpReg(_Dm)); \
>+        *(--_nIns) = (NIns)( COND_AL | (0xEB7<<16) | (0x7<<12) | (0xBC<<4) | (FpRegNum(_Dm)) ); \
>+        asm_output("[0x%08x] fcvtsd s14,%s", *_nIns, gpn(_Dm));                          \
>+    } while (0)

We don't normally print *_nIns here. Is this left-over debug code? It's in a few other new macros too.


----


Everything else looks good!
(In reply to comment #0)
> Created an attachment (id=417588) [details]
> starting point for 8bit/16bit load store stuff in nj

Whoa, I just noticed this -- this is great, I thought I'd have to implement these myself! :-)

I will definitely review this from the TR point of view, but at present I'm swamped with unrelated work, so it will probably be a few days (maybe after the holidays).

(In reply to comment #4)

> This bit doesn't look right. If I understand correctly, this code is storing a
> 32-bit float, so this should be a single STR (+asm_ld_imm).

right; the semantics are that a 64-bit float is downconverted to a 32-bit float, and stored in a 32-bit memory location.
 
> It's also confusing that a 32-bit float is stored by asm_store64

yeah, it is confusing. the sole reason I did it that way originally was because in the x86 backend it ended up making less redundant code (vs asm_store32). if we want to add a new "asm_store_weird" or some such, I'd be fine with it, but I think the status quo is fine if we add some comments indicating what's going on.
One comment from a quick skim:

re: "must be aligned to two bytes on some architectures, but we never make unaligned accesses so a simple assertion is sufficient here" 

TR will definitely want to do unaligned accesses on the ARM architectures that support it (basically ARMv7 or later). Perhaps we should add (say) NJ_UNALIGNED_ACCESS as a compiletime flag, so we can determine whether such access is acceptable or not?
I'm going to mark this as blocked by bug 527178 because that's a bug I've been trying to fix for a while now (I'm working on it right now), particularly as it blocks several others, and I'd rather avoid the conflicts.  Vlad, I hope this is ok.
Depends on: 527178
(In reply to comment #4)
> Everything else looks good!

Ahem.. you're being too kind, this was actually an older version of the patch since I forgot to qrefresh :-)  I'll upload a newer one as soon as I get home.

(In reply to comment #7)
> I'm going to mark this as blocked by bug 527178 because that's a bug I've been
> trying to fix for a while now (I'm working on it right now), particularly as it
> blocks several others, and I'd rather avoid the conflicts.  Vlad, I hope this
> is ok.

No problem at all; I saw that bug the other day and was going to wait for you to go, especially since this isn't fully done yet.
Ping... what's our status on this? I'd love to get this landed soon if we think it's pretty close. (I'll go ahead and try it out myself locally in the meantime...)
Duplicate of this bug: 532241
Attached patch Rebased, updated patch (obsolete) — Splinter Review
Rebased and updated version of patch 417638 (but only the nanojit parts, not TM-specific stuff). I took the liberty of removing the unaligned-access asserts, since TR will want to enable unaligned use of halfword access on ARM architectures that allow it.
Attachment #422896 - Flags: review?(vladimir)
Comment on attachment 422896 [details] [diff] [review]
Rebased, updated patch

Withdrawing review request; I've found a few bugs in the existing patches and want to do more testing before asking for a review.
Attachment #422896 - Flags: review?(vladimir)
Attached patch rebased, *tested* patch (obsolete) — Splinter Review
Original patch had numerous minor bugs. This patch actually allows various Flash-based acceptance tests to pass on ARM.
Attachment #423026 - Flags: review?(vladimir)
Attachment #422896 - Attachment is obsolete: true
Attachment #423026 - Flags: review?(rreitmai)
Comment on attachment 423026 [details] [diff] [review]
rebased, *tested* patch

This looks fine -- sorry that the patch fell off my radar!

One small comment:

>     // special value referring to S14
>-    FpSingleScratch = 24
>+    FpSingleScratch = 24,
>+    S14 = 24

Should we just go ahead and replace all uses of FpSingleScratch with S14?  FpSingleScratch seemed like a good idea at the time to make it explicit that it's just a scratch register, but it's a mouthful.  We should probably just use one or the other though, since they're used in the same way.
Attachment #423026 - Flags: review?(vladimir) → review+
(In reply to comment #14)
> Should we just go ahead and replace all uses of FpSingleScratch with S14? 

Sure.
Attachment #423026 - Flags: review?(rreitmai) → review+
Whiteboard: fixed-in-nanojit
Whiteboard: fixed-in-nanojit → fixed-in-nanojit, fixed-in-tamarin
http://hg.mozilla.org/tracemonkey/rev/0881e5d1d6a2
Whiteboard: fixed-in-nanojit, fixed-in-tamarin → fixed-in-nanojit, fixed-in-tamarin, fixed-in-tracemonkey
http://hg.mozilla.org/mozilla-central/rev/0881e5d1d6a2
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Assignee: nobody → vladimir
Product: Core → Core Graveyard
You need to log in before you can comment on or make changes to this bug.