Closed Bug 527178 Opened 15 years ago Closed 15 years ago

NJ: all our efforts at handling valid displacements are defeated

Categories

(Core :: JavaScript Engine, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking


RESOLVED FIXED

People

(Reporter: graydon, Assigned: graydon)

References

Details

(Whiteboard: fixed-in-nanojit, fixed-in-tracemonkey, fixed-in-tamarin)

Attachments

(2 files, 3 obsolete files)

Another fine stinky pile of bugs found by the lirasm fuzzer!

It turns out that all the isValidDisplacement machinery set up in bug 519805 doesn't work. Or rather, it works a tiny bit -- for stores, when you're lucky -- but never for loads, and even for stores only sometimes.

Why? Oh, it's horrible. The load case fails because we simply override the LirWriter virtual that implements it, so it never runs. If you push it down to LirBufWriter, though, you *still* get defeated. Why? Because of course, in non-PEDANTIC mode we rewrite alloc-based loads to include the FP-based memory offset of the alloc region, which conveniently ignores the whole notion of "valid displacement". And if you fix *that*, you're still dead, because some of the valid-displacement checks on ARM are 8-bit and some are 12-bit, and our implementation only checks the 12-bit case.

I'm half inclined to take another go at this entirely, say by pushing the logic down into the assembler. The only backend that cares at the moment is ARM, and we (should? always?) have IP free there, so I think we can possibly just fudge our store-and-zero-offset business using it.

Alternatively: this patch fixes the problem. To do so it has to:

 - Thread the opcode through to the valid-displacement predicate so it can differentiate the 12-bit and 8-bit cases. 
 - Move the declaration of LOpcode to make it visible to those functions.
 - Thread the opcode through getBaseReg.
 - Move the displacement-checking code in the writer pipeline down to LirBufWriter.

Phew! With all this in place, the ARM fuzzer passes. What a mess. Shall I land this as-is, or shall I tinker with an IP-based approach?
Flags: blocking1.9.2?
Attachment #410928 - Flags: review?(gal)
Flags: blocking1.9.2? → blocking1.9.2+
Why does LOpcode have to move from LIR.h to Native.h?
(In reply to comment #1)
> Why does LOpcode have to move from LIR.h to Native.h?

To make the type visible to the NativeFoo.h headers. You can't forward-declare an enum.
Attachment #410928 - Flags: review?(gal)
Attachment #410928 - Flags: review?(edwsmith)
Attachment #410928 - Flags: review?(dvander)
Comment on attachment 410928 [details] [diff] [review]
fix the numerous bugs

I wonder how things seemed to work before. Are large displacements very rare?
Attachment #410928 - Flags: review?(dvander) → review+
LIR_ldcs and LIR_ldcb are rare, and LIR_ldcb is actually probably always going to work since the constant will be co-situated on a page due to the location of nSlot, and the offset there gets 12 bits to play with. I think. So it's really only LIR_ldcs, and we never generate that.

The old path (the one we actually got working) was for "anything that generates the str instruction in the ARM backend", which is *lots* of callers.
Attachment #410928 - Flags: review?(edwsmith) → review+
* Assembler.h: would be nice to have some comments on what "op" means in Assembler::getBaseReg (i.e. an opcode for what?)

* sparc and ppc are missing out since they also take limited-range displacements.  (fix later? bug?)
http://hg.mozilla.org/tracemonkey/rev/bafd723a2a18
Whiteboard: fixed-in-nanojit → fixed-in-nanojit, fixed-in-tracemonkey
Though, thinking about this further: I have no idea why we are doing this via the current convoluted method in LIR, and not just emitting the adds and rewriting the displacements when necessary, down at the instruction-selection / backend level. What am I missing?
This has come up before; the alternative I think you're thinking of is to drop the displacement field from load/store instructions so they just take an address expression, i.e. so load/store instructions don't contain an implicit add. The main cost is lots more LIR instructions, but there's no reason it couldn't be made to work. I think this notation (no implicit addressing mode) is also how hotpath and luajit2.0 are designed.
(In reply to comment #9)
> This has come up before; the alternative I think you're thinking of is to drop
> the displacement field from load/store instructions so they just take an
> address expression, so load/store instructions don't contain an implicit add. 
> the main cost is lots more LIR instructions, but there's no reason it couldn't
> be made to work.  I think this notation (no implicit addressing mode) is also
> how hotpath and luajit2.0 are designed.

No, I'm thinking of an even simpler approach: in NativeARM.cpp, in say Assembler::asm_ld for example, when we have a LIR_ldcs, we emit LDRH alone, which requires that the displacement be u8. Then we do all this isValidDisplacement business up in LIR to ensure that the displacement is u8 before it hits Assembler.

The alternative I'm proposing is that in Assembler::asm_ld, we do:

  if (!isU8(d)) {
    // nanojit assembles backwards, so at runtime this is:
    //   add ip, ra, #d ; ldrh rr, [ip]
    // (IP used as scratch so ra isn't clobbered)
    LDRH(rr, IP, 0);
    asm_add_imm(IP, ra, d);
  } else {
    LDRH(rr, ra, d);
  }

i.e. just handle at a low level the case we're currently handling at a high level.
Before any of this happened, the code that each backend generated was already like your example -- handle the small case with the CPU's immediate displacement, otherwise do something with more instructions, generating at least one immediate constant in the code (which is awkward when it requires a temporary register).

It might be that the ARM backend didn't generate code as clean as your example, but apart from that, it already did what you propose.

My understanding is that the reason for handling this at a higher level is to expose the intermediate constants and add instructions to the optimizer passes, particularly CSE.

If done right, you get backend code that only ever creates explicit constants as a result of LIR_int, with a trackable register assignment; anything else is guaranteed to fold into the load instruction that uses it.
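Concretely, the two notations differ like this (LIR-flavored pseudocode, not actual nanojit syntax):

```
; implicit addressing mode (nanojit today): the add is hidden in the
; load, so CSE never sees it
v1 = ld base[4096]

; explicit form (hotpath / luajit2.0 style): the constant and the add
; are ordinary, CSE-able instructions
c1 = int 4096
a1 = add base, c1
v1 = ld a1[0]
```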
http://hg.mozilla.org/mozilla-central/rev/bafd723a2a18
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
graydon says we don't currently generate any code that trips this, so unblocking 192
Flags: blocking1.9.2+ → blocking1.9.2-
Attached patch another approach (obsolete) — Splinter Review
Nick noticed in bug 527754 that the CseFilter can't handle downstream modifications like this. One option is to move the displacement checking upstream of the CseFilter, but the more I dwell on this, the more uncomfortable I am with all the mess required to feed this information back into the LIR-writing pipeline. It's best handled just before the instructions hit memory, in the backend. Indeed, existing backends already do this in numerous other cases, choosing instruction patterns by operand size.

So I'm proposing this patch, which backs out the previous approach and fixes the original problem via localized changes to NativeARM.cpp's handling of LIR_ldcs. I think the simplification here is worth more than any likely gain from CSE'ing a few large-displacement expressions (which was speculative anyway, and which we're currently not even attempting).
Attachment #415507 - Flags: review?(edwsmith)
This patch gets a big thumbs up from me, partly because it fixes my problem in bug 527754 and partly because it just looks much simpler.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment on attachment 415507 [details] [diff] [review]
another approach

Okay, I give in :-)

Adding gal as SR? since he also originally advocated for a way to better handle large displacements.

I think a frontend that generates LIR and knows it's on ARM (etc.) is still free to (say) always use a 0 displacement and do pointer math. A backend could sniff that and fold the pointer math into an immediate displacement. A frontend could even do this with knowledge of the backend's limits.

The simplifying trick is that LIR shouldn't *require* small displacements or pointer math. Postel's law, basically.

The only flaw in that argument I can think of is if we have a filter that aggressively tries to find constants in the pointer expression and fold them into the displacement without knowledge of the backend's limits.
Attachment #415507 - Flags: review?(edwsmith) → superreview?(gal)
The backend doesn't have a displacement "limit" in this patch, it just has a case where it happens to emit an extra instruction or two to handle larger displacements. All displacements are allowed.
To repeat myself a little:  this new patch fixes a real, current problem with the CseFilter, and IMO that trumps any problems with hypothetical future cases.
Attachment #415507 - Flags: superreview?(gal) → superreview+
Attached patch fix ARM suboptimality (obsolete) — Splinter Review
The previous patch could allocate two registers for a single value on ARM, which is obviously wasteful.  This version avoids that.  (Thus NativeARM.cpp is the main thing to focus on in the review; the rest is unchanged except for bit-rot fixes.)
Attachment #415507 - Attachment is obsolete: true
Attachment #417594 - Flags: review?(graydon)
Attachment #417594 - Flags: review?(graydon) → review+
http://hg.mozilla.org/projects/nanojit-central/rev/2bd8a736808d
Whiteboard: fixed-in-nanojit, fixed-in-tracemonkey → fixed-in-nanojit
Attached patch more ARM fixes (obsolete) — Splinter Review
Ok, I'm not happy with one aspect of this patch.  In nanojit.h we have this:

  #define isS8(i)  ( int32_t(i) == int8_t(i) )
  #define isU8(i)  ( int32_t(i) == uint8_t(i) )
  #define isS16(i) ( int32_t(i) == int16_t(i) )
  #define isU16(i) ( int32_t(i) == uint16_t(i) )

In NativeARM.h we have this:

  #define isS12(offs) ((-(1<<12)) <= (offs) && (offs) < (1<<12))
  #define isU12(offs) (((offs) & 0xfff) == (offs))

isU12 matches isU8 and isU16, but isS12 does not match isS8 and isS16 -- it allows the range -4096..4095 but if it matched it would only allow -2048..2047.  In other words, it should be called isS13.

But then I suspect that it should really be checking for the range -4096..4096, not -4096..4095, since the instructions get a different encoding for positive and negative offsets, so there's no reason why 4096 should be disallowed.  (In comparison, asm_ldr_chk() checks for the range -4096..4096.)

The new patch addresses these matters.  It removes isS12 altogether.

There are several isS8(d) tests still in the ARM backend that I think should actually be (isU8(d) || isU8(-d)) but I don't want to change them right now.  If people think I'm on the right track I can file a follow-up for that.

I've tested this patch on TM trace-tests but not on NJ-central lirasm tests because the latter give me illegal instructions on my ARM VM.  Hmm.
Attachment #417619 - Flags: review?(Jacob.Bramley)
Comment on attachment 417619 [details] [diff] [review]
more ARM fixes

>   #define isS12(offs) ((-(1<<12)) <= (offs) && (offs) < (1<<12))
>   #define isU12(offs) (((offs) & 0xfff) == (offs))

Yes, this is wrong, and it should allow ±4095. You said ±4096, but the patch is correct. The original code actually allowed an invalid offset (-4096) here!

>+            // These are expected to be 2 or 4-byte aligned.

That's not necessary; LDRB can always make byte-aligned accesses. (It'd be pretty useless otherwise.) LDRH can make byte-aligned accesses if you have unaligned access support, but can always make 2-byte-aligned accesses.

>+                // If the offset doesn't fit in 12-bits we can't use it, so we
>+                // recompute 'base' with a smaller (zero) offset.  If
>+                // getBaseReg() returned FP, we can't use it, and so we must
>+                // find a GpReg for 'base'.  Otherwise getBaseReg() must have
>+                // returned a GpReg.
>+                if (ra == FP)
>+                    ra = findRegFor(base, GpRegs);

Why can't you use FP? If it's Ok for the isU12 case, why is it not here? (FP _is_ a GP reg.)

>+                // Nb: getBaseReg() may have modified 'd', so we must use
>+                // ins->disp() here instead of 'd'.
>+                asm_add_imm(IP, ra, ins->disp());

Yes, that's true, but you used 'd' earlier too. Can you guarantee that 'd' won't be moved by getBaseReg if it ends up being within the relevant limits? If so, that's not obvious.

The rest looks sensible, though I didn't review anything outside of NativeARM.{cpp,h} in much detail because it looks like it's already been reviewed & I'm probably not the best person to look at it anyway.

> I've tested this patch on TM trace-tests but not on NJ-central lirasm tests
> because the latter give me illegal instructions on my ARM VM.  Hmm.

The patch doesn't apply cleanly to nanojit-central, but the bit that fails to apply doesn't look relevant any more anyway. The "make check" tests in nanojit-central pass on my Cortex-A8.
Attachment #417619 - Flags: review?(Jacob.Bramley) → review+
(In reply to comment #23)
> (From update of attachment 417619 [details] [diff] [review])
> >   #define isS12(offs) ((-(1<<12)) <= (offs) && (offs) < (1<<12))
> >   #define isU12(offs) (((offs) & 0xfff) == (offs))
> 
> Yes, this is wrong, and it should allow ±4095. You said ±4096, but the patch is
> correct.

Ah, yes.

> The original code actually allowed an invalid offset (-4096) here!

Bad!

> Why can't you use FP? If it's Ok for the isU12 case, why is it not here? (FP
> _is_ a GP reg.)

I didn't realise FP was a GpReg (it's not on i386 or X64).  That simplifies things, I'll post a follow-up patch shortly.

> The patch doesn't apply cleanly to nanojit-central, but the bit that fails to
> apply doesn't look relevant any more anyway.

That's correct, the rejected chunks can be ignored.
Jacob, can you rerun the tests on your machine again?  Sorry my VM isn't working with NJ-central, I'll have to get some ARM hardware somehow...  I did run the TM tests though, they all pass.
Attachment #417594 - Attachment is obsolete: true
Attachment #417619 - Attachment is obsolete: true
Attachment #418029 - Flags: review?(Jacob.Bramley)
Comment on attachment 418029 [details] [diff] [review]
hopefully the last version

nanojit-central's "make check" works fine. I still get rejected chunks, but they aren't relevant.
Attachment #418029 - Flags: review?(Jacob.Bramley) → review+
http://hg.mozilla.org/tamarin-redux/rev/d1bb609bb3fe
Whiteboard: fixed-in-nanojit, fixed-in-tracemonkey → fixed-in-nanojit, fixed-in-tracemonkey, fixed-in-tamarin
http://hg.mozilla.org/mozilla-central/rev/0eda8726e2ff
Status: REOPENED → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
(In reply to comment #13)
> graydon says we don't currently generate any code that trips this, so
> unblocking 192

Then does this need to be a hidden security bug?
(In reply to comment #31)
> (In reply to comment #13)
> > graydon says we don't currently generate any code that trips this, so
> > unblocking 192
> 
> Then does this need to be a hidden security bug?

Does Adobe generate any code that trips this in tamarin-redux?  (not actually a rhetorical question)
nanojit has not yet shipped in any Adobe product (well, there are betas of Flash 10.1 with it floating around) so I doubt it's an issue for us.
Group: core-security