Closed Bug 911254 Opened 12 years ago Closed 12 years ago

Odinmonkey: (ARM) use the 'compare immediate' instruction for bounds checks

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla27

Tracking Flags:

Tracking

Status

firefox26

---

fixed

firefox27

---

fixed

People

(Reporter: dougc, Assigned: dougc)

References

(Blocks 1 open bug)

Details

(Whiteboard: [games])

Attachments

(1 file, 6 obsolete files)

Backend support. 12 years ago Douglas Crosher [:dougc] 6.48 KB, patch		Details \| Diff \| Splinter Review
Further optimize the load-heap instruction sequence. 12 years ago Douglas Crosher [:dougc] 6.74 KB, patch	mjrosenb : review+	Details \| Diff \| Splinter Review
Rebase ARM backend changes. 12 years ago Douglas Crosher [:dougc] 6.87 KB, patch	dougc : review+	Details \| Diff \| Splinter Review
Fix rebase. 12 years ago Douglas Crosher [:dougc] 7.01 KB, patch	dougc : review+	Details \| Diff \| Splinter Review
Enable use of a wider range of heap lengths rather than just powers of two. 12 years ago Douglas Crosher [:dougc] 19.36 KB, patch		Details \| Diff \| Splinter Review
Catch heap sizes less than 4096 bytes, and rework the code for improved clarity. 12 years ago Douglas Crosher [:dougc] 19.36 KB, patch	luke : review+	Details \| Diff \| Splinter Review
Address reviewer feedback. 12 years ago Douglas Crosher [:dougc] 18.85 KB, patch	dougc : review+ lsblakk : approval-mozilla-aurora+	Details \| Diff \| Splinter Review

Douglas Crosher [:dougc]

Assignee

Description

•

12 years ago

The ARM backend currently uses a logical shift and comparison to zero for bounds checks but this only works for heap lengths that are a power of two. Using the 'compare immediate' instruction allows a wider range of heap lengths. The immediate value to this instruction is an eight bit value that can be shifted in multiples of two bits. Thus the quantum by which the heap length can be increased depends on the heap length as follows: up to 4080M: 16M up to 1020M: 4M up to 255M: 1M up to 65280k: 256k up to 16320k: 64k up to 4080k: 16k up to 1020k: 4k It seems unfortunate to expose the asm.js spec. to such ARM constraints, but it's better than the power-of-two restriction. Perhaps a subset of these could be exposed. There is also a compare-negative-immediate instruction that could add some other supported lengths but it's not clear that these would be particularly useful.

Douglas Crosher [:dougc]

Assignee

Comment 1

•

12 years ago

Attached patch Backend support. (obsolete) — Details — Splinter Review

This patch has the backend support and switches to using the compare-immediate instruction. TODO update the AsmJS linker to accept some range of the supported heap lengths and add tests for these.

Luke Wagner [:luke]

Comment 2

•

12 years ago

The max size of an ArrayBuffer is INT32_MAX at the moment, so I think that means we only need a quanta of 8MB to support the largest size. I'm inclined to *always* require a quanta of 8MB since this is conveniently close and bigger than the "huge page" size option on Linux/Windows which conceivably some day we'd like to support (right now we'll just disable asm.js compilation on x64 if the system page size isn't 4KB).

Luke Wagner [:luke]

Comment 3

•

12 years ago

Another question: is there any performance difference on the awfy asmjs-apps benchmarks?

Douglas Crosher [:dougc]

Assignee

Comment 4

•

12 years ago

(In reply to Luke Wagner [:luke] from comment #2) > The max size of an ArrayBuffer is INT32_MAX at the moment, so I think that > means we only need a quanta of 8MB to support the largest size. I'm > inclined to *always* require a quanta of 8MB since this is conveniently > close and bigger than the "huge page" size option on Linux/Windows which > conceivably some day we'd like to support (right now we'll just disable > asm.js compilation on x64 if the system page size isn't 4KB). The instruction encoding only supports shifts of the 8 bits that are a multiple of two, so a quanta of 8MB might not achievable for a maximum heap size of INT32_MAX. E.g. the encoding options are: abcdefgh 00000000 00000000 00000000 00abcdef gh000000 00000000 00000000 ...

Luke Wagner [:luke]

Comment 5

•

12 years ago

Ah, I missed the "multiple of two" part. Yes, so then perhaps we could start with a simple 16MB quanta.

Douglas Crosher [:dougc]

Assignee

Comment 6

•

12 years ago

Support for non-power-of-two heap lengths can help with the optimization of heap access, particularly when using masking for safe heap access. For example, HEAP8[(i&0x07FFFFFF)+96] could be proven to be a safe heap access if the heap length is 128M+1M. The top 1M would be a buffer zone that could still be accessed by a non-masked index. Similarly for a base pointer plus an offset proven to have a limited range. For example, HEAP8[(i&0x07FFFFFF)+(k&0xff)] could proven to be a safe heap access. From the perspective of this use, the table in comment 0 could be rewritten to relate the mask heap length to the minimum possible buffer zone size beyond the masked length: Masked length, minimum quanta 2048M 16M ?? 1024M 16M 512M 4M 256M 4M 128M 1M 64M 1M 32M 256k 16M 256k 8M 64k 4M 64k 2M 16k 1M 16k 512k 4k This scheme hits a problem for a 2048M masked heap length if the maximum heap length is also 2048M. An alternative is to use memory protection to implement such a buffer zone beyond the heap length, and this would only be restricted by the pages size. Both approaches might be offered, giving the programmer the option to choose an explicit buffer zone length or use a default implicit buffer zone using memory protection.

Douglas Crosher [:dougc]

Assignee

Comment 7

•

12 years ago

(In reply to Luke Wagner [:luke] from comment #3) > Another question: is there any performance difference on the awfy asmjs-apps > benchmarks? Testing on a Nexus 4 (Krait) shows a good gain in performance! For example, almost a 10% reduction in run time on memops. This is an unexpected result, but welcomed. asmjs-ubench memops: mean 42.962 vs 38.955 std 0.29575 seconds Double checked the code, and it looks ok. Here's the hottest block: 0xb5ffa614: adds r4, r0, r3 0xb5ffa618: cmp r4, #134217728 ; 0x8000000 <<< bounds check 0xb5ffa61c: ldrsbcc r4, [r11, r4] 0xb5ffa620: movcs r4, #0 0xb5ffa624: and r5, r4, #1 0xb5ffa628: adds r4, r5, r2 0xb5ffa62c: adds r2, r3, #1 0xb5ffa630: cmp r2, #1048576 ; 0x100000 0xb5ffa634: bge 0xb5ffa644

Douglas Crosher [:dougc]

Assignee

Comment 8

•

12 years ago

Comment on attachment 797918 [details] [diff] [review] Backend support. Requesting approval to land the backend changes on the basis that the change gives a useful performance improvement. Could we keep the bug open and land a separate patch to enable the use of some non-power-of-two heaps sizes once the spec. changes are decided?

Attachment #797918 - Flags: review?(luke)

Douglas Crosher [:dougc]

Assignee

Comment 9

•

12 years ago

Comment on attachment 797918 [details] [diff] [review] Backend support. It appears we can do even better performance wise for the load-heap sequence. Let me explore this and double check.

Attachment #797918 - Flags: review?(luke)

Douglas Crosher [:dougc]

Assignee

Comment 10

•

12 years ago

Attached patch Further optimize the load-heap instruction sequence. (obsolete) — Details — Splinter Review

Testing shows that swapping the two load instructions gives a further performance gain. For the memops benchmark the gain is around 10% and for a particular game the gain is 2% on significant noise but still trending in the right direction. Tested on the Krait processor in the Nexus 4. This patch includes only the ARM backend changes related to this bug. It assumes bug 865516 has landed.

Assignee: general → dtc-moz

Attachment #797918 - Attachment is obsolete: true

Attachment #799465 - Flags: review?(mrosenberg)

Luke Wagner [:luke]

Comment 11

•

12 years ago

Nice! Could you describe what the new instruction sequence is?

Douglas Crosher [:dougc]

Assignee

Comment 12

•

12 years ago

(In reply to Luke Wagner [:luke] from comment #11) > Nice! Could you describe what the new instruction sequence is? The ARM uses a test (now compare-immediate) followed by two conditional loads, one load for the in-bounds case and another for the out-of-bounds case. The latest change just swapped the order of these two loads, so that the out-of-bounds conditional load is now first.

Marty Rosenberg [:mjrosenb]

Comment 13

•

12 years ago

Comment on attachment 799465 [details] [diff] [review] Further optimize the load-heap instruction sequence. Review of attachment 799465 [details] [diff] [review]: ----------------------------------------------------------------- ::: js/src/jit/arm/CodeGenerator-arm.cpp @@ +1844,3 @@ > } else { > + Register d = ToRegister(ins->output()); > + masm.ma_mov(Imm32(0), d, NoSetCond, Assembler::AboveOrEqual); I was thinking of just making this unconditional. Can you see if that has any effect on perf?

Attachment #799465 - Flags: review?(mrosenberg) → review+

Douglas Crosher [:dougc]

Assignee

Comment 14

•

12 years ago

(In reply to Marty Rosenberg [:mjrosenb] from comment #13) > Comment on attachment 799465 [details] [diff] [review] > Further optimize the load-heap instruction sequence. ... > > + Register d = ToRegister(ins->output()); > > + masm.ma_mov(Imm32(0), d, NoSetCond, Assembler::AboveOrEqual); > > I was thinking of just making this unconditional. Can you see if that has > any effect on perf? Yes, this was tried but slowed performance.

Douglas Crosher [:dougc]

Assignee

Comment 15

•

12 years ago

Attached patch Rebase ARM backend changes. (obsolete) — Details — Splinter Review

Carry forward r+

Attachment #799465 - Attachment is obsolete: true

Attachment #801054 - Flags: review+

Douglas Crosher [:dougc]

Assignee

Comment 16

•

12 years ago

Attached patch Fix rebase. (obsolete) — Details — Splinter Review

Sorry, rebase was broken. Carrying forward r+.

Attachment #801054 - Attachment is obsolete: true

Attachment #801059 - Flags: review+

Douglas Crosher [:dougc]

Assignee

Updated

•

12 years ago

Keywords: checkin-needed

Whiteboard: leave open

Ryan VanderMeulen [:RyanVM]

Comment 17

•

12 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/4551a70c2fcf

Keywords: checkin-needed

Whiteboard: leave open → [leave open]

Phil Ringnalda (:philor)

Comment 18

•

12 years ago

https://hg.mozilla.org/mozilla-central/rev/4551a70c2fcf

Douglas Crosher [:dougc]

Assignee

Comment 19

•

12 years ago

This seems to be the cause of 23% performance regression on the awft asmjs-ubench-memops Tegra-2 benchmark. Performance of the awft asm.js apps seems to be more positive: asmjs-apps-bullet and asmjs-apps-zlib seem to have improved. So this does not appear to be as simple as reverting back to using 'mov' for some processors. A patch that supports both mov and cmp for use as a bounds check plus some instruction order variations has been code up, and I will test on a wider range of processors.

Douglas Crosher [:dougc]

Assignee

Comment 20

•

12 years ago

Attached patch Enable use of a wider range of heap lengths rather than just powers of two. (obsolete) — Details — Splinter Review

This patch enables use of all possible heap lengths as constrained by the x64 page size restriction and the ARM 'cmp immediate' instruction encoding. If you want to enable use of just a subset then it should be trivial so just let me know. The heap length policy has been abstracted into two functions: roundUpAsmJSHeapLength, and IsValidAsmJSHeapLength. On a link failure due to an invalid heap length the message reports the failing length and makes a suggestion by rounding up to the next valid length.

Attachment #804559 - Flags: review?(luke)

Douglas Crosher [:dougc]

Assignee

Comment 21

•

12 years ago

Attached patch Catch heap sizes less than 4096 bytes, and rework the code for improved clarity. (obsolete) — Details — Splinter Review

Forgot to test for heap sizes less than 4096 bytes. Added this and reworked the code to hopefully be a little clearer. Note the sparc has a page size of 8192 bytes, and perhaps it would be prudent to increase the minimum to 8192 and depreciate the 4096 minimum heap length. It's not clear how important this is to ff26 at this stage.

Attachment #804559 - Attachment is obsolete: true

Attachment #804559 - Flags: review?(luke)

Attachment #805144 - Flags: review?(luke)

Luke Wagner [:luke]

Comment 22

•

12 years ago

Comment on attachment 805144 [details] [diff] [review] Catch heap sizes less than 4096 bytes, and rework the code for improved clarity. Review of attachment 805144 [details] [diff] [review]: ----------------------------------------------------------------- Great job! ::: js/src/jit/AsmJS.cpp @@ +929,5 @@ > +// 000000ab cdefgh00 00000000 00000000 > +// 00000000 abcdefgh 00000000 00000000 > +// 00000000 00abcdef gh000000 00000000 > +// 00000000 0000abcd efgh0000 00000000 > +// ... On the ellipsis: are there any other immediates supported? @@ +932,5 @@ > +// 00000000 0000abcd efgh0000 00000000 > +// ... > +// > +// ARM Thumb T2 encoding supports the following immediate constants and although > +// this mode is not currently used it might be worth considering as a constraint: If I'm reading correctly, ARM Thumb T2 seems like a strict generalization of the immediate constraints on ARMv7. Is that right? If so, then it seems like you could remove the chart and comment. @@ +961,5 @@ > +// Heap length 0x00100000 to 0x003fc000 quanta 0x00004000 > +// Heap length 0x00001000 to 0x000ff000 quanta 0x00001000 > +// > +uint32_t > +js::roundUpAsmJSHeapLength(uint32_t length) { { on new line @@ +977,5 @@ > + return (length + 0x000fffff) & ~0x000fffff; > + if (length < 0x10000000u) // < 1024M quanta 4M > + return (length + 0x003fffff) & ~0x003fffff; > + // < 4096M quanta 16M. Note zero is returned if over 0xff000000 but such > + // length are not currently valid. Can you JS_ASSERT(length <= 0xff000000)? @@ +982,5 @@ > + return (length + 0x00ffffff) & ~0x00ffffff; > +} > + > +bool > +js::IsValidAsmJSHeapLength(uint32_t length) { { on new line ::: js/src/jit/AsmJS.h @@ +79,5 @@ > static const size_t AsmJSAllocationGranularity = 4096; > > +// These functions define the valid heap lengths. > +extern uint32_t > +roundUpAsmJSHeapLength(uint32_t length); Capitalize 'r' and also, how about 'RoundUpToNextValidAsmJSHeapLength'? Otherwise 'round' is a bit ambiguous. ::: js/src/jit/AsmJSLink.cpp @@ +214,5 @@ > return LinkFail(cx, "bad ArrayBuffer argument"); > > heap = &bufferVal.toObject().as<ArrayBufferObject>(); > > + if (!IsValidAsmJSHeapLength(heap->byteLength()) || heap->byteLength() < AsmJSAllocationGranularity) { Can you move the >= AsmJSAllocationGranularity check into IsValidAsmJSHeapLength? @@ +216,5 @@ > heap = &bufferVal.toObject().as<ArrayBufferObject>(); > > + if (!IsValidAsmJSHeapLength(heap->byteLength()) || heap->byteLength() < AsmJSAllocationGranularity) { > + return LinkFail(cx, JS_smprintf("ArrayBuffer byteLength 0x%x is not a valid heap length. The next valid length is 0x%x", > + heap->byteLength(), roundUpAsmJSHeapLength(heap->byteLength()))); Nice message!

Attachment #805144 - Flags: review?(luke) → review+

Douglas Crosher [:dougc]

Assignee

Comment 23

•

12 years ago

Attached patch Address reviewer feedback. — Details — Splinter Review

Address reviewer feedback. Carry forward r+

Attachment #805144 - Attachment is obsolete: true

Attachment #805612 - Flags: review+

Douglas Crosher [:dougc]

Assignee

Comment 24

•

12 years ago

(In reply to Luke Wagner [:luke] from comment #22) > Comment on attachment 805144 [details] [diff] [review] > Catch heap sizes less than 4096 bytes, and rework the code for improved > clarity. > > Review of attachment 805144 [details] [diff] [review]: > ----------------------------------------------------------------- Thank you for the quick review. > ::: js/src/jit/AsmJS.cpp > @@ +929,5 @@ > > +// 000000ab cdefgh00 00000000 00000000 > > +// 00000000 abcdefgh 00000000 00000000 > > +// 00000000 00abcdef gh000000 00000000 > > +// 00000000 0000abcd efgh0000 00000000 > > +// ... > > On the ellipsis: are there any other immediates supported? Yes, smaller immediates are supported but this is all we need because the minimum granularity is 4096 or 0x1000. There is also a compare-negative-immediate instruction that could add some more, but it is not clear if they are useful. > @@ +932,5 @@ > > +// 00000000 0000abcd efgh0000 00000000 > > +// ... > > +// > > +// ARM Thumb T2 encoding supports the following immediate constants and although > > +// this mode is not currently used it might be worth considering as a constraint: > > If I'm reading correctly, ARM Thumb T2 seems like a strict generalization of > the immediate constraints on ARMv7. Is that right? If so, then it seems > like you could remove the chart and comment. Ok, the table has been removed, and a small comment added noting that it is also compatible with the Thumb T2 encodeing. > @@ +961,5 @@ > > +// Heap length 0x00100000 to 0x003fc000 quanta 0x00004000 > > +// Heap length 0x00001000 to 0x000ff000 quanta 0x00001000 > > +// > > +uint32_t > > +js::roundUpAsmJSHeapLength(uint32_t length) { > > { on new line Done > @@ +977,5 @@ > > + return (length + 0x000fffff) & ~0x000fffff; > > + if (length < 0x10000000u) // < 1024M quanta 4M > > + return (length + 0x003fffff) & ~0x003fffff; > > + // < 4096M quanta 16M. Note zero is returned if over 0xff000000 but such > > + // length are not currently valid. > > Can you JS_ASSERT(length <= 0xff000000)? Done. > @@ +982,5 @@ > > + return (length + 0x00ffffff) & ~0x00ffffff; > > +} > > + > > +bool > > +js::IsValidAsmJSHeapLength(uint32_t length) { > > { on new line Done. > ::: js/src/jit/AsmJS.h > @@ +79,5 @@ > > static const size_t AsmJSAllocationGranularity = 4096; > > > > +// These functions define the valid heap lengths. > > +extern uint32_t > > +roundUpAsmJSHeapLength(uint32_t length); > > Capitalize 'r' and also, how about 'RoundUpToNextValidAsmJSHeapLength'? > Otherwise 'round' is a bit ambiguous. Done. > ::: js/src/jit/AsmJSLink.cpp > @@ +214,5 @@ > > return LinkFail(cx, "bad ArrayBuffer argument"); > > > > heap = &bufferVal.toObject().as<ArrayBufferObject>(); > > > > + if (!IsValidAsmJSHeapLength(heap->byteLength()) || heap->byteLength() < AsmJSAllocationGranularity) { > > Can you move the >= AsmJSAllocationGranularity check into > IsValidAsmJSHeapLength? Good suggestion. Done.

Douglas Crosher [:dougc]

Assignee

Updated

•

12 years ago

Whiteboard: [leave open]

Douglas Crosher [:dougc]

Assignee

Updated

•

12 years ago

Keywords: checkin-needed

Ryan VanderMeulen [:RyanVM]

Comment 25

•

12 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/06f35795582a

Keywords: checkin-needed

Ryan VanderMeulen [:RyanVM]

Comment 26

•

12 years ago

https://hg.mozilla.org/mozilla-central/rev/06f35795582a

Status: NEW → RESOLVED

Closed: 12 years ago

Flags: in-testsuite+

Resolution: --- → FIXED

Target Milestone: --- → mozilla27

Douglas Crosher [:dougc]

Assignee

Comment 27

•

12 years ago

Comment on attachment 805612 [details] [diff] [review] Address reviewer feedback. [Approval Request Comment] Bug caused by (feature/regressing bug #): User impact if declined: asm.js code will not be able to use the new non-power-of-two heap lengths and backwards compatibility requirements would prevent app developers using in future until ff26 is history. This feature might help reduce memory usage and also improve performance of apps. Testing completed (on m-c, etc.): includes tests Risk to taking this patch (and alternatives if risky): low String or IDL/UUID changes made by this patch:

Attachment #805612 - Flags: approval-mozilla-aurora?

Douglas Crosher [:dougc]

Assignee

Updated

•

12 years ago

Depends on: 917800

Luke Wagner [:luke]

Updated

•

12 years ago

Whiteboard: [games]

Lukas Blakk [:lsblakk] use ?needinfo

Updated

•

12 years ago

Attachment #805612 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+

Ryan VanderMeulen [:RyanVM]

Updated

•

12 years ago

Attachment #801059 - Attachment is obsolete: true

Ryan VanderMeulen [:RyanVM]

Comment 28

•

12 years ago

https://hg.mozilla.org/releases/mozilla-aurora/rev/65f593f311f8

status-firefox26: --- → fixed

status-firefox27: --- → fixed

Marco Mucci [:MarcoM]

Updated

•

12 years ago

Blocks: gecko-games

Andrew McCreight [:mccr8]

Updated

•

11 years ago

Depends on: CVE-2015-0817

You need to log in before you can comment on or make changes to this bug.