Bug 1207827 (Closed): Opened 9 years ago, Closed 9 years ago

Simplify AssemblerBufferWithConstantPools for ARM and ARM64

Categories: Core :: JavaScript Engine: JIT (defect, severity normal)

Tracking: RESOLVED FIXED, target milestone mozilla44
Tracking Status: firefox44 --- fixed

People: Reporter: jolesen; Assigned: jolesen

Attachments: 12 files, 5 obsolete files

The assembler buffer with constant pools is described well in the leading comment in IonAssemblerBufferWithConstantPools.h. The current design maintains a list of constant pools out-of-band, supports at most one constant pool per buffer slice and rewrites branch offsets as part of executableCopy().

This design has a few disadvantages:

- When the constant pools are merged into the instruction stream during executableCopy(), branches are rewritten, and it is possible for branch offsets to go out of range, leading to CrashAtUnhandlableOOM(). This is quite likely to happen for tbz/tbnz instructions on ARM64 which only have a +/- 32 KB range.

- It is difficult to predict if a branch is going to be in range - the BufferOffsets don't account for the size of the constant pools. (The guard jump over the pool is accounted for, but not the pool itself.)

- Memory is wasted when slices are perforated and when keeping multiple Pool objects around.

After the simplifications in Bug 1026919, some of the features of this design are no longer needed and a simpler approach is possible:

- Maintain a single pending pool of entries that have yet to be inserted into the instruction stream, just like AssemblerBufferWithConstantPools does now.

- When the constant pool needs to be flushed, copy its data directly into the AssemblerBuffer memory and patch constant pool references immediately. Then forget about the pool entirely.

- Don't associate pools with slices, don't keep track of which instructions are branches, and don't attempt to rewrite branch offsets in executableCopy().

When constant pool data is inserted inline immediately, all BufferOffsets are accurate and it becomes much easier to predict if a branch target is in range. The BufferSliceTail class can be deleted because we don't need to maintain a bit vector of branch instructions and pool pointers per slice.
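
Roughly, the flush path would then look like this sketch (self-contained stand-in types only, not the actual jit:: classes; the real code uses BufferOffset, Pool, and the AssemblerBuffer slices, and also emits a guard branch and pool header):

#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Stand-ins for this sketch only.
struct PendingEntry {
    size_t loadOffset;   // offset of the load referencing this entry
    size_t entryOffset;  // offset of the entry within the pending pool data
};

struct PendingPool {
    std::vector<uint8_t> data;
    std::vector<PendingEntry> entries;
};

// Flush the single pending pool: append its data to the code buffer and
// patch every referencing load right away, so all later offsets are final.
void flushPendingPool(std::vector<uint8_t>& code, PendingPool& pool)
{
    if (pool.data.empty())
        return;

    size_t poolStart = code.size();
    code.insert(code.end(), pool.data.begin(), pool.data.end());

    for (const PendingEntry& e : pool.entries) {
        // The real assembler rewrites only the load's immediate field; this
        // sketch just stores the 32-bit distance from the load to its entry.
        uint32_t delta = uint32_t(poolStart + e.entryOffset - e.loadOffset);
        std::memcpy(&code[e.loadOffset], &delta, sizeof(delta));
    }

    // Forget about the pool entirely; nothing is deferred to executableCopy().
    pool.data.clear();
    pool.entries.clear();
}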
See Also: → 1208674
This might not solve the real problem here. Perhaps what is needed is support for patchable instruction sizes, so that extra jumps can be inserted/removed when finalizing. There was some preliminary work done in this direction, trying to make the larger code compatible with the required constraints.
Blocks: 1210554
Douglas, I did some work on LLVM's constant island pass which does more or less what you are suggesting.
See http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/ARM/ARMConstantIslandPass.cpp?view=annotate

IIRC, this approach requires an iterative algorithm because increasing the size of one jump instruction can affect other jumps straddling it. It's pretty hard to get right, and it can be difficult to generate tests that require two or more iterations to make sure that the algorithm is working correctly. Things are even worse if you support code or pool entries with alignment requirements.

I would prefer a single-pass algorithm, even if the generated code is slightly worse. This is in line with the current design where constant pools are always emitted after the code referencing the constants. There is a small benefit to be had by allowing constant pools to be placed before the referencing code, but it is much harder to get right.

Bug 1210554 illustrates the problem with near branches going out of range when inserting constant pools.
A single-pass algorithm sounds like a good plan, if it will work. I removed the backwards constants to simplify the code, and this seemed a good trade-off. It was a relatively big change and needed extensive testing. There are runtime options to insert fill instructions to help test such code that might be of interest; see insertNopFill().
I think we can incorporate limited-range branches with similar methods:

1. Flush constant pools into the main buffer memory immediately as described above. Only maintain a single pending constant pool.

2. Emit backwards branches immediately. Since BufferOffsets are accurate and don't require rewriting, it is possible to determine right away if a backwards branch target is in range. Use a different instruction sequence if the branch is not in range. For example, arm64 has these options for "if (r0 & 2) goto previous_label;":

a:
  tbnz r0, #1, previous_label  ; backwards range = -32 KB

b:
  tst r0, #2
  b.ne previous_label      ; backwards range = -1 MB

c:
  tbz r0, #1, skip_branch
  b previous_label         ; backwards range = -128 MB (branch predictor prefers b: above)
skip_branch:

d:
  tbz r0, #1, skip_branch
  ldr scratch, constant_pool_entry
  br scratch                ; range = anywhere, requires relocation when copying code
skip_branch:

3. Emit forward branches optimistically and track them similarly to pending constant pool entries. If a forward branch is still pending and about to go out of range, emit a constant pool entry "jump pad" that is an unconditional forward branch to the still unbound label. Then bind the pending branch to the constant pool entry. Like this:

in_range:
  b.eq real_branch_target

out_of_range:
  b.eq constant_pool_jump_pad
  ...
  ...
constant_pool_jump_pad:
  b real_branch_target ; which we still don't know 
other_constant_pool_entries:
  .data foo
  ...

When a forward branch is emitted, we can choose to be optimistic or pessimistic. Optimistically, we expect the forward branch to be in range, and we depend on a constant pool jump pad if we are wrong. Pessimistically, we emit a longer-range branch sequence that is less likely to need a constant pool jump pad. We can choose to be optimistic or pessimistic depending on the nature of the forward branch.

This approach makes it possible to use short-range branches most of the time. This results in better code quality than using pessimistic long-range branches everywhere.
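
As an illustration of the range check in point 2: because offsets are exact at emission time, a backwards branch can pick its encoding from the real distance. A sketch (the helper and enum are hypothetical, not an existing jit:: API; the ranges match options a-d above):

#include <cstdint>

enum class BranchKind { TestBitBranch, CondBranch, UncondBranch, IndirectViaPool };

BranchKind chooseBackwardsBranch(int64_t branchOffset, int64_t targetOffset)
{
    int64_t distance = targetOffset - branchOffset;  // negative for a backwards branch
    if (distance >= -(32 * 1024))
        return BranchKind::TestBitBranch;   // (a) tbz/tbnz, -32 KB
    if (distance >= -(1024 * 1024))
        return BranchKind::CondBranch;      // (b) tst + b.cond, -1 MB
    if (distance >= -(128 * 1024 * 1024))
        return BranchKind::UncondBranch;    // (c) inverted test + b, -128 MB
    return BranchKind::IndirectViaPool;     // (d) ldr + br via a pool entry
}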
From my reading of Assembler::RetargetNearBranch() in js/src/jit/arm, the 32-bit ARM JIT doesn't deal with out-of-range branches either. It detects out-of-range branches and calls CrashAtUnhandlableOOM("BOffImm").

The A32 encoding of a branch instruction has a +/- 32 MB range, so this is less likely to cause problems.

The Thumb encodings of branches are more restricted:
  T1: +/- 256 B
  T2: +/- 2 KB (uncond only, but can be last in IT block)
  T3: +/- 1 MB
  T4: +/- 16 MB (uncond only, but can be last in IT block)
If we want to generate Thumb code, we need to deal with limited-range branches.
This new putBytesLarge() method copies a large amount of data into the
assembler buffer, potentially splitting it across multiple slices.

The existing putBytes() method can only copy into a single slice, which
limits the maximum size of data that can be inserted and potentially
wastes space at the end of the previous slice.
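
In outline, the splitting logic is something like this sketch (simplified; in the real code slices come from a LifoAlloc and allocation can fail on OOM):

#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Simplified stand-in for AssemblerBuffer::Slice.
struct Slice {
    static const size_t Capacity = 1024;
    uint8_t bytes[Capacity];
    size_t used = 0;
    Slice* next = nullptr;
};

// Copy `length` bytes, starting a fresh slice whenever the current one fills up.
void putBytesLarge(Slice*& tail, const uint8_t* data, size_t length)
{
    while (length > 0) {
        size_t room = Slice::Capacity - tail->used;
        if (room == 0) {
            tail->next = new Slice();
            tail = tail->next;
            room = Slice::Capacity;
        }
        size_t chunk = std::min(room, length);
        std::memcpy(tail->bytes + tail->used, data, chunk);
        tail->used += chunk;
        data += chunk;
        length -= chunk;
    }
}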
Attachment #8674978 - Flags: review?(nicolas.b.pierron)
These were exposed by gc/oomInRegExp.js when the buffer slice
alignments changed.
Attachment #8674979 - Flags: review?(nicolas.b.pierron)
This error was exposed by the following patch changing allocation patterns.
This was exposed by the following patch.
Attachment #8674982 - Flags: review?(nicolas.b.pierron)
Since constant pools have been simplified such that they always follow
the load instructions referencing them, it is now possible to copy the
pool data into the main assembly buffer right away in finishPool()
instead of deferring the copy to executableCopy(). This has the
important effect that BufferOffset values don't need to be adjusted to
account for the size of injected constant pools.

- In finishPool(), copy the constant pool data into the main assembler
  buffer with putBytesLarge() which allows the potentially large pool
  data to be split over multiple slices.

- Stop copying pool data in executableCopy(). It has already
  been copied in finishPool().

- Simplify the PoolInfo data structure to a simple (firstEntry, offset)
  pair which records the necessary information about a single emitted
  constant pool.

- Rewrite poolEntryOffset() to use the new PoolInfo data.

- Simplify poolSizeBefore() to just return 0. This function will be
  removed entirely in a followup patch.

- Stop patching branches in executableCopy() and delete the
  patchBranch() method. Since buffer offsets are reliable when the
  branches are inserted and when labels are bound, there is no need to
  adjust branches to account for constant pools.

- Delete the BufferSliceTail class. It is no longer necessary to
  maintain a bit vector of branch instructions since we don't rewrite
  branches any longer. Use the plain AssemblerBuffer::Slice class
  instead.

This patch implements the basic functional change of copying pool data
immediately instead of deferring it. Followup patches will clean up the
now redundant code.
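
For illustration, the lookup over the simplified PoolInfo records amounts to something like this sketch (member names here are assumptions, not necessarily the exact ones in the patch):

#include <cstddef>
#include <vector>

struct PoolInfo {
    size_t firstEntryIndex;  // global index of this pool's first entry
    size_t dataOffset;       // buffer offset where the pool data was emitted
};

const size_t EntrySize = 4;  // this sketch assumes one 32-bit PoolAllocUnit per entry

// Map a pool-entry index to its final buffer offset by finding the emitted
// pool that contains the entry. Assumes at least one pool has been emitted.
size_t poolEntryOffset(const std::vector<PoolInfo>& pools, size_t entryIndex)
{
    const PoolInfo* pool = &pools[0];
    for (const PoolInfo& p : pools) {
        if (p.firstEntryIndex <= entryIndex)
            pool = &p;
    }
    return pool->dataOffset + (entryIndex - pool->firstEntryIndex) * EntrySize;
}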
Attachment #8674983 - Flags: review?(nicolas.b.pierron)
There is no longer a 1-1 correspondence between buffer slices and
constant pools. We no longer need the ability to truncate a buffer
slice prematurely.

This uses less memory.
Attachment #8674984 - Flags: review?(nicolas.b.pierron)
Attached patch: Use a Vector for poolInfo_. (obsolete)
Use a Vector with 8 preallocated slots and a LifoAllocPolicy allocating
from the parent AssemblerBuffer's lifoAlloc_. We'll rarely need more
than 8 constant pools in a single assembler.

We can't actually allocate memory from this->lifoAlloc_ in the
constructor, but it is OK to capture allocator references like the
Vector constructor does.

Add an assertion to initWithAllocator() to verify that we didn't
allocate anything from lifoAlloc_ before the MacroAssembler constructor
had a chance to install an AutoJitContextAlloc.

Note that this also improves OOM simulation granularity, see Bug 1215555.
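
The pattern being relied on, shown as a self-contained toy (not the actual Vector or LifoAlloc code): the constructor only captures the allocator reference, allocation happens lazily on append, and the first few entries live in inline storage.

#include <cstddef>

struct Arena { size_t bytesUsed = 0; };  // stands in for LifoAlloc

template <typename T, size_t InlineSlots>
class SmallVec {
    Arena& arena_;            // captured in the constructor...
    T inline_[InlineSlots];
    size_t length_ = 0;
  public:
    explicit SmallVec(Arena& arena) : arena_(arena) {}  // ...but nothing is allocated here
    bool append(const T& v) {
        if (length_ < InlineSlots) {   // the common case: rarely more than 8 pools
            inline_[length_++] = v;
            return true;
        }
        // A real append would now grow out of arena_; omitted in this sketch.
        return false;
    }
    size_t length() const { return length_; }
};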
Attachment #8674986 - Flags: review?(nicolas.b.pierron)
This is possible because we no longer PodCopy the pools.

Use a more reasonable size for the inline memory in the loadOffsets
vector (8 loads instead of 512). This saves stack space when a
MacroAssembler is created on the stack. Use a matching inline size of 8
entries for the poolData array itself.

Don't drop memory in Pool::reset() by calling placement new - call
Vector::clear() instead which hangs on to previously allocated memory
for use by the next pool.

Delete an unused array of pools in arm64/vixl/MozBaseAssembler-vixl.h.

Note that this also improves OOM simulation granularity, see Bug 1215555.
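
The reset() change, sketched with std::vector standing in for the jit Vector types:

#include <cstddef>
#include <cstdint>
#include <vector>

struct Pool {
    std::vector<uint32_t> poolData_;   // the patch keeps 8 inline PoolAllocUnits
    std::vector<size_t> loadOffsets;   // and 8 inline load offsets

    void reset() {
        // clear() drops the length but keeps previously allocated storage for
        // the next pool; destroying the Pool and placement-new'ing it would
        // throw that storage away.
        poolData_.clear();
        loadOffsets.clear();
    }
};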
Attachment #8674987 - Flags: review?(nicolas.b.pierron)
This method was used by the Assembler::actualOffset() methods to
translate buffer offsets from pre-pool to post-pool numbers. Since
pools are now injected immediately, there is no need to translate
offsets.

All the actualOffset() methods have become no-ops in all our supported
targets.
Attachment #8674988 - Flags: review?(nicolas.b.pierron)
The ARM64 assembler no longer needs to keep track of code offsets for
later translation to 'final' offsets. The AssemblerBuffer offsets are
directly usable now.

Remove tmpDataRelocations_, tmpPreBarriers_, tmpJumpRelocations_, and
the finalOffset() method.
Attachment #8674989 - Flags: review?(nicolas.b.pierron)
The ARM assembler no longer needs to keep track of code offsets for
later translation to 'actual' offsets. The AssemblerBuffer offsets
are directly usable now.

Remove tmpDataRelocations_, tmpPreBarriers_, and tmpJumpRelocations_.
Attachment #8674990 - Flags: review?(nicolas.b.pierron)
Attachment #8674981 - Flags: review?(nicolas.b.pierron)
Assignee: nobody → jolesen
The implementation of Assembler::actualOffset() is now a no-op in all
targets, and it is no longer necessary to rewrite assembler buffer
offsets to their final form after finishing the assembler buffer.

- Delete Assembler::actualOffset() in all targets.
- Delete AsmJSModule::CodeRange::updateOffsets().
- Delete AsmJSModule::updateCodeOffset().
- Delete AsmJSModule::updateOffsets().
- Delete CodeOffsetLabel::fixup().
- Delete ICEntry::fixupReturnOffset().
- Delete LSafepoint::fixupOffset().
- Delete OsiIndex::fixUpOffset().
- Delete PCMappingEntry::fixupNativeOffset().

Also delete all code calling these functions.
Attachment #8675179 - Flags: review?(nicolas.b.pierron)
Attachment #8674978 - Flags: review?(nicolas.b.pierron) → review+
Attachment #8674979 - Flags: review?(nicolas.b.pierron) → review+
Attachment #8674981 - Flags: review?(nicolas.b.pierron) → review+
Comment on attachment 8674982 [details] [diff] [review]
Fix ARM64 OOM bugs in Assembler::bind().

Review of attachment 8674982 [details] [diff] [review]:
-----------------------------------------------------------------

This deserves a comment saying that, to be safe, we do not update pointers within the buffer in case of OOM, even if the Label was used.
Attachment #8674982 - Flags: review?(nicolas.b.pierron) → review+
Comment on attachment 8674983 [details] [diff] [review]
Copy pool data into buffer immediately.

Review of attachment 8674983 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/shared/IonAssemblerBufferWithConstantPools.h
@@ +564,5 @@
>          // Dump the pool with a guard branch around the pool.
> +        BufferOffset guard = this->putBytes(guardSize_ * InstSize, nullptr);
> +        BufferOffset header = this->putBytes(headerSize_ * InstSize, nullptr);
> +        BufferOffset data =
> +          this->putBytesLarge(pool_.getPoolSize(), (const uint8_t*)pool_.poolData());

Do we expect to mutate pool entries after registering them?
Should we align the large chunks on sizeof(PoolAllocUnit)?
Attachment #8674983 - Flags: review?(nicolas.b.pierron) → review+
Attachment #8674984 - Flags: review?(nicolas.b.pierron) → review+
Comment on attachment 8674986 [details] [diff] [review]
Use a Vector for poolInfo_.

Review of attachment 8674986 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/shared/IonAssemblerBufferWithConstantPools.h
@@ +364,5 @@
>          poolMaxOffset_(poolMaxOffset),
>          pcBias_(pcBias),
>          pool_(),
>          instBufferAlign_(instBufferAlign),
> +        poolInfo_(this->lifoAlloc_),  // OK: Vector() doesn't allocate, append() does.

This comment is not necessary as this is common all over the engine.
All our constructors are infallible, and thus none of them are supposed to allocate memory.
Attachment #8674986 - Flags: review?(nicolas.b.pierron) → review+
Comment on attachment 8674987 [details] [diff] [review]
Switch jit::Pool to a LifoAllocPolicy.

Review of attachment 8674987 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/shared/IonAssemblerBufferWithConstantPools.h
@@ +181,5 @@
>      unsigned insertEntry(unsigned num, uint8_t* data, BufferOffset off, LifoAlloc& lifoAlloc) {
>          if (oom_)
>              return OOM_FAIL;
> +        unsigned ret = numEntries();
> +        if (!poolData_.append((PoolAllocUnit*)data, num) || !loadOffsets.append(off)) {

We should probably open a follow-up bug to get rid of this cast.
Attachment #8674987 - Flags: review?(nicolas.b.pierron) → review+
Comment on attachment 8674988 [details] [diff] [review]
Eliminate poolSizeBefore().

Review of attachment 8674988 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/arm/Assembler-arm.cpp
@@ +633,5 @@
>      isFinished = true;
>  
>      for (unsigned int i = 0; i < tmpDataRelocations_.length(); i++) {
>          size_t offset = tmpDataRelocations_[i].getOffset();
> +        dataRelocations_.writeUnsigned(offset);

Sounds like we can make a follow-up patch to remove the tmpDataRelocations structure, and just use the dataRelocations_ structure.
Attachment #8674988 - Flags: review?(nicolas.b.pierron) → review+
Comment on attachment 8674989 [details] [diff] [review]
Remove ARM64 temporary offset buffers.

Review of attachment 8674989 [details] [diff] [review]:
-----------------------------------------------------------------

oh, … I should read ahead!
Nice work!

::: js/src/jit/arm64/Assembler-arm64.cpp
@@ +84,5 @@
>      // The jump relocation table starts with a fixed-width integer pointing
>      // to the start of the extended jump table.
> +    if (jumpRelocations_.length()) {
> +        MOZ_ASSERT(jumpRelocations_.length() >= sizeof(uint32_t));
> +        *(uint32_t*)jumpRelocations_.buffer() = ExtendedJumpTable_.getOffset();

Skip this if we OOM before.

Add a comment to mention that the space is reserved by Assembler::addJumpRelocation, before the first entry of the jump relocation table.
Attachment #8674989 - Flags: review?(nicolas.b.pierron) → review+
Attachment #8674990 - Flags: review?(nicolas.b.pierron) → review+
Comment on attachment 8675179 [details] [diff] [review]
Delete Assembler::actualOffset() and transitive closure. r=nbp

Review of attachment 8675179 [details] [diff] [review]:
-----------------------------------------------------------------

Nice Work !!!

I think this is one of the best introductions to the low-level parts of our Jit. :)

::: js/src/jit/IonCaches.cpp
@@ +80,5 @@
>       setAbsolute();
>  }
>  
>  void
> +CodeOffsetJump::xfixup(MacroAssembler* masm)

nit: xfixup?

::: js/src/jit/shared/CodeGenerator-shared.cpp
@@ -637,2 @@
>              safepoints_.encode(safepoint);
>          }

nit: Remove these curly braces as well.
Attachment #8675179 - Flags: review?(nicolas.b.pierron) → review+
(In reply to Nicolas B. Pierron [:nbp] from comment #20)
> Comment on attachment 8674983 [details] [diff] [review]
> Copy pool data into buffer immediately.
> 
> Review of attachment 8674983 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: js/src/jit/shared/IonAssemblerBufferWithConstantPools.h
> @@ +564,5 @@
> >          // Dump the pool with a guard branch around the pool.
> > +        BufferOffset guard = this->putBytes(guardSize_ * InstSize, nullptr);
> > +        BufferOffset header = this->putBytes(headerSize_ * InstSize, nullptr);
> > +        BufferOffset data =
> > +          this->putBytesLarge(pool_.getPoolSize(), (const uint8_t*)pool_.poolData());
> 
> Do we expect to mutate pool entries after registering them?

Yes, but not while they live in the AssemblerBuffer AFAICT. Jump table entries are added as constant pool entries, and CodeOffsetJump::{repoint,fixup} translates a pool entry index into a final buffer offset.

It is not completely obvious when these jump table entries may be modified.

> Should we align the large chunks on sizeof(PoolAllocUnit)?

We should perhaps think about a more general handling of constant pool alignments. I think that both ARM and ARM64 would benefit from 16-byte aligned entries for SIMD vectors for example. The more general allocation signature would be f(void *poolEntryData, size_t poolEntryBytes, size_t poolEntryAlignment).

At the moment, the code seems to be relying on sizeof(Inst) == sizeof(PoolAllocUnit) and getting alignment from that.
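
For reference, the kind of alignment-aware insertion suggested by that signature could be as simple as this sketch (hypothetical helper, not an existing jit:: function):

#include <cstddef>
#include <cstdint>
#include <vector>

// Pad the pool so the new entry starts at `entryAlignment` (a nonzero power
// of two in practice), then append the entry's bytes. Returns the entry's
// offset within the pool data.
size_t insertPoolEntry(std::vector<uint8_t>& poolData, const void* entryData,
                       size_t entryBytes, size_t entryAlignment)
{
    size_t offset = poolData.size();
    size_t padding = (entryAlignment - offset % entryAlignment) % entryAlignment;
    poolData.resize(offset + padding, 0);  // zero padding between entries
    offset += padding;
    const uint8_t* src = static_cast<const uint8_t*>(entryData);
    poolData.insert(poolData.end(), src, src + entryBytes);
    return offset;
}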

Filed bug 1216211 - Add support for different constant pool alignments
See Also: → 1216211
(In reply to Nicolas B. Pierron [:nbp] from comment #22)
> Comment on attachment 8674987 [details] [diff] [review]
> Switch jit::Pool to a LifoAllocPolicy.
> 
> Review of attachment 8674987 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: js/src/jit/shared/IonAssemblerBufferWithConstantPools.h
> @@ +181,5 @@
> >      unsigned insertEntry(unsigned num, uint8_t* data, BufferOffset off, LifoAlloc& lifoAlloc) {
> >          if (oom_)
> >              return OOM_FAIL;
> > +        unsigned ret = numEntries();
> > +        if (!poolData_.append((PoolAllocUnit*)data, num) || !loadOffsets.append(off)) {
> 
> We should probably open a follow-up bug to get rid of this cast.

I added a note to bug 1216211. I think poolData_ should be a Vector<uint8_t>, and we should support variable pool entry sizes and alignments.
Add comment elaborating how the OOM handling works.
Attachment #8674982 - Attachment is obsolete: true
Remove comment about the infallible constructor.
Attachment #8674986 - Attachment is obsolete: true
Handle OOM, update comment.
Attachment #8674989 - Attachment is obsolete: true
Get rid of 'xfixup' which shouldn't have been there in the first place.

Remove unnecessary curly braces.
Attachment #8675179 - Attachment is obsolete: true
Fix build error on 32-bit x86.
Attachment #8675873 - Attachment is obsolete: true
When you push to try, you'll need to rebase onto the current tip of whatever branch you're on (m-c, inbound, etc) to pick up some releng fixes so your try push will actually upload to the right place.
That log needs to be downloaded and extracted to view the actual text file.

The failure is in /builds/slave/m-in_lx-d_sm-arm-sim-000000000/src/js/src/jit-test/tests/gc/oomInArrayProtoTest.js


TEST-PASS | js/src/jit-test/tests/gc/oomInArrayProtoTest.js | Success (code 0, args "--baseline-eager")
Assertion failure: cx->isExceptionPending(), at /builds/slave/m-in_lx-d_sm-arm-sim-000000000/src/js/src/builtin/TestingFunctions.cpp:1161
Exit code: -11
FAIL - gc/oomInArrayProtoTest.js
TEST-UNEXPECTED-FAIL | js/src/jit-test/tests/gc/oomInArrayProtoTest.js | Assertion failure: cx->isExceptionPending(), at /builds/slave/m-in_lx-d_sm-arm-sim-000000000/src/js/src/builtin/TestingFunctions.cpp:1161 (code -11, args "")
INFO exit-status     : -11
INFO timed-out       : False
INFO stderr         2> Assertion failure: cx->isExceptionPending(), at /builds/slave/m-in_lx-d_sm-arm-sim-000000000/src/js/src/builtin/TestingFunctions.cpp:1161
Depends on: 1217061
Thanks, Wes.

This looks like an intermittent missed OOM error that is getting more frequent when applying these patches.

Filed bug 1217061 as a blocker.
Flags: needinfo?(jolesen)
https://hg.mozilla.org/mozilla-central/rev/3aa6ee8288c0
https://hg.mozilla.org/mozilla-central/rev/6d93ac9755f9
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla44
Reopening. I forgot to set the leave-open bit, and only 2 of 11 patches have been committed.

Remaining patches need to wait for bug 1217061.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---