Optimize and simplify nursery allocation

RESOLVED FIXED in Firefox 63

Status

()

RESOLVED FIXED
8 months ago
8 months ago

People

(Reporter: pbone, Assigned: pbone)

Tracking

unspecified
mozilla63
Points:
---

Firefox Tracking Flags

(firefox63 fixed)

Details

Attachments

(1 attachment, 1 obsolete attachment)

(Assignee)

Description

8 months ago
Share the JIT bump pointer allocation code between nursery strings allocation and nursery object allocation.  This also optimises nursery object allocation since it now includes fewer literal 64-bit numbers.
(Assignee)

Comment 1

8 months ago
I noticed that while some effort was made to avoid abosolute addresses in the nursery strings allocation path.  The same wasn't done for other nursery objects, and there were a few absolute addresses here.  This patch uses the same code for both allocations and with some initial testing improves octane by 1.5%.  I'm waiting for the talos jobs now.
Assignee: nobody → pbone
Status: NEW → ASSIGNED
Attachment #8995929 - Flags: review?(sphink)
Will this code conceivably be used by wasm when wasm has object support?  There must be strictly zero absolute addresses in wasm jit code that can be cached to disk; not saying you can't do this, just that we need to be aware of that.
Comment on attachment 8995929 [details] [diff] [review]
Bug 1479360 - Share and optimize bump-pointer allocation code

Review of attachment 8995929 [details] [diff] [review]:
-----------------------------------------------------------------

Really r=me, but I guess I'd like to see the final version first, to decide whether I need to get the review double-checked by sumbuddy smarte. Waldo, maybe.

(I'll be working tonight for several hours before our meeting, so we should have lots of overlap and you won't need to wait another day for a review cycle from me.)

::: js/src/jit/MacroAssembler.cpp
@@ +979,3 @@
>      // The nursery position (allocation pointer) and the nursery end are stored
>      // very close to each other -- specifically, easily within a 32 bit offset.
>      // Use relative offsets between them, to avoid 64-bit immediate loads.

This should either add "nursery" to the function name, or (much better) remove it from the comment.

@@ +983,4 @@
>      loadPtr(Address(temp, 0), result);
>      addPtr(Imm32(totalSize), result);
>      const ptrdiff_t endOffset =
> +        uintptr_t(curEndAddr) - uintptr_t(posAddr);

Hm... this is from my pre-existing code, but it seems like we should be asserting away the implementation-defined behavior here.

Say posAddr was a little bit bigger than curEndAddr. Then uintptr_t(curEndAddr) - uintptr_t(posAddr)  is defined behavior and would return a very large unsigned value, but casting it to ptrdiff_t is implemented-defined behavior and might clamp or do something else stupid (in theory).

But I'm not language lawyer. I attempted to come up with a full set of assertions to permit only the desired behavior, and it was taking too long so I decided perhaps it would be best to give up and rely on CheckedInt.

How about

  CheckedInt<uintptr_t> curEndPtr = curEndAddr;
  CheckedInt<uintptr_t> posPtr = posAddr;
  CheckedInt<int32_t> endOffset = curEndAddr - posAddr;
  MOZ_ASSERT(endOffset.isValid());
  branchPtr(Assembler::Below, Address(temp, endOffset.value()), result, fail);

It would be nice to have something more static.

Note that my original optimization eliminated one more 64-bit constant in most cases, by checking whether the Zone pointer was also close enough -- and I got it completely wrong, such that the optimization was statically provable to never be applied. :( So don't trust me on any of this!

::: js/src/jit/MacroAssembler.h
@@ +2280,5 @@
>      void nurseryAllocateObject(Register result, Register temp, gc::AllocKind allocKind,
>                                 size_t nDynamicSlots, Label* fail);
> +    void bumpPointerAllocate(Register result, Register temp, Label* fail,
> +        void* posAddr, const void* curEddAddr,
> +        uint32_t totalSize, uint32_t size);

It seems like the standard parameter order in this file puts the fail label last.
Attachment #8995929 - Flags: review?(sphink)
(Assignee)

Comment 5

8 months ago
(In reply to Steve Fink [:sfink] [:s:] from comment #4)
> Comment on attachment 8995929 [details] [diff] [review]
> Bug 1479360 - Share and optimize bump-pointer allocation code
> 
> Review of attachment 8995929 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Really r=me, but I guess I'd like to see the final version first, to decide
> whether I need to get the review double-checked by sumbuddy smarte. Waldo,
> maybe.
> 
> (I'll be working tonight for several hours before our meeting, so we should
> have lots of overlap and you won't need to wait another day for a review
> cycle from me.)

Okay cool.  Yeah the 'overnight reviews' is sometimes tricky.  Sometimes I work late to prepare a patch that evening so I know I don't have to wait another day.  If I can manage it a good solution will be to keep several patches to several bugs in-flight.

> ::: js/src/jit/MacroAssembler.cpp
> @@ +983,4 @@
> >      loadPtr(Address(temp, 0), result);
> >      addPtr(Imm32(totalSize), result);
> >      const ptrdiff_t endOffset =
> > +        uintptr_t(curEndAddr) - uintptr_t(posAddr);
> 
> Hm... this is from my pre-existing code, but it seems like we should be
> asserting away the implementation-defined behavior here.
> 
> Say posAddr was a little bit bigger than curEndAddr. Then
> uintptr_t(curEndAddr) - uintptr_t(posAddr)  is defined behavior and would
> return a very large unsigned value, but casting it to ptrdiff_t is
> implemented-defined behavior and might clamp or do something else stupid (in
> theory).

I changed the types here from your original code, and I have other patches that add assertions in my workspace.  AIUI ptrdiff_t is always the same size as uintptr_t, the only difference is that ptrdiff_t is signed.  I can still see problems if one address is near 0 and the other is near 1<<63.  So CheckedInt might make everyone feel safer.

> But I'm not language lawyer. I attempted to come up with a full set of
> assertions to permit only the desired behavior, and it was taking too long
> so I decided perhaps it would be best to give up and rely on CheckedInt.
> 
> How about
> 
>   CheckedInt<uintptr_t> curEndPtr = curEndAddr;
>   CheckedInt<uintptr_t> posPtr = posAddr;
>   CheckedInt<int32_t> endOffset = curEndAddr - posAddr;
>   MOZ_ASSERT(endOffset.isValid());
>   branchPtr(Assembler::Below, Address(temp, endOffset.value()), result,
> fail);
> 
> It would be nice to have something more static.

Maybe, but it might need to be added to the Nursery class, and rather than return pointers to both the current end and position, instead return one as an offset from the other, and do the static check in the Nursery class.

> Note that my original optimization eliminated one more 64-bit constant in
> most cases, by checking whether the Zone pointer was also close enough --
> and I got it completely wrong, such that the optimization was statically
> provable to never be applied. :( So don't trust me on any of this!

Yes, I saw that, but I didn't follow through enough to convice myself that it was broken.  We may be able to address that, and maybe even Lars' query if we re-arrange these structures and place all three fields into one structure.  While thinking of that, Lars, is there one absolute address pointing to a structure with everything wasm will already have?  or maybe a register for this purpose?  Something like a thread local data register?
(Assignee)

Comment 6

8 months ago
I've done some better tests.  I get a 2.1% improvment on octane in the shell.  with 8 samples before and after the change.  It is statistically sagnificant with p = 0.0394
(Assignee)

Comment 7

8 months ago
The assertions are fixed in Bug 1478902 and apply before this patch within my workspace.
Attachment #8995929 - Attachment is obsolete: true
Attachment #8996214 - Flags: review?(sphink)
(Assignee)

Updated

8 months ago
Blocks: 1473213
Attachment #8996214 - Flags: review?(sphink) → review+

Comment 8

8 months ago
Pushed by pbone@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/a914cedebde5
Share and optimize bump-pointer allocation code r=sfink
(Assignee)

Comment 9

8 months ago
Overholt asked if I could test this with speedometer.  It makes a 2.4% improvment to speedometer when I test locally.  p=0.0004  :D :D
Flags: needinfo?(overholt)

Comment 10

8 months ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/a914cedebde5
Status: ASSIGNED → RESOLVED
Last Resolved: 8 months ago
status-firefox63: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla63
(In reply to Paul Bone [:pbone] from comment #9)
> Overholt asked if I could test this with speedometer.  It makes a 2.4%
> improvment to speedometer when I test locally.  p=0.0004  :D :D

\o/
Flags: needinfo?(overholt)
(In reply to Paul Bone [:pbone] from comment #9)
Nice!  

Also this shows that we do need to be careful when adding anything to the allocation fast path.
(Assignee)

Comment 13

8 months ago
:jonco  True, I'm already going to make an improvment to Bug 1473213 based on what I learnt here.
You need to log in before you can comment on or make changes to this bug.