Closed Bug 969012 Opened 6 years ago Closed 6 years ago

Figure out why MNewSlot sometimes leaks in GGC builds

Categories

(Core :: JavaScript Engine, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
mozilla32

People

(Reporter: terrence, Assigned: terrence)

References

(Blocks 1 open bug)

Details

Attachments

(3 files, 13 obsolete files)

82.59 KB, patch
terrence
: review+
terrence
: checkin+
Details | Diff | Splinter Review
978 bytes, patch
terrence
: review+
terrence
: checkin+
Details | Diff | Splinter Review
1.45 KB, patch
jandem
: review+
Details | Diff | Splinter Review
It seems to be intermittent, so maybe a race?
Blocks: 961795
Attached patch notify_initial_slots-v0.diff (obsolete) — Splinter Review
Right, the basic problem here is that I accidentally dropped notifyInitialSlots when fixing the way we allocate initial slots in the nursery in the interpreter. The attached patch fixes this, but I'm going to keep the bug open to fix the underlying problem as well.

The way we implement allocation in the JIT doesn't quite match up with either our old or new behavior in the interpreter. The current situation: we never allocate in the jit if the object/array requires malloc. This is controlled by MNewFoo::shouldUseVM. The one exception is for CallObject, which I guess is critical for EarleyBoyer. In the case of CallObject, we create an MNewSlots, which does a callVM to js_malloc, then links its exit to MNewCallObject which follows the normal inline object path, then overwrites the inline slots pointer. 

I think we should push this behavior down into the allocator itself and allow any random object with external slots to be jit allocated. I'll try to whip something up this week.
Attachment #8373511 - Flags: review?(jcoppeard)
Comment on attachment 8373511 [details] [diff] [review]
notify_initial_slots-v0.diff

Review of attachment 8373511 [details] [diff] [review]:
-----------------------------------------------------------------

Right, yes this meant we didn't track slots that were externally allocated and passed in to JSObject::create().
Attachment #8373511 - Flags: review?(jcoppeard) → review+
Whiteboard: [leave open
This patch simplifies the interpreter side of things by removing the externalSlots parameter from JSObject::create. On the jit side we still allocate and use the MNewSlots slots, leaking the interpreter-allocated slots if we happen to take the non-jit path. There are more dependencies to remove before we can kill off the MNewSlots and plug the leak, but otherwise jit-tests and octane pass.

Note: I'll be squashing this entire series of 5 patches together before landing, but it should be much easier to review split up like this.

Note2: This series is more or less performance neutral currently; I still need to apply the new optimization to object allocations other than CallObject.
Attachment #8375801 - Flags: review?(jdemooij)
Part 2 of 5: refactor the jit GC object allocator. This is mostly code motion an comment fixes. It also adds the ability to allocate slots inline in the nursery. With this patch we now leak the interpreter allocated slots /and/ the new inline nursery slots. Otherwise, octane and jit-tests still pass.
Attachment #8375805 - Flags: review?(jdemooij)
Part 3 of 5: teach the jit how to malloc (and free on error) external slots when allocating in the tenured generation. This is required so that we can kill off MNewSlots without taking a 1000 point performance hit on EarleyBoyer when GGC is disabled. I think after GGC is permanently enabled we should rip out this extra complexity.
Attachment #8375808 - Flags: review?(jdemooij)
Attached patch 4_of_5_rip_out_mnewslots-v0.diff (obsolete) — Splinter Review
Part 4 of 5: rip out MNewSlots.
 12 files changed, 39 insertions(+), 232 deletions(-)

Unfortunately, PJS was depending on MNewSlots, however I believe this is more an artifact of cargo culting than intention. I needed to trim down the size of two of their tests to ensure they stay compiling with only inline slots in CallObjects. It should be easy enough to fix if they were actually depending on this for performance, but I guess they don't have any key benchmarks yet.
Attachment #8375815 - Flags: review?(jdemooij)
Attached patch 5_of_5_tests-v0.diff (obsolete) — Splinter Review
Part 5 of 5: test the new code.

It turns out the JSAPI methods for toggling GGC had rotted quite a bit and weren't exposed in the shell either. The tests themselves are copies of date-format-tofte with greatly stripped down iteration counts. I used masm.printf to ensure that the relevant allocation paths are getting hit. I tested the free path manually, but was not able to reduce a reliable test for it: oomAfterALlocations is a hard tool to use manually. I will fold and request fuzzing once I get review on the series.
Attachment #8375820 - Flags: review?(jdemooij)
Comment on attachment 8375801 [details] [diff] [review]
1_of_5_stop_accepting_external_slots-v0.diff

Review of attachment 8375801 [details] [diff] [review]:
-----------------------------------------------------------------

Nice simplification.

::: js/src/jit/VMFunctions.cpp
@@ +525,5 @@
>  }
>  
>  JSObject *
>  NewCallObject(JSContext *cx, HandleScript script,
> +              HandleShape shape, HandleTypeObject type)

Nit: this fits on one line now I think.
Attachment #8375801 - Flags: review?(jdemooij) → review+
Comment on attachment 8375805 [details] [diff] [review]
2_of_5_refactor_jit_allocator-v0.diff

Review of attachment 8375805 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/IonMacroAssembler.cpp
@@ +733,5 @@
> +{
> +    checkAllocatorState(fail);
> +    if (nurseryAllocate(result, allocKind, fail, nDynamicSlots, initialHeap))
> +        return;
> +    freeSpanAllocate(result, allocKind, fail);

Nice.
Attachment #8375805 - Flags: review?(jdemooij) → review+
Comment on attachment 8375808 [details] [diff] [review]
3_of_5_malloc_external_slots_for_tenured_jit_allocs-v0.diff

Review of attachment 8375808 [details] [diff] [review]:
-----------------------------------------------------------------

Looks great. r=me with comments addressed.

I agree we should take this out once GGC is enabled, but for now this looks fine.

::: js/src/jit/CodeGenerator.cpp
@@ +145,5 @@
> +// API call to malloc for slots.
> +template <typename LIR>
> +class OutOfLineMallocSlots : public OutOfLineCodeBase<CodeGenerator>
> +{
> +    LIR *lir_;

This member is not used, can we remove it + the LIR template parameter? Same for the other OOL class.

@@ +259,5 @@
> +// objects that require them. In the case that such paths are not required, the
> +// given labels are returned as nullptr.
> +template <typename T>
> +bool
> +CodeGenerator::emitOutOfLineMallocFree(T *lir, const Register &reg, size_t nDynamicSlots,

You can remove the lir argument + LIR template parameter here IIUC. Also from visitOutOfLineMallocSlots + visitOutOfLineFreeSlots.

@@ +275,5 @@
> +    if (!addOutOfLineCode(oolMalloc))
> +        return false;
> +
> +    OutOfLineFreeSlots<T> *oolFree =
> +        new(alloc()) OutOfLineFreeSlots<T>(lir, reg, fallback);

Nit: fits on one line.

::: js/src/jit/IonMacroAssembler.cpp
@@ +758,5 @@
> +        push(result);
> +
> +        // Allocate the object and write the malloced slots.
> +        freeSpanAllocate(result, allocKind, &allocFail);
> +        pop(Operand(Address(result, JSObject::offsetOfSlots())));

Please add |pop(const Address &dest)| to Assembler-x86-shared.h (just like the |push(const Address &src)| there), so that you don't need the extra Operand(...).

We should also add it to MacroAssembler-arm.h, there you can use the scratch register I think. The ARM simulator can be used to test this (build an x86 shell and pass --enable-arm-simulator to configure).

@@ +897,5 @@
> +    uint32_t nslots = templateObject->lastProperty()->slotSpan(templateObject->getClass());
> +    if (nslots == 0)
> +        return;
> +
> +    //masm.moveValue(UndefinedValue(), reg); then masm.storePtr(reg, ...)

Nit: remove this line.
Attachment #8375808 - Flags: review?(jdemooij) → review+
Attachment #8375815 - Flags: review?(jdemooij) → review+
Comment on attachment 8375820 [details] [diff] [review]
5_of_5_tests-v0.diff

Review of attachment 8375820 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit-test/tests/ion/callobj-nursery-slots.js
@@ +1,1 @@
> +// |jit-test| tz-pacific

The other test has disableGenerationalGC(), but does this one check anything not tested by check-date-format-tofte?

::: js/src/jsfriendapi.cpp
@@ +946,2 @@
>  #ifdef JSGC_GENERATIONAL
> +    if (!IsGenerationalGCEnabled(rt)) {

Good catch.
Attachment #8375820 - Flags: review?(jdemooij) → review+
(In reply to Jan de Mooij [:jandem] from comment #12)
> ::: js/src/jit/CodeGenerator.cpp
> @@ +145,5 @@
> > +// API call to malloc for slots.
> > +template <typename LIR>
> > +class OutOfLineMallocSlots : public OutOfLineCodeBase<CodeGenerator>
> > +{
> > +    LIR *lir_;
> 
> This member is not used, can we remove it + the LIR template parameter? Same
> for the other OOL class.

D'oh! I was using that to only sync the actually live registers using the safepoint, but forgot to remove it when I realized it's not actually that useful.
 
(In reply to Jan de Mooij [:jandem] from comment #13)
> Comment on attachment 8375820 [details] [diff] [review]
> ::: js/src/jit-test/tests/ion/callobj-nursery-slots.js
> @@ +1,1 @@
> > +// |jit-test| tz-pacific
> 
> The other test has disableGenerationalGC(), but does this one check anything
> not tested by check-date-format-tofte?

This test gets us out-of-line CallObject slots in the nursery. The other test disables the nursery, forcing those slots to use the malloc path. Otherwise, it is very hard to test the fallback in --enable-gcgenerational builds because you need a singleton call object.
 
> ::: js/src/jsfriendapi.cpp
> @@ +946,2 @@
> >  #ifdef JSGC_GENERATIONAL
> > +    if (!IsGenerationalGCEnabled(rt)) {
> 
> Good catch.

I'm pretty sure I added that code in the first place. O_O

========

I spent all of yesterday taking a closer look at allocation in Octane. We have 9 ways to create an object in the jit and their relative usage looks like this:

Opcode                      calls | percentage
--------------------------------------------
CreateThisWithTemplate 53,312,388 | 62.98%
NewArray               10,231,017 | 12.08%
NewObject               8,778,040 | 10.37%
NewCallObject           7,557,391 | 8.93%
Lambda                  4,760,668 | 5.62%
Array::concat               8,797 | 0.01%
NewDeclEnvObject              471 | 0%
NewStringObject                 0 | 0%
Rest                            0 | 0%
========================================
Total                  84648772 | 100%

Of the 62% CreateThisWithTemplate, 19,670,001 (e.g. 37%) of them occur in EarleyBoyer with a size class of OBJECT2. Of the NewCallObject, 4,964,684 (e.g. 66%) of them occur in EarleyBoyer. Of those, only a couple thousand use dynamic slots, needing this optimization. However, if we do just disable this optimization (e.g. use callVM instead of callAPI), we lose about 1000 points on EarleyBoyer.

I thought it was simple noise at first, but with the above series applied we lose about 50 points on EB. The only difference in codegen is that the new code inverts the stores for |slots| and |elements|, making them occur out-of-order. I need to play around a bit more here. Should have a patch soon that at the very least fixes our ordering here.
Fixing the object initialization write order fixed the regression on EB. Moreover, requesting an extra temp register on all of the above allocators did not affect Octane performance. In combination, I think this means we can get a big boost to allocation performance by making use of a second register. In particular it would let us kill at least 27M unneeded loads of a constant UndefinedValue into a register, even on x86, and take at least one sub out of the hottest allocation path. I'll work on this next.
Great success; with this patch and the next, we gain ~100-300 points on EarleyBoyer. This patch adds the extra register to the allocation paths, but doesn't do anything with it. As far as I can tell running locally, this has no effect on octane performance.
Attachment #8376613 - Flags: review?(jdemooij)
Comment on attachment 8376613 [details] [diff] [review]
6_of_5_add_extra_temp_register_for_allocator-v0.diff

Review of attachment 8376613 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/IonMacroAssembler.h
@@ +773,5 @@
>      void checkAllocatorState(Label *fail);
> +    bool nurseryAllocate(const Register &result, const Register &temp, gc::AllocKind allocKind,
> +                         Label *fail, size_t nDynamicSlots, gc::InitialHeap initialHeap);
> +    void freeSpanAllocate(const Register &result, const Register &temp, gc::AllocKind allocKind,
> +                          Label *fail);

Nit: we use both "const Register &reg" and "Register reg" for Register arguments. I think Register is ok as it's a small class that fits in 32-bits, and using that for these methods would make these signatures a bit shorter/more legible.
Attachment #8376613 - Flags: review?(jdemooij) → review+
https://hg.mozilla.org/integration/mozilla-inbound/rev/bea3f06585ec
Green try run at: https://tbpl.mozilla.org/?tree=Try&rev=3528ebd9858b

(In reply to Jan de Mooij [:jandem] from comment #13)
> ::: js/src/jit-test/tests/ion/callobj-nursery-slots.js
> @@ +1,1 @@
> > +// |jit-test| tz-pacific
> 
> The other test has disableGenerationalGC(), but does this one check anything
> not tested by check-date-format-tofte?

No, it isn't. I'm not sure why I misread the question before, but I've removed the new duplicate test now.
We can also get rid of one more sub in the CallObject-with-extended-slots path with our new temp reg and get another 50 points on EarleyBoyer on top of the slot-initialization gains. This may not be a huge performance win, but it is a pretty nice simplicity win. I just hope the extra register doesn't regress x86. Try run is up to check that: https://tbpl.mozilla.org/?tree=Try&rev=a9c9e9e42e0b

I left out the loop-based initialization since there are only a handful of objects that would hit that path currently. I tried to make the CreateThisWithTemplate path use extended slots to test this, but we go to extreme lengths to hide these from IonBuilder entirely. I have yet to figure out how to make that stop happening; e.g. if I make grow-slots on the various CreateThisWithTemplate template objects work as expected, we get several thousand templates floating around that would use extended slots, but somehow none end up flowing into compilations. I expect there is probably a hardcoded test in one of the analysis passes that I haven't found yet.

I think at this point NewArray is likely to be lower hanging fruit, so I'll probably try that next.
https://hg.mozilla.org/integration/mozilla-inbound/rev/9d1e797181e3

This seems to be failing on m-i, even though my local tests ran fine. I'll investigate tomorrow.
The issue here is that we are now generating more than 32MiB of code on arm for some jit-tests, so the arm assembler is exploding. Marty thinks there will be a patch fixing this available soon, so I'm going to hold off on landing until then.
These ARM failures are unexpected and Marty wants to debug them locally.
Attachment #8382401 - Flags: feedback?(mrosenberg)
Attached patch 7_use_extra_reg_in_init-v0.diff (obsolete) — Splinter Review
As far as I can tell locally this is performance neutral on octane on an x86 build running on x64 and a tiny perf win on x64.
Attachment #8377351 - Attachment is obsolete: true
Attachment #8383299 - Flags: review?(jdemooij)
Comment on attachment 8383299 [details] [diff] [review]
7_use_extra_reg_in_init-v0.diff

Review of attachment 8383299 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/IonMacroAssembler.cpp
@@ +803,5 @@
> +    movl(Imm32(jv.s.payload.i32), temp);
> +    for (unsigned i = start; i < nfixed; i++)
> +        movl(temp, ToPayload(Address(obj, JSObject::getFixedSlotOffset(i))));
> +#else
> +    moveValue(UndefinedValue(), temp);

This won't work on ARM, maybe we could use JS_NUNBOX32 instead of JS_CODEGEN_X86 and change movl to store32?

@@ +861,5 @@
>          int elementsOffset = JSObject::offsetOfFixedElements();
>  
>          addPtr(Imm32(elementsOffset), obj);
>          storePtr(obj, Address(obj, -elementsOffset + JSObject::offsetOfElements()));
>          addPtr(Imm32(-elementsOffset), obj);

Doesn't have to happen in this patch, but with the temp register we could do something like this instead:

computeEffectiveAddress(Address(obj, elementsOffset), temp);
storePtr(temp, Address(obj, JSObject::offsetOfElements());

We have similar-looking code in newGCThing that could maybe do this as well.
Attachment #8383299 - Flags: review?(jdemooij)
https://hg.mozilla.org/integration/mozilla-inbound/rev/3784de966811

I also switching out the same add/store/sub pattern in the allocators for lea/store. This seems to be worth ~1000 on EB, at least on ggc x64.
Keywords: leave-open
Whiteboard: [leave open]
Attachment #8376613 - Flags: checkin+
This is a big and complicated patch with lots of moving parts. Sorry for asking for review for a second time; at least the changed bits are all fairly isolated.

Differences in this version:

1) Switch per-op OOL malloc/free code to a single stub on the JitRuntime to work around an ARM codegen bug and to save us lots of code space as we add more of these. Of particular note: I'm curious if there is a normal or expected way to write the code in MacroAssembler::call{Malloc|Free}Stub. I've just used move/xchg to setup the correct register state, but it seems like there is probably a tool I don't know about for this? Exchanging some allocated registers for a hard-coded set of registers seems like it would be a common problem.

2) Merge new/initGCThing to createGCObject where possible. If we allocate external slots, we return that in the temp reg. I wanted to prevent people from accidentally clobbering the temp reg between newGCThing and initGCThing. It seems that all the callers that can't trivially be merged do not allocate external slots yet, so I've added assertions to the other case.

3) Rework fillSlotsWithUndefined to take an address and manually bump the offset. This lets us use the same code for fixed and dynamic slots. Also invert the two loops for the 32bit case so that we actually fill low to high, as intended. Whoops!

This is still missing some important optimizations that will become more important as we start allocating more things with external slots.

1) We currently fill the fixed and dynamic slots separately. This ends up with us loading the same UndefinedValue immediate into the same register a second time. Not a huge deal, but as we have no hyperoptimizer/peepholer, we might want to pass a bool to kill the second load.

2) Make the slots initialization use a loop. We'll need a scratch reg for the counter, so I guess we only want to do this on ARM and x64?

3) When we allocate the external slots in the nursery, they are right after the fixed slots. We should be able to do the initialization of both in a single loop, rather than having two back-to-back loops. This will also keep us from having to push/pop for a spare reg.

4) And of course, actually  use this code for more than NewCallObject. Sadly this is not entirely straightforward, so will have to be done as followup.

Also, for the record, the underlying problem this bug is solving is the leak of the MNewSlots malloc pointer when we inline allocation of the gc thing in the nursery for the MNewCallObject that takes the slots. We only end up recording the slots in hugeSlots if we bail; when we stay in jitcode, LNewCallObject just writes the non-nursery slots into the object without telling the nursery to free them later.
Attachment #8375801 - Attachment is obsolete: true
Attachment #8375805 - Attachment is obsolete: true
Attachment #8375808 - Attachment is obsolete: true
Attachment #8375815 - Attachment is obsolete: true
Attachment #8375820 - Attachment is obsolete: true
Attachment #8387896 - Flags: review?(jdemooij)
Attachment #8383299 - Flags: checkin+
Comment on attachment 8387896 [details] [diff] [review]
inline_allocation_of_external_slots-v1.diff

Review of attachment 8387896 [details] [diff] [review]:
-----------------------------------------------------------------

Looks great. It's indeed pretty complicated to get right, but splitting it up in different MacroAssembler methods helps a lot.

::: js/src/jit/Ion.cpp
@@ +288,5 @@
>      return true;
>  }
>  
>  JitCode *
> +JitRuntime::generateMallocStub(JSContext *cx)

Please move these methods to CodeGenerator.cpp (generateStringConcatStub is also there).

@@ +302,5 @@
> +    regs.takeUnchecked(regNBytes);
> +    masm.PushRegsInMask(regs);
> +
> +    masm.setupUnalignedABICall(2, regTemp);
> +    masm.movePtr(ImmPtr(GetIonContext()->runtime), regRuntime);

Nit: cx->runtime() (GetIonContext() is relatively slow and although it won't matter here, we try to avoid it as much as possible.)

::: js/src/jit/IonMacroAssembler.cpp
@@ +723,5 @@
> +
> +    JS_ASSERT(nbytes > 0);
> +
> +    if (result != regNBytes)
> +        movePtr(regNBytes, result);

As discussed, it seems simpler to use push(regNBytes); here and movePtr(regNBytes, result); pop(regNBytes); later.

@@ +724,5 @@
> +    JS_ASSERT(nbytes > 0);
> +
> +    if (result != regNBytes)
> +        movePtr(regNBytes, result);
> +    mov(ImmWord(nbytes), regNBytes);

Nit: move32(Imm32(nbytes), regNBytes);

@@ +738,5 @@
> +    // This register must match the one in JitRuntime::generateFreeStub.
> +    const Register regSlots = CallTempReg0;
> +
> +    if (slots != regSlots)
> +        xchg(slots, regSlots);

I think it'd be simplest to use push/pop here as well.

@@ +918,3 @@
>  
> +    Address addr = base;
> +    mov(ImmWord(jv.s.payload.i32), temp);

move32(Imm32(...), temp);

@@ +925,1 @@
>      mov(ImmWord(jv.s.tag), temp);

Same here.

::: js/src/jit/arm/MacroAssembler-arm.h
@@ +601,5 @@
>      }
> +    void xchg(Register reg1, Register reg2) {
> +        ma_eor(reg2, reg1);
> +        ma_eor(reg1, reg2);
> +        ma_eor(reg2, reg1);

You can remove this method now (FWIW, on ARM we have a scratch register, so we could use 3 moves.)
Attachment #8387896 - Flags: review?(jdemooij) → review+
https://hg.mozilla.org/integration/mozilla-inbound/rev/907d5bc3bd26

Passes jit-tests on all 3 tier-1 platforms with and without ggc.
And backed out for windows exclusive M-3 and T-d failures. Will investigate before re-landing.

https://hg.mozilla.org/integration/mozilla-inbound/rev/20e8191247fd
Seems to be Linux and Windows exclusive. Only Mac has finished successfully.
I think the problem is that I dropped the same-reg guards before popping, which clobbers the output. Remind me never to program when I'm sick, no matter how simple it looks.

https://tbpl.mozilla.org/?tree=Try&rev=c214de088292
Manually verified that the orange in the push above was from the other patch.

https://hg.mozilla.org/integration/mozilla-inbound/rev/893b864b4b18
Depends on: 984653
(In reply to Terrence Cole [:terrence] from comment #33)
> Manually verified that the orange in the push above was from the other patch.
> 
> https://hg.mozilla.org/integration/mozilla-inbound/rev/893b864b4b18

This needs to be backed out. This causes an OS restart crash in bug 984653 on Firefox Marketplace in Firefox OS.
(In reply to Jason Smith [:jsmith] from comment #35)
> This needs to be backed out. This causes an OS restart crash in bug 984653
> on Firefox Marketplace in Firefox OS.

Backed out:
https://hg.mozilla.org/integration/mozilla-inbound/rev/7181bf175776
Depends on: 985732
Blocks: 875863
No longer blocks: ggcfuzz, 961795
Assignee: terrence → jcoppeard
Attached patch bug969012-ion-slots (obsolete) — Splinter Review
Rebased patch.  Passes SM tests.
Attachment #8373511 - Attachment is obsolete: true
Attachment #8376613 - Attachment is obsolete: true
Attachment #8382401 - Attachment is obsolete: true
Attachment #8383299 - Attachment is obsolete: true
Attachment #8387896 - Attachment is obsolete: true
Attachment #8382401 - Flags: feedback?(mrosenberg)
Attachment #8408377 - Flags: review+
Duplicate of this bug: 890937
Attached patch bug969012-ion-slots (obsolete) — Splinter Review
Rebased patch.
Attachment #8408377 - Attachment is obsolete: true
Attachment #8418054 - Flags: review+
Comment on attachment 8418054 [details] [diff] [review]
bug969012-ion-slots

:jonco requested a fuzz-run over email (preferably on ARM) for this patch, adding :decoder too.
Attachment #8418054 - Flags: feedback?(gary)
Attachment #8418054 - Flags: feedback?(choller)
Jon, the reason you weren't seeing any dynamic slots is that there was a check in CodeGenerator::visitCallObject that was totally disabling the optimization if numDynamicSlots(). I assume this is simple merge bustage. With this hunk fixed, I was able to repro and fix the test in bug 985732: we just need to compile a jump to the ool path instead of asserting.

Even with that fixed, there was still an opt crash in b2g with this, so I'd like to get fuzzing on it. Gary, do we have arm fuzzing now? If so, this patch would benefit.
Assignee: jcoppeard → terrence
Attachment #8418054 - Attachment is obsolete: true
Status: NEW → ASSIGNED
Attachment #8418054 - Flags: feedback?(gary)
Attachment #8418054 - Flags: feedback?(choller)
Attachment #8418329 - Flags: review+
Attachment #8418329 - Flags: feedback?(gary)
Depends on: 1006909
Attachment #8418329 - Flags: feedback?(choller)
Depends on: 1006992
(In reply to Terrence Cole [:terrence] from comment #43)
Sorry, I messed up the merge when I rebased this the first time.  In hindsight I should have got you to check the results when it unexpectedly worked...
(In reply to Jon Coppeard (:jonco) from comment #44)
> (In reply to Terrence Cole [:terrence] from comment #43)
> Sorry, I messed up the merge when I rebased this the first time.  In
> hindsight I should have got you to check the results when it unexpectedly
> worked...

I read the new patch 3 times before debugging, so it wouldn't have helped ;-).
(In reply to Terrence Cole [:terrence] from comment #43)
> Even with that fixed, there was still an opt crash in b2g with this, so I'd
> like to get fuzzing on it. Gary, do we have arm fuzzing now? If so, this
> patch would benefit.

The first few hours on opt ARM fuzzing aren't showing up anything yet, but I'll leave this on overnight.
Hmmm, I get crashes like the following with unreproducible testcases so far:

#0  setNext (next_=0x0, this=0xb4427ac8) at /home/odroid/trees/mozilla-inbound/js/src/jit/shared/IonAssemblerBuffer.h:68
#1  perforate (this=0xb1d6e594) at /home/odroid/trees/mozilla-inbound/js/src/jit/shared/IonAssemblerBuffer.h:251
#2  perforate (this=0xb1d6e594) at /home/odroid/trees/mozilla-inbound/js/src/jit/shared/IonAssemblerBufferWithConstantPools.h:620
#3  markGuard (this=0xb1d6e594) at /home/odroid/trees/mozilla-inbound/js/src/jit/shared/IonAssemblerBufferWithConstantPools.h:997
#4  js::jit::Assembler::as_b (this=0xb1d6e380, off=..., c=<optimized out>, isPatchable=<optimized out>) at /home/odroid/trees/mozilla-inbound/js/src/jit/arm/Assembler-arm.cpp:1839
#5  0x0018279c in js::jit::Assembler::as_b (this=0xb1d6e380, l=0xb62efcd0, c=js::jit::Assembler::Always, isPatchable=<optimized out>) at /home/odroid/trees/mozilla-inbound/js/src/jit/arm/Assembler-arm.cpp:1871
#6  0x00182804 in js::jit::Assembler::as_b (this=<optimized out>, l=l@entry=0xb62efcd0, c=c@entry=js::jit::Assembler::Always, isPatchable=isPatchable@entry=false) at /home/odroid/trees/mozilla-inbound/js/src/jit/arm/Assembler-arm.cpp:1876
#7  0x000c3c70 in jump (label=0xb62efcd0, this=<optimized out>) at /home/odroid/trees/mozilla-inbound/js/src/jit/arm/MacroAssembler-arm.h:711
#8  js::jit::CodeGenerator::visitCallKnown (this=0xb1d6e350, call=0xb24761f8) at /home/odroid/trees/mozilla-inbound/js/src/jit/CodeGenerator.cpp:2398
#9  0x000c54a0 in js::jit::CodeGenerator::generateBody (this=this@entry=0xb1d6e350) at /home/odroid/trees/mozilla-inbound/js/src/jit/CodeGenerator.cpp:3417
#10 0x000c79e4 in js::jit::CodeGenerator::generate (this=this@entry=0xb1d6e350) at /home/odroid/trees/mozilla-inbound/js/src/jit/CodeGenerator.cpp:6419
#11 0x00120528 in GenerateCode (lir=0xb2eaf5f0, mir=0x17297980) at /home/odroid/trees/mozilla-inbound/js/src/jit/Ion.cpp:1662
#12 js::jit::CompileBackEnd (mir=mir@entry=0x17297980) at /home/odroid/trees/mozilla-inbound/js/src/jit/Ion.cpp:1680
#13 0x001fbacc in js::WorkerThread::handleIonWorkload (this=this@entry=0xed5838) at /home/odroid/trees/mozilla-inbound/js/src/jsworkers.cpp:806
#14 0x001fbd74 in js::WorkerThread::threadLoop (this=0xed5838) at /home/odroid/trees/mozilla-inbound/js/src/jsworkers.cpp:1043
#15 0xb6f5e11e in _pt_root () from /home/odroid/Desktop/shell-cache/js-opt-32-dm-ts-hfp-linux-8a5a9a06f59a-969012-c43-3368daa84dec/libnspr4.so
#16 0xb6f76fbc in start_thread (arg=0xb62f0450) at pthread_create.c:314
#17 0xb6d92b3c in ?? () at ../ports/sysdeps/unix/sysv/linux/arm/nptl/../clone.S:92 from /lib/arm-linux-gnueabihf/libc.so.6
Backtrace stopped: previous frame identical to this frame (corrupt stack?)


Are these somewhat related?
I have no idea. Maybe Marty will know what's going on here?
Flags: needinfo?(mrosenberg)
Man, I wish all the assembler buffer bugs were this easy.
Attachment #8419685 - Flags: review?(terrence)
Flags: needinfo?(mrosenberg)
Comment on attachment 8419685 [details] [diff] [review]
derp_oom-r0.patch

Review of attachment 8419685 [details] [diff] [review]:
-----------------------------------------------------------------

r=me

Great find, Gary!
Attachment #8419685 - Flags: review?(terrence) → review+
Relanding with the test from bug 985732:
https://hg.mozilla.org/integration/mozilla-inbound/rev/f844291b895b

No try run this time, but we have a day and a half of fuzzing and it didn't destroy the tree last time it landed.
Duplicate of this bug: 985732
Comment on attachment 8418329 [details] [diff] [review]
remove_mnewslots_and_use_malloc_directly-rebased-20140506.diff

Thanks for the fuzzing!
Attachment #8418329 - Flags: feedback?(gary)
Attachment #8418329 - Flags: feedback?(choller)
Attachment #8418329 - Flags: checkin+
Comment on attachment 8419685 [details] [diff] [review]
derp_oom-r0.patch

And Marty's oom fix, since I'm pushing things now.
https://hg.mozilla.org/integration/mozilla-inbound/rev/01f27ad85b1b
Attachment #8419685 - Flags: checkin+
Keywords: leave-open
Yeah, I forgot to put a header on the test telling it to expect a js error.

remote:   https://hg.mozilla.org/integration/mozilla-inbound/rev/1a7b71c4b284
remote:   https://hg.mozilla.org/integration/mozilla-inbound/rev/accdf191ac4e
Flags: needinfo?(terrence)
this bug  appears to have regressed Octane-Raytrace on machine 11 by 6.3% on AWFY.

From AWFY : http://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=11291a7123f4&tochange=accdf191ac4e
(In reply to mayankleoboy1 from comment #57)
> this bug  appears to have regressed Octane-Raytrace on machine 11 by 6.3% on
> AWFY.
> 
> From AWFY :
> http://hg.mozilla.org/integration/mozilla-inbound/
> pushloghtml?fromchange=11291a7123f4&tochange=accdf191ac4e

Needinfo? from Terrence about comment 57, perhaps it needs a new bug?
Flags: needinfo?(terrence)
Confirmed it is indeed this patch. I can reproduce locally and bisecting gives http://hg.mozilla.org/integration/mozilla-inbound/rev/accdf191ac4e .
https://hg.mozilla.org/mozilla-central/rev/1a7b71c4b284
https://hg.mozilla.org/mozilla-central/rev/accdf191ac4e
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla32
Found it. Here is the code we are now generating for Allocation and init:

Check if the object metadata hook is set:
   0x7fffeec2c4dc:	cmpl   $0x0,0xf2ade8
   0x7fffeec2c4e4:	jne    0x7fffeec7d684
Allocate object4 from the nursery:
   0x7fffeec2c4ea:	mov    0xf2acc8,%rdi
   0x7fffeec2c4f2:	lea    0x40(%rdi),%r8
   0x7fffeec2c4f6:	cmp    %r8,0xf2acd8
   0x7fffeec2c4fe:	jbe    0x7fffeec7d684
   0x7fffeec2c504:	mov    %r8,0xf2acc8
Assign shape and type from template:
   0x7fffeec2c50c:	movabs $0x7fffee967740,%r11
   0x7fffeec2c516:	mov    %r11,(%rdi)
   0x7fffeec2c519:	movabs $0x7fffee9682e0,%r11
   0x7fffeec2c523:	mov    %r11,0x8(%rdi)
Assign null slots and empty elements:
   0x7fffeec2c527:	movq   $0x0,0x10(%rdi)
   0x7fffeec2c52f:	movq   $0xc84610,0x18(%rdi)
Initialize all slots with undefined:
   0x7fffeec2c538:	movabs $0xfff9000000000000,%r8
   0x7fffeec2c542:	mov    %r8,0x20(%rdi)
   0x7fffeec2c546:	mov    %r8,0x28(%rdi)
   0x7fffeec2c54a:	mov    %r8,0x30(%rdi)
=> 0x7fffeec2c54e:	mov    %r8,0x38(%rdi)

This is completely identical before and after with the exception of the last instruction. The slot-span for the object is less than the fixed slot count. The fix is a one-liner, naturally.

Of course immediately after this we go off to "initialize" the slots with zero. This looks like:
=> 0x7ffff7e36a1c:	movabs $0xfff8800000000000,%r11
   0x7ffff7e36a26:	mov    %r11,0x20(%rdx)
   0x7ffff7e36a2a:	movabs $0xfff8800000000000,%r11
   0x7ffff7e36a34:	mov    %r11,0x28(%rdx)
   0x7ffff7e36a38:	movabs $0xfff8800000000000,%r11
   0x7ffff7e36a42:	mov    %r11,0x30(%rdx)

We should be able to kill some of these loads and some of these writes, somehow.
Attachment #8421359 - Flags: review?(jdemooij)
Flags: needinfo?(terrence)
Comment on attachment 8421359 [details] [diff] [review]
raytrace_regression-v0.diff

Review of attachment 8421359 [details] [diff] [review]:
-----------------------------------------------------------------

Subtle, good find.
Attachment #8421359 - Flags: review?(jdemooij) → review+
Awfy seems to agree this is fixed. Thanks!
Blocks: 1507445
You need to log in before you can comment on or make changes to this bug.