Closed Bug 1303399 Opened 8 years ago Closed 8 years ago

IonMonkey: Inlining failure should not cause a compilation failure.

Categories

(Core :: JavaScript Engine: JIT, defect, P2)

defect

Tracking

()

RESOLVED FIXED
mozilla52
Tracking Status
firefox52 --- fixed

People

(Reporter: nbp, Unassigned)

References

Details

Attachments

(4 files)

While investigating Bug 1300848, I noticed that some function are compiled more than the number of time they are invalidated [1].  The reason being that some of these functions are not successfully compiling all the time.  One of the problem seems to be that we start a compilation in which we attempt to inline a lot of callees into it, but the callees are failing to be properly inlined, such as the following, which is origin of a failure of a function which successfully got compiled 2 times previously.

#0  0x00007fb4f6ac8d05 in js::jit::MIRGenerator::addAbortedPreliminaryGroup
[…]
#3  0x00007fb4f6a2ee02 in js::jit::IonBuilder::shouldAbortOnPreliminaryGroups
#4  0x00007fb4f6a395db in js::jit::IonBuilder::jsop_length_fastPath
[…]
#8  0x00007fb4f6a135f9 in js::jit::IonBuilder::buildInline
[…]
#15 0x00007fb4f6a135f9 in js::jit::IonBuilder::buildInline
[…]
#22 0x00007fb4f6a135f9 in js::jit::IonBuilder::buildInline
[…]
#29 0x00007fb4f6a135f9 in js::jit::IonBuilder::buildInline
[…]
#37 0x00007fb4f6a0f231 in js::jit::IonBuilder::build
[…]
#41 0x00007fb4f6a0a0c2 in js::jit::IonCompileScriptForBaseline

What we can do, is when we abort an MCall due to inlining reasons, we could revert the MIRGraph to its original state, prior the inlining, and continue by generating a MCall with a known target.  In addition we could also trash the content of the LifoAlloc, as long as we ensure that nothing got buffered in the MIRGenerator / MIRGraph.

[1] http://people.mozilla.org/~npierron/hearthsim-joust.log
This patch adds a BackupPoint, which is used to restore the state of the
graph at the point of the backup.

This removes all the instructions added in the "current" basic block and
removes all blocks added to the MIRGraph since the BackupPoint.

The CFG graph reconstruction adds a lot of issues as blocks which are
populated with the inlining content are not necesseray the last one, nor the
last in the list of blocks, as they are moved around in the basic block
list.  I guess, we could cache the last created block on the MIRGraph, which
would remove the loop from the BackupPoint constructor.

Apart from a remaining intermittent assertion on one test that I have to fix
(potentially in another patch), which is unlikely to cause a rewrite of this
patch, this works fine on the test suite.
Attachment #8794275 - Flags: review?(hv1989)
The current patch fails on 3 test cases on the test suite at the moment.

This add a oom function in IonBuilder (which should be moved to
MIRGenetator), and calls to this functions are inserted after every call
out-of-IonBuilder which can OOM.

I added assertions to check the consistency with the returned value, which
seems represents 2 of the test case failures.

I fear that our coverage of these failures is not well supported, and that
it would be hard to support.  As a work-around, I suggest we treat any
failure with AbortReason_NoAbort as an AbortReason_Disable for the moment,
to get a workable solution on release builds.
Attachment #8795325 - Flags: feedback?(hv1989)
Priority: -- → P2
Attachment #8794253 - Flags: review?(hv1989) → review+
Comment on attachment 8794275 [details] [diff] [review]
part 2 - IonMonkey: Fallback when we fail to inline an uninlinable function.

Review of attachment 8794275 [details] [diff] [review]:
-----------------------------------------------------------------

Nice work! Can you open a follow-up bug to have this BackupPoint for restartLoop also!

::: js/src/jit/MIRGraph.cpp
@@ +1635,5 @@
>  }
>  
> +MBasicBlock::BackupPoint::BackupPoint(MBasicBlock* block)
> +  : block_(block)
> +  , lastBlock_(nullptr)

Style nit: The comma should be at the end, except maybe for the first item in the #ifdef DEBUG

@@ +1720,5 @@
> +        block_->slots_[i] = slots_[i];
> +
> +    MOZ_ASSERT(block_->id() == id_);
> +    MOZ_ASSERT(predecessorsCheckSum_ == computePredecessorsCheckSum(block_));
> +    MOZ_ASSERT(instructionsCheckSum_ == computeInstructionsCheckSum(block_));

We assume that if we tested all predecessors blocks are equal and all instructions in the current block, everything is fine. We don't look at the instructions in the predecessors blocks? Why? Should we also check them?

::: js/src/jit/MIRGraph.h
@@ +662,5 @@
> +    // This class is used for reverting the graph within IonBuilder.
> +    class BackupPoint {
> +        friend MBasicBlock;
> +
> +        MBasicBlock* block_;

rename block to curBlock_ or curentBlock_ or current_ ?

@@ +683,5 @@
> +#endif
> +      public:
> +        BackupPoint(MBasicBlock* block);
> +        MOZ_MUST_USE bool init(TempAllocator& alloc);
> +        void restore(MBasicBlock** block);

I think we should have:
MBasicBlock* restore(MBasicBlock*);

and do
current = backupPoint.restore(current);

?
Attachment #8794275 - Flags: review?(hv1989) → review+
Comment on attachment 8795325 [details] [diff] [review]
part 3 - Report OOM using the MIRGenerator abort reason.

Review of attachment 8795325 [details] [diff] [review]:
-----------------------------------------------------------------

I'm really not a fan of the "return oom(result_type)".

"return false"
has always been the "memory allocation failed".
I.e. abortReason_ should default to "AbortReason_Alloc".

If we fail and the reason is not alloc, we should adjust the abortReason_.
Which is currently done by calling abort().
Attachment #8795325 - Flags: feedback?(hv1989)
(In reply to Hannes Verschore [:h4writer] from comment #5)
> I'm really not a fan of the "return oom(result_type)".
> 
> "return false"
> has always been the "memory allocation failed".
> I.e. abortReason_ should default to "AbortReason_Alloc".
> 
> If we fail and the reason is not alloc, we should adjust the abortReason_.
> Which is currently done by calling abort().

I agree this would be the simplest way forward, but I for-see one of the issue that we got until today, which is 

The problem I see with this way, would be that we cannot properly ensure that people do not forget to specify the reasons on the failure.  If we explicitly specify OOMs and aborts, then the assertions added in the patch would catch when we forget to report a failure.

What do you think?
Flags: needinfo?(hv1989)
(In reply to Nicolas B. Pierron [:nbp] from comment #6)
> (In reply to Hannes Verschore [:h4writer] from comment #5)
> > I'm really not a fan of the "return oom(result_type)".
> > 
> > "return false"
> > has always been the "memory allocation failed".
> > I.e. abortReason_ should default to "AbortReason_Alloc".
> > 
> > If we fail and the reason is not alloc, we should adjust the abortReason_.
> > Which is currently done by calling abort().
> 
> I agree this would be the simplest way forward, but I for-see one of the
> issue that we got until today, which is 
> 
> The problem I see with this way, would be that we cannot properly ensure
> that people do not forget to specify the reasons on the failure.  If we
> explicitly specify OOMs and aborts, then the assertions added in the patch
> would catch when we forget to report a failure.
> 
> What do you think?

I think we had a discussion about this before. I agree that it would nice to make sure this doesn't happen anymore. Though I think that if we want to annotate every "return false" with a oom(), we are worse. In that case it would be a small step to just make every "return false;" return an enum type: "OOM / ABORT" ? Which would be a better IMHO.
Flags: needinfo?(hv1989)
Can we get the r+ patches landed? Would be a waste of time to just let them bit rot?
Flags: needinfo?(nicolas.b.pierron)
(In reply to Hannes Verschore [:h4writer] from comment #7)
> I think we had a discussion about this before. I agree that it would nice to
> make sure this doesn't happen anymore. Though I think that if we want to
> annotate every "return false" with a oom(), we are worse. In that case it
> would be a small step to just make every "return false;" return an enum
> type: "OOM / ABORT" ? Which would be a better IMHO.

Like we discussed IRL bug 1277368 would be good step forward. Seems Jan stepped in and we could have this soonish! Which would be awesome and we could refactor the return type and use the Result.
(In reply to Hannes Verschore [:h4writer] from comment #4)
> ::: js/src/jit/MIRGraph.cpp
> @@ +1635,5 @@
> >  }
> >  
> > +MBasicBlock::BackupPoint::BackupPoint(MBasicBlock* block)
> > +  : block_(block)
> > +  , lastBlock_(nullptr)
> 
> Style nit: The comma should be at the end, except maybe for the first item
> in the #ifdef DEBUG

Done.

Should we change our policy to use leading commas instead of trailing commas.  I much prefer having the comman align with the colon, and never bother changing it each time there is a #ifdef DEBUG.

> @@ +1720,5 @@
> > +        block_->slots_[i] = slots_[i];
> > +
> > +    MOZ_ASSERT(block_->id() == id_);
> > +    MOZ_ASSERT(predecessorsCheckSum_ == computePredecessorsCheckSum(block_));
> > +    MOZ_ASSERT(instructionsCheckSum_ == computeInstructionsCheckSum(block_));
> 
> We assume that if we tested all predecessors blocks are equal and all
> instructions in the current block, everything is fine. We don't look at the
> instructions in the predecessors blocks? Why? Should we also check them?

We don't look inside the predecessor block because they are not supposed to change.  Only the blocks within the current scope are supposed to change.  When we inline a function, this means the current block and any other added after it.

If we wanted full correctness we should do a full checksum of the whole graph, but then we might notice extra side-effects of MIR instruction flags that might be too costly in terms of memory to revert them.

> ::: js/src/jit/MIRGraph.h
> @@ +662,5 @@
> > +    // This class is used for reverting the graph within IonBuilder.
> > +    class BackupPoint {
> > +        friend MBasicBlock;
> > +
> > +        MBasicBlock* block_;
> 
> rename block to curBlock_ or curentBlock_ or current_ ?

Done, current_.

> @@ +683,5 @@
> > +#endif
> > +      public:
> > +        BackupPoint(MBasicBlock* block);
> > +        MOZ_MUST_USE bool init(TempAllocator& alloc);
> > +        void restore(MBasicBlock** block);
> 
> I think we should have:
> MBasicBlock* restore(MBasicBlock*);
> 
> and do
> current = backupPoint.restore(current);
> 
> ?

Done.
Flags: needinfo?(nicolas.b.pierron)
(In reply to Nicolas B. Pierron [:nbp] from comment #10)
> (In reply to Hannes Verschore [:h4writer] from comment #4)
> > I think we should have:
> > MBasicBlock* restore(MBasicBlock*);
> > 
> > and do
> > current = backupPoint.restore(current);
> > 
> > ?
> 
> Done.

Correction, I changed it to:

MBasicBlock* restore();

and do

current = backupPoint.restore();

As the argument was not read.
Due to the lack of proper identification of OOM (AbortReason_Alloc) we
attempt to continue the compilation without inlining a JSFunction.

As we attempt to remove the basic blocks, we discard all the instructions
and their resume points.  Unfortunately, we cannot remove the resume point
which we failed to initialized because we did not attached it to any
instructions.

This patch discard the resume point from the basic block, if we failed to
initialize its operand vector.  Also, it marks the allocation failure as
being an allocation failure such that we do not attempt to fallback without
inlining, as we do not reclaim any memory.

This patch fixes an intermittent issue on the try server.
Attachment #8805212 - Flags: review?(hv1989)
Comment on attachment 8805212 [details] [diff] [review]
part 3 - Remove resume points from the basic block list when we OOM.

Review of attachment 8805212 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/IonBuilder.cpp
@@ +8070,2 @@
>          return false;
> +    }

Oh this is so wrong! Not on your side, but generally in IonBuilder.cpp like you already mentioned.
The "return false" should default to OOM failure, which apparently isn't. There are probably a lot of other places where we need to set the abortReason_ because of this, which we don't.
Luckily Jan looks like being quite far in the Result<> patch. Lets do this like here and
as soon as the Result<> patch land fix this properly.
Attachment #8805212 - Flags: review?(hv1989) → review+
Pushed by npierron@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/c0f71d17d659
part 1 - Use InliningStatus as return value of inlineScriptedCall. r=h4writer
https://hg.mozilla.org/integration/mozilla-inbound/rev/47e4fb57325d
part 2 - IonMonkey: Fallback when we fail to inline an uninlinable function. r=h4writer
https://hg.mozilla.org/integration/mozilla-inbound/rev/726ec9c83018
part 3 - Remove resume points from the basic block list when we OOM. r=h4writer
Blocks: 1313655
Blocks: 1373323
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: