Load after store on octane richards

RESOLVED FIXED in mozilla35

Status

()

defect
RESOLVED FIXED
5 years ago
5 years ago

People

(Reporter: h4writer, Assigned: nbp)

Tracking

unspecified
mozilla35
Points:
---

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(4 attachments, 3 obsolete attachments)

While looking at the generated IonMonkey code of richards "runRichards" (:47).

> /home/h4writer/Build/octane/richards.js:47
> var e6 = __createthiswithtemplate()
> e6[0] = 0; // storeslot
> e6[1] = 0; // storeslot
> var e12 = newarray {...} /* object (Array) */
> __postwritebarrier(e6, e12)
> e6[2] = e12; // storeslot
> e6[3] = null; // storeslot
> e6[4] = null; // storeslot
> e6[5] = null; // storeslot
> var e26 = __createthiswithtemplate()
> __postwritebarrier(e26, e6)
> e26[0] = e6; // storeslot
> e26[1] = 1; // storeslot
> e26[2] = 1000; // storeslot
> var e36 = e6[3]; // loadslot
> var e38 = __createthiswithtemplate()
> __postwritebarrier(e38, e36)
> e38[0] = e36; // storeslot
> e38[1] = 0; // storeslot
> e38[2] = 0; // storeslot
> e38[3] = null; // storeslot
> __postwritebarrier(e38, e26)
> e38[4] = e26; // storeslot
> ...

This is some pseudo-js code generated from the IM. Now the part I find interesting is the:
> ...
> e6[3] = null; // storeslot
> ...
> var e36 = e6[3]; // loadslot
> ...

This is a load after store. We might just take the variable. I had a small fix for this in "bug 801872", but our AA wasn't good enough to do this globally, only in the same block. Now this pass would work for this case (block wich follow each other).

Now since this isn't the hottest function (it is the root function). It might not make much of a difference. Haven't tested it. I could give some estimate with bug 801872 if this make a difference.

@nbp: now we have some better structure. Does this fit somewhere in the current scalar replacement / recover instructions / ...
Oh I forgot to mention that this function keeps going on. And multiple such occurencies are in that function. I only mentioned the first one ;).
Similar on raytrace:

> /home/h4writer/Build/octane/raytrace.js:560
> ...                                                           
> var e22 = __createthiswithtemplate();                                                                 
> * e22[0] = e16; // storeslot                                                                            
> * e22[1] = e18; // storeslot                                                                            
> * e22[2] = e20; // storeslot
> var e39 = e6[2]; // loadslot
> var e41 = e39[0]; // loadslot                                                                         
> var e42 = e41 * e8
> var e43 = e39[1]; // loadslot                                                                         
> var e44 = e43 * e8
> var e45 = e39[2]; // loadslot                                                                         
> var e46 = e45 * e8
> var e47 = __createthiswithtemplate();                                                                 
> e47[0] = e42; // storeslot                                                                            
> e47[1] = e44; // storeslot                                                                            
> e47[2] = e46; // storeslot
> * var e66 = e22[0]; // loadslot                                                                         
> var e67 = e66 - e42
> * var e68 = e22[1]; // loadslot                                                                         
> var e69 = e68 - e44
> * var e70 = e22[2]; // loadslot                                                                         
> var e71 = e70 - e46
> var e72 = __createthiswithtemplate();                                                                 
> e72[0] = e67; // storeslot                                                                            
> e72[1] = e69; // storeslot                                                                            
> e72[2] = e71; // storeslot
> ...

This one looks very hot.
(In reply to Hannes Verschore [:h4writer] from comment #0)
> > /home/h4writer/Build/octane/richards.js:47
> > var e6 = __createthiswithtemplate()
> > e6[0] = 0; // storeslot
> > e6[1] = 0; // storeslot
> > var e12 = newarray {...} /* object (Array) */
> > __postwritebarrier(e6, e12)
> > e6[2] = e12; // storeslot
> > e6[3] = null; // storeslot
> > e6[4] = null; // storeslot
> > e6[5] = null; // storeslot
> > var e26 = __createthiswithtemplate()
> > __postwritebarrier(e26, e6)
> > e26[0] = e6; // storeslot
> > e26[1] = 1; // storeslot
> > e26[2] = 1000; // storeslot
> > var e36 = e6[3]; // loadslot
> 
> @nbp: now we have some better structure. Does this fit somewhere in the
> current scalar replacement / recover instructions / ...

I don't think our current implementation of scalar replacement would be able to do anything, knowing that it currently consider e6 as being escaped by e26.

On the other hand Bug 1049691 should already handle the aliasing of e6[3] loadelement with e6[3] storeelement.

I will investigate this issue.
Flags: needinfo?(nicolas.b.pierron)
(In reply to Nicolas B. Pierron [:nbp] from comment #3)
> (In reply to Hannes Verschore [:h4writer] from comment #0)
> > > /home/h4writer/Build/octane/richards.js:47
> > > var e6 = __createthiswithtemplate()
> > > e6[0] = 0; // storeslot
> > > e6[1] = 0; // storeslot
> > > var e12 = newarray {...} /* object (Array) */
> > > __postwritebarrier(e6, e12)
> > > e6[2] = e12; // storeslot
> > > e6[3] = null; // storeslot
> > > e6[4] = null; // storeslot
> > > e6[5] = null; // storeslot
> > > var e26 = __createthiswithtemplate()
> > > __postwritebarrier(e26, e6)
> > > e26[0] = e6; // storeslot
> > > e26[1] = 1; // storeslot
> > > e26[2] = 1000; // storeslot
> > > var e36 = e6[3]; // loadslot
> > 
> > @nbp: now we have some better structure. Does this fit somewhere in the
> > current scalar replacement / recover instructions / ...
> 
> I will investigate this issue.

The alias analysis seems to work fine here, and it is well marked as being dependent on the above store.
The problem we see here is the last condition of MloadFixedSlot::foldsTo, which is that we store and load the same type fails.  In this case we store a MIRType_Null and read a MIRType_Value.

I think in such case we can handle reading MIRType_Value by boxing the value given as argument of the store instruction.

bhackett: Also, I guess this is a sign that we should improve TI / baseline to give us a MIRType_Null instead of a MIRType_Value.
Flags: needinfo?(bhackett1024)
(In reply to Nicolas B. Pierron [:nbp] from comment #4)
> bhackett: Also, I guess this is a sign that we should improve TI / baseline
> to give us a MIRType_Null instead of a MIRType_Value.

It looks like the read here is for this.list in Scheduler.prototype.addTask. TI and Baseline don't know it's always null because the GETPROP also produces non-null values in other cases. Here we inline addTask and that gives us a bit more information than TI has.
Assignee: nobody → nicolas.b.pierron
Status: NEW → ASSIGNED
Flags: needinfo?(bhackett1024)
Posted patch IonMonkey: Spew slot numbers. (obsolete) — Splinter Review
Attachment #8486412 - Flags: review?(hv1989)
I did not expected that, so I triple checked the results
Attachment #8486420 - Flags: review?(hv1989)
Attachment #8486424 - Flags: review?(sstangl)
Flags: needinfo?(nicolas.b.pierron)
Comment on attachment 8486412 [details] [diff] [review]
IonMonkey: Spew slot numbers.

Review of attachment 8486412 [details] [diff] [review]:
-----------------------------------------------------------------

Can you adjust the spewing to:
LoadFixedSlot 0
StoreFixedSlot 0 var1
LoadSlot 0
StoreSlot 0 var1
Attachment #8486412 - Flags: review?(hv1989)
Posted patch IonMonkey: Spew slot numbers. (obsolete) — Splinter Review
Attachment #8486412 - Attachment is obsolete: true
Attachment #8486451 - Flags: review?(nicolas.b.pierron)
(In reply to Nicolas B. Pierron [:nbp] from comment #11)
> (In reply to Nicolas B. Pierron [:nbp] from comment #8)
> > I did not expected that, so I triple checked the results
> 

And I should have checked more, the master branch against which I compared the results was 3600 commits behind :(

Now, everything falls back under the noise level, but the loads are still being replaced by MBox instructions.
Comment on attachment 8486451 [details] [diff] [review]
IonMonkey: Spew slot numbers.

Review of attachment 8486451 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/MIR.cpp
@@ +3046,5 @@
> +    PrintOpcodeName(fp, op());
> +    fprintf(fp, " ");
> +    getOperand(0)->printName(fp);
> +    fprintf(fp, " %zu ", slot());
> +    getOperand(1)->printName(fp);

Why printing in the middle of the operands of the instructions?  This format might make sense for the serialized method that you are trying to do, but it does not make any sense for spewing an assembly-like output.

So far we have been writting things at the end because this was more convenient to overload printOpcode, and call MDefinition::printOpcode to print everything before printing the useful info.

But as printing fields at the end does not satisfy everybody, I think it makes more sense to print it after the opcode and before the list of operands, but not in the middle of the list of operands.
Attachment #8486451 - Flags: review?(nicolas.b.pierron)
Attachment #8486451 - Attachment is obsolete: true
Attachment #8487081 - Attachment description: IonMonkey: Spew slot numbers. r= → IonMonkey: Spew slot numbers.
Attachment #8487081 - Flags: review?(sunfish)
(In reply to Nicolas B. Pierron [:nbp] from comment #15)
> Created attachment 8487081 [details] [diff] [review]
> IonMonkey: Spew slot numbers.

This patch add a printFields function such as we do not have to either call MDefinition::printOpcode, or redo the loop for printing each operand in each printOpcode function.  The output give something like:

  (MRelation {fields}) {operands}

Which keeps the relational part on the same side in the spew of IonGraph.
(In reply to Nicolas B. Pierron [:nbp] from comment #14)
> Comment on attachment 8486451 [details] [diff] [review]
> IonMonkey: Spew slot numbers.
> 
> Review of attachment 8486451 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: js/src/jit/MIR.cpp
> @@ +3046,5 @@
> > +    PrintOpcodeName(fp, op());
> > +    fprintf(fp, " ");
> > +    getOperand(0)->printName(fp);
> > +    fprintf(fp, " %zu ", slot());
> > +    getOperand(1)->printName(fp);
> 
> Why printing in the middle of the operands of the instructions?  This format
> might make sense for the serialized method that you are trying to do, but it
> does not make any sense for spewing an assembly-like output.

To have a similar structure as "MStoreElement"
Which prints: "MStoreElement obj slot item".

Else it might be confusing by having different orders:
MStoreElement obj slot item
MStoreFixedSlot obj item slot
Comment on attachment 8486420 [details] [diff] [review]
IonMonkey: Replace untyped loads by boxed value of the store.

Review of attachment 8486420 [details] [diff] [review]:
-----------------------------------------------------------------

r+ if you agree with my comment, otherwise please file another patch for review

::: js/src/jit/MIR.h
@@ +829,5 @@
> +    // if the types are match, and boxing the value if they do not match.
> +    //
> +    // Note: There is no need for such function in AsmJS functions as they do
> +    // not use any MIRType_Value.
> +    MDefinition *foldsToLoaddedValue(TempAllocator &alloc, MDefinition *loaded);

I agree with the whole patch, except for this name. (btw the double d is a typo right?)
What about
- foldsLoadAfterStore()
- operandIfLoadRedundant() // In this case you maybe can also give the slot as argument and test it in that function?
Attachment #8486420 - Flags: review?(hv1989) → review+
Comment on attachment 8487081 [details] [diff] [review]
IonMonkey: Spew slot numbers.

Review of attachment 8487081 [details] [diff] [review]:
-----------------------------------------------------------------

I find it a bit low to change r? because you didn't like my review comments.

I still think it would be good to have similar structure for MStoreElement and these MStoreFixedSlot and MStoreSlot. Like mentioned in comment 17.
Like:
MStoreElement obj slot item
MStoreFixedSlot obj slot item

Instead of (IIUC)
MStoreFixedSlot slot obj item

Though it is not my call anymore.
(In reply to Hannes Verschore [:h4writer] from comment #18)
> ::: js/src/jit/MIR.h
> @@ +829,5 @@
> > +    // if the types are match, and boxing the value if they do not match.
> > +    //
> > +    // Note: There is no need for such function in AsmJS functions as they do
> > +    // not use any MIRType_Value.
> > +    MDefinition *foldsToLoaddedValue(TempAllocator &alloc, MDefinition *loaded);
> 
> I agree with the whole patch, except for this name. (btw the double d is a
> typo right?)
> What about
> - foldsLoadAfterStore()
> - operandIfLoadRedundant() // In this case you maybe can also give the slot
> as argument and test it in that function?

Only by looking at the names, here is what I can understand from the code:

foldsToLoadedValue -- implies we are on a load and that we want to replace the load by the value which we know in advance that it would be the output of this load.

foldsLoadAfterStore -- implies that we would be checking the dependency pointer, and then do the logic for replacing the load.

operandIfLoadRedundant -- The names sounds miss-leading, and it seems to suggest that we replace a Load by its operand if we have seen multiple identical loads.  Which does nowhere near what we want to mean here.

So, based on the code which is in the patch, I still prefer the foldsToLoadedValue name for this function.
(see comment 20)
Attachment #8486420 - Attachment is obsolete: true
Attachment #8487303 - Flags: review?(hv1989)
(In reply to Nicolas B. Pierron [:nbp] from comment #20)
> (In reply to Hannes Verschore [:h4writer] from comment #18)
> > ::: js/src/jit/MIR.h
> > @@ +829,5 @@
> > > +    // if the types are match, and boxing the value if they do not match.
> > > +    //
> > > +    // Note: There is no need for such function in AsmJS functions as they do
> > > +    // not use any MIRType_Value.
> > > +    MDefinition *foldsToLoaddedValue(TempAllocator &alloc, MDefinition *loaded);
> > 
> > I agree with the whole patch, except for this name. (btw the double d is a
> > typo right?)
> > What about
> > - foldsLoadAfterStore()
> > - operandIfLoadRedundant() // In this case you maybe can also give the slot
> > as argument and test it in that function?

Feel free to give other examples.
 
> Only by looking at the names, here is what I can understand from the code:
> 
> foldsToLoadedValue -- implies we are on a load and that we want to replace
> the load by the value which we know in advance that it would be the output
> of this load.

The issue I had with it is that "Loaded" is so generic and can mean anything.
Shouldn't it be "foldsToStoredValue" in that case? Actually?

> 
> foldsLoadAfterStore -- implies that we would be checking the dependency
> pointer, and then do the logic for replacing the load.

Hmm. Though the name doesn't imply it, I can see that we mostly use "foldsTo" that way so might be confusing.

> 
> operandIfLoadRedundant -- The names sounds miss-leading, and it seems to
> suggest that we replace a Load by its operand if we have seen multiple
> identical loads.  Which does nowhere near what we want to mean here.

Yeah, indeed.

boxOnTypeMismatch // But this is incorrect for load type being narrower than stored type.
foldsToStoredValue

Maybe even:

FoldLoadWithStore<Store, Load>(Store *store, Load *load) {
    if (!store->block()->dominates(load->block()))
        return this;
    
    if (store->getElement(0) != load->elements(0))
        return this;

    if (store->index() != load->index())
        return this;
    
    // If the type are matching then we return the value which is used as
    // argument of the store.
    if (store->value()->type() != load->type()) {
         // If we expect to read a type which is more generic than the type seen
         // by the store, then we box the value used by the store.
         if (load->type() != MIRType_Value)
              return this;
		 
         MOZ_ASSERT(store->value()->type() < MIRType_Value);
         MBox *box = MBox::New(alloc, store->value());
         block()->insertBefore(loa, box);
         return box;
}
		 
		return loaded;
}
Pressed enter by accident:

> FoldLoadWithStore<Store, Load>(Store *store, Load *load) {
>     if (!store->block()->dominates(load->block()))
>         return this;
>     
>     if (store->getElement(0) != load->elements(0))
>         return this;
> 
>     if (store->index() != load->index())
>         return this;
>     
>     // If the type are matching then we return the value which is used as
>     // argument of the store.
>     if (store->value()->type() != load->type()) {
>          // If we expect to read a type which is more generic than the type seen
>          // by the store, then we box the value used by the store.
>          if (load->type() != MIRType_Value)
>               return this;
> 		 
>          MOZ_ASSERT(store->value()->type() < MIRType_Value);
>          MBox *box = MBox::New(alloc, store->value());
>          block()->insertBefore(loa, box);
>          return box;
>      }
>      return store->value();
> }

and

> MDefinition *
> MLoadElement::foldsTo(TempAllocator &alloc)
> {
>     if (!dependency() || !dependency()->isStoreElement())
>          return this;
>     
>     return FoldLoadWithStore(dependency()->toStoreElement(), this);
> }
Oh damn. We use slots and index for the LoadFixedSlot/LoadSlot and LoadElement. So this last thing might also not fly :(
(In reply to Hannes Verschore [:h4writer] from comment #24)
> Oh damn. We use slots and index for the LoadFixedSlot/LoadSlot and
> LoadElement. So this last thing might also not fly :(

Can we add "slotOrIndex()" function? Then it should work :D
(sorry for the span. It is getting late.)

(In reply to Hannes Verschore [:h4writer] from comment #23)
> >     if (store->getElement(0) != load->elements(0))
> >         return this;

Being (store->getOperand(0) != load->getOperand(0)) // since the obj/elements/slots are all the first operand
(In reply to Hannes Verschore [:h4writer] from comment #22)
> (In reply to Nicolas B. Pierron [:nbp] from comment #20)
> > Only by looking at the names, here is what I can understand from the code:
> > 
> > foldsToLoadedValue -- implies we are on a load and that we want to replace
> > the load by the value which we know in advance that it would be the output
> > of this load.
> 
> The issue I had with it is that "Loaded" is so generic and can mean anything.

Yes, and this is exactly the intent, as we could use this for other form of Stores / Loads.  I would not see any issue with going with this name, it is as generic as the code that it contains.

> Shouldn't it be "foldsToStoredValue" in that case? Actually?

I think both are ok, the little detail being that this name implies that we are checking that it is coming from a Store, and for exampled, not inferred from any TypeInference / Baseline monitoring.


(In reply to Hannes Verschore [:h4writer] from comment #25)
> (In reply to Hannes Verschore [:h4writer] from comment #24)
> > Oh damn. We use slots and index for the LoadFixedSlot/LoadSlot and
> > LoadElement. So this last thing might also not fly :(
> 
> Can we add "slotOrIndex()" function? Then it should work :D

I do not think there is need to add indirection for factoring things into one unreadable function.
Also should I add a virtual function to check that the dependency is a match?
  LoadFixedSlot -> StoreFixedSlot
  LoadSlot -> StoreSlot

and so on … I think it makes much more sense to just keep the current patch.
Don't you think?
Comment on attachment 8487303 [details] [diff] [review]
IonMonkey: Replace untyped loads by boxed value of the store.

Review of attachment 8487303 [details] [diff] [review]:
-----------------------------------------------------------------

Lets use "foldsToStoredValue".

The other approach would have been awesome. Good for bonuspoints ;)
Attachment #8487303 - Flags: review?(hv1989) → review+
https://hg.mozilla.org/mozilla-central/rev/8cba07b33924
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla35
Comment on attachment 8487081 [details] [diff] [review]
IonMonkey: Spew slot numbers.

Review of attachment 8487081 [details] [diff] [review]:
-----------------------------------------------------------------

I have some sympathy for the concern in comment 17. If you prefer the way this patch prints info for MStoreFixedSlot, what if we reordered the operands of MStoreElement to match it?
Attachment #8487081 - Flags: review?(sunfish)
Attachment #8486411 - Flags: review?(sstangl) → review+
Comment on attachment 8486424 [details] [review]
IonGraph: Display Alias Analysis result.

Merged on github.
Attachment #8486424 - Flags: review?(sstangl) → review+
You need to log in before you can comment on or make changes to this bug.