Closed Bug 760642 Opened 12 years ago Closed 7 years ago

IonMonkey: Refactor IonAssemblyBuffer

Categories

(Core :: JavaScript Engine, defect)

25 Branch
ARM
Linux
defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: mjrosenb, Assigned: mjrosenb)

References

(Blocks 1 open bug)

Details

(Whiteboard: [ion:t] [games:p?] [patches landed in Firefox 25])

Attachments

(7 files, 10 obsolete files)

4.34 KB, patch
jbramley
: review+
Details | Diff | Splinter Review
4.28 KB, patch
jbramley
: review+
Details | Diff | Splinter Review
39.20 KB, application/x-gzip
Details
4.13 KB, text/plain
Details
322.11 KB, patch
Details | Diff | Splinter Review
12.86 KB, text/plain
Details
12.64 KB, text/plain
Details
Several poor design decisions were made while writing the AssemblerBuffer code, which have been repeatedly hacked around, so I intend to undo those when the tree is stable enough I can hack on this for a week.

Things that need to be fixed:
* the API for speaking with the Assembler involves passing pointers into the AssemblerBuffer when you want to put data into a particular pool.  I think I did this so there was no temptation to do arithmetic on the values, and there would be a dedicated "NONE" value.  This later caused many headaches, since I was unable to just allocate a new pool, memcpy and resetting the old pool was required.

* The code for determining if pools are full feels klunky and redundant.
* There are ~10 places where we iterate over the list of subPools, taking into account their size
* On every instruction insertion, we check every subPool to see if it is full
* indexing into the instruction stream requires a linear walk every time.

Expect more bullet points as I inspect the code some more.
This is currently the cause of pathologically long asm.js compile times on ARM (e.g. 2.5s on desktop, 5 minutes on a Nexus 4)
Whiteboard: [ion:t] → [ion:t] [games:p1]
Despite what the name says, I haven't actually tested the perf of this patch yet, I've only tested to make sure it doesn't break anything.
Attachment #767791 - Flags: review?
Attachment #767791 - Flags: review? → review?(Jacob.Bramley)
Comment on attachment 767791 [details] [diff] [review]
/home/mjrosenb/patches/SuperSpeedAssemblerBuffers-r0.patch

Review of attachment 767791 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/ion/shared/IonAssemblerBuffer.h
@@ +63,3 @@
>      void setNext(BufferSlice<SliceSize> *next_) {
>          JS_ASSERT(this->next == NULL);
>          this->next = next_;

Shouldn't this set next_->prev too, to make sure the doubly-linked list is always consistent?

The same applies to setPrev, of course.

@@ +188,5 @@
>              }
>              JS_ASSERT(cur != NULL);
>          }
>          // the offset within this node should not be larger than the node itself.
> +        // this check is now completely bogus since a slightly different algorithm is used.

As discussed, this check probably still holds. If it doesn't, delete it (rather than commenting it out) and work out what's going on, and if the subsequent array lookup is actually safe.
Attachment #767791 - Flags: review?(Jacob.Bramley) → review+
part 2, use a finger to get the last bit of perf out of getInst.

The time to compile the demo is now 1m19s, down from 2m53.254s, Yay‽
the bad news comes from the perf results:

Pre patches, the first few entries were:

    61.37%       js  js                   [.] js::ion::Assembler::bind(js::ion::Label*, js::ion::BufferOffset)
     3.73%       js  js                   [.] js::ion::VirtualRegister::intervalFor(js::ion::CodePosition)
     3.15%       js  js                   [.] js::ion::LiveInterval::addRange(js::ion::CodePosition, js::ion::CodePosition)
     2.31%       js  [kernel.kallsyms]    [k] 0xc01240a4
     1.65%       js  js                   [.] js::ion::LiveInterval::covers(js::ion::CodePosition)
     1.53%       js  js                   [.] js::ion::LinearScanAllocator::findBestFreeRegister(js::ion::CodePosition*)
     1.06%       js  js                   [.] js::ion::LinearScanAllocator::UnhandledQueue::enqueueBackward(js::ion::LiveInterval*)
     0.95%       js  js                   [.] js::ion::EliminatePhis(js::ion::MIRGenerator*, js::ion::MIRGraph&, js::ion::Observability)
     0.88%       js  js                   [.] js::ion::LinearScanAllocator::resolveControlFlow()
     0.79%       js  js                   [.] js::ion::LiveRangeAllocator<js::ion::LinearScanVirtualRegister>::buildLivenessInfo()
     0.79%       js  js                   [.] js::ion::LinearScanAllocator::allocateRegisters()
     0.67%       js  js                   [.] js::frontend::TokenStream::getTokenInternal()
     0.55%       js  js                   [.] js::ion::MDefinition::replaceAllUsesWith(js::ion::MDefinition*)

A bit of math without any testing says that the improvement was about 60%.
Concrete numbers from testing show:
     9.78%       js  js                 [.] js::ion::VirtualRegister::intervalFor(js::ion::CodePosition)
     8.76%       js  js                 [.] js::ion::LiveInterval::addRange(js::ion::CodePosition, js::ion::CodePosition)
     5.67%       js  [kernel.kallsyms]  [k] 0xc01329c0
     4.50%       js  js                 [.] js::ion::LiveInterval::covers(js::ion::CodePosition)
     3.91%       js  js                 [.] js::ion::LinearScanAllocator::findBestFreeRegister(js::ion::CodePosition*)
     2.88%       js  js                 [.] js::ion::LinearScanAllocator::UnhandledQueue::enqueueBackward(js::ion::LiveInterval*)
     2.58%       js  js                 [.] js::ion::EliminatePhis(js::ion::MIRGenerator*, js::ion::MIRGraph&, js::ion::Observability)
     2.38%       js  js                 [.] js::ion::LinearScanAllocator::resolveControlFlow()
     2.16%       js  js                 [.] js::ion::LiveRangeAllocator<js::ion::LinearScanVirtualRegister>::buildLivenessInfo()
     2.13%       js  js                 [.] js::ion::LinearScanAllocator::allocateRegisters()
     1.77%       js  js                 [.] js::frontend::TokenStream::getTokenInternal()
     1.51%       js  js                 [.] js::ion::MDefinition::replaceAllUsesWith(js::ion::MDefinition*)
     1.28%       js  js                 [.] js::ion::MBasicBlock::inherit(js::ion::MBasicBlock*, unsigned int)
     1.22%       js  js                 [.] CheckExpr(FunctionCompiler&, js::frontend::ParseNode*, Use, js::ion::MDefinition**, Type*)
     1.20%       js  js                 [.] js::ion::MPhi::addInputSlow(js::ion::MDefinition*, bool*)
     1.16%       js  js                 [.] js::ion::MBasicBlock::discardPhiAt(js::InlineForwardListIterator<js::ion::MPhi>&)
     1.09%       js  js                 [.] js::InflateUTF8StringToBuffer(JSContext*, char const*, unsigned int, unsigned short*, unsigned int*)
     1.08%       js  js                 [.] js::ion::MPhi::setOperand(unsigned int, js::ion::MDefinition*)
     0.92%       js  js                 [.] js::ion::ValueNumberer::lookupValue(js::ion::MDefinition*)
     0.86%       js  js                 [.] js::ion::ValueNumberer::computeValueNumbers()
     0.82%       js  js                 [.] js::ion::MBasicBlock::addPredecessorPopN(js::ion::MBasicBlock*, unsigned int)
     0.80%       js  js                 [.] js::ion::LiveInterval::intersect(js::ion::LiveInterval*)
     0.80%       js  js                 [.] JSAtom* js::AtomizeChars<(js::AllowGC)1>(JSContext*, unsigned short const*, unsigned int, js::InternBehavior)
     0.77%       js  js                 [.] js::ion::LBlock::firstId()
     0.75%       js  js                 [.] NameResolver::resolve(js::frontend::ParseNode*, JS::Handle<JSAtom*>)
     0.75%       js  js                 [.] js::ion::Assembler::actualOffset(unsigned int) const
     0.72%       js  js                 [.] JS::Compile(JSContext*, JS::Handle<JSObject*>, JS::CompileOptions, _IO_FILE*)
     0.69%       js  js                 [.] PushNodeChildren(js::frontend::ParseNode*, NodeStack*)
     0.65%       js  js                 [.] js::ion::MPhi::New(unsigned int)
     0.64%       js  js                 [.] js::ion::ValueNumberer::eliminateRedundancies()
     0.58%       js  js                 [.] js::ion::LinearScanAllocator::findBestBlockedRegister(js::ion::CodePosition*)
     0.54%       js  js                 [.] js::ion::LBlock::lastId()
     0.53%       js  js                 [.] js::ion::LinearScanAllocator::reifyAllocations()

tldr: these two patches eliminate all of the overhead of the assembler buffer, but there is still a ways to go.
PS: all perf numbers are from a panda board.
Attachment #767866 - Flags: review?
just ran tests on my chromebook, which has a processor that is slightly better than the nexus 4 (but much more comparable than a pandaboard)
total time was 0m19.902s
perf's comment on the matter is:
  7.44%  js  js                 [.] js::ion::VirtualRegister::intervalFor(js::ion::CodePosition)
  6.60%  js  js                 [.] js::ion::LiveInterval::addRange(js::ion::CodePosition, js::ion::CodePosition)
  4.45%  js  js                 [.] js::ion::LinearScanAllocator::findBestFreeRegister(js::ion::CodePosition*)
  3.86%  js  [kernel.kallsyms]  [k] 0x80496fac
  3.39%  js  js                 [.] js::ion::LiveRangeAllocator<js::ion::LinearScanVirtualRegister>::buildLivenessInfo()
  2.81%  js  js                 [.] js::ion::LinearScanAllocator::allocateRegisters()
  2.75%  js  js                 [.] js::ion::MDefinition::replaceAllUsesWith(js::ion::MDefinition*)
  2.49%  js  js                 [.] js::ion::EliminatePhis(js::ion::MIRGenerator*, js::ion::MIRGraph&, js::ion::Observability)
  2.14%  js  js                 [.] js::ion::LinearScanAllocator::UnhandledQueue::enqueueBackward(js::ion::LiveInterval*)
  2.02%  js  js                 [.] js::frontend::TokenStream::getTokenInternal()
  1.90%  js  js                 [.] CheckExpr(FunctionCompiler&, js::frontend::ParseNode*, Use, js::ion::MDefinition**, Type*)
  1.78%  js  js                 [.] js::ion::LiveInterval::covers(js::ion::CodePosition)
  1.67%  js  js                 [.] js::ion::LinearScanAllocator::resolveControlFlow()
  1.60%  js  js                 [.] js::InflateUTF8StringToBuffer(JSContext*, char const*, unsigned int, unsigned short*, unsigned int*)
  1.30%  js  js                 [.] js::ion::ValueNumberer::lookupValue(js::ion::MDefinition*)
  1.30%  js  js                 [.] js::ion::MBasicBlock::inherit(js::ion::MBasicBlock*, unsigned int)
  1.26%  js  js                 [.] js::ion::MPhi::addInputSlow(js::ion::MDefinition*, bool*)
  1.23%  js  js                 [.] NameResolver::resolve(js::frontend::ParseNode*, JS::Handle<JSAtom*>)
  1.17%  js  js                 [.] PushNodeChildren(js::frontend::ParseNode*, NodeStack*)
  1.15%  js  js                 [.] JS::Compile(JSContext*, JS::Handle<JSObject*>, JS::CompileOptions, _IO_FILE*)
  1.09%  js  js                 [.] JSAtom* js::AtomizeChars<(js::AllowGC)1>(JSContext*, unsigned short const*, unsigned int, js::InternBehavior)
  1.01%  js  js                 [.] js::ion::LiveInterval::addRangeAtHead(js::ion::CodePosition, js::ion::CodePosition)
  0.95%  js  js                 [.] js::ion::MPhi::setOperand(unsigned int, js::ion::MDefinition*)
  0.92%  js  js                 [.] js::ion::LiveInterval::intersect(js::ion::LiveInterval*)
  0.83%  js  js                 [.] js::ion::MBasicBlock::addPredecessorPopN(js::ion::MBasicBlock*, unsigned int)
  0.81%  js  js                 [.] js::ion::ValueNumberer::computeValueNumbers()
  0.79%  js  js                 [.] js::ion::MPhi::getOperand(unsigned int) const
  0.79%  js  js                 [.] js::ion::Assembler::actualOffset(unsigned int) const
  0.78%  js  js                 [.] js::ion::ValueNumberer::eliminateRedundancies()
  0.74%  js  js                 [.] js::ion::LBlock::lastId()
  0.64%  js  js                 [.] js::ion::MNode::discardOperand(unsigned int)
  0.62%  js  js                 [.] js::ion::LinearScanAllocator::findBestBlockedRegister(js::ion::CodePosition*)
  0.57%  js  js                 [.] js::ion::Loop::init()
  0.55%  js  libgcc_s.so.1      [.] __aeabi_dsub
  0.54%  js  js                 [.] js::ion::LiveRangeAllocator<js::ion::LinearScanVirtualRegister>::init()
  0.54%  js  js                 [.] js::Vector<js::ion::LiveInterval::Range, 1u, js::ion::IonAllocPolicy>::growStorageBy(unsigned int)
  0.52%  js  js                 [.] js::ion::LinearScanAllocator::reifyAllocations()
  0.51%  js  js                 [.] js::ion::MPhi::New(unsigned int)
  0.50%  js  libc-2.17.so       [.] memset
For me on the N4, this was a significant improvement, but not enough -- down to 68s.  My profile looks basically the same though; 80% of the time in getInst().  This is with both patches in this bug.  Also note that I'm only profiling the main thread.. I believe off main thread compilation is enabled, but now I'm not sure.
I actually see the same compilation time (~68s) whether I have profiling enabled or disabled.  I thought that parallel compilation gets disabled when profiling is enabled, which then makes me wonder why I'm seeing the same compilation speed either way...
Attachment #767866 - Flags: review? → review?(Jacob.Bramley)
small blessing in disguise: I left a print statement in there, but commented out.  can you uncomment it, and give me the (rather long) log of what it prints?
Flags: needinfo?(vladimir)
Comment on attachment 767866 [details] [diff] [review]
/home/mjrosenb/patches/useAFinger-r0.patch

Review of attachment 767866 [details] [diff] [review]:
-----------------------------------------------------------------

It looks correct.

Linked lists tend to be very hard on caches, since they involve a lot of pointer chasing which is hard to predict. The finger might resolve quite a lot of that, since you can dramatically reduce the number of traversals you need to do, but it will only work well if your getInst calls have strong locality. Depending on your usage pattern, I think that you could get more performance by maintaining an index list of every chunk and not using the linked list. Your printf output should tell you that.

::: js/src/ion/shared/IonAssemblerBuffer.h
@@ +202,5 @@
> +        } else {
> +            // it is closest to the end
> +            cur = tail;
> +            cur_off = bufferSize;
> +        }

It might be useful to make this a helper function ("findNearbyChunk(...)"), to aid readability.

@@ +228,4 @@
>              }
>              JS_ASSERT(cur != NULL);
>          }
> +        if (count > 2 || used_finger) {

This threshold might need tuning. For example, if the getInst accesses often alternate between two distant chunks, then updating the finger on "count > 2" will be costly since it would often be flipping between them, and each flip would trigger a lengthy lookup. On the other hand, it might work very well as it is. I think it would be useful to study the typical access patterns (if you haven't already).

@@ +235,4 @@
>          // the offset within this node should not be larger than the node itself.
>          // this check is now completely bogus since a slightly different algorithm is used.
> +        JS_ASSERT(local_off < cur->size());
> +        // printf("### %c%s took %d steps\n", sigil, used_finger ? "G" : " ", count);

Remove this (and the count++ parts) before pushing.
Attachment #767866 - Flags: review?(Jacob.Bramley) → review+
Attached file gzip'd printf log
Whoops, my previous timings were without the second patch.  With the second patch (and no printf) compilation time is down to 18.9s!  Definitely making solid progress.

Here's the log of the steps output, gzip'd since it's 2.5MB otherwise.
Flags: needinfo?(vladimir)
Note: I'm not sure that cast was correct, because the result of the subraction is already a uint32.  So this abs is basically a noop.

Also, there are a lot of warnings in this code that should get resolved, but in a followup.
all of the values should probably be signed, because while none of them should ever be negative, I do actually want to take the difference of two of them and get a negative value.
But

  int finger_off = abs((int)(local_off - finger_offset));

doesn't take the difference and get a negative value.  It's really equivalent to this:

  int finger_off = abs((int)(unsigned(local_off) - finger_offset));

That is, the subtraction is done in unsigned-space, then the cast to int -- which only works correctly per spec if the subtraction result was >= 0 -- and then abs under that assumption will return the exact same value.  If you want signed subtraction, you have to convert the unsigned operand to signed before subtracting.

As far as abs goes, you should use mozilla::Abs in "mozilla/MathAlgorithms.h", which returns the corresponding unsigned type.  Then use SafeCast<int> from "mozilla/Casting.h" to convert to int, asserting the value will be unchanged by the cast.
Whiteboard: [ion:t] [games:p1] → [ion:t] [games:p1] [leave open]
Why not use mozilla::LinkedList?
Blocks: gecko-games
Attached patch AssemblerBufferRework-r2.patch (obsolete) — Splinter Review
wow, that involved a whole lot more time debugging than I expected.  Anyhow, this passes all jit-tests, but it likely doesn't compile on x86 or x64, due to a couple of minor api changes.  An updated patch should be along later today
Attachment #8347947 - Flags: review?(Jacob.Bramley)
Since this is a big rewrite, I think it'd be good to have somebody on the JS team review this as well. Luke, I think you and Marty have discussed this patch before, do you have some spare cycles? Else we can find somebody else or I can take it, but I'm pretty swamped atm.
Flags: needinfo?(luke)
(In reply to Jan de Mooij [:jandem] from comment #20)
> Since this is a big rewrite, I think it'd be good to have somebody on the JS
> team review this as well. Luke, I think you and Marty have discussed this
> patch before, do you have some spare cycles? Else we can find somebody else
> or I can take it, but I'm pretty swamped atm.

I was going to say the same thing. I don't think I'm qualified to review a change on this scale. (I'll still do a review and give my opinions, though, but someone else should look at it too.)
I probably won't be able to get to a review until after the Christmas break, but I could start reviewing then.

In the meantime, it sounds like there is some work to be done to finish up x86/x64 and I see a number of TODOs and extraneous #ifdefs in the patch.  It'd be nice to have those ironed out before starting to review.

Two other questions in the meantime:

1. Have you measured the effect of this on asm.js compile time?  In particular, I'd be interested to hear about the change in the "compilation succeeded in Nms" number reported after running bullet/zlib/box2d in awfy asmjs-apps.

2. Do I recall correctly that the plan is to refactor x86/x64 buffers to use he same buffer (so that we can do fancy things like using shorter jump instructions)?
Flags: needinfo?(luke)
Comment on attachment 8347947 [details] [diff] [review]
AssemblerBufferRework-r2.patch

Review of attachment 8347947 [details] [diff] [review]:
-----------------------------------------------------------------

I've scratched the surface. As Luke said, I think I'd like to wait until the patch is in a more final state before I continue. Most of the issues I've seen so far relate to formatting and debug code.

::: js/src/jit/AsmJS.cpp
@@ +1672,5 @@
> +        }
> +        for (unsigned i = 0; i < module_->numExits(); i++) {
> +            module_->exit(i).updateOffsets(&masm_);
> +        }
> +        

There's trailing whitespace here. There are other instances, but I won't note all of them.

::: js/src/jit/AsmJSModule.h
@@ +210,5 @@
>        public:
>          Exit() {}
>          Exit(unsigned ffiIndex, unsigned globalDataOffset)
>            : ffiIndex_(ffiIndex), globalDataOffset_(globalDataOffset),
> +            interpCodeOffset_(UINT_MAX), ionCodeOffset_(UINT_MAX)

Should you define a meaningful name for those? Exit::offset_uninitialized perhaps.

Also, something like Exit::is(Interp|Ion)OffsetInitialized() would be a good idea, rather than checkout offset() for a magic value.

@@ +228,3 @@
>              ionCodeOffset_ = off;
>          }
> +        void updateOffsets(jit::MacroAssembler *masm) {

This function isn't idempotent. Should it assert that it hasn't already been run?

::: js/src/jit/BaselineCompiler.cpp
@@ +718,5 @@
>          return false;
>      masm.bind(&noPush);
>  
>      // Store the start offset in the appropriate location.
> +    eprintf("spsPushToggleOffset_: %d\n", spsPushToggleOffset_.offset());

Did you mean to leave this here?

::: js/src/jit/Ion.cpp
@@ +646,5 @@
> +    if (getenv("MYSPEW") && !strcmp(getenv("MYSPEW"), "insns")) {
> +        char cmd[4096];
> +        snprintf(cmd, 4096, "gdb /proc/%d/exe %d -batch -ex 'set pagination off' -ex 'xi %d %p' 1>&2", getpid(), getpid(), insnSize_/4, code_);
> +        system(cmd);
> +    }

Did you mean to leave this here?

::: js/src/jit/IonCode.h
@@ +70,5 @@
>          jumpRelocTableBytes_(0),
>          dataRelocTableBytes_(0),
>          preBarrierTableBytes_(0),
>          invalidated_(false)
> +    { if (getenv("MYSPEW")) fprintf(stderr, "IonCode::IonCode code=%p\n", code);}

Did you mean to leave this here?

::: js/src/jit/IonMacroAssembler.h
@@ +847,5 @@
>          // the IonCode onto the stack in order to GC it correctly.  exitCodePatch should
>          // be unset if the code never needed to push its IonCode*.
>          if (hasEnteredExitFrame()) {
> +            exitCodePatch_.fixup(this);
> +            eprintf("patching %p with itself\n", code->raw());

Did you mean to leave that here?

::: js/src/jit/arm/Assembler-arm.cpp
@@ +574,1 @@
>          dataRelocations_.writeUnsigned(real_offset);

'offset' is never used other than to write to 'real_offset'.

@@ +580,1 @@
>          jumpRelocations_.writeUnsigned(real_offset);

Ditto.

@@ +586,1 @@
>          preBarriers_.writeUnsigned(real_offset);

Ditto.

@@ +598,5 @@
> +    char cmd[4096];
> +    snprintf(cmd, 4096, "gdb /proc/%d/exe %d --batch -ex 'set arm force-mode arm' -ex 'disas %p,+%d' -ex bt",
> +             getpid(), getpid(), buffer, m_buffer.size());
> +    if (m_buffer.size() > 4096 && getenv("MYSPEW"))
> +        system(cmd);

Debug code.

@@ +607,5 @@
>  Assembler::resetCounter()
>  {
>      m_buffer.resetCounter();
>  }
> +#if 0

Can't you just delete the code?

@@ +2664,5 @@
>      static int hit = 0;
>      if (stopBKPT == hit)
>          dbg_break();
>      writeInst(0xe1200070 | (hit & 0xf) | ((hit & 0xfff0)<<4));
> +    //writeInst(0xffff0000 | hit & 0xffff);

Did you mean to delete this?

::: js/src/jit/shared/Assembler-shared.h
@@ +562,5 @@
>      }
>  
>      void operator = (CodeOffsetJump base) {
>          raw_ = (uint8_t *) base.offset();
> +        if (getenv("MYSPEW")) fprintf(stderr, "setting myself (%p) to %p\n", &raw_, raw_);

Did you mean to leave this here?

::: js/src/jit/shared/CodeGenerator-shared.cpp
@@ +394,5 @@
>  
>  bool
>  CodeGeneratorShared::markSafepointAt(uint32_t offset, LInstruction *ins)
>  {
> +    // I'm decently sure this is bogus on ARM.

Why? Can you leave a reference or some explanation in case someone else stumbles across it?

@@ +418,5 @@
>      // represent the join point of the call out of the function.
>      //
>      // At points where we want to ensure that invalidation won't corrupt an
>      // important instruction, we make sure to pad with nops.
> +#ifndef JS_CPU_ARM

Why?

@@ +425,5 @@
>          paddingSize -= masm.currentOffset() - lastOsiPointOffset_;
>          for (int32_t i = 0; i < paddingSize; ++i)
>              masm.nop();
>      }
> +    //JS_ASSERT(masm.currentOffset() - lastOsiPointOffset_ >= Assembler::patchWrite_NearCallSize());

Why is this commented out?
Attachment #8347947 - Flags: review?(Jacob.Bramley) → review-
Attached patch AssemblerBufferRework-r3.patch (obsolete) — Splinter Review
Still didn't add in the x86/x64 changes, but I remembered to deal with all of the messy allocations this time.  Just need to handle OOM, which shouldn't lead to any major architectural changes.  Also, got rid of a bunch more commented out code.
Attachment #8347947 - Attachment is obsolete: true
ok, builds on x64, passes a whole boatload of tests when run under valgrind, I'm like 90 %sure that I handle all of the oom cases, and about as sure that I got the bounds checking right for falling back to long branches/calls.  I'll benchmark it... soon.
Attachment #8349419 - Attachment is obsolete: true
Attachment #8349745 - Flags: review?(luke)
Attachment #8349745 - Flags: review?(Jacob.Bramley)
I probably won't have time to review it until the new year I'm afraid.

(In reply to Marty Rosenberg [:mjrosenb] from comment #25)
> I'm like 90 %sure that I handle all of the oom cases, and about as sure that
> I got the bounds checking right for falling back to long branches/calls. 

That 10% is quite a big window of uncertainty considering the nature of the code. (A bug in the branch handling code could easily be a security bug.) Can you do some more focussed testing to make sure that the branches are converted properly at the boundary conditions please? The JavaScript-based test suites won't be able to test that reliably.
Blocks: 955822
The patch references the new file jit/shared/IonAssemblerBuffer.cpp but this does not appear to be included in the patch?
Flags: needinfo?(mrosenberg)
Blocks: 944972
Blocks: 906964
Blocks: 952810
as Doug pointed out, I didn't add in a new .cpp file.  My bad.
Attachment #8355727 - Flags: review?(luke)
Attachment #8355727 - Flags: review?(Jacob.Bramley)
Flags: needinfo?(mrosenberg)
Attached patch AssemblerBufferRework-r5.patch (obsolete) — Splinter Review
unlike the second -r4, I actually qref'ed before uploading a new patch.  thankfully my script realized that I've already submitted that patch.
Attachment #8349745 - Attachment is obsolete: true
Attachment #8355727 - Attachment is obsolete: true
Attachment #8349745 - Flags: review?(luke)
Attachment #8349745 - Flags: review?(Jacob.Bramley)
Attachment #8355727 - Flags: review?(luke)
Attachment #8355727 - Flags: review?(Jacob.Bramley)
Attachment #8355729 - Flags: review?(luke)
Attachment #8355729 - Flags: review?(Jacob.Bramley)
Thank you for the updated patch.

A little rebasing was needed to apply the patch to the m-c tip and fwiw the rebased patch is attached.  Also noticed some trailing white space which was removed.

There is still a good deal of debug code in the patch but it's not distracting and might even be useful in the short term.  There are quite a few compilation warnings that could be addressed, unused variables etc.

Initial testing on some asm.js code is positive.  Ammo and BananaBread work on the ARMv7, and Ammo works on the ARMv6.  Had to apply the patch in bug 861208, and this might be something to consider here.  BananaBread failed on the ARMv6 with an assertion failure which might be related to FP loads from the pools:

Assembler-arm.h line 511:
    Imm8VFPOffData(uint32_t imm) : data (imm) {
        JS_ASSERT((imm & ~(0xff)) == 0);
    }

Shall continue exploring the code and try to debug this, and test it further.  Shall try the jit-tests on the ARMv6 to see if some simpler tests failure.

Noticed some crashes reloading demos repeatedly, that did not seems to occur if loading after a fresh start, but it's not yet clear if these are related to the changes here.

Great work, looks very promising.
jit-test tbpl on an ARMv6 device went very well and was a big improvement.  All the previous buffer pool failures were corrected!  There was only one new failure for tests/asm.js/testBasic.js

0x003d48c4 in js::jit::DtrOffImm::DtrOffImm(int) () at js/src/jit/arm/Assembler-arm.h:782
782	        JS_ASSERT(mozilla::Abs(imm) < 4096);
#0  0x003d48c4 in js::jit::DtrOffImm::DtrOffImm(int) () at js/src/jit/arm/Assembler-arm.h:782
#1  0x003c5a20 in js::jit::Assembler::patchConstantPoolLoad(js::jit::Instruction*, void*) () at js/src/jit/arm/Assembler-arm.cpp:2039

jit-test tbpl on an ARMv7 device produces a good number of new failures.  Too many to analyses them all, but so far they all crashed with a SIGSEGV on the following instruction sequence.

   0x40ccde04:	movw	r0, #65535	; 0xffff
   0x40ccde08:	movt	r0, #65535	; 0xffff
=> 0x40ccde0c:	ldr	r12, [r0, #124]	; 0x7c
uneval(this)

asserts js debug shell on m-c rev 30f3710477c2 with --ion-eager --ion-parallel-compile=off at Assertion failure: typeAlloc->isArgument(), at jit/LinearScan.cpp with the patch in comment 30.

(this was a drive-by fuzzing instance, I'd also suggest a try run on all relevant platforms prior to landing)
Comment on attachment 8355754 [details] [diff] [review]
Rebased and cleaned up some trailing white space.

Dang it. It turns out I was hitting a variant of bug 958432, and the patch there seems to fix the issue.
Attachment #8355754 - Flags: feedback-
Comment on attachment 8355729 [details] [diff] [review]
AssemblerBufferRework-r5.patch

Sounds like there are some issues dougc found (crashes, remaining debug/todo code).  Perhaps, since dougc is going to be working with this a lot, he could review the patch for logic and then give Jan the review for Ion style?
Attachment #8355729 - Flags: review?(luke)
Attached patch AssemblerBufferRework-r6.patch (obsolete) — Splinter Review
I updated to your rebased patch, and added in a fix for the one bug I found.  it looks like it is *mostly* clean on armel as well as armhf, but I need to do a bit of investigation that may take a while.  (my armel machine is acting up).  Feel free to test this out, and see if it fixes the bugs you've been seeing.
Attachment #8361322 - Flags: feedback?(dtc-moz)
Blocks: 925819
Comment on attachment 8361322 [details] [diff] [review]
AssemblerBufferRework-r6.patch

Review of attachment 8361322 [details] [diff] [review]:
-----------------------------------------------------------------

Thank you, the fix helps a lot.  All the jit-tests now pass on the ARMv7.  There are still are few jit-test failures on the ARMv6 that appear to be related and they all appear to have the same signature:

Assertion failure: data == imm, at js/src/jit/arm/Assembler-arm.h:553
#0  0x0037ec96 in js::jit::datastore::Imm12Data::Imm12Data(unsigned int) () at js/src/jit/arm/Assembler-arm.h:553
#1  0x0037f1c0 in js::jit::DtrOffImm::DtrOffImm(int) () at js/src/jit/arm/Assembler-arm.h:780
#2  0x0035bfd0 in js::jit::Assembler::patchConstantPoolLoad(js::jit::Instruction*, void*) () at js/src/jit/arm/Assembler-arm.cpp:2022
#3  0x003862f8 in js::jit::AssemblerBufferWithConstantPool<4, js::jit::Instruction, js::jit::Assembler, js::jit::SliceKind, 1, 2>::executableCopy(unsigned char*) ()
    at js/src/jit/shared/IonAssemblerBufferWithConstantPools.h:980
#4  0x00358db0 in js::jit::Assembler::executableCopy(unsigned char*) () at js/src/jit/arm/Assembler-arm.cpp:595
#5  0x002565a2 in js::jit::JitCode::copyFrom(js::jit::MacroAssembler&) () at js/src/jit/Ion.cpp:645
#6  0x0023ab14 in js::jit::JitCode* js::jit::Linker::newCode<(js::AllowGC)1>(JSContext*, JSC::ExecutableAllocator*, JSC::CodeKind) () at js/src/jit/IonLinker.h:63
#7  0x00225c32 in js::jit::JitCode* js::jit::Linker::newCode<(js::AllowGC)1>(JSContext*, JSC::CodeKind) () at js/src/jit/IonLinker.h:81
#8  0x001a2018 in js::jit::BaselineCompiler::compile() () at js/src/jit/BaselineCompiler.cpp:111
#9  0x001cbedc in js::jit::BaselineCompile(JSContext*, JS::Handle<JSScript*>) () at js/src/jit/BaselineJIT.cpp:236

Shall continue testing and explore the jit-test failures.
Attachment #8361322 - Flags: feedback?(dtc-moz) → feedback?
Blocks: 959263
Depends on: 861208
The patch in bug 861208 was necessary to stop some assertion failures even on the ARMv7, so it has been added as a dependency for now.  Previously bug 861208 was only seen on the ARMv6.

Some larger Odin demos run on the ARMv7, such as ammo.  BananaBench compiles but crashes on the ARMv7, SIGSEGV at 0xfffffffe.  Was working before the last 'fix'.  Following the link register points to the call below.  The buffer before the call appears to be corrupt and a call address pointer was not set.

   0x88fdfc2c:	beq	0x88fdfc34
   0x88fdfc30:	bkpt	0x45bf
   0x88fdfc34:	movw	r12, #48992	; 0xbf60
   0x88fdfc38:	movt	r12, #16391	; 0x4007
   0x88fdfc3c:	andcs	r0, r0, r0               \ corrupt buffer?
   0x88fdfc40:	movwgt	r6, #52213	; 0xcbf5 |
   0x88fdfc44:	andge	r0, r0, r0               |
   0x88fdfc48:	svclt	0x0091df46               |
   0x88fdfc4c:	andeq	r0, r0, r0               |
   0x88fdfc50:	andeq	r0, r0, r0               |
   0x88fdfc54:	andeq	r0, r0, r0               /
   0x88fdfc58:	movw	r12, #65535	; 0xffff < not set?
   0x88fdfc5c:	movt	r12, #65535	; 0xffff <
   0x88fdfc60:	blx	r12
   0x88fdfc64:	vmov	d0, r0, r1
So I tried to reproduce this using a debug-only armel build, but after compiling, it seems to die a fairly normal death:
exception thrown: TypeError: GLctx is undefined,_glGetString@/home/mjrosenb/bench/BananaBread/cube2/bb.js:6435
_glGetString@/home/mjrosenb/bench/BananaBread/cube2/bb.js:6687
Module._main@./js/game-setup.js:115
callMain@/home/mjrosenb/bench/BananaBread/cube2/bb.js:16060
doRun@/home/mjrosenb/bench/BananaBread/cube2/bb.js:16099
run/<@/home/mjrosenb/bench/BananaBread/cube2/bb.js:16109
window.runEventLoop@/home/mjrosenb/bench/BananaBread/cube2/game/headless.js:73
@./js/game-setup.js:605

/home/mjrosenb/bench/BananaBread/cube2/bb.js:6435:10 TypeError: GLctx is undefined

How did you build the shell, and how did you run the benchmark?
Flags: needinfo?(dtc-moz)
(In reply to Marty Rosenberg [:mjrosenb] from comment #38)
> So I tried to reproduce this using a debug-only armel build, but after
> compiling, it seems to die a fairly normal death:
> exception thrown: TypeError: GLctx is
> undefined,_glGetString@/home/mjrosenb/bench/BananaBread/cube2/bb.js:6435
> _glGetString@/home/mjrosenb/bench/BananaBread/cube2/bb.js:6687
> Module._main@./js/game-setup.js:115
> callMain@/home/mjrosenb/bench/BananaBread/cube2/bb.js:16060
> doRun@/home/mjrosenb/bench/BananaBread/cube2/bb.js:16099
> run/<@/home/mjrosenb/bench/BananaBread/cube2/bb.js:16109
> window.runEventLoop@/home/mjrosenb/bench/BananaBread/cube2/game/headless.js:
> 73
> @./js/game-setup.js:605
> 
> /home/mjrosenb/bench/BananaBread/cube2/bb.js:6435:10 TypeError: GLctx is
> undefined
> 
> How did you build the shell, and how did you run the benchmark?

It was run in a browser, not a shell.  The simulator in bug 959597 will run BananaBread in the browser.  Let me try and sort out some of the simulator issues and then try to reproduce the crash seen using the simulator.

The movw r12, #65535,  movt r12, #65535, sequence is quite distinctive and perhaps some hack code could scan all compiled buffers for this sequence to try and locate the error at compile time.
Flags: needinfo?(dtc-moz)
Attached patch AssemblerBufferRework-r7.patch (obsolete) — Splinter Review
Ok, I think I fixed the issue with that stray write into the instruction stream.  I also put in some assertions, but they may be bogus (I haven't actually tested this yet, mostly because I wasn't able to reproduce the error initially.).
Attachment #8367441 - Flags: review?(Jacob.Bramley)
Comment on attachment 8367441 [details] [diff] [review]
AssemblerBufferRework-r7.patch

I'm not going to be able to review this in a timely manner. Doug, could you have a look please? Luke suggested you in comment 34.
Attachment #8367441 - Flags: review?(Jacob.Bramley) → review?(dtc-moz)
Comment on attachment 8355729 [details] [diff] [review]
AssemblerBufferRework-r5.patch

This was replaced with newer patches.
Attachment #8355729 - Flags: review?(Jacob.Bramley)
(In reply to Marty Rosenberg [:mjrosenb] from comment #40)
> Created attachment 8367441 [details] [diff] [review]
> AssemblerBufferRework-r7.patch

Marty, do you mind rebasing this patch to m-c tip? (rev cafe909f7e07 at the time of this writing)
Flags: needinfo?(mrosenberg)
Attached patch R7 rebased (obsolete) — Splinter Review
Excuse some minor reformatting of text comments, and wip review annotations.
Flags: needinfo?(mrosenberg)
(In reply to Douglas Crosher [:dougc] from comment #44)
> Created attachment 8372742 [details] [diff] [review]
> R7 rebased
> 
> Excuse some minor reformatting of text comments, and wip review annotations.

I hit this compile error:

Traceback (most recent call last):
  File "./config.status", line 419, in <module>
    config_status(**args)
  File "/home/fuzz5lin/Desktop/js-dbg-32-dm-ts-er-sfp-linux-mozilla-central-167616-cafe909f7e07-dgapI_-patched/compilePath/python/mozbuild/mozbuild/config_status.py", line 99, in config_status
    summary = backend.consume(definitions)
  File "/home/fuzz5lin/Desktop/js-dbg-32-dm-ts-er-sfp-linux-mozilla-central-167616-cafe909f7e07-dgapI_-patched/compilePath/python/mozbuild/mozbuild/backend/base.py", line 204, in consume
    for obj in objs:
  File "/home/fuzz5lin/Desktop/js-dbg-32-dm-ts-er-sfp-linux-mozilla-central-167616-cafe909f7e07-dgapI_-patched/compilePath/python/mozbuild/mozbuild/frontend/emitter.py", line 104, in emit
    objs = list(self.emit_from_sandbox(out))
  File "/home/fuzz5lin/Desktop/js-dbg-32-dm-ts-er-sfp-linux-mozilla-central-167616-cafe909f7e07-dgapI_-patched/compilePath/python/mozbuild/mozbuild/frontend/emitter.py", line 205, in emit_from_sandbox
    % (symbol, src, sandbox['RELATIVEDIR']))
mozbuild.frontend.reader.SandboxValidationError: Reference to a file that doesn't exist in UNIFIED_SOURCES (jit/shared/IonAssemblerBuffer.cpp) in js/src
Flags: needinfo?(dtc-moz)
Attached patch R7 rebasedSplinter Review
Sorry, forgot to add the new file to the patch set.
Attachment #8372742 - Attachment is obsolete: true
Flags: needinfo?(dtc-moz)
Comment on attachment 8372763 [details] [diff] [review]
R7 rebased

(function(b, foreign) {
    "use asm";
    var ff = foreign.f;
    function f() {
        ff((.0), 0, 7, (.0), 0, (.0), (.0), (.7), (.5),
           (.0), (.6), (-0), (.1), (.0), (.0), (.0), (1.), (.0), (.0),
           (.0), (-0), (.0), (.0), (.0), (.0), (.0), (.0), (.7), (.0))
    }
    return f
})

$ ./js-dbg-32-dm-ts-er-sfp-linux-cafe909f7e07 --ion-eager testcase.js 

Assertion failure: false (MOZ_ASSUME_UNREACHABLE(unsupported relocation)), at jit/arm/Assembler-arm.cpp:795
Attachment #8372763 - Flags: feedback-
Flags: needinfo?(dtc-moz)
Tested on Ubuntu on ARM pandaboard.
(In reply to Gary Kwong [:gkw] [:nth10sd] catching up on email/bugmail from comment #47)
> Comment on attachment 8372763 [details] [diff] [review]
> R7 rebased
> 
> (function(b, foreign) {
>     "use asm";
>     var ff = foreign.f;
>     function f() {
>         ff((.0), 0, 7, (.0), 0, (.0), (.0), (.7), (.5),
>            (.0), (.6), (-0), (.1), (.0), (.0), (.0), (1.), (.0), (.0),
>            (.0), (-0), (.0), (.0), (.0), (.0), (.0), (.0), (.7), (.0))
>     }
>     return f
> })
> 
> $ ./js-dbg-32-dm-ts-er-sfp-linux-cafe909f7e07 --ion-eager testcase.js 
> 
> Assertion failure: false (MOZ_ASSUME_UNREACHABLE(unsupported relocation)),
> at jit/arm/Assembler-arm.cpp:795

This does not reproduce in the ARM simulator with the patch in bug 861208 applied.  A rebased patch for bug 861208 has just been uploaded.  Is it convenient to apply this too, or do you really need a combined patch?
Flags: needinfo?(dtc-moz)
I tried that additional patch and when run with --ion-eager, got:

Assertion failure: i->is<InstMovW>(), at jit/arm/MacroAssembler-arm.cpp
Flags: needinfo?(dtc-moz)
(In reply to Gary Kwong [:gkw] [:nth10sd] catching up on email/bugmail from comment #50)
> Created attachment 8375383 [details]
> second stack with additional patch from bug 861208
> 
> I tried that additional patch and when run with --ion-eager, got:
> 
> Assertion failure: i->is<InstMovW>(), at jit/arm/MacroAssembler-arm.cpp

Ok, thank you, I was able to reproduce this in the simulator by applying just these patches.

A small fix to the alignment code makes this failure go away, but it might not be the cause.
Flags: needinfo?(dtc-moz)
Attached patch Fix an alignment calculation (obsolete) — Splinter Review
Some brackets were needed here to get the desired precedence.  This just avoids extra alignment fill, the alignment was not wrong.
(In reply to Gary Kwong [:gkw] [:nth10sd] catching up on email/bugmail from comment #50)
> Created attachment 8375383 [details]
> second stack with additional patch from bug 861208
> 
> I tried that additional patch and when run with --ion-eager, got:
> 
> Assertion failure: i->is<InstMovW>(), at jit/arm/MacroAssembler-arm.cpp

The revised patch in bug 861208 fixes the real issue.  There might be more cases that need similar handling.  Some refactoring might be appropriate.

It is not necessary to apply the patch noted in comment 52, but it would make it easer to reproduce future crashes if we are using the same patches.

Thank you for testing, and further testing would help.
> It is not necessary to apply the patch noted in comment 52, but it would
> make it easer to reproduce future crashes if we are using the same patches.

So ideally I'd need the patch in comment 46, the patch in bug 861208 comment 5, and the patch in comment 52?
Flags: needinfo?(dtc-moz)
(In reply to Gary Kwong [:gkw] [:nth10sd] catching up on email/bugmail from comment #54)
> > It is not necessary to apply the patch noted in comment 52, but it would
> > make it easer to reproduce future crashes if we are using the same patches.
> 
> So ideally I'd need the patch in comment 46, the patch in bug 861208 comment
> 5, and the patch in comment 52?

Yes.  If your expectation is a combined patch then just let me know and I will bundled everything together?
Flags: needinfo?(dtc-moz)
> Yes.  If your expectation is a combined patch then just let me know and I
> will bundled everything together?

Such a combined patch is always appreciated, as it can then be clear which problems found are for exactly which patch. :)

Or you could file a new meta bug with these combined patches to test 'em all, or whatever you're most comfortable with.
Flags: needinfo?(dtc-moz)
Comment on attachment 8367441 [details] [diff] [review]
AssemblerBufferRework-r7.patch

The work will continue of a user repo and the combined set will be reviewed when ready.
Attachment #8367441 - Flags: review?(dtc-moz)
Flags: needinfo?(dtc-moz)
Comment on attachment 8367441 [details] [diff] [review]
AssemblerBufferRework-r7.patch

A later rebased patch has been supplied.
Attachment #8367441 - Attachment is obsolete: true
Attachment #8361322 - Attachment is obsolete: true
Attachment #8361322 - Flags: feedback?
Attachment #8355754 - Attachment is obsolete: true
Attachment #8355729 - Attachment is obsolete: true
Comment on attachment 8375471 [details] [diff] [review]
Fix an alignment calculation

Moved to follow up work on a user repo.
Attachment #8375471 - Attachment is obsolete: true
Depends on: 972710
Blocks: 970261
Blocks: 973874
What's the status on this bug?
Whiteboard: [ion:t] [games:p1] [leave open] → [ion:t] [games:p?] [leave open]
I assume this bug is no longer valid and/or the patches are heavily bitrotted ... can we close this then? Is there anything actionable left for this bug?
Flags: needinfo?(Virtual)
Yep, let's mark it as FIXED, per Comment 17, as some patches landed.
If there will be some leftovers, let's track it in new bug, which will block this one.
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Flags: needinfo?(Virtual)
Resolution: --- → FIXED
Whiteboard: [ion:t] [games:p?] [leave open] → [ion:t] [games:p?] [patches landed in Firefox 25]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: