JM: Port JM+Fatval to ARM.

RESOLVED FIXED

Status

()

RESOLVED FIXED
9 years ago
9 years ago

People

(Reporter: jbramley, Assigned: jbramley)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(5 attachments, 4 obsolete attachments)

(Assignee)

Description

9 years ago
The (soon to be) attached patches _nearly_ comprise a port of the 'moo' branch to ARM. Most of the changes are minor syntax conventions, such as removing trailing commas from enumerations and the like.

The only remaining build failure (unless it is masking other problems) is in MonoIC, because GET_DATA_OFFSET and the other related constants are only defined for x86. I'm not entirely sure how the monomorphic inline caches fit into Jaeger Monkey (and what values those constants should have for ARM), so I'll need to familiarize myself with the code before I can sort that out.
(Assignee)

Comment 1

9 years ago
Created attachment 451874 [details] [diff] [review]
1: Fix the #if that surrounds the definition of cacheFlush on ARM Linux (GCC).
(Assignee)

Comment 2

9 years ago
Created attachment 451875 [details] [diff] [review]
2: Remove trailing commas and delete duplicate methods.
(Assignee)

Comment 3

9 years ago
Created attachment 451876 [details] [diff] [review]
3: Remove a reference to js::Vector's 'at' method.

This one was ported from here: http://hg.mozilla.org/users/danderson_mozilla.com/jaegermonkey/rev/b2c7d6513ce6
(Assignee)

Comment 4

9 years ago
Created attachment 451877 [details] [diff] [review]
4: Reintegrate JM.

 * Add missing overloaded variants of some methods used by the new fast-paths.
 * Fixes a few copy-paste errors.
 * Defines JSReturnReg_{Type,Data}. Note that I don't really know the consequences of doing this, so this might break horribly.
Comment on attachment 451877 [details] [diff] [review]
4: Reintegrate JM.

Awesome, thanks for doing this Jacob. If we can get it to build I should be able to port the mono ICs. We emit a few loads/stores with bogus address offsets, then patch them later.
Attachment #451877 - Flags: review+
(Assignee)

Comment 6

9 years ago
Created attachment 452672 [details] [diff] [review]
4: Reintegrate JM.
Attachment #451877 - Attachment is obsolete: true
(Assignee)

Comment 7

9 years ago
Created attachment 452674 [details] [diff] [review]
4: Reintegrate JM.
Attachment #452672 - Attachment is obsolete: true
(Assignee)

Comment 8

9 years ago
Created attachment 452676 [details] [diff] [review]
5: Tweak inline caches to make the build.

Ok, this gets the inline caches to build, but there are two catches:

 * The offsets are all wrong for ARM.

 * I'm not sure that using the offsets like this is a good idea. The inline-caching code makes some assumptions about the size of the instructions that are emitted, and these assumptions will be very fragile, particularly if different build options are specified, for example.

   I don't know exactly how best to do it, but can we have, for instance, a fixed-size literal pool or control structure that is emitted by {Mono,Poly}IC.cpp? This would seem to be more generic, thus easing the port to other architectures, and also more robust and less likely to break when the macro-assembler is modified. I haven't checked that this approach would be feasible, so consider it speculation.
Jacob, would it be easier to dynamically compute the offsets and store them in MICInfo? (Maybe this is what you meant.)
(In reply to comment #9)
> Jacob, would it be easier to dynamically compute the offsets and store them in
> MICInfo? (Maybe this is what you meant.)

I think this would work. In creating the initial PIC implementation, I followed JSC and V8, and I think JSC followed V8, so it's just a copy of their design decision. I don't know why they did that, but I assumed that it was to make the PIC records smaller and require fewer loads to patch a PIC. So far, I haven't really observed any perf changes as I grew or modified that structure, so I think it is OK to record all those offets dynamically. 

I didn't understand "a fixed-size literal pool or control structure that is emitted by {Mono,Poly}IC.cpp", but maybe by this you mean generating these offsets once for each distinct IC stub so that they don't need to be stored in each PIC record?
(Assignee)

Comment 11

9 years ago
This is all new code to me so I'm not sure of all the caveats, but I'm essentially talking about replacing the inline parameters with values that are loaded from some kind of table outside the code. That maps very neatly onto ARM's literal pool model, but might hurt x86.

The advantage of this approach is that the location of these parameters (and thus the values of GET_DATA_OFFSET and the like) are completely separated from the code generated by the macro assembler, so future updates to Nitro won't break the ICs.

Another benefit is that the GET_DATA_OFFSET (etc) values will be the same for all platforms.

Also, because some pointer will inevitably need to be dereferenced in order to to get at data stored in such a way, it might be possible to share these records between PIC records (as you mentioned), but I'm not really sure of the interactions there so I'll have to take your word that this is possible!

----

(In reply to comment #9)
> Jacob, would it be easier to dynamically compute the offsets and store them in
> MICInfo? (Maybe this is what you meant.)

That's not what I meant, but it might work as well; dynamically-computed offsets are less likely to require maintenance than static ones.

----

I've still not looked at the code in detail so I only have a rough idea of what's actually going on here. It's quite possible that I'm talking rubbish, but the #ifdef-wrapped offset blocks make me uncomfortable, as does the dependency on Nitro emitting known code sequences.
> I'm essentially talking about replacing the inline parameters with values 
> that are loaded from some kind of table outside the code. That maps very
> neatly onto ARM's literal pool model, but might hurt x86.

I get it now. I'm always afraid of loads on Intel, but it might be a good idea for ARM given the greater number of assembly variants. And I think I did some tests before reducing the number of loads in the PIC by one and it didn't make much of a difference. So, I am fine with doing this on ARM, and it may be worth a try on Intel at some point.

> but the #ifdef-wrapped offset blocks make me uncomfortable, as does the
> dependency on Nitro emitting known code sequences.

Sure, they're not very appealing. They are how WebKit does it (I can't remember about V8), so they're definitely workable, but from my experience also definitely a maintenance burden.
Latest moo tip from bug 578896 gives --disable-monoic and --disable-polyic configure options, which may be helpful for the ARM port. Note that on x86, passing those options currently exposes a number of bugs that normally only occur on (rare) PIC failure.
Moo builds on ARM!

Due to a issues in the tree in the case that ICs are disabled, the shell segfaults when attempting to execute any script in the JIT, but we get workable binaries now.
(Assignee)

Comment 16

9 years ago
Created attachment 457810 [details] [diff] [review]
Fix branch patching. (Still broken.)

This patch fixes branch patching by forcing masm (and stubcc) to flush the literal pool. This has to happen before any call to size() because the literal pool is constructed outside the instruction stream, and isn't included by 'size()' until it is flushed into the stream. I suspect that this patch is a poor solution, but it works (nearly).

I now see another crash, where the first instruction in stubcc branches to an invalid address. This looks like a slightly different branch patching bug.
(Assignee)

Comment 17

9 years ago
> I now see another crash, where the first instruction in stubcc branches to an
> invalid address. This looks like a slightly different branch patching bug.

In StubCompiler::leave() we link some jumps to a label. The x86 back-end emits a JMP instruction to achieve this. When the code is copied to its destination buffer in finishThisUp, everything continues to work as expected. The JMP instruction is a relative branch (I think), so it doesn't need patching. However, the ARM back-end emits an LDR into the PC to implement these branches, and this needs patching after the copy because it is an absolute branch.

I don't know the masm API well enough to know off the top of my head how to patch these branches, but I'll investigate. (Alternatively, we could force the ARM back-end to emit a relative branch, but that seems potentially awkward.)
(Assignee)

Comment 18

9 years ago
Created attachment 458618 [details] [diff] [review]
5: Fix branch patching.

  * I had to add a method to the assembler which allows Compiler.cpp to force a literal pool flush so that 'size()' returns what it ought to return. This is normally abstracted away by executableCopy, but we can't use that because we have a single contiguous buffer for the output of two assemblers.

  * I had to add an overloaded version of executableCopy that copies to an arbitrary address, and doesn't allocate memory. This allows the ARM back-end to fix up absolute addresses. (For x86, it passes straight through to memcpy.)

  * I tried to achieve the same effect is a tidier fashion by instantiating the LinkBuffers early, as one of the constructors does the executableCopy stuff. Unfortunately, we don't seem to track a MacroAssembler object in JM so this wouldn't work.

I'm not entirely sure that this is the best solution. Suggestions are welcome.
Attachment #452676 - Attachment is obsolete: true
Attachment #457810 - Attachment is obsolete: true
Attachment #458618 - Flags: review?(dvander)
(Assignee)

Comment 19

9 years ago
Oh, yeah:
[628| 43|671] 100% ==================================================>|   93.3s

Most of the failures are the 'closures/' tests. There are a few others, too. I know that the x86 port has failures when ICs are turned off, but I haven't tried it so I don't know if these numbers are similar.
(In reply to comment #19)
> Oh, yeah:
> [628| 43|671] 100% ==================================================>|   93.3s
> 
> Most of the failures are the 'closures/' tests. There are a few others, too. I
> know that the x86 port has failures when ICs are turned off, but I haven't
> tried it so I don't know if these numbers are similar.

I fixed the ICs-off failures yesterday, so things might look better now.
moo tip (d49db5482db6, http://hg.mozilla.org/users/danderson_mozilla.com/moo/rev/d49db5482db6) contains PIC fixes, and produces the following:

[668|  3|671] 100% ==================================================>|   26.0s
FAILURES:
    /home/sean/dev/moo/js/src/trace-test/tests/basic/testBug552248.js
    /home/sean/dev/moo/js/src/trace-test/tests/basic/testCrossGlobalCall.js
    /home/sean/dev/moo/js/src/trace-test/tests/jaeger/bug556525.js
(Assignee)

Comment 22

9 years ago
(In reply to comment #20)
> I fixed the ICs-off failures yesterday, so things might look better now.

I get massive failures now (on trace-tests), but they disappear with "--methodjit-only". My build with GCC 4.4.3 freezes after 33 tests with "--methodjit-only", or 57 tests with both JITs. I have no idea why at the moment, and 4.3.3 doesn't freeze.
Comment on attachment 458618 [details] [diff] [review]
5: Fix branch patching.

Thanks for tackling this, Jacob. If it makes things easier, relative jumps are appealing - but it sounded like it was not actually easier :)

re: test failures. Until JaegerFromTracer is re-ported to ARM, I expect there will be quite a few. On my x86 machine, --methodjit-only with or without ICs currently gives 0 failures.
Attachment #458618 - Flags: review?(dvander) → review+
(Assignee)

Comment 24

9 years ago
(In reply to comment #23)
> Thanks for tackling this, Jacob. If it makes things easier, relative jumps are
> appealing - but it sounded like it was not actually easier :)

Relative jumps are actually smaller and probably also faster if you can guarantee that it'll fit into ARM's ±32MB range for 'B'/'BL'/'BLX'. (This is likely to be the case here.)

Relative jumps can be implemented with a modification to the Assembler interface, without adding too much complexity. (It would require new variants of 'jmp', 'linkJump' and 'relinkJump'.)

> re: test failures. Until JaegerFromTracer is re-ported to ARM, I expect there
> will be quite a few. On my x86 machine, --methodjit-only with or without ICs
> currently gives 0 failures.

Ah, I thought that might be the cause of my problems. I still have some failures in JM-only so I'll mop those up first, perhaps get ICs working, then look at JaegerFromTracer if it still needs porting.

I pushed the latest patch:
5: http://hg.mozilla.org/users/danderson_mozilla.com/moo/rev/a0ff24e18d75

I'll resolve the other failures and ICs in other patches.
Status: ASSIGNED → RESOLVED
Last Resolved: 9 years ago
Resolution: --- → FIXED
(Assignee)

Updated

9 years ago
Blocks: 581956
You need to log in before you can comment on or make changes to this bug.