Ionmonkey: Assembler buffer rewrite follow up work.

RESOLVED FIXED

Status

()

defect
P5
normal
RESOLVED FIXED
5 years ago
2 years ago

People

(Reporter: dougc, Assigned: dougc)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(14 attachments, 1 obsolete attachment)

1.13 KB, patch
mjrosenb
: review+
Details | Diff | Splinter Review
3.15 KB, patch
mjrosenb
: review+
Details | Diff | Splinter Review
1.24 KB, patch
mjrosenb
: review+
Details | Diff | Splinter Review
39.92 KB, patch
Details | Diff | Splinter Review
2.20 KB, patch
mjrosenb
: review+
Details | Diff | Splinter Review
1.62 KB, patch
Details | Diff | Splinter Review
6.71 KB, patch
Details | Diff | Splinter Review
1.83 KB, patch
Details | Diff | Splinter Review
1.63 KB, patch
Details | Diff | Splinter Review
69.14 KB, patch
Details | Diff | Splinter Review
52.33 KB, patch
Details | Diff | Splinter Review
60.14 KB, patch
Details | Diff | Splinter Review
288.02 KB, patch
Details | Diff | Splinter Review
6.35 KB, text/plain
Details
Assignee

Description

5 years ago
The assembler buffer rewrite patch of bug 760642 has been landed on a user repo to continue the work and some of the changes are expected to be attached here for review or feedback.

https://hg.mozilla.org/users/dtc-moz_scieneer.com/asmbuffer

The work is difficult to split up and requires coordination across many areas of code.

A common tree is expected to help with fuzzing.  Errors in the assembler buffer code are often sensitive to changes in the buffer layout and a small seemingly unrelated change might made a bug detected in fuzzing unreproducible.  With the patches under revision control it should be easier to reproduce bugs.

If I am making a mess of organizing this then please let me know?
Assignee

Comment 1

5 years ago
Landed to prevent blocking the fuzzing work, but I presume you will at least want to eye-ball such changes, and should an r+ be sought for such functional changes?

https://hg.mozilla.org/users/dtc-moz_scieneer.com/asmbuffer/rev/b1fd9c194540
Attachment #8376021 - Flags: review?(mrosenberg)
Comment on attachment 8376021 [details] [diff] [review]
Fix an alignment fill calculation

Review of attachment 8376021 [details] [diff] [review]:
-----------------------------------------------------------------

Wow, that was a silly bug.  I can only assume that I forgot the precedence of operators.
Attachment #8376021 - Flags: review?(mrosenberg) → review+
Assignee

Comment 3

5 years ago
These code paths were still using the size() method which is no longer applicable.
Attachment #8376080 - Flags: review?(mrosenberg)
Assignee

Comment 4

5 years ago
Revise to use currentOffset(), and simplify a little.

I note that there are definitions for currentOffset() and nextOffset() and they are identical.  Should nextOffset() return a great offset than currentOffset(), or should one be removed?
Attachment #8376080 - Attachment is obsolete: true
Attachment #8376080 - Flags: review?(mrosenberg)
Attachment #8376197 - Flags: review?(mrosenberg)
Assignee

Comment 5

5 years ago
With the alignment fill fix, it is now possible to have zero function bytes and this is causing assertion failures.  This patch remove these assertions.

If all the backends will be using the new assembler buffer then is it wrong for the support for the new buffer to be conditional on the JS_CODEGEN_ARM?  Would it be correct to also remove these source conditionals here?
Attachment #8376199 - Flags: review?(mrosenberg)
Assignee

Comment 8

5 years ago
There are remaining calls to masm.size() before finalizing the assembler buffer in CodeGenerator.cpp.  The code was added in bug 814489 to 'add byte counts for instruction and spill code to basic block info', see 'struct ScriptCountBlockState'.

With the reworked assembler buffer, positions are not known until the buffer is finalized.

Could this code be reworked to account for the sizes after finalizing the buffer?

It would appear to need to track a huge number of references, and it is not clear how practical that would be.  So how important is it to maintain the functionality added in bug 814489?

I understand that all backends will use the reworked assembler buffer so disabling it just on the ARM may not be an option.
Flags: needinfo?(bhackett1024)
Assignee

Comment 9

5 years ago
(In reply to Douglas Crosher [:dougc] from comment #8)
> There are remaining calls to masm.size() before finalizing the assembler
> buffer in CodeGenerator.cpp.  The code was added in bug 814489 to 'add byte
> counts for instruction and spill code to basic block info', see 'struct
> ScriptCountBlockState'.

Looking at this further, it might be adequate to use the current assembler bufferSize which is how much has been added to the intermediate buffer.  These code paths using size() appear to be used only for debug purposes so it might be useful enough even if not exact.  It will miss count variable sized instruction sequences, alignment fill, constant pools, etc.  Might this is acceptable?
Assignee

Comment 10

5 years ago
The length of the last slice to be copied over in executableCopy() does not account for the init_size() so can be too large, potentially filling past the end of the code buffer.

The calculation in the size() method appears to have a typo leading to a miscalculation of the size when the init_size is not zero.
Attachment #8376665 - Flags: review?(mrosenberg)
Attachment #8376199 - Flags: review?(mrosenberg) → review+
Attachment #8376197 - Flags: review?(mrosenberg) → review+
Comment on attachment 8376665 [details] [diff] [review]
Correct assembler buffer copy length and size computations

Review of attachment 8376665 [details] [diff] [review]:
-----------------------------------------------------------------

These don't look obviously wrong, but we should probably add a comment explaining the math behind this.  Also, some of these additions seem to be repeated a lot.  I had a plan for pulling those out into a member function, but it isn't a high priority.
Attachment #8376665 - Flags: review?(mrosenberg) → review+
Assignee

Updated

5 years ago
Attachment #8377100 - Attachment description: 5. remove the assembler buffer isAligned method with is broken and unused. → 6. remove the assembler buffer isAligned method with is broken and unused.
Depends on: 975110
(In reply to Douglas Crosher [:dougc] from comment #8)
> There are remaining calls to masm.size() before finalizing the assembler
> buffer in CodeGenerator.cpp.  The code was added in bug 814489 to 'add byte
> counts for instruction and spill code to basic block info', see 'struct
> ScriptCountBlockState'.
> 
> With the reworked assembler buffer, positions are not known until the buffer
> is finalized.
> 
> Could this code be reworked to account for the sizes after finalizing the
> buffer?
> 
> It would appear to need to track a huge number of references, and it is not
> clear how practical that would be.  So how important is it to maintain the
> functionality added in bug 814489?
> 
> I understand that all backends will use the reworked assembler buffer so
> disabling it just on the ARM may not be an option.

The number of bytes of instruction and spill code aren't really useful for anything and could be removed.  The CodeGenerator does rely on knowing which assembly instructions are associated with which MIR instructions though for -D output and the JIT inspector, which so far as I know are still used so it would be nice if they didn't break.
Flags: needinfo?(bhackett1024)
Fwiw, the tip of the repo in comment 0 doesn't build for me:

js/src/jit/CodeGenerator.cpp: In member function ‘void js::jit::ScriptCountBlockState::visitInstruction(js::jit::LInstruction*)’:
js/srcjit/CodeGenerator.cpp:3044:27: error: ‘class js::jit::MacroAssembler’ has no member named ‘bufferSize’
Assignee

Comment 17

5 years ago
There would appear to be some opportunity to split the patch:

1. an initial patch adding the Assembler interface changes to support variable sized code regions, plus a minimal simple unoptimized constant pool packing implementation that takes advantage of this new support.

2. follow up work to optimize the constant pool packing.

The constant pool packing code is complex and the review so far does not give a lot of confidence.  The allocEntryBackward code path appears to be broken to the extent that no backward entries are allocated.  The allocEntryForward code path also appears to be broken, failing to accurately account for the pools exceeding the reference limits - some paths account for the PC bias, others do not, and it's not clear if the pool header is accounted for, some paths do not account for the size of pool entries.  The snapshot logic appears to be in need of more development - it does not appear to be useful in allocEntryForward.  There seems to be a lot of complexity here.  I get the gist of some of the optimizations, and it looks like a good plan, but it's very hard to review and test such complex code, and perhaps to get it right.

Since the backward pool entry allocation code is not functional, could we just implement forward entry allocation for a start?

Is the snapshot code necessary for a minimal allocator with only forward allocation?

What else could be simplified?

Perhaps future optimized constant pool packing strategies could be implemented as run time options to allow wider testing - in a similar manner to the range of register allocation strategies.

Would you support working to split the patch up in this manner?
Flags: needinfo?(mrosenberg)
Assignee

Comment 18

5 years ago
Removed based on feedback in comment 15 that this code is not useful or needed.  Since this appears to be the only use for access to the bufferSize() from outside the Assembler, and since bufferSize() is only an estimate of the final position, this interface function has been removed too.

Landed on the asmbuffer user repo:
https://hg.mozilla.org/users/dtc-moz_scieneer.com/asmbuffer/rev/ffcdef7920d9
Assignee

Comment 19

5 years ago
(In reply to Christian Holler (:decoder) from comment #16)
> Fwiw, the tip of the repo in comment 0 doesn't build for me:
> 
> js/src/jit/CodeGenerator.cpp: In member function ‘void
> js::jit::ScriptCountBlockState::visitInstruction(js::jit::LInstruction*)’:
> js/srcjit/CodeGenerator.cpp:3044:27: error: ‘class js::jit::MacroAssembler’
> has no member named ‘bufferSize’

Strange, it has been compiling fine here.  Perhaps this was a non-ARM build.  There is probably not much point doing a lot of testing yet as the code still needs a lot of work.  Anyway, the offending code has been removed based on feedback in comment 15.
Assignee

Comment 20

5 years ago
Removed a call to allocInst() when cleaning up unused variables, sorry.
Assignee

Comment 23

5 years ago
As noted in comment 17, the backwards pool entry allocation code has been removed.  This simplifies the code a lot and makes it far more practical to review, debug, and test.

Focusing on only forward allocation turned up more issues that have been addressed.  In particular the ARM code generation prohibits the placement of pools in some sequences, and this has been extended to take an upper bounds on the size of the region so that a pool can be placed before such code sections if necessary before the code is actually allocated.  This needs a little more development. There mains a concern with large jump tables that needs to be explored.  Allocate of pool entries by code in these sequences need to be explored.

The code now passes all jit-tests on ARMv6 and ARMv7, and has been tested on ARMv6 hardware just to be sure.  Some of the large asm.js demos have been tested without failure.  More testing will be done.

Further work is needed to scrutinise the code.  There are still some potential issues to explore, but they have not shown up in testing.

Some of the 'snapshot' code is retained.  The approach is sound, but this might still be removed if a simpler algorithm is now applicable.

Backwards allocation is also sound, and could improve pool layout, but adds a lot of complexity.  I hope everyone will support moving this to follow up work and help get the current changes in good shape and landed.
Flags: needinfo?(mrosenberg)
Assignee

Comment 24

5 years ago
(In reply to Douglas Crosher [:dougc] from comment #23)
> Created attachment 8386513 [details] [diff] [review]
> 10. Remove the backward constant pool entry alloc code, simplify, debug.

Landed on the asmbuffer user repo:
https://hg.mozilla.org/users/dtc-moz_scieneer.com/asmbuffer/rev/a4eb2951305b
Assignee

Updated

5 years ago
Depends on: 970262
Assignee

Updated

5 years ago
Depends on: 980941
Assignee

Comment 25

5 years ago
This patch is mainly a reworking of the as_BranchPool code paths, plus some other misc code cleanup.

Rework actualIndex() to accept an encoded pool index, and add getEntryOffset(idx) in the pool code to decode the idx and return the offset of the pool entry.

The instruction pointer is no longer passed to actualIndex(). Trying to decode the pool entry from the instruction was not working because the branch can use an immediate offset.

Rewrite the as_BranchPool instruction as an immediate branch if within range in the late stages of the assembler.

This path still has some significant problems.  The pool entry is also needed when binding the repatch label, but there are not many bits available to encode the pool index in the 32 bit pool hint.  Some more work and re-factoring will be needed.

Landed on the asmbuffer user repo:
https://hg.mozilla.org/users/dtc-moz_scieneer.com/asmbuffer/rev/aada15b99a83

There is still a lot of work to do.  Some issues:

* The alignment slice handling is rather broken.  The alignment slices can currently shrink and grow but the code appears to assume these will not grow.  For example a full branch over an alignment slice might be transformed to a short branch in one pass and the alignment slice might grow in a later pass and make the branch destination out of range.

* The regular branches do not currently allocate pool entries.  The code depends on an inline movw/t instruction sequence which is not available on the ARMv6. For the ARMv6 pool entries will need to be allocated for these.

* It might be that the pool packer can be simplified by allocating to just one pool rather than sub-pools and irrespective of the allocation unit size.  This should allow the snapshot code to be removed.  A trivial packing strategy could be used for a start, and a linear resource should be easier to work with for more advanced packing strategies such as an ordered packing.

* The memory usage needs to be explored.  A lot of references are being tracked, and might some different data structure choices help.  It might be possible to progressively discard the intermediate buffer as it is copied to the final code object to reduce the memory usage spike.

* Performance needs to be explored.
Assignee

Updated

5 years ago
Blocks: 919592
Assignee

Updated

5 years ago
Blocks: 996715
Assignee

Updated

5 years ago
Blocks: 988791
Assignee

Updated

5 years ago
Blocks: 1000219
Assignee

Updated

5 years ago
Blocks: 1001052
Assignee

Updated

5 years ago
Blocks: 970262
No longer depends on: 970262
Assignee

Updated

5 years ago
Blocks: 861208
No longer depends on: 861208
Assignee

Updated

5 years ago
Blocks: 944972
Assignee

Comment 26

5 years ago
Conference call notes:

* Goals

The ARM backend needs to be reliable and secure. It needs to survive the fuzzing, and it needs to be maintainable. It needs to handle large code objects. A reliable and secure base is needed to build on.

The rewrite makes a significant change to the interface between the assembler and the larger compiler. References into the instruction buffer are no longer an offset but rather an index into a vector of references that are not finalized until assembly is complete. This supports some code resizing, patching of instructions such as branch sequences to use shorter encodings when possible. It is a goal to add this support even if other features are simplified so as to allow further improvements to the ARM asm buffer independent of the other platforms.

Establish Benchmarks to work toward
- Performance spent in Assembly Buffer
- Performance of generated code

* Next steps: Feed back on what Douglas has worked out so far

    ACTION: Douglas to finish some local work, then post a roll-up patch and ping for jandem for feedback.

    ACTION: Douglas to get some initial benchmark figures next week.

* Douglas' assembler buffer plans:
    * WIP with single pool
    * Performance is a concern; we should measure baseline performance.  Two issues: the compile-time performance, and the runtime performance. The asm buffer changes track references into the buffer and this might impact memory usage and performance, and to handle large code objects might require expanded storage. The simplified patch does not do such a good job packing the pools and this might degrade performance - it only allocates to a forward pool, does not align double-float constants, and only handles a pools of size 1024 whereas floats could use pools of size 4096 bytes.
    * Benchmarks:
        * Unity compile time (as reported in the JS console)
        * AWFY 'loadtime' tests
        * Massive: http://kripken.github.io/Massive/
        * Floating point tests
        * Memory increases?
Assignee

Comment 27

5 years ago
This is a WIP patch to simplify the pool allocation. It allocates to only a forward pool, to only a single pool, it assumes a maximum pool size of 1024 bytes, and it only aligns constants to 32 bits words. This is not optimal: the instructions can encode negative offsets, and the float instructions can encode an offset up to 4096 bytes, and it might help performance to align 64 bit floats to a 64 bits. The goal is to have a simple base allocation algorithm that can be scrutinized for correctness and that gets good test coverage.

The code passes the jit-test in the simulator for ARMv7 and ARMv6 code generation.

Landed on the asmbuffer user repo:
https://hg.mozilla.org/users/dtc-moz_scieneer.com/asmbuffer/rev/aca823916722
Let us know when/if this needs fuzzing on the ARM simulator and/or ARM boards :)
Assignee

Comment 29

5 years ago
Here's the combined patch against m-c tip. It's a wip with many known issues and feedback is not currently required, although any feedback would be welcomed.
Attachment #8421758 - Flags: feedback?(jdemooij)
Assignee

Comment 30

5 years ago
Benchmark comparison of the asm-buffer branch versus a standard m-c build, both optimized non-debug builds. Tested on a Nexus 4. There are results for the asmjs-apps, Kraken, and Sunspider benchmarks. Got a test failure in the Octane benchmark that need to be explored and might be a bug in the code. The results show some winners and losers but overall there does not appear to be any significant performance difference.

So I'm not worried by the grossly simplified pool allocation strategy. It might still be necessary to ask for some minor performance losses to be accepted in the short term, but it does not look like a big issue.

Still to explore the memory usage, but suggest deferring this to address some of the bugs in the code and try and get it into shape for fuzzing.
Comment on attachment 8421758 [details] [diff] [review]
Combined patch.

Review of attachment 8421758 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for posting the patch! Good to see a lot of code being removed. As you said, you're still working on it so I won't post more detailed feedback until it's more stable.
Attachment #8421758 - Flags: feedback?(jdemooij)
Assignee

Updated

5 years ago
Depends on: 1017405
Douglas, what's left here?
Flags: needinfo?(dtc-moz)
Priority: -- → P5
Assignee

Comment 33

3 years ago
Surely this all landed - must have been moved to another issue.
Flags: needinfo?(dtc-moz)

Comment 34

2 years ago
(In reply to Douglas Crosher [:dougc] from comment #33)
> Surely this all landed - must have been moved to another issue.

Closing. Would be nice to have a reference to the bugs if you can provide them!
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.