integrate final Thumb2 changes

ASSIGNED
Assigned to

Status

Tamarin
Virtual Machine
ASSIGNED
6 years ago
6 years ago

People

(Reporter: Tom Rodriguez, Assigned: Tom Rodriguez)

Tracking

Details

Attachments

(3 attachments, 3 obsolete attachments)

(Assignee)

Description

6 years ago
Created attachment 643017 [details] [diff] [review]
thumb2 support

the final thumb2 changes including more support for 16 bit instructions, an alternate constant pooling mechanism, and other various cleanups.
(Assignee)

Updated

6 years ago
Attachment #643017 - Attachment is patch: true
(Assignee)

Updated

6 years ago
Attachment #643017 - Flags: review?(wmaddox)
(Assignee)

Updated

6 years ago
Assignee: nobody → throdrig
Status: NEW → ASSIGNED

Comment 1

6 years ago
(In reply to Tom Rodriguez from comment #0)
> Created attachment 643017 [details] [diff] [review] thumb2 support

The diff is in the wrong format for splinter to correctly determine the filenames.
I believe P4 can be told to create a diff in the standard form, or I can provide you with a script that fixes up the default P4 diff format.

I've looked at most of the code, but have not yet examined the constant pooling
code.  Here are some comments so far:

nanojit/NativeThumb2.cpp

- asm_arg_float() needs some commentary, particularly regarding the usage of start_mask.  A few sentences describing the algorithm at a high level would also be appropriate.

- Bcc_insn() should be static inline like B_short_insn().  It then need not be declared in the NativeThumb2.h.

- In Assembler::nPatchBranch():
    // Convert LDR PC into conditional branch
                           ^^^^^^^^^^^ should be 'unconditional'
test/performance/runtests.py

- It seems a bit ugly to attempt the import of scipy.stats for each test result,
though I imagine the actual performance impact is minimal.

Comment 2

6 years ago
Tom: I've only been looking at your contributions, not the portions of this patch that I actually wrote.  You should review my code as well.
(Assignee)

Comment 3

6 years ago
Created attachment 643591 [details] [diff] [review]
updated thumb2 patch

changed Bcc_insn to static inline, added comments to asm_arg_float, fixed windows compilation problem.  attempted to fix patch format
Attachment #643017 - Attachment is obsolete: true
Attachment #643017 - Flags: review?(wmaddox)
(Assignee)

Comment 4

6 years ago
Created attachment 643604 [details] [diff] [review]
new thumb2 patch

one more try on the patch formatting.  The patch command likes it fine.
Attachment #643591 - Attachment is obsolete: true
(Assignee)

Comment 5

6 years ago
Created attachment 643608 [details] [diff] [review]
full thumb2 patch properly formatted for bugzilla

Added asm_arg_float comments to NativeARM.cpp too.
(Assignee)

Updated

6 years ago
Attachment #643604 - Attachment is obsolete: true

Comment 6

6 years ago
Continuing comments on the first patch:

- Assembler::flushConstants():

I find this code difficult to follow.  Some explanation would have helped, but since I knew basically what you were trying to achieve, I could puzzle it out.  It does seem that the logic is complicated by treating the reservation of a block of space for the pool and filling in the pool as separate activities.  I suspect that the index fiddling (_dataBytes, didAlignment, ptr) could be made simpler and/or more obvious by laying down the contents of storage one element at a time, decrementing _nIns for each, following the idiom we use elsewhere.  The purpose of the following line is obscure, even though commented:
+        _dataBytes -= 4;   // Remove extra branch and pad space
I assume this is the space added in Assembler::asm_immd() here:
+                // Reserve extra space for branch and pad
+                _dataBytes = 4;
It looks like this is added so that the flush in underrunProtect() occurs early
enough, but it could all use some elaboration.

- One of the main objectives of the new constant pooling scheme relative to the
old scheme used for integers in NativeARM.cpp is the elimination of the artificially small block size limit, but it apparently remains (CodeAlloc.cpp):
     // ARM requires single-page allocations, due to the constant pool that
     // lives on each page that must be reachable by a 4KB pc-relative load.
     static const int pagesPerAlloc = 1; // 4KB

- We should opportunistically flush constants following an unconditional branch.

- In underrunProtect(), this line certainly requires some explanation:
     pointerDiff(_firstConstInsn, _nIns) > (1024 - _dataBytes - 12 - 6 - bytes)) {
What is the significance of the magic numbers?

Comment 7

6 years ago
Continuing:

- Nit: When all data members are public, it's typical in Tamarin code to use 'struct' in preference to 'class', even if a constructor is provided. (class ThumbPatch)

- HashMap _constPatches is not actually used as a map at all, but as an array of pairs, and could be replaced by such.

- Assembler::encThumb2Double():
I have not carefully examined this yet, but it looks rather complicated.  We probably need some white-box tests for double literals.

- Assembler::VMOVi():  We appear to be using the A8.8.339 T2 encoding for VFPv3 and VFPv4.  Note that we already assume NEON (Advanced SIMD) for float4, so a NEON requirement is in our future.  Should probably comment on this.
(Assignee)

Comment 8

6 years ago
(In reply to William Maddox from comment #6)
> Continuing comments on the first patch:
> 
> - Assembler::flushConstants():
> 
> I find this code difficult to follow.  Some explanation would have helped,
> but since I knew basically what you were trying to achieve, I could puzzle
> it out.  It does seem that the logic is complicated by treating the
> reservation of a block of space for the pool and filling in the pool as
> separate activities.  I suspect that the index fiddling (_dataBytes,
> didAlignment, ptr) could be made simpler and/or more obvious by laying down
> the contents of storage one element at a time, decrementing _nIns for each,
> following the idiom we use elsewhere.

You can't lay it down ahead of time because you don't know where it will go.  It ends up right after the last instruction.  I had originally laid it out at the end up the allocation then copied it into place but that overly complicated and the data is all there to fill in table.  I could compute the offset on the fly as well though I'm not sure that matters.  I added a few more comments to hopefully make it clearer.

>  The purpose of the following line is
> obscure, even though commented:
> +        _dataBytes -= 4;   // Remove extra branch and pad space
> I assume this is the space added in Assembler::asm_immd() here:
> +                // Reserve extra space for branch and pad
> +                _dataBytes = 4;
> It looks like this is added so that the flush in underrunProtect() occurs
> early
> enough, but it could all use some elaboration.

Yes, it's so this test in underrunProtect:

    if (pc - bytes - _dataBytes < top) {

doesn't need extra logic.  If you have suggestions of how to comment it better let me know.

> 
> - One of the main objectives of the new constant pooling scheme relative to
> the
> old scheme used for integers in NativeARM.cpp is the elimination of the
> artificially small block size limit, but it apparently remains
> (CodeAlloc.cpp):
>      // ARM requires single-page allocations, due to the constant pool that
>      // lives on each page that must be reachable by a 4KB pc-relative load.
>      static const int pagesPerAlloc = 1; // 4KB

Yes, I missed that because I assumed the NJ_MAX_CPOOL_OFFSET passed into codeAlloc was responsible for controlling the allocation size.  I believe that forcing pagesPerAlloc to be 4k on ARM is unnecessary as well, since it's the codeAlloc calls that actually enforces the reach.  Anyway, I've removed THUMB2 from that ifdef

> 
> - We should opportunistically flush constants following an unconditional
> branch.

I added that, which required changing the arguments to flushConstants.  I'm using bools but maybe it should be an enum?

> 
> - In underrunProtect(), this line certainly requires some explanation:
>      pointerDiff(_firstConstInsn, _nIns) > (1024 - _dataBytes - 12 - 6 -
> bytes)) {
> What is the significance of the magic numbers?

That was leftovers.  During the original testing I was hitting bugs that I originally though were logic problems with this computation and I was tweaking the values down.  I eventually found the real bug but never went back and correct the values here.  It should be 1020 - _dataBytes - bytes.  I added some comments explaining it better.
(Assignee)

Comment 9

6 years ago
(In reply to William Maddox from comment #7)
> Continuing:
> 
> - Nit: When all data members are public, it's typical in Tamarin code to use
> 'struct' in preference to 'class', even if a constructor is provided. (class
> ThumbPatch)

Fixed.

> 
> - HashMap _constPatches is not actually used as a map at all, but as an
> array of pairs, and could be replaced by such.

Originally I found Seq class pretty bizarre so I ignored it and used HashMap.  I've changed the code to use a SeqBuilder.

> 
> - Assembler::encThumb2Double():
> I have not carefully examined this yet, but it looks rather complicated.  We
> probably need some white-box tests for double literals.

I tested it in a little C program against the GNU assembler when I originally wrote it.  I'll try to add some tests to exercise it more directly.

> 
> - Assembler::VMOVi():  We appear to be using the A8.8.339 T2 encoding for
> VFPv3 and VFPv4.  Note that we already assume NEON (Advanced SIMD) for
> float4, so a NEON requirement is in our future.  Should probably comment on
> this.

What do you want done here?
(Assignee)

Comment 10

6 years ago
(In reply to William Maddox from comment #1)
> (In reply to Tom Rodriguez from comment #0)
> > Created attachment 643017 [details] [diff] [review] thumb2 support
> 
> The diff is in the wrong format for splinter to correctly determine the
> filenames.
> I believe P4 can be told to create a diff in the standard form, or I can
> provide you with a script that fixes up the default P4 diff format.

bugzilla seems very picky about patch format.  It seems to want

diff f1 f2
--- f1
+++ f2

if the diff part is missing it can't seem to separate them.

> 
> 
> - It seems a bit ugly to attempt the import of scipy.stats for each test
> result,
> though I imagine the actual performance impact is minimal.

This is a bug ugly.  I can turn it into a real flag option instead of burying it.
(Assignee)

Comment 11

6 years ago
Created attachment 644022 [details] [diff] [review]
minor fixes on top of previous patch
(Assignee)

Comment 12

6 years ago
Created attachment 644490 [details] [diff] [review]
change the ttest into a real option in runtests.py

This patch adds a --ttest option along with appropriate imports.

Comment 13

6 years ago
Comment on attachment 644490 [details] [diff] [review]
change the ttest into a real option in runtests.py

How is this change related to the thumb2 work? There should be a separate bug to update to using scipy
Attachment #644490 - Flags: feedback-
(Assignee)

Comment 14

6 years ago
(In reply to Brent Baker from comment #13)
> Comment on attachment 644490 [details] [diff] [review]
> change the ttest into a real option in runtests.py
> 
> How is this change related to the thumb2 work? There should be a separate
> bug to update to using scipy

You're right.  I'll separate that out.  I'll also separate out the fix in core/MethodInfo.cpp which is related to the parsing of MethodRecognizer's.
You need to log in before you can comment on or make changes to this bug.