Closed
Bug 433793
Opened 17 years ago
Closed 16 years ago
Port TT to x64
Categories
(Tamarin Graveyard :: Tracing Virtual Machine, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: edwsmith, Unassigned)
Details
Attachments
(2 files, 11 obsolete files)
448 bytes,
patch
|
Details | Diff | Splinter Review | |
18.33 KB,
patch
|
edwsmith
:
review+
|
Details | Diff | Splinter Review |
big job, capturing some thoughts on major areas to work. this bug can serve as a clearing house for dependent bugs.
starting points...
1. should Atom go away entirely on x64?
2. can we stuff 64bit pointers into Box?
Reporter | ||
Comment 1•17 years ago
|
||
pointers up to 48 bits could fit in Box. however should we limit to 32 bit pointers to keep odd-sized values away from the cpu?
Comment 2•17 years ago
|
||
IMHO, would be nice if MMgc could fit all allocations into 48 bits. Is it reasonable to assume that we can map all our allocations into the lower 48 bits of logical address space?
Comment 3•17 years ago
|
||
At which virtual addresses does the stack live on various 64-bit operating systems? (I'm guessing: above the 48-bit range in some cases.) Will a Box ever need to point to stack-allocated data? Consider escape analyses that allocate data known to have stack discipline on the CPU stack. Or consider call-ins from C that allocate data there and pass pointers to them.
Reporter | ||
Comment 4•17 years ago
|
||
idea 3: abandon Box, use Atom64, and stuff doubles into Atom64 by bit banging and allocating out-of-range doubles on the heap. 3a: shrink the exponent field 11->8 bits. 3b: shrink the mantissa field 52->49 bits. either gives you a 61 bit float that captures most values, the rest overflow to heap like Atom does now.
Reporter | ||
Comment 5•17 years ago
|
||
idea 3 has been done before, too
http://www.canonware.com/~ttt/2007/07/tagged-unboxed-floating-point-numbers.html
just peeking in here -- it was suggested to me that this might be a good
project to take on for the summer, so I've started trying to tackle it. code generation shouldn't be too difficult. I've done a preliminary sketch of a NativeAMD64.h header but I'm holding off on the implementation until the interpreter compiles+works.
So, I'm trying to get everything to compile without warnings (at the moment I'm adding new forthwords for boxing objects to intptr_t). At least in the interim I'll modify Box to mask+sign extend 48-bit pointers.
I'd like to comment more on the Box/Atom discussion but I don't think I should since I'm not too familiar with MMgc yet. My personal feeling is that restricting pointer sizes to Box's limitations could hurt portability to other platforms. Even if it's in the distant future, someone will have >48-bit addresses someday. Changing the structures (edwsmith's suggestion) is my preference so far.
Comment 7•17 years ago
|
||
(In reply to comment #4)
> idea 3: abandon Box, use Atom64, and stuff doubles into Atom64 by bit banging
> and allocating out-of-range doubles on the heap. 3a: shrink the exponent field
> 11->8 bits. 3b: shrink the mantissa field 52->49 bits. either gives you a 61
> bit float that captures most values, the rest overflow to heap like Atom does
> now.
3b won't work well if the intention is to preserve full 64-bit float precision, because many common values will have non-zero least significant bits, thus commonly forcing boxing. 3a, on the other hand, works for everything but the most extreme representable values, which are mainly useful for scientific computing purposes anyway.
Updated•17 years ago
|
Assignee: nobody → danderson
Just to clarify, sorry for not saying this earlier -- for now I've tabled porting the interpreter because the tracer seems more important for the TraceMonkey effort. I'm writing programs in straight LIR and fixing all of the code generation issues that come up, which is pretty straightforward.
I left the interpreter with hard feelings toward the Box type :) There has to be wrappers to box and unbox pointers properly, maintaining the x64 requirements of bits 63-48 on addresses. It was fairly easy to change the Forth scripts and generated C++ code to use wrappers with Boxes, but there's many cases in the AVM classes that need to be changed as well. For example, I think avmplusList was passing &box->obj to GC write barriers.
My personal feeling is that the solution proposed by Edwin Smith and Jason Evans is best; replacing Box's internal representation with a double that steals 3 bits from the exponent. No matter what though, Box and lots of code that uses it seems to need refactoring, so such a change could probably be transparent once hidden as an implementation detail.
Once I finish the code generation changes I'll post a patch and then unassign myself (unless someone wants to unassign me sooner).
Small note on bits 63-48 of addresses: they need to be sign extended from the 48th bit. But from what I can tell, Linux and OS X only use 47 bits, and Windows only uses 44 bits, so bit 47 will always be 0. Stack addresses occur at the top (but within) those ranges. I haven't confirmed this about Solaris yet, but "for now," it suffices to not sign extend and to just clear the top bits.
This fixes the fact that, in 64-bit mode, StringBug fails to pass the GC's debug check that an object is 0'd.
The reason the original code didn't work is because RCObject has extra bytes for alignment on 64-bit mode. But when StringBuf inherits from RCObject, the alignment goes away, and sizeof(RCObject) == sizeof(StringBuf).
Reporter | ||
Comment 10•17 years ago
|
||
Attachment #326356 -
Flags: review?
Reporter | ||
Updated•17 years ago
|
Attachment #326356 -
Flags: review? → review?(danderson)
Edwin -- looks like the attachment got messed up :)
Reporter | ||
Comment 12•17 years ago
|
||
take two
Attachment #326356 -
Attachment is obsolete: true
Attachment #326381 -
Flags: review?(danderson)
Attachment #326356 -
Flags: review?(danderson)
Updated•17 years ago
|
Attachment #326381 -
Flags: review?(danderson) → review+
Attachment #326522 -
Flags: review?(edwsmith)
Reporter | ||
Comment 14•17 years ago
|
||
should this patch be on bug 440478? also, the previous patch was removing "nanojit::", this patch is adding it. which one is right?
well, this is the start of my x64-portability patches but I can attach them there if you'd like. I think the namespace has to be explicit, GCC was complaining otherwise.
Reporter | ||
Comment 16•17 years ago
|
||
no prob, if this is for x64, then cool. but maybe sync up with Andreas since it seems like a tug-o-war that should be resolved
Comment 17•17 years ago
|
||
I thought my patch added nanojit:: ? I don't use nanojit as default address space.
There will be some overlap in SSE code between i386 and amd64. It's only a small amount though so unless it really benefits, I'll try and abstract it further later.
Please let me know if I did things wrong or against style. To get around the new abstraction I moved some local variables in Assembler::gen() to be members of the object. It didn't look like re-entrancy was an issue but I made it safe anyway.
Attachment #326583 -
Flags: review?(edwsmith)
Reporter | ||
Comment 19•17 years ago
|
||
Comment on attachment 326381 [details] [diff] [review]
add LIR_feq, flt, etc, and LIR_stq, stqi
pushed, marking this patch done.
Attachment #326381 -
Attachment is obsolete: true
Reporter | ||
Comment 20•17 years ago
|
||
can insImmPtr do if (sizeof(void*)==8) ... ? one less ifdef, compiler ought to optimize out the constant branch, no?
Reporter | ||
Comment 21•17 years ago
|
||
Comment on attachment 326522 [details] [diff] [review]
adds helper function for pointers, fixes namespace assumptions
included in patch in bug 440748
Attachment #326522 -
Attachment is obsolete: true
Attachment #326522 -
Flags: review?(edwsmith)
Reporter | ||
Comment 22•17 years ago
|
||
Comment on attachment 326583 [details] [diff] [review]
moves fpu-specific code into nativei386
overall, the mods that factor code into 386 specific functions looks right.
gen() is not reentrant, so curCall can be removed, okay to reference member vars directly.
for simple cases where a helper just needs extra params, go ahead and pass them as args.
if asm_fop can always get the opcode from the ins being passed in, we dont need redundant args. conversely if sometimes we call it with a specific opcode, how about passing the opcode+operands instead of an only-partially-used instruction
LMK if you are unable to test on thumb (vs2008 with ppc emulators is enough to test).
Attachment #326583 -
Flags: review?(edwsmith) → review-
Reporter | ||
Comment 23•17 years ago
|
||
(In reply to comment #21)
> (From update of attachment 326522 [details] [diff] [review])
> included in patch in bug 440748
>
that should be bug 440478
take two
Attachment #326583 -
Attachment is obsolete: true
Updated•17 years ago
|
Attachment #326727 -
Attachment is patch: true
Attachment #326727 -
Attachment mime type: application/octet-stream → text/plain
Updated•17 years ago
|
Attachment #326727 -
Flags: review?(edwsmith)
Nope I can't test on Thumb at the moment -- I've only got Visual Studio Express. I'll try to get my hands on a copy of professional.
Attachment #326971 -
Flags: review?(edwsmith)
Reporter | ||
Comment 27•17 years ago
|
||
Comment on attachment 326971 [details] [diff] [review]
fixes ptr to int32 casts
in Assembler for LIR_loop (near line 1186) LDi emits a 32bit immediate, so if sizeof(intptr_t) != 4, we need to use a different instruction.
this patch causes gcc on osx to complain about the %d in the verbose output for LDi, but changing it to %p won't be right for other places it is used.
Attachment #326971 -
Flags: review?(edwsmith) → review-
Comment on attachment 326971 [details] [diff] [review]
fixes ptr to int32 casts
will resubmit when amd64 stuff is ready
Attachment #326971 -
Attachment is obsolete: true
Reporter | ||
Updated•17 years ago
|
Attachment #326727 -
Flags: review?(edwsmith) → review+
Reporter | ||
Comment 29•17 years ago
|
||
Comment on attachment 326727 [details] [diff] [review]
moves fpu-specific code into nativei386
pushed this patch http://hg.mozilla.org/tamarin-tracing/index.cgi/rev/601e5a41ce1e
Updated•17 years ago
|
Attachment #326727 -
Attachment is obsolete: true
I don't mind whether it lands early or not, but before it's production there are a few personal nitpicks outstanding:
-On Linux it's likely CALL/imm32 will be out of range and I'd like to solve that. Since RAX gets evicted anyway I was thinking of just moving the value in, but perhaps that'll hurt performance.
-It assumes JMP/imm32 will be in range but that's a lot more likely.
-If we ever move beyond 6-8 parameters I'll need to redo the calling code, which perhaps should be moved into processor specific files anyway.
Attachment #328006 -
Flags: review?(edwsmith)
Updated•17 years ago
|
Attachment #328006 -
Flags: review?(rwinchel)
Note: I renamed SSE macros to have an SSE_ prefix as I went along, to catch unused macros and to make the code a bit cleaner to differentiate.
Comment 32•17 years ago
|
||
AFAIK we never go above 5 parameters now; let's cap it at that as a hard limit.
It might be worthwhile to see if we can prune the max down to 4, since IIRC that's the max for register-based calling convention on ARM.
On Win64 you only get four registers for parameter passing so that'd be a nice limit there too.
Comment 34•17 years ago
|
||
I feel strongly about having direct calls vs RAX-indirect. If we rework the page code we can just bulk-allocate a large coherent code cache on 64-bit machines. For call-outs to libc or the VM we might need thunks.
Yeah, the other option for out-of-bounds addresses is squirreling the address at the end of the page and using it as a thunk or memory load. Whether that's better than the register load or not, I don't know (I could make an argument for either and both arguments are probably wrong).
Reporter | ||
Comment 36•17 years ago
|
||
Comment on attachment 328006 [details] [diff] [review]
amd64 code generation header + necessary changes
Couple small suggestions on the latest patch (optional)
* factor the sse2 and cmov checks so AMD64 makes them compile-time-true and doesn't compile in code for the non-sse or cmov checks
* define STACK_GRANULARITY in one place in Assembler.h as sizeof(intptr_t) or sizeof(void*)
* :XXX: dvander in asm_mmq()
Attachment #328006 -
Flags: review?(edwsmith) → review+
Updated•17 years ago
|
Attachment #328006 -
Flags: review?(rwinchel) → review+
Hi Mason! Andreas told me to run this patch by you to verify it doesn't break the build.
Attachment #328006 -
Attachment is obsolete: true
Attachment #329756 -
Flags: review?(mason.chang)
Updated•17 years ago
|
Attachment #329756 -
Attachment is obsolete: true
Attachment #329756 -
Flags: review?(mason.chang)
Attachment #330070 -
Flags: review?(mason.chang)
Updated•17 years ago
|
Attachment #330070 -
Attachment is obsolete: true
Attachment #330070 -
Flags: review?(mason.chang)
Updated•17 years ago
|
Assignee: danderson → nobody
nanojit amd64 port pushed as changeset 516:d417f70ac382
Fixes some code generation bugs on AMD64. Tracemonkey runs with this now. We needed three new opcodes so I'm flagging this for review.
Note: I'm hoping we don't need more as I'm running out of opcodes. I'd still *really* like to see a separation like LIR_DBL versus LIR_I64, cases like asm_load64 get nasty as it can reserve the wrong register type.
Attachment #331204 -
Flags: review?(edwsmith)
TraceMonkey was breaking on md5+AMD64, works now. There were a few silly mistakes and I removed some debug code.
Attachment #331204 -
Attachment is obsolete: true
Attachment #331318 -
Flags: review?(edwsmith)
Attachment #331204 -
Flags: review?(edwsmith)
Reporter | ||
Comment 42•17 years ago
|
||
Comment on attachment 331318 [details] [diff] [review]
fixes a bunch of bugs, adds three new opcodes tracemonkey needed
in the places we have ifdef AMD64, is it feasable to factor that into NativeXXX.h/cpp?
that leaves the generic 64bit support. its okay to leave that in Assembler.cpp until we find a cleaner way to factor that out too.
Sure, I'll post another patch soon.
Attachment #331318 -
Attachment is obsolete: true
Attachment #332228 -
Flags: review?(edwsmith)
Attachment #331318 -
Flags: review?(edwsmith)
Reporter | ||
Comment 45•16 years ago
|
||
Comment on attachment 332228 [details] [diff] [review]
fixes a bunch of bugs, adds three new opcodes tracemonkey needed
postmortem review+, this all has landed in tm and tamarin-redux.
Attachment #332228 -
Flags: review?(edwsmith) → review+
Reporter | ||
Updated•16 years ago
|
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•