Closed Bug 697127 Opened 13 years ago Closed 13 years ago

win32 jit will intermittently produce incorrect results for float4

Categories

(Tamarin Graveyard :: Baseline JIT (CodegenLIR), defect)

x86
Windows 7
defect
Not set
normal

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: brbaker, Assigned: virgilp)

References

Details

Attachments

(1 file)

I have no idea what the root cause is here, but if you run this code with a win32 release shell forcing the jit to kick in you have about a 20% chance of this returning incorrect results!

var f4a:float4 = float4(1f);
var f4b:float4 = float4(2f);

print(float4.isEqual(f4a, f4b)); // this should return '0,0,0,0'


Testmedia: as3/Types/Float4/flt4_4_6_5_1.as
Flags: in-testsuite+
Flags: flashplayer-triage+
Flags: flashplayer-qrb+
Flags: flashplayer-bug-
Maybe the title is a little too broad, I have only seen this issue with the above code, but I have not been doing day-day work on windows.
Ok I am also seeing an intermittent failure in as3/Types/Float4/flt4_4_6_5_1.as

  as3/Types/Float4/flt4_4_6_5_1.abc : float4Array.every() sum up = 8,8,8,8 FAILED! expected: 6,6,6,6
  as3/Types/Float4/flt4_4_6_5_1.abc : float4Array.forEach() sum up = 8,8,8,8 FAILED! expected: 6,6,6,6
  as3/Types/Float4/flt4_4_6_5_1.abc : float4Array.some() sum up = 8,8,8,8 FAILED! expected: 6,6,6,6
  as3/Types/Float4/flt4_4_6_5_1.abc : float4Array.join() = 0,0,0,0|2,2,2,2|2,2,2,2|2,2,2,2 FAILED! expected: 0,0,0,0|1,1,1,1|2,2,2,2|3,3,3,3
  as3/Types/Float4/flt4_4_6_5_1.abc : float4Array.indexOf(float4(2f)) = 1 FAILED! expected: 2
  as3/Types/Float4/flt4_4_6_5_1.abc : float4Array.lastIndexOf(float4(2f)) = 4 FAILED! expected: 3
On Virgil's suggestion I also checked on a debug build to see if there was an assertion being fired. I was able to repo the issue with a debug build but no assertion was fired.
Seems to come from the bibopFloat4Allocator - it returns unaligned values on x32 in Debugger builds, causing issues at runtime.
Easiest to reproduce - run with -Dverbose, you're likely to get a crash during the parsing of the float4 pool.
Specifically the objects are not 16-byte aligned, and there's no reliable way to get the C++ compilers (not the jit) to use load-unaligned SSE instructions.

In general MMgc has not been guaranteeing anything but 8-byte alignment.

In practice in Release and Release-Debugger builds, if the object size is divisible by 2^k for k >= 3 then GCAlloc delivers 2^k-byte alignment due to how the block is carved: the block is carved into objects starting from the end of the block and going toward lower addresses, until we run out of space, as opposed to starting at an 8-byte boundary after the header and going forward until we run out of space.  This is true for both 32-bit and 64-bit builds.

In Debug and Debug-Debugger builds we add space for debugging information at the beginning and end of the object: notably we add 8 bytes at the beginning and so the pointer returned from the GC to the client is 8-byte aligned always.

I'm worried about some of the information in this bug report because it conflicts with my view that alignment should be a problem only in Debug builds: Brent reports failures in Release shells, Virgil talks about Debugger builds.  I'm pretty sure my mental model of GC and GCAlloc is correct so there may be other things going on too.

I see two ways of dealing with this.  One is to make DebugSize 16 bytes (half of which is unused); this wastes space but only in Debug builds and that may be good enough.  The other is to make DebugSize a variable and store it in the block and/or the alloc - both block and alloc are easily available given an object without needing to know DebugSize, both in FixedAlloc and GCAlloc, by masking off the low bits.  There would be added cost, but again only in Debug builds - a small space cost and a slightly larger time cost to read and add/subtract the value, instead of a constant.  For a quick experiment it might be sufficient to make DebugSize 16 bytes and make some adjustments throughout the code to deal with the ramifications, then we can discuss the ideal solution later.
Ah, additionally the object sizes in Debug builds will need to be divisible by 16 to guarantee alignment of course.

On 32-bit builds the DebugSize has been 16 and float4 objects have been 32 bytes.  With DebugSize=24 then the object size is 40 bytes which is not divisible by 16; the objects will thus have to be 48 bytes (four times larger than the payload - I think the "ideal solution" is something else).

On 64-bit builds the DebugSize has been 24 but will now be 32; 32+16=48 bytes again.

(It still remains to be investigated why there would be a problem in Release builds.)
This is useful:

     template<>
     REALLY_INLINE void*
     GC::AllocBibopType<avmplus::AtomConstants::kBibopFloat4Type>()
     {
-        return AllocBibop(bibopAllocFloat4);
+        void* p = AllocBibop(bibopAllocFloat4);
+        if (uintptr_t(p) & 15)
+        {
+            printf("Bad alignment of float4\n");
+            GCHeap::GetGCHeap()->Abort();
+        }
+        return p;
     }
 
As expected it never fires in Release or Release-Debugger builds (MacOS X), but fires in Debug and Debug-Debugger builds.
I can repro the issue with -Ojit on both Mac and Windows 7 in Release-Debugger builds but as the previous remark makes clear, in non-Debug builds is not an object alignment issue, it is something else.  In fact with -Dinterp the tests pass as expected, and the interpreter boxes everything and should hit the allocator the hardest.
Hmmm.. I'll look into it some more, seems to be two different issues then.
The patch is a stopgap solution to the alignment issue in Debug builds: it inserts four words of metadata at the beginning of an object instead of two.  When the allocation size is then rounded up to 16 bytes we're ensured that the user data for a float4 box is aligned at a 16-byte boundary.  There's a space cost to this, which I think we should ignore for now - we can afford to use this patch while we think about other solutions.

Suggest you incorporate this into the patch queue to see how it works for you (it passes testing here).
Comment on attachment 569618 [details] [diff] [review]
Align float4 payload on 16-byte boundary in Debug builds

Is this patch only expected to fix "debug" builds or should it also be a stopgap for release builds? I am only wondering because I am seeing a lot of random failures in windows32 when the JIT is enabled. When running the float4 tests. I will hit between 10 -> 500 failures depending on how my machine feels.

This patch has not made its way into the patch queue yet.
(In reply to Brent Baker from comment #11)
> Is this patch only expected to fix "debug" builds or should it also be a
> stopgap for release builds? I am only wondering because I am seeing a lot of
> random failures in windows32 when the JIT is enabled. When running the
> float4 tests. I will hit between 10 -> 500 failures depending on how my
> machine feels.
> 

To answer my own question, no this does not help the random failures I see when running a release shell with -Ojit against the as3/Types/Float4 tests
Fixed into changeset 298:1b58f0aa83fc (which also imports Lars' patch)
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
I'll create a separate bug to track the padding issue and the proposed fix.
(In reply to Lars T Hansen from comment #14)
> I'll create a separate bug to track the padding issue and the proposed fix.

Forked as bug #697672.
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: