Closed Bug 638533 Opened 14 years ago Closed 14 years ago

as3/Definitions/Function/OneOptArgFunction intermittently fails on linux-mips

Categories

(Tamarin Graveyard :: Virtual Machine, defect, P3)

Other
Linux
defect

Tracking

(Not tracked)

RESOLVED DUPLICATE of bug 631081
Q3 11 - Serrano

People

(Reporter: cpeyer, Assigned: stejohns)

References

Details

Attachments

(2 files)

Attached file bug.as —
Failing testcase:
returnNumber() = 10 FAILED! expected: 12

The test fails approx. 20% of the time.
Injected by http://hg.mozilla.org/tamarin-redux/rev/5974 (Bug 635979)

I've attached an excerpt of the test that reproduces the issue. Compile with:
java -jar asc.jar -import builtin.abc -AS3 -in Number.as bug.as
(the bug behaves the same with -optimize on or off)

Note that if the returnStringNoPackage result is NOT printed, then the bug does not manifest itself.

Expected output: outside package and outside class true
Flags: flashplayer-qrb?
Flags: flashplayer-bug+
Attached file Number.as —
Attachment #516669 - Attachment description: extraced testcase → bug.as
Attachment #516669 - Attachment mime type: application/octet-stream → text/plain
Attachment #516670 - Attachment mime type: application/octet-stream → text/plain
Flags: in-testsuite+
Flags: flashplayer-triage+
Flags: flashplayer-injection+
changeset: 6037:f40da16ca6a0 user: Chris Peyer <cpeyer@adobe.com> summary: Bug 638533: Skip intermittently failing linux-mips-release test (r=cpeyer) http://hg.mozilla.org/tamarin-redux/rev/f40da16ca6a0
Set intestsuite to ? since testcase is now being skipped for linux-mips-release builds.
Flags: in-testsuite+ → in-testsuite?
HUGE NOTE: This failure seems to be happening on only one class of MIPS hardware that we are using:
FAILS: Broadcom BMIPS5000 V0.2 FPU V0.1
PASSES: Broadcom 97405 Reference
I "think" the BMIPS5000 is a reference system that Adobe had created as a reference for DH.
changeset: 6316:274d706edd20 user: Brent Baker <brbaker@adobe.com> summary: Bug 638533: skip the OneOptArgFunction when running on MIPS as it will fail about 50% of the time when running on the BMIPS5000 reference system (r=brbaker) http://hg.mozilla.org/tamarin-redux/rev/274d706edd20
Regarding comment #5, you've got the machine ident backwards. The 97405 is the original bcm stb form factor machines. The 97420 is the tiny ff box we commissioned for MAX, which appears to be printing "Broadcom BMIPS5000 V0.2 FPU V0.1" from some command...which one? Anyway, it is failing on our newest hardware and not the old.
Ok, Brent's original comment is actually correct as written. Sorry for the noise. It is failing on the new 97420 hardware and not the 97405.
Assigned to Steven, apparently introduced via ANI changes.
Assignee: nobody → stejohns
Status: NEW → ASSIGNED
Flags: flashplayer-qrb? → flashplayer-qrb+
Priority: -- → P3
Target Milestone: --- → Q3 11 - Serrano
I wonder if it
(In reply to comment #0)
> Injected by http://hg.mozilla.org/tamarin-redux/rev/5974 (Bug 635979)
Seriously? And only on certain MIPS machines? Um...
Does this only occur with the JIT? Perhaps same issue as bug 658389?
(In reply to comment #11)
> Does this only occur with the JIT? Perhaps same issue as bug 658389?
I had stated this in the duplicate bug that I had accidentally created, but it was not mentioned here: "The testcase must be jitted for this to fail, and only fails about 50% of the time"
I will attempt to compile with the patch posted to bug 658389 and see what happens with this testcase.
(In reply to comment #12)
> I will attempt to compile with the patch posted to bug 658389 and see what
> happens with this testcase.
The patch id 533798 from bug 658389 does NOT resolve this issue.
Not sure where to start with this, or even how to access this machine. Is it possible to run the failing test(s) and get a stacktrace? Also, since it's intermittent, are we *certain* the injection point was that ANI change? (I ask because it seems like a really unlikely injection point...)
Steven: Trevor has a machine that you can use.
Running in -Dverbose mode with this file segfaults:
#0 0x0076ad00 in appendHexVals (str=0x8bf000 <Address 0x8bf000 out of bounds>, valFrom=0x2ab91014 "", valTo=0x2aba0e10 "") at ../nanojit/Native.h:149
#1 0x00784ed4 in nanojit::Assembler::asm_branchtarget (this=0x2abb1010, targ=0x0) at ../nanojit/NativeMIPS.cpp:1443
#2 0x0078527c in nanojit::Assembler::asm_bxx (this=0x2abb1010, branchOnFalse=true, condop=nanojit::LIR_eqi, ra=2, rb=16, targ=0x0) at ../nanojit/NativeMIPS.cpp:1458
#3 0x00794fe8 in nanojit::Assembler::asm_branch (this=0x2abb1010, branchOnFalse=true, cond=0x2ab8d5c0, targ=0x0) at ../nanojit/NativeMIPS.cpp:1631
#4 0x007657f8 in nanojit::Assembler::asm_jcc (this=0x2abb1010, ins=0x2ab8d5cc, pending_lives=@0x7f9f5348) at ../nanojit/Assembler.cpp:1332
#5 0x00766890 in nanojit::Assembler::gen (this=0x2abb1010, reader=0x7f9f5990) at ../nanojit/Assembler.cpp:1830
#6 0x00767174 in nanojit::Assembler::assemble (this=0x2abb1010, frag=0x2ab84030, reader=0x7f9f5990) at ../nanojit/Assembler.cpp:1072
#7 0x006146e8 in avmplus::CodegenLIR::emitMD (this=0x7f9f5aa8) at ../core/CodegenLIR.cpp:7334
#8 0x00649790 in avmplus::BaseExecMgr::verifyJit (this=0x2aac1090, m=0x2aaea798, ms=0x2ab545a0, toplevel=0x2aad50a8, abc_env=0x2ab53028, osr=0x0) at ../core/exec-jit.cpp:255
#9 0x00643378 in avmplus::BaseExecMgr::verifyMethod (this=0x2aac1090, m=0x2aaea798, toplevel=0x2aad50a8, abc_env=0x2ab53028) at ../core/exec.cpp:357
#10 0x00643470 in avmplus::BaseExecMgr::verifyOnCall (env=0x2ab7a2e8) at ../core/exec.cpp:334
#11 0x006434c4 in avmplus::BaseExecMgr::verifyInvoke (env=0x2ab7a2e8, argc=1, args=0x7f9f5e44) at ../core/exec.cpp:319
#12 0x00738200 in avmplus::callprop_b<avmplus::Toplevel*> (env=0x2aad50a8, base=716689481, multiname=0x2ab206d4, argc=1, atomv=0x7f9f5e44, vtable=0x2ab5c048, b=0x51) at ../core/MethodEnv-inlines.h:154
#13 0x0065cc94 in avmplus::interpBoxed (env=0x2ab7a478, _argc=0, _atomv=0x7f9f7af0) at ../core/Toplevel-inlines.h:93
#14 0x0064557c in avmplus::BaseExecMgr::invokeInterpNoCoerce (env=0x2ab7a478, argc=0, atomv=0x7f9f7af0) at ../core/exec.cpp:895
#15 0x006455e8 in avmplus::BaseExecMgr::initInvokeInterpNoCoerce (env=0x2ab7a478, argc=0, args=0x7f9f7af0) at ../core/exec.cpp:216
#16 0x006434e4 in avmplus::BaseExecMgr::verifyInvoke (env=0x2ab7a478, argc=0, args=0x7f9f7af0) at ../core/exec.cpp:320
#17 0x00671fe0 in avmplus::MethodEnv::newclass (this=0x2aabf198, ctraits=0x2ab0c888, base=0x2ab7d048, outer=0x2aac16f8, scopes=0x7f9f7b60) at ../core/MethodEnv-inlines.h:137
#18 0x0065c3c8 in avmplus::interpBoxed (env=0x2aabf198, _argc=0, _atomv=0x7f9f94d4) at ../core/Interpreter.cpp:2468
#19 0x0064557c in avmplus::BaseExecMgr::invokeInterpNoCoerce (env=0x2aabf198, argc=0, atomv=0x7f9f94d4) at ../core/exec.cpp:895
#20 0x006455e8 in avmplus::BaseExecMgr::initInvokeInterpNoCoerce (env=0x2aabf198, argc=0, args=0x7f9f94d4) at ../core/exec.cpp:216
#21 0x006434e4 in avmplus::BaseExecMgr::verifyInvoke (env=0x2aabf198, argc=0, args=0x7f9f94d4) at ../core/exec.cpp:320
#22 0x004f47ac in avmplus::AvmCore::callScriptEnvEntryPoint (this=0x2aabe028, main=0x2aabf198) at ../core/MethodEnv-inlines.h:137
#23 0x00523030 in avmplus::AvmCore::initToplevel (this=0x2aabe028, codeContextCreator=@0x7f9f9730) at ../core/AvmCore.cpp:848
#24 0x0044dc10 in avmshell::ShellCore::createShellToplevel (this=0x2aabe028) at ../shell/ShellCore.cpp:198
#25 0x0044e3c4 in avmshell::ShellCore::setup (this=0x2aabe028, settings=@0x7f9f9b50) at ../shell/ShellCore.cpp:462
#26 0x00439b44 in avmshell::Shell::singleWorkerHelper (shell=0x2aabe028, settings=@0x7f9f9b50) at ../shell/avmshell.cpp:186
#27 0x0043e720 in avmshell::Shell::singleWorker (settings=@0x7f9f9b50) at ../shell/avmshell.cpp:177
#28 0x0043e9d4 in avmshell::Shell::run (argc=3, argv=0x7f9f9e44) at ../shell/avmshell.cpp:141
#29 0x00487b28 in main (argc=3, argv=0x7f9f9e44) at ../shell/avmshellUnix.cpp:112
I have a hack fix for the segfault (https://bugzilla.mozilla.org/show_bug.cgi?id=659384); using this I have confirmed that the JIT output is *exactly* the same between failing builds (both normal and -Ojit) ... even to the memory addresses being jitted to. So far, I'm baffled. Don't see any obvious way it's related to the ANI checkin, except for random memory motion.
This is my first time debugging MIPS, but I've found a possibly-interesting oddity: The gotcha seems to be when we call returnNumber(n:Number, ...rest). The call site ensures that the "n" argument is 8-aligned, relative to the start of the local stack storage (LIR_allocp)... but in this case, the LIR_allocp itself isn't 8-aligned, so in terms of absolute memory address, the "n" argument can be merely 4-aligned. Not sure if this could cause this error, or merely result in suboptimal load times... investigating.
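To make the relative-versus-absolute distinction in the comment above concrete, here is a minimal sketch. It is not nanojit code: `isAligned8`, `argAddress`, and `checkRelativeVsAbsolute` are hypothetical helpers, and the addresses are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not nanojit code): true if addr is a multiple of 8.
constexpr bool isAligned8(std::uintptr_t addr) {
    return (addr & 7u) == 0;
}

// Absolute address of an argument stored `offset` bytes into a stack area
// that begins at `areaBase`.
constexpr std::uintptr_t argAddress(std::uintptr_t areaBase, std::uintptr_t offset) {
    return areaBase + offset;
}

// Runs the two cases described in the comment; the addresses are made up.
inline void checkRelativeVsAbsolute() {
    // 8-aligned area, 8-aligned offset: the argument is 8-aligned.
    assert(isAligned8(argAddress(0x7f9f5340u, 8)));
    // Area is only 4-aligned: the same offset yields a 4-aligned argument.
    assert(!isAligned8(argAddress(0x7f9f5344u, 8)));
    assert(argAddress(0x7f9f5344u, 8) % 4 == 0);
}
```

Calling `checkRelativeVsAbsolute()` from a test harness exercises both cases. The point is only that an offset being a multiple of 8 says nothing about the absolute address unless the base is 8-aligned too.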
Any LIR_allocp with size >= 8 should be 8-aligned with respect to the frame pointer and stack pointer. If that's not true, bug. This also requires the frame pointer and stack pointer to be 8-aligned - if not, bug. Chris Dearman should be around for consultation, if it helps. I don't remember if the MIPS ABI requires the stack pointer to remain 8-aligned or not, but I would be surprised if it didn't.
(In reply to comment #19)
> any LIR_allocp with size >= 8 should be 8-aligned with respect to the frame
> pointer and stack pointer. If that's not true, bug.
Maybe, actually, this: Bug 631081
Note that while CodegenLIR::emitCall claims:

    // LIR_allocp of any size >= 8 is always 8-aligned.

Assembler::arDisp is actually implemented as:

    inline int32_t arDisp(LIns* ins)
    {
        // even on 64bit cpu's, we allocate stack area in 4byte chunks
        return -4 * int32_t(ins->getArIndex());
    }

so it's unclear to me how the 8-aligned claim is justified...
(In reply to comment #20)
> Maybe, actually, this: Bug 631081
Ouch. That definitely needs fixing; why haven't we cleaned that up and landed it? (It doesn't necessarily explain this bug, it's merely the first odd thing I've been able to identify about this snippet. I'm *hoping* that perhaps this particular hardware is randomly buggy for unaligned-double loads, but that is probably wishful thinking on my part...)
(In reply to comment #22)
> (In reply to comment #20)
> > Maybe, actually, this: Bug 631081
Guess what: applying that patch makes this unrepeatable (so far)... I'll torture test it but I was getting failure ~50% of the time, and with the proper-alignment patch, so far zero failures. Don't know why that would be -- maybe this particular machine is attempting to trap-and-resolve unaligned accesses, but failing to do so reliably? (I've cc'ed Chris Dearman, hopefully he can comment) In any event clearly cleaning up and landing 631081 is desirable in its own right...
(In reply to comment #21)
> Note that while CodegenLIR::emitCall claims:
>
> // LIR_allocp of any size >= 8 is always 8-aligned.
>
> Assembler::arDisp is actually implemented as:
>
> inline int32_t arDisp(LIns* ins)
> {
> // even on 64bit cpu's, we allocate stack area in 4byte chunks
> return -4 * int32_t(ins->getArIndex());
>
> so it's unclear to me how the 8-aligned claim is justified...
Unclear: yes. justified: yes, because:
1. arIndex >= 1
2. size=8 allocations should have even-numbered arIndex, thus be multiples of -8 from the frame pointer
(In reply to comment #22)
> (In reply to comment #20)
> > Maybe, actually, this: Bug 631081
>
> Ouch. That definitely needs fixing; why haven't we cleaned that up and
> landed it?
The comments explain. Until now, it's only been a rare-but-annoying alignment penalty. however, for MIPS, misalignment = crash, upping the priority. (what other cpu's crash on misalignment? do they show the bug? surprised if not).
> (It doesn't necessarily explain this bug, it's merely the first
> odd thing I've been able to identify about this snippet. I'm *hoping* that
> perhaps this particular hardware is randomly buggy for unaligned-double
> loads, but that is probably wishful thinking on my part...)
Randomly buggy hardware? not impossible, but probably wishful thinking.
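The arithmetic behind the "even-numbered arIndex" argument can be sketched as follows. This is an illustration, not the nanojit implementation: `arDispFromIndex` copies the `-4 * arIndex` formula quoted in this bug but takes the index directly rather than an `LIns*`, and `slotAddress` is a hypothetical helper that assumes the slot is addressed as frame pointer plus displacement.

```cpp
#include <cassert>
#include <cstdint>

// Same formula as the Assembler::arDisp snippet quoted in this bug,
// taking the index directly instead of an LIns* (an assumption made
// so the sketch is self-contained).
constexpr std::int32_t arDispFromIndex(std::uint32_t arIndex) {
    return -4 * static_cast<std::int32_t>(arIndex);
}

// Hypothetical helper: absolute address of a stack slot, assuming it is
// addressed as framePointer + displacement.
constexpr std::uintptr_t slotAddress(std::uintptr_t framePointer, std::uint32_t arIndex) {
    return framePointer +
           static_cast<std::uintptr_t>(static_cast<std::intptr_t>(arDispFromIndex(arIndex)));
}

// Even index: displacement is a multiple of 8.
static_assert(arDispFromIndex(2) % 8 == 0, "even arIndex gives a multiple of -8");
// Odd index: displacement is only a multiple of 4.
static_assert(arDispFromIndex(3) % 8 != 0, "odd arIndex is only 4-aligned");
// The absolute address is 8-aligned only if the frame pointer is as well.
static_assert(slotAddress(0x1000u, 2) % 8 == 0, "aligned frame pointer");
static_assert(slotAddress(0x1004u, 2) % 8 == 4, "4-aligned frame pointer");
```

Under these assumptions, the claim in CodegenLIR::emitCall holds only when both conditions are met: the allocation gets an even arIndex and the frame pointer itself is 8-aligned.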
Depends on: 631081
bug 658253 is another mips segfault....relation unknown but pointing out in case.
I marked this as depending on bug 631081, but arguably it could be considered a dupe of it.
This is definitely a dupe of 631081; marking as such.
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Resolution: --- → DUPLICATE