Closed Bug 678928 Opened 11 years ago Closed 11 years ago
GL JS broken after pushing 649202
Latest trunk show broken WebGL on ARM Also on http://learningwebgl.com/lessons/lesson04/index.html - it crashes
#0 0x3aaf57fc in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/pt-raise.c:45 #1 0x3bdf2280 in CheckScript (script=<value optimized out>, prev=<value optimized out>) at js/src/jsscript.cpp:306 #2 0x3bdf2a98 in js_TraceScript (trc=0xae91571c, script=0x3b712448, owner=0x6) at js/src/jsscript.cpp:1442 #3 0x3bd7bd68 in ScanObject (this=0xae91571c) at js/src/jsgcmark.cpp:636 #4 js::GCMarker::drainMarkStack (this=0xae91571c) at js/src/jsgcmark.cpp:802 #5 0x3b67dd84 in setMarkColor (trc=0xae91571c, data=0x11c7) at js/src/jsgc.h:1314 #6 XPCJSRuntime::TraceJS (trc=0xae91571c, data=0x11c7) at js/src/xpconnect/src/xpcjsruntime.cpp:383 #7 0x3bd71288 in js::MarkRuntime (trc=0xae91571c) at js/src/jsgc.cpp:1846 #8 0x3bd73228 in MarkAndSweep (cx=0x405885c0, comp=<value optimized out>, gckind=GC_SHRINK, gcTimer=<value optimized out>) at js/src/jsgc.cpp:2272 #9 GCCycle (cx=0x405885c0, comp=<value optimized out>, gckind=GC_SHRINK, gcTimer=<value optimized out>) at js/src/jsgc.cpp:2642 #10 0x3bd73cd4 in js_GC (cx=0x405885c0, comp=<value optimized out>, gckind=GC_SHRINK) at js/src/jsgc.cpp:2728 #11 0x3b328c3c in nsJSContext::ScriptEvaluated (this=0x41031d80, aTerminated=4551) at dom/base/nsJSEnvironment.cpp:3167 #12 0x3b3245bc in nsJSContext::ScriptExecuted (this=0x41031d80) at dom/base/nsJSEnvironment.cpp:3239 #13 0x3b68a818 in AutoScriptEvaluate::~AutoScriptEvaluate (this=0xae915ae8, __in_chrg=<value optimized out>) ---Type <return> to continue, or q <return> to quit--- at js/src/xpconnect/src/xpcwrappedjsclass.cpp:122 #14 0x3b68c9d0 in nsXPCWrappedJSClass::CallMethod (this=0x42150b50, wrapper=<value optimized out>, methodIndex=<value optimized out>, info=0x40468848, nativeParams=0xae915c10) at js/src/xpconnect/src/xpcwrappedjsclass.cpp:1896 #15 0x3b6870e0 in nsXPCWrappedJS::CallMethod (this=0x42198640, methodIndex=3, info=0x40468848, params=0xae915c10) at js/src/xpconnect/src/xpcwrappedjs.cpp:585 #16 0x3bacb780 in PrepareAndDispatch (self=<value optimized out>, methodIndex=<value optimized out>, args=0xae915cd4) at xpcom/reflect/xptcall/src/md/unix/xptcstubs_arm.cpp:131 #17 0x3bacae00 in SharedStub () from /nfsobj-build/dist/bin/libxul.so #18 0x3af25740 in nsRefreshDriver::Notify (this=0x3fd8c540, aTimer=<value optimized out>) at layout/base/nsRefreshDriver.cpp:361 #19 0x3baba5a4 in nsTimerImpl::Fire (this=0x42198680) at xpcom/threads/nsTimerImpl.cpp:427 #20 0x3baba720 in nsTimerEvent::Run (this=<value optimized out>) at xpcom/threads/nsTimerImpl.cpp:520 #21 0x3bab6a00 in nsThread::ProcessNextEvent (this=0x3fd0cc50, mayWait=<value optimized out>, result=<value optimized out>) at xpcom/threads/nsThread.cpp:631
Have this problem on Maemo6 and WebOS.
GC-related crash. Its unclear to me how the ARM patch caused this. Just in case we unearthed an existing GC problem, I am going to hide the bug until someone gets around to take a look. This doesn't crash x86, at least not on mac or linux, but maybe thats just luck/different memory pressure.
I've bisected it... and before that commit everything works fine... after WebGL broken and some WEbGL examples crashes.
ARM patch might cause some VFP incorrect behavior and memory corruption.
It is entirely possible that I generated an instruction that does not exist on your particular chip. Do you know what type of chip you have or what instruction sets are supported? While running JIT'ed code, that is no good reason for the stack to look like anything sane, but that does appear as if you have a rather sane stack that is far from any JIT'ed code.
Here is the CPUINFO on chips where I see problems with WebGL HP Touchpad: Processor : ARMv7 Processor rev 2 (v7l) processor : 0 BogoMIPS : 13.52 Features : swp half thumb fastmult vfp edsp neon vfpv3 CPU implementer : 0x51 CPU architecture: 7 CPU variant : 0x0 CPU part : 0x02d CPU revision : 2 Hardware : TENDERLOIN Revision : 0000 Serial : 0000000000000000 N9: Processor : ARMv7 Processor rev 2 (v7l) BogoMIPS : 996.74 Features : swp half thumb fastmult vfp edsp neon vfpv3 CPU implementer : 0x41 CPU architecture: 7 CPU variant : 0x3 CPU part : 0xc08 CPU revision : 2 Hardware : Nokia RM-696 board Revision : 1504 Serial : 0000000000000000 Soc Info : OMAP 3630 ES1.2-hs IDCODE : 2b89102f Pr. ID : 00000000 00000000 000000cc cafeb891
Also on N9 we are using hardfp toolchain if that make any difference
That looks like it supports all of the instructions that I would have used. I only have an android device for testing. I'll see if it reproduces there, and otherwise, I'll borrow someone's device that does exhibit the issue.
if you are in MV office, the you probably can take N9 from dougt... For webOS you need additional patches to compile FF... probably JS part is easy to compile there
Tested this also on Galaxy Tab 9.0xx , and it works wrong also there, and crashes too. http://learningwebgl.com/lessons/lesson05/index.html http://learningwebgl.com/lessons/lesson04/index.html
any progress here? this seems like on the way to 8.0 release, and presumably breaking WebGL on android (at least galaxy tab similar HW)
We cannot keep carrying this regression. If we don't have a fix today, please back out the offending patch.
I haven't been able to get a maemo development environment set up yet, and the problem doesn't reproduce on my nexus-s. I'll see if I can borrow someone's galaxy S today in order to get to the point where I can actually start debugging this.
Not sure about galaxy S, but I have Webg broken on galaxy Tab... hope HW is the same for them
Sorry, I meant galaxy tab, not galaxy S (must be too used to typing nexus S).
I still haven't been able to get an environment to reproduce this... A simple fix may be to just disable fast typed arrays on arm rather than to back out the whole patch. It should be as simple as removing the line: ENABLE_POLYIC_TYPED_ARRAY=1 from the arm section of js/src/configure.in
Is that disabling any optimization we didn't have before the patch? And what is the path to fix that?
Why would the specific ARM chip matter if it's a GC bug?
Assignee: general → mrosenberg
Also, what does it mean that you can't get an environment to reproduce this? I have it reproduced on my galaxy s2. If you don't have one you can use mine. I am downstairs on the 2nd floor.
(In reply to Daniel Veditz from comment #19) > Why would the specific ARM chip matter if it's a GC bug? I think we have a deal with memory corruption, which endup in GC...
If we suspect memory corruption, the patch has to come out. In particular if we have no explanation whats going on here exactly. Lets investigate this on a project branch instead of carrying risk on trunk. Thats what project branches are there for.
Looks like I got this right in 4/5 of the callsites, and the last callsite is never hit by any of our tests. In other news, fmem_imm_off is something that shouldn't be called from outside ARMAssembler.*, so I'm about to clean that up.
additional thought: it would appear as if storeFloat does not just store floats. It takes a double, converts it to a float, then stores that. This is blatantly obvious because storefloat takes a "floatReg", which is used for both single precision and double precision values. If storeFloat is actually supposed to convert a double to a float before writing it to memory, then most of the variants of storeFloat are wrong, which means that they are not tested anywhere. I'm happy to change it, but we should get some tests that make sure this is doing something sane.
Comment on attachment 554036 [details] [diff] [review] comments saying "fmem_imm_offset is expecting the offset in words" should be heeded ok, with this patch I see webgl working fine again, and no crash.
Attachment #554036 - Flags: feedback+
as (sort of) stated earlier, fmem_imm_off cannot be called with any offset. In order to have a sane macro assembler, this has been wraped in floatTransfer, which will ensure that the offset is encoded correctly. for whatever reason, I'd called fmem_imm_off, rather than floatTransfer. this has been mitigated.
Attachment #554036 - Attachment is obsolete: true
Attachment #554243 - Flags: review?(adrake)
Comment on attachment 554243 [details] [diff] [review] should fix the issue *and* not have 10-bit overflow issues This is a pretty big security risk. Using a typed array, and crafting your indexes into it, you can write single floating point values up to 3 times the length of the array past the end of the array. Most likely this would just corrupt the heap but I cannot say for sure. The fix is plenty simple. It calling an already existent wrapper function that will encode the offset, but it will apply the correct shift, and it will also guard against cases where the offset is out of range for the given instruction.
Attachment #554243 - Flags: approval-mozilla-aurora?
Landed on m-c: http://hg.mozilla.org/mozilla-central/rev/af2098f08fc7
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Comment on attachment 554243 [details] [diff] [review] should fix the issue *and* not have 10-bit overflow issues Discussed and approved in triage - we're assuming this isn't needed on beta? (Please land it soon!)
Attachment #554243 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
beta does not have fix for 649202 IIRC
Landed on Aurora, for Firefox 8: http://hg.mozilla.org/releases/mozilla-aurora/rev/9a243f427279
You need to log in before you can comment on or make changes to this bug.