The default bug view has changed. See this FAQ.

WebGL JS broken after pushing 649202

RESOLVED FIXED

Status

()

Core
JavaScript Engine
RESOLVED FIXED
6 years ago
6 years ago

People

(Reporter: romaxa, Assigned: mjrosenb)

Tracking

Trunk
ARM
Linux
Points:
---

Firefox Tracking Flags

(firefox7 unaffected, firefox8+ fixed, firefox9+ fixed)

Details

(URL)

Attachments

(1 attachment, 1 obsolete attachment)

(Reporter)

Description

6 years ago
Latest trunk show broken WebGL on ARM
Also on 
http://learningwebgl.com/lessons/lesson04/index.html - it crashes
(Reporter)

Comment 1

6 years ago
#0  0x3aaf57fc in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/pt-raise.c:45
#1  0x3bdf2280 in CheckScript (script=<value optimized out>, prev=<value optimized out>)
    at js/src/jsscript.cpp:306
#2  0x3bdf2a98 in js_TraceScript (trc=0xae91571c, script=0x3b712448, owner=0x6) at js/src/jsscript.cpp:1442
#3  0x3bd7bd68 in ScanObject (this=0xae91571c) at js/src/jsgcmark.cpp:636
#4  js::GCMarker::drainMarkStack (this=0xae91571c) at js/src/jsgcmark.cpp:802
#5  0x3b67dd84 in setMarkColor (trc=0xae91571c, data=0x11c7) at js/src/jsgc.h:1314
#6  XPCJSRuntime::TraceJS (trc=0xae91571c, data=0x11c7) at js/src/xpconnect/src/xpcjsruntime.cpp:383
#7  0x3bd71288 in js::MarkRuntime (trc=0xae91571c) at js/src/jsgc.cpp:1846
#8  0x3bd73228 in MarkAndSweep (cx=0x405885c0, comp=<value optimized out>, gckind=GC_SHRINK, gcTimer=<value optimized out>)
    at js/src/jsgc.cpp:2272
#9  GCCycle (cx=0x405885c0, comp=<value optimized out>, gckind=GC_SHRINK, gcTimer=<value optimized out>)
    at js/src/jsgc.cpp:2642
#10 0x3bd73cd4 in js_GC (cx=0x405885c0, comp=<value optimized out>, gckind=GC_SHRINK) at js/src/jsgc.cpp:2728
#11 0x3b328c3c in nsJSContext::ScriptEvaluated (this=0x41031d80, aTerminated=4551)
    at dom/base/nsJSEnvironment.cpp:3167
#12 0x3b3245bc in nsJSContext::ScriptExecuted (this=0x41031d80) at dom/base/nsJSEnvironment.cpp:3239
#13 0x3b68a818 in AutoScriptEvaluate::~AutoScriptEvaluate (this=0xae915ae8, __in_chrg=<value optimized out>)
---Type <return> to continue, or q <return> to quit---
    at js/src/xpconnect/src/xpcwrappedjsclass.cpp:122
#14 0x3b68c9d0 in nsXPCWrappedJSClass::CallMethod (this=0x42150b50, wrapper=<value optimized out>, methodIndex=<value optimized out>, info=0x40468848, 
    nativeParams=0xae915c10) at js/src/xpconnect/src/xpcwrappedjsclass.cpp:1896
#15 0x3b6870e0 in nsXPCWrappedJS::CallMethod (this=0x42198640, methodIndex=3, info=0x40468848, params=0xae915c10)
    at js/src/xpconnect/src/xpcwrappedjs.cpp:585
#16 0x3bacb780 in PrepareAndDispatch (self=<value optimized out>, methodIndex=<value optimized out>, args=0xae915cd4)
    at xpcom/reflect/xptcall/src/md/unix/xptcstubs_arm.cpp:131
#17 0x3bacae00 in SharedStub () from /nfsobj-build/dist/bin/libxul.so
#18 0x3af25740 in nsRefreshDriver::Notify (this=0x3fd8c540, aTimer=<value optimized out>)
    at layout/base/nsRefreshDriver.cpp:361
#19 0x3baba5a4 in nsTimerImpl::Fire (this=0x42198680) at xpcom/threads/nsTimerImpl.cpp:427
#20 0x3baba720 in nsTimerEvent::Run (this=<value optimized out>) at xpcom/threads/nsTimerImpl.cpp:520
#21 0x3bab6a00 in nsThread::ProcessNextEvent (this=0x3fd0cc50, mayWait=<value optimized out>, result=<value optimized out>)
    at xpcom/threads/nsThread.cpp:631
(Reporter)

Comment 2

6 years ago
Have this problem on Maemo6 and WebOS.

Updated

6 years ago
Group: core-security

Comment 3

6 years ago
GC-related crash. Its unclear to me how the ARM patch caused this. Just in case we unearthed an existing GC problem, I am going to hide the bug until someone gets around to take a look. This doesn't crash x86, at least not on mac or linux, but maybe thats just luck/different memory pressure.
(Reporter)

Comment 4

6 years ago
I've bisected it... and before that commit everything works fine... after WebGL broken and some WEbGL examples crashes.
(Reporter)

Comment 5

6 years ago
ARM patch might cause some VFP incorrect behavior and memory corruption.
(Assignee)

Comment 6

6 years ago
It is entirely possible that I generated an instruction that does not exist on your particular chip.  Do you know what type of chip you have or what instruction sets are supported?
While running JIT'ed code, that is no good reason for the stack to look like anything sane, but that does appear as if you have a rather sane stack that is far from any JIT'ed code.
(Reporter)

Comment 7

6 years ago
Here is the CPUINFO on chips where I see problems with WebGL
HP Touchpad:
Processor	: ARMv7 Processor rev 2 (v7l)
processor	: 0
BogoMIPS	: 13.52
Features	: swp half thumb fastmult vfp edsp neon vfpv3 
CPU implementer	: 0x51
CPU architecture: 7
CPU variant	: 0x0
CPU part	: 0x02d
CPU revision	: 2
Hardware	: TENDERLOIN
Revision	: 0000
Serial		: 0000000000000000

N9:
Processor	: ARMv7 Processor rev 2 (v7l)
BogoMIPS	: 996.74
Features	: swp half thumb fastmult vfp edsp neon vfpv3 
CPU implementer	: 0x41
CPU architecture: 7
CPU variant	: 0x3
CPU part	: 0xc08
CPU revision	: 2
Hardware	: Nokia RM-696 board
Revision	: 1504
Serial		: 0000000000000000
Soc Info        : OMAP 3630 ES1.2-hs
        IDCODE  : 2b89102f
        Pr. ID  : 00000000 00000000 000000cc cafeb891
(Reporter)

Comment 8

6 years ago
Also on N9 we are using hardfp toolchain if that make any difference
(Assignee)

Comment 9

6 years ago
That looks like it supports all of the instructions that I would have used.  I only have an android device for testing.  I'll see if it reproduces there, and otherwise, I'll borrow someone's device that does exhibit the issue.
(Reporter)

Comment 10

6 years ago
if you are in MV office, the you probably can take N9 from dougt... For webOS you need additional patches to compile FF... probably JS part is easy to compile there
(Reporter)

Comment 11

6 years ago
Tested this also on Galaxy Tab 9.0xx , and it works wrong also there, and crashes too.
http://learningwebgl.com/lessons/lesson05/index.html
http://learningwebgl.com/lessons/lesson04/index.html
(Reporter)

Comment 12

6 years ago
any progress here? this seems like on the way to 8.0 release, and presumably breaking WebGL on android (at least galaxy tab similar HW)

Comment 13

6 years ago
We cannot keep carrying this regression. If we don't have a fix today, please back out the offending patch.
(Assignee)

Comment 14

6 years ago
I haven't been able to get a maemo development environment set up yet, and the problem doesn't reproduce on my nexus-s. I'll see if I can borrow someone's galaxy S today in order to get to the point where I can actually start debugging this.
(Reporter)

Comment 15

6 years ago
Not sure about galaxy S, but I have Webg broken on galaxy Tab... hope HW is the same for them
(Assignee)

Comment 16

6 years ago
Sorry, I meant galaxy tab, not galaxy S (must be too used to typing nexus S).
(Assignee)

Comment 17

6 years ago
I still haven't been able to get an environment to reproduce this... 
A simple fix may be to just disable fast typed arrays on arm rather than to back out the whole patch.  It should be as simple as removing the line: ENABLE_POLYIC_TYPED_ARRAY=1
from the arm section of js/src/configure.in

Comment 18

6 years ago
Is that disabling any optimization we didn't have before the patch? And what is the path to fix that?
Why would the specific ARM chip matter if it's a GC bug?
Assignee: general → mrosenberg

Comment 20

6 years ago
Also, what does it mean that you can't get an environment to reproduce this? I have it reproduced on my galaxy s2. If you don't have one you can use mine. I am downstairs on the 2nd floor.
(Reporter)

Comment 21

6 years ago
(In reply to Daniel Veditz from comment #19)
> Why would the specific ARM chip matter if it's a GC bug?

I think we have a deal with memory corruption, which endup in GC...

Comment 22

6 years ago
If we suspect memory corruption, the patch has to come out. In particular if we have no explanation whats going on here exactly. Lets investigate this on a project branch instead of carrying risk on trunk. Thats what project branches are there for.
(Assignee)

Comment 23

6 years ago
Created attachment 554036 [details] [diff] [review]
comments saying "fmem_imm_offset is expecting the offset in words" should be heeded

Looks like I got this right in 4/5 of the callsites, and the last callsite is never hit by any of our tests.
In other news, fmem_imm_off is something that shouldn't be called from outside ARMAssembler.*, so I'm about to clean that up.
(Assignee)

Comment 24

6 years ago
additional thought: it would appear as if storeFloat does not just store floats.  It takes a double, converts it to a float, then stores that.  This is blatantly obvious because storefloat takes a "floatReg", which is used for both single precision and double precision values.

If storeFloat is actually supposed to convert a double to a float before writing it to memory, then most of the variants of storeFloat are wrong, which means that they are not tested anywhere.  I'm happy to change it, but we should get some tests that make sure this is doing something sane.
(Reporter)

Comment 25

6 years ago
Comment on attachment 554036 [details] [diff] [review]
comments saying "fmem_imm_offset is expecting the offset in words" should be heeded

ok, with this patch I see webgl working fine again, and no crash.
Attachment #554036 - Flags: feedback+
(Assignee)

Comment 26

6 years ago
Created attachment 554243 [details] [diff] [review]
should fix the issue *and* not have 10-bit overflow issues

as (sort of) stated earlier, fmem_imm_off cannot be called with any offset.  In order to have a sane macro assembler, this has been wraped in floatTransfer, which will ensure that the offset is encoded correctly.
for whatever reason, I'd called fmem_imm_off, rather than floatTransfer.  this has been mitigated.
Attachment #554036 - Attachment is obsolete: true
(Assignee)

Updated

6 years ago
Attachment #554243 - Flags: review?(adrake)

Updated

6 years ago
Attachment #554243 - Flags: review?(adrake) → review+
tracking-firefox8: --- → ?
tracking-firefox9: --- → ?
(Assignee)

Comment 27

6 years ago
Comment on attachment 554243 [details] [diff] [review]
should fix the issue *and* not have 10-bit overflow issues

This is a pretty big security risk.  Using a typed array, and crafting your indexes into it, you can write single floating point values up to 3 times the length of the array past the end of the array.  Most likely this would just corrupt the heap but I cannot say for sure.

The fix is plenty simple.  It calling an already existent wrapper function that
will encode the offset, but it will apply the correct shift, and it will also guard against cases where the offset is out of range for the given instruction.
Attachment #554243 - Flags: approval-mozilla-aurora?
(Assignee)

Updated

6 years ago
Keywords: checkin-needed
Landed on m-c:
 http://hg.mozilla.org/mozilla-central/rev/af2098f08fc7
Status: NEW → RESOLVED
Last Resolved: 6 years ago
Keywords: checkin-needed
Resolution: --- → FIXED
Comment on attachment 554243 [details] [diff] [review]
should fix the issue *and* not have 10-bit overflow issues

Discussed and approved in triage - we're assuming this isn't needed on beta? (Please land it soon!)
Attachment #554243 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
(Reporter)

Comment 30

6 years ago
beta does not have fix for 649202 IIRC
Landed on Aurora, for Firefox 8:
http://hg.mozilla.org/releases/mozilla-aurora/rev/9a243f427279
status-firefox8: --- → fixed
status-firefox7: --- → unaffected
status-firefox9: --- → fixed
tracking-firefox8: ? → +
tracking-firefox9: ? → +
Group: core-security
You need to log in before you can comment on or make changes to this bug.