Closed Bug 678928 Opened 13 years ago Closed 13 years ago

WebGL JS broken after pushing 649202

Tracking

()

Status:

RESOLVED FIXED

Tracking Flags:

Tracking

Status

firefox7

---

unaffected

firefox8

fixed

firefox9

fixed

People

(Reporter: romaxa, Assigned: mjrosenb)

References

(
URL
)

Details

Attachments

(1 file, 1 obsolete file)

comments saying "fmem_imm_offset is expecting the offset in words" should be heeded 13 years ago Marty Rosenberg [:mjrosenb] 1.01 KB, patch	romaxa : feedback+	Details \| Diff \| Splinter Review
should fix the issue and not have 10-bit overflow issues 13 years ago Marty Rosenberg [:mjrosenb] 2.08 KB, patch	adrake : review+ johnath : approval-mozilla-aurora+	Details \| Diff \| Splinter Review

Oleg Romashin (:romaxa)

Reporter

Description

•

13 years ago

Latest trunk show broken WebGL on ARM
Also on 
http://learningwebgl.com/lessons/lesson04/index.html - it crashes

Oleg Romashin (:romaxa)

Reporter

Comment 1

•

13 years ago

#0  0x3aaf57fc in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/pt-raise.c:45
#1  0x3bdf2280 in CheckScript (script=<value optimized out>, prev=<value optimized out>)
    at js/src/jsscript.cpp:306
#2  0x3bdf2a98 in js_TraceScript (trc=0xae91571c, script=0x3b712448, owner=0x6) at js/src/jsscript.cpp:1442
#3  0x3bd7bd68 in ScanObject (this=0xae91571c) at js/src/jsgcmark.cpp:636
#4  js::GCMarker::drainMarkStack (this=0xae91571c) at js/src/jsgcmark.cpp:802
#5  0x3b67dd84 in setMarkColor (trc=0xae91571c, data=0x11c7) at js/src/jsgc.h:1314
#6  XPCJSRuntime::TraceJS (trc=0xae91571c, data=0x11c7) at js/src/xpconnect/src/xpcjsruntime.cpp:383
#7  0x3bd71288 in js::MarkRuntime (trc=0xae91571c) at js/src/jsgc.cpp:1846
#8  0x3bd73228 in MarkAndSweep (cx=0x405885c0, comp=<value optimized out>, gckind=GC_SHRINK, gcTimer=<value optimized out>)
    at js/src/jsgc.cpp:2272
#9  GCCycle (cx=0x405885c0, comp=<value optimized out>, gckind=GC_SHRINK, gcTimer=<value optimized out>)
    at js/src/jsgc.cpp:2642
#10 0x3bd73cd4 in js_GC (cx=0x405885c0, comp=<value optimized out>, gckind=GC_SHRINK) at js/src/jsgc.cpp:2728
#11 0x3b328c3c in nsJSContext::ScriptEvaluated (this=0x41031d80, aTerminated=4551)
    at dom/base/nsJSEnvironment.cpp:3167
#12 0x3b3245bc in nsJSContext::ScriptExecuted (this=0x41031d80) at dom/base/nsJSEnvironment.cpp:3239
#13 0x3b68a818 in AutoScriptEvaluate::~AutoScriptEvaluate (this=0xae915ae8, __in_chrg=<value optimized out>)
---Type <return> to continue, or q <return> to quit---
    at js/src/xpconnect/src/xpcwrappedjsclass.cpp:122
#14 0x3b68c9d0 in nsXPCWrappedJSClass::CallMethod (this=0x42150b50, wrapper=<value optimized out>, methodIndex=<value optimized out>, info=0x40468848, 
    nativeParams=0xae915c10) at js/src/xpconnect/src/xpcwrappedjsclass.cpp:1896
#15 0x3b6870e0 in nsXPCWrappedJS::CallMethod (this=0x42198640, methodIndex=3, info=0x40468848, params=0xae915c10)
    at js/src/xpconnect/src/xpcwrappedjs.cpp:585
#16 0x3bacb780 in PrepareAndDispatch (self=<value optimized out>, methodIndex=<value optimized out>, args=0xae915cd4)
    at xpcom/reflect/xptcall/src/md/unix/xptcstubs_arm.cpp:131
#17 0x3bacae00 in SharedStub () from /nfsobj-build/dist/bin/libxul.so
#18 0x3af25740 in nsRefreshDriver::Notify (this=0x3fd8c540, aTimer=<value optimized out>)
    at layout/base/nsRefreshDriver.cpp:361
#19 0x3baba5a4 in nsTimerImpl::Fire (this=0x42198680) at xpcom/threads/nsTimerImpl.cpp:427
#20 0x3baba720 in nsTimerEvent::Run (this=<value optimized out>) at xpcom/threads/nsTimerImpl.cpp:520
#21 0x3bab6a00 in nsThread::ProcessNextEvent (this=0x3fd0cc50, mayWait=<value optimized out>, result=<value optimized out>)
    at xpcom/threads/nsThread.cpp:631

Oleg Romashin (:romaxa)

Reporter

Comment 2

•

13 years ago

Have this problem on Maemo6 and WebOS.

Andreas Gal :gal

Updated

•

13 years ago

Group: core-security

Andreas Gal :gal

Comment 3

•

13 years ago

GC-related crash. Its unclear to me how the ARM patch caused this. Just in case we unearthed an existing GC problem, I am going to hide the bug until someone gets around to take a look. This doesn't crash x86, at least not on mac or linux, but maybe thats just luck/different memory pressure.

Oleg Romashin (:romaxa)

Reporter

Comment 4

•

13 years ago

I've bisected it... and before that commit everything works fine... after WebGL broken and some WEbGL examples crashes.

Oleg Romashin (:romaxa)

Reporter

Comment 5

•

13 years ago

ARM patch might cause some VFP incorrect behavior and memory corruption.

Marty Rosenberg [:mjrosenb]

Assignee

Comment 6

•

13 years ago

It is entirely possible that I generated an instruction that does not exist on your particular chip.  Do you know what type of chip you have or what instruction sets are supported?
While running JIT'ed code, that is no good reason for the stack to look like anything sane, but that does appear as if you have a rather sane stack that is far from any JIT'ed code.

Oleg Romashin (:romaxa)

Reporter

Comment 7

•

13 years ago

Here is the CPUINFO on chips where I see problems with WebGL
HP Touchpad:
Processor	: ARMv7 Processor rev 2 (v7l)
processor	: 0
BogoMIPS	: 13.52
Features	: swp half thumb fastmult vfp edsp neon vfpv3 
CPU implementer	: 0x51
CPU architecture: 7
CPU variant	: 0x0
CPU part	: 0x02d
CPU revision	: 2
Hardware	: TENDERLOIN
Revision	: 0000
Serial		: 0000000000000000

N9:
Processor	: ARMv7 Processor rev 2 (v7l)
BogoMIPS	: 996.74
Features	: swp half thumb fastmult vfp edsp neon vfpv3 
CPU implementer	: 0x41
CPU architecture: 7
CPU variant	: 0x3
CPU part	: 0xc08
CPU revision	: 2
Hardware	: Nokia RM-696 board
Revision	: 1504
Serial		: 0000000000000000
Soc Info        : OMAP 3630 ES1.2-hs
        IDCODE  : 2b89102f
        Pr. ID  : 00000000 00000000 000000cc cafeb891

Oleg Romashin (:romaxa)

Reporter

Comment 8

•

13 years ago

Also on N9 we are using hardfp toolchain if that make any difference

Marty Rosenberg [:mjrosenb]

Assignee

Comment 9

•

13 years ago

That looks like it supports all of the instructions that I would have used.  I only have an android device for testing.  I'll see if it reproduces there, and otherwise, I'll borrow someone's device that does exhibit the issue.

Oleg Romashin (:romaxa)

Reporter

Comment 10

•

13 years ago

if you are in MV office, the you probably can take N9 from dougt... For webOS you need additional patches to compile FF... probably JS part is easy to compile there

Oleg Romashin (:romaxa)

Reporter

Comment 11

•

13 years ago

Tested this also on Galaxy Tab 9.0xx , and it works wrong also there, and crashes too.
http://learningwebgl.com/lessons/lesson05/index.html
http://learningwebgl.com/lessons/lesson04/index.html

Oleg Romashin (:romaxa)

Reporter

Comment 12

•

13 years ago

any progress here? this seems like on the way to 8.0 release, and presumably breaking WebGL on android (at least galaxy tab similar HW)

Andreas Gal :gal

Comment 13

•

13 years ago

We cannot keep carrying this regression. If we don't have a fix today, please back out the offending patch.

Marty Rosenberg [:mjrosenb]

Assignee

Comment 14

•

13 years ago

I haven't been able to get a maemo development environment set up yet, and the problem doesn't reproduce on my nexus-s. I'll see if I can borrow someone's galaxy S today in order to get to the point where I can actually start debugging this.

Oleg Romashin (:romaxa)

Reporter

Comment 15

•

13 years ago

Not sure about galaxy S, but I have Webg broken on galaxy Tab... hope HW is the same for them

Marty Rosenberg [:mjrosenb]

Assignee

Comment 16

•

13 years ago

Sorry, I meant galaxy tab, not galaxy S (must be too used to typing nexus S).

Marty Rosenberg [:mjrosenb]

Assignee

Comment 17

•

13 years ago

I still haven't been able to get an environment to reproduce this... 
A simple fix may be to just disable fast typed arrays on arm rather than to back out the whole patch.  It should be as simple as removing the line: ENABLE_POLYIC_TYPED_ARRAY=1
from the arm section of js/src/configure.in

Andreas Gal :gal

Comment 18

•

13 years ago

Is that disabling any optimization we didn't have before the patch? And what is the path to fix that?

Daniel Veditz [:dveditz]

Comment 19

•

13 years ago

Why would the specific ARM chip matter if it's a GC bug?

Assignee: general → mrosenberg

Andreas Gal :gal

Comment 20

•

13 years ago

Also, what does it mean that you can't get an environment to reproduce this? I have it reproduced on my galaxy s2. If you don't have one you can use mine. I am downstairs on the 2nd floor.

Oleg Romashin (:romaxa)

Reporter

Comment 21

•

13 years ago

(In reply to Daniel Veditz from comment #19)
> Why would the specific ARM chip matter if it's a GC bug?

I think we have a deal with memory corruption, which endup in GC...

Andreas Gal :gal

Comment 22

•

13 years ago

If we suspect memory corruption, the patch has to come out. In particular if we have no explanation whats going on here exactly. Lets investigate this on a project branch instead of carrying risk on trunk. Thats what project branches are there for.

Marty Rosenberg [:mjrosenb]

Assignee

Comment 23

•

13 years ago

Attached patch comments saying "fmem_imm_offset is expecting the offset in words" should be heeded (obsolete) — Details — Splinter Review

Looks like I got this right in 4/5 of the callsites, and the last callsite is never hit by any of our tests.
In other news, fmem_imm_off is something that shouldn't be called from outside ARMAssembler.*, so I'm about to clean that up.

Marty Rosenberg [:mjrosenb]

Assignee

Comment 24

•

13 years ago

additional thought: it would appear as if storeFloat does not just store floats.  It takes a double, converts it to a float, then stores that.  This is blatantly obvious because storefloat takes a "floatReg", which is used for both single precision and double precision values.

If storeFloat is actually supposed to convert a double to a float before writing it to memory, then most of the variants of storeFloat are wrong, which means that they are not tested anywhere.  I'm happy to change it, but we should get some tests that make sure this is doing something sane.

Oleg Romashin (:romaxa)

Reporter

Comment 25

•

13 years ago

Comment on attachment 554036 [details] [diff] [review]
comments saying "fmem_imm_offset is expecting the offset in words" should be heeded

ok, with this patch I see webgl working fine again, and no crash.

Attachment #554036 - Flags: feedback+

Marty Rosenberg [:mjrosenb]

Assignee

Comment 26

•

13 years ago

Attached patch should fix the issue *and* not have 10-bit overflow issues — Details — Splinter Review

as (sort of) stated earlier, fmem_imm_off cannot be called with any offset.  In order to have a sane macro assembler, this has been wraped in floatTransfer, which will ensure that the offset is encoded correctly.
for whatever reason, I'd called fmem_imm_off, rather than floatTransfer.  this has been mitigated.

Attachment #554036 - Attachment is obsolete: true

Marty Rosenberg [:mjrosenb]

Assignee

Updated

•

13 years ago

Attachment #554243 - Flags: review?(adrake)

Andrew Drake [:adrake]

Updated

•

13 years ago

Attachment #554243 - Flags: review?(adrake) → review+

Olli Pettay [:smaug][bugs@pettay.fi]

Updated

•

13 years ago

tracking-firefox8: --- → ?

tracking-firefox9: --- → ?

Marty Rosenberg [:mjrosenb]

Assignee

Comment 27

•

13 years ago

Comment on attachment 554243 [details] [diff] [review]
should fix the issue *and* not have 10-bit overflow issues

This is a pretty big security risk.  Using a typed array, and crafting your indexes into it, you can write single floating point values up to 3 times the length of the array past the end of the array.  Most likely this would just corrupt the heap but I cannot say for sure.

The fix is plenty simple.  It calling an already existent wrapper function that
will encode the offset, but it will apply the correct shift, and it will also guard against cases where the offset is out of range for the given instruction.

Attachment #554243 - Flags: approval-mozilla-aurora?

Marty Rosenberg [:mjrosenb]

Assignee

Updated

•

13 years ago

Keywords: checkin-needed

Daniel Holbert [:dholbert]

Comment 28

•

13 years ago

Landed on m-c:
 http://hg.mozilla.org/mozilla-central/rev/af2098f08fc7

Status: NEW → RESOLVED

Closed: 13 years ago

Keywords: checkin-needed

Resolution: --- → FIXED

Johnathan Nightingale [:johnath]

Comment 29

•

13 years ago

Comment on attachment 554243 [details] [diff] [review]
should fix the issue *and* not have 10-bit overflow issues

Discussed and approved in triage - we're assuming this isn't needed on beta? (Please land it soon!)

Attachment #554243 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+

Oleg Romashin (:romaxa)

Reporter

Comment 30

•

13 years ago

beta does not have fix for 649202 IIRC

Daniel Holbert [:dholbert]

Comment 31

•

13 years ago

Landed on Aurora, for Firefox 8:
http://hg.mozilla.org/releases/mozilla-aurora/rev/9a243f427279

status-firefox8: --- → fixed

Daniel Veditz [:dveditz]

Updated

•

13 years ago

status-firefox7: --- → unaffected

status-firefox9: --- → fixed

tracking-firefox8: ? → +

tracking-firefox9: ? → +

Daniel Veditz [:dveditz]

Updated

•

13 years ago

Group: core-security

You need to log in before you can comment on or make changes to this bug.