Closed Bug 769985 Opened 13 years ago Closed 13 years ago

TI incorrectness on big BananaBread demos

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla16
Tracking Status
firefox15 --- fixed

People

(Reporter: azakai, Assigned: bhackett1024)

Details

(Whiteboard: [js:p1])

Attachments

(2 files)

STR: 1. Unpack www.syntensity.com/static/bigbananabread.tar.bz2 2. Run a little webserver, python -m SimpleHTTPServer 8888 3. Browse to localhost:8888/medium.html In FF13 stable up to FF16 Nightly, incorrect behavior will occur, after a few seconds an exception is thrown. The exception is thrown from a code region marked as "unreachable" by LLVM (line 165976, second in the stack trace), control flow should never get to there. (Note that there are some warnings before it in the console - those are expected and not related.) If TI is disabled, or the page is run in Chrome, correct behavior will happen. It will run for a very very long time - this is a bananabread build with max assertions and runtime checks, it's the only way I could make this error 100% reproducible every time. At the end of many minutes, a game will begin (but due to the slowness of the build, you might only get a few frames per second). Background: Over the last few days BananaBread builds have been getting intermittent crashes and hangs. I've seen this on two Linux machines and pcwalton reports he saw it on OS X as well. I am not sure why this would suddenly trigger a JS engine bug, except that we have been adding code and even small code additions can lead to very different final code after closure optimizes. Investigation showed we were doing bizarre memory accesses in Firefox with TI (but not without it), but only enough to crash us some of the time. With all runtime checks on however we assert pretty quickly.
Group: core-security
> Background: Over the last few days BananaBread builds have been getting intermittent crashes and hangs. Re-reading that, I should clarify what I meant there: The crashes are that a JS exception is thrown, that is, incorrect behavior, and not a crash of the browser.
Whiteboard: [js:p1]
Group: core-security
Hi, can you turn this into a shell testcase? I spent several hours looking at this yesterday, and after fixing the uses of Date.now there still seem to be other sources of non-determinism. Fixing this would make it much easier to figure out what's going on here (plus debugging browser failures is in general a PITA).
Another option is to look for a regressing changeset, if this used to work correctly.
Turning this into a shell testcase means ripping out all the webgl and other browser code like loading xhrs, which is a huge amount of work. Since this problem seems to hinge on specific circumstances (it was hard for me to make the reproducing testcase submitted here), my assumption is that even if we do all the work to make a shell testcase it might just behave ok there, since removing all the webgl etc. stuff would greatly change the code. Aside from Date.now() which we can monkeypatch, I don't think there should be any additional sources of nondeterminism. Except for the preload process which does a bunch of image and audio decode requests, and in theory they can return in random order. However preloading completes before main() is run, so I doubt it has an effect, but it is worth checking. I will attach a nearly-empty preload_medium.js file that replaces the one in the testcase, and loads just the minimum necessary files, in a deterministic order. I looked through FF stable to nightly (13-16), it fails on all of them, so this is not a regression (unless it's one much farther back).
With this replacing the original preload_medium.js, load behavior is deterministic (also faster). The engine will abort on missing assets, however it aborts late enough to see the bug. Specifically, without TI or on Chrome the correct result happens, uncaught exception: exit(1) called, at lka(1)@http://localhost:8888/bb.js:171026 vp(5310140,68)@http://localhost:8888/bb.js:29356 fla(1,6430740)@http://localhost:8888/bb.js:30741 ([object Array])@http://localhost:8888/bb.js:172073 e()@http://localhost:8888/bb.js:192812 ()@http://localhost:8888/bb.js:192832 and with TI enabled the incorrect result is the same exception as before, uncaught exception: abort() at R9()@http://localhost:8888/bb.js:167252 Pg(8603248)@http://localhost:8888/bb.js:165976 Z(8603248)@http://localhost:8888/bb.js:166349 dNa(3553,6408,1,1,6408,5121,8534368,false,7,8603248,1)@http://localhost:8888/bb.js:113339 hn(128,128,128,8534368,0,1,6408,3553,128,128,512,0,6408)@http://localhost:8888/bb.js:113469 O2(8599916,2,512,0,1,0,128,0)@http://localhost:8888/bb.js:114898 qk(0,0,0)@http://localhost:8888/bb.js:114834 fla(1,6430740)@http://localhost:8888/bb.js:30735 ([object Array])@http://localhost:8888/bb.js:172073 e()@http://localhost:8888/bb.js:192812 ()@http://localhost:8888/bb.js:192832
Attached patch patchSplinter Review
OK, the problem here is that we aren't correctly checking for coherent doubles correctly when entering loops from the interpreter. The interpreter can maintain variables as integers which have been inferred as int|double and must be represented as doubles in compiled code, so when entering from the interpreter we need to make sure that integers are converted to doubles appropriately. The problem is we were only doing this for slots which were known to be doubles at the source of the backedge, and not at the loop head itself (where we are going to enter from the interpreter). So triggered by code patterns like: i = someDouble; while (...) { i = someInteger; }
Assignee: general → bhackett1024
Attachment #639193 - Flags: review?(dvander)
Attachment #639193 - Flags: review?(dvander) → review+
I really hope we can uplift this to aurora. Without this on 15, we can't do anything with BananaBread until 16 - which is a shame since 15 has the other 99% needed and it would be a very cool thing to announce.
Comment on attachment 639193 [details] [diff] [review] patch [Approval Request Comment] Bug caused by (feature/regressing bug #): TI User impact if declined: Potential incorrect behavior, blocking BananaBread demo (comment 7) Risk to taking this patch (and alternatives if risky): none
Attachment #639193 - Flags: approval-mozilla-aurora?
To elaborate on comment 7 for comment 9, the issue is that without this patch, BananaBread http://www.syntensity.com/static/night10/ in some cases experiences incorrect behavior - it hangs, or throws an invalid JS exception, and sometimes uses a lot of memory while doing so. So without this in Aurora/15, we would need to wait until 16 in order to launch anything serious with BananaBread, and we were hoping to do so in 15 since it has all the features we need to run BananaBread well (everything a first person shooter needs, from mouselock to fullscreen to fast and consistent JS).
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla16
Comment on attachment 639193 [details] [diff] [review] patch [Triage Comment] No risk, makes us awesomer. Approved for Aurora 15.
Attachment #639193 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: