Last Comment Bug 769985 - TI incorrectness on big BananaBread demos
: TI incorrectness on big BananaBread demos
Status: RESOLVED FIXED
[js:p1]
:
Product: Core
Classification: Components
Component: JavaScript Engine (show other bugs)
: unspecified
: All All
: -- normal (vote)
: mozilla16
Assigned To: Brian Hackett (:bhackett)
:
:
Mentors:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2012-06-30 21:15 PDT by Alon Zakai (:azakai)
Modified: 2012-07-08 18:29 PDT (History)
8 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
fixed


Attachments
minimal deterministic preload_medium.js (7.00 KB, text/plain)
2012-07-04 11:08 PDT, Alon Zakai (:azakai)
no flags Details
patch (2.22 KB, patch)
2012-07-04 15:52 PDT, Brian Hackett (:bhackett)
dvander: review+
akeybl: approval‑mozilla‑aurora+
Details | Diff | Splinter Review

Description Alon Zakai (:azakai) 2012-06-30 21:15:12 PDT
STR:

1. Unpack www.syntensity.com/static/bigbananabread.tar.bz2
2. Run a little webserver, python -m SimpleHTTPServer 8888
3. Browse to localhost:8888/medium.html

In FF13 stable up to FF16 Nightly, incorrect behavior will occur, after a few seconds an exception is thrown. The exception is thrown from a code region marked as "unreachable" by LLVM (line 165976, second in the stack trace), control flow should never get to there. (Note that there are some warnings before it in the console - those are expected and not related.)

If TI is disabled, or the page is run in Chrome, correct behavior will happen. It will run for a very very long time - this is a bananabread build with max assertions and runtime checks, it's the only way I could make this error 100% reproducible every time. At the end of many minutes, a game will begin (but due to the slowness of the build, you might only get a few frames per second).

Background: Over the last few days BananaBread builds have been getting intermittent crashes and hangs. I've seen this on two Linux machines and pcwalton reports he saw it on OS X as well. I am not sure why this would suddenly trigger a JS engine bug, except that we have been adding code and even small code additions can lead to very different final code after closure optimizes. Investigation showed we were doing bizarre memory accesses in Firefox with TI (but not without it), but only enough to crash us some of the time. With all runtime checks on however we assert pretty quickly.
Comment 1 Alon Zakai (:azakai) 2012-07-01 09:56:25 PDT
> Background: Over the last few days BananaBread builds have been getting intermittent crashes and hangs.

Re-reading that, I should clarify what I meant there: The crashes are that a JS exception is thrown, that is, incorrect behavior, and not a crash of the browser.
Comment 2 Brian Hackett (:bhackett) 2012-07-04 06:11:15 PDT
Hi, can you turn this into a shell testcase?  I spent several hours looking at this yesterday, and after fixing the uses of Date.now there still seem to be other sources of non-determinism.  Fixing this would make it much easier to figure out what's going on here (plus debugging browser failures is in general a PITA).
Comment 3 Brian Hackett (:bhackett) 2012-07-04 08:12:30 PDT
Another option is to look for a regressing changeset, if this used to work correctly.
Comment 4 Alon Zakai (:azakai) 2012-07-04 11:06:03 PDT
Turning this into a shell testcase means ripping out all the webgl and other browser code like loading xhrs, which is a huge amount of work. Since this problem seems to hinge on specific circumstances (it was hard for me to make the reproducing testcase submitted here), my assumption is that even if we do all the work to make a shell testcase it might just behave ok there, since removing all the webgl etc. stuff would greatly change the code.

Aside from Date.now() which we can monkeypatch, I don't think there should be any additional sources of nondeterminism. Except for the preload process which does a bunch of image and audio decode requests, and in theory they can return in random order. However preloading completes before main() is run, so I doubt it has an effect, but it is worth checking. I will attach a nearly-empty preload_medium.js file that replaces the one in the testcase, and loads just the minimum necessary files, in a deterministic order.

I looked through FF stable to nightly (13-16), it fails on all of them, so this is not a regression (unless it's one much farther back).
Comment 5 Alon Zakai (:azakai) 2012-07-04 11:08:50 PDT
Created attachment 639144 [details]
minimal deterministic preload_medium.js

With this replacing the original preload_medium.js, load behavior is deterministic (also faster). The engine will abort on missing assets, however it aborts late enough to see the bug. Specifically, without TI or on Chrome the correct result happens,

uncaught exception: exit(1) called, at lka(1)@http://localhost:8888/bb.js:171026
vp(5310140,68)@http://localhost:8888/bb.js:29356
fla(1,6430740)@http://localhost:8888/bb.js:30741
([object Array])@http://localhost:8888/bb.js:172073
e()@http://localhost:8888/bb.js:192812
()@http://localhost:8888/bb.js:192832

and with TI enabled the incorrect result is the same exception as before,

uncaught exception: abort() at R9()@http://localhost:8888/bb.js:167252
Pg(8603248)@http://localhost:8888/bb.js:165976
Z(8603248)@http://localhost:8888/bb.js:166349
dNa(3553,6408,1,1,6408,5121,8534368,false,7,8603248,1)@http://localhost:8888/bb.js:113339
hn(128,128,128,8534368,0,1,6408,3553,128,128,512,0,6408)@http://localhost:8888/bb.js:113469
O2(8599916,2,512,0,1,0,128,0)@http://localhost:8888/bb.js:114898
qk(0,0,0)@http://localhost:8888/bb.js:114834
fla(1,6430740)@http://localhost:8888/bb.js:30735
([object Array])@http://localhost:8888/bb.js:172073
e()@http://localhost:8888/bb.js:192812
()@http://localhost:8888/bb.js:192832
Comment 6 Brian Hackett (:bhackett) 2012-07-04 15:52:16 PDT
Created attachment 639193 [details] [diff] [review]
patch

OK, the problem here is that we aren't correctly checking for coherent doubles correctly when entering loops from the interpreter.  The interpreter can maintain variables as integers which have been inferred as int|double and must be represented as doubles in compiled code, so when entering from the interpreter we need to make sure that integers are converted to doubles appropriately.  The problem is we were only doing this for slots which were known to be doubles at the source of the backedge, and not at the loop head itself (where we are going to enter from the interpreter).  So triggered by code patterns like:

i = someDouble;
while (...) {
  i = someInteger;
}
Comment 7 Alon Zakai (:azakai) 2012-07-05 20:44:27 PDT
I really hope we can uplift this to aurora. Without this on 15, we can't do anything with BananaBread until 16 - which is a shame since 15 has the other 99% needed and it would be a very cool thing to announce.
Comment 8 Brian Hackett (:bhackett) 2012-07-06 05:55:14 PDT
https://hg.mozilla.org/integration/mozilla-inbound/rev/d4d3ae6de60e
Comment 9 Brian Hackett (:bhackett) 2012-07-06 05:57:51 PDT
Comment on attachment 639193 [details] [diff] [review]
patch

[Approval Request Comment]
Bug caused by (feature/regressing bug #): TI
User impact if declined: Potential incorrect behavior, blocking BananaBread demo (comment 7)
Risk to taking this patch (and alternatives if risky): none
Comment 10 Alon Zakai (:azakai) 2012-07-06 11:25:00 PDT
To elaborate on comment 7 for comment 9, the issue is that without this patch, BananaBread

http://www.syntensity.com/static/night10/

in some cases experiences incorrect behavior - it hangs, or throws an invalid JS exception, and sometimes uses a lot of memory while doing so. So without this in Aurora/15, we would need to wait until 16 in order to launch anything serious with BananaBread, and we were hoping to do so in 15 since it has all the features we need to run BananaBread well (everything a first person shooter needs, from mouselock to fullscreen to fast and consistent JS).
Comment 11 Ryan VanderMeulen [:RyanVM] 2012-07-07 12:03:59 PDT
https://hg.mozilla.org/mozilla-central/rev/d4d3ae6de60e
Comment 12 Alex Keybl [:akeybl] 2012-07-08 18:16:19 PDT
Comment on attachment 639193 [details] [diff] [review]
patch

[Triage Comment]
No risk, makes us awesomer. Approved for Aurora 15.
Comment 13 Ryan VanderMeulen [:RyanVM] 2012-07-08 18:29:21 PDT
https://hg.mozilla.org/releases/mozilla-aurora/rev/4ad8dc475fe3

Note You need to log in before you can comment on or make changes to this bug.