Closed Bug 650874 Opened 13 years ago Closed 13 years ago

crash [@ ExecuteTree]

Categories

(Core :: JavaScript Engine, defect)

1.9.2 Branch
defect
Not set
critical

Tracking

()

RESOLVED FIXED
Tracking Status
firefox5 - unaffected
firefox6 - unaffected
firefox7 - unaffected
status2.0 --- unaffected
blocking1.9.2 --- .18+
status1.9.2 --- .18-fixed
blocking1.9.1 --- needed
status1.9.1 --- wanted

People

(Reporter: ngocminh309, Assigned: luke)

References

()

Details

(4 keywords, Whiteboard: [sg:critical?])

Crash Data

Attachments

(3 files, 2 obsolete files)

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.16) Gecko/20110319 Firefox/3.6.16
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.16) Gecko/20110319 Firefox/3.6.16

error crash reporter

Reproducible: Always
Moving bp-4c8a5444-4a10-4076-b764-36ef22110418 out of the summary line.
Assignee: nobody → general
Component: General → JavaScript Engine
Product: Firefox → Core
QA Contact: general → general
Summary: bp-4c8a5444-4a10-4076-b764-36ef22110418 → crash [@ ExecuteTree]
Version: unspecified → 1.9.2 Branch
This is now our top4 topcrash for Firefox 3.6.16 and happens across all platforms. It started around April 14th and got a high spike yesterday. It's still present in yesterdays nightly build on 3.6 and in our official 3.6.17 beta version. It also affects Firefox 3.5, but 4.0 seems to be unaffected and only shows a high cpu load and memory usage. Crash also happens in a fresh profile.

Firefox 3.6.18pre: bp-f23cf48b-bed7-45a1-a243-e6c372110419

0 		@0x189bfe63 	
1 	libmozjs.dylib 	ExecuteTree 	js/src/jstracer.cpp:6276
2 	libmozjs.dylib 	js_MonitorLoopEdge 	js/src/jstracer.cpp:6749
3 	libmozjs.dylib 	js_Interpret 	js/src/jsops.cpp:904
4 	libmozjs.dylib 	js_Execute 	js/src/jsinterp.cpp:1601
5 	libmozjs.dylib 	JS_EvaluateUCScriptForPrincipals 	js/src/jsapi.cpp:5057
6 	XUL 	nsJSContext::EvaluateString 	dom/base/nsJSEnvironment.cpp:1764


Firefox 3.5.20pre: bp-4711dfda-76b6-4761-b8b7-37e8e2110419

0  	 	@0x426ce69  	
1 		@0xbfffbbc7 	
2 	libmozjs.dylib 	js_MonitorLoopEdge 	js/src/jstracer.cpp:4931
3 	libmozjs.dylib 	js_Interpret 	js/src/jsinterp.cpp:3877
4 	libmozjs.dylib 	js_Execute 	js/src/jsinterp.cpp:1622
5 	libmozjs.dylib 	JS_EvaluateUCScriptForPrincipals 	js/src/jsapi.cpp:5132
6 	XUL 	nsJSContext::EvaluateString 	dom/src/base/nsJSEnvironment.cpp:1753


As you can see there is a slightly different stack between 3.6 and 3.5.
Status: UNCONFIRMED → NEW
blocking1.9.1: --- → ?
blocking1.9.2: --- → ?
Ever confirmed: true
Keywords: crash, topcrash
OS: Windows NT → All
Hardware: x86 → All
On mozilla-central the crash has been fixed between 10022405 and 10022503.

Pushlog:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=ddfecbc93934&tochange=a29d44f196a6

I wonder if this is related to plugins like Flash. I will further investigate.
Attached file testcase (obsolete) —
Completely unrelated to plugins. It's somekind CSS and JS related. The patches on bug 542592 or bug 461199 could have probably fixed it on mozilla-central.
Attached file Minimized testcase
This is the minimized version of the testcase. I can't reduce it more without preventing the crash to happen.
Attachment #526949 - Attachment is obsolete: true
yes, something caused an explosion in this signature just after april 13, going from around 300-400 crashes per day to now over 3000 per day.

date     crashes at
         ExecuteTree
20110408 401
20110409 334
20110410 386
20110411 387
20110412 364
20110413 379
20110414 1681
20110415 3268
20110416 3247

most of the new urls see after the explosion are out of wyciwg cache at for the site www.av.by

1428 wyciwyg: http: www.av.by
 378 wyciwyg: http: av.by

http://www.av.by/public/public.php?event=View&public_id=2221144

when not coming from the wyciwyg cache there are also crashes on av.by

 1 http://www.av.by/public/search.php?event=Search&category_parent=545&country_id=1&category_id=0&body_type_id=0&class_id=0&engine_type_all=1&volume_value=0&volume_value_max=0&cylinders_number=0&run_value=0&run_value_max=0&run_unit=-1&transmission_id=0&year_
   1 http://www.av.by/public/public.php?event=View_XXL&public_id=2238995
   1 http://www.av.by/public/public.php?event=View&public_id=2233754
   1 http://www.av.by/public/public.php?event=View&public_id=2110218
   1 http://www.av.by/public/index.php?event=3&category_id=78&show_new=
   1 http://www.av.by/public/index.php?event=3&category_id=1135&country_id=0&order_id=3&show_new=0&page=2
   1 http://www.av.by/
   1 http://forum.av.by/viewtopic.php?p=586579
   1 http://av.by/public/public.php?event=View&public_id=2063090
   1 http://av.by/public/public.php?event=Image_Next&public_id=2240666&image_id=2
   1 http://av.by/news/index.php?event=View&news_id=13276
Blocking the in progress branch releases. We intend to build tomorrow for a different issue and need this resolved by then
blocking1.9.1: ? → .19+
blocking1.9.2: ? → .17+
as mentioned this is mostly affecting 3.6.

checking --- ExecuteTree 20110416-crashdata.csv
found in: 3.6.16 3.6.13 4.0 3.6.3 3.6 3.6.8 3.6.6 3.6.15 3.6.10 3.6.12 3.6.14 3.6b2 3.6.4 3.6.11 3.6.2 3.6b5 3.6b1 3.6.7 3.6.3plugin1 4.0b12 3.6b4 3.6b3 3.6.9 4.0b9 4.0b8 4.0b7 4.0b13pre 4.0.1 3.6.17pre
release total-crashes
              ExecuteTree crashes
                         pct.
all     270520  3247    0.0120028
3.6.16  117456  2441    0.0207823
3.6.13  13552   250     0.0184475
4.0     66559   86      0.00129209
3.6.3   4403    72      0.0163525
3.6     3510    69      0.0196581
3.6.8   2862    62      0.0216632
3.6.6   1509    38      0.0251822
3.6.15  4134    37      0.00895017
3.6.10  2576    35      0.013587

I think we did some thing in 4.0 to improve in this area, and I'm guessing unless we figure out what changes to backport the only other option would be some site outreach to figure out content work arounds.
(In reply to comment #9)
> as mentioned this is mostly affecting 3.6.

Can you please check 3.5? It should show similar stats.

> I think we did some thing in 4.0 to improve in this area, and I'm guessing
> unless we figure out what changes to backport the only other option would be
> some site outreach to figure out content work arounds.

Yes, see my comment 3. One of those changes fixed specifically my case with linuxforums.org. I will check if I can reproduce it on av.by.

I also think that we cannot easily compare 3.6.16 with older branch releases. All of them down to 3.6 are crashing, and most users will now have updated to 3.6.16.
(In reply to comment #7)
> 1428 wyciwyg: http: www.av.by
>  378 wyciwyg: http: av.by
> 
> http://www.av.by/public/public.php?event=View&public_id=2221144

Perfect URL which crashes Namoroka immediately. I will work on a reduced testcase.
(In reply to comment #11)
> > http://www.av.by/public/public.php?event=View&public_id=2221144

This site is using the exact same method as linuxforums.org:

<style>#eva {background:url(data:,ring.fromC)}</style><script type="text/javascript" src="public.php_files/acode.js"></script>

Could be part of a framework which has been updated lately which causes this spike on different websites.
(In reply to comment #13)
> http://www.google.de/#sclient=psy&hl=de&site=&source=hp&q=%22data:%2Cring.fromC%22&aq=f&aqi=&aql=&oq=&pbx=1&fp=ab3834006b24e88c

Hit Return too early. All those listed pages are crashing for me while the pages get loaded.
Severity: normal → critical
Some more resources on the web revealed that this is some sort of attack vector. Users of several Wordpress installations have seen this problem on their blogs and never have added those files and content themselves.
Henrik, did you check this in 1.9.1 as well?
CC'ing some JS guys who wrote the code in that area.
Donato Ferrante mailed the security alias about a similar crash with the testcase

  <script>
    var Fun=eval("eval");             // required
    for(u=0; u <3; u++){ Fun(); }     // required - it MUST be >= 3
    alert("Not bug :(");              // extra - can be removed
  </script>

like the minimized testcase this has Fun=eval("eval") and a for loop. The fact that it requires at least 3 iterations makes me want to blame TraceMonkey.

The particular signature looks like it crashes at 0x4 which might be safe, depending on where we're getting the null and what else we might get there.
I suspect the problem is gone in Firefox 4 because JaegerMonkey was added and TraceMonkey is used to compile less code than previously.
(In reply to comment #3)
> On mozilla-central the crash has been fixed between 10022405 and 10022503.
> 
> Pushlog:
> http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=ddfecbc93934&tochange=a29d44f196a6

I missed the tracemonkey merge during that day. So Nicholas comment would apply.
The fix range in comment 3 includes a Tracemonkey merge,
http://hg.mozilla.org/mozilla-central/pushloghtml?changeset=69627183a733

If we still have TM nightlies from that long ago a fix range on that branch would help.

Confirmed that the code crashes with the JIT turned on (default), doesn't with the JIT turned off.
Given URL not valid anymore because they have removed the bogus code. Google should still be able to list a lot of other instances.
(In reply to comment #20)
> Donato Ferrante mailed the security alias about a similar crash with the
> testcase
> 
>   <script>
>     var Fun=eval("eval");             // required
>     for(u=0; u <3; u++){ Fun(); }     // required - it MUST be >= 3
>     alert("Not bug :(");              // extra - can be removed
>   </script>
> 
> like the minimized testcase this has Fun=eval("eval") and a for loop. The fact
> that it requires at least 3 iterations makes me want to blame TraceMonkey.
> 
> The particular signature looks like it crashes at 0x4 which might be safe,
> depending on where we're getting the null and what else we might get there.

Thanks (and thank Donato) for the test case. You are correct, this is a crash in tracejit code. The 0x4 comes from here:

037AFE70  mov         ecx,dword ptr ds:[4]  

AFAIK, the tracejit does not use segment registers. I don't think that much does, and so I don't know much about them. In particular, I don't know if an attacker would be able to control them somehow. If not, then this seems not exploitable.

I'll poke at it a little more to see if I can spot the bug.
The "ds:" above seems to be an artifact of the MSVC disassembler. Anyway, the tracejit is definitely trying to generate code that loads from NULL[4], so this should not be exploitable.
Does this still happen on trunk if mjit is disabled?  That is, is this really branch-specific, or is it just a matter of heuristics whether we hit it on trunk?
(In reply to comment #28)
> Does this still happen on trunk if mjit is disabled?  That is, is this really
> branch-specific, or is it just a matter of heuristics whether we hit it on
> trunk?

Ooo, good idea. I just tested it on 4.0.1 and it doesn't happen there with mjit disabled, so maybe it has been fixed.
And we're ending up on trace in the relevant code with mjit disabled?
I got this fix range from nightlies:

http://hg.mozilla.org/tracemonkey/pushloghtml?startdate=2010-02-15&enddate=2010-02-17

The fix is probably this one:

http://hg.mozilla.org/tracemonkey/rev/e91417e33a53

Not sure if it would land cleanly to 1.9.2 or how much work it would be to rebase.
Thanks for the quick investigation guys! We're not going to hold the current release for this but want it for the next 3.6 one.
blocking1.9.1: .19+ → .20+
blocking1.9.2: .17+ → .18+
Now it's topcrash #3. But it looks like the numbers are getting lower again, now that website admins are removing the suspicious code from their websites.
Ugh, http://hg.mozilla.org/tracemonkey/rev/e91417e33a53 is exactly the type of patch that you don't want to backport; its rather big and depends on all the scary things that have been in flux since 1.9.2.

By the way, the probable reason the patch appears to fix the bug is that it changes eval from a slow native to a fast native which causes a different codegen path to be taken in the tracer.
(In reply to comment #32)
> Thanks for the quick investigation guys! We're not going to hold the current
> release for this but want it for the next 3.6 one.

CC'ing Cheng, so support is also aware of the current situation with this crash.
(In reply to comment #34)
> Ugh, http://hg.mozilla.org/tracemonkey/rev/e91417e33a53 is exactly the type
> of patch that you don't want to backport

Luke: can you think of a less scary way to fix this in 3.6? I don't care if we slow things down by dropping off trace if that's what it takes (within reason, hopefully).

> By the way, the probable reason the patch appears to fix the bug is that it
> changes eval from a slow native to a fast native which causes a different
> codegen path to be taken in the tracer.

Does that mean the most recent releases might still be vulnerable if we found a different function that was still a slow native? Does the bug even have anything to do with slow natives directly, that is, would there be other ways to send us down the path that leads to bad codegen or is it the "deal with slow natives" code itself that's buggy?
Assignee: general → luke
Whiteboard: [sg:critical?]
(In reply to comment #36)
> (In reply to comment #34)
> > Ugh, http://hg.mozilla.org/tracemonkey/rev/e91417e33a53 is exactly the type
> > of patch that you don't want to backport
> 
> Luke: can you think of a less scary way to fix this in 3.6? I don't care if
> we slow things down by dropping off trace if that's what it takes (within
> reason, hopefully).

Yeah, disable tracing slow natives.  Very low risk.  Measuring in the shell, its a 7% SS regression and 2% V8 regression.  Seems reasonable to me; I've always wanted to regress SS :)

> > By the way, the probable reason the patch appears to fix the bug is that it
> > changes eval from a slow native to a fast native which causes a different
> > codegen path to be taken in the tracer.
> 
> Does that mean the most recent releases might still be vulnerable if we
> found a different function that was still a slow native?

Nah; I killed slow natives in bug 581263.
Attached patch don't trace slow natives (obsolete) — Splinter Review
This fixes the crash for the test case in comment 20.
Attachment #532412 - Flags: review?(gal)
Comment on attachment 532412 [details] [diff] [review]
don't trace slow natives

What are slow natives used for these days?
Ah wait this is branch only, not trunk. r=me.
Comment on attachment 532412 [details] [diff] [review]
don't trace slow natives

Time to EOL 3.6 and 3.5.
Attachment #532412 - Flags: review?(gal) → review+
Attachment #532412 - Flags: approval1.9.2.18?
Comment on attachment 532412 [details] [diff] [review]
don't trace slow natives

Approved for 1.9.2.18. Please land this on releases/mozilla-1.9.2 default. If you are so inclined, you can land this on releases/mozilla-1.9.1 as well for 3rd parties maintaining the old branch.

Also, can you confirm that Firefox 5 is unaffected?
Attachment #532412 - Flags: approval1.9.2.18?
Attachment #532412 - Flags: approval1.9.2.18+
Attachment #532412 - Flags: approval1.9.1.20+
Based on the patch that fixed the crash, I guessed the cause was something to do with slow natives + indirect eval, but I hadn't actually done a full triage to confirm this.  To give a confident answer to Christian's questions, I looked at the generated code and indeed the bug is not with slow natives; the problem is that JSOP_CALLGVAR and record_JSOP_CALLGVAR do different things for 'this'.  Calling a slow native ('eval' being the most common) just touches 'this' in a way to crash, but I can also get an (always NULL) crash from:

  function f() {}
  function g() { this.f(); }
  var h = g;
  for (var i = 0; i < 8; ++i)
    h();

The obvious fix (attached) fixes this case and the one in comment 20.  Both testcases crash on 1.9.1 and the patch (unmodified) fixes both.  I'm fairly confident this is the right fix, but this area has all sorts of subtleties and I didn't write it so I'd appreciate careful consideration by the reviewers.
Attachment #532412 - Attachment is obsolete: true
Attachment #532761 - Flags: review?(jorendorff)
Attachment #532761 - Flags: review?(gal)
Oh, and to actually answer comment 42: this-passing logic was rewritten (again) for FF4 and for all the JSOP_CALL* cases I can see 'this' handling in the tracer matches the interpreter (although its a bit hairy so second opinions welcome).
blocking1.9.1: .20+ → needed
Whiteboard: [sg:critical?] → [sg:critical?][needs r= from gal/jorendorff]
Comment on attachment 532761 [details] [diff] [review]
(maybe) fix actual problem

gal: Looks good. This must be a leftover from the lazy this calculation. A + from jorendorff too would be nice as second opinion.
Attachment #532761 - Flags: review?(gal) → review+
Attachment #532761 - Flags: review?(jorendorff) → review+
http://hg.mozilla.org/releases/mozilla-1.9.2/rev/ba5edb80dc24
Whiteboard: [sg:critical?][needs r= from gal/jorendorff] → [sg:critical?]
Attachment #532412 - Flags: approval1.9.2.18-
Attachment #532412 - Flags: approval1.9.2.18+
Attachment #532412 - Flags: approval1.9.1.20-
Attachment #532412 - Flags: approval1.9.1.20+
Attachment #532761 - Flags: approval1.9.2.18+
Attachment #532761 - Flags: approval1.9.1.20+
Patch doesn't apply cleanly to 1.9.1, I manually patched it though as the code in the surrounding area was the same:

http://hg.mozilla.org/releases/mozilla-1.9.1/rev/d21c370f1009

(I know we aren't doing a 1.9.1 update again, but doesn't really hurt to land it there)
blocking1.9.1: needed → .20+
branch-only bug, we can resolve this now
Status: NEW → RESOLVED
blocking1.9.1: .20+ → needed
Closed: 13 years ago
Resolution: --- → FIXED
<mutter>stupid no-warning mid-air collisions</mutter>
blocking1.9.1: needed → .20+
(In reply to comment #47)
> Patch doesn't apply cleanly to 1.9.1, I manually patched it though as the
> code in the surrounding area was the same:
> 
> http://hg.mozilla.org/releases/mozilla-1.9.1/rev/d21c370f1009
> 
> (I know we aren't doing a 1.9.1 update again, but doesn't really hurt to
> land it there)

Patch as applied doesn't actually compile on 1.9.1
Crash Signature: [@ ExecuteTree]
Backed out of 1.9.1:
http://hg.mozilla.org/releases/mozilla-1.9.1/rev/df1ac32df5d5
blocking1.9.1: .20+ → needed
Chris dragged the new this-passing logic across the finish line, so he knows it better than I do at this point.
Verified using attached testcase with Firefox 3.6.18 build 1 (Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.18) Gecko/20110613 Firefox/3.6.18). Crashes cleanly in 3.6.17 and not at all in the post-fix build.
Keywords: verified1.9.2
Comment on attachment 532761 [details] [diff] [review]
(maybe) fix actual problem

I can't shed any additional light on the original source of this bug, but given what jsops.cpp does for JSOP_CALLGVAR, I think this is definitely correct.
Attachment #532761 - Flags: review+
Depends on: 663666
Group: core-security
Old TM bug, not taking for the test suite.
Flags: in-testsuite-
You need to log in before you can comment on or make changes to this bug.