Last Comment Bug 746036 - crash in js::GetNameFromBytecode or js::ContextStack::currentScriptWithDiagnostics mainly with Trusteer Rapport & Yahoo Toolbar or Browser For Change
: crash in js::GetNameFromBytecode or js::ContextStack::currentScriptWithDiagno...
Status: RESOLVED FIXED
[js:p1][leave open]
: crash, topcrash
Product: Core
Classification: Components
Component: JavaScript Engine (show other bugs)
: Trunk
: All All
: -- critical (vote)
: mozilla16
Assigned To: David Mandelin [:dmandelin]
:
: Jason Orendorff [:jorendorff]
Mentors:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2012-04-16 18:56 PDT by David Mandelin [:dmandelin]
Modified: 2012-07-11 08:16 PDT (History)
15 users (show)
See Also:
Crash Signature:
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
+
wontfix
+
wontfix
+
wontfix
+
fixed


Attachments
Diagnostic patch (937 bytes, patch)
2012-04-26 15:18 PDT, David Mandelin [:dmandelin]
wmccloskey: review+
Details | Diff | Splinter Review
Patch, stop querying for a script we already have (13.17 KB, patch)
2012-06-25 18:13 PDT, David Mandelin [:dmandelin]
bhackett1024: review+
Details | Diff | Splinter Review

Description David Mandelin [:dmandelin] 2012-04-16 18:56:06 PDT
Spun off from bug 737780. See 

https://crash-stats.mozilla.com/report/list?product=Firefox&query_search=signature&query_type=contains&query=currentScriptWithDiagnostics&reason_type=contains&date=04%2F17%2F2012%2001%3A46%3A43&range_value=1&range_unit=weeks&hang_type=any&process_type=any&do_query=1&signature=js%3A%3AContextStack%3A%3AcurrentScriptWithDiagnostics%28unsigned%20char**%29
(nightly)

https://crash-stats.mozilla.com/report/list?signature=js%3A%3AGetNameFromBytecode
(aurora)

It's the same kind of compartment mismatch as before (script->compartment() != cx_->compartment), but it doesn't have the same root cause.

Without STR, the obvious ways forward are to either (a) look at the stack and see where a compartment check/autoenter is missing, or (b) put fatal compartment assertions earlier to pinpoint the bug. Either way could take a while. We could alternatively change the signature of currentScript to take a cx, which might allow us to skip these crashes but would be masking a compartment mismatch, so it might end up crashing later.
Comment 1 David Mandelin [:dmandelin] 2012-04-26 15:18:51 PDT
Created attachment 618820 [details] [diff] [review]
Diagnostic patch

I think these are all caused by a compartment mismatch on entry to JS. Usually it's from XUL, but not always. It seems easiest to put an opt assert here, find the crashes, and add compartment entries as needed.
Comment 2 [PTO to Dec5] Bill McCloskey (:billm) 2012-04-26 15:40:29 PDT
Comment on attachment 618820 [details] [diff] [review]
Diagnostic patch

You might want to mark JS_ExecuteScript as JS_NEVER_INLINE so it's easy to find in crash reports. Or call something crashy that's JS_NEVER_INLINE and has an obvious name.
Comment 3 David Mandelin [:dmandelin] 2012-04-26 16:11:07 PDT
Diagnostic landed:

http://hg.mozilla.org/integration/mozilla-inbound/rev/79423e65d2d8

JS_ExecuteScript isn't inlined in the current crash reports, but I slapped a JS_NEVER_INLINE on for good measure.
Comment 4 Jeff Walden [:Waldo] (remove +bmo to email) 2012-04-26 18:14:01 PDT
Comment on attachment 618820 [details] [diff] [review]
Diagnostic patch

Review of attachment 618820 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jsapi.cpp
@@ +5233,5 @@
>      AssertNoGC(cx);
>      CHECK_REQUEST(cx);
>      assertSameCompartment(cx, obj);
> +    if (cx->compartment != obj->compartment())
> +        *(int *) 0 = 0xf0;

clang warns building this:

/home/jwalden/moz/slots/js/src/jsapi.cpp:5237:9: warning: indirection of non-volatile null pointer will be deleted, not trap [-Wnull-dereference]
        *(int *) 0 = 0xf0;
        ^~~~~~~~~~
/home/jwalden/moz/slots/js/src/jsapi.cpp:5237:9: note: consider using __builtin_trap() or qualifying pointer with 'volatile'

So the easiest fix would be to make that |volatile int*|.

mfbt has a MOZ_CRASH() macro which expands into inline code which will trigger a crash -- and because it's inline, Breakpad reports will point to the location where it was used.  But I'm guessing your 0xf0 is important here as skidmark crash diagnosis, and MOZ_CRASH() behaves identically in all cases, so that macro doesn't work.  Perhaps we should augment it for this.  Hmm.
Comment 5 :Ms2ger (⌚ UTC+1/+2) 2012-04-27 06:56:35 PDT
https://hg.mozilla.org/mozilla-central/rev/79423e65d2d8
Comment 6 Jeff Walden [:Waldo] (remove +bmo to email) 2012-04-28 10:20:01 PDT
In a spate of warning-fixing just now, I ended up adding |volatile| to this:

https://hg.mozilla.org/integration/mozilla-inbound/rev/27d42a3feef1
Comment 7 Ryan VanderMeulen [:RyanVM] 2012-04-29 15:58:21 PDT
Warning fix:
http://hg.mozilla.org/mozilla-central/rev/27d42a3feef1
Comment 8 David Mandelin [:dmandelin] 2012-04-30 13:28:40 PDT
No data has come back yet. I'll check again in a week.
Comment 9 David Mandelin [:dmandelin] 2012-05-02 17:28:20 PDT
I don't think this is serious enough to worry about for 13. It'll take time to get back enough data to do something about it.
Comment 10 Lukas Blakk [:lsblakk] use ?needinfo 2012-06-02 11:39:19 PDT
Is there some data now?  I'm going to discontinue tracking this for 14 since I see only 1 or 2 crashes per day and it's at #255 on the 14.0a2 topcrashers list. Please renominate for tracking if this increases in severity for user experience.
Comment 11 Scoobidiver (away) 2012-06-08 02:22:19 PDT
It's #10 top browser crasher in 13.0 and #98 in 14.0a2.

It's correlated to Trusteer Rapport and Yahoo! Toolbar in 13.0:
js::GetNameFromBytecode|EXCEPTION_ACCESS_VIOLATION_READ (867 crashes)
    103% (897/867) vs.   4% (3125/87387) atl80.dll
     73% (635/867) vs.   3% (2242/87387) rooksdol.dll
         43% (373/867) vs.   2% (1340/87387) 2.110.0.0
         30% (262/867) vs.   1% (895/87387) 2.115.0.0
     73% (635/867) vs.   3% (2242/87387) RapportUtil.dll
         43% (373/867) vs.   2% (1340/87387) 3.5.1108.78
         30% (262/867) vs.   1% (897/87387) 3.5.1201.76
     73% (635/867) vs.   3% (2242/87387) rookscom.dll
         73% (635/867) vs.   3% (2240/87387) 2.16.0.0
          0% (0/867) vs.   0% (2/87387) 2.9.0.0
     73% (635/867) vs.   3% (2264/87387) rooksbas.dll
         43% (373/867) vs.   2% (1364/87387) 2.56.0.0
         30% (262/867) vs.   1% (898/87387) 2.57.0.0
     73% (635/867) vs.   3% (2305/87387) riched20.dll
          1% (7/867) vs.   0% (36/87387) 5.30.23.1221
          0% (2/867) vs.   0% (12/87387) 5.30.23.1228
         14% (125/867) vs.   1% (534/87387) 5.30.23.1230
          0% (2/867) vs.   0% (5/87387) 5.30.23.1231
          0% (1/867) vs.   0% (2/87387) 5.31.23.1225
          1% (5/867) vs.   0% (11/87387) 5.31.23.1227
          1% (7/867) vs.   0% (30/87387) 5.31.23.1228
         20% (172/867) vs.   1% (571/87387) 5.31.23.1229
         36% (314/867) vs.   1% (1104/87387) 5.31.23.1230
     75% (652/867) vs.  12% (10772/87387) GdiPlus.dll
          0% (1/867) vs.   0% (354/87387) 5.1.3102.5512
          0% (1/867) vs.   0% (22/87387) 5.2.6000.16386
          1% (5/867) vs.   0% (108/87387) 5.2.6000.16782
          1% (7/867) vs.   0% (162/87387) 5.2.6001.18551
          1% (11/867) vs.   1% (529/87387) 5.2.6001.22319
          0% (1/867) vs.   0% (69/87387) 5.2.6002.18342
         16% (136/867) vs.   2% (1343/87387) 5.2.6002.18581
          1% (7/867) vs.   0% (292/87387) 5.2.6002.22509
         14% (119/867) vs.   2% (2162/87387) 5.2.6002.22791
          2% (14/867) vs.   1% (709/87387) 6.1.7600.16385
          3% (24/867) vs.   1% (595/87387) 6.1.7600.17007
          0% (2/867) vs.   0% (291/87387) 6.1.7601.17514
         37% (324/867) vs.   4% (3370/87387) 6.1.7601.17825
     40% (350/867) vs.   1% (1196/87387) RapportTanzan13.DLL
         21% (178/867) vs.   1% (663/87387) 3.5.1108.78
         20% (172/867) vs.   1% (533/87387) 3.5.1201.76
     33% (285/867) vs.   1% (1037/87387) RapportTanzan13.dll
         22% (195/867) vs.   1% (674/87387) 3.5.1108.78
         10% (90/867) vs.   0% (363/87387) 3.5.1201.76

js::GetNameFromBytecode|EXCEPTION_ACCESS_VIOLATION_READ (867 crashes)
     46% (403/867) vs.   8% (6758/87387) {635abd67-4fe9-1b23-4f01-e679fa7484c1} (Yahoo! Toolbar, https://addons.mozilla.org/addon/2032)
     14% (119/867) vs.   0% (177/87387) browseforchange@browseforchange.com
     12% (103/867) vs.   2% (1583/87387) crossriderapp2258@crossrider.com
      9% (76/867) vs.   0% (210/87387) jid1-vfCLiWMJHrGCNw@jetpack
     10% (84/867) vs.   3% (2539/87387) toolbar@ask.com

Bug 681521 and bug 761598 also implies Trusteer Rapport on Mac in 13.0.
Comment 12 Alex Keybl [:akeybl] 2012-06-08 15:20:54 PDT
(In reply to Scoobidiver from comment #11)
> It's #10 top browser crasher in 13.0 and #98 in 14.0a2.

Do you know if this jumped post-release, or in one of our betas?
Comment 13 Scoobidiver (away) 2012-06-09 00:12:35 PDT
(In reply to Scoobidiver from comment #11)
>      46% (403/867) vs.   8% (6758/87387)
> {635abd67-4fe9-1b23-4f01-e679fa7484c1} (Yahoo! Toolbar,
> https://addons.mozilla.org/addon/2032)
Mainly the latest version of Yahoo! Toolbar:
46% (399/867) vs.   7% (6514/87387) 2.4.8.20120412011105

(In reply to Alex Keybl [:akeybl] from comment #12)
> Do you know if this jumped post-release, or in one of our betas?
It affects betas too at high volume: #30 in 13.0b3 over the last 4 weeks, #40 in 13.0b4 over the last 3 weeks, #34 in 13.0b5 over the last 2 week, #21 in 13.0b6 and #13 in 13.0b7 over the last week.
Correlations to Trusteer Rapport first appeared on June 7, two days after 13.0 was released. Before this date, there were a correlation to Browser For Change (about 40%).
Comment 14 Alex Keybl [:akeybl] 2012-06-11 09:45:33 PDT
Adding qawanted to try to do exploratory testing with the latest version of Yahoo Toolbar. Also adding Kev to see if he has contacts at Yahoo that can help out here.
Comment 15 juan becerra [:juanb] 2012-06-12 13:18:08 PDT
I'll get some eyes on it.
Comment 16 Paul Silaghi, QA [:pauly] 2012-06-13 05:35:51 PDT
I'm not able to see any crashes on:
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:13.0) Gecko/20100101 Firefox/13.0 and Mozilla/5.0 (Windows NT 6.1; WOW64; rv:14.0) Gecko/20100101 Firefox/14.0beta7
with Yahoo! Toolbar 2.4.8.20120412011105.
Comment 17 Scoobidiver (away) 2012-06-13 07:03:29 PDT
On June 13 in 13.0, it's still correlated to both Yahoo! Toolbar 2.4.8.20120412011105 and Trusteer Rapport:
js::GetNameFromBytecode|EXCEPTION_ACCESS_VIOLATION_READ (609 crashes)
     50% (304/609) vs.   7% (6831/94312) {635abd67-4fe9-1b23-4f01-e679fa7484c1} (Yahoo! Toolbar 2.4.8.20120412011105, https://addons.mozilla.org/addon/2032)
     66% (402/609) vs.   2% (1780/94312) RapportUtil.dll
     42% (254/609) vs.   1% (1076/94312) 3.5.1108.78
     24% (144/609) vs.   1% (657/94312) 3.5.1201.76
        0% (2/609) vs.   0% (6/94312) 3.5.1201.77
        0% (2/609) vs.   0% (27/94312) 3.5.1201.78
Comment 18 David Mandelin [:dmandelin] 2012-06-21 18:19:30 PDT
Hmmm, I see this with any volume only on 14, and very few with the latest build, so it looks like something could have fixed it. I'll just keep watching for now.
Comment 19 Scoobidiver (away) 2012-06-22 05:25:33 PDT
It's still high: #21 in 13.0.1.

Based on crash correlations, there are no crashes correlated to Trusteer Rapport in Beta, Aurora and Nightly, so it's only used by release version users.
Comment 20 Marcia Knous [:marcia - use ni] 2012-06-25 05:07:26 PDT
Alex asked me to check status on this again - currently in 13.0.1 js::GetNameFromBytecode ranks as #23 top crash. It appears in 14.0b7 and b8 in much smaller numbers but is still present and is not present at all on Aurora or Trunk according to the signature summary.

Current correlations pulled manually:

js::GetNameFromBytecode|EXCEPTION_ACCESS_VIOLATION_READ (329 crashes)
     31% (102/329) vs.   0% (210/131542) browseforchange@browseforchange.com
     19% (63/329) vs.   2% (2449/131542) crossriderapp2258@crossrider.com
     24% (78/329) vs.   8% (10493/131542) {635abd67-4fe9-1b23-4f01-e679fa7484c1} (Yahoo! Toolbar, https://addons.mozilla.org/addon/2032)
     16% (51/329) vs.   4% (5189/131542) plugin@yontoo.com
     16% (53/329) vs.   6% (7361/131542) ffxtlbr@babylon.com
     12% (41/329) vs.   3% (3995/131542) toolbar@ask.com
      9% (31/329) vs.   0% (241/131542) playbryte@playbryte.com
      9% (29/329) vs.   2% (2777/131542) ffxtlbr@funmoods.com
     11% (37/329) vs.   5% (6925/131542) avg@toolbar
      6% (21/329) vs.   1% (1466/131542) appbar@alot.com
      7% (22/329) vs.   2% (2160/131542) {2D3F3651-74B9-4795-BDEC-6DA2F431CB62}
      7% (23/329) vs.   2% (2569/131542) {BBDA0591-3099-440a-AA10-41764D9DB4DB}

For js::ContextStack::currentScriptWithDiagnostics(unsigned char**) - this signature only shows on 14 beta starting with B6. The correlations for that crash show:

js::ContextStack::currentScriptWithDiagnostics(unsigned char**)|EXCEPTION_ACCESS_VIOLATION_WRITE (12 crashes)
     33% (4/12) vs.   2% (1456/61706) mozilla_cc@internetdownloadmanager.com (IDM CC, https://addons.mozilla.org/addon/6973)
     17% (2/12) vs.   0% (2/61706) npaocfzzzv@npaocfzzzv.org
     17% (2/12) vs.   0% (2/61706) {c4d0e6bc-dc18-4b82-9c27-816cd4935e1b}
     17% (2/12) vs.   0% (2/61706) zwhkppjpzp@zwhkppjpzp.org
     17% (2/12) vs.   0% (276/61706) OneClickDownload@OneClickDownload.com
      8% (1/12) vs.   0% (1/61706) phphhyygcv@phphhyygcv.org
      8% (1/12) vs.   0% (6/61706) dictionary@vi.mozdev.org (Vietnamese Dictionary, https://addons.mozilla.org/addon/7479)
      8% (1/12) vs.   0% (9/61706) {DCE88800-9606-11DC-8919-D33056D89593}
      8% (1/12) vs.   0% (9/61706) contact@youtube2mp3.to
      8% (1/12) vs.   2% (1195/61706) {336D0C35-8A85-403a-B9D2-65C292C39087}
      8% (1/12) vs.   3% (1800/61706) {EB9394A3-4AD6-4918-9537-31A1FD8E8EDF}
      8% (1/12) vs.   3% (1913/61706) ffxtlbr@funmoods.com
Comment 21 Scoobidiver (away) 2012-06-25 05:26:34 PDT
As stated in other bugs, no users on Beta, Aurora or Nightly populations use Trusteer Rapport based on crash correlations per module.
That explains the difference of volume between those who have Trusteer Rapport & Yahoo Toolbar or Browser For Change (#23 in 13.0.1) and those who have Browser For Change (14.0 Beta).
Comment 22 David Mandelin [:dmandelin] 2012-06-25 18:10:49 PDT
- For the add-on-correlated crashes, if the add-on has binary components, then it's a compartment mismatch bug in the add-on. We need to teach them to use the compartment enter APIs. An example is:

  https://crash-stats.mozilla.com/report/index/b8c76511-406f-440c-8ace-e03a52120624

where RapportTanzan13.DLL is almost certainly using compartment incorrectly.

- For the small number of other crashes on Beta, it's hard to say what's going on. It could be a bug in currentScript. Anyway, the fact that GetNameFromBytecode even calls currentScript is a design flaw, because its (transitive) callers always have a script, so they should just pass it through. I'm about to post a patch for that.

An issue with the patch is that it will also cause us not to crash at this point when there is a compartment mismatch. This is good, because we don't crash; but it's bad, because it masks a compartment mismatch, which will cause other problems down the line. We should really do *something* to catch these, but I'll take that up on the JS mailing list.
Comment 23 David Mandelin [:dmandelin] 2012-06-25 18:13:09 PDT
Created attachment 636565 [details] [diff] [review]
Patch, stop querying for a script we already have

Brian, the main question I have is whether I'm using VMFrame::script correctly. Btw, I also happened to notice this:

void JS_FASTCALL
ic::SetProp(VMFrame &f, ic::PICInfo *pic)
{
    JSScript *script = f.fp()->script();

in other code. Should that be f.script()?
Comment 24 Brian Hackett (:bhackett) 2012-06-28 07:27:34 PDT
Comment on attachment 636565 [details] [diff] [review]
Patch, stop querying for a script we already have

Review of attachment 636565 [details] [diff] [review]:
-----------------------------------------------------------------

The uses of f.script() with f.pc() are correct in the mjit stubs, these give the current location regardless of any inlining the JIT did.  The PIC compilers use f.fp()->script() because that is the script whose JIT contains the PICs themselves.  Looking again now, the script member of the compilers are only used for diagnostics (spew(...)) and should just be removed.  (Probably was used in the past for fetching the JITScript, but now the compiler uses f.chunk() for that)

::: js/src/jsinterp.cpp
@@ +917,5 @@
>                                  JSObject *start_, JSObject *found,
>                                  PropertyCacheEntry *entry)
>  {
>      jsbytecode *pc;
>      cx->stack.currentScript(&pc);

currentScript() will return the JSScript*, so I don't think the script parameter is necessary for this function.
Comment 25 David Mandelin [:dmandelin] 2012-06-28 16:22:34 PDT
(In reply to Brian Hackett (:bhackett) from comment #24)
> currentScript() will return the JSScript*, so I don't think the script
> parameter is necessary for this function.

Thanks, I made that change.

http://hg.mozilla.org/integration/mozilla-inbound/rev/a43b6c4f3006

Sheriffs can still leave this open--it's intended to make this crash signature go away, so we'll want to verify that in a few days.
Comment 26 Ed Morley [:emorley] 2012-06-29 00:46:03 PDT
https://hg.mozilla.org/mozilla-central/rev/a43b6c4f3006
Comment 27 Paul Silaghi, QA [:pauly] 2012-07-02 05:32:03 PDT
Removing qa wanted as there isn't much that really needs testing for now. Please add again if required.
Comment 28 David Mandelin [:dmandelin] 2012-07-02 10:38:00 PDT
This is showing a pretty low frequency on branches, so for now, I recommend against fixing it on branches.

Note You need to log in before you can comment on or make changes to this bug.