Closed Bug 856796 Opened 7 years ago Closed 6 years ago

crash in JSC::Yarr::Interpreter when selecting 'stop script'

Categories

(Core :: JavaScript Engine, defect, critical)

21 Branch
x86
Windows 7
defect
Not set
critical

Tracking

()

VERIFIED FIXED
mozilla31
Tracking Status
firefox20 --- unaffected
firefox21 --- wontfix
firefox22 --- wontfix
firefox28 --- wontfix
firefox29 + verified
firefox30 + verified
firefox31 --- verified
b2g-v1.4 --- fixed
b2g-v2.0 --- fixed

People

(Reporter: TimAbraldes, Assigned: sstangl)

Details

(4 keywords)

Crash Data

Attachments

(2 files)

This bug was filed from the Socorro interface and is 
report bp-33cf0e93-e795-4e12-a178-6bb2e2130401 .
============================================================= 

See also:
report bp-ce0c7903-0122-4994-a330-d27bb2130328

Enabling Zimbra zimlets and using the Ajax interface can cause terrible performance.  In my case, I had the "Mozilla Bugz" zimlet enabled, and as a result the Zimbra Ajax interface was unusable in Aurora, Chrome, and Internet Explorer.

In Aurora, I would get the "unresponsive script" dialog.  I would sometimes allow the script operation to continue, but would always eventually have to 'stop script'.  Clicking 'stop script' sometimes caused Firefox to crash.
This is #12 in early 26.0 data and has the following correlation in Firefox 26 on 2013-12-10:
    97% (132/136) vs.   5% (1930/42359) wrc@avast.com
Keywords: topcrash-win
In yesterday's 26 data, this is still correlated heavily to Avast:
95% (473/497) vs.   6% (4119/68889) wrc@avast.com
Oh, and note that next to that add-on correlation in comment #3, this also has high correlation to two libraries, at least the first of those is Avast as well, AFAIK:

94% (466/497) vs.  13% (9250/68889) aswJsFlt.dll
95% (473/497) vs.  16% (10957/68889) snxhk.dll
Looking at bp-609bf3f6-cda2-434c-9640-427b22131217

This doesn't look like an OOM crash to me:

dereferencing null here:
http://hg.mozilla.org/releases/mozilla-release/annotate/39faf812aaec/js/src/yarr/YarrInterpreter.cpp#l84
which means that at some point in here `backTrack` became null:
http://hg.mozilla.org/releases/mozilla-release/annotate/39faf812aaec/js/src/yarr/YarrInterpreter.cpp#l949

Across a selection of crashes I loaded, we have plenty of available VM and physical memory as well as a low system memory use %. So this is probably caused by avast, not OOM unlike the other Yarr bug.
OK, this is actually #9 on 26 no, but also #8 on 25.0.1, so not a 26 regression.

That said, we should both get into contact with Avast over it and have our JS people investigate if this is a Yarr bug we can get fixed.
Here's an extract of correlations for 2014-01-07 data of Firefox 26.0:
 JSC::Yarr::Interpreter<wchar_t>::popParenthesesDisjunctionContext(JSC::Yarr::Interpreter<wchar_t>::BackTrackInfoParentheses*)|EXCEPTION_ACCESS_VIOLATION_READ (878 crashes)
     96% (841/878) vs.  14% (15562/115242) aswJsFlt.dll
          0% (0/878) vs.   0% (1/115242) 
          0% (0/878) vs.   0% (1/115242) 6.0.1270.0
          0% (0/878) vs.   0% (59/115242) 6.0.1289.0
          0% (0/878) vs.   0% (227/115242) 6.0.1367.0
          0% (0/878) vs.   0% (1/115242) 7.0.1396.0
          0% (0/878) vs.   0% (1/115242) 7.0.1403.0
          0% (0/878) vs.   0% (33/115242) 7.0.1407.0
          0% (0/878) vs.   0% (1/115242) 7.0.1414.0
          0% (0/878) vs.   0% (274/115242) 7.0.1426.0
          0% (0/878) vs.   0% (3/115242) 7.0.1451.402
          0% (0/878) vs.   0% (9/115242) 7.0.1455.186
          0% (0/878) vs.   0% (167/115242) 7.0.1456.418
          0% (0/878) vs.   0% (1/115242) 7.0.1464.504
          0% (0/878) vs.   0% (307/115242) 7.0.1466.549
          0% (0/878) vs.   0% (37/115242) 7.0.1473.755
          0% (0/878) vs.   1% (979/115242) 7.0.1474.765
          0% (0/878) vs.   0% (5/115242) 8.0.1479.32
          0% (0/878) vs.   0% (1/115242) 8.0.1480.37
          0% (0/878) vs.   0% (1/115242) 8.0.1481.43
          0% (0/878) vs.   0% (113/115242) 8.0.1482.45
          0% (1/878) vs.   1% (670/115242) 8.0.1483.72
          0% (0/878) vs.   0% (80/115242) 8.0.1488.286
          0% (0/878) vs.   2% (1953/115242) 8.0.1489.300
          0% (0/878) vs.   0% (30/115242) 8.0.1490.363
          0% (0/878) vs.   0% (1/115242) 8.0.1496.353
          0% (0/878) vs.   1% (1264/115242) 8.0.1497.376
          0% (0/878) vs.   0% (7/115242) 8.0.1498.384
          0% (0/878) vs.   0% (2/115242) 8.0.1499.385
          0% (1/878) vs.   0% (31/115242) 8.0.1500.387
          0% (4/878) vs.   0% (316/115242) 8.0.1501.391
          0% (0/878) vs.   0% (152/115242) 8.0.1504.397
          0% (0/878) vs.   0% (2/115242) 8.0.1600.393
          0% (0/878) vs.   0% (4/115242) 9.0.2000.50
          0% (0/878) vs.   0% (2/115242) 9.0.2001.87
          0% (0/878) vs.   0% (7/115242) 9.0.2003.120
          0% (0/878) vs.   0% (5/115242) 9.0.2005.141
          7% (60/878) vs.   1% (881/115242) 9.0.2006.159
          3% (26/878) vs.   0% (429/115242) 9.0.2007.172
         30% (264/878) vs.   2% (2707/115242) 9.0.2008.177
          0% (0/878) vs.   0% (9/115242) 9.0.2010.255
         55% (485/878) vs.   4% (4789/115242) 9.0.2011.263

This means that mostly the newest 9.0 versions are affected. (Unfortunately, I don't get versions for add-on correlations.)
At least it looks like volume of this crash has gone from ~30 crashes per million ADI to ~200 crashes / M ADI with the switch from 28 to 29 on the Beta channel, so it sounds like there is some kind of regression there after all.
Adding till and sstangl, who did the last Yarr update, not sure who else to CC on Yarr issues.
Given the correlation with avast, and the fact that I can't see any way for this pointer becoming NULL, I'm going with bsmedberg's theory from comment 5.

The increase in 29 might indicate that, while the issue is caused by avast, some changes in 29 aggravate it. I don't think that that gives us any additional ways to act on it, though.
The Avast correlation persists in 29:

Modules:
99% (358/363) vs.  16% (6628/41533) aswJsFlt.dll
99% (360/363) vs.  18% (7306/41533) snxhk.dll

Add-ons:
99% (359/363) vs.   5% (2233/41533) wrc@avast.com


Should we blocklist something there?
Flags: needinfo?(sstangl)
So, even with some additional information, I still can't reproduce the crash. This is my best guess so far:

The "backTrack" data structure in our case is a BackTrackInfoParentheses, which is a struct with the same memory layout as a BackTrackInfoBackReference. These structures share the same memory, accessed through the context, but are exclusively owned by some function that maintains them in a single mode.

The "lastContext" field of a BackTrackInfoParentheses shares the same memory location as the "matchAmount" field of a BackTrackInfoBackReference. The same Greedy mode under which the current assert is tripping also has an entry in matchBackReference() that sets backTrack->matchAmount = 0.

That's my best guess, based on what looks suspicious, but nothing I've tried so far shows any sign of going awry.

The signature for the crash appears to always be the same, however. So the attached patch just detects the issue at a probably-safe zone and returns an error rather than crashing. You may find this preferable to having a topcrash. We could also potentially run tests on users, but probably not on release, and I suspect there isn't much overlap between Nightly users and Avast users.

I'll keep looking at the bug, but I doubt that I'll make much headway without it reproducing here.
Attachment #8402910 - Flags: review?(till)
Comment on attachment 8402910 [details] [diff] [review]
Detect Crash Workaround

Review of attachment 8402910 [details] [diff] [review]:
-----------------------------------------------------------------

Nice idea, much better than just letting the crashes continue.
/me crosses fingers and hopes bhackett's investigation into importing V8's RE engine is fruitful
Attachment #8402910 - Flags: review?(till) → review+
I remember that we had some contact with Avast and they are using a really huge regex that probably makes us run into this. They shared the regex itself in a private thread we had with them, and also said "To avoid this problem, users should go to Avast add-on settings and disable
the Do No Track feature" so that points in the direction of how one could potentially reproduce.

Benjamin, does it help to (can we) share the regex with the JS devs involved here? Are you going to follow up with Avast about publicly sharing it, or should I?
Flags: needinfo?(benjamin)
Somebody appears to have put it up online: http://jsperf.com/avastregtest
Flags: needinfo?(benjamin)
It looks like Avast has fixed this. No crashes on .2016 despite a similar number of users as .2013. 

  96% (202/211) vs.   4% (1743/41994) 9.0.2013.292
  0% (0/211) vs.   4% (1708/41994) 9.0.2016.330

I had an old 9.0.2013.292 installer sitting around. With that version, the crash report URL that I tried [1] was painfully slow to load. Avast then offered to update itself to 9.0.2016.330, and the same page loaded without problems.

This issue will likely fix itself over time, but it may still be worth taking a code change for the people who are slow to update.

[1] http://games.torchbrowser.com/?utm_source=406&utm_medium=na&utm_campaign=IconDrop
(In reply to David Major [:dmajor] (UTC+12) from comment #16)
> It looks like Avast has fixed this. No crashes on .2016 despite a similar
> number of users as .2013. 

That's awesome to hear (using that huge regex probably never was a really good idea anyhow)!


That said, I agree we should get the workaround in - it's surely better to do slow execution than to crash.

Sean, can you land this and request uplift to Aurora and Beta?
Minor change from JSRegExpErrorNoMatch to JSRegExpErrorInternal, which makes no difference whatsoever. Also moved on top of resetMatches(), which is a no-op.

Carrying r+, will land
Attachment #8404925 - Flags: review+
Flags: needinfo?(sstangl)
Tree has been closed every time I tried to push, so setting checkin-needed. Please push only the "Updated Crash Workaround" patch.
Keywords: checkin-needed
Sean, could you fill the uplift request today ? (Monday) Beta 8 is going to build today. Thanks
Flags: needinfo?(sstangl)
https://hg.mozilla.org/mozilla-central/rev/b49f91ecc198
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla31
Comment on attachment 8404925 [details] [diff] [review]
Updated Crash Workaround

Review of attachment 8404925 [details] [diff] [review]:
-----------------------------------------------------------------

Request for uplifting patch to beta and aurora.

Patch works around a topcrash in the YARR interpreter by detecting crashing conditions and reporting internal failure.
Attachment #8404925 - Flags: approval-mozilla-beta?
Attachment #8404925 - Flags: approval-mozilla-aurora?
Apologies for missing the beta window. I'm on vacation this week, but I filed the request above.

If the requests are granted, please commit on my behalf.
Flags: needinfo?(sstangl)
Comment on attachment 8404925 [details] [diff] [review]
Updated Crash Workaround

Sean, since it is a critical bug, I am going to accept it but next please, please fill the uplift request template. Especially the risk evaluation... Anyway, thanks for taking the time during your holidays!
Attachment #8404925 - Flags: approval-mozilla-beta?
Attachment #8404925 - Flags: approval-mozilla-beta+
Attachment #8404925 - Flags: approval-mozilla-aurora?
Attachment #8404925 - Flags: approval-mozilla-aurora+
I'll keep an eye on crash-stats for this over the next few days to verify.
The frequency on beta is also noticeably lower (I guess we're down to about half the volume for this signature), so the patch surely helped some.

I guess we probably only will get this fully fixed if we switch to a different engine than Yarr or fix Yarr itself, but what we have here is surely already some help.
There are only one or two crashes per day across all versions, on builds since 2014042122, so I'm verifying this as fixed.  

We can open a new bug for the remaining low level of crashes, if anyone thinks that's important to do.
Status: RESOLVED → VERIFIED
(In reply to Ioana Budnar, QA [:ioana] from comment #31)
> There are 423 crashes on the RC, which only appeared last week

This is expected and OK as long as the volume has dropped, which it has. Setting back to verified/fixed.
Status: REOPENED → RESOLVED
Closed: 6 years ago6 years ago
Resolution: --- → FIXED
I'm only setting this for 29 as I have verified lower volume (roughly half from before) there myself.
Status: RESOLVED → VERIFIED
Ioana, there were some crashes for 29, but they were on earlier builds. What I'm doing is going to the crash signature, then to the Reports tab, then sorting by build.  If there aren't any crashes, or a sharp drop in crashes, on recent builds, I would probably verify the fix.  In this case the latest crashes all happened on the 20140421221237  build which tells me that around April 21st someone checked in code that fixed most of the crashes. So your point in comment 17 makes me think that Sean's patch (merged April 16th) only fixed part of the issue, and something else that was checked in on the 21st or 22nd made that fix more effective.   

So, I'm not sure how to verify "this patch fixed things" at this point, but at least we can verify on the level that the crash isn't happening on newer builds.   I hope that makes sense!
I took another look the pattern of crashes. The crash signature is still showing up, notably with a spike on the 2014050615 build, so some other fix or code change must be affecting this. 

Kairo, what do you think?
Flags: needinfo?(kairo)
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #33)
> I'm only setting this for 29 as I have verified lower volume (roughly half
> from before) there myself.

One possibility is that the patch didn't address the issue, and the declining volume just comes from users updating to a fixed version of Avast.

From module correlations, the ratio of fixed build to affected builds of Avast is about 3:1 now. In comment 16 it was about 1:1.
(In reply to Liz Henry :lizzard from comment #36)
> I took another look the pattern of crashes. The crash signature is still
> showing up, notably with a spike on the 2014050615 build, so some other fix
> or code change must be affecting this. 
> 
> Kairo, what do you think?

As mentioned previously, we know that not all of those crashes were fixed, what has been done here is just a stopgap measure for some of them, and that's probably all that can be done at this point. The work to replace Yarr completely will "fix" this completely at some point.

Also, you don't say what branch the build is from, so I can't talk to that.
Flags: needinfo?(kairo)
Verifying for 30 and 31 based on the dramatic decrease in crashes since April 16th.
You need to log in before you can comment on or make changes to this bug.