Closed Bug 770238 Opened 8 years ago Closed 8 years ago

Fatal jemalloc assertion in js::gc::Arena::finalize<JSString>

Categories

(Core :: JavaScript Engine, defect)

15 Branch
defect
Not set
critical

Tracking

()

VERIFIED FIXED
mozilla20
Tracking Status
firefox17 + wontfix
firefox18 + verified
firefox19 --- verified
firefox20 --- verified
firefox-esr17 18+ verified

People

(Reporter: justin.lebar+bug, Assigned: sfink)

References

(Blocks 1 open bug)

Details

(5 keywords, Whiteboard: [js:p1:fx20])

Crash Data

Attachments

(1 file)

[@ arena_dalloc_small | je_free | js::gc::Arena::finalize<JSString>(js::FreeOp*, js::gc::AllocKind, unsigned int) ]

e.g. https://crash-stats.mozilla.com/report/index/965f569b-3561-4842-8b63-b79962120629

We're dying with EXCEPTION_BREAKPOINT, which means we're hitting a fatal jemalloc assertion.  If we knew which assertion we were hitting, that might explain what's going wrong here.

It's possible that one of our other heap corruption bugs is causing this.
Keywords: crash
It's currently #31 in 15.0b6 and #25 in 16.0b1.
15.0 is unaffected.

It first appeared in 16.0a1/20120615 (discontinuously across builds) and 15.0a2/20120625. The regression ranges are:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=3f408698a03f&tochange=da8c6039c25e (might be)
http://hg.mozilla.org/releases/mozilla-aurora/pushloghtml?fromchange=e967f0934767&tochange=07c03ce3989c
It's likely a regression from bug 764192.

It's correlated to Yandex Bar in 16.0:
     72% (44/61) vs.   1% (243/16616) yasearch@yandex.ru (Yandex.Bar, https://addons.mozilla.org/addon/3495)

Some comments in Russian say (translated by Google):
"When closing multiple tabs, and exit the program crash"
"Runninf Foldinf@Home in the background with CPU and GPU functions."
Assignee: nobody → general
Severity: normal → critical
Component: General → JavaScript Engine
No longer depends on: jemalloc-assertions
Keywords: regression
OS: Windows XP → Windows 7
Version: Trunk → 15 Branch
bsmedberg, we could use some disassembly love here.

I'm thinking we should also modify these assertions so they add a stack frame when they fail, which should let us figure out where they're being called from without asking someone to disassemble.
> I'm thinking we should also modify these assertions so they add a stack frame when they 
> fail, which should let us figure out where they're being called from without asking 
> someone to disassemble.

I filed bug 787675 on this.
A very similar signature in 15.0 appears to be [@ je_free | js::gc::Arena::finalize<JSString>(js::FreeOp*, js::gc::AllocKind, unsigned int) ] which is also filed as bug 698296.

I examined https://crash-stats.mozilla.com/report/index/f1cd1733-e55a-44d4-85df-3ce5b2120904

and we appear to be hitting this assert: http://hg.mozilla.org/releases/mozilla-beta/annotate/c4e1ca772267/memory/mozjemalloc/jemalloc.c#l3301
Crash Signature: [@ arena_dalloc_small | je_free | js::gc::Arena::finalize<JSString>(js::FreeOp*, js::gc::AllocKind, unsigned int) ] → [@ arena_dalloc_small | je_free | js::gc::Arena::finalize<JSString>(js::FreeOp*, js::gc::AllocKind, unsigned int) ] [@ je_free | js::gc::Arena::finalize<JSString>(js::FreeOp*, js::gc::AllocKind, unsigned int) ]
> RELEASE_ASSERT(diff == regind * size); 

Hm...that's a strange assertion to be hitting.

Here's what's happening: We've called free(p).  jemalloc has determined that p lives in the "run" located at r.  The run has a header of length h, and then is followed by a bunch of slots for allocations of size |size|.  First, we assign

  diff = p - r - h

that is, |diff| is how many bytes into the slots our pointer |p| is.

Then we want to calculate the index of diff in the slots -- that is, we want |diff / size|.  jemalloc has a bunch of different ways of doing this, in an attempt to do an actual division only rarely.  We let regind = diff / size (however it's calculated).

Then we assert that diff = regind * size, and that's apparently failing.

So what's going wrong here?  My guess is that we're somehow getting a |size| or
|diff| that's different from what we expect, and thus the calculation fails.
For example, we do

  diff = ptr - run - bin->reg0_offset; // reg0_offset is what we called |h| above.

so if bin->reg0_offset is corrupted, maybe |diff| takes on a bogus value.

Or, alternatively, |size| is coming from the bin as well (in
arena_dalloc_small, we do size = bin->reg_size) -- maybe |size| is bogus.

This is pretty weird, though...
Whiteboard: [js:p1:fx18]
I just checked, and it turns out this algorithm will completely fall over for the vast majority of inputs.

In normal operation, we only give it a limited few inputs, and then it's fine.  But if the heap gets corrupted and size or diff is wrong, then it's going to fall over, hard.
(In reply to Justin Lebar [:jlebar] from comment #5)
> Then we assert that diff = regind * size, and that's apparently failing.

This could also mean that p points into the middle of a slot, right? That is, we found the distance into the slots; divided by size and tossed the remainder; and then multiplied back and didn't get the original value. If tossing the remainder lost information, then that would also trigger the assert.
> This could also mean that p points into the middle of a slot, right?

Indeed!  Looks like this might be the first time we'd notice if we were freeing an interior pointer to a non-huge allocation (i.e. smaller than 1mb).

Good call.
Ended up here reviewing a user's crashes on 16.0 in SUMO[1].
Linking in case you need to gather more data from the system with the crash.

bp-dfe1a086-f449-46f0-b493-2af092120926

1| https://support.mozilla.org/en-US/questions/938141
(In reply to alex_mayorga from comment #9)
> Ended up here reviewing a user's crashes on 16.0 in SUMO[1].
I suspect his RAM is corrupted because crash signatures are various.
It's #10 top browser crasher in 17.0b6 and is also correlated to Yandex Bar (any versions this time) like in bug 698296 for 18.0a2:
  arena_dalloc_small | je_free | js::gc::Arena::finalize<JSString>(js::FreeOp*, js::gc::AllocKind, unsigned int)|EXCEPTION_BREAKPOINT (98 crashes)
     71% (70/98) vs.   2% (414/18719) yasearch@yandex.ru (Yandex.Bar, https://addons.mozilla.org/addon/3495)
          2% (2/98) vs.   1% (166/18719) 6.9.1
          1% (1/98) vs.   0% (20/18719) 7.1.1
          0% (0/98) vs.   0% (5/18719) 7.1.1.1
          4% (4/98) vs.   0% (17/18719) 7.2
         14% (14/98) vs.   0% (69/18719) 7.2.1
          3% (3/98) vs.   0% (9/18719) 7.2.2
         47% (46/98) vs.   1% (122/18719) 7.2.3
     65% (64/98) vs.   1% (251/18719) vb@yandex.ru
          1% (1/98) vs.   0% (28/18719) 1.1
         47% (46/98) vs.   1% (118/18719) 1.3
          3% (3/98) vs.   0% (16/18719) 2.0
         11% (11/98) vs.   0% (78/18719) 2.0.1
          3% (3/98) vs.   0% (7/18719) 2.0.2
Keywords: topcrash
Do we have someone at Yandex whom we can inform that they probably have a malloc error?
Flags: needinfo?(jorge)
(In reply to Justin Lebar [:jlebar] from comment #12)
> Do we have someone at Yandex whom we can inform that they probably have a
> malloc error?

I've reached out to the Yandex Toolbar team. We should try to reproduce internally as well. Including QA to help delegate that.
Flags: needinfo?(jorge)
QA Contact: anthony.s.hughes
Marcia, can you please look into this ASAP?
QA Contact: anthony.s.hughes → mozillamarcia.knous
I am investigating - I found that I had to get the crashing version in the correlations (7.2.3) directly from the Yandex site and not from addons. So far I haven't generated a crash, but still working.

(In reply to Anthony Hughes, Mozilla QA (:ashughes) from comment #14)
> Marcia, can you please look into this ASAP?
This now is the #5 crash on 17.0b6, I fear it will be just as high and bad on release, if not even worse, as the release population is more likely to have this toolbar installed.
We haven't been able to reproduce, so we're still trying to reach out to the Yandex Toolbar team and looping in the JS team to see if there's anything we can investigate in the meantime.
Just as another point of information, this is the #1 crash in early 17.0 release data, and we see the following add-on correlations there:

  arena_dalloc_small | je_free | js::gc::Arena::finalize<JSString>(js::FreeOp*, js::gc::AllocKind, unsigned int)|EXCEPTION_BREAKPOINT (195 crashes)
     84% (164/195) vs.   3% (481/18601) yasearch@yandex.ru (Yandex.Bar, https://addons.mozilla.org/addon/3495)
     83% (161/195) vs.   3% (466/18601) vb@yandex.ru
     27% (52/195) vs.   2% (404/18601) {37964A3C-4EE8-47b1-8321-34DE2C39BA4D}
     12% (24/195) vs.   1% (122/18601) {B100D0FF-0001-8CE4-2790-AACE49B8AE35}
      9% (18/195) vs.   2% (338/18601) jqs@sun.com (Java Quick Starter, http://java.sun.com/javase/downloads/)
     12% (24/195) vs.   5% (1022/18601) {d10d0bf8-f5b5-c8b4-a8b2-2b9879e08c5d} (Adblock Plus, https://addons.mozilla.org/addon/1865)
     11% (22/195) vs.   5% (976/18601) wrc@avast.com


So this is still mainly correlated with Yandex.Bar, even if I suspect that it's not only happening with that one.
Possibly interesting comments (with Google translation):

При закрытии мозилы выскакивает предупреждение о аварийном завершении работы.
(When you close the pop up warning mozily crash.)

яндекс бар не работает?
(Yandex bar does not work?)
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #18)
>      27% (52/195) vs.   2% (404/18601) {37964A3C-4EE8-47b1-8321-34DE2C39BA4D}
Google search seems to identify this as the Mail.ru toolbar.

>      12% (24/195) vs.   1% (122/18601) {B100D0FF-0001-8CE4-2790-AACE49B8AE35}
Google search seems to identify this as magicscanner.dll which is provided by AlterGeo browser helper object. Apparently this is installed by the Mail.ru Agent software (not an add-on).
I've installed the following software:
 * Avast Internet Security 7 Trial
 * Mail.ru Agent 6.0 build 6005

I've installed the following add-ons:
 * Adblock Plus 2.2 from AMO
 * Mail.ru Toolbar
 * Java Console 6.0.37
 * Yandex Toolbar 5.2.3 from AMO
 * Yandex.ru Toolbar 10.13.40.15 from their website

The current addons in about:support:
Adblock Plus 2.2 - true - {d10d0bf8-f5b5-c8b4-a8b2-2b9879e08c5d}
avast! WebRep 7.0.1474 - true - wrc@avast.com
Feedback 1.2.2 - true - testpilot@labs.mozilla.com
Java Console6.0.37 - true - {CAFEEFAC-0016-0000-0037-ABCDEFFEDCBA}
www.mail.ru 10.13.40.15 - true - {d2ec0c9c-5e2d-411f-981e-a5b89401fcea}
Yandex 10.13.40.15 - true - {b71b6dca-c7e8-4f51-8c63-f07249dc8f2a}
Спутник @Mail.Ru 2.5.3.12 - true - {37964A3C-4EE8-47b1-8321-34DE2C39BA4D}
Яндекс.Бар 5.2.3 - true - yasearch@yandex.ru

I have yet to witness a crash through basic dogfooding with all of this installed. Can someone please advise me on scenarios to test which might hit this?
Firefox 17 (https://download.mozilla.org/?product=firefox-17.0&os=win&lang=en-US), Windows 7, new profile.

Steps to reproduce:
1. Install XPI from https://dl.dropbox.com/u/19707478/YandexBar723-with-vb13.xpi
2. Turn off all extensions except Yandex.Elements.
3. Turn off Yandex.Elements widgets, plugins and features except "SmartBar" plugin.
4. Restart browser.
5. Type "moz" in the address bar. Some of "mozilla.org" links will be shown with "favorite" icon, like "http://mozilla.com/en-US/about".
6. Close browser.

Inside "SmartBar" feature we are loading some images by custom "xb://" protocol. For example,

.searchSettings image {
    margin-top: 2px;
    list-style-image: url("/icons/gear.png");
    width: 16px;
    height: 16px;
}

will be loaded like "xb://55c30542-68b6-d041-b468-f63b3316745f/icons/gear.png" or something like that (guid part in this url is different between browser sessions).

There is no crash if this image is loaded by "chrome://" protocol ("chrome://global/skin/arrow/panelarrow-horizontal.png", for example).

We are going to change urls in "SmartBar" to 'list-style-image: url("data:image/png;base64,...")' and will update Yandex.Elements to 7.2.5 as soon as possible.

We hope that this information helps to find the causes of the crash and that in future browser versions it will be possible to use custom protocols to load images.
(In reply to Danil Ivanov from comment #22)
> We hope that this information helps to find the causes of the crash and that
> in future browser versions it will be possible to use custom protocols to
> load images.

David - can you help us investigate what may have changed in Firefox 17 to cause this? If Yandex.Elements shouldn't have been using xb://, please do make a recommendation for their future development. Thanks!
Assignee: general → dvander
Danil, have you guys pushed out a fix for this already? Right now, we still see rising amounts of cases of this crash.
Yes, https://ye-update.yandex.ru/update-fx.xml
This may be a similar case from another place of add-on?
Crash Signature: [@ arena_dalloc_small | je_free | js::gc::Arena::finalize<JSString>(js::FreeOp*, js::gc::AllocKind, unsigned int) ] [@ je_free | js::gc::Arena::finalize<JSString>(js::FreeOp*, js::gc::AllocKind, unsigned int) ] → [@ arena_dalloc_small | je_free | js::gc::Arena::finalize<JSString>(js::FreeOp*, js::gc::AllocKind, unsigned int) ] [@ je_free | js::gc::Arena::finalize<JSString>(js::FreeOp*, js::gc::AllocKind unsigned int) ] [@ moz_abort | je_free | js::gc::Arena::fin…
> We hope that this information helps to find the causes of the crash and that in future 
> browser versions it will be possible to use custom protocols to load images.

It's also possible that the implementation of the custom xb protocol is buggy, and that's what is causing the crash.  Without source code for a small dummy extension which reproduces this crash, it's hard to say.
(In reply to Danil Ivanov from comment #26)
> This may be a similar case from another place of add-on?
Likely. Indeed, in 18.0b1 it's still correlated to the latest version of Yandex Bar:
     88% (166/188) vs.   3% (313/9876) yasearch@yandex.ru (Yandex.Bar, https://addons.mozilla.org/addon/3495)
          2% (4/188) vs.   0% (4/9876) 6.6
          1% (1/188) vs.   0% (1/9876) 6.9
          5% (10/188) vs.   0% (14/9876) 6.9.1
          2% (3/188) vs.   0% (4/9876) 7.2.1
          2% (3/188) vs.   0% (3/9876) 7.2.3
          1% (1/188) vs.   0% (1/9876) 7.2.5
         77% (144/188) vs.   3% (284/9876) 7.2.6
     74% (139/188) vs.   3% (283/9876) vb@yandex.ru
         34% (64/188) vs.   2% (157/9876) 1.3
         19% (35/188) vs.   1% (62/9876) 1.4
          2% (3/188) vs.   0% (7/9876) 2.0
         10% (19/188) vs.   0% (27/9876) 2.0.1
         10% (18/188) vs.   0% (25/9876) 2.0.3
(In reply to Justin Lebar [:jlebar] from comment #27)
> > We hope that this information helps to find the causes of the crash and that in future 
> > browser versions it will be possible to use custom protocols to load images.
> 
> It's also possible that the implementation of the custom xb protocol is
> buggy, and that's what is causing the crash.  Without source code for a
> small dummy extension which reproduces this crash, it's hard to say.

Justin, as I just checked by looking into the XPI linked in comment #22, this add-on seems to have no binary components, all code is JS, so it's very much possible that the actual problem is in our code and they managed to just trigger this problem. We should investigate this. That said, it looks like their code is huge, so trying to find a minimized testcase would be nice.

Danil, any chance someone from your side can help in finding a minimized testcase (i.e. the exact piece of code, without all the other code around it) that triggers the problem?
Unfortunately, it looks like there was only a one-day dip and in fact, over multiple days, there is no downward trend here. I fear the new version of Yandex Bar didn't actually improve this at all. :(
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #29)
> Danil, any chance someone from your side can help in finding a minimized
> testcase (i.e. the exact piece of code, without all the other code around
> it) that triggers the problem?

OK, I'll try create minimized extension.
Crash Signature: unsigned int) ] [@ moz_abort | je_free | js::gc::Arena::finalize<JSString>(js::FreeOp*, js::gc::AllocKind, unsigned int)] → unsigned int) ] [@ moz_abort | je_free | js::gc::Arena::finalize<JSString>(js::FreeOp*, js::gc::AllocKind, unsigned int)] [@ isalloc_validate | ozone_size]
OS: Windows 7 → All
Hardware: x86 → All
Assignee: dvander → sphink
Whiteboard: [js:p1:fx18] → [js:p1:fx20]
Testing this on 64-bit Linux, I see somewhat different behavior:

 - my first attempt to reproduce failed. I am now speculating that it's because you have to mouse over the links in the dropdown after typing "moz", but I'm not sure.
 - in an opt build, I get a shutdown hang instead of a crash. The hang is on a mutex inside of a call to realloc.
 - in a debug build, I get this assertion: ((trc)->callback == __null || (trc)->callback == GCMarker::GrayCallback)

Finally, in my earlier attempts to reproduce with a slightly different debug build, I saw these messages:

###!!! ASSERTION: Should have inner window here!: '!win || win->IsInnerWindow
()', file ../../../../content/base/src/nsDocument.cpp, line 3902
###!!! ASSERTION: Should have inner window here!: 'win->IsInnerWindow()', fil
e ../../../../content/base/src/nsDocument.cpp, line 3886
WARNING: Wrong inner/outer window combination!: file ../../../../content/base/s
rc/nsDocument.cpp, line 3889
Assertion failure: probe == scope, at ../../../../js/xpconnect/wrappers/WrapperFactory.cpp:221

I will catch this in a debugger next. Thanks for the nice steps to reproduce.
(In reply to Steve Fink [:sfink] from comment #32)
> Testing this on 64-bit Linux, I see somewhat different behavior:

The crashes with the js::gc::Arena::finalize signature are all on Windows, from what I saw in a quick glance at crash-stats.

>  - in a debug build, I get this assertion: ((trc)->callback == __null ||
> (trc)->callback == GCMarker::GrayCallback)

That at least already sounds like GC, and our crashes are in the GC. :)

I'm looking forward to seeing what comes out of all this, good to see that we can reproduce at least the fact that there are problems occurring.
The problem is that XBLFinalize is getting invoked during a GC, and it ends up invoking JS code because of the script-implemented protocol handler. This patch defers the release.
Attachment #688985 - Flags: review?(bugs)
Comment on attachment 688985 [details] [diff] [review]
Defer releasing an nsXBLDocumentInfo to avoid calling JS during finalization

Use C++ casting
static_cast<nsIScriptGlobalObjectOwner*>(docInfo)
Attachment #688985 - Flags: review?(bugs) → review+
Danil, FYI, some other problems that were reported when running under a debug build:

Lots of errors like this:

XML Parsing Error: not well-formed
Location: 
Line Number 1, Column 51:
<span xmlns="http://www.w3.org/1999/xhtml">illa&ie=utf-8&oe=utf-8&aq=t&rls=org.</
--------------------------------------------------^

which looks like an escaping problem (it thinks "&ie" is the beginning of an entity but doesn't see a terminating semicolon).

It complained about lots of leaked URLs on shutdown. Perhaps these were all seen by the custom protocol handler?

Leaked URLs:
  http://bar.yandex.ru/presets/default.xml
  file:///home/sfink/.mozilla/firefox/ofbi54sk.YandexCrash/yasearch-xb/presets/http%253A%252F%252Fbar.yandex.ru%252Fpresets%252Fdefault.xml
  file:///home/sfink/.mozilla/firefox/ofbi54sk.YandexCrash/yasearch-xb/presets/http%253A%252F%252Fbar.yandex.ru%252Fpresets%252Fdefault.xml
  file:///home/sfink/.mozilla/firefox/ofbi54sk.YandexCrash/yasearch-xb/presets/http%253A%252F%252Fbar.yandex.ru%252Fpresets%252Fdefault.xml
  file:///home/sfink/.mozilla/firefox/ofbi54sk.YandexCrash/yasearch-xb/presets/http%253A%252F%252Fbar.yandex.ru%252Fpresets%252Fdefault.xml
  resource://gre-resources/ua.css
  resource://gre-resources/html.css
  chrome://global/content/xul.css
  resource://gre-resources/quirk.css
  resource://gre-resources/full-screen-override.css
  chrome://global/skin/scrollbars.css
  resource://gre-resources/forms.css
  chrome://global/content/bindings/scrollbar.xml#scrollbar
  chrome://global/skin/scrollbar/slider.gif
  chrome://global/skin/arrow/arrow-lft.gif
  chrome://global/content/bindings/scrollbar.xml#thumb
  chrome://global/skin/arrow/arrow-rit.gif
  chrome://global/skin/arrow/arrow-up.gif
  chrome://global/skin/arrow/arrow-dn.gif
  chrome://global/content/bindings/scrollbar.xml#scrollbar-base
  chrome://global/skin/arrow/arrow-rit-dis.gif
  chrome://global/skin/arrow/arrow-lft-dis.gif
  chrome://global/skin/arrow/arrow-dn-dis.gif
  chrome://global/skin/arrow/arrow-up-dis.gif
  chrome://global/content/bindings/general.xml#root-element
  chrome://global/content/bindings/popup.xml#tooltip
  chrome://global/content/bindings/stringbundle.xml#stringbundleset
  chrome://global/content/bindings/general.xml#deck
  chrome://global/content/bindings/stringbundle.xml#stringbundle
  chrome://global/content/bindings/popup.xml#popup
  chrome://global/content/bindings/popup.xml#arrowpanel
  chrome://global/content/bindings/popup.xml#panel
  chrome://global/content/bindings/toolbar.xml#toolbox
  chrome://global/content/bindings/toolbar.xml#toolbar
  chrome://global/content/bindings/toolbar.xml#menubar
  chrome://global/content/bindings/menu.xml#menu-menubar
  chrome://global/content/bindings/text.xml#text-label
  chrome://global/content/bindings/toolbarbutton.xml#toolbarbutton
  chrome://global/content/bindings/general.xml#image
  chrome://global/content/bindings/autocomplete.xml#history-dropmarker
  chrome://global/content/bindings/textbox.xml#input-box
  chrome://global/content/platformHTMLBindings.xml#inputFields
  chrome://global/content/bindings/general.xml#dropmarker
  chrome://global/content/bindings/button.xml#menu
  chrome://global/content/bindings/scrollbox.xml#scrollbox
  chrome://global/content/bindings/toolbarbutton.xml#menu
  chrome://global/content/bindings/splitter.xml#splitter
  chrome://global/content/bindings/tabbox.xml#tabpanels
  chrome://global/content/bindings/notification.xml#notificationbox
  chrome://global/content/bindings/browser.xml#browser
  chrome://global/content/bindings/general.xml#statusbar
  chrome://global/content/bindings/general.xml#statusbarpanel
  chrome://global/content/bindings/resizer.xml#resizer
  chrome://global/content/bindings/menu.xml#menuseparator
  chrome://global/content/bindings/toolbar.xml#toolbardecoration
  chrome://global/content/bindings/menu.xml#menuitem-iconic
  chrome://global/content/bindings/menu.xml#menu
  chrome://global/content/bindings/autocomplete.xml#autocomplete-richlistbox
  resource://gre-resources/loading-image.png
  resource://gre-resources/broken-image.png
  resource://gre-resources/loading-image.png
  resource://gre-resources/broken-image.png
  chrome://global/content/bindings/autocomplete.xml#autocomplete-tree
  https://www.google.com/complete/search?client=firefox&q=mo
  chrome://global/content/bindings/text.xml#text-base
  https://www.google.com/complete/search?client=firefox&q=moz
  chrome://mozapps/skin/places/defaultFavicon.png
Can we land on central in preparation for uplifting to beta before Tuesday for beta 4?
Attachment #688985 - Flags: checkin+
I think affects every version going way back. I'm not sure what exactly is needed to trigger it, but it's the scripted custom protocol handler that's exposing the problem. I don't know if there's a way to implement one without triggering this problem on shutdown.
Flags: needinfo?(sphink)
Comment on attachment 688985 [details] [diff] [review]
Defer releasing an nsXBLDocumentInfo to avoid calling JS during finalization

[Approval Request Comment]
Bug caused by (feature/regressing bug #): longstanding bug
User impact if declined: a scripted custom protocol handler may trigger a crash on shutdown, and possibly other times
Testing completed (on m-c, etc.): on m-i for a few hours now
Risk to taking this patch (and alternatives if risky): minimal as far as I can tell, though I'm not an expert on this xpc::DeferredRelease stuff. (But I believe my reviewer is.)
String or UUID changes made by this patch: none
Attachment #688985 - Flags: approval-mozilla-beta?
Attachment #688985 - Flags: approval-mozilla-aurora?
Flags: needinfo?(sphink)
It's #2 top browser crasher in 17.0.1esr.
https://hg.mozilla.org/mozilla-central/rev/c98fd0f8f3f3
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla20
(In reply to Steve Fink [:sfink] from comment #34)
> Created attachment 688985 [details] [diff] [review]
> Defer releasing an nsXBLDocumentInfo to avoid calling JS during finalization
> 
> The problem is that XBLFinalize is getting invoked during a GC, and it ends
> up invoking JS code because of the script-implemented protocol handler. This
> patch defers the release.

To validate my understanding of the jemalloc assertions: Was this a double-free, use-after-free, or something else?
Comment on attachment 688985 [details] [diff] [review]
Defer releasing an nsXBLDocumentInfo to avoid calling JS during finalization

Oh, crud, that's right. This is in the esr too.

[Approval Request Comment]
If this is not a sec:{high,crit} bug, please state case for ESR consideration: topcrash
User impact if declined: crashes
Fix Landed on Version: 20, so far
Risk to taking this patch (and alternatives if risky): low, the fix is pretty straightforward
String or UUID changes made by this patch: none

See https://wiki.mozilla.org/Release_Management/ESR_Landing_Process for more info.
Attachment #688985 - Flags: approval-mozilla-esr17?
(In reply to Justin Lebar [:jlebar] from comment #43)
> (In reply to Steve Fink [:sfink] from comment #34)
> > Created attachment 688985 [details] [diff] [review]
> > Defer releasing an nsXBLDocumentInfo to avoid calling JS during finalization
> > 
> > The problem is that XBLFinalize is getting invoked during a GC, and it ends
> > up invoking JS code because of the script-implemented protocol handler. This
> > patch defers the release.
> 
> To validate my understanding of the jemalloc assertions: Was this a
> double-free, use-after-free, or something else?

Sadly, I don't know. The stack pointed to the core problem so I didn't investigate the final symptom that much. But it probably involved a use-after-free, at least. The debug stack is basically: 

  assert
  garbage collector (for UnmarkGrey)
  destructors
  sweep phase of garbage collector
  CC shutdown
  XPCOM shutdown

So the corruption was probably caused by the GC-within-sweep, where it was using (marking) freed arenas or something.

Note that I touched the UnmarkGray call involved in bug 730208, but that was just a syntactic change; it was originally added in bug 614347. (Though I did add additional UnmarkGray calls, so it's possible one of those triggered the relese build problem. That was in Fx15, though.)
Attachment #688985 - Flags: approval-mozilla-esr17?
Attachment #688985 - Flags: approval-mozilla-esr17+
Attachment #688985 - Flags: approval-mozilla-beta?
Attachment #688985 - Flags: approval-mozilla-beta+
Attachment #688985 - Flags: approval-mozilla-aurora?
Attachment #688985 - Flags: approval-mozilla-aurora+
Verified on Windows 7, Windows XP and Mac OS X 10.7.5 that Firefox 18 beta 4 does not crash when using the STR from Comment 21.

But when checking the Socorro crash reports I  found some crashes on Firefox 18 beta 4 for one of the signatures that was added to the Crash Signature field.

The crashes  are related with the signature [@ moz_abort | je_free | js::gc::Arena::finalize<JSString>(js::FreeOp*, js::gc::AllocKind, unsigned int)]:
http://bit.ly/TgYo0E. 

Is this expected in any way? Are the crashes related with this bug or should I file a new one?
(In reply to Simona B [QA] from comment #48)
> Verified on Windows 7, Windows XP and Mac OS X 10.7.5 that Firefox 18 beta 4
> does not crash when using the STR from Comment 21.

Nice, so *something* was fixed here, which is a good step for sure. After comment #47, I wondered if even that is the case.

> But when checking the Socorro crash reports I  found some crashes on Firefox
> 18 beta 4 for one of the signatures that was added to the Crash Signature
> field.

This is probably not the only way to trigger a crash in js::gc::Arena::finalize, which I guess might be expected.
We also have bug 698296 and bug 702531 open on js::gc::Arena::finalize crashes, so we can track remaining crashes there.

The important thing with this fix will be to see if the volume of those signatures drops significantly and using the Yandex Bar is becoming more stable.
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #49)
> The important thing with this fix will be to see if the volume of those
> signatures drops significantly and using the Yandex Bar is becoming more
> stable.

There's definitely more problems out there that result in js::gc::Arena::finalize crashes, but let's call this one verified as "moz_abort | je_free | js::gc::Arena::finalize<JSString>(js::FreeOp*, js::gc::AllocKind, unsigned int)" is #35 topcrash in 18.0b4 (with this patch) while it's #3 in 18.0b3. I only verified in 18, but let's call it that way for the other branches as well.

Further work on js::gc::Arena::finalize problems should go into different bugs.
Blocks: 698296
Verified on Windows 7, Windows XP and Mac OS X 10.7.5 that Firefox 17.0.2 ESR does not crash when using the STR from Comment 21.

Checked also the Socorro reports and I could not find any crashes on Firefox 17.0.2 ESR for the signatures that are listed in this bug.

Mozilla/5.0 (Windows NT 6.1; WOW64; rv:17.0) Gecko/20100101 Firefox/17.0
Build ID: 20130107124423
You need to log in before you can comment on or make changes to this bug.