Closed Bug 1103375 Opened 5 years ago Closed 5 years ago

Firefox crashes when measuring the memory in about:memory ( [@ StatsCellCallback<int> ] ) with Yandex Elements extension installed

Categories

(Core :: JavaScript Engine, defect)

33 Branch
x86_64
Windows 8.1
defect
Not set

Tracking

()

VERIFIED FIXED
mozilla41
Tracking Status
firefox37 --- wontfix
firefox38 + wontfix
firefox38.0.5 + fixed
firefox39 + verified
firefox40 --- verified
firefox41 --- verified
firefox-esr38 39+ verified
relnote-firefox --- 39+

People

(Reporter: zeidspex, Assigned: njn)

Details

(Keywords: crash, crashreportid, topcrash)

Crash Data

Attachments

(1 file, 1 obsolete file)

User Agent: Mozilla/5.0 (Windows NT 6.3; WOW64; rv:33.0) Gecko/20100101 Firefox/33.0
Build ID: 20141106120505

Steps to reproduce:

After some heavy use of Firefox, I entered about:memory to check out the different memory usages.


Actual results:

Firefox crashed

Firefox 33.1 Crash Report [@ StatsCellCallback<int> ]

https://crash-stats.mozilla.com/report/index/7946894c-1f66-45d6-9e77-5b80f2141122


Expected results:

Showing the results of memory measurement.
To clarify, I clicked the "Measure" button under "Show memory reports", and then the crash happened.
Can you reproduce this, in other words, does this happen every time you use that button in about:memory?
Crash Signature: [@ StatsCellCallback<int> ]
Component: Untriaged → JavaScript Engine
Flags: needinfo?(zeidspex)
Keywords: crash, crashreportid
Product: Firefox → Core
Summary: Firefox crashes when measuring the memory in about:memory → Firefox crashes when measuring the memory in about:memory ( [@ StatsCellCallback<int> ] )
The JS memory reporting code iterates over every cell in the GC heap. If there's any kind of GC heap corruption there's a good chance it'll trip over it.
I can confirm this issue on the latest Aurora 38.0a2(buildID: 20150224004223) on Windows 7 64bit: after some heavy use of Firefox (high memory usage: ~1,4 GB RAM), I entered about:memory and click on Measure button -> crash occurs. Here is the crash report: https://crash-stats.mozilla.com/report/index/8f8eac43-c11f-43d5-a56b-d6a3e2150225
Status: UNCONFIRMED → NEW
Ever confirmed: true
[Tracking Requested - why for this release]: This signature is in the top-10 on 37.0.1. 

There is high correlation with the Yandex toolbars:
    100% (340/341) vs.   4% (2335/59423) yasearch@yandex.ru (Yandex.Bar, https://addons.mozilla.org/addon/3495)
     91% (309/341) vs.   4% (2477/59423) vb@yandex.ru

I'd normally send this to njn but he's away. Till are you the right person to fill in?
Flags: needinfo?(till)
> Till are you the right person to fill in?

I'm not, but maybe Terrence is. Looks like an invalid JSObject pointer.
Flags: needinfo?(till) → needinfo?(terrence)
Tracking this since it's a top crash.
Jorge, do we have contacts at yandex for their toolbar?  There might be a bug on our side, but it's also a bit strange that the yandex toolbar shows up so prominently, so it might be a problem with the addon.
Flags: needinfo?(jorge)
I think Mike Connor has good contacts there.
Flags: needinfo?(jorge) → needinfo?(mconnor)
This is consistently reproducible for me with 38 beta:
- Install Yandex Elements (successor to Yandex.Bar) from https://elements.yandex.com.tr
- Go to about:memory
- Click on Measure
- Crash
David, is this happening because of a c++ binary the toolbar is injecting, or is this a bug in our JS engine that is exposed by the addon somehow?
Flags: needinfo?(dmajor)
I haven't seen any signs of binary injection, so I have to assume the latter.
Flags: needinfo?(dmajor)
This is too late for 38 but we can take a patch for 39.
in fact, it crashes not only when measuring in about:memory, but also in random moments of time.
It is unfortunate that this bug has been sitting for so long. Although it doesn't look so prominent on our overall crash reports, it's a severe crash for a subset of users: among locale==ru this is easily the #1 crash on release, more than twice the rate of OOM|small.
(In reply to David Major [:dmajor] from comment #15)
> It is unfortunate that this bug has been sitting for so long. Although it
> doesn't look so prominent on our overall crash reports, it's a severe crash
> for a subset of users: among locale==ru this is easily the #1 crash on
> release, more than twice the rate of OOM|small.

This statement seems to validate comment 14, that this crash is not limited to the use of about:memory. If that's the case I think we need to prioritize this bug more highly. Is it expected that we need a fix from Yandex, in which case I'll follow up with mconnor, or is it that we think that we need a fix in Firefox? (Clearly a Firefox fix to prevent an add-on from causing this behaviour is a good idea either way.)

Naveed - Can you please help prioritize this bug so that we can try to address the issue in the 39 cycle?
Flags: needinfo?(nihsanullah)
> This statement seems to validate comment 14, that this crash is not limited
> to the use of about:memory. 

That's surprising. AFAIK calls to this function (StatsCellCallback()) should not be triggered outside of about:memory. Except maybe from https://dxr.mozilla.org/mozilla-central/source/toolkit/devtools/server/actors/utils/memory-bridge.js#344, but I would have thought that was obscure.
Reproduced on a fresh install of Windows 7 with FF38 as per comment 10.
@Lawrence I pinged GC team last night on this bug. Hopefully we will see quick progress since Jon can reproduce it now.
Flags: needinfo?(nihsanullah)
It seems that our memory reporters are causing JS to run while we're iterating the heap.  This happens via calling methods on an nsIURI object in GetCompartmentName() and causes an assertion failure on debug builds:

>	xul.dll!js::AssertHeapIsIdle(JSRuntime * rt) Line 195	C++
 	xul.dll!JS::AutoSaveExceptionState::AutoSaveExceptionState(JSContext * cx) Line 5295	C++
 	xul.dll!mozilla::Maybe<JS::AutoSaveExceptionState>::emplace<JSContext * &>(JSContext * & <aArgs_0>) Line 387	C++
 	xul.dll!AutoScriptEvaluate::StartEvaluating(JS::Handle<JSObject *> scope) Line 55	C++
 	xul.dll!nsXPCWrappedJSClass::CallMethod(nsXPCWrappedJS * wrapper, unsigned short methodIndex, const XPTMethodDescriptor * info_, nsXPTCMiniVariant * nativeParams) Line 954	C++
 	xul.dll!nsXPCWrappedJS::CallMethod(unsigned short methodIndex, const XPTMethodDescriptor * info, nsXPTCMiniVariant * params) Line 533	C++
 	xul.dll!PrepareAndDispatch(nsXPTCStubBase * self, unsigned int methodIndex, unsigned int * args, unsigned int * stackBytesToPop) Line 85	C++
 	xul.dll!SharedStub() Line 113	C++
 	xul.dll!xpc::CompartmentPrivate::GetLocation() Line 3701	C++
 	xul.dll!GetCompartmentName(JSCompartment * c, nsCString & name, int * anonymizeID, bool replaceSlashes) Line 1705	C++
 	xul.dll!xpc::XPCJSRuntimeStats::initExtraCompartmentStats(JSCompartment * c, JS::CompartmentStats * cstats) Line 2791	C++
 	xul.dll!StatsCompartmentCallback(JSRuntime * rt, void * data, JSCompartment * compartment) Line 345	C++
 	xul.dll!IterateCompartmentsArenasCells(JSRuntime * rt, JS::Zone * zone, void * data, void (JSRuntime *, void *, JSCompartment *) * compartmentCallback, void (JSRuntime *, void *, js::gc::Arena *, JSGCTraceKind, unsigned int) * arenaCallback, void (JSRuntime *, void *, void *, JSGCTraceKind, unsigned int) * cellCallback) Line 41	C++
 	xul.dll!js::IterateZonesCompartmentsArenasCells(JSRuntime * rt, void * data, void (JSRuntime *, void *, JS::Zone *) * zoneCallback, void (JSRuntime *, void *, JSCompartment *) * compartmentCallback, void (JSRuntime *, void *, js::gc::Arena *, JSGCTraceKind, unsigned int) * arenaCallback, void (JSRuntime *, void *, void *, JSGCTraceKind, unsigned int) * cellCallback) Line 73	C++
 	xul.dll!JS::CollectRuntimeStats(JSRuntime * rt, JS::RuntimeStats * rtStats, JS::ObjectPrivateVisitor * opv, bool anonymize) Line 728	C++
 	xul.dll!xpc::JSReporter::CollectReports(nsDataHashtable<nsUint64HashKey,nsCString> * windowPaths, nsDataHashtable<nsUint64HashKey,nsCString> * topWindowPaths, nsIMemoryReporterCallback * cb, nsISupports * closure, bool anonymize) Line 2884	C++
 	xul.dll!nsWindowMemoryReporter::CollectReports(nsIMemoryReporterCallback * aCb, nsISupports * aClosure, bool aAnonymize) Line 547	C++

I'm guessing that the toolbar changes something that causes a method to be implemented in JS.

Nick, any idea what the best way forward is?  Maybe we do the compartment name lookup after we finish iterating the heap.
I can reproduce. Comment 20 is definitely bad. I'll look into a fix tomorrow.
Flags: needinfo?(zeidspex)
Flags: needinfo?(terrence)
Flags: needinfo?(mconnor)
So there's a general problem which is that add-ons can implement certain XPCOM
interfaces via JS code, and then those interfaces get called while getting
compartment names, which causes havoc. There is already some code in the JS
reporter that is working around this fact, but the Yandex Elements add-on
exposes two more cases.

I don't have a good solution to avoid this problem in general, so for now I
suggest we paper over the two new cases. That's enough to fix the obvious
crashes obtained when clicking on "Measure" in about:memory when the add-on
is installed.

I can't promise that this will fix all the crashes -- see comment 17 -- but it
will hopefully get rid of a lot.
Summary: Firefox crashes when measuring the memory in about:memory ( [@ StatsCellCallback<int> ] ) → Firefox crashes when measuring the memory in about:memory ( [@ StatsCellCallback<int> ] ) with Yandex Elements extension installed
We can't call JS code while iterating over the JS heap in the JS memory
reporter. The Yandex Elements add-on is causing this in two cases.

- The add-on implements some nsIURI objects. This one's easy to work around,
  because GetLocation() can just skip any JS-implemented nsIURI objects.

- The add-on implements some nsIProtocolHandler objects in order to
  implement a custom "xb://" scheme. This one is harder to workaround because
  the call to the JS object's method occurs deep within NS_NewURI(), well
  beyond the JS reporter code. So we just skip "xb://" URLs.
Attachment #8604418 - Flags: review?(mrbkap)
Assignee: nobody → n.nethercote
Status: NEW → ASSIGNED
Nils, you should take note of this -- one of the two cases was triggered from GetLocationURI(), which I think you added.
(In reply to Nicholas Nethercote [:njn] from comment #23)
> - The add-on implements some nsIProtocolHandler objects in order to
>   implement a custom "xb://" scheme. This one is harder to workaround because
>   the call to the JS object's method occurs deep within NS_NewURI(), well
>   beyond the JS reporter code. So we just skip "xb://" URLs.

Won't this affect any JS-implemented protocolhandlers, not just the Yandex ones? Would it make sense to whitelist the URI schemes we're interested in (chrome, resource, file, jar) instead? Soon we won't have any binary XPCOM components in add-ons, so presumably any add-on provided protocol handler will be implemented in JS, so it seems like we might as well be restrictive here...
> Would it make sense to whitelist the URI schemes we're interested in
> (chrome, resource, file, jar) instead?

If you're confident that those are the only ones that need whitelisting and that any others that occur will be from add-ons, then it sounds like a good idea. (Is it likely that we'll add any new builtin URI schemes in the future?)
(In reply to Nicholas Nethercote [:njn] from comment #26)
> > Would it make sense to whitelist the URI schemes we're interested in
> > (chrome, resource, file, jar) instead?
> 
> If you're confident that those are the only ones that need whitelisting and
> that any others that occur will be from add-ons, then it sounds like a good
> idea. (Is it likely that we'll add any new builtin URI schemes in the
> future?)

I mean, there *are* other builtin URI schemes, like moz-page-thumb and the favicon service's URLs, then there's "data:" and remote ones like http, https, ftp.

I don't know much about memory reporting to know what the context is here. It just struck me that it should be easier.

As best I can tell from code inspection of XPCJSRuntime.cpp, the only case where this function is used is for the source location of privileged JS. That should never be http/https/ftp or other remote things, so I am 90% sure data, file, jar, chrome and resource should cover everything. It might be wise to ask someone like bz/bholley to doublecheck that I'm not forgetting something, or add some kind of telemetry info about other protocols that we hit?
[Tracking Requested - why for this release]:
This is the #9 crash on 38 release with 1.6% of all crashes, would be awesome if we'd get a fix in before we ship 38.0.5.
> This is the #9 crash on 38 release with 1.6% of all crashes
And 12.6% of release crashes for Russian-locale users.
Comment on attachment 8604418 [details] [diff] [review]
Fix some crashes triggered from about:memory

Review of attachment 8604418 [details] [diff] [review]:
-----------------------------------------------------------------

This seems fine to me, though a whitelist (as suggested by Gijs would certainly be safer).

One other idea would be to set a bit somewhere in XPConnect when we're iterating the JS heap and to refuse (in nsXPCWrappedJSClass) to call back into JS from C++ (we'd probably want to assert in debug builds as well as throwing in optimized builds) when the bit is set. That way, other bugs like this wouldn't crash us.

::: js/xpconnect/src/XPCJSRuntime.cpp
@@ +287,5 @@
> +        // calling into the extension's own JS-implemented nsIProtocolHandler
> +        // object, which we can't allow while we're iterating over the JS heap.
> +        // So just skip any such URL.
> +        // -- GROSS HACK ALERT --
> +        if (StringBeginsWith(uristr, NS_LITERAL_CSTRING("xb"))) {

Nit: No {} for single-line if statements in XPConnect.

::: js/xpconnect/src/xpcprivate.h
@@ +3731,5 @@
> +                // We cannot call into JS-implemented nsIURI objects, because
> +                // we are iterating over the JS heap at this point.
> +                location =
> +                    NS_LITERAL_CSTRING("<JS-implemented nsIURI location>");
> +

Nit: Nuke the extra newline here.

Non-nit (above and below): location.AssignLiteral(...)
Attachment #8604418 - Flags: review?(mrbkap) → review+
I landed the patch as-is because it's good enough to fix the crashes:
https://hg.mozilla.org/integration/mozilla-inbound/rev/9193e0e55d64

I filed bug 1166209 about a more general fix.
https://hg.mozilla.org/mozilla-central/rev/9193e0e55d64
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla41
Can we uplift this to 38.0.5?
Flags: needinfo?(n.nethercote)
(In reply to David Major [:dmajor] from comment #34)
> Can we uplift this to 38.0.5?

Yes. Do I need to do an approval-mozilla-release request?
Flags: needinfo?(n.nethercote)
Yes please. Request all the things!
This is the version that landed. It was slightly tweaked from the earlier version.

Approval Request Comment

[Feature/regressing bug #]: Yandex Elements addon reliably causes crashes when the "Measure" button is pressed in about:memory.

[User impact if declined]: More crashes.

[Describe test coverage new/current, TreeHerder]: Landed on m-i yesterday.

[Risks and why]: Low. It's a small change that makes us take simpler code paths in two places.

[String/UUID change made/needed]: None.
Attachment #8604418 - Attachment is obsolete: true
Attachment #8608411 - Flags: review+
Attachment #8608411 - Flags: approval-mozilla-release?
Attachment #8608411 - Flags: approval-mozilla-aurora?
Attachment #8608411 - Flags: approval-mozilla-beta?
Comment on attachment 8608411 [details] [diff] [review]
Fix some crashes triggered from about:memory

Top crash #10 in the release. Taking it (in ESR38 too)
Attachment #8608411 - Flags: approval-mozilla-release?
Attachment #8608411 - Flags: approval-mozilla-release+
Attachment #8608411 - Flags: approval-mozilla-esr38+
Attachment #8608411 - Flags: approval-mozilla-beta?
Attachment #8608411 - Flags: approval-mozilla-beta+
Attachment #8608411 - Flags: approval-mozilla-aurora?
Attachment #8608411 - Flags: approval-mozilla-aurora+
Flags: qe-verify+
Reproduced on 38.0b1 with steps from comment 10:
* bp-69569266-771c-4db4-a12a-5f9d42150616
* bp-ffca3a20-49cf-4b2c-a599-c47622150616
No crash encountered with 39.0b6 (Build ID: 20150615125213) under Windows 7 64-bit and Windows 8.1 64-bit.

After this fix, there is 1 crash for 39.0b3: https://crash-stats.mozilla.com/signature/?signature=StatsCellCallback%3Cint%3E&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&page=1 
Since there is only 1 crash in Socorro and none with latest Beta build, marking as verified.
Also verified fixed with ESR 38.1.0 (Build ID: 20150624141534), latest 40.0a2 and 41.0a1 (both from 2015-06-25), under Windows 7 64-bit and Windows 8.1 64-bit.
Status: RESOLVED → VERIFIED
Flags: qe-verify+
You need to log in before you can comment on or make changes to this bug.